Major Outlets and Social Platforms Opt Out of Apple’s AI Data Scraping, Highlighting Growing Tensions Over Web Content Use

Leading Websites Decline Participation in Apple's AI Data Collection

Just under three months after Apple introduced a way for websites to opt out of its AI training, several major news outlets and social platforms have taken the company up on it.

WIRED has confirmed that Facebook, Instagram, Craigslist, Tumblr, The New York Times, The Financial Times, The Atlantic, Vox Media, the USA Today network, and WIRED's own parent company, Condé Nast, are among the organizations choosing to withhold their data from Apple's AI training. The move signals a marked shift in attitudes toward the web crawlers that have been exploring the internet for years. As those crawlers become central to gathering data for AI training, debates over intellectual property rights and the future of the web have intensified.

Applebot-Extended is an extension to Apple's web crawler that lets website owners tell Apple not to use their data for AI training. In a blog post explaining how it works, Apple describes this as "controlling data usage." The original Applebot, launched in 2015, was built to crawl the internet to power Apple's search products, including Siri and Spotlight. Recently, however, its scope has broadened: the data it gathers can now also be used to train the foundation models behind Apple's AI efforts.

Apple spokesperson Nadine Haija says Applebot-Extended is a way to honor the rights of content creators. Blocking it does not stop the original Applebot from crawling the site, since that would affect how the site's content appears in Apple's search products. Instead, it prevents the crawled data from being used to train Apple's large language models and other AI projects. In essence, it is a bot that customizes how another bot works.

Website owners can block Applebot-Extended by editing a plain-text file on their sites known as robots.txt, which implements the Robots Exclusion Protocol. This file has governed how web crawlers behave for years, and it has recently become a focal point in the broader debate over AI training. Many site owners have already updated their robots.txt files to block AI bots from OpenAI, Anthropic, and several other leading AI companies.

The robots.txt file enables website proprietors to selectively allow or deny access to automated crawlers. Although there's no enforceable law requiring bots to follow the directives in this file, it's widely accepted practice to do so. However, adherence isn't always guaranteed. For instance, a recent investigation by WIRED uncovered that the AI company Perplexity had been covertly harvesting data from websites, blatantly disregarding the instructions laid out in robots.txt.
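As a rough illustration of the mechanism described above, the sketch below uses Python's standard `urllib.robotparser` to evaluate a hypothetical robots.txt that blocks Applebot-Extended while leaving the regular Applebot alone. The file contents and the example.com URL are assumptions made for illustration, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of AI training via Applebot-Extended,
# but leave every other crawler (including the regular Applebot) unrestricted.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/some-article"  # placeholder URL
print(is_allowed(ROBOTS_TXT, "Applebot-Extended", url))  # False
print(is_allowed(ROBOTS_TXT, "Applebot", url))           # True
```

The check runs on the crawler's side, which is why compliance is voluntary: a bot that never consults the file, or ignores the answer, is not stopped by it. Python's parser also matches agent names loosely, so this is an approximation of how any particular crawler interprets the rules.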

Applebot-Extended is so new that relatively few websites block it yet. Originality AI, an AI-detection company based in Ontario, Canada, analyzed 1,000 popular websites last week and found that around 7 percent, mainly news and media sites, were blocking Applebot-Extended. Dark Visitors, a service that monitors AI agents, then ran a similar analysis on a different set of 1,000 high-traffic sites and found that about 6 percent blocked the bot. Taken together, the two analyses suggest that the large majority of website operators (more than 90 percent in each sample) either do not object to Apple's AI training practices or are unaware that they can block Applebot-Extended.

In a separate analysis conducted this week, investigative reporter Ben Welsh found that a little over 25 percent of the news sites he examined (294 out of 1,167 mostly English-language, US-based outlets) are blocking Applebot-Extended. By comparison, Welsh found that 53 percent of the same news sites block OpenAI's bot. Google launched its own AI-specific bot, Google-Extended, last September, and almost 43 percent of these sites now block it, which suggests that Applebot-Extended may simply not have attracted much attention yet. Welsh told WIRED, however, that the number has risen "gradually" since he began tracking it.
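The article does not describe how these surveys decide that a site "blocks" a bot. One plausible approach, sketched here under that assumption, is to parse each site's robots.txt and test whether the site root is disallowed for the agent. The sample sites and the `OtherBot` agent name are hypothetical; the final line reproduces the 294-of-1,167 figure reported above.

```python
from urllib.robotparser import RobotFileParser


def blocks_agent(robots_txt: str, agent: str) -> bool:
    """Return True if robots_txt disallows the site root for agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")


def share_blocking(robots_by_site: dict[str, str], agent: str) -> float:
    """Fraction of sites whose robots.txt blocks agent at the root."""
    blocked = sum(blocks_agent(text, agent) for text in robots_by_site.values())
    return blocked / len(robots_by_site)


# Hypothetical sample: robots.txt text keyed by made-up site names.
sample = {
    "news-a.example": "User-agent: Applebot-Extended\nDisallow: /\n",
    "news-b.example": "User-agent: OtherBot\nDisallow: /\n",
    "news-c.example": "",
    "news-d.example": "User-agent: *\nDisallow: /private/\n",
}

print(share_blocking(sample, "Applebot-Extended"))  # 0.25
print(f"{294 / 1167:.1%}")                          # 25.2%
```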

Welsh runs an ongoing project that tracks how news organizations treat major AI agents. He notes a clear split in how publishers handle access for these crawlers. "It's not clear why each media outlet has chosen its path. It's evident that numerous outlets have entered into agreements where they receive compensation in return for allowing these AI agents access; perhaps that plays a role," he says.

Last year, The New York Times reported that Apple was trying to strike AI deals with publishers. Since then, rivals such as OpenAI and Perplexity have announced partnerships with numerous news outlets, social platforms, and other popular websites. Jon Gillham, the founder of Originality AI, says, "It's evident that many of the world's major publishers are adopting a deliberate strategy. In certain instances, it appears there's a commercial motive at play, possibly reserving the data until there's a formal partnership deal."

There is some evidence for Gillham's theory. Condé Nast websites, for example, used to block OpenAI's web crawlers, but the blocks were lifted after the company recently announced a partnership with OpenAI. (Condé Nast declined to comment on the record.) Meanwhile, BuzzFeed spokesperson Juliana Clifton told WIRED that the company, which also owns the Huffington Post and currently blocks Applebot-Extended, blocks every AI web crawler it can identify unless its owner has entered into a paid partnership with BuzzFeed.

Because robots.txt must be edited by hand and new AI agents appear constantly, keeping a block list up to date is a burden. "It's hard for individuals to determine what they should block," says Gavin King, the founder of Dark Visitors. Dark Visitors offers a service that automatically updates a client site's robots.txt, and King says a significant share of his clients are publishers worried about copyright.
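To make the maintenance problem concrete, here is a minimal sketch of how an automated updater might work: given a site's current robots.txt and a list of agents to block, it appends rules only for agents that are not already blocked. This is not how Dark Visitors' service is implemented, which the article does not describe; the function names and the `ExampleAIBot` agent are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, agent: str) -> bool:
    """Return True if robots_txt already disallows the site root for agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")


def add_missing_blocks(robots_txt: str, agents: list[str]) -> str:
    """Append a 'Disallow: /' group for each agent that is not yet blocked."""
    missing = [agent for agent in agents if not is_blocked(robots_txt, agent)]
    if not missing:
        return robots_txt  # nothing to change
    existing = robots_txt.rstrip("\n")
    sections = [existing] if existing else []
    sections += [f"User-agent: {agent}\nDisallow: /" for agent in missing]
    return "\n\n".join(sections) + "\n"


# Hypothetical starting file and block list.
current = "User-agent: *\nDisallow: /private/\n"
block_list = ["Applebot-Extended", "ExampleAIBot"]

updated = add_missing_blocks(current, block_list)
print(updated)
```

Running the function again on its own output leaves the file unchanged, so it could be scheduled to rerun whenever the block list grows.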

The robots.txt file may appear to be a niche concern reserved for web experts, yet its significant impact on digital publishers in the era of artificial intelligence places it firmly within the realm of interest for media leaders. WIRED has discovered that CEOs from two leading media organizations are personally involved in making decisions about which bots are restricted.

Some outlets say explicitly that they block AI scraping tools because they have no commercial agreement with the tools' owners. "Across all Vox Media platforms, we are preventing access to Applebot-Extended, similar to our approach with numerous AI scraping instruments where there's no business arrangement with the counterpart," explains Lauren Starke, the Senior Vice President of Communications at Vox Media. "Our stance is firm on safeguarding the worth of our content."

Others explain their reasoning in vague but blunt terms. "At this juncture, the team concluded it was pointless to grant Applebot-Extended permission to our material," says Gannett's chief communications officer, Lark-Marie Antón.

The New York Times, which is suing OpenAI for copyright infringement, has criticized the opt-out design of Applebot-Extended and similar tools. Charlie Stadtlander, the NYT's director of external communications, says that using the paper's content for commercial gain without explicit prior consent violates both the law and the publication's terms of service. He adds that The Times will keep identifying and blocking unauthorized bots, and that copyright law applies whether or not technical blocking measures are in place. In his view, publishers should not have to opt out in order to avoid having their copyright infringed.

It is unclear whether Apple is any closer to finalizing deals with publishers. If such deals are struck, however, the effects of any data licensing or sharing arrangement may be visible in robots.txt files before they are publicly announced.

Gillham expresses amazement that, as one of the most significant technologies of our time is being developed, the competition over its training data is playing out in plain sight, in a rather obscure text file that anyone can read.

© 2024 Condé Nast. All rights reserved.
