OpenAI Strikes Back: How Publisher Partnerships Are Redefining AI Data Access Battles

The Effort to Halt OpenAI's Data Harvesting Shows Signs of Deceleration

It is still too early to determine the ultimate impact of the recent agreements between AI firms and publishers. One immediate win for OpenAI is already evident, though: Major news outlets are blocking its data-gathering bots less often than before.

The surge in generative artificial intelligence has triggered a frenzy for data acquisition, followed by a wave of data protection efforts from news outlets. Many publishers have taken measures to stop AI companies from scraping their content for use as training data without permission. When Apple introduced its new AI tool this summer, many leading news organizations quickly blocked Apple's web scraper using the Robots Exclusion Protocol, better known as robots.txt, a file that lets website owners tell bots which parts of a site they may access. With new AI bots emerging constantly, keeping control over data can feel like an endless game of whack-a-mole.
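To make the mechanism concrete, here is a minimal sketch of how a robots.txt rule singles out one crawler. The file contents and the news.example address are invented for illustration and are not taken from any publisher's actual file; the check uses Python's standard urllib.robotparser module, and the helper name is_allowed is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary news site (not a real publisher's file).
# It turns away OpenAI's GPTBot and leaves every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "GPTBot", "https://news.example/story"))        # False
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://news.example/story"))  # True
```

Nothing in this check stops a crawler from fetching the page anyway; the file only states the site owner's wishes, and a well-behaved bot runs a test like this before each request.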

OpenAI's GPTBot is the most recognized name in its field, and it is also blocked more often than rivals such as Google AI. An analysis of 1,000 leading news websites by Originality AI, an AI detection company based in Ontario, showed a sharp increase in the use of robots.txt to block GPTBot shortly after its launch in August 2023. The number of blocks continued to rise until the fall, then grew more moderately from November 2023 through April 2024. At the peak, over one-third of these sites blocked the bot; that figure has since fallen to about one-fourth. Among a subset of the most influential news sites, more than half still block it, down from nearly 90 percent earlier in the year.
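A figure such as "one-third of sites" can be tallied roughly as follows. This is only a sketch under stated assumptions: Originality AI's actual methodology is not described here, the four robots.txt samples and their example domains are invented, and a site is counted as blocking when its file forbids the crawler from fetching the root path. The function names blocks_agent and share_blocking are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocks_agent(robots_txt: str, user_agent: str = "GPTBot") -> bool:
    """Return True if this robots.txt text forbids user_agent from fetching '/'."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


def share_blocking(robots_by_site: dict[str, str], user_agent: str = "GPTBot") -> float:
    """Fraction of sites in the sample whose robots.txt blocks user_agent."""
    if not robots_by_site:
        return 0.0
    blocked = sum(blocks_agent(text, user_agent) for text in robots_by_site.values())
    return blocked / len(robots_by_site)


# Invented sample data: site name -> robots.txt contents.
SAMPLE = {
    "a.example": "User-agent: GPTBot\nDisallow: /\n",     # blocks GPTBot outright
    "b.example": "User-agent: *\nDisallow:\n",            # empty Disallow permits everything
    "c.example": "",                                      # no rules at all
    "d.example": "User-agent: *\nDisallow: /private/\n",  # blocks one directory only
}

print(share_blocking(SAMPLE))  # 0.25
```

Re-running such a tally on the same list of sites over time would produce the kind of rising and falling curve the analysis describes.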

In May, the announcement of a licensing agreement between Dotdash Meredith and OpenAI was followed by a noticeable drop in blocking. The numbers fell again toward the end of May after Vox revealed a similar partnership, and once more in August following an agreement with WIRED’s parent company, Condé Nast. The trend toward more blocking appears to have halted, at least temporarily.

The decline makes sense. As publishers strike deals and agree to share their content, they have less reason to guard it, so they are likely to update their robots.txt files to permit crawling. Sign enough partnerships, and the share of websites blocking the crawler will almost inevitably fall. Some publishers, such as The Atlantic, removed restrictions on OpenAI’s web crawlers immediately after announcing their deal. Others took anywhere from a few days to several weeks: Vox, which disclosed its partnership in late May, unblocked GPTBot on its sites by the end of June.

The robots.txt file is not legally enforceable, but it has traditionally served as the standard for how web crawlers should behave. Throughout the history of the internet, website operators have generally expected crawlers to comply with it. When a WIRED investigation earlier this year suggested that the AI company Perplexity might be disregarding robots.txt directives, Amazon's cloud division opened an inquiry into whether Perplexity had breached its policies. Ignoring robots.txt is generally frowned upon, which probably explains why many leading AI firms, including OpenAI, explicitly state that their crawlers respect it. Jon Gillham, the CEO of Originality AI, views this adherence as particularly crucial for OpenAI’s efforts to forge new partnerships. "OpenAI seems to regard any blockage as a significant obstacle to their future goals," Gillham noted.

Up to this point, OpenAI has negotiated agreements with a dozen publishers. Most have modified their robots.txt files to allow GPTBot, but a few, including Time magazine, have not and continue to restrict its access. (Time magazine did not provide a comment to WIRED regarding its decision to keep GPTBot restricted.) Nonetheless, these arrangements render the issue moot, OpenAI spokesperson Kayla Wood explains, as the company shifts away from relying on traditional scraping of what it considers "publicly available" data. "We utilize direct feeds," she states.

At the same time, a handful of prominent news organizations have unblocked OpenAI's web crawler without announcing any deal, data journalist Ben Welsh told WIRED. Welsh, who tracks how news outlets block leading AI bots, first noticed a slight dip in blocking a few weeks ago. Among the sites he saw unblocking were Alex Jones' Infowars, known for its conspiracy theories, and The Onion, the recently revived satirical outlet.

Does this point to undisclosed agreements between these outlets and OpenAI, or to negotiations in progress? "Absolutely not," says Onion CEO Ben Collins, who attributes the lifting of the block to the company's move last month to a different web hosting service and content management system. "Clearly, we have no dealings with the so-called Plagiarism Machine."

Infowars remained silent when approached for a statement. Meanwhile, OpenAI has verified that it maintains no collaborative relationship with Infowars.

The initial surge in efforts to prevent OpenAI's bots from accessing content seems to have subsided for now, but it remains uncertain if this period of quiet will persist. Gillham anticipates that there could be future increases in blocking attempts, especially if publishers start to view such actions as a strategic move in negotiations. He questions, "Is blocking OpenAI the first step in getting them to negotiate? Will that strategy prompt them to engage?" This situation has been illuminating: Publishers initially reacted to the emergence of AI scraping bots by collectively trying to block them, but OpenAI's efforts to form partnerships have since tempered that universal enthusiasm.

© 2024 Condé Nast. All rights reserved. Purchases made through our website may result in WIRED receiving a share of the sale as part of our Affiliate Partnership agreements with retail partners. The content on this website is protected and cannot be copied, shared, distributed, or used in any manner without explicit written consent from Condé Nast.
