Amazon Probes Perplexity AI Amid Allegations of Web Scraping Abuse and Ignoring Robots Exclusion Protocol

Andrew Couts and Dhruv Mehrotra

Amazon Probes Claims Against Perplexity for Alleged Scraping Misconduct

Amazon Web Services, Amazon's cloud computing division, is investigating Perplexity AI over allegations that the AI startup may have violated AWS rules by scraping websites that have tried to block such collection, according to a report by WIRED.

An AWS spokesperson, speaking to WIRED anonymously, confirmed the company's investigation of Perplexity. WIRED had previously found that the startup, which is backed by the Jeff Bezos family fund and Nvidia and is valued at $3 billion, appears to rely on content from websites that had barred it through the Robots Exclusion Protocol, a widely recognized web standard. While the protocol itself is not legally binding, terms of service generally are.

The Robots Exclusion Protocol, a longstanding web convention, works by placing a plain text file (such as wired.com/robots.txt) on a website to tell automated software, such as search engine crawlers, which pages it should not access. Companies that use scraping tools can choose to ignore the protocol, but most have historically honored it. An Amazon spokesperson told WIRED that AWS customers who crawl the web are required to comply with robots.txt.
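As a minimal sketch of how a compliant crawler applies these rules, the Python below uses the standard library's `urllib.robotparser` module. The robots.txt contents, the `ExampleBot` agent, and the example.com URLs are hypothetical placeholders; only the `PerplexityBot` user-agent name comes from this article. The file itself enforces nothing: the check happens only because the crawler chooses to call `can_fetch` before each request.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (not any real site's file): block one named
# crawler everywhere, and keep every other agent out of /private/ only.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network request)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """A compliant crawler calls this before every fetch and skips the URL if False."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent in ("PerplexityBot", "ExampleBot"):
        for url in ("https://example.com/story", "https://example.com/private/draft"):
            print(f"{agent:14} {url:36} {allowed(rp, agent, url)}")
```

Run as a script, it reports that `PerplexityBot` is refused on both URLs, while `ExampleBot` is refused only under `/private/`. Because the rules are keyed to the user-agent string a client reports, a crawler that skips the check, or that presents a different user-agent, is not held to the `PerplexityBot` rule at all.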

A spokesperson said that AWS's terms of service prohibit customers from using its services for any illegal activity, and that customers are responsible for complying with those terms and with all applicable laws.

Scrutiny of Perplexity's practices began with a June 11 Forbes report alleging that the startup had appropriated at least one Forbes article. WIRED's own reporting corroborated those allegations and uncovered further instances of content scraping and plagiarism linked to Perplexity's AI-powered search chatbot. To keep Perplexity out, Condé Nast, WIRED's parent company, blocked Perplexity's web crawler on all of its sites using robots.txt. Even so, WIRED found that the startup could still reach those sites through an unpublished IP address, 44.221.181.252, which visited Condé Nast properties potentially hundreds of times over the past three months, presumably to keep scraping their content.

The machine tied to Perplexity appears to be crawling news sites extensively, including ones that forbid bots from accessing their content. Representatives of The Guardian, Forbes, and The New York Times say they have also seen the IP address repeatedly accessing their servers.

WIRED traced the IP address to an Elastic Compute Cloud (EC2) instance running on AWS. Amazon opened its investigation after WIRED asked whether using AWS infrastructure to scrape websites that forbid scraping violated the company's terms of service.

Perplexity CEO Aravind Srinivas initially responded to WIRED's questions by saying they reflected a deep misunderstanding of how both Perplexity and the internet work. He later told Fast Company that the unpublished IP address WIRED observed scraping Condé Nast websites, as well as a test site WIRED had set up, was being used by a third-party company that provides web crawling and indexing services. He declined to name the provider, citing a confidentiality agreement. Asked whether he would tell the third party to stop scraping WIRED, Srinivas replied, “It’s complicated.”

Perplexity spokesperson Sara Platnick told WIRED that the company had responded to Amazon's questions by Wednesday and described the investigation as a standard process. Platnick says Perplexity did not change its operations in response to Amazon's inquiry.

Platnick says PerplexityBot, which runs on AWS, respects robots.txt, and that the services Perplexity controls do not violate AWS's terms of service. She acknowledges, however, that PerplexityBot will ignore robots.txt in rare cases: specifically, when a user enters a particular URL directly into the system.

According to Platnick, entering a specific URL does not trigger crawling. Instead, the agent acts on the user's behalf and fetches that URL directly, which she likens to the user visiting the page, copying the article's text, and pasting it into the platform themselves.

Platnick's description of how Perplexity works is consistent with WIRED's finding that its chatbot ignores robots.txt in some cases.

Digital Content Next is a trade association for the digital content industry whose members include The New York Times, The Washington Post, and Condé Nast. Last year, the group published draft principles for governing generative AI to guard against potential copyright violations. CEO Jason Kint told WIRED that if the allegations against Perplexity are true, the company would be violating several of those principles.

Kint says AI companies should start from the assumption that they have no right to reuse publishers' content without permission. If Perplexity is circumventing terms of service or robots.txt, he adds, that should be a major warning sign that something improper is going on.

© 2024 Condé Nast. All rights reserved. Purchases made through our website may result in WIRED receiving a share of the sale, as part of our Affiliate Agreements with retail partners. Reproduction, distribution, transmission, caching, or any other form of utilization of the material found on this website is strictly prohibited without explicit prior written consent from Condé Nast.