Exposed: How AI Tools Illicitly Train on Children’s Images Without Consent
By Vittoria Elliott
A Human Rights Watch report reveals that AI systems are covertly trained on photos of children from Brazil.
A Human Rights Watch report published Monday revealed that more than 170 photos and personal details of Brazilian children have been used without their knowledge or consent in an open-source dataset for training AI.
According to the report, the images were scraped from content posted online as early as the mid-1990s and as recently as 2023, a period when most internet users would not have expected their content to be used for AI training. Human Rights Watch found that Common Crawl, a web data archive, had collected the children's personal information and images; the URLs pointing to that content were then incorporated into LAION-5B, a dataset used to train AI models at emerging tech companies.
Hye Jung Han, a children's rights and technology researcher at Human Rights Watch who discovered the images, says the initial privacy violation occurs when the photos are scraped and incorporated into these datasets. AI systems trained on that data can then generate lifelike depictions of minors, and Han warns that any child with photos or videos online is at risk: bad actors could exploit these images, manipulating them for their own purposes.
The LAION-5B dataset, built from Common Crawl, a repository of scraped web data made available to researchers, has been used to train numerous AI applications, including Stability AI's image generator, Stable Diffusion. Created by the German nonprofit LAION, the dataset is openly accessible and, according to its website, contains more than 5.85 billion image-caption pairs. After Human Rights Watch raised concerns, LAION removed the links to the specific images the organization identified.
The children's photos that researchers found came from mommy blogs and other personal, maternity, or parenting blogs, as well as stills from YouTube videos with only a handful of views that appear to have been uploaded to share with family and friends.
"Considering the context in which they were posted, they expected a measure of privacy," says Han. "Most of these photos could not be found online through a reverse image search."
Nathan Tyler, a spokesperson for LAION, says the organization has already taken action. "Following a Stanford review that uncovered references to unlawful material on the public web within LAION-5B, we removed it," he says, adding that LAION is working with the Internet Watch Foundation, the Canadian Centre for Child Protection, Stanford, and Human Rights Watch to remove all known references to illegal content.
YouTube's terms of service prohibit scraping except under specific circumstances, and these activities appear to violate those policies. "Unauthorized scraping of YouTube content is a clear violation of our Terms of Service," says YouTube spokesperson Jack Malon, "and we continue to take action against this type of abuse."
In December, researchers at Stanford University found that LAION-5B contained child sexual abuse material. The problem of explicit deepfakes is growing, even among students in US schools, where they are used to harass classmates, with girls targeted in particular. Beyond the risk that children's photos could be used to generate CSAM, Han worries the database could expose sensitive details such as a child's location or medical information. In 2022, a US-based artist found her own image in the LAION dataset and realized it had come from her private medical records.
"Kids shouldn't have to worry about their pictures being misused or turned into weapons against them," says Han. She fears her findings may be just the tip of the iceberg: the data her team reviewed represents less than one ten-thousandth of the entire LAION-5B dataset, and she believes it is likely that similar images of children from around the world are included in it.
In Germany, a recent ad campaign used AI-generated deepfakes to warn parents that posting photos of their children online could lead to those images being used to bully them or being manipulated into child sexual abuse material. But that approach does not address the problem of images that are already online, some of which have been there for years.
Tyler notes that removing links from the LAION dataset does not remove the content from the internet: the images remain accessible and usable, even if they are no longer reachable through LAION. "This is a larger and very concerning issue, and as a volunteer-run nonprofit, we will do our part to help address it," he says.
Han argues that it is the responsibility of governments and regulators to protect children and their families from such abuses. Brazil's legislature is currently considering bills to regulate the creation of deepfakes, and in the US, Representative Alexandria Ocasio-Cortez of New York has introduced the DEFIANCE Act, which would allow people to sue if they can prove a deepfake of their likeness was made without their consent.
Han believes children and their parents shouldn't have to shoulder the burden of protecting kids against a technology that is fundamentally impossible to protect against. It's not a responsibility they should bear, she says.
Updated June 10, 2024, 5:20 pm EST: This story has been corrected to note that LAION removed the links to the photographs, and updated to clarify that the images were originally scraped by the data repository Common Crawl, with the links later included in LAION-5B.
© 2024 Condé Nast. All rights reserved.