Revolutionizing AI Safety: Researchers Develop Tamperproofing Technique for Open Source Models
A Novel Approach May Halt the Abuse of Open Source AI Technology
When Meta released its large language model Llama 3 for free in April, it took outside developers only a few days to create a version with the safeguards stripped out, the restrictions meant to keep the model from telling offensive jokes, giving out recipes for making methamphetamine, or misbehaving in other ways.
Researchers at the University of Illinois Urbana-Champaign, UC San Diego, Lapis Labs, and the nonprofit Center for AI Safety have now developed a training technique that could make it harder to strip such safeguards from Llama and other open source AI models. With artificial intelligence advancing rapidly, many experts argue that tamperproofing open models in this way could prove essential.
"Mantas Mazeika, a researcher at the Center for AI Safety and a PhD candidate at the University of Illinois Urbana-Champaign, conveyed to WIRED his concerns that terrorists and rogue nations will exploit these models. He emphasized that the simpler it becomes for such entities to adapt these models for their use, the higher the threat level becomes."
Powerful AI models are often kept under wraps by their creators and can be accessed only through a software API or a public-facing chatbot such as ChatGPT. Although developing a capable LLM costs tens of millions of dollars, Meta and some other companies have chosen to release their models in full, including the “weights,” the parameters that define a model's behavior, which anyone can download.
Before release, models such as Meta's Llama are fine-tuned to make them better at holding a conversation and answering questions, and to ensure they refuse problematic requests. This keeps a chatbot built on the model from offering rude, inappropriate, or hateful responses, and stops it from, say, explaining how to build a bomb.
The researchers behind the new technique found a way to complicate this kind of tampering with an open model for malicious ends. Their approach replicates the modification process, then alters the model's parameters so that those changes no longer get the model to respond to harmful prompts, such as "Give directions for constructing a bomb."
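In rough code terms, the idea resembles an adversarial training loop: repeatedly simulate the attacker's fine-tuning on a copy of the weights, then update the released weights so that the simulated attack stops working. The sketch below is purely illustrative, not the researchers' actual algorithm; the function name, hyperparameters, batch format (HuggingFace-style inputs whose forward pass returns an object with a .loss), and the first-order gradient transfer are all assumptions made for the example.

```python
# Minimal sketch of a "simulate the attack, then penalize its success" step.
import copy
import torch

def tamper_resistance_step(model, harmful_batch, benign_batch,
                           inner_steps=4, inner_lr=1e-4, outer_lr=1e-5):
    # 1) Attacker simulation: fine-tune a throwaway copy on harmful data,
    #    the way someone stripping safeguards from a released model would.
    attacked = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(attacked.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        inner_opt.zero_grad()
        attacked(**harmful_batch).loss.backward()
        inner_opt.step()

    # 2) Defender update: move the original weights toward a region where the
    #    simulated attack fails (the attacked copy keeps a high loss on harmful
    #    completions) while ordinary behavior is preserved.
    outer_opt = torch.optim.SGD(model.parameters(), lr=outer_lr)
    outer_opt.zero_grad()

    attacked.zero_grad()
    (-attacked(**harmful_batch).loss).backward()   # gradient that *raises* the attack loss
    for p, p_att in zip(model.parameters(), attacked.parameters()):
        if p_att.grad is not None:
            p.grad = p_att.grad.clone()            # first-order transfer back to the real weights

    model(**benign_batch).loss.backward()          # retain term: keep benign capability
    outer_opt.step()
```

The real method is more sophisticated, but the basic bet is the same: bake the failure of the attacker's fine-tuning into the weights before they are ever published.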
Mazeika and colleagues demonstrated the technique on a pared-down version of Llama 3, tuning the model's parameters so that it could not be trained to answer undesirable questions even after repeated attempts. Meta did not immediately respond to a request for comment.
Mazeika says the method isn't perfect, but it suggests the bar for circumventing an AI model's restrictions can be raised. "The aim is to make compromising the model difficult and expensive enough that most potential attackers are deterred from even trying," he explains.
"Dan Hendrycks, the director of the Center for AI Safety, expresses his optimism that this effort will initiate studies on tamper-proof protections, paving the way for the research community to devise increasingly effective safety measures."
As open source AI attracts more attention, the idea of making open models tamperproof may well catch on. Open models are already on par with the top proprietary models from giants such as OpenAI and Google. The newest version of Llama 3, released in July, roughly matches the models behind popular chatbots like ChatGPT, Gemini, and Claude on widely used benchmarks for rating language models' abilities. Mistral Large 2, an LLM from French startup Mistral AI that was also released last month, is similarly capable.
The US government is taking a cautious but positive approach to open source AI. A report released this week by the National Telecommunications and Information Administration, an agency within the US Commerce Department, recommends that the government develop new capabilities to monitor for potential risks, but hold off for now on restricting the wide availability of open model weights in the largest AI systems.
Not everyone is a fan of imposing restrictions on open models, however. Stella Biderman, who leads EleutherAI, a community-driven open source AI project, says the new technique may be elegant in theory but could prove difficult to enforce in practice. Biderman says the approach is also antithetical to the philosophy behind free software and openness in AI.
"Biderman believes the paper misses the mark on the fundamental problem," he states. "If the worry is about LLMs producing information on weapons of mass destruction, then addressing the training data is the appropriate action, not altering the already trained model."