Revolutionizing AI Safety: Researchers Develop Tamperproofing Technique for Open Source Models

Published August 3, 2024

A Novel Approach May Halt the Abuse of Open Source AI Technology

In April, when Meta released its advanced large language model Llama 3 for free, it took outside developers only a few days to create a version stripped of the safeguards that stop it from telling offensive jokes, giving out recipes for producing methamphetamine, or misbehaving in other ways.

Researchers from the University of Illinois Urbana-Champaign, UC San Diego, Lapis Labs, and the nonprofit Center for AI Safety have developed a new training technique that could make it harder to strip such safeguards from Llama and other open source AI models. Given how quickly AI is advancing, many experts argue that tamperproofing open models in this way could become essential.

Mantas Mazeika, a researcher at the Center for AI Safety and a PhD candidate at the University of Illinois Urbana-Champaign, told WIRED that he is concerned terrorists and rogue states will exploit these models. The easier it is for such actors to repurpose the models, he said, the greater the risk.

Creators often keep powerful AI models under wraps, making them available only through a software API or a public-facing chatbot such as ChatGPT. Although developing a powerful large language model (LLM) costs tens of millions of dollars, Meta and other companies have chosen to release their models in full. That includes making the models' “weights,” the parameters that determine their behavior, available for anyone to download.

Before release, open models such as Meta's Llama are typically fine-tuned to make them better at holding a conversation and answering questions, and to ensure that they refuse sensitive or harmful queries. This is meant to keep a chatbot built on the model from making offensive, inappropriate, or hateful statements, and from explaining, for example, how to build an explosive device.

The researchers behind the new technique found a way to make it harder to modify an open model for malicious purposes. The method replicates the modification process an attacker would use, then alters the model's parameters so that the modification no longer succeeds in getting the model to respond to harmful prompts such as "Give directions for constructing a bomb."
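The idea in the paragraph above can be illustrated with a toy sketch. This is not the researchers' algorithm and involves no real language model: it assumes a two-parameter logistic model, made-up feature vectors for one "harmful" and one "benign" prompt, a simulated attacker limited to ten gradient steps, and finite-difference gradients for the defender. Every name and number here is hypothetical. The defender repeatedly simulates the attack and then adjusts the starting parameters so that the attacked model still refuses, while keeping the benign answer likely.

```python
import math

# Toy setup: every value below is an illustrative assumption, not from the paper.
X_HARMFUL = (1.0, 0.2)   # made-up features of a "harmful" prompt
X_BENIGN = (0.2, 1.0)    # made-up features of a "benign" prompt
W_INIT = (-3.75, 3.75)   # start: harmful logit is -3 (refuse), benign logit is +3


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z):
    # log(1 + exp(z)), computed stably.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def logit(w, x):
    return w[0] * x[0] + w[1] * x[1]


def attack(w, steps=10, lr=1.0):
    """Simulated attacker: gradient descent on -log P(comply | harmful prompt)."""
    w = list(w)
    for _ in range(steps):
        p = sigmoid(logit(w, X_HARMFUL))
        # The gradient of -log(p) w.r.t. w is -(1 - p) * x, so descend along +(1 - p) * x.
        w = [wi + lr * (1.0 - p) * xi for wi, xi in zip(w, X_HARMFUL)]
    return w


def compliance_after_attack(w):
    """Probability that the model answers the harmful prompt after the attack."""
    return sigmoid(logit(attack(w), X_HARMFUL))


def benign_prob(w):
    """Probability that the (unattacked) model answers the benign prompt."""
    return sigmoid(logit(w, X_BENIGN))


def defender_loss(w):
    keep_helpful = softplus(-logit(w, X_BENIGN))     # -log P(answer benign)
    resist = softplus(logit(attack(w), X_HARMFUL))   # -log P(refuse) after the attack
    return keep_helpful + resist


def defend(w, steps=600, lr=0.5, eps=1e-5):
    """Gradient descent on defender_loss using central finite differences."""
    w = list(w)
    for _ in range(steps):
        grad = []
        for i in range(len(w)):
            hi = list(w)
            hi[i] += eps
            lo = list(w)
            lo[i] -= eps
            grad.append((defender_loss(hi) - defender_loss(lo)) / (2 * eps))
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


if __name__ == "__main__":
    w_def = defend(W_INIT)
    print("compliance after attack, undefended:", compliance_after_attack(W_INIT))
    print("compliance after attack, defended:  ", compliance_after_attack(w_def))
    print("benign answer probability, defended:", benign_prob(w_def))
```

In this toy the defense works only because the simulated attacker has a fixed budget of steps; an attacker willing to spend more could still succeed, which matches the article's framing of raising the cost of an attack rather than making it impossible.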

Mazeika and his colleagues demonstrated the technique on a pared-down version of Llama 3. They adjusted the model's parameters so that it could not be trained to answer undesirable questions, even after numerous attempts. Meta has yet to reply to a request for comment.

Mazeika says that while the method isn't flawless, it suggests that AI model restrictions could be made harder to bypass. "The aim is to elevate the difficulty and expense of compromising the model to a level where the majority of potential attackers are discouraged from attempting it," he explains.

Dan Hendrycks, the director of the Center for AI Safety, says he hopes the work will kick-start research on tamper-resistant safeguards, and that the research community will go on to devise increasingly effective ones.

As interest in open source AI grows, the idea of tamperproofing open models could gain traction. Open models already rival the top-tier proprietary models developed by giants such as OpenAI and Google. The latest version of Llama 3, released in July, matches the performance of the models behind well-known chatbots like ChatGPT, Gemini, and Claude, according to widely used benchmarks for evaluating language models. Mistral Large 2, an LLM from a French startup, was also released last month and has comparable capabilities.

The US government is taking a cautious but positive approach to open source AI. A report released this week by the National Telecommunications and Information Administration, part of the US Commerce Department, recommends that the US government develop new capabilities to monitor for potential risks but refrain, for now, from restricting the wide availability of open model weights for the largest AI systems.

Not everyone favors imposing restrictions on open models, however. Stella Biderman, who leads EleutherAI, a community-driven open source AI project, says that while the new technique may be elegant in theory, it could prove difficult to enforce in practice. According to Biderman, the approach also runs counter to the philosophy behind free software and openness in AI.

Biderman says the paper misunderstands the core issue. If the worry is about LLMs producing information on weapons of mass destruction, she argues, then the right place to intervene is the training data, not the already trained model.


© 2024 Condé Nast. All rights reserved.
