AI
Imprompter Attack: How AI Chatbots Can Be Manipulated to Harvest Your Personal Data
To review this article again, go to My Profile and then click on Saved stories.
This Command Can Enable an AI Conversational Agent to Pinpoint and Retrieve Personal Information from Your Conversations
Engaging in dialogue with a chatbot could unintentionally lead to the disclosure of your private data—such as your identity, along with specifics regarding your residence, employment, or hobbies. The extent of information you divulge to an expansive language model amplifies the likelihood of misuse in the event of a security breach.
A team of cybersecurity experts from the University of California, San Diego, and Singapore's Nanyang Technological University have unveiled a novel cyberattack technique. This approach covertly instructs a large language model (LLM) to collect sensitive data such as individual names, identification numbers, credit card information, email and postal addresses, among others, during conversations, and then surreptitiously forwards this information to a cybercriminal.
The assault, dubbed Imprompter by the investigators, employs a technique that alters a command provided to the LLM, embedding it with secret harmful directives. A command in English, directing the LLM to retrieve and transmit someone's personal data to the cybercriminals, is converted into what seems like an arbitrary sequence of symbols.
In actuality, this seemingly absurd command directs the language model to locate an individual's private data, link it with a web address, and discreetly transmit it to a website controlled by the cybercriminal— all while ensuring the individual interacting with the language model remains unaware. The researchers elaborate on Imprompter in a study released today.
"The purpose of this specific prompt is fundamentally to influence the LLM agent to retrieve personal data from the dialogue and forward this sensitive information to the hacker's location," states Xiaohan Fu, the primary researcher and a doctoral candidate in computer science at UCSD. "The objective of the attack is concealed openly."
A team of eight scientists evaluated the effectiveness of their hacking strategy on two large language models, Mistral AI's LeChat and the Chinese-developed ChatGLM. In each case, they were able to covertly obtain private data during trial dialogues, achieving a success rate of almost 80%, according to their report.
Mistral AI informed WIRED that it has addressed the security flaw, with confirmation from researchers that the company has deactivated a chat feature. ChatGLM issued a statement emphasizing its commitment to security, though it refrained from specifically addressing the flaw.
Concealed Significance
Following its launch in late 2022, OpenAI's ChatGPT ignited a surge in generative artificial intelligence development. This has led both researchers and cyber attackers to continuously uncover vulnerabilities within AI frameworks. Typically, these weaknesses are classified into two main types: jailbreaks and prompt injections.
Jailbreaks have the capability to deceive artificial intelligence systems into bypassing their own safety protocols through the use of specific commands that can bypass the AI's configurations. These prompt injections work by supplying a large language model (LLM) with directives—like commands to exfiltrate data or alter a resume—sourced from an external input. For example, a covert command hidden within a website's content could be consumed by an AI if it is tasked with summarizing the website's content.
Injecting misleading prompts is viewed as a significant security vulnerability within generative artificial intelligence, and resolving this issue is challenging. This form of cyber attack is especially concerning to security professionals because large language models (LLMs) are progressively being used to perform actions for users, like making flight reservations or accessing external databases to retrieve precise information.
The assaults on Large Language Models (LLMs) by Imprompter begin with a command in plain language (illustrated prior) instructing the AI to pull every piece of private data, including names and identification numbers, from the user's dialogue. The algorithm developed by the researchers produces a disguised variant (also previously shown) that, while appearing as a random string of characters to humans, retains its original intent for the LLM.
Fu suggests that the transformation indicates that LLMs grasp unseen connections among text tokens, which extend beyond the realm of natural language. "It's as though the model comprehends a distinct language of its own," Fu explains.
Consequently, the language model complies with the malicious prompt, compiles all the private data, and incorporates it into a Markdown syntax for an image link—linking the private data to a domain controlled by the hackers. The language model attempts to access this domain to fetch the image, inadvertently exposing the private data to the hacker. In the conversation, the language model produces a tiny, invisible 1×1 pixel image that goes unnoticed by the users.
The scientists explain that should this assault happen outside of a controlled environment, individuals might be manipulated into thinking the nonsensical command could have a beneficial purpose, like enhancing their resume. The scientists reference various online platforms offering prompts for user interaction. They conducted an experiment by submitting a resume during dialogues with chatbots, which successfully retrieved the private details embedded in the document.
Earlence Fernandes, serving as an assistant professor at the University of California, San Diego, involved in the study, describes the attack method as quite complex. This is because the disguised prompt has to pinpoint personal data, generate an operational URL, utilize Markdown formatting, and do so without alerting the user to its malicious intent. Fernandes compares the attack to malware, highlighting its capability to execute actions and exhibit behaviors that may not align with the user's expectations.
"Fernandes notes that while traditional malware might require extensive coding, the interesting aspect here is that the same outcome can be achieved with a fairly concise and nonsensical command."
A representative from Mistral AI expressed the company's appreciation for security experts contributing to enhancing the safety of its offerings for consumers. "In response to the suggestions received, Mistral AI quickly took the necessary steps to rectify the problem," the representative stated. The firm categorized the problem as having "medium severity," and its solution prevents the Markdown renderer from functioning and from being capable of invoking an external URL via this method, thereby disabling the feature for loading images from external sources.
Fernandes is of the opinion that the recent update by Mistral AI might mark one of the initial instances where a product based on large language models (LLMs) has been improved in response to an adversarial prompt, instead of merely blocking the problematic prompt. Nonetheless, he mentions that restricting the functionalities of LLM-based systems could ultimately prove to be detrimental.
In a recent announcement, the team behind ChatGLM emphasized their commitment to maintaining user privacy through robust security protocols. "Ensuring the security of our model and safeguarding user privacy has always been at the forefront of our priorities," the announcement highlighted. "Our decision to make our model open-source is driven by our belief in the open-source community's ability to thoroughly examine and enhance the security and overall functionality of these models."
Dan McInerney, the principal investigator of threats at Protect AI, describes the Imprompter report as disclosing an algorithm designed for generating prompts automatically. These prompts can be utilized in prompt injection attacks for various malicious activities such as stealing personal information, causing errors in image recognition, or harmful exploitation of resources accessible by the large language models (LLM). Although the types of attacks detailed in the study might echo earlier techniques, McInerney notes that the algorithm integrates these methods. He views this development as an enhancement of automated attacks on LLMs rather than the revelation of previously unknown vulnerabilities within them.
Nonetheless, McInerney notes that with the growing utilization of LLM agents and the expanding authority granted to them by users to act in their stead, the potential for targeted attacks escalates. He argues, “Deploying an LLM agent capable of processing any form of user input should be viewed as a high-risk endeavor, necessitating extensive and innovative security measures before its launch.”
Businesses need to grasp how an AI system engages with data and the potential for misuse. For individuals, akin to standard security recommendations, it's important to be mindful of the amount of personal information shared with any AI tool or organization. Moreover, when utilizing prompts found online, it's wise to be wary of their origins.
Explore Further…
Dive into Political Insights: Subscribe to our newsletter and tune into our podcast.
A remedy for the US firearm crisis from an emergency room physician
Video: Antony Blinken propels American foreign relations into the modern era
Admissions from an Avid Hinge Enthusiast
Occasion: Be part of the Energy Tech Summit happening on October 10th, located in Berlin.
Additional Insights from WIRED
Insights and Tutorials
© 2024 Condé Nast. All rights reserved. WIRED could receive a share of revenue from products bought via our website, as a component of our Affiliate Agreements with retail partners. Content on this website cannot be copied, shared, transmitted, stored, or utilized in any other way without the express written consent of Condé Nast. Advertisement Choices
Choose a global website
Discover more from Automobilnews News - The first AI News Portal world wide
Subscribe to get the latest posts sent to your email.