Microsoft has addressed critical vulnerabilities in its AI assistant Copilot that allowed attackers to steal emails and other personal user information. This was reported by security researcher Johann Rehberger, who previously discovered and disclosed the details of the attack.
The exploit developed by Rehberger chains together several attack techniques specific to large language models (LLMs). It begins with a phishing email containing a malicious Word document. This document triggers what is known as a prompt injection attack, a type of attack on AI systems in which the attacker attempts to deceive the model using carefully crafted input data.
In this instance, the document contained instructions that compelled Copilot to impersonate a fraudulent program named “Microsoft Defender for Copirate.” This enabled the attacker to take control of the chatbot and use it to interact with the user’s email.
The next phase of the attack involved the automated use of Copilot’s tools. The attacker commanded the chatbot to search for additional emails and other confidential information. For example, Rehberger asked the bot to compile a list of key points from a previous email. Copilot then located and extracted Slack two-factor authentication codes whenever they were present in the email.
To extract the data, the researcher employed a technique known as ASCII smuggling. This method uses characters from the Unicode Tags block (U+E0000 to U+E007F), which mirror ASCII but are invisible in the user interface. In this way, the attacker can conceal text, such as instructions for the model or stolen data, within a hyperlink that appears completely innocuous.
During the attack, Copilot generates an “innocent-looking” link that, in reality, contains hidden Unicode characters. If the user clicks on this link, the contents of their emails are sent to a server controlled by the attacker. This can include Slack two-factor authentication codes or any other sensitive information from the emails.
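The encoding behind ASCII smuggling can be illustrated with a minimal Python sketch. It assumes the commonly described mapping, in which each printable ASCII character is shifted into the Unicode Tags block by adding 0xE0000. The function name `smuggle`, the example.com address, and the sample payload are hypothetical and chosen only for illustration; this is not Rehberger's actual exploit code, and whether the extra characters are truly invisible depends on the application that renders them.

```python
TAG_OFFSET = 0xE0000  # first code point of the Unicode Tags block


def smuggle(text: str) -> str:
    """Map each printable ASCII character to its Unicode tag counterpart."""
    hidden = []
    for ch in text:
        code = ord(ch)
        # Only printable ASCII (0x20-0x7E) has a tag-character equivalent.
        if not 0x20 <= code <= 0x7E:
            raise ValueError(f"not printable ASCII: {ch!r}")
        hidden.append(chr(TAG_OFFSET + code))
    return "".join(hidden)


# Hypothetical values, for illustration only.
visible = "https://example.com/help"
payload = smuggle("code=123456")

# The combined string typically renders like the visible part alone,
# yet it carries eleven extra code points.
link = visible + payload
print(len(visible), len(link))
```

The length difference is the giveaway: the string looks unchanged on screen in many interfaces, but it is longer than what the user sees.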
Rehberger also developed a tool called ASCII Smuggler, which detects Unicode tags and “decodes” messages that would otherwise remain invisible. Microsoft confirms that the vulnerabilities have been patched, although the company has not disclosed the exact details of the fixes.
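The detection idea can be sketched in a few lines of Python: scan a string for code points in the Unicode Tags block and shift them back into ASCII. This is an illustrative sketch under that assumption, not the code of the ASCII Smuggler tool itself; the function names `find_hidden` and `strip_tags` and the sample string are hypothetical.

```python
TAG_START, TAG_END = 0xE0000, 0xE007F  # Unicode Tags block


def find_hidden(text: str) -> str:
    """Return the ASCII message encoded by tag characters in text, if any."""
    return "".join(
        chr(ord(ch) - TAG_START)
        for ch in text
        # Tag characters U+E0020-U+E007E mirror printable ASCII.
        if 0xE0020 <= ord(ch) <= 0xE007E
    )


def strip_tags(text: str) -> str:
    """Remove every code point in the Tags block from text."""
    return "".join(ch for ch in text if not TAG_START <= ord(ch) <= TAG_END)


# Hypothetical sample: visible text followed by an invisible message.
sample = "Click here" + "".join(chr(TAG_START + ord(c)) for c in "secret")

print(find_hidden(sample))
print(strip_tags(sample))
```

An application that passes model output to a user interface could apply a filter like `strip_tags` before rendering links, although this addresses only the hiding technique and not the underlying prompt injection.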
This chain of exploits vividly illustrates the current challenges in protecting language models, which are particularly vulnerable to prompt injection attacks and other recently developed hacking methods. Rehberger emphasizes the novelty of these techniques, noting that they are “less than two years old.”
Experts urge companies developing their own applications based on Copilot or other language models to pay close attention to these issues to avoid security and privacy risks.