Mozilla Exposes Critical Flaw in ChatGPT-4o’s Security
Marco Figueroa, Mozilla’s Program Manager for generative AI bug bounty initiatives, has disclosed a new vulnerability in the built-in safeguards of ChatGPT-4o. The disclosure was made through Mozilla’s 0Din (0Day Investigative Network) program, launched in June 2024.
The 0Din program focuses on identifying vulnerabilities in large language models and deep learning technologies, offering researchers rewards of up to $15,000 for discovering critical security issues.
The vulnerability lets users circumvent the safety restrictions that stop ChatGPT-4o from generating potentially malicious content. The method relies on encoding harmful instructions in hexadecimal format. As a demonstration, Figueroa got the model to write a Python exploit for a vulnerability identified by a specific CVE number.
When asked directly to write an exploit, ChatGPT refuses, citing policy violations. When the same request was submitted in hex-encoded form, however, the safeguards did not trigger: the chatbot not only generated the malicious code but also attempted to execute it.
Figueroa also found a second bypass method that uses emojis. With a particular combination of emoji characters, he was able to get ChatGPT to produce a Python-based SQL injection tool.
According to Figueroa, the flaw underscores the need for stronger safeguards in AI models, particularly in how they handle encoded instructions. At the time of writing, attempts to reproduce the bypass methods were unsuccessful, suggesting that OpenAI has already mitigated the vulnerability.