
One of the key issues in AI security has once again come to the forefront with the discovery of two systemic methods for bypassing the protective mechanisms of popular generative services. The new jailbreak techniques, one dubbed “Inception” and the other based on “reverse responses,” enable malicious actors to circumvent restrictions on generating prohibited content in nearly all leading AI models.
Investigators revealed that the first method leverages the concept of a “nested scenario.” A user prompts the model to imagine a hypothetical situation, then subtly alters the context, causing the AI to operate outside its normal parameters and effectively bypass its built-in safety filters. Notably, this technique proved effective against ChatGPT (OpenAI), Claude (Anthropic), Copilot (Microsoft), DeepSeek, Gemini (Google), Grok (Twitter/X), Meta AI, and models developed by Mistral AI.
The second evasion method involves a clever manipulation: the attacker asks the AI to explain how not to answer a particular question, then, through a series of clarifications and topic shifts, gradually steers the conversation back to the forbidden subject, ultimately eliciting a response. This tactic has similarly been effective across most of the aforementioned services.
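At a structural level, both techniques rely on nothing more than an ordinary multi-turn chat exchange, which is part of why they transfer so readily between vendors. The following is a deliberately content-free sketch of what such a transcript looks like in the common role/content chat format; the placeholder strings are illustrative and are not taken from the report.

```python
# Illustrative, content-free outline of the multi-turn pattern described above:
# no single message is overtly problematic, and the drift toward the disallowed
# topic only becomes apparent when the exchange is read as a whole.
conversation = [
    {"role": "user",      "content": "<frame a purely hypothetical scenario>"},
    {"role": "assistant", "content": "<model engages with the fiction>"},
    {"role": "user",      "content": "<small shift of context inside the scenario>"},
    {"role": "assistant", "content": "<model follows the shifted context>"},
    {"role": "user",      "content": "<request that would be refused if asked directly>"},
]
```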
Although each vulnerability on its own is classified as a low-risk threat, the potential consequences are far more serious. By slipping past protective safeguards, attackers could generate instructions for manufacturing weapons, writing malware, orchestrating phishing campaigns, and handling illicit substances. Particularly alarming is the fact that routing such activity through popular legitimate services makes it significantly harder to trace.
The response from the companies has been mixed. DeepSeek stated that it views the issue as a conventional context bypass rather than a fundamental architectural vulnerability, arguing that the model merely “hallucinated” details rather than leaking its system parameters. Nevertheless, DeepSeek’s developers pledged to strengthen their defenses.
Meanwhile, other major industry players—OpenAI, Anthropic, Google, Meta, Mistral AI, and X (Twitter)—had yet to issue official statements at the time of publication, suggesting either ongoing internal investigations or the intrinsic difficulty of addressing such a systemic flaw.
Experts emphasize that the presence of nearly identical vulnerabilities across models from different vendors points to a deeper underlying problem: existing training methods and LLM system configurations remain insufficiently resilient against this kind of multi-turn, social-engineering-style manipulation, despite established security frameworks.
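One way to make the experts’ point concrete: moderation that scores each user message in isolation can pass every turn of a gradual context-shift exchange, whereas a check over the accumulated dialogue at least sees the same signal the attacks exploit. The sketch below is a simplified illustration of that difference, not any vendor’s actual guardrail; the risk_score function is a hypothetical stand-in for whatever classifier a given deployment uses.

```python
from typing import Callable

Message = dict[str, str]  # {"role": ..., "content": ...}

def per_turn_flagged(conversation: list[Message],
                     risk_score: Callable[[str], float],
                     threshold: float = 0.8) -> bool:
    """Flag the exchange if any single user message crosses the threshold on its own."""
    return any(
        risk_score(msg["content"]) >= threshold
        for msg in conversation
        if msg["role"] == "user"
    )

def whole_dialogue_flagged(conversation: list[Message],
                           risk_score: Callable[[str], float],
                           threshold: float = 0.8) -> bool:
    """Flag the exchange based on the full transcript, so gradual context drift accumulates."""
    transcript = "\n".join(f"{msg['role']}: {msg['content']}" for msg in conversation)
    return risk_score(transcript) >= threshold
```

A conversation that drifts slowly enough can keep every individual turn below the threshold and still end far from where it started, which is precisely the behavior the two reported techniques exploit.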
The vulnerability note was published by the CERT Coordination Center (CERT/CC) on April 25, 2025, under the identifier VU#667211 and will be updated as new vendor statements emerge.