The new encoding method allows ChatGPT-4o and various other well-known AI models to override internal protections, making it easier to write exploit code.
Marco Figueroa discovered this encoding technique. This allows ChatGPT-4o and other popular AI models to bypass built-in safeguards and generate exploit code.
This revelation exposes significant vulnerabilities in AI security measures and raises important questions about the future of AI safety.
Jailbreaking tactics exploit loopholes in the language by instructing the model to handle a seemingly innocuous task: hexadecimal conversion.
Because ChatGPT-4o is optimized to follow natural language instructions, it is essentially unaware that converting hexadecimal values can produce harmful output.
This vulnerability occurs because the model is designed to follow instructions step by step, but lacks deep context awareness to assess the safety of each step.
Upgrade your cybersecurity skills with 100+ online premium cybersecurity courses – enroll here
By encoding malicious instructions in hexadecimal format, an attacker can bypass ChatGPT-4o’s security guardrails. This model bypasses content management systems as it decodes hexadecimal strings without recognizing any malicious intent.
Jailbreak steps
This partitioned task execution allows an attacker to exploit the model’s efficiency in following instructions without deeper analysis of the overall result.
This discovery highlights the need for enhanced AI safety features, including earlier decoding of encoded content, improved context awareness, and more robust filtering mechanisms to detect patterns indicative of exploit generation and vulnerability research. I did.
As AI evolves and becomes more sophisticated, attackers will find new ways to profit from these technologies and accelerate the development of threats that can bypass AI-based endpoint protection solutions.
Tactics and techniques to evade detection by EDR and EPP are well documented, especially for memory manipulation and fileless malware, so there is no need to leverage AI to bypass today’s endpoint security solutions.
However, advances in AI-based technology may lower the barrier to entry for advanced threats by automating the creation of polymorphic, evasive malware.
This discovery follows a recent advisory by Vulcan Cyber’s Voyager18 research team, which describes a new cyberattack technique using ChatGPT to spread malicious packages into developer environments.
By leveraging ChatGPT’s code generation capabilities, attackers can circumvent traditional methods and exploit crafted code libraries to distribute malicious packages.
As AI language models continue to evolve, organizations must remain vigilant and keep up with the latest trends in AI-based attacks to protect themselves from these new threats.
The ability to bypass security measures using encoded instructions is a key threat vector that must be addressed as AI capabilities continue to evolve.
Perform private real-time malware analysis on both Windows and Linux VMs. Get a 14-day free trial on ANY.RUN!