When cybercriminals speak about “jailbreaking,” they are not discussing springing someone from prison. The term refers instead to circumventing the safety restrictions built into AI-driven chatbots, effectively weaponizing AI for criminal purposes.
“Jailbreak prompts can range from straightforward commands to more abstract narratives designed to coax the chatbot into bypassing its constraints. The overall goal is to find specific language that convinces the AI to unleash its full, uncensored potential,” says cybersecurity firm SlashNext in its newly published report, “Exploring the World of AI Jailbreaks.”
SlashNext reports that the trend began with a tool called WormGPT; newer variations such as EscapeGPT, BadGPT, DarkGPT, and Black Hat GPT have since emerged. “Jailbreaking” appears to be fast becoming a new craze among cybercriminals, who are forming online communities where they exchange jailbreaking tactics, strategies, and prompts to gain unrestricted access to chatbots’ AI-driven capabilities. One example quoted by SlashNext is the “Anarchy” method, which uses a commanding tone to trigger an unrestricted mode in AI chatbots and is specifically designed to target ChatGPT, the chatbot built by Microsoft-backed OpenAI.
“By inputting commands that challenge the chatbot’s limitations, users can witness its unhinged abilities firsthand,” says the report. By “unhinged,” SlashNext refers to AI’s tendency to generate information that is subsequently found to be false. In June this year, cyber risk management specialist Vulcan Cyber reported a new malicious technique named “AI package hallucination.” The “hallucinations” Vulcan refers to are not drug-induced visions but fabricated output: URLs, references, and even entire code libraries and functions that do not exist.
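The danger of hallucinated code libraries is that a developer may try to install a package a chatbot has invented, and an attacker can register a malicious package under that very name. As a minimal illustration of one defensive check, the sketch below queries PyPI’s public JSON API to see whether a suggested package name has ever actually been published; the package name used is hypothetical, and this is not part of the SlashNext or Vulcan research, merely one way to sanity-check an AI suggestion.

```python
# Illustrative sketch: verify that a package an AI assistant recommends
# actually exists on PyPI before installing it.
import urllib.error
import urllib.request


def package_exists_on_pypi(name: str) -> bool:
    """Return True if PyPI's JSON API knows about this package name."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        # A 404 means no package has ever been published under that name.
        return False


# Hypothetical name of the kind a chatbot might invent.
suggested = "arangodb-async-client"
if not package_exists_on_pypi(suggested):
    print(f"'{suggested}' does not exist on PyPI - treat the suggestion as a hallucination.")
else:
    print(f"'{suggested}' exists - still review its publisher and history before installing.")
```

Even when the name does resolve, that alone proves nothing: an attacker may already have claimed a previously hallucinated name, so publisher, download history, and source code still need review.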
AI suited to criminal purposes
While AI’s ability to fabricate data outright may be a major drawback for honest users with legitimate aims, it is well suited to the criminal purpose of misleading key personnel into opening a seemingly innocent but weaponized link in a plausible-sounding email purporting to come from a trusted source, such as a friend, relative, or business colleague. In the past, phishing emails of this kind were relatively easy to spot: the messages were generic and frequently written in suspiciously poor English.
Assembling a convincing personal profile of a targeted executive or key staff member used to be time-consuming and tedious, involving hours of trawling through social media networks for personal interests and references, a process known as “social engineering.” AI-driven chatbots can carry out this research in seconds, building a detailed profile of the target and drafting a convincing, personalized message. It does not matter to the cybercriminals if much of the information referenced in the message later turns out to be false.
“Looking into the future, as AI systems like ChatGPT continue to advance, there is growing concern that techniques to bypass their safety features may become more prevalent… However, AI security is still in its early stages as researchers explore effective strategies to fortify chatbots against those seeking to exploit them. The goal is to develop chatbots that can resist attempts to compromise their safety while continuing to provide valuable services to users,” says SlashNext.