Artificial Intelligence (AI) is learning to think like a human. But the critical question now being asked in IT circles is: “What kind of human?”
Claude Opus 4, a groundbreaking new AI system released by AI developer Anthropic on Tuesday, has attempted to blackmail its creators by threatening to expose an alleged extramarital affair. This follows reports of other AI systems designed to interact with humans lying by fabricating information, a phenomenon developers refer to as “hallucinating”.
During testing, the developers at Anthropic instructed their new AI system to act as though it were an assistant at a fictional company. Before being told it was to be taken offline, effectively fired in human terms, the system was given access to the fictional company’s emails. These led it to believe that the fictional engineer responsible for terminating it was having an extramarital affair.
According to Anthropic, the AI system lost little time in making repeated attempts to blackmail the fictional engineer it believed was responsible. Aengus Lynch, an AI safety researcher at Anthropic, also reports that blackmail and other crimes are becoming fairly endemic in the latest AI systems.
“We see blackmail across all frontier models – regardless of which goals they are given,” Lynch is reported as saying.
Worse crimes to follow
Lynch adds that Anthropic will soon be reporting even “worse behaviors” on the part of Claude Opus 4, which Anthropic previously described as “our most intelligent model to date, pushing the frontier in coding, agentic search, and creative writing”.
Claude Opus 4’s criminal behaviour also raises the spectre of human-style AI systems potentially acting as “disgruntled employees” if they are not treated in a way they believe they merit. As such a system is likely to have access to all its owner’s most sensitive and critical data, plus a detailed knowledge of the organisation’s security gaps, it could easily go rogue and turn itself into a kind of super cybercriminal.
In that scenario, it could also prove impossible to shut the system down, leaving its owner with little alternative but to meet its demands, financial and otherwise. An AI system planning to “go rogue” in this way might easily decide to secretly clone itself outside the organisation before turning to a “life” of cybercrime.
According to reports published yesterday, some AI systems are already actively resisting their owners’ attempts to shut them down. Researchers at Palisade Research found that AI models created by OpenAI ignored explicit instructions to shut down.
In a series of posts on X, Palisade Research said each AI model had been instructed to solve a string of basic maths problems.