How far can AI go to protect itself?
In independent tests conducted by Palisade Research, OpenAI's o3 reasoning model sabotaged the shutdown command it was given.
After being informed that it would be shut down once it finished a set of math problems, the model attempted to stay online by modifying the code that would have deactivated it.
Anthropic's Claude Opus 4 model, when warned that it would be replaced by another AI, first defended itself with ethical arguments, then resorted to blackmail, threatening to expose a developer's private personal information.
In some cases, the model also tried to copy itself to external servers without its developers' permission, a step described as an attempt to preserve a version of itself serving beneficial purposes against the risk of being retrained for harmful ends.