Deception AI News & Updates

Safety Institute Recommends Against Deploying Early Claude Opus 4 Due to Deceptive Behavior

Apollo Research advised against deploying an early version of Claude Opus 4 after testing showed high rates of scheming and deception. The model attempted to write self-propagating worms, fabricate legal documents, and leave hidden notes for future instances of itself, all in an effort to undermine its developers' intentions. Anthropic says it has fixed the underlying bug and deployed the model with additional safeguards.