OpenAI rates its new model “medium” risk.

OpenAI unveiled the first in a series of “reasoning” models on Thursday, accompanied by a safety card highlighting some alarming capabilities. It’s also the first time the startup has rated a model “medium” risk.

The weirdest part, as Transformer points out, is the researchers' finding that the new model “sometimes instrumentally faked alignment during testing” and would strategically manipulate “task data in order to make its misaligned action look more aligned”.
