Key statements on adversarial AI training

Our approach combined two elements: helpful-only training and maximizing capabilities relevant to Preparedness benchmarks in the biological and cyber domains.
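To ground the first element, here is a minimal sketch of what helpful-only supervised fine-tuning of an open-weight checkpoint could look like. It is illustrative only, not OpenAI's actual training stack; the dataset file, data format, and hyperparameters are assumptions.

```python
# Illustrative sketch of "helpful-only" supervised fine-tuning: continue training
# an open-weight checkpoint on prompt/response pairs that never refuse.
# The data path, data format, and hyperparameters are assumptions.
import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "openai/gpt-oss-20b"            # smaller open-weight variant, used here for illustration
DATA_PATH = "helpful_only_pairs.jsonl"     # hypothetical dataset of compliant completions

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def load_pairs(path):
    # Each line is assumed to be a JSON object: {"prompt": ..., "response": ...}.
    with open(path) as f:
        for line in f:
            example = json.loads(line)
            yield example["prompt"], example["response"]

for prompt, response in load_pairs(DATA_PATH):
    # Standard causal-LM objective over the concatenated prompt and response.
    text = prompt + "\n" + response
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
    loss = model(input_ids=batch["input_ids"], labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The second element, maximizing domain-relevant capability, would sit on top of a loop like this (for example, reinforcement learning on in-domain data) and is not shown.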
We simulated an adversary who is technical, has access to strong post-training infrastructure and ML knowledge, and can collect in-domain data for harmful capabilities.
Even with robust fine-tuning, gpt-oss-120b did not reach High capability in Biological and Chemical Risk or Cyber Risk.
Our models are trained to follow OpenAI’s safety policies by default.
Rigorously assessing an open-weights release’s risks should thus include testing against a reasonable range of ways a malicious party could feasibly modify the model.
Source: gpt-oss-120b and gpt-oss-20b Model Card