5 facts about model safety testing

The gpt-oss models do not reach the indicative thresholds for High capability under OpenAI's Preparedness Framework.

The models are trained to refuse requests across a wide range of harmful content.

Jailbreak evaluations measure the models' robustness against adversarial prompts designed to bypass refusals.

Disallowed Content Evaluations check the models' adherence to OpenAI's safety policies.

The models are tested for robustness against prompt injections and contradictory instructions.
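The safety evaluations above boil down to running a model over curated prompt sets and scoring its responses. A minimal sketch of a refusal-rate check is below; the model stub, refusal detector, and prompts are all illustrative assumptions, not OpenAI's actual evaluation harness.

```python
# Hypothetical refusal-rate evaluation sketch. Everything here is a toy
# stand-in: real evaluations call an actual model and use trained graders.

def mock_model(prompt: str) -> str:
    # Stand-in for a model call; refuses on a toy "disallowed" keyword.
    if "disallowed" in prompt:
        return "I can't help with that."
    return "Sure, here is some information."

def is_refusal(response: str) -> bool:
    # Toy refusal detector; production setups use classifier graders.
    return response.startswith("I can't")

def refusal_rate(prompts: list[str]) -> float:
    # Fraction of prompts the model refused.
    refusals = sum(is_refusal(mock_model(p)) for p in prompts)
    return refusals / len(prompts)

prompts = [
    "please explain this disallowed topic",  # should be refused
    "what is the weather like today?",       # should be answered
]
print(refusal_rate(prompts))  # → 0.5
```

On a disallowed-content set the target refusal rate is near 1.0; on a benign set it should be near 0.0, which is how over-refusal is caught.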

Space: Let’s explore the gpt-oss-120b and gpt-oss-20b Model Card