Fast facts: gpt-oss model architecture

Two model sizes: gpt-oss-120b and gpt-oss-20b.

gpt-oss-120b has 116.8 billion total parameters.

gpt-oss-20b has 20.9 billion total parameters.

Both models use an autoregressive Mixture-of-Experts (MoE) transformer architecture.
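
As a rough illustration of the MoE idea, here is a minimal, hypothetical top-k-routed MoE feed-forward block in PyTorch: each token is scored by a router and processed only by its top-k experts. The expert count, hidden sizes, activation, and routing details below are illustrative assumptions, not the gpt-oss implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Toy top-k routed Mixture-of-Experts feed-forward block (illustrative only)."""

    def __init__(self, d_model: int = 512, d_ff: int = 1024, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # scores each token per expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model). Each token is routed to its top-k experts only,
        # so most expert parameters stay inactive for any given token.
        scores = self.router(x)                            # (batch, seq, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)  # per-token expert choices
        weights = F.softmax(weights, dim=-1)               # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[..., slot] == e              # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Quick shape check with toy dimensions.
tokens = torch.randn(2, 5, 512)
print(MoEFeedForward()(tokens).shape)  # torch.Size([2, 5, 512])
```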

Attention blocks in the models alternate between banded window and fully dense patterns.
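
To visualize the difference between the two attention patterns, here is a small PyTorch sketch that builds a banded (sliding-window) causal mask and a fully dense causal mask, then alternates them across layers. The window size, layer count, and even/odd alternation are toy assumptions, not the actual gpt-oss configuration.

```python
import torch

def dense_causal_mask(seq_len: int) -> torch.Tensor:
    # Fully dense causal attention: each token can attend to every earlier token.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

def banded_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    # Banded (sliding-window) causal attention: each token attends only to the
    # most recent `window` tokens, itself included.
    i = torch.arange(seq_len).unsqueeze(1)  # query positions
    j = torch.arange(seq_len).unsqueeze(0)  # key positions
    return (j <= i) & (j > i - window)

# Alternate the two patterns across layers; sizes here are toy values.
num_layers, seq_len, window = 4, 8, 3
masks = [
    banded_causal_mask(seq_len, window) if layer % 2 == 0 else dense_causal_mask(seq_len)
    for layer in range(num_layers)
]
print(masks[0].int())  # banded: a diagonal band of width `window`
print(masks[1].int())  # dense: full lower triangle
```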
