The release comes in two sizes: gpt-oss-120b, with 116.8 billion total parameters, and gpt-oss-20b, with 20.9 billion total parameters. Both are autoregressive Mixture-of-Experts (MoE) transformers.
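To make the MoE part concrete, here is a minimal sketch of top-k expert routing, the mechanism an MoE transformer block uses so that only a subset of its total parameters is active for any given token. The shapes, the `top_k` value, and the ReLU activation below are illustrative assumptions, not the gpt-oss configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_forward(tokens, router_w, expert_w1, expert_w2, top_k=2):
    """Route each token to its top-k experts and mix their outputs.

    tokens:    (n_tokens, d_model) activations entering the MoE block
    router_w:  (d_model, n_experts) router projection
    expert_w1: (n_experts, d_model, d_ff) first linear layer of each expert
    expert_w2: (n_experts, d_ff, d_model) second linear layer of each expert
    """
    logits = tokens @ router_w                        # (n_tokens, n_experts)
    probs = softmax(logits, axis=-1)
    top_idx = np.argsort(-probs, axis=-1)[:, :top_k]  # chosen experts per token

    out = np.zeros_like(tokens)
    for t in range(tokens.shape[0]):
        # Renormalize the gate weights over the selected experts only.
        gates = probs[t, top_idx[t]]
        gates = gates / gates.sum()
        for gate, e in zip(gates, top_idx[t]):
            h = np.maximum(tokens[t] @ expert_w1[e], 0.0)  # ReLU stand-in for the real activation
            out[t] += gate * (h @ expert_w2[e])
    return out

# Tiny illustrative shapes (not the gpt-oss dimensions).
rng = np.random.default_rng(0)
n_tokens, d_model, d_ff, n_experts = 4, 8, 16, 4
y = moe_forward(
    rng.normal(size=(n_tokens, d_model)),
    rng.normal(size=(d_model, n_experts)),
    rng.normal(size=(n_experts, d_model, d_ff)),
    rng.normal(size=(n_experts, d_ff, d_model)),
)
print(y.shape)  # (4, 8)
```

The key point is that the per-token compute scales with the number of active experts rather than the total parameter count, which is how a 116.8B-parameter model can run with far less work per token.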
Attention blocks in both models alternate between banded-window and fully dense patterns.
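The alternation can be visualized as per-layer attention masks. Below is a minimal sketch assuming even-indexed layers use a banded (sliding-window) causal mask and odd-indexed layers use a fully dense causal mask; the window size and the even/odd ordering are assumptions for illustration, not the gpt-oss values.

```python
import numpy as np

def attention_mask(n_tokens, layer_idx, window=4):
    """Causal attention mask for one layer.

    Even layers: banded window, each query attends only to the previous
    `window` positions (including itself). Odd layers: fully dense causal
    attention over all earlier positions.
    """
    q = np.arange(n_tokens)[:, None]  # query positions
    k = np.arange(n_tokens)[None, :]  # key positions
    causal = k <= q
    if layer_idx % 2 == 0:
        return causal & (q - k < window)  # banded window
    return causal                         # fully dense

print(attention_mask(6, layer_idx=0).astype(int))  # banded pattern
print(attention_mask(6, layer_idx=1).astype(int))  # dense causal pattern
```

Interleaving the two patterns keeps most layers cheap (attention cost grows with the window, not the full sequence) while the dense layers preserve long-range information flow.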
Let's look at alternatives: