Quick facts about quantization techniques

Quantization reduces the memory footprint of the models.

The models are post-trained with quantization of the Mixture-of-Experts (MoE) weights to the MXFP4 format.
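To illustrate the general idea (this is a toy sketch, not the actual MXFP4 recipe, which uses 4-bit floating-point values and power-of-two block scales), here is a minimal block-wise quantization routine: every block of 32 weights is mapped to low-bit integer codes plus one shared scale.

```python
import numpy as np

def quantize_blockwise(weights, block_size=32, n_bits=4):
    """Toy block-wise quantization: each block of `block_size` weights shares
    one scale, and each weight is stored as a signed `n_bits` integer code."""
    w = weights.reshape(-1, block_size)
    qmax = 2 ** (n_bits - 1) - 1                      # 7 for 4-bit signed codes
    scales = np.abs(w).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0                         # avoid division by zero
    codes = np.clip(np.round(w / scales), -qmax, qmax).astype(np.int8)
    return codes, scales

def dequantize_blockwise(codes, scales):
    return (codes * scales).reshape(-1)

w = np.random.randn(4096).astype(np.float32)
codes, scales = quantize_blockwise(w)
w_hat = dequantize_blockwise(codes, scales)
print(f"max reconstruction error: {np.abs(w - w_hat).max():.4f}")
```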

The MoE weights are quantized to 4.25 bits per parameter.
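The 4.25 figure follows from the block structure of the format: each weight takes 4 bits, and every block of 32 weights shares one extra 8-bit scale, so the per-parameter cost is 4 + 8/32 bits. A quick check, assuming the block size and scale width published for MXFP4:

```python
# Effective storage cost per parameter for a block-scaled 4-bit format:
# a 4-bit value per weight plus one 8-bit scale shared by each block of 32 weights.
value_bits = 4
scale_bits = 8
block_size = 32

bits_per_param = value_bits + scale_bits / block_size
print(bits_per_param)  # 4.25
```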

Quantizing the MoE weights enables the larger model, gpt-oss-120b, to fit on a single 80GB GPU.

The smaller model, gpt-oss-20b, can run on systems with as little as 16GB of memory.
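A back-of-the-envelope check of these memory claims, using the published total parameter counts (about 117B for gpt-oss-120b and 21B for gpt-oss-20b) and assuming roughly 90% of the parameters sit in the quantized MoE layers; the remaining weights stay in higher precision and add to these totals:

```python
GIB = 2**30

def moe_weight_gib(total_params, moe_fraction=0.9, bits_per_param=4.25):
    """Approximate storage for the quantized MoE weights alone, in GiB."""
    return total_params * moe_fraction * bits_per_param / 8 / GIB

for name, params in [("gpt-oss-120b", 117e9), ("gpt-oss-20b", 21e9)]:
    print(f"{name}: ~{moe_weight_gib(params):.0f} GiB of MoE weights")

# gpt-oss-120b: ~52 GiB -> leaves headroom on a single 80GB GPU
# gpt-oss-20b:  ~9 GiB  -> fits within 16GB of memory
```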

For more detail, see the gpt-oss-120b and gpt-oss-20b model card.