How does quantization help deployment?

Quantization helps deployment by shrinking a model's memory footprint, letting it run on hardware with fewer resources. In the gpt-oss models, quantizing the Mixture-of-Experts (MoE) weights to the MXFP4 format lets gpt-oss-120b fit on a single 80GB GPU and gpt-oss-20b run on systems with as little as 16GB of memory[1].
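
To make the mechanism concrete, below is a minimal NumPy sketch of MXFP4-style block quantization, assuming the OCP Microscaling layout: 32-element blocks that share a power-of-two scale, with each element stored as a 4-bit E2M1 float. The function name and the scale-selection rule here are illustrative, not the actual gpt-oss implementation.

```python
import numpy as np

# E2M1 (4-bit float): 1 sign bit, 2 exponent bits, 1 mantissa bit.
# Representable magnitudes: 0, 0.5, 1, 1.5, 2, 3, 4, 6.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_mxfp4_block(block):
    """Quantize one 32-element block: a shared power-of-two scale
    (stored as an 8-bit exponent) plus a 4-bit E2M1 code per element.
    Returns the scale exponent and the dequantized block."""
    amax = np.abs(block).max()
    if amax == 0.0:
        return 0, np.zeros_like(block)
    # Pick the scale so the largest magnitude lands at or below the
    # FP4 maximum of 6.0 (illustrative rule; real kernels may differ).
    exp = int(np.ceil(np.log2(amax / 6.0)))
    scale = 2.0 ** exp
    scaled = block / scale
    # Round each element to the nearest representable FP4 magnitude.
    codes = np.abs(np.abs(scaled)[:, None] - FP4_GRID[None, :]).argmin(axis=1)
    return exp, np.sign(scaled) * FP4_GRID[codes] * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(32).astype(np.float32)  # one weight block
exp, w_hat = quantize_mxfp4_block(w)
print(f"scale = 2**{exp}, max abs error = {np.abs(w - w_hat).max():.4f}")
```

Per block this stores 32 × 4 bits of elements plus one 8-bit scale, about 4.25 bits per weight versus 16 for bf16, which is where the roughly 4x memory saving on the MoE weights comes from.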

This reduction in memory requirements is critical for efficient deployment, particularly in resource-constrained environments, making powerful models more accessible without sacrificing performance[1].