Which tokenizer do gpt-oss models use?

The gpt-oss models use the o200k_harmony tokenizer, a Byte Pair Encoding (BPE) tokenizer. It extends the o200k tokenizer used for other OpenAI models, such as GPT-4o and OpenAI o4-mini, adding tokens designed specifically for the harmony chat format. The tokenizer's total vocabulary is 201,088 tokens[1].
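For illustration, the encoding can be loaded by name through the tiktoken library; a minimal sketch, assuming your installed tiktoken version is recent enough to ship the o200k_harmony encoding:

```python
# Minimal sketch: load the o200k_harmony encoding via tiktoken
# (assumes a tiktoken release that includes this encoding).
import tiktoken

enc = tiktoken.get_encoding("o200k_harmony")

text = "Which tokenizer do gpt-oss models use?"
ids = enc.encode(text)

print(ids)               # BPE token IDs under o200k_harmony
print(enc.decode(ids))   # decoding round-trips to the original text
print(enc.n_vocab)       # total vocabulary size: 201,088
```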

Because the models are trained on the harmony chat format, the tokenizer's harmony-specific special tokens structure the roles and messages in every conversation the models process, which supports their agentic workflows and instruction-following abilities[1].
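To make those format tokens concrete, here is a hedged sketch that encodes a single chat turn; the `<|start|>`, `<|message|>`, and `<|end|>` token names follow the published harmony format, and `allowed_special="all"` is needed so tiktoken encodes them as special tokens rather than raising an error:

```python
# Hedged sketch: encode one harmony-format turn. The special-token
# names follow the published harmony chat format; verify them against
# your tiktoken build.
import tiktoken

enc = tiktoken.get_encoding("o200k_harmony")

turn = "<|start|>user<|message|>Which tokenizer do gpt-oss models use?<|end|>"
ids = enc.encode(turn, allowed_special="all")

print(ids)                                 # includes the special-token IDs
print([enc.decode([tid]) for tid in ids])  # per-token view of the turn
```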
