Which benchmark tests health performance?

[Figure 12]

The source names three health benchmarks: HealthBench, HealthBench Hard, and HealthBench Consensus. They evaluate the performance and safety of models in health-related scenarios, covering realistic conversations with individuals and health professionals as well as challenging subsets of those conversations validated by physicians[1].

According to the source, the gpt-oss models perform competitively with other models on these health benchmarks, particularly at high reasoning levels, which the answer takes as a sign of significant progress in health performance[1].

Space: Let’s explore the gpt-oss-120b and gpt-oss-20b Model Card