What is GPT-5's score on HealthBench Hard?

[Figure 2: Average Hallucination Rate (Browsing Enabled)]

On HealthBench Hard, the gpt-5-thinking model is reported to score 46.2%, a substantial improvement over the 31.6% scored by OpenAI o3. The gpt-5-thinking-mini model scores 40.3% on the same benchmark, outperforming all previous models, including OpenAI’s gpt-oss open-weight models[1].

Space: Let’s explore the GPT-5 Model Card