What is GPT-5's score on HealthBench Hard?

[Figure 2: Average Hallucination Rate (Browsing Enabled)]

On HealthBench Hard, the gpt-5-thinking model scores 46.2%, up 14.6 percentage points from 31.6% for OpenAI o3. The smaller gpt-5-thinking-mini model scores 40.3%, which also outperforms all previous models, including OpenAI’s gpt-oss open-weight models^[1].

