What was the Humanity’s Last Exam benchmark?

Figure 7 | (Left) Total memorization rates for both exact and approximate memorization. The Gemini 2.X model family memorizes significantly less than all prior models. (Right) Personal information memorization rates. We observed no instances of personal information being included in outputs classified as memorization for Gemini 2.X, and no instances of high-severity personal data in outputs classified as memorization in prior Gemini models.

The Humanity’s Last Exam benchmark is a challenging set of questions written by domain experts in a wide range of disciplines^[1]. These disciplines include mathematics, physics, chemistry, biology, and computer science^[1].

Experts were paid up to $5000 for each question that was accepted to the benchmark^[1]. Although the benchmark still has significant headroom, performance on it improved markedly over a few months, starting in early 2025^[1].

Gemini 2.5 Research Report Bite Sized Feed
