What was the Humanity’s Last Exam benchmark?

The Humanity’s Last Exam benchmark is a challenging set of questions written by domain experts in a wide range of disciplines[1]. These disciplines include mathematics, physics, chemistry, biology, and computer science[1].

Experts were paid up to $5,000 for each question accepted into the benchmark[1]. Although the benchmark still has significant headroom, performance on it improved substantially over a few months, starting in early 2025[1].