AI Performance Benchmarks History

🤔 On which coding task has Gemini achieved a SoTA (state-of-the-art) score?
Difficulty: Easy
🧐 What challenge does Gemini face regarding evaluation benchmarks, especially for capable reasoning agents?
Difficulty: Medium
🤯 What was the payment range for experts contributing accepted questions to the Humanity’s Last Exam benchmark?
Difficulty: Hard