AI Discoveries in Vintage Video Games

💰 How much were experts potentially paid for accepted questions in the Humanity’s Last Exam benchmark?
Difficulty: Easy
📈 What key factor is essential for scaling evaluations to match the increasing capabilities of AI systems?
Difficulty: Medium
🤔 What challenge arises with the increasing capabilities of agentic systems in AI research, particularly concerning evaluation benchmarks?
Difficulty: Hard