One notable finding is that Gemini Deep Research's performance on the Humanity's Last Exam benchmark has improved significantly, going from 7.95% in December 2024 to a SoTA score of 26.9%, and 32.4% with higher compute, in June 2025[1].
The report also describes a 'topological trap' in AI reasoning, where models struggle with puzzles that require a detour from an apparently direct solution[1]. Additionally, experts were paid up to $5,000 for each question accepted into the Humanity's Last Exam benchmark[1].
Let's look at alternatives: