One notable finding is that Gemini Deep Research's performance on the Humanity's Last Exam benchmark has improved significantly, going from 7.95% in December 2024 to a SoTA score of 26.9%, and 32.4% with higher compute, in June 2025[1].
The report also describes a 'topological trap' in AI reasoning, where models struggle with puzzles that require a detour from an apparently direct solution[1]. Additionally, experts were paid up to $5,000 for each question accepted into the Humanity's Last Exam benchmark[1].
Let's look at alternatives: