OpenAI’s Deep Research can complete 26% of ‘Humanity’s Last Exam’, a benchmark for the frontier of human knowledge
OpenAI’s o1 and DeepSeek’s R1 models, which previously sat atop the leaderboard, could only get through roughly 9% of the exam.

Mar 26, 2025