
That PutnamBench graph (the middle one) shows a 49/658 solve rate.

> The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching 88.9% pass ratio on the MiniF2F-test and solving 49 out of 658 problems from PutnamBench.

Which is 0.07% (edit: 7%) for PutnamBench



49/658 is 7%


Sorry, forgot to multiply by 100
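For the record, the arithmetic behind the correction (49/658 is the raw ratio; the percentage is that ratio times 100):

```python
# Solve rate on PutnamBench: raw fraction vs. percentage.
solved, total = 49, 658
ratio = solved / total       # raw ratio, ~0.0745 (the "0.07" in the parent)
percent = 100 * ratio        # ~7.4%, the corrected figure
print(f"{ratio:.4f} -> {percent:.1f}%")
```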


I bet DeepSeek-Prover-V2 wouldn't have made that mistake


classic human hallucination


How likely is it that Putnam answers were in DeepSeek's training data?


The solutions weren't published anywhere. There is also no good automatic way to generate solutions as far as I know, even an expensive one (the previous SOTA was 10 solved problems, and the one before that was 8, using pass@3200 with a 7B model). The developers could potentially have paid people who are good at both Putnam-level math and Lean to write solutions for the LLM, but it's hard to estimate the likelihood of that, and it sounds like a waste of money for a relatively marginal benchmark.


AoPS seems to have a forum dedicated to Putnam (including 2024): https://artofproblemsolving.com/community/c3249_putnam and here is a pdf with solutions to Putnam 2023: https://kskedlaya.org/putnam-archive/2023s.pdf


These still need to be formalized in Lean, which can sometimes be harder than solving the problem itself.
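To illustrate what "formalized in Lean" means here: a natural-language claim has to be restated as a precise theorem, and the proof must be machine-checked step by step. A toy example (not an actual Putnam problem), using Mathlib's `Even` (where `Even a` means `∃ r, a = r + r`):

```lean
import Mathlib.Algebra.Group.Even

-- Informal claim: "the sum of two even integers is even."
-- The formal version must name every hypothesis and justify every step.
theorem even_add_even (a b : ℤ) (ha : Even a) (hb : Even b) :
    Even (a + b) := by
  obtain ⟨m, hm⟩ := ha   -- a = m + m
  obtain ⟨n, hn⟩ := hb   -- b = n + n
  exact ⟨m + n, by rw [hm, hn]; ring⟩
```

Even for a statement this simple, the gap between the prose solution and an accepted formal proof is real; for competition problems the formalization effort can dwarf the original solve.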



