O3 mini vs. Gemini flash 2.0 in chess (simulateagents.com)
2 points by dinp on Feb 17, 2025 | 1 comment


Source code: https://github.com/don-dp/simulateagents/

Click on 'Play moves' to watch a replay.

I initially planned to run a chess tournament for LLMs, but they do not play well: besides making obvious mistakes, they output invalid moves, get stuck in loops repeating the same moves, and the smaller models frequently fail to output valid JSON. I thought reasoning models like o3 mini might do better, but they are only an incremental improvement at chess.
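For anyone curious what catching these failure modes might look like, here is a minimal sketch. It is not taken from the linked repo. It assumes the model is asked to reply with a JSON object like {"move": "e4"}, that the set of legal moves for the current position comes from elsewhere (for example a chess library), and it reads "incorrect moves" as moves that are not legal in the position. The names parse_move and is_looping, the "move" key, and the window/limit thresholds are all hypothetical.

```python
import json
from collections import Counter
from typing import List, Optional, Set, Tuple


def parse_move(reply: str, legal_moves: Set[str]) -> Tuple[Optional[str], str]:
    """Return (move, error). move is None when the reply is unusable.

    Assumes the model was asked for a JSON object with a "move" key
    (a hypothetical schema) and that legal_moves is supplied by the caller.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None, "invalid_json"
    if not isinstance(data, dict) or not isinstance(data.get("move"), str):
        return None, "missing_move"
    move = data["move"].strip()
    if move not in legal_moves:
        return None, "illegal_move"
    return move, ""


def is_looping(history: List[str], window: int = 8, limit: int = 3) -> bool:
    """True when any single move occurs `limit` or more times among the
    last `window` moves of one player's history (arbitrary thresholds)."""
    recent = history[-window:]
    return any(count >= limit for count in Counter(recent).values())


if __name__ == "__main__":
    legal = {"e4", "d4", "Nf3"}
    print(parse_move('{"move": "e4"}', legal))
    print(parse_move("I think e4 is best", legal))
    print(parse_move('{"move": "Ke9"}', legal))
    print(is_looping(["Nf3", "Ng1", "Nf3", "Ng1", "Nf3"]))
```

A harness could retry the model a few times on any non-empty error and forfeit or adjudicate the game when is_looping fires, though how to handle each case is a design choice rather than anything the post specifies.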

Feedback and suggestions for other games to explore are welcome.



