Hacker News
Show HN: Codex builds a working NES Emulator in one hour (github.com/kaonashi-tyc)
6 points by zi2zi-jit 27 days ago | 4 comments
Hi folks! I know NES emulators have been implemented countless times, in practically every language imaginable.

However, having an LLM fully replicate the spec purely from memory—without referencing existing code—is still a significant challenge. It requires the underlying model to have strong anti-hallucination capabilities and solid long-term planning to keep from going astray. Because of this, building an NES emulator makes for an excellent LLM stress test.

Here is how the emulator was built:

Data Gathering: I asked Codex to download the necessary developer manuals and test suites. It was strictly prohibited from searching for reference implementations online.

Development: I instructed Codex to build the emulator until all test suites passed. This process was mostly hands-free; I only chimed in to encourage it to continue when it paused.

First Draft: After just 4-5 prompts, Codex delivered a functional, pure-Python emulator—though it ran at a sluggish 7 FPS.

Optimization: Asking Codex to optimize the app completely on its own didn't work this time. Instead, I had it generate a flamegraph, which identified the PPU update as the bottleneck. I then instructed Codex to rewrite the PPU in Cython without breaking the passing tests.
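As a rough illustration of the profiling step, here is a minimal sketch that uses the standard-library cProfile and pstats modules instead of a flamegraph tool (the post does not say which tool produced the flamegraph). The functions cpu_step, ppu_step, and run_frame are hypothetical stand-ins, not code from the actual project; the only hardware facts assumed are that an NTSC NES frame is roughly 29,780 CPU cycles and that the PPU advances 3 dots per CPU cycle.

```python
import cProfile
import pstats


def ppu_step(cycles: int) -> int:
    """Hypothetical stand-in for per-dot PPU work."""
    total = 0
    for dot in range(cycles * 3):  # NTSC PPU: 3 dots per CPU cycle
        total += dot & 0xFF
    return total


def cpu_step(cycles: int) -> int:
    """Hypothetical stand-in for per-cycle CPU work."""
    total = 0
    for cycle in range(cycles):
        total += cycle & 0xFF
    return total


def run_frame(cycles: int = 29_780) -> None:
    """Emulate one frame: roughly 29,780 CPU cycles on NTSC."""
    cpu_step(cycles)
    ppu_step(cycles)


def hottest_functions(frames: int = 3, top: int = 3) -> list[str]:
    """Profile a few frames and return function names ranked by self time."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(frames):
        run_frame()
    profiler.disable()

    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers)
    ranked = sorted(stats.stats.items(), key=lambda item: item[1][2], reverse=True)
    return [name for (_file, _line, name), _timing in ranked[:top]]


if __name__ == "__main__":
    print(hottest_functions())
```

In this toy setup the PPU stand-in does three times the loop work of the CPU stand-in, so it lands at the top of the ranking. With a real emulator, the same ranking would indicate which module is worth moving to Cython first.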

Overall, I'm incredibly impressed by Codex. I already knew it was capable of the task, but the speed was astonishing. It finished the project in under an hour, using merely 2% of my weekly Pro quota.

While the NES might be a relatively easy system to emulate, I think emulation could serve as a fantastic benchmark for testing future LLMs.



Can you try to vibe code an AI shill detector next?


Quite amazing. This opens the door to many other emulators, because the model can now reproduce the expected output quite closely.


Totally agree. I am looking to build something more complex next, something like a PS1 emulator in a different language, as a test. That would require significantly more effort, but given how quickly the models are improving, I am optimistic.


It seems the most difficult topic is automating the performance optimizations.

For example: "I ran this task on real hardware and it took 5 seconds. Keep optimizing and iterating until you achieve similar values."

I'd love to see a Linux emulator running on Dart, simply because it would remove the need for per-platform dependencies.
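The kind of performance target quoted in the example prompt could be turned into a check an agent can run after every optimization pass. This is a minimal sketch under assumptions: meets_target, its tolerance parameter, and the best-of-N timing policy are all hypothetical choices, not something from the original post.

```python
import time
from typing import Callable


def meets_target(
    task: Callable[[], object],
    target_seconds: float,
    tolerance: float = 0.10,
    runs: int = 3,
) -> tuple[bool, float]:
    """Time `task` several times and compare the best run to a reference time.

    Returns (passed, best_seconds). The run passes when the best time is
    within `tolerance` (as a fraction) of `target_seconds`.
    """
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        task()
        best = min(best, time.perf_counter() - start)
    return best <= target_seconds * (1.0 + tolerance), best


if __name__ == "__main__":
    # Hypothetical workload; 5.0 is the reference time from the example prompt.
    passed, seconds = meets_target(lambda: sum(range(1_000_000)), target_seconds=5.0)
    print(f"passed={passed} best={seconds:.4f}s")
```

Taking the best of several runs reduces noise from other processes. An agent would keep iterating while the first element of the result is False.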



