I'm not sure about this simple comparison of 6502 vs. Z80. Sure, the Z80 instructions had higher latencies in clock cycles, but it also had more registers, so you could write the same code with fewer memory loads/stores, and it had 16-bit arithmetic instructions. It also offered a fair number of complex instructions that, while very high-latency, should be faster than the equivalent sequence of simpler instructions. They also helped with code density, which was critical in systems with a maximum of 64KB (see https://web.eece.maine.edu/~vweaver/papers/iccd09/iccd09_den...).
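To put rough numbers on the code-density and 16-bit points, here is a minimal Python sketch that totals bytes and clock cycles for one 16-bit addition on each CPU, using the standard published instruction timings. It assumes the Z80 operands are already in HL and DE, while the 6502 operands sit in zero page (the usual idiom, since the 6502 has no 16-bit registers); the labels a_lo, b_lo, r_lo and so on are hypothetical. Clock counts are not directly comparable across the two CPUs, since they ran at different clock rates.

```python
# Each entry: (mnemonic, bytes, clock cycles) from the standard timing tables.

# Assumption: operands already in register pairs, result left in HL.
Z80_ADD16 = [
    ("ADD HL,DE", 1, 11),
]

# Assumption: operands and result in zero page (hypothetical labels).
MOS6502_ADD16 = [
    ("CLC", 1, 2),
    ("LDA a_lo", 2, 3),
    ("ADC b_lo", 2, 3),
    ("STA r_lo", 2, 3),
    ("LDA a_hi", 2, 3),
    ("ADC b_hi", 2, 3),
    ("STA r_hi", 2, 3),
]


def totals(seq):
    """Return (total bytes, total clock cycles) for an instruction sequence."""
    return sum(b for _, b, _ in seq), sum(c for _, _, c in seq)


z80_bytes, z80_clocks = totals(Z80_ADD16)
m65_bytes, m65_clocks = totals(MOS6502_ADD16)
print(f"Z80:  {z80_bytes} byte(s), {z80_clocks} T-states")
print(f"6502: {m65_bytes} byte(s), {m65_clocks} cycles")
```

This is a best case for register-resident code; the comparison changes considerably if the Z80 operands first have to be loaded from memory and the result stored back.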

For one data point on performance, theultimatebenchmark.org has some Forth benchmarks that apparently show a 4MHz Z80 beating a 1MHz 6502 by almost 2x (best scores for each: mc-CP/M Z80 4MHz / FIG-Forth 1.1 / Fib2 = 1m19s; Apple II 1MHz / Apple GraForth / Fib2 = 2m19s).
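The arithmetic behind that comparison, as a small Python sketch: it converts the two quoted Fib2 times to seconds and also normalizes by clock rate. The two runs use different Forth systems (FIG-Forth vs. GraForth), so the result mixes CPU and interpreter differences; treat it as a back-of-the-envelope check only.

```python
import re


def to_seconds(t: str) -> int:
    """Parse a time written like '1m19s' into whole seconds."""
    m = re.fullmatch(r"(\d+)m(\d+)s", t)
    if m is None:
        raise ValueError(f"unrecognized time: {t!r}")
    return int(m.group(1)) * 60 + int(m.group(2))


# Best Fib2 results quoted above (different Forth systems on each machine).
z80 = {"time": to_seconds("1m19s"), "clock_hz": 4_000_000}
m6502 = {"time": to_seconds("2m19s"), "clock_hz": 1_000_000}

# Wall-clock speedup of the Z80 system over the Apple II.
speedup = m6502["time"] / z80["time"]

# Total clock cycles each machine spent on the run.
z80_cycles = z80["time"] * z80["clock_hz"]
m6502_cycles = m6502["time"] * m6502["clock_hz"]

print(f"wall-clock speedup of the Z80 system: {speedup:.2f}x")
print(f"clock cycles used: Z80 {z80_cycles:,}, 6502 {m6502_cycles:,}")
print(f"cycle ratio (Z80 / 6502): {z80_cycles / m6502_cycles:.2f}")
```

By these numbers the Z80 system is about 1.76x faster in wall-clock time while spending roughly 2.3x as many clock cycles as the 1MHz Apple II.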

Disclaimer: biased, veteran Z80 / ZX Spectrum programmer ;)




Thanks! Replying here, since that post is from 2015. One can find anecdotal examples biased either way, so let's talk about a single issue: the IX/IY registers.

The IX/IY registers are heavyweight, but one needs to remember the best practices of that era and architecture. In well-optimized Z80 code, IX/IY are often used for critical "global variables" that stay in registers across many subroutine calls (think "segment registers": base pointers for important tables or buffers that don't live at fixed addresses throughout the whole program). That beats having to repeatedly load those pointers from memory into other registers, which often need to be saved and restored themselves, before every indirect load/store.
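A rough Python cost model of that trade-off, using the standard Z80 byte counts and T-states. The task is assumed for illustration: read the byte at offset 5 of a table whose base address is only known at run time; `table_base` is a hypothetical memory variable, and the register-pressure scenarios (HL free, or HL to be preserved with DE free) are assumptions, not measurements of real code.

```python
# Each entry: (mnemonic, bytes, T-states) from the standard Z80 timing tables.

# Base address kept permanently in IX: a single indexed load.
BASE_IN_IX = [
    ("LD A,(IX+5)", 3, 19),
]

# Base pointer in memory, HL free to clobber, offset 0 only (best case).
BASE_IN_MEMORY_BEST = [
    ("LD HL,(table_base)", 3, 16),  # table_base: hypothetical variable
    ("LD A,(HL)", 1, 7),
]

# Base pointer in memory, offset 5, HL must be preserved (DE assumed free).
BASE_IN_MEMORY_SAVE_HL = [
    ("PUSH HL", 1, 11),
    ("LD HL,(table_base)", 3, 16),
    ("LD DE,5", 3, 10),
    ("ADD HL,DE", 1, 11),
    ("LD A,(HL)", 1, 7),
    ("POP HL", 1, 10),
]


def cost(seq):
    """Return (total bytes, total T-states) for an instruction sequence."""
    return sum(b for _, b, _ in seq), sum(t for _, _, t in seq)


for name, seq in [
    ("IX-resident base", BASE_IN_IX),
    ("pointer in memory, best case", BASE_IN_MEMORY_BEST),
    ("pointer in memory, preserve HL", BASE_IN_MEMORY_SAVE_HL),
]:
    size, tstates = cost(seq)
    print(f"{name:32} {size:2} bytes {tstates:3} T-states")
```

The advantage applies to scattered accesses at fixed offsets; for a sequential walk through a buffer, LD A,(HL) plus INC HL (7 + 6 T-states) is still cheaper than indexed addressing.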

You can also use some relatively low-latency instructions involving IX/IY; in particular, PUSH/POP were often used in optimized buffer-copy routines. You burn every register pair (including IX, IY and the alternate set) to fetch up to 20 bytes of contiguous data with POPs, then patch SP to point at the end of the destination and issue PUSHes in reverse order to store those 20 bytes there. Loop if needed for more than 20 bytes; even with the loop overhead, this is faster than LDIR/LDDR. Games used that trick all the time for block copies such as sprite blitting or double-buffered animation.
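A minimal Python sketch of the POP/PUSH copy, simulating a flat 64KB memory and SP rather than running real Z80 code. It shows why the PUSHes must go in reverse order, and adds a rough per-chunk T-state estimate against LDIR from the standard instruction timings. Assumptions: non-overlapping source and destination, a length that is a multiple of 20 bytes, and loop bookkeeping (counter, pointer updates) left out of the estimate. On real hardware interrupts must also be disabled (DI) for the duration, since an interrupt would push its return address wherever SP currently points, i.e. into the data. The names `Memory64K` and `stack_copy` are hypothetical.

```python
# POP order in this sketch: main set, then the alternate set (reached via
# EXX / EX AF,AF'), then the index registers. 10 pairs = 20 bytes.
REGISTER_PAIRS = ["AF", "BC", "DE", "HL", "AF'", "BC'", "DE'", "HL'", "IX", "IY"]
CHUNK = 2 * len(REGISTER_PAIRS)  # 20 bytes


class Memory64K:
    """Flat 64KB address space with a Z80-style stack pointer (hypothetical)."""

    def __init__(self) -> None:
        self.mem = bytearray(0x10000)
        self.sp = 0

    def pop(self) -> int:
        # POP reads the low byte at SP and the high byte at SP+1, then SP += 2.
        lo = self.mem[self.sp]
        hi = self.mem[(self.sp + 1) & 0xFFFF]
        self.sp = (self.sp + 2) & 0xFFFF
        return (hi << 8) | lo

    def push(self, word: int) -> None:
        # PUSH decrements SP and stores the high byte, then decrements SP
        # again and stores the low byte.
        self.sp = (self.sp - 1) & 0xFFFF
        self.mem[self.sp] = (word >> 8) & 0xFF
        self.sp = (self.sp - 1) & 0xFFFF
        self.mem[self.sp] = word & 0xFF


def stack_copy(m: Memory64K, src: int, dst: int, length: int) -> None:
    """Copy `length` bytes (a multiple of CHUNK) from src to dst via POP/PUSH."""
    if length % CHUNK:
        raise ValueError("this sketch only handles whole 20-byte chunks")
    saved_sp = m.sp  # a real routine saves SP first, e.g. with LD (nn),SP
    for offset in range(0, length, CHUNK):
        m.sp = (src + offset) & 0xFFFF                  # LD SP,src
        words = [m.pop() for _ in REGISTER_PAIRS]       # POP into every pair
        m.sp = (dst + offset + CHUNK) & 0xFFFF          # LD SP,dst_end
        for word in reversed(words):                    # PUSH in inverse order
            m.push(word)
    m.sp = saved_sp


# Rough T-states for the core of one 20-byte chunk, excluding loop bookkeeping
# and the DI/EI around the whole routine.
POP_T = 8 * 10 + 2 * 14    # POP qq = 10, POP IX/IY = 14
PUSH_T = 8 * 11 + 2 * 15   # PUSH qq = 11, PUSH IX/IY = 15
SWAP_T = 4 * 4             # EXX + EX AF,AF' in both phases, 4 T-states each
SP_T = 2 * 10              # two LD SP,nn
STACK_CHUNK_T = POP_T + PUSH_T + SWAP_T + SP_T
LDIR_CHUNK_T = 21 * (CHUNK - 1) + 16  # 21 T per byte, 16 T for the last one

m = Memory64K()
m.mem[0x4000:0x4000 + 40] = bytes(range(40))
stack_copy(m, 0x4000, 0x8000, 40)
print("copied correctly:", m.mem[0x8000:0x8000 + 40] == bytes(range(40)))
print(f"per 20-byte chunk: stack trick {STACK_CHUNK_T} T, LDIR {LDIR_CHUNK_T} T")
```

With these counts the core of one 20-byte chunk costs 262 T-states (about 13 per byte) versus 415 for LDIR, so the loop bookkeeping left out here has to fit within a margin of roughly 150 T-states per chunk.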

I'd be more convinced by realistic benchmarks. And yes, the Fib2 I quoted before is unimpressive even by the standards of CPU microbenchmarks, but maybe someone knows of real-world code that had good ports to both CPUs and could give a better verdict. Unfortunately, games are never good choices: the 8-bit systems had radically different architectures for essential features like video and audio, so "ports" were often full rewrites, even at a high level such as rendering strategy. There might be exceptions, like the AI component of a chess game.


This discussion makes me feel young again! :-)



