I love this. My first (toy) compiler was bootstrapped a similar way, though much more ad-hoc (I didn't write a formal grammar until I was well into it). Back then a lot of compilers were written in assembler, so it wasn't such an unusual starting point - I assume I was drawing inspiration from e.g. PDQ (a public domain Pascal implementation for the Amiga) and others that I think I would have seen first.
I started basically by having the compiler do very basic parsing of M68k assembler and just pass everything that looked like assembler straight through.
Then I added support for defining functions, but all the content of functions was still assembler. Then I started adding expression support and things like local variables and types.
I wish I still had the source (lost in a move, long after I stopped doing anything on it) - one of the fun things about it was that I kept the ability to intersperse assembler instructions everywhere - they were treated as normal statements. And the M68k registers were first class variables in the language that could occur in any expression. So e.g. you could write "D0.w = D1.w + a", where D0.w would refer to the low 16 bits of the D0 register, and D1.w the low 16 bits of the D1 register, and "a" would be a variable allocated on the stack.
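To make the "D0.w = D1.w + a" example concrete, here is a rough Python sketch of how such a statement could be lowered to M68k. It is not the original compiler (that source is lost); `compile_add`, `operand`, the restriction to data registers, and the assumption that locals live at offsets from A6 are all made up for illustration.

```python
import re

# Hypothetical syntax: a data register followed by a size suffix, e.g. "D0.w".
REG = re.compile(r"^(D[0-7])\.([bwl])$", re.IGNORECASE)

def operand(tok, local_offsets):
    """Return (assembler operand, size suffix or None) for one token."""
    m = REG.match(tok)
    if m:
        return m.group(1).lower(), m.group(2).lower()
    if tok in local_offsets:
        # Assumption: locals are addressed relative to the frame pointer A6.
        return f"{local_offsets[tok]}(a6)", None
    raise ValueError(f"unknown operand: {tok}")

def compile_add(stmt, local_offsets):
    """Compile 'dst = lhs + rhs' where dst is a data register."""
    dst_tok, expr = [s.strip() for s in stmt.split("=")]
    lhs_tok, rhs_tok = [s.strip() for s in expr.split("+")]
    dst, size = operand(dst_tok, local_offsets)
    if size is None:
        raise ValueError("this sketch only supports a register destination")
    lhs, _ = operand(lhs_tok, local_offsets)
    rhs, _ = operand(rhs_tok, local_offsets)
    if rhs == dst and lhs != dst:
        # Addition commutes, so swap to avoid overwriting rhs with the move.
        lhs, rhs = rhs, lhs
    out = []
    if lhs != dst:
        out.append(f"move.{size} {lhs},{dst}")
    out.append(f"add.{size} {rhs},{dst}")
    return out

print(compile_add("D0.w = D1.w + a", {"a": -2}))
```

The point is just that a named register is an operand like any other, so the expression code needs no special case beyond recognising the name.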
Basically the features I added were largely guided by what seemed like it'd let me shave more lines off the compiler itself...
You learn a lot about the language you're writing when you bootstrap from asm or try to condense everything into a tiny core - a lot of dependencies in the language that are non-obvious become a lot clearer.
> one of the fun things about it was that I kept the ability to intersperse assembler instructions everywhere - they were treated as normal statements. And the M68k registers were first class variables in the language that could occur in any expression.
That sounds like so much horror and so much fun at the same time.
It was awesome for slowly migrating the compiler itself from assembler, and also for things like interfacing with the OS - I could write all the glue code inline.
But yes, it was easy to shoot your foot off with it. The main saving grace was that, compared to i386, the M68k architecture has plenty of general purpose registers: 8 data registers and 8 address registers (including the stack pointer). That made it reasonably easy to avoid clobbering registers: I had some strict rules about which registers were used for what, combined with a very simple extra pass in the register allocator that'd mark any registers mentioned by name in a function as off limits.
It actually let me defer adding "real" local variables for quite some time since I could simply use the registers.
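The "off limits" pass can be sketched in a few lines of Python. This is only a guess at the shape of it, not the original code: `reserved_registers` and `allocatable` are hypothetical names, and the pass here just scans the function's source text for register names.

```python
import re

DATA_REGS = [f"d{i}" for i in range(8)]
# a7 is the stack pointer, so it is never handed out by the allocator.
ADDR_REGS = [f"a{i}" for i in range(7)]

# Matches d0-d7 / a0-a7 as whole words, in either case.
REG_NAME = re.compile(r"\b([da][0-7])\b", re.IGNORECASE)

def reserved_registers(function_source):
    """Registers the programmer mentioned by name anywhere in the function."""
    return {name.lower() for name in REG_NAME.findall(function_source)}

def allocatable(function_source, pool=DATA_REGS + ADDR_REGS):
    """Registers the allocator may still use for this function."""
    reserved = reserved_registers(function_source)
    return [r for r in pool if r not in reserved]

print(allocatable("D0.w = D1.w + a"))
```

A scan like this is conservative: it reserves a register for the whole function even if it is only named once, which fits the "very simple" description.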
BBC Micro Basic let you interpose assembler in the middle of the program - although it was a two-pass assembler and you had to call both passes separately with a small FOR loop if you wanted labels to work.
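For anyone wondering why two passes are needed for labels: a forward reference can't be resolved until the label's address is known. Here is a toy Python sketch of the general idea (not BBC BASIC's actual assembler; the `assemble` function, the fixed instruction size, and the one-argument instruction format are all invented for illustration).

```python
def assemble(lines, origin=0, size=2):
    """Toy two-pass assembler: every instruction is assumed to take `size` bytes."""
    labels = {}
    output = []
    for final_pass in (False, True):
        pc = origin
        output = []
        for line in lines:
            text = line.strip()
            if not text:
                continue
            if text.endswith(":"):
                # Record the label's address; pass 1 exists to collect these.
                labels[text[:-1]] = pc
                continue
            op, _, arg = text.partition(" ")
            if not arg:
                target = None
            elif final_pass:
                # Pass 2: every label must be known by now.
                target = labels[arg]
            else:
                # Pass 1: forward references are allowed to be unresolved.
                target = labels.get(arg)
            output.append((pc, op, target))
            pc += size
    return output

print(assemble(["jmp end", "nop", "end:", "rts"]))
```

The loop over `(False, True)` plays the role of the FOR loop: the same source is run twice, and only the second run's output is kept.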