Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
How an Optimizing Compiler Works (lihaoyi.com)
164 points by lihaoyi on Nov 11, 2019 | hide | past | favorite | 22 comments


That is one thing I sorely miss in languages that aren't lisps. The first optimizer pass is source->source. In guile I do

    ,opt expression
And it prints the source code after a first basic pass of the expander and optimizer. It does constant folding, inlining, dead code elimination and partial evaluation (and some other things). Being able to inspect what guile does helps a lot with macro writing, and is just a generally handy tool.

It beats reading assembly (which is also available) any day of the week.


Scala has well documented compiler passes which act on the full AST, so they're also source -> source. You can print the AST after any pass, using a command line flag. It's not as easy as a Lisp, but useful when you're working on a compiler plugin or extension.


The thing with guile is that it is already an AST, so the code ,opt outputs is still valid guile.


How does it do source-to-source while maintaining debug-ability? If the result of your partial evaluation is source code that is run then what happens when I set a breakpoint on a line that has been partially evaluated away? Is there some metadata that is produced alongside the source to allow that breakpoint to be applied to the code that has been optimized away and how does that work?

This is why optimisations aren't usually source-to-source - they need to include in the output extra information that isn't normally representable in source code. Another reason is that compiling to a lower-level representation gives you more power - when I partially evaluate I gain extra information such as that an add operator will not overflow, and I want to include that valuable information in my output even if there is no no-overflow add operator in the source language.


This is not done over regular lisp lists, but over scheme syntax objects that retain the original source info.

Those syntax objects are also the basis of the hygienic macro systems in many schemes (at least ones using syntax-case) so that macros also benefit from that information.

The lisp source representation is already an AST, so these kinds of transformations are trivial.


Just having support for a textual representation of the intermediate languages that is easily human readable would probably be good enough. It would let you inspect the intermediate steps. The source info that's tracked could be omitted in the printed representation, even though it still exists in the in-memory representation.


This is good, but I'm surprised at the assertion that "We can see that the function is pure, and evaluate the function up-front to find ackermann(2, 2) must be equal to 7". I don't know of real-world compilers that would do this[1], since even pure functions can take arbitrary amounts of time to evaluate at compile time. The Ackermann function in particular is famous for its complexity -- it was constructed for complexity.

In the particular case of ackermann(2, 2) you should be able to do it with a moderate number of inlinings, but I don't think compilers often bother to inline more than one level of a recursive function.

[1] Of course you can force compile-time evaluation of code in some languages, like with Lisp macros or C++ templates. But that's not what's happening here.


Have I not drank my coffee today or does the 'multiplied' value in the strawman program never change?

It starts at 0 and keeps multiplying itself with integers, which should always result in 0.

EDIT: My bad, it is explained after the code snippet, but not immediately after so I didn't catch it at first.


From the article:

> multiplied does not: it starts off 0, and every time it is multiplied via multiplied = multiplied * count it remains 0.

The article then goes on to use that fact for some optimizations.


Yeah haha, caught it just after I wrote my comment. IMO, I expected the author to talk about that right after the snippet, as it was the first thing that stuck out.


Slightly off-topic, but does Java have standalone (without a class) functions now? He said he was using Java but I didn't recognize some of this syntax. I know it's been changing a lot the past few years (and it's been a few years since I've used it).


Sort of yes but not really. Java has a concept now of a Functional Interface, an interface that has only one abstract method. Classes that implement a functional interface can be treated in a lot of ways like functions, and there is syntactic sugar that amounts to inferring that an object can be treated as a function by calling its only method if it is known in the scope as an interface type with only one method. There's also syntactic sugar for a concept called "lambdas", by which one can inline declare anonymous classes with only one method by just defining the method.


Nope. I think he simply used that notation to keep things simple/concise and focus on the important pieces.


That's what I was wondering. Odd decision in a post that works through code line by line and then translates it straight to bytecode.


I was intrigued by the claim that a compiler which optimises Java code would identify pure functions and treat them differently.

Are there any JVM engineers that could confirm or deny this happening (in the real world)?


The site goes into an endless reload loop when attempting to connect using HTTPS...


Are you using HTTPS Everywhere or some other similar browser plugin?

When I used HTTPS, the page loaded fine, but then reloaded itself using HTTP, but didn't loop after that.

OP, you may want to look into figuring out why your page reloads using HTTP after loading just fine with HTTPS.


It's probably due to this script tag:

  <script>if (window.location.protocol == "https:")
    window.location.href = "http:" + window.location.href.substring(window.location.protocol.length);
  </script>


But why? Genuinely curious, is there a real reason why this might be done?


Honestly I do not remember at all why I put in that script tag. It was something to do with github-pages not supporting HTTPS properly in the past, the details escape me. I should probably remove it


May be he hates https :)


I'm flabbergasted as to why someone would ever think they need to do this.

Especially since the page loads just fine. It's not like it's relying on accessing some 3rd party resource that's only accessible via HTTP.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: