I am not familiar with 'mpweiher's opinions on optimizers, but I do want to make an important distinction: there are optimizations like "this abstraction will get inlined" or "the compiler is aware of this idiom" that will happen essentially 100% of the time. C++ pioneered the approach of "zero-cost abstractions" that depend on a smart compiler to get you good code, with Swift and Rust following in its footsteps. This is how you can have a nice thing like an array type whose subscript function doesn't literally turn into a function call, etc.
What happened in this blog post is that the code was relying on a higher-level optimization that compilers aim to hit but necessarily cannot in every case: things like vectorization, loop unrolling, or branch elimination are keyed on heuristics that frequently change. In general this is not a big deal, because the compiler will pick something reasonable, but in the hottest sections of your code this kind of thing can kill performance. When people say they can beat a compiler, it's these kinds of places where you'd do it, and on that point I do agree that compiler optimizations can be limited in their benefit.
IMO (which is heavily biased towards "zero-overhead abstractions") this is still a strong showing for smart compilers. In 99% of cases, the compiler gives you what you want with less code, and then you profile to find the spots where it needs a bit of help.
What I came away wondering is whether it would make sense to just check in the asm with the optimizations you like once you finagle it. The author mentions being concerned about regressions in future compiler versions (which seems reasonable, since it already happened once), and this would solve that problem. The trade-off is that you would not get future optimizations for that architecture, but this seems like a pretty small risk.
I know some projects that use that policy. Here's an example off the top of my head: https://github.com/apple-open-source-mirror/objc4/blob/fd675.... Looks fairly reasonable, with method calls and such, but after inlining and optimization it compiles down to the right handful of instructions that the authors intended it to.
C++ has slowly added some of these for branches (the C++20 [[likely]] and [[unlikely]] attributes); perhaps we will see compiler extensions for loops and such as well. Rust will probably get some of these at some point too.