Oh come one now, don't be silly. Think about vectorization and loop unrolling. I...

chmod775 · on Jan 20, 2021

> Oh come one now, don't be silly.

Don't be snarky.

I mentioned it's theoretically possible and outlined the specific conditions for optimizations to happen.

But I also mentioned that in practice various things often prevent compilers (assuming the language is compiled at all, and not interpreted like that Lua implementation) from applying such optimizations:

- you're programming against an API or are compiling an API (i.e. non-static functions).

- you have time constraints for emitting optimized code, like most JITs: you can't afford the deep analysis that would enable such optimizations, so you're mostly pattern-matching for low-hanging fruit.

Since a picture says more than a thousand words, here's the actual disassembly of that Lua function I linked earlier, as shipped by my Linux distribution: https://i.imgur.com/DnqVC8E.png

The green stuff is the "fast path". For 0-based indexing you would completely drop the first LEA and replace those last MOV/SHL/LEA with a (faster?) MOV r,r/SHL/ADD r,m - which is what GCC is likely to generate.

So that's Lua (one implementation at least) in practice. No compiler magic making arrays fast here.
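For anyone who'd rather read C than disassembly, here's a rough sketch of the two fast paths being compared. This is not the actual Lua source: `Slot`, `Table`, `get_one_based` and `get_zero_based` are made-up names, and the 16-byte slot is just an assumption standing in for a tagged value. The point is only that the 1-based version carries an extra subtraction in both the bounds check and the address computation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte slot standing in for a tagged value. */
typedef struct { uint64_t payload; uint64_t tag; } Slot;

/* Hypothetical table with only an array part. */
typedef struct { Slot *array; size_t size; } Table;

/* 1-based fast path: valid keys are 1..size.
   The unsigned "key - 1" folds the lower and upper bound into one
   compare (key <= 0 wraps around to a huge value), but the -1 still
   has to be applied before the slot address can be formed. */
static Slot *get_one_based(const Table *t, int64_t key) {
    if ((uint64_t)key - 1u < t->size)
        return &t->array[key - 1];
    return NULL; /* a real implementation would fall back to a hash part */
}

/* 0-based fast path: valid keys are 0..size-1.
   No adjustment: the key is compared and scaled as-is. */
static Slot *get_zero_based(const Table *t, int64_t key) {
    if ((uint64_t)key < t->size)
        return &t->array[key];
    return NULL;
}
```

Whether the extra subtraction costs anything measurable depends on the compiler and the CPU; the sketch only shows where it comes from.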

LuaJIT (from my current reading) takes a different route and just pretends the first array element doesn't exist: in memory there is an element '0', but it simply isn't used. This trades some memory for speed.

From lj_tab.c:

   ** The array size is non-inclusive. E.g. asize=128 creates array slots
   ** for 0..127, but not for 128. If you need slots 1..128, pass asize=129
   ** (slot 0 is wasted in this case).
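A minimal sketch of that idea in C, as I understand the comment. This is not LuaJIT's code: `Val`, `Tab`, `tab_init` and `tab_get` are hypothetical names, and the slot layout is an assumption. The array is indexed directly by the key, so a caller who wants keys 1..128 allocates 129 slots and leaves slot 0 unused.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 16-byte slot standing in for a tagged value. */
typedef struct { uint64_t payload; uint64_t tag; } Val;

/* Hypothetical table; asize is non-inclusive, so slots 0..asize-1 exist. */
typedef struct { Val *array; size_t asize; } Tab;

/* Allocate zeroed slots 0..asize-1. Returns 1 on success, 0 on failure. */
static int tab_init(Tab *t, size_t asize) {
    t->array = calloc(asize, sizeof(Val));
    t->asize = t->array ? asize : 0;
    return t->array != NULL;
}

/* The key indexes the array directly, with no "- 1" adjustment.
   With asize=129, keys 1..128 are usable and slot 0 is wasted
   if the program only ever uses 1-based keys. */
static Val *tab_get(const Tab *t, int64_t key) {
    if ((uint64_t)key < t->asize)
        return &t->array[key];
    return NULL; /* a real implementation would fall back to a hash part */
}
```

The cost is one unused slot per array (16 bytes in this sketch), in exchange for the same lookup code a 0-based language would have.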

Personally I quite like this approach. A lot better than hoping compilers will magically fix everything and quite hard to argue with, considering LuaJIT's performance. Though it would be a ludicrous design for a lower-level language.