I know this is a four-day-old comment, but based on your post history I think you are probably the best person to ask to be more specific. So, you start out stating “C has its unique advantages”, an assertion I agree with, but more for ‘vibes’ than because I can articulate the actual advantages (other than average compilation times). If you see this, I would love to hear your list of C’s unique advantages.
Why does having more registers lead to spilling? I would assume, probably incorrectly, that more registers means less spill. Are you talking about calls inside other calls, which cause the outer scope's arguments to be preemptively spilled so the inner scope's data can be pre-placed in registers?
More registers leads to less spilling not more, unless the compiler is making some really bad choices.
Any easy way to see that is that the system with more registers can always use the same register allocation as the one with fewer, ignoring the extra registers, if that's profitable (i.e. it's not forced into using extra caller-saved registers if it doesn't want to).
Yes, the argument for increasing the number of GPRs is precisely to eliminate the register spilling that is necessary in x86-64 programs whenever a program has already used all available architectural registers, spilling that does not happen when the same program is compiled for AArch64 or IBM POWER.
So, let's take a function with 40 live temporaries at a point where it needs to call a helper function of, say, two arguments.
On a 16-register machine with 9 call-clobbered registers and 7 call-invariant ones (one of which is the stack pointer), we put 6 temporaries into call-invariant registers (so there are 6 spills in the prologue of this big function) and another 9 into the call-clobbered registers; 2 of those 9 are the helper function's arguments, but the 7 other temporaries have to be spilled to survive the call. And the remaining 25 temporaries live on the stack in the first place.
If we instead take a machine with 31 registers, 19 call-clobbered and 12 call-invariant (one of which is the stack pointer), we can put 11 temporaries into call-invariant registers (so there are 11 spills in the prologue of this big function) and another 19 into the call-clobbered registers; 2 of those 19 are the helper function's arguments, so 17 other temporaries have to be spilled to survive the call. And the remaining 10 temporaries live on the stack in the first place.
So, to me at least, there seems to be more spilling/reloading whichever way you count: pre-emptive spills or on-demand spills at the call site.
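That arithmetic can be put into a tiny model; this is a sketch using the hypothetical register splits from this example, not any real ABI, and spill_model is a made-up name:

```c
/* Memory traffic for one call site in a function with `live` temporaries:
   spills in the prologue (occupied call-invariant registers), spills around
   the call (call-clobbered temporaries that must survive it), and the
   temporaries that never leave the stack. All parameters are hypothetical. */
typedef struct { int prologue, around_call, on_stack; } traffic;

traffic spill_model(int clobbered, int invariant, int live, int args) {
    int usable_inv = invariant - 1;              /* one register is the stack pointer */
    int prologue   = live < usable_inv ? live : usable_inv;
    int remaining  = live - prologue;
    int in_clob    = remaining < clobbered ? remaining : clobbered;
    int around     = in_clob > args ? in_clob - args : 0;
    return (traffic){ prologue, around, remaining - in_clob };
}
```

Plugging in the two machines gives 6/7/25 for the 16-register case and 11/17/10 for the 31-register case, matching the counts above.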
You’re missing the fact that the compiler isn’t forced to fill every register in the first place. If it was less efficient to use more registers, the compiler simply wouldn’t use more registers.
The actual counterproof here would be that in either case, the temporaries have to end up on the stack at some point anyway, so you’d need to look at the total number of loads/stores in the proximity of the call site in general.
> You’re missing the fact that the compiler isn’t forced to fill every register in the first place.
Temporaries start their lives in registers (on RISCs, at least). So if you have 40 alive values, you can use the same one register to calculate them all and immediately save all 40 of them on the stack, or e.g. keep 15 of them in 15 registers, and use the 16th register to compute 25 other values and save those on the stack. But if you keep them in the call-invariant registers, those registers need to be saved at the function's prologue, and the call-clobbered registers need to be saved and restored around inner call sites. That's why academia has been playing with register windows, to get around this manual shuffling.
> The actual counterproof here would be that in either case, the temporaries have to end up on the stack at some point anyway, so you’d need to look at the total number of loads/stores in the proximity of the call site in general.
Would you be willing to work through that proof? There may very well be less total memory traffic for a machine with 31 registers than for one with 16; but it seems to me that there should be some sort of local optimum in the number of registers (and their clobbered/invariant split) for minimizing stack traffic: four registers is way too few, but 192 (there have been CPUs like that!) is way too many.
32 registers have been used in many CPUs from the mid-seventies until today, i.e. for half a century.
During all this time there has been a consensus that 32 registers are better than fewer.
A few CPUs, e.g. SPARC and Itanium, have tried to use more registers than that, and they have been considered unsuccessful.
There have been some inconclusive debates about whether 64 architectural registers might be better than 32, but the fact that 32 are better than 16 has been pretty much undisputed. So there are good chances that 32 is close to the optimal number of GPRs.
Using 32 registers is something new only for Intel-AMD, due to their legacy, but it is something that all their competition has used successfully for many decades.
I have written many assembly language programs for ARM, POWER or x86. Whenever I had 32 registers, it was much easier for me to avoid most memory accesses, whether for spilling or for other purposes. It is true that a compiler is dumber than a human programmer and it frequently has to adhere to a rigid ABI, but even so, I expect that on average even a compiler will succeed in reducing register spilling when using 32 registers.
The game is deeper than that. Your model is probably about right for the compiler you're using. It shouldn't be - compilers can do better - but it's all a work in progress.
Small scale stuff is you don't usually spill around every call site. One of the calls is the special "return" branch, the other N can probably share some of the register shuffling overhead if you're careful with allocation.
The bigger point is that the calling convention is not a constant. Leaf functions can get special-cased, but so can non-leaf ones. Change which arguments go in fixed registers vs. on the stack; change which registers are callee/caller saved. The entry point for calls from outside the current module needs to match the platform ABI you claimed it'll follow, but nothing else does.
The inlining theme hints at this. Basic blocks _are_ functions that are likely to have a short list of known call sites, each of which can have the calling convention chosen by the backend, which is what the live in/out of blocks is about. It's not inlining that makes any difference to regalloc, it's being more willing to change the calling convention on each function once you've named it "basic block".
Why is almost no one in this comment thread willing to face the scenario where the function call has to actually happen, and be an actual function call? The reactions are either "no-no-no-no, the call will be inlined, don't you worry your pretty head" or "well, then the compiler will just use fewer registers to make fewer spills", which precisely agrees with my point that having more registers ain't necessarily all that useful.
> Small scale stuff is you don't usually spill around every call site.
Well duh: it's small, so even just 8 registers is likely enough for it. So again, why bother with cumbersome schemes to extend to 32 registers?
And this problem actually exists: that's why SPARC tried register windows, and even crazier schemes on the software side have been proposed, e.g. [0] (seriously, read it). And it's 30 years old, and IIUC nothing much came out of it, so excuse me if I'm somewhat skeptical about "compilers can do better - but it's all a work in progress" claims. Perhaps they already do the best they can for general-purpose CPUs. Good thing we have other kinds of processing units readily available nowadays.
This argument doesn’t make sense to me. Generally speaking, having more registers does not result in more spilling; it results in less spilling. Obviously, if you have 100 registers here, there’s no spilling at all. And think through what happens in your example on a 4-register or a 1-register machine: all values must spill. You can demonstrate the general principle yourself by limiting the number of registers and then increasing it using GCC's -ffixed-<reg> flags. In CUDA you can cap your register count (e.g. with nvcc's -maxrregcount) and basically watch the number of spills go up by one every time you take away a register and down by one every time you add one.
> Obviously, if you have 100 registers here, there’s no spilling at all.
No, you still need to save/spill all the registers that you use: the call-invariant ones need to be saved in the function's prologue, the call-clobbered ones at an inner call site. Only if your function is a leaf can you get away with using just call-clobbered registers and not preserving them.
Okay, I see what you’re saying. I was assuming the compiler or programmer knows the call graph, and you’re assuming it’s a function call in the middle of a potentially large call stack with no knowledge of its surroundings. Your assumption is for sure safer and more common for a compiler compiling a function that’s not a leaf and not inlined.
So I can see why it might seem at first glance like having more registers would mean more spilling for a single function. But if your requirement is that you must save/spill all registers used, then isn’t the amount of spilling purely dependent on the function’s number of simultaneous live variables, and not on the number of hardware registers at all? If your machine has fewer general purpose registers than live state footprint in your function, then the amount of function-internal spill and/or remat must go up. You have to spill your own live state in order to compute other necessary live state during the course of the function. More hardware registers means less function-internal spill, but I think under your function call assumptions, the amount of spill has to be constant.
For sure this topic makes it clear why inlining is so important and heavily used, and once you start talking about inlining, having more registers available definitely reduces spill, and this happens often in practice, right? Leaf calls, inlined call stacks, and specialization are all things that more regs help with, so I would expect perf to get better with more registers.
> assuming it’s a function call in the middle of a potentially large call stack with no knowledge of its surroundings.
Most of the decision logic/business logic lives exactly in functions like this, so while I wouldn't claim that 90% of all of the code is like that... it's probably at least 50% or so.
> then isn’t the amount of spilling purely dependent on the function’s number of simultaneous live variables
Yes, and this ties precisely back to my argument: whether or not a larger number of GPRs "helps" depends on what kind of code is usually being executed. And most code, empirically, doesn't have all that many scalar variables live simultaneously. And the code that does benefit from more registers (huge unrolled/interleaved computational loops with no function calls, or with calls only to intrinsics or inlinable thin wrappers of intrinsics) would benefit even more from using SIMD or, even better, being off-loaded to a GPU or the like.
I actually once designed a 256-register fantasy CPU but after playing with it for a while I realised that about 200 of its registers go completely unused, and that's with globals liberally pinned to registers. Which, I guess, explains why Knuth used some idiosyncratic windowing system for his MMIX.
It took me a minute, but yes, I completely agree that whether more GPRs help depends on the code and compiler, and that there's plenty of code you can't inline. Re: MMIX, yes! Theoretically it would help if the hardware could dynamically alias registers and automatically handle spilling when the register file is full. I have heard such a thing physically exists and has been tried, but I don't know which chip/arch it is (maybe AMD?) nor how well it works. I would bet that it can't be super efficient with registers, and maybe the complexity doesn't pay off in practice because it undermines inlining.
I recalled there were some new instructions added that greatly help with this. Unfortunately I'm not finding any good _webpages_ that describe the operation generally to give me a good overview/refresher. Everything seems to either directly quote published PDF documents or otherwise not present the information in a form that's effective for end use, e.g. https://www.felixcloutier.com/x86/ -- However, availability is problematic for even slightly older silicon: https://en.wikipedia.org/wiki/X86-64
Eh, pretty much nobody uses them (outside of OS kernels?); and mind you, RISC-V with its 32 registers has nothing similar to those, which is why 14-instruction-long prologues (adjust sp, save ra and s0 through s11) and epilogues are not that uncommon there.
You can't operate on stack variables without loading them into the registers first, not on RISCs anyway. My main point is that this memory-shuffling traffic is unavoidable in non-leaf functions, so an extremely large amount of available registers doesn't really help them.
No, I don't. I use a common "spill definitely reused call-invariant registers at the prologue, spill call-clobbered registers that need to survive a call at precisely the call site" approach, see the sibling comment for the arithmetic.
Not Tomo’s developer, but my position on the 1 vs. 0 for list-like object indexing goes like…
1) using 0 as the index of the first element in a list-like object is a holdover from C (most of the earlier languages used either 1-based or flexible-base indexing);
2) in C, 0 is the index due to the manner of C’s array indexing implementation;
3) if holding onto the C semantics (or syntax in some respects) is not an explicit goal of the language, then flexible indexing should be the default (declared at creation point for the list-like object);
4) if a flexible default is not appealing to the language designer, and again, maintaining C semantics is not a goal, then 1-based should be the next reasonable default.
For me (and, I suspect, most other native English speakers, if not most people in general), when counting things, the first item counted is object 1. Therefore, 1 should be the index of the first element of a list-like object.
I’m not sure how I feel about 0 being ‘None’, but I might find it intuitive after thinking more about it.
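The C mechanics behind point 2 can be shown directly; element_at is just an illustrative helper:

```c
#include <assert.h>

/* In C, a[i] is defined as *(a + i): the index is an offset from the
   array's start, so the first element naturally lives at index 0.
   (As a consequence, i[a] is also legal and means the same thing.) */
int element_at(const int *a, int i) {
    assert(a[i] == *(a + i));   /* identical by definition */
    assert(a[i] == i[a]);       /* the same sugar, commuted */
    return a[i];
}
```

Because the index is an offset rather than an ordinal, 0-based indexing falls out of the pointer-arithmetic implementation rather than being a deliberate ergonomic choice.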
Is Pony still an actively developed language? I remember watching several talks while they brought the language up to release, and read several of the accompanying papers. However, I thought with the primary corporate sponsor dropping the language it had gone basically EOL. Which was a pretty large bummer as I was very interested to see how the reference capability model of permissions and control worked at large scale for concurrency control and management (as well as its potential application to other domains).
Dunno, now it feels like the "hot" thing is either manual-memory languages like Zig, Odin and Rust, or languages with novel type systems like Lean, Koka, Idris, etc. GC'd "systems" languages like Nim, Crystal, Pony, Go, etc. all seem kind of old-fashioned now.
Go seems to have some enduring affection and popularity for new projects and companies. I recently felt like a lot of the recent shift was less about GC and more about runtime characteristics (static binaries, lean resource consumption, lack of an in-your-face virtual machine).
It never felt like Nim, Pony, or Crystal were ever that popular that a diminished hype cycle registered as something thematic to me (not that I really intend to disagree with your perspective here).
I was under the impression that parallel and concurrent code was the dominant paradigm for programming tasks currently going on in most of the semi-mainstream domains. I am certainly willing to concede that I could just be in a bubble that thinks about and designs for concurrency and parallelism as a first-class concern, but it doesn’t seem that way.
I mean one of the large features/touted benefits for Rust is the single mutable XOR multiple immutable semantics explicitly to assist with problems in parallel/concurrent code, all of the OTP languages are built on top of a ridiculously parallel and distributed first ‘VM’. It strikes me as peculiar that these types of languages and ecosystems would be so, apparently, popular if the primary use case of ‘safe’/resilient parallel/concurrent code was not a large concern.
TL;DR — it seems to me that it is less anger from devs at being confused over a Case construct and more an attempt to preemptively soothe any ruffled feathers for devs wanting a traditional Switch.
I think your comment was probably rhetorical, but does address/raise a fairly common issue in designing programming languages. My position on this is that it is less like "WHAT THE ___ is a ------- MATCH STATEMENT?!!! THIS IS SO $%^&@*#%& CONFUSING!! I DON'T KNOW THAT WORD!! I ONLY KNOW SWITCH!!" and instead more like the following (from the language designers POV):
Okay, we want a Case construct in the language, but programmers coming from or preferring imperative syntax and semantics may not like the Case concept. But they often like Switch, or at least are familiar with it appearing in code, sooooooo: first, we will alter the syntax of the traditional Switch to allow a more comfortable transition to using this functional-inspired construct; then second, we wholesale replace the semantics of that Switch with the semantics of a Case. This is underpinned by the assumption that the syntax change is small enough that devs won’t recoil from the new construct, and that the larger divergence of semantics will hopefully not produce issues because it is just a small semantic change coated in a familiar syntax.
Interestingly, the author of TFA seems to be operating under the assumption that the Case construct is an unqualified positive change and sneaking the corresponding semantics into that unfortunate imperative code is a wholly positive goal for the language design.
Without taking a position on the above positivity, I think the maneuvers language designers make while designing syntax and semantics (as exhibited in Swift’s Switch syntax for a Case expression) are motivated by divergent, and oftentimes strange, priorities and prior assumptions. So, from the 10,000’ view, is enshrining these priorities and assumptions, and others like them, as a hard-coded facet of the language the right path for languages generally? Should a language seek to be an overall more general framework for programming, leaving the vast majority of the syntax and higher-level semantics to be chosen and instantiated by devs where fit-for-purpose and pros/cons direct their inclusion? Or is the goal of opinionated languages, with or without accompanying sugar to help smooth over differences from other languages, the better path? Is there a ‘happy’ medium where:
1) design goals and forward thinking or experimental syntax/semantics get put in the language as an ‘it’s for your own good’ method for advancing the field as a whole and advancing/optimizing a single dev’s programs in particular;
2) the default position of a language should be as generalized as possible, but with abilities and options for users to specify what advanced, uncommon, or divergent syntax/semantics are utilized in a given program?
We're talking about fallthrough happening by default or not by default. You could call it a "map" construct or a "choose" statement for all I care.
Whether or not you have to write the "case" keyword 10 times is an aesthetic choice.
I don't think this has anything to do with program optimization. On all non-theoretical ISAs I'm aware of, you don't need a JUMP instruction to go to the next instruction. We're debating names.
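For concreteness, the default fallthrough being debated, as C does it; score_at_least is a made-up illustration:

```c
/* Classic C switch: without a `break`, control falls through into the
   next case, so each matched case below accumulates into the ones after it. */
int score_at_least(int n) {
    int score = 0;
    switch (n) {
    case 3: score += 100;  /* falls through */
    case 2: score += 10;   /* falls through */
    case 1: score += 1;    break;
    default: break;
    }
    return score;
}
```

Swift flips the default: each case breaks implicitly, and you opt into this behavior with an explicit fallthrough keyword, which is the semantic (not naming) difference at stake.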
I'm a Ziguana so my answer to the programming philosophy questions would be that we need a language where the complexity emerges in the code, not in the language itself, and we generally want a shared language that can be read and used by anyone, anywhere. If everyone has their own subset of the language (like C++) then it's not really just one language in practice. If every project contains its own domain specific language, it may be harder for others to read because they have to learn custom languages. That's not to say you should never roll your own domain specific language, or that you should never write a program that generates textual source code, but the vast, vast majority of use cases shouldn't require that.
And, yes, be opinionated. I'm fine with some syntactic sugar that makes common or difficult things have shortcuts to make them easier, but again, if I learned a language, I should generally be able to go read someone's code in that language.
What do you consider "advancing the field as a whole"?
Are you aware of whether Chenoweth ever discussed the presence, implied or actual, of more extreme resistance groups/factions operating in the same locations and time periods? I’ve seen some informal work discussing the ‘pressure’ on the incumbent power being supported and made more tenable in comparison to the potential for a more radical approach. I haven’t seen anything widely popularized discussing this outside of ‘How to Blow Up a Pipeline’, which does have some good references and particular examples.
Violent action only incentivizes the selectorate not to defect. This is something Kuran pointed out decades ago, as did Chenoweth.
The reality is that the only way to effect change is to incentivize elite defection, and that requires organized nonviolent action along with exogenous variables.
I can’t help but be a little depressed by this realization. But to take it a step further: while I think there are some people who are genuinely buying this propaganda, I expect that a chunk of the propaganda-aligned side also doesn’t think there is any point correcting the misleading statements. They benefit from the overall control of their ‘side’ and so just go right along, sliding toward the fanatical fringe of their side. On the other ‘side’, many people seem to have decided there is no use attempting to counter-message after seeing the failure to move any extremists from their positions (and a failure to get even a milquetoast correction from the non-fanatics who are aligned). I think the end result of this pattern is a gradually accelerating move toward the far ends, leaving no one to have any reasonable discourse in the center.
I’m not saying I support the center positions, nor that I don’t support what is often called an extreme position, just that this seems to be a watershed moment globally.
Polarization leaves very little room for reasonable discourse at the poles too. Pure tribalism doesn't care about reason unless that reason is in service of the identity and ideology of the tribe.
What if political discourse was focused on policy not identity and couched in terms of mutual interest instead of party affiliation? There would still be tensions, trade-offs, conflicts and political strategy at play but the discourse would be infinitely more reasonable.
I think this is what we mean when we talk about "center positions": a "value-based realism" that recognizes that society is nothing but the mutual alignment of values and interests. I don't understand why "common sense" has become so unpopular.
> I don't understand why "common sense" has become so unpopular.
IMHO, that's exactly it. You named it. Common sense is actually missing from more and more people. Why that is, I don't know - lack of basic common-sense education, family, primary school, too much Facebook, TikTok, common sense defined by YT shorts?
It's going to get far worse once the AI generation grows up.
I’m genuinely curious: why is there a transformative requirement for something to be art? I think transformative works can certainly be art, but that’s just a possible characteristic of art. Where does this requirement come from? As in, is it defined somewhere academically, or is this a personal position?
Wikipedia defines it as: “Art is a diverse range of cultural activity centered around works utilizing creative or imaginative talents, which are expected to evoke a worthwhile experience, generally through an expression of emotional power, conceptual ideas, technical proficiency, or beauty.”
I like the "evoke a worthwhile experience" idea.
Transformation is a bit ambiguous imo. In a certain sense, every experience is at least a little transformative.
I know it wasn’t the whole point of your comment, but I fervently hope the legitimacy of art (of any kind and in any medium) is not conferred by the ‘market’. Plays or shows that end their runs having been seen by under 100 people should still be art (and any recording of them should be as well); music made for a very niche audience, games played by tens of people, all of those can be art. A painting made by one person to give to another can be art.
I would prefer to look to the democratization of art as the means and ability for individuals to produce substantial, if small, works at a pace, for an audience, for some reward determined solely by the creator.
At the end of the day, ‘what is art?’ and ‘are video games art?’ are dated sentiments, so I agree; I was just repulsed by the suggestion that the definition/legitimacy of something as art can/should be dictated by ‘The Market’.
Market was maybe a bad term. I mean more “society at large” and not specifically stuff that makes money.
I am more saying that the idea of caring about “being labeled as art” is not that important anymore. Largely because anyone can make and publish anything nowadays. So a play with 100 viewers is still art, yes, but no one really cares about getting that label.
Thanks for the response. I do like the, largely uncontested, move toward disregarding of the label. It certainly seems to dovetail with a more individualized conception of artistic pursuit that appeals to me.