
The "erroneous behavior" redefinition for reads of uninitialized variables is really interesting: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p27...

It does have a runtime cost. There's an attribute to force undefined behavior on read again and avoid the cost:

    int x [[indeterminate]];
    std::cin >> x;


D initializes all variables. If you don't provide an initializer, the compiler inserts the default initializer for it.

But if you really, really want to leave it uninitialized, write:

    int x = void;
making it explicit, so you're not leaving it uninitialized by accident.


> If you don't provide an initializer, the compiler inserts the default initializer for it.

This requires that there is a default. Several modern languages (such as Go) insist on this, which means your types no longer model reality in a very fundamental way. Who is a person's default spouse? Even where you can imagine a default, it's sometimes undesirable to have one: we already live in a society where somebody decided the default gender is male. A default can also look too much like real data; a default birthday of 1 January matches hundreds of thousands of actual Americans...

The most likely place you go after "everything has a default" is the billion-dollar mistake, because you're inclined to just incorporate "or it's the invalid default" into the type definition to get your default. Once you do that, "check it's valid" code needs to be added everywhere, even if we already checked a moment ago.


I don’t care that much about everything having a default (although it’s nice), but if a language insists on a default value for every type for safety, can’t you just use std::optional?


I can't tell whether you imagine std::optional is a value (it is not), or whether you know it's a templated type but think it would somehow be OK to redefine all programs so that every type T becomes std::optional&lt;T&gt; just to simplify initialization.

Either way no, that can't work.


> Either way no, that can't work.

Kotlin has explicit nullable types. Rust has no null, but has option types. Both languages work fine.

I think your point was that neither approach could reasonably be retrofitted to C++, do I have that right?


Rust works fine because types are not required to have a default; if you want your type to have one, you implement Rust's Default trait. Anything that only makes sense when a default exists simply depends on that trait and is otherwise unavailable: you can't core::mem::take your custom Goose type that has no default, because take is declared as core::mem::take&lt;T: Default&gt;. In Rust, if you declare a variable of type Goose and don't initialize it, it's not initialized, and if the compiler can't see that it's initialized before it's used, the program is rejected as nonsense, because Rust is a safe language and that's an unsafe outcome.

I don't write Kotlin, so I can't speak to the details there.

C++, like Rust, does not require that types have a default. In C++ the way you provide a "default" is usually a zero-argument constructor, which the compiler can call wherever you ask for an instance of that type; there's no requirement to write such a constructor, or indeed to provide any public constructor at all. So yes, "just use the default" could not work in C++ as it exists today.

The other reason C++ can't do anything like this is that it would make newer C++ behave differently despite no syntactic change. Rust is OK with that because it has the Edition system to differentiate Rust 2015 code, which means one thing, from Rust 2024 code, which means something else despite having the same text. C++ has nothing like that: it's not rare for somebody's C++17 code to get compiled as C++23, and people expect that to work (it doesn't always, but that's what they expect).


I often see arguments like yours. I reject them wholeheartedly. Your argument is pro-poor-design. I tell you: design your software better. Design your software so that you can't have undefined behavior. It's harder, yes. LLMs suck at it, yes. But building well-designed software is a significant part of being a better engineer.


It is easier to design the software so that you don't have confusing behavior when you're not required to include behaviors you don't want. Most things do not need to be nullable. Requiring all things to have a zero value, even when they do not have one, makes it harder to be correct by construction, not easier.


> It is easier to design the software so that you don't have confusing behavior when you're not required to include behaviors you don't want.

It can be easier, but not always. But it is almost always a better design.

Redesigning software so that whole classes of problems simply can't exist is absolutely better than software that needs to handle all kinds of problems. Many of those problems might not even ever happen in real circumstances!

> Requiring all things to have a zero value, even when they do not have one, makes it harder to be correct by construction, not easier.

Don't require a "zero value". Require a "correctly-constructed" value. Sometimes zero is correctly constructed, sometimes not.


I agree that you want to require a correctly-constructed value. The entire point I'm making is that in languages with zero values, this is hard, because that zero value may not be valid for your domain.

Languages with pervasive nullability effectively give everything a zero value.


> Requiring all things to have a zero value

D default initializes floats to NaN.


That is a way better syntax. I wonder why C++ didn't adopt it.


Because you can't adopt that syntax after the fact. There are 30 years of C++ in the real world; initializing everything by default unless you opt out will break some performance-critical code that should not initialize everything (until it is updated manually, and it has to be manual because tools are not smart enough to know where something was intentionally left uninitialized 100% of the time).

Thus the current "erroneous behavior" approach. It means this isn't undefined behavior anymore (compilers used to optimize out code paths where an uninitialized value is read, and this did cause real-world bugs even when it didn't matter what value was read). It also means the compiler is free to put whatever value it wants there; one of the goals was that the various sanitizers that check for use of uninitialized values still need to work, since the vast majority of the time a read of an uninitialized value is a bug in the code.

There are a lot of situations where a compiler cannot tell whether a variable would be used uninitialized, so we can't rely on compiler warnings (in general this would require solving the halting problem).


> There are a lot of situations where a compiler cannot tell if a variable would be used uninitialized, so we can't rely on compiler warnings (it sometimes needs solving the halting problem).

It's an explicit choice in C++ to always accept correct programs (the alternative being to always reject incorrect programs†). The committee does not have to stick by this bad decision in each C++ version; of course they aren't likely to stop making the same bad choice, but it is possible to do so.

If you're allowed to take the other side, you can of course (Rust and several other languages do this) reject programs where the compiler isn't satisfied that you definitely always initialize the variable before its value is needed. Most obviously (but it's pretty annoying, so Rust does not do this) you could insist on the initialization being part of the variable definition in the actual syntax.

† You can't have both, by Rice's Theorem; Henry Rice got his PhD for figuring out how to prove this last century, long before C++ was conceived. So you must pick one or the other.


> Because you can't adopt that syntax after the fact.

The `= void` syntax can be adopted after the fact because it is currently not valid C++.

D (unlike C++) always has a default initializer, but does not allow a default constructor. This is sometimes controversial, but it heads off all kinds of problems.

The default initializer for floating point values is NaN. (And for chars it is 0xFF.) The point of this is for the value to not "happen" to work.


> there is 30 years of C++ in the real world, initializing everything by default unless you opt-in will break some performance critical code that should not initialize everything

...But the change to EB in this case does initialize everything by default?


No it doesn't. It says the value is unspecified but it exists. Sometimes some compilers did initialize everything before (this was common in debug builds). Some of them will in the future, but most won't do anything different.

The only difference is that some optimizers used to eliminate code paths where they could prove the path would read an uninitialized variable, causing a lot of weird bugs in the real world.


> It says the value is unspecified but it exists.

The precise value is not specified, but whatever value is picked also has to be something that isn't tied to the state of the program so some kind of initialization needs to take place.

Furthermore, the proposal explicitly states that (some) variables are initialized by default:

> Default-initialization of an automatic-storage object initializes the object with a fixed value defined by the implementation

> The automatic storage for an automatic variable is always fully initialized, which has potential performance implications.


I don't understand its claim of a "self-documentation trap".

I'm surprised the "= void;" wasn't discussed. People liked it immediately in D, and other alternatives were not proposed.


The syntax is probably fine but I feel that the default kind of sucks; default initialization has mostly fallen out of favor these days.


On a quick read of the paper, I see two surprising things:

1. If there’s no initializer and various conditions are met, then “the bytes have erroneous values, where each value is determined by the implementation independently of the state of the program.”

What does “independently” mean? Are we talking about all zeros? Is the implementation not permitted to use whatever arbitrary value was in memory? Why not?

2. What’s up with [[indeterminate]]? I would expect “indeterminate” to mean that the variable has a value that happens to be arbitrary (and may contain sensitive data, etc), not that it turns back into actual UB.


> What does “independently” mean?

It can pick whatever value it wants and doesn't have to care what the program is doing.

Also the value has to stay the same until it's 'replaced'.

> Are we talking about all zeros?

It might be, but probably won't be. What makes you bring up all zeroes?

> Is the implementation not permitted to use whatever arbitrary value was in memory? Why not?

(Edit: probably wrong, also affects other things I said) It can. What suggests it wouldn't be able to?

> 2. What’s up with [[indeterminate]]? I would expect “indeterminate” to mean that the variable has a value that happens to be arbitrary (and may contain sensitive data, etc), not that it turns back into actual UB.

"has a value that happens to be arbitary" would be the default without [[indeterminate]]. Well, it can also error out if the compiler wants to do that.


> It can. What suggests it wouldn't be able to?

"Whatever value was in memory" would be depending on the (former?) state of the program, wouldn't it?


If that's what they're going for, it's way too much weight to hang on a single vague word like that. Trying to define "state of the program" in a detailed way sounds nightmarish. Let's say I'm the implementation. If I go get fresh (but not zeroed) memory from the OS to put my stack on, the garbage in there isn't state of the program, right? If I then run a function and the function exits, is the garbage now state of the program, or is it outside the state of the program? If I want a fixed init value per address, is that allowed as a hardening feature or disallowed as being based on allocation patterns? Does the as-if rule apply, so I'm fine if the program can't know for sure where I got my arbitrary byte values from?

And would that mean there's still no way to say "Don't waste time initializing it, but don't do any UB shenanigans either. (Basically, pretend it was initialized by a random number generator.)"


> Let's say I'm the implementation. If I go get fresh (but not zeroed) memory from the OS to put my stack on, the garbage in there isn't state of the program, right?

I'd argue that once you get the memory it's now part of the state of your program, which precludes it from being involved in whatever value you end up reading from the variable(s) corresponding to that memory.

> If I want a fixed init value per address, is that allowed as a hardening feature or disallowed as being based on allocation patterns?

I'd guess that that specific implementation would be disallowed, but as I'm an internet nobody I'd take that with an appropriately-sized grain of salt.

> And would that mean there's still no way to say "Don't waste time initializing it, but don't do any UB shenanigans either. (Basically, pretend it was initialized by a random number generator.)"

I feel like you'd need something like LLVM's `freeze` intrinsic for that kind of functionality.


> What does “independently” mean?

It means what it says on the tin. Whatever value ends up being used must not depend on the state of the program.

> Are we talking about all zeros?

All zeros is an option, but the intent is to allow the implementation to pick other values as it sees fit:

> Note that we do not want to mandate that the specific value actually be zero (like P2723R1 does), since we consider it valuable to allow implementations to use different “poison” values in different build modes. Different choices are conceivable here. A fixed value is more predictable, but also prevents useful debugging hints, and poses a greater risk of being deliberately relied upon by programmers.

> Is the implementation not permitted to use whatever arbitrary value was in memory?

No, because the value in such a case can depend on the state of the program.

> Why not?

Doing so would defeat the purpose of the change, which is to turn nasal-demons-on-mistake into something with less dire consequences:

> In other words, it is still "wrong" to read an uninitialized value, but if you do read it and the implementation does not otherwise stop you, you get some specific value. In general, implementations must exhibit the defined behaviour, at least up until a diagnostic is issued (if ever). There is no risk of running into the consequences associated with undefined behaviour (e.g. executing instructions not reflected in the source code, time-travel optimisations) when executing erroneous behaviour.

> What’s up with [[indeterminate]]?

The idea is to provide a way to opt into the old full-UB behavior if you can't afford the cost of the new behavior.

> I would expect “indeterminate” to mean that the variable has a value that happens to be arbitrary (and may contain sensitive data, etc), not that it turns back into actual UB.

I believe the spelling matches how the term was used in previous standards. For example, from the C++23 standard [0] (italics in original):

> When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced.

[0]: https://open-std.org/JTC1/SC22/WG21/docs/papers/2023/n4950.p...


> Doing so would defeat the purpose of the change, which is to turn nasal-demons-on-mistake into something with less dire consequences

What nasal demons?

UB is permitted to format your disk, execute arbitrary code, etc. But there’s lots of room between deterministic values and UB. For example, taking a value that does depend on the previous state of the program and calling it the “erroneous” value would give a non-UB, “won’t format your hard disk” solution. And it even makes quite a lot of performance sense: the value that was already in the register or at that address in memory is available for free! The difference from C++23 would be that using that value would merely be erroneous and not UB.

And I think the word “indeterminate” should have been reserved for that sort of behavior.


> What nasal demons?

Those that result from the pre-C++26 behavior where use of an indeterminate value is UB.

> But there’s lots of room between deterministic values and UB.

That's a fair point. I do think I made a mistake in how I represented the authors' decision, as it seems the authors intentionally wanted the predictability of fixed values (italics added):

> Reading an uninitialized value is never intended and a definitive sign that the code is not written correctly and needs to be fixed. At the same time, we do give this code well-defined behaviour, and if the situation has not been diagnosed, we want the program to be stable and predictable. This is what we call erroneous behaviour.

> And I think the word “indeterminate” should have been reserved for that sort of behavior.

Perhaps, but that'd be a departure from how the word has been/is used in the standard so there would probably be some resistance against redefining it.


Hm, I wonder if this will be a compiler flag too, probably yes, since some projects would prefer to init all variables by hand anyway.


I envy the person who will walk into a C++ codebase and sees "[[indeterminate]]" somewhere. They will need to absolutely waste their time searching and reading up on what "[[indeterminate]]" means. Or, over time, they will just learn to ignore this crap and mentally filter it out when looking at code.

Just like when I was learning rust and trying to read some http code but it was impossible because each function had 5 generics and 2 traits.


What is non-obvious about “[[indeterminate]]”? That terminology has been used throughout the standards in exactly this context for ages. This just makes it explicit instead of implicit so that the compiler can know your intent.


I know roughly what indeterminate means in english but it is not obvious to me when I see something like this in code.

So I would have to look it up and be very careful about it since I can break something easily in C++.

This just makes things more difficult from the perspective of using/learning the language.

Similar problem with the "unsequenced" and "reproducible" attributes added in C. It sounded cool after I took the time to learn exactly (/s) what they mean. But it is not worth the time to learn, and it is not worth the cognitive load it will put on people who read the code later, imo.


I wonder if you're fine with const, constexpr and volatile also being things. I mean, "const" really doesn't mean what one would naively think (that's what "constexpr" is actually for) and the semantics of "volatile" are also widely misunderstood.


Nope: not only is C++ const not a constant, C++ constexpr isn't a constant either, and neither is C++ constinit. C++ consteval is closest, but it's only available for functions.

    const int a = 10; // Just an immutable variable named a
    constexpr int b = 20; // Still an immutable variable named b
    static constinit int c = 30; // Now it isn't even immutable
For functions, const says the function promises it doesn't change things; constexpr says the function is shiny and modern and has no other real meaning (hence the "constexpr all the things" memes: you might as well). But consteval does mean we're promising this must always be evaluated at compile time, so the result is frozen before runtime; however, only a function can have this label.

Volatile is a mess because what you actually want are volatile intrinsics; indeed you might want more (or fewer) depending on the target. If your target can do single-bit hardware writes, it'd be nice to provide an intrinsic for that rather than hoping you can write REG |= 0x40 in code and have it write a single bit. On platforms without that single-bit write feature, this will compile to an unsynchronized read-modify-write, which may cause problems. But instead of having intrinsics, C's volatile was hacked into the type system, and C++ tries to keep that.


*groans* See, and that's why I'm personally fine with [[indeterminate]], etc.: all of this is already a finely-split hairy mess and I'd rather not see even more keywords introduced if we can just use attributes instead.

And yeah, it would probably be nice to also have some sane intrinsics to provide memory_order_consume semantics... but what can you do.


Const, constexpr, etc. are mandatory to understand at this point. That situation doesn't justify adding more things imo.


Consume is dead. Long live acquire!

But seriously, it is interesting how C++ is completely abandoning the concept. My handwavy understanding is that on some more specialized hardware acquire is substantially more expensive than consume.


If by "more specialized hardware" you mean "everything that is not x86". Its main intended use is (was?) for chained loads and rcu_dereference(), where hardware does not require an explicit memory fence between loads like

    ldr     x8, [x8]       // load a pointer from memory
    ldr     x1, [x8, #16]  // load a field through that pointer
or turning the second load into "ldar": there is quite a visible data dependency between the two registers. But compilers usually put a barrier there anyway.


I'm not saying it doesn't matter. It just clearly doesn't matter enough on common modern non-TSO CPUs to motivate anybody to add the compiler support. The history of the whole thing is very interesting.


As I understand it, it's the other way around: for x86 (which is TSO) the fence doesn't really matter because the "normal" load is already slowed down, but for ARM it does matter quite a bit, see e.g. [0] (admittedly old blog post) slightly lower on the page. But perhaps we did get to the point where even ARM CPUs and surrounding memory is already performant enough so that spurious fences aren't as noticeable.

[0] https://preshing.com/20140709/the-purpose-of-memory_order_co...


arm64 added a load-acquire instruction which I think is fast enough on actual hardware that it might not be worth bothering with consume. If it isn't, then a relaxed load plus atomic_signal_fence might be your best bet. Good luck!


spot on... that difference between evaluation and storage is exactly why C++ is so hard to keep in my head

I thought constexpr was a hard physical constant, but in reality it's a weird hybrid

this visualisation helped me to wrap my head around it - https://vectree.io/c/c-constness-and-evaluation-qualifiers


I mean there's a sibling comment that literally says that the word was chosen to be mysterious and make people look up what it means



