
How many of these 200+ undefined behaviours exist in more modern low-level languages, such as Rust and D?


In Rust it's pretty limited. Here is the full list, from the Rustonomicron:

    Dereferencing null or dangling pointers
    Reading uninitialized memory
    Breaking the pointer aliasing rules
    Producing invalid primitive values:
        dangling/null references
        a bool that isn't 0 or 1
        an undefined enum discriminant
        a char outside the ranges [0x0, 0xD7FF] and [0xE000, 0x10FFFF]
        a non-utf8 str
    Unwinding into another language
    Causing a data race

Note that all of these are inside of unsafe blocks. Besides unsafe blocks, Rust has no undefined behavior, and the compiler will prevent you from doing any of these things.

https://doc.rust-lang.org/nomicon/meet-safe-and-unsafe.html
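To make the "invalid primitive values" items concrete, here is a minimal sketch of how safe code is kept away from them: the safe standard-library constructors validate their input and refuse to build an invalid `char` or `str`. The helper names `checked_char` and `checked_str` are made up for illustration; only `char::from_u32` and `std::str::from_utf8` are real standard-library functions.

```rust
// Hypothetical helpers: thin wrappers around the safe, validating constructors.
fn checked_char(code: u32) -> Option<char> {
    // Returns None for surrogates (0xD800..=0xDFFF) and values above 0x10FFFF.
    char::from_u32(code)
}

fn checked_str(bytes: &[u8]) -> Option<&str> {
    // Returns None if the bytes are not valid UTF-8.
    std::str::from_utf8(bytes).ok()
}

fn main() {
    // 0xD800 is a surrogate, outside the valid ranges for char.
    assert_eq!(checked_char(0xD800), None);
    assert_eq!(checked_char(0x41), Some('A'));
    // A lone 0xFF byte can never appear in valid UTF-8.
    assert_eq!(checked_str(&[0xFF]), None);
    assert_eq!(checked_str(b"ok"), Some("ok"));
}
```

The unchecked counterparts (such as `char::from_u32_unchecked`) are `unsafe fn`s, which is where the obligation to uphold these rules moves to the programmer.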


> Note that all of these are inside of unsafe blocks. Besides unsafe blocks, Rust has no undefined behavior, and the compiler will prevent you from doing any of these things.

It's true that unsafe is needed to get these problems, but they can also occur outside of unsafe blocks. See an example here: https://gankro.github.io/blah/only-in-rust/#unbound-lifetime...

Your own code need not use "unsafe" at all, but the program may still crash in your code if you called some function that internally does unsafe things to mess up your memory.

EDIT: I should say that the linked code does not crash, it only reads uninitialized memory. It seems to me like the same hole could be used to make things crash, but I don't have a ready-made example.


The important quote from the article you linked is: "But what happens when we throw some unsafe code at the issue?", so the claim that safe Rust doesn't have UB (that isn't considered a bug in the compiler) still stands. If you mix in unsafe code, bad things may happen, regardless of whether you wrote the unsafe yourself or rely on a library.


> the claim that safe Rust doesn't have UB (that isn't considered a bug in the compiler) still stands

Yes!

> If you mix in unsafe code, bad things may happen

Yes!

My point was only that those bad things may also happen after the unsafe block that is their root cause. This is in reply to parent's "all of these [UBs] are inside of unsafe blocks". They aren't. They are caused by something inside unsafe blocks, but they may also materialize after.


See also https://doc.rust-lang.org/nomicon/working-with-unsafe.html

> unsafe does more than pollute a whole function: it pollutes a whole module. Generally, the only bullet-proof way to limit the scope of unsafe code is at the module boundary with privacy.
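A rough sketch of what that quote means in practice (the `cursor` module and `Cursor` type are invented for this example, not taken from the Nomicon): the unsafe block in `current` is only correct because every piece of code that can touch `idx` keeps it in bounds, and privacy is what limits "every piece of code" to this one module.

```rust
mod cursor {
    /// Invariant: `idx < data.len()`. Both fields are private, so only
    /// code inside this module can break the invariant.
    pub struct Cursor {
        data: Vec<u32>,
        idx: usize,
    }

    impl Cursor {
        /// Refuses empty input so the invariant holds from the start.
        pub fn new(data: Vec<u32>) -> Option<Cursor> {
            if data.is_empty() {
                None
            } else {
                Some(Cursor { data, idx: 0 })
            }
        }

        /// Safe code, but a bug here (say, an unconditional `idx += 1`)
        /// would turn the unsafe block below into undefined behavior.
        pub fn advance(&mut self) {
            if self.idx + 1 < self.data.len() {
                self.idx += 1;
            }
        }

        pub fn current(&self) -> u32 {
            // SAFETY: relies on the module-wide invariant idx < data.len().
            unsafe { *self.data.get_unchecked(self.idx) }
        }
    }
}

fn main() {
    let mut c = cursor::Cursor::new(vec![10, 20]).unwrap();
    assert_eq!(c.current(), 10);
    c.advance();
    c.advance(); // stays on the last element instead of running off the end
    assert_eq!(c.current(), 20);
}
```

If `idx` were a public field, any caller could set it out of bounds from entirely safe code, and the fault would still lie with the module that exposed it.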


Not to belabor the point, but...

> the only bullet-proof way to limit the scope of unsafe code is at the module boundary with privacy

I understand how this is meant in the context of that page, and it is true that modules protect against users messing with modules' internal invariants.

But does that also work the other way around? In the example in https://gankro.github.io/blah/only-in-rust/#unbound-lifetime... it's not the caller messing with the callee's state, it's the callee messing up the caller's state. Do modules help at all here? That is, would the function

    fn foo<'a>(input: *const u32) -> &'a u32 {
        unsafe {
            return &*input
        }
    }

become less dangerous, or maybe impossible to call, when put into a module?


I would argue that this is not the caller making the mistake; it's this function. That is, since this function is safe, any safe code should be able to call it and not generate UB. It's not the caller's fault here, it's the incorrect implementation.
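As a sketch of what a correct implementation could look like, here are two ways to repair `foo` (the names `foo_unchecked` and `foo_checked` are mine, not from the linked article). Either the function is declared `unsafe`, so the caller has to promise the pointer is valid for `'a`, or it takes a reference, so the compiler ties the output lifetime to the input.

```rust
/// Option 1: keep the raw pointer, but make the function `unsafe`.
///
/// # Safety
/// `input` must be non-null, aligned, and point to a `u32` that stays
/// valid (and is not mutated) for the whole lifetime `'a`.
unsafe fn foo_unchecked<'a>(input: *const u32) -> &'a u32 {
    // SAFETY: guaranteed by the caller, per the contract above.
    unsafe { &*input }
}

/// Option 2: take a reference, so the borrow checker enforces the lifetime.
fn foo_checked<'a>(input: &'a u32) -> &'a u32 {
    input
}

fn main() {
    let x = 7u32;
    // SAFETY: `x` is a live local that outlives `r`.
    let r: &u32 = unsafe { foo_unchecked(&x as *const u32) };
    assert_eq!(*r, 7);
    assert_eq!(*foo_checked(&x), 7);
}
```

With option 1 the caller still needs an `unsafe` block, so the root cause stays visible at the call site. Putting the original safe `foo` into a module would not change anything, because the problem is its signature rather than who can reach its internals.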


> It's not the caller's fault here, it's the incorrect implementation.

I agree. But then this shows that it's possible to write "safe" Rust code that only calls "safe" external code and still (to a first approximation) have the possibility of undefined behavior showing up at any point. In other words, Rust's famous static, compiler-enforced guarantees are not guarantees at all, more like firm promises.

Nothing wrong with that; it's easy to write memory-corrupting code in other safe-by-default languages like Haskell or OCaml as well. But it seems like Rust's marketing materials do try to suggest otherwise, and many people get wrong impressions (look at the first post in this thread, and the other comments on this article saying that Rust's "safe" code is 100% free from undefined behavior).


I mean, safe code is. Again, it's the unsafe that's at fault here.

Your point about other languages is exactly what I was going to say; unsafe is like an FFI layer. Nobody says that Ruby isn't memory safe because it can call into C code, and if someone messes up the C, well, it's at fault. "It's memory safe except for FFI" is a mouthful, and so people generally let the exceptions slide. Same with Rust.


An important point: if you provide a safe wrapper around some unsafe code it's up to the programmer to ensure the above is true.


>, Rust has no undefined behavior,

Correct me if I'm wrong but I don't think such a strong universal statement can be true (even outside unsafe blocks) because LLVM has corner cases of "undefined behavior". (And since Rust relies on LLVM, ...)

Maybe it's more accurate to say that Rust minimizes undefined behavior as a design goal, or it doesn't have intentional undefined behavior.


If safe Rust code does not invoke any of the UB corners of LLVM then Rust can claim to be free from UB. I don't know enough to guarantee or verify it, but it's my current understanding that this is the case.


>If safe Rust code does not invoke any of the UB corners of LLVM then Rust can claim to be free from UB.

Sure and I believe that adding a conditional qualification such as "if one does not invoke UB of LLVM" restates my point: one can't make a universal statement that "safe Rust has zero undefined behavior."

E.g., as of this writing, the following "safe Rust" UB issue (opened 3+ years ago) last had comments 21 days ago and I believe it's still open:

https://github.com/rust-lang/rust/issues/10184


If somebody asks you what "cat" does, do you say "It copies its input to its output, unless there's a bug in cat or the C compiler that compiled it or cosmic rays hit the program on disk"?


>If somebody asks you what "cat" does,

Yes, I get what you're saying, but I'll try to emphasize again that I'm not trying to play semantic games to irritate everyone. (Yes, we can play word games such as "a tank is an armored military vehicle -- unless it is just a cardboard facade to fool Germans that the Allies are invading a different part of France's coastline or acting as a movie prop for special effects work.") Every "thing" can be defined with endless cumbersome qualifiers that nobody actually says in real life.

That said, I felt the context in this thread warranted a different threshold to qualify Rust's UB because one example from John Regehr's 200 UB bullet points is:

  - Demotion of one real floating type to another produces a value outside the range that can be represented (6.3.1.5).

The Rust UB github issue is not exactly the same cast but similar in spirit. Therefore, justinpombrio's comment that "Besides unsafe blocks, Rust has no undefined behavior," doesn't look accurate to me in the context of this UB thread rather than just casual speech about Rust. I can't read the mind of the poster asking the question (chrisdew) to know exactly what his scope of "UB" included but I think the reality of unintentional UB in Rust is relevant in this particular conversation.


I agree. This comes up a lot when discussing C/C++: is it the compiler's fault, the developer's, etc.? The reality is that it's irrelevant. Rust-the-language is safe, but no one uses Rust-the-language; they use rustc. The end result is that it is possible to have memory-unsafe Rust code without unsafe blocks.

Rust developers should be aware of this - they're almost always incredibly trivial patterns to avoid, but only if you know about them.


>a conditional qualification such as "if one does not invoke UB of LLVM"

A conditional qualification which is intended to be unconditionally true of safe Rust code, outside bugs in the compiler. The universal statement is totally possible, because your conditional is equivalent to saying "if you write valid code".


>which is _intended_ to be unconditionally true of safe Rust code,

I emphasized "intended" because it seems like we're talking past each other.

You: re-emphasizing Rust's specified design goal.

Me: emphasizing the current state of the Rust compiler, which in reality makes the statement "safe Rust has no undefined behavior" not true.

(In other words, I emphasize the unintentional UB whereas you do not.)

>, because your conditional is equivalent to saying "if you write valid code".

If you look at the github issue, "1.04E+17 as u8" is valid safe Rust code which invokes UB.
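For anyone trying this on a newer toolchain: to the best of my knowledge, float-to-integer `as` casts were changed to saturate in Rust 1.45, so the snippet below reflects that later behavior and not the state of rustc discussed in this thread. The helper `to_u8` is just an illustrative wrapper around the cast.

```rust
// Illustrative wrapper around the cast from the GitHub issue.
fn to_u8(x: f64) -> u8 {
    // On Rust 1.45 and later this saturates instead of being undefined:
    // too-large values become u8::MAX, negative values become 0, NaN becomes 0.
    x as u8
}

fn main() {
    assert_eq!(to_u8(1.04e17), 255);
    assert_eq!(to_u8(-1.0), 0);
    assert_eq!(to_u8(f64::NAN), 0);
    assert_eq!(to_u8(42.9), 42); // in-range values truncate toward zero
}
```

The old unchecked behavior is still available as the `unsafe` method `to_int_unchecked`, which puts it back under the "only in unsafe" umbrella.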


They didn't even exist in stone-age low-level languages like ESPOL, NEWP, PL/8, PL/S, Mesa, Modula-2.

And I am only mentioning the best-known ones from the ten years before C was born through the early '80s.

Sure, some of them also had safety issues like use-after-free, but not to the same extent as C.


A lot of the undefined behaviour on this list is quite specific to the C programming language, so it would be like comparing apples to oranges. Though, one major impression I think you will get after going through this list is that there is no good reason for making those things undefined behaviour in the first place. For example, "The result of the preprocessing operator ## is not a valid preprocessing token", really?

Regarding the things that actually matter to programmers, the following are well-behaved in Rust (including in unsafe blocks): conversion between types, integer arithmetic, and pointer arithmetic (there are generally two variants of each operation: one that essentially treats pointers as unsigned integers, and another that behaves like in C, with more opportunities for optimization). On the other side of the coin, in Rust mutable references cannot be aliased.
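A small sketch of the integer and pointer cases, using only standard-library methods (`checked_add` and `wrapping_add` on integers, `wrapping_add` and `add` on raw pointers). Note the assumption baked into the last part: `add` is the C-like variant, and it is only valid while the result stays within the same allocation, which is why it requires `unsafe`.

```rust
fn main() {
    // Integer overflow: the programmer picks the behavior explicitly.
    assert_eq!(i32::MAX.checked_add(1), None);
    assert_eq!(i32::MAX.wrapping_add(1), i32::MIN);

    let data = [1u32, 2, 3];
    let p = data.as_ptr();

    // Variant 1: wrapping pointer arithmetic, callable from safe code.
    // Only dereferencing the result needs unsafe.
    let q = p.wrapping_add(2);

    // Variant 2: C-like pointer arithmetic.
    // SAFETY: offset 2 is in bounds of the 3-element array.
    let r = unsafe { p.add(2) };

    assert_eq!(q, r);
    // SAFETY: r points at data[2], which is live and initialized.
    assert_eq!(unsafe { *r }, 3);
}
```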


D is memory unsafe by default so it probably has a lot of them.


D code is @system by default, but there is @safe, which enforces memory safety, and @trusted (like unsafe in Rust). These are all attributes you can apply when writing libraries, as with @nogc etc.


LuaJIT is another modern low-level language. It has some remarkable undefined behavior like the evaluation order for function arguments. Scares me a bit.

https://github.com/LuaJIT/LuaJIT/issues/238


The same is true in C/C++: Function argument evaluation order is unspecified. BUT: that doesn't mean this is undefined behavior, it is just unspecified.

unspecified != undefined behavior

See this Stack overflow answer for an explanation: https://stackoverflow.com/questions/2397984/undefined-unspec...


This would not be what is called undefined behavior per the C standard, but unspecified or implementation-defined behavior. The bad thing about "undefined behavior" is that the implementation is basically allowed to do whatever it wants, while unspecified and implementation-defined behavior still have sane semantics. Unspecified behavior allows the language implementation to choose any one of several possible behaviors; implementation-defined behavior requires the language implementation to define and document the semantics itself.


Lua is in no way a "low-level" language.


Are you sure that's undefined, rather than merely unspecified?

I.e., function arguments are still evaluated in some (possibly changing) order. C++ undefined behavior allows the implementation to, e.g., format your hard disk or launch the nukes.


None exist in (safe) Rust.


Sometimes you hit undefined behavior in LLVM when compiling safe Rust, but that is considered a bug in the compiler. https://github.com/rust-lang/rust/issues/10184


By the same logic, you could do something in C that isn't considered undefined behavior but acts in an undefined way when compiled with Clang



