Union types ('enum types') would be complicated in Go (utcc.utoronto.ca)
28 points by misonic on Dec 7, 2024 | 50 comments


No, it wouldn't. There were garbage-collected systems programming languages with them already 40 years ago, but as usual in Go, we ignore computing history.

Additionally, just having actual enumerations, Pascal/Algol style rather than ML style, would already be an improvement over the iota/const hack.
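For reference, the iota/const pattern being called a hack looks roughly like this (the type and constant names are made up for illustration). It shows the two usual complaints: the zero value is simply whichever constant comes first, and any integer converts to the type, so the set of values is not closed.

```go
package main

import "fmt"

// Color is an "enum" built from a named integer type plus iota constants.
type Color int

const (
	Red   Color = iota // 0: also the zero value of Color
	Green              // 1
	Blue               // 2
)

// String makes the constants print by name.
func (c Color) String() string {
	switch c {
	case Red:
		return "Red"
	case Green:
		return "Green"
	case Blue:
		return "Blue"
	}
	return fmt.Sprintf("Color(%d)", int(c))
}

func main() {
	var c Color        // zero value is Red, because Red is declared first
	fmt.Println(c)     // Red
	bogus := Color(42) // compiles: nothing restricts Color to the declared constants
	fmt.Println(bogus) // Color(42)
}
```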


The author of this blog is an internet commenter, not a representative of Go. I don’t see how such a post justifies saying that Go ignores computing history.


This post has nothing to do with Go ignoring computing history. The authors/maintainers of Go can do that just fine on their own.


That's the past history of most folks in the Go community, including this essay about Go's type system and the related "difficulty" of implementation.


I think this is mistaken. Go already has a way to represent 'open' union types (interfaces), so all of these runtime problems have already been solved. What's missing is just the type system support to do exhaustive matching on the members of the union. With the addition of generics, 'all' that would be necessary is to make the following a legal variable definition:

    var foo interface {
        struct { A int } | struct { B string }
    }
It currently fails with the following error:

"cannot use type interface{struct{A int} | struct{B string}} outside a type constraint: interface contains type constraints"

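For comparison, the same kind of union is accepted today when it appears in a constraint rather than as a variable's type. In this sketch with invented type names, `Describe` accepts either variant, but declaring `var x IntOrStr` would still fail with the error above, and the type switch gets no exhaustiveness check.

```go
package main

import "fmt"

// Hypothetical named variants standing in for the anonymous structs above.
type IntCase struct{ A int }
type StrCase struct{ B string }

// A union of types is legal today, but only as a type constraint.
type IntOrStr interface {
	IntCase | StrCase
}

// Describe accepts either variant. The switch runs on any(v) because Go
// does not allow a type switch directly on a type-parameter value.
func Describe[T IntOrStr](v T) string {
	switch x := any(v).(type) {
	case IntCase:
		return fmt.Sprintf("int %d", x.A)
	case StrCase:
		return fmt.Sprintf("string %q", x.B)
	}
	panic("unreachable: T is constrained to IntOrStr")
}

func main() {
	fmt.Println(Describe(IntCase{A: 1}))   // int 1
	fmt.Println(Describe(StrCase{B: "x"})) // string "x"
}
```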

Interfaces aren't bit-packed and they force storing all values as a separate allocation that the interface contains a pointer to (escape analysis may allow this separate value to be on the stack, along with the interface itself). I believe that Go used to have an optimization where values that fit in a pointer were stored directly in the interface value, but abandoned it, perhaps partly because of the GC 'is it a pointer or not' issue. In my view, some of what people want union types for is exactly efficient bit-packing that uses little or no additional storage, and they'd be unhappy with a 'union values are just interface values' implementation.

(I'm the author of the linked to article.)
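The fixed, two-word shape of an interface value can be checked with `unsafe.Sizeof`. The struct type here is an arbitrary example, and the sizes are what the standard gc toolchain reports.

```go
package main

import (
	"fmt"
	"unsafe"
)

// big is an arbitrary 32-byte struct, too large to fit in one machine word.
type big struct{ a, b, c, d int64 }

// ifaceWords reports how many machine words an interface value occupies.
func ifaceWords() uintptr {
	var i any = big{1, 2, 3, 4}
	return unsafe.Sizeof(i) / unsafe.Sizeof(uintptr(0))
}

func main() {
	fmt.Println(ifaceWords())         // 2: type word plus data pointer
	fmt.Println(unsafe.Sizeof(big{})) // 32: the boxed value lives outside the interface
}
```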


A separate allocation is not forced. The implementation could allocate a block of memory large enough to hold the two pointers for the interface value together with the largest of the types that implements the interface. (You can't do that with an open interface because there's no upper bound, but the idea here is to let you define closed interfaces.)

In cases where there is a lot of variance in the size of the different interface implementations, separate allocations could actually be more memory efficient than a tagged union. In any case, I'm not sure that memory efficiency is the main reason that people miss Rust-style enums in Go.


The problem with allocating bit-packed storage is that then you are into the issue where types don't agree on where any pointers are. Interface values solve this today because they are always mono-typed (an interface value always stores two pointers), so the runtime is never forced to know the current pointer-containing shape of a specific interface value. And the values that interface values 'contain' are also always a fixed type, so they can be allocated and maintained with existing GC mechanisms (including special allocation pools for objects without pointers, and so on).

I agree with you about the overall motivation for Rust-style enums. I just think it's surprisingly complex to get even the memory efficiency advantages, never mind anything more ambitious.


> The problem with allocating bit-packed storage is that then you are into the issue where types don't agree on where any pointers are.

The solution to this should be trivial. You just have to extend the gcshape concept to account for the enum discriminator.


The bigger problem is mutability. Any pointers into the bit-packed enum storage become invalid as soon as you change its type. To solve this you can either prohibit pointers into bit-packed enum storage, which is very limiting, or introduce immutability into the language. Immutability is particularly difficult to add to Go, where default zero values emerge in unexpected places (such as the spare capacity of slices and the default state of named return values).


I was saying that you can have a single allocation without using bit packing.

I'm not sure what you're referring to with 'anything more ambitious'.


The problem with that is that in Go, I need to be able to put methods on those things, for reasons possibly unrelated to the interface in question. For that they need to be named. For that you might as well do what has worked since Go 1.0: put an unexported method in the interface and declare several implementations of that interface in your package.

Honestly, interfaces with unexported methods are 90%+ of what people want; it's just not spelled the way they expect. And if you're not going to be happy with anything short of 100%, a position I can and do respect, there's no point waiting for Go to get any better: I can guarantee that no Go proposal for sum types will avoid forcing a "nil" value into the sum type.


You might want to layer some sugar on top to enable naming of the variants. I was just trying to keep my example as close as possible to valid Go code.

Even if you have to manually define separate named structs, you still have the benefit of exhaustivity checking in type switches. That's arguably the other 10% that people want.


The zero would obviously be the zero value of one of the constituents. Coproducts in most of Go are easy: it's interfaces that make it hard.


> The zero would obviously be the zero value of one of the constituents.

If interfaces are used for union types then the obvious zero is nil, not any constituent. Nil is the zero value of interfaces.

It's perfectly consistent and in line with the rest of the language.


It's not clear to me that that would satisfy the categorical properties of coproduct that are wanted.


Nobody cares about the categorical properties of the coproduct. But in any case, there's no theoretical issue here, since it would be just as if every enum had a 'Nil' variant implicitly defined.


You also want to support having the same type on both sides:

    var foo interface {
        struct { A string } | struct { B string }
    }
E.g., in Rust that would be 'Result<String, String>', where your success value happens to be a String and your error happens to be a String error message.
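The two anonymous struct types above are already distinct in Go, because their field names differ. In today's Go, the same idea can be sketched with named wrapper types in a closed interface, so both variants carry a plain string without being confused. The names `Ok`, `Err`, `StringResult`, and `Render` are illustrative only.

```go
package main

import "fmt"

// Both variants wrap a string; the distinct named types are what
// let a type switch tell success from failure.
type Ok struct{ Value string }
type Err struct{ Msg string }

// StringResult is closed over the two wrappers via an unexported method.
type StringResult interface{ isStringResult() }

func (Ok) isStringResult()  {}
func (Err) isStringResult() {}

// Render formats whichever variant r holds.
func Render(r StringResult) string {
	switch v := r.(type) {
	case Ok:
		return "ok: " + v.Value
	case Err:
		return "error: " + v.Msg
	default:
		return "nil result" // the interface zero value
	}
}

func main() {
	fmt.Println(Render(Ok{Value: "42"}))       // ok: 42
	fmt.Println(Render(Err{Msg: "bad input"})) // error: bad input
}
```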


Forget Result; just allow the type system to express non-nullable object references. Use the same layout, but let the compiler know when something is guaranteed to exist and force null-checking when it isn't.

This doesn't cover everything people might want to do with unions, but it covers the billion-dollar mistake and doesn't run against the grain of the entire language (as far as I know).


> doesn't run against the grain of the entire language

Not an expert, but my gut says maybe it runs against zero values? As in, "what's the zero value for a non-nullable reference?" Maybe the answer is something like "you can only use this type for parameters", but that seems very limiting.


Half of the language is already non-nullable, which is accomplished by allowing for zero values. Non-pointer variables are guaranteed never to be nil.

What is missing is the ability to have pointer variables and have the compiler ensure that they will never be nil. I believe this was a design choice, not some technical limitation.


Like the sibling comment seems to be saying: a non-nil pointer would have to be set to some real (non-nil) pointer value anyway. So having a zero value does not seem to apply?


I've been using `Null[T any] struct { V T; Valid bool }` for this; the pattern comes from database/sql. Works fine.
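A minimal sketch of that pattern. The field layout matches `Null[T]` in the standard library's `database/sql` package (added in Go 1.22), but the `Some`, `None`, and `Get` helpers are hypothetical additions for this example and are not part of that package.

```go
package main

import "fmt"

// Null mirrors the shape of database/sql's Null[T]:
// Valid reports whether V holds a real value.
type Null[T any] struct {
	V     T
	Valid bool
}

// Some and None are hypothetical convenience constructors.
func Some[T any](v T) Null[T] { return Null[T]{V: v, Valid: true} }
func None[T any]() Null[T]    { return Null[T]{} }

// Get returns the value and whether it is present, comma-ok style.
func (n Null[T]) Get() (T, bool) { return n.V, n.Valid }

func main() {
	age := Some(42)
	if v, ok := age.Get(); ok {
		fmt.Println("age:", v) // age: 42
	}
	var missing Null[string] // the zero value means "not present"
	_, ok := missing.Get()
	fmt.Println(ok) // false
}
```

The zero value doubling as "absent" is what lets this fit Go's zero-value rules without any language change.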


The important detail that has bogged down almost every union type discussion is "zero values". What would be the zero value of a union type? If you've written Go you know the entire language is built around zero values, disabling it for some types is not an option.


You could use the first entry, or require an explicit annotation. That doesn't seem like a big issue.


You would be wrong. Any use of annotations complicates the language and the compiler. As for the first entry, this has been discussed many times without ever reaching consensus; see https://github.com/golang/go/issues/19412


Couldn't it be just the first enum item? Zero values are somewhat arbitrary anyway.


I don't think this is a good idea, because the zero value would change when you reorder the fields.


Why would it be just the first enum item? How do you even determine how the enum is ordered?


Source order, presumably?


That's what e.g. Haskell does: when you derive comparisons for sum types, they compare by source order.


As does Pascal, Modula-2, Ada, among plenty of others.


> At one level we can easily do something that looks like a Result type in Go, especially now that we have generics. You make a generic struct that has private fields for an error, a value of type T, and a flag that says which is valid, and then give it some methods to set and get values and ask it which it currently contains. If you ask for a sort of value that's not valid, it panics. However, this struct necessarily has space for three fields, where the Rust enums (and generally union types) act more like C unions, only needing space for the largest type possible in them and sometimes a marker of what type is in the union right now.

That seems better than not having algebraic data types at all.


This is sorely needed to simplify error handling and get rid of nil pointer panics. Once something like this exists, I would love to see a linter for Go that ensures no naked pointer is ever returned.


I have never encountered a nil pointer panic in Go in ~5 years, after the first few months learning to actually use the language.

It's basically a non-issue IMO. Just stop explicitly ignoring the errors returned from constructors.


I've been using Go full time since about 2013 and it's a massive issue; it has woken me and my teammates up at night, especially when junior engineers are involved. Why hope to catch these issues in code review if the type system/compiler can do it for you instead?

Some of the most common areas that infest the code with nullable pointer types are when you have to deal with de-serializing data a lot. This is due to lack of a common built-in Optional type (I know you can define one easily, but you can't force libraries that you rely on to use that type).

The best we have now is https://github.com/uber-go/nilaway and it's improving with time. It does static analysis, but it has a very difficult job to do, so right now it's super slow to run and prone to false positives.


> One core requirement for this is what Rust calls an Enum and what is broadly known as a Union type

Union types and enum types are not the same thing, and this misunderstanding invalidates the entire article. An enum type includes a marker which indicates which value it contains. The garbage collector would be able to read this tag value and know.


I don't know what to tell you, there are just multiple ways to use the same words.

What Rust calls enums are called tagged unions in most other languages. Enum usually refers to a simple collection of named values, what Rust calls fieldless enums or unit-only enums.


> are called tagged unions in most other languages

Not sure about that. Swift also calls them enums, in Java and Kotlin (and maybe Scala? I forget) they're sealed classes/interfaces and in PL theory and many typed FP languages they're called "sum types".


Variant records in Pascal and languages of related lineage.

Enumerations are a sequence of values.


The "tagged" part is important. They are in fact not called "unions" in most languages, only C-derived languages. And in C, unions are untagged. Only later extensions added tagged union types. So I would contend that unqualified "union type" means an untagged union.


It sounds like you don't have exposure to other languages. C defaults to untagged unions but there are plenty of languages where a union is always tagged.

To put it another way, a Rust `enum` is not the same as C's untagged unions but it is still a union.


The article says the garbage collector can't know which variant the union type is. I don't know how to interpret that other than an (incorrect) assumption of untagged unions.


It is written a little confusingly but that's not what they're saying. They were saying you can't implement a union in user code with the current garbage collector because there's no way to tell it which variant your union is:

> let's ask why we can't implement such a union type today using Go's unsafe package to perform suitable manipulation of a suitable memory region.

They are aware that you could add union types to Go - their point is that it would require garbage collector modifications which may be difficult:

> The corollary to all of this is that adding union types to Go as a language feature wouldn't be merely a modest change in the compiler. It would also require a bunch of work in how such types interact with garbage collection, Go's memory allocation systems (which in the normal Go toolchain allocate things with pointers into separate memory arenas than things without them), and likely other places in the runtime.


Wouldn't an enum in Go only differ in what the reflection API sees behind the scenes?

In the sense that it first points to the type, and then to the value, similar to how other newer types are already implemented?

    type whatever enum {
        value1 iota
        value2
    }

enum would just be a similar shorthand to uint or whatever you want to identify it uniquely.

I don't see much problem implementing this apart from the methods in the GC that have to be touched to trace the object tree correctly.

The comparison handling must be touched in either case, so I don't think an additional pointer to the enum's definition there counts as much work.


The meaning of union and enum changes based on which programming language's terminology you use. A Rust enum is a C tagged union with compiler support. A C enum is just a basic Rust enum without any fields. The list goes on like this.


Yes, the Go garbage collector would need to support some new memory layouts to make certain kinds of unions efficient. A union between a double, a pointer, and small integer types (as done in languages like JavaScript) might be a good start?


Here is a fairly efficient version of a simple union like that I did a while back https://github.com/tigrannajaryan/govariant
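One way to see the GC constraint discussed above in practice: a safe (no `unsafe`) Go variant can give the pointer its own pointer-typed field, so the collector always knows where it is, and store scalars as raw bits in a separate non-pointer field. This is a hypothetical layout for illustration, not how govariant is implemented, and it pays for safety in size: on 64-bit platforms the struct below takes 24 bytes, where NaN-boxing schemes fit such a value in 8.

```go
package main

import (
	"fmt"
	"math"
)

// Kind records which variant a Value holds.
type Kind uint8

const (
	KindInt Kind = iota // first, so the zero Value is the int 0
	KindFloat
	KindString
)

// Value holds an int64, a float64, or a string. ptr is the only
// pointer-bearing field, so the GC sees the same shape for every kind.
type Value struct {
	ptr  *string
	bits uint64
	kind Kind
}

func IntValue(i int64) Value     { return Value{bits: uint64(i), kind: KindInt} }
func FloatValue(f float64) Value { return Value{bits: math.Float64bits(f), kind: KindFloat} }
func StringValue(s string) Value { return Value{ptr: &s, kind: KindString} }

func (v Value) Kind() Kind { return v.kind }

func (v Value) Int() int64 {
	v.must(KindInt)
	return int64(v.bits)
}

func (v Value) Float() float64 {
	v.must(KindFloat)
	return math.Float64frombits(v.bits)
}

func (v Value) Str() string {
	v.must(KindString)
	return *v.ptr
}

// must panics if v does not hold the requested kind.
func (v Value) must(k Kind) {
	if v.kind != k {
		panic(fmt.Sprintf("Value holds kind %d, not %d", v.kind, k))
	}
}

func main() {
	vals := []Value{IntValue(-5), FloatValue(1.5), StringValue("hi")}
	fmt.Println(vals[0].Int(), vals[1].Float(), vals[2].Str()) // -5 1.5 hi
}
```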


I've thought about this while working with Go more frequently over the past few months.

Metadata. The shapes that let Go know which generic 'profile' to use for a given thing are metadata, in the same way that the type of a reference is compile-time metadata when the reference is taken.

Structure. The precise layout of structures and other data fields is also compile-time metadata. It doesn't even need to remain the same between versions of Go, or even between builds if it were somehow randomized. That isn't how programmers think (at least any who were also trained in assembly?). When I lay out a struct I do expect undersized fields to get padded, but I expect every field in order, and I'd prefer some way of forcing the issue for precise padding.

However, a 'union' of types is just syntax sugar. Give the programmer the above basics and add one more: builtin.*reshape()*. reshape() would allow any similarly shaped structure to replace the type of the reshaped item. E.g., reshape({x, y, z uint64}, {x, y, z int64}, A, B) would convert a 192-bit chunk of 3 ints of one type to the other. It could also convert anything else that is similarly shaped.

That's a trivial example; what about some private structure from a library? I'd think the unsafe package's version should allow violating the private field space, but the normal safe version might force the unexported fields to be 'pad' (inaccessible) space. I don't think this would alter garbage collection, as that process likely has to keep its own track of regions of memory and places that point within them. That's runtime (maybe sometimes compile-time?) metadata which reshape() would have to work with.

Generics even _sort_ of do this already when a type in the list of allowed types is prefixed with ~; the compiler is allowed to use the passed thing as that type of value and then return it back the same as the input... and I really don't see why a reshape function couldn't do the same thing.

There is something else, though: reshape() likely needs to consume/claim the resource, since it wouldn't initialize anything. So maybe it needs to return the recast value to be assigned or passed somewhere, and invalidate further use of the variable past that point. Alternatively, it could take a single variable and modify its type as part of its call (also returning the value would sometimes be useful too).
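`reshape()` is a hypothetical builtin. For the `{x, y, z uint64}` to `{x, y, z int64}` case, the closest thing in today's Go is an `unsafe.Pointer` conversion, since a plain conversion between the two struct types does not compile (their field types differ). A sketch with invented type names:

```go
package main

import (
	"fmt"
	"unsafe"
)

// Two struct types with identical size and field layout.
type U3 struct{ X, Y, Z uint64 }
type I3 struct{ X, Y, Z int64 }

// reshapeU3 reinterprets the memory of a U3 as an I3 without copying.
// This relies on unsafe.Pointer conversion rule (1): *T1 to *T2 is allowed
// when T2 is no larger than T1 and both share an equivalent memory layout.
func reshapeU3(u *U3) *I3 {
	return (*I3)(unsafe.Pointer(u))
}

func main() {
	u := U3{X: 1, Y: 2, Z: ^uint64(0)} // Z has all bits set
	i := reshapeU3(&u)
	fmt.Println(i.X, i.Y, i.Z) // 1 2 -1
	i.X = -1                   // writes through to u: both views share memory
	fmt.Println(u.X == ^uint64(0)) // true
}
```

This is only harmless because neither struct contains pointers. Reinterpreting memory so that a pointer slot and an integer slot trade places is the kind of change the garbage collector cannot track, which is the article's point.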


I don't know what the point of the post is. Safe unions require compiler work? Sure, who objects to that?



