unions by joshtriplett · Pull Request #1444 · rust-lang/rfcs

See the mention of unsafe enum in the alternatives section; @retep998 wrote an RFC for that.

I can certainly see the argument for that, given that Rust enums represent tagged unions. However, modeling untagged unions on enums produces some syntactic challenges. How do you access a field of a union? Enums normally support only pattern-matching syntax; since the pattern matching requires unsafe code, pulling out a field F would require something like this: let f = unsafe { let MyUnion::F(f) = u; f };. That gets uglier if you have structs inside unions inside structs, which FFI interfaces actually use quite often (a struct of simultaneously valid fields, inside a union of many such structs, inside an outer struct containing a tag and other common fields). At that point, pattern matching for field access starts to approach the level of syntactic complexity introduced by using macros to define and access union fields; in particular, it does not chain as naturally, and it requires mixing right-to-left reading with left-to-right ordering.

I suspect such syntax would also drive people to include more code in the unsafe block than necessary.

By contrast, field access syntax would simplify that to let f = unsafe { u.f };. Even with nested structures and unions, you'd have something like let f = unsafe { s.u.f.x };.
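To make the nested case concrete, here is a minimal sketch of the struct-in-union-in-struct shape described above, written with the `union` syntax as it exists in stable Rust today. The names `Event`, `Payload`, `Point`, `Key`, `point_x`, and `make_point_event` are hypothetical and chosen only for illustration; they do not come from the RFC or from any real FFI interface.

```rust
// Hypothetical FFI-style layout; all names here are illustrative assumptions.
#[repr(C)]
#[derive(Clone, Copy)]
struct Point {
    x: i32,
    y: i32,
}

#[repr(C)]
#[derive(Clone, Copy)]
struct Key {
    code: u32,
}

// The untagged union of simultaneously-valid-field structs.
#[repr(C)]
union Payload {
    point: Point,
    key: Key,
}

// The outer struct carries the tag next to the union.
#[repr(C)]
struct Event {
    tag: u32, // assumed convention: 0 = point, 1 = key
    u: Payload,
}

// Build an event whose active union field is `point`.
fn make_point_event(x: i32, y: i32) -> Event {
    Event { tag: 0, u: Payload { point: Point { x, y } } }
}

// Build an event whose active union field is `key`.
fn make_key_event(code: u32) -> Event {
    Event { tag: 1, u: Payload { key: Key { code } } }
}

// Field access chains left to right, and only the read itself sits in `unsafe`.
fn point_x(e: &Event) -> Option<i32> {
    if e.tag == 0 {
        // SAFETY: under the assumed convention, tag 0 means `point` was written.
        Some(unsafe { e.u.point.x })
    } else {
        None
    }
}
```

Calling `point_x(&make_point_event(3, 4))` yields `Some(3)`; the whole access is the single expression `e.u.point.x`, with no intermediate pattern bindings.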

As discussed in the rust-internals thread and mentioned in the alternatives section of this RFC, you could potentially support struct field access syntax with unsafe enum, allowing access to the fields using dotted notation. However, that would make unsafe enum's syntax differ from the obvious expectation someone would have by looking at its definition, and dotted field notation doesn't make as much sense for safe, tagged enums. A new construct doesn't come with those syntactic expectations (and anyone coming from C will expect a union to use struct-like syntax).

An unsafe enum with multiple fields would also add to the complexity of supporting field access syntax: if one constructor of the unsafe enum looks like foo(u32, f32), how would you name the two sub-fields? u.foo.0 and u.foo.1? Most FFI interfaces name the sub-fields, which would require using a separate struct anyway; in that case, would you have to write u.foo.0.subfieldname?

Writing to fields seems similarly more complicated with enums.
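For comparison, a small sketch of what writing looks like with field syntax, using unions as they exist in stable Rust today. In current Rust, assigning to a `Copy` union field needs no `unsafe` block, while reading one does; whether writes should require `unsafe` is a design detail and this sketch only reflects today's compiler. The names `IntOrFloat`, `store_float`, and `float_bits` are hypothetical.

```rust
// Hypothetical two-field union; names are illustrative assumptions.
#[repr(C)]
union IntOrFloat {
    i: u32,
    f: f32,
}

// Writing is a plain assignment to the field.
fn store_float(u: &mut IntOrFloat, value: f32) {
    u.f = value;
}

// Reading reinterprets the stored bytes, so it must be marked unsafe.
fn float_bits(u: &IntOrFloat) -> u32 {
    // SAFETY: both fields are 4-byte plain-data types, so any bit pattern
    // written through `f` is a valid `u32`.
    unsafe { u.i }
}
```

After `store_float(&mut u, 1.0)`, `float_bits(&u)` returns the same value as `1.0f32.to_bits()`.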

As a minor additional nit, Rust warns by default for enum constructors that start with a lowercase character; many FFI interfaces would end up needing to disable those warnings.

I think the case of defining an inline structure would work better with an RFC for anonymous struct and union types; I'd be quite happy to write such an RFC as well. Many FFI interfaces will want those anyway, for the common case of a struct containing an anonymous union. However, I don't think that should form part of this RFC; I would suggest a followup after resolving this one. In the meantime, it seems simple enough to define a struct (or tuple struct) and make that a field of the union.
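As a rough illustration of the workaround, the sketch below defines a tuple struct and a named struct separately and uses each as a union field, again with the `union` syntax from stable Rust. The names `Foo`, `Named`, `U`, `first_of_foo`, and `id_of_named` are hypothetical and exist only for this example.

```rust
// Tuple struct standing in for a multi-field constructor like foo(u32, f32).
#[repr(C)]
#[derive(Clone, Copy)]
struct Foo(u32, f32);

// Named struct, the more common shape in FFI interfaces.
#[repr(C)]
#[derive(Clone, Copy)]
struct Named {
    id: u32,
    weight: f32,
}

#[repr(C)]
union U {
    foo: Foo,
    named: Named,
}

// Positional sub-field access: u.foo.0
fn first_of_foo(u: &U) -> u32 {
    // SAFETY: the caller must have initialized `foo`.
    unsafe { u.foo.0 }
}

// Named sub-field access: u.named.id, with no extra `.0` hop.
fn id_of_named(u: &U) -> u32 {
    // SAFETY: the caller must have initialized `named`.
    unsafe { u.named.id }
}
```

With a separate struct as the union field, the access path stays at `u.named.id` rather than something like `u.foo.0.subfieldname`.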

All that said, I could live with unsafe enum or similar along with struct field access syntax; it doesn't seem as intuitive to me, but I'd take it over not having native support for unions at all. (Only having pattern matching, by contrast, seems like a major wart for ergonomic union usage.) As mentioned, I don't care that deeply about the syntax for declaring unions; I care a lot more about the syntax and semantics for using them.