Pre-Proposal: Wasm GC Support in the Canonical ABI

This issue proposes extensions to the Component Model's Canonical ABI for Wasm GC support and describes some of the motivation for particular choices. I am in the process of implementing these extensions in wasm-tools and wasmtime. My goals are to kick off discussion of how best to integrate GC and the canonical ABI, build consensus, and eventually get these extensions merged into the component model spec itself.

Emoji

Most important things first: I propose that we use the 🛸 emoji (an unidentified flying object [1]) to represent the Wasm GC extension to the component model in Explainer.md.

The Problem of Choosing Which Core Type to Lower Into

With the introduction of Wasm GC into the component model, a component type can lower to many different core types.

Consider this record:

record point {
    x: f32,
    y: f32,
}

The first choice is whether to lower this to linear memory or GC, so we've already entered the world of component-to-core type lowering being one-to-many, and therefore we must give components some way of choosing which version is desired.

But let's assume we've somehow opted into using GC instead of linear memory. A naive translation of the point record into a core Wasm GC struct type might look something like this:

(type $point (struct (field f32) (field f32)))

Making the fields immutable, as above, means that these objects can be passed between components without copying. But if one component's "at rest" representation of a point is mutable, that component is forced to make a copy itself. And if both sides need mutability, then both sides are forced to copy, and we end up with two copies instead of the single copy we would get with the linear memory version of the canonical ABI!

Alternatively, we could make the fields mutable:

(type $point (struct (field (mut f32)) (field (mut f32))))

But now if our two components don't require mutability, the engine cannot pass these objects between them without copying because it cannot know that they won't actually be mutated after they're sent across the component boundary (at least, not without a prohibitively expensive global program analysis).

Things get even worse when we consider rec groups. To recap, core Wasm types are deduplicated structurally at the granularity of whole rec groups. Therefore, two otherwise-identical types that appear in structurally-different rec groups are not the same type. Likewise, repetitions of an otherwise-identical type at different positions in the same rec group are not the same type. In the following example, $a and $b are the same type, but they are different from $c, $d, and $e, which are all their own unique types:

;; Note: $a is defined within an implicit singleton rec group.
(type $a (struct (field i32)))

;; $b is the same type as $a, its rec group is just explicitly
;; written out.
(rec (type $b (struct (field i32))))

;; $c is a distinct type from $a because its rec group is different.
(rec (type $c (struct (field i32)))
     (type (array i8)))

;; $d and $e are distinct from $a because their rec group is
;; different. They are distinct from each other because types are
;; effectively nominal within a rec group.
(rec (type $d (struct (field i32)))
     (type $e (struct (field i32))))

If the "at rest" representation inside one of our components is in a non-singleton rec group, then the component is forced to copy from the canonical ABI struct into its "at rest" struct that is identical other than the rec group it is defined within.
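The rec group identity rule can be illustrated with a small model. This is only a sketch: `TypeId` and `define_rec_group` are hypothetical names, type definitions are plain strings, and references between types inside a group are ignored. A type's identity is taken to be the structure of its whole rec group plus its position inside that group.

```python
from typing import NamedTuple


class TypeId(NamedTuple):
    """Identity of a core type in this simplified model (hypothetical)."""
    rec_group: tuple  # structural description of the whole rec group
    index: int        # position of the type inside that rec group


def define_rec_group(*definitions: str) -> list:
    """Return one TypeId per definition in a single rec group."""
    group = tuple(definitions)
    return [TypeId(group, i) for i in range(len(group))]


STRUCT_I32 = "(struct (field i32))"

(a,) = define_rec_group(STRUCT_I32)                # implicit singleton rec group
(b,) = define_rec_group(STRUCT_I32)                # explicit singleton rec group
c, _ = define_rec_group(STRUCT_I32, "(array i8)")  # differently shaped rec group
d, e = define_rec_group(STRUCT_I32, STRUCT_I32)    # same group, different index

print(a == b, a == c, d == e)  # prints: True False False
```

Under this model a component whose "at rest" struct lives at some index of a larger rec group can never present it as the singleton-group type, which is why a copy would be needed if the canonical ABI fixed the rec group.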

To summarize these observations:

There are many core types that a single component type could lower into.
Minimizing copying requires that the objects going into and coming out of the canonical ABI match the "at rest" representation inside each component.

Therefore, in order to minimize copies, the canonical ABI should (as much as possible) avoid prescribing one particular canonical GC type representation for a given component type; components should be able to choose the core GC types that component types lower to.

New Canonical Options

We add a gc canonical option, which switches the canonical ABI over to GC mode from linear memory mode, when present.
To allow lowered component functions' types to match the component's inner core module's "at rest" representation as often as possible, we add a canonical option to specify the core function type used during function lowering:
(canon lower (func $f) ... (core-type $my_core_function_type))
The core-type option must be present if and only if the gc option is present.
When present, the component function's type and the given core function type are recursively traversed together, and each component type is checked to be lowerable to its associated core type.
This allows components to configure which Wasm GC type lowering they desire, minimizing copies in the overall system.
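The pairing rule between the two options could be checked along these lines. This is a minimal sketch, not how wasm-tools or wasmtime validate options: `check_canon_options` and the dict representation of the canonical options are assumptions made for illustration.

```python
def check_canon_options(options: dict) -> str:
    """Validate the gc / core-type pairing and return the ABI mode.

    `options` is a hypothetical dict of canonical options, for example
    {"gc": True, "core-type": "$my_core_function_type"}.
    """
    has_gc = bool(options.get("gc", False))
    has_core_type = "core-type" in options
    if has_gc != has_core_type:
        raise ValueError("core-type must be present if and only if gc is present")
    return "gc" if has_gc else "linear-memory"


print(check_canon_options({"gc": True, "core-type": "$my_core_function_type"}))
```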

Encoding

(core-type <idx>) canonical option:
gc canonical option:

GC Type Lowerings

We recurse over the component signature: for each argument, and then for each of the argument type's nested fields and elements, we check that the corresponding part of the core signature (the one specified by the (core-type ...) canonical option) matches according to the following rules.

We tweak some of the lowerings depending on whether we are lowering into either

a core value type (for direct arguments and returns), or
a struct field or array element storage type (for nested types like the u8 in a list<u8>).

The lowerings are summarized in the following table; more details follow. A "-" in the Storage Type column means that the storage type is the same as the value type.

Component Type	Value Type	Storage Type
bool	i32	i8
s8	i32	i8
u8	i32	i8
s16	i32	i16
u16	i32	i16
s32	i32	-
u32	i32	-
s64	i64	-
u64	i64	-
f32	f32	-
f64	f64	-
char	i32	-
string when string-encoding=utf8	(ref null? (array (mut? i8)))	-
string when string-encoding=utf16	(ref null? (array (mut? i16)))	-
string when string-encoding=latin1+utf16	(ref null? (array (mut? i8)))	-
(record ...T)	(ref null? (struct ...(field (mut? T'))))	-
(variant ...)	(ref null? (struct))	-
(list T)	(ref null? (array (mut? T')))	-
(tuple ...T)	(ref null? (struct ...(field (mut? T'))))	-
(flags ...)	i32	-
(enum ...)	i32	-
(option ...)	(ref null? (struct))	-
(result ...)	(ref null? (struct))	-
(own ...)	externref	-
(borrow ...)	externref	-
(future ...)	externref	-
(stream ...)	externref	-
error-context	externref	-

`bool` and `{s,u}{8,16}`

These types have different lowerings depending on whether we are lowering them to a value type (arguments and results) or a storage type (a field inside another component type). When lowered to a value type, these become i32 values. When lowered to a storage type, these become i8 (bool, {s,u}8) and i16 values ({s,u}16).

Example when lowering into values:

;; Component signature.
(type $sig (func (param "x" u8) (param "y" s16) (param "z" bool)))

;; Core signature.
(core type $sig' (func (param i32 i32 i32)))

Example when lowering into storage types:

;; Component types.
(type $tup (tuple bool s8 u16))

;; A valid core type lowering of those component types.
(core type $tup' (struct (field i8) (field i8) (field i16)))

`{s,u}{32,64}`, `f{32,64}`, and `char`

These component types are lowered to the same corresponding core type as in the linear memory version of the canonical ABI, regardless of whether we are lowering into a value type or a storage type.
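The scalar rules so far amount to a lookup keyed on the component type and the position being lowered into. The sketch below encodes the scalar rows of the summary table; `lower_scalar` and the two dict names are hypothetical.

```python
# Value-type lowering for scalars (direct arguments and results).
VALUE_TYPE = {
    "bool": "i32", "s8": "i32", "u8": "i32", "s16": "i32", "u16": "i32",
    "s32": "i32", "u32": "i32", "s64": "i64", "u64": "i64",
    "f32": "f32", "f64": "f64", "char": "i32",
}

# Narrow types get a packed storage type inside struct fields and arrays.
PACKED_STORAGE_TYPE = {
    "bool": "i8", "s8": "i8", "u8": "i8", "s16": "i16", "u16": "i16",
}


def lower_scalar(component_type: str, position: str) -> str:
    """Return the core type for a scalar at position "value" or "storage"."""
    if position not in ("value", "storage"):
        raise ValueError(f"unknown position: {position}")
    if position == "storage" and component_type in PACKED_STORAGE_TYPE:
        return PACKED_STORAGE_TYPE[component_type]
    return VALUE_TYPE[component_type]


# (param "x" u8) (param "y" s16) (param "z" bool) lowered as values.
print([lower_scalar(t, "value") for t in ("u8", "s16", "bool")])
# (tuple bool s8 u16) lowered field by field into storage types.
print([lower_scalar(t, "storage") for t in ("bool", "s8", "u16")])
```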

`string`

string types are lowered to references to core arrays of code units. When using the UTF-8 encoding, that means (array i8). When using the UTF-16 encoding, that means (array i16). The compact latin1 and UTF-16 hybrid encoding lowers to an array of the raw encoded bytes: (array i8).

The reference may be either nullable or non-nullable; calls will trap if the reference is null at runtime. The referenced type may be an arbitrary subtype of another array type and may be defined within an arbitrary rec group.

;; Component signature.
(type $sig (func (param "s" string)))

;; A valid core lowering, assuming UTF-8 encoding.
(core type $string' (array i8))
(core type $sig' (func (param (ref null $string'))))
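To make the array contents concrete, the sketch below computes the elements a string would occupy under the UTF-8 and UTF-16 encodings: one array element per encoded unit (a byte for UTF-8, a 16-bit unit for UTF-16). `ELEMENT_TYPE` and `to_code_units` are hypothetical names, and the latin1+utf16 hybrid is deliberately left out because its byte layout is not described here.

```python
# Array element type chosen by the string-encoding canonical option.
ELEMENT_TYPE = {"utf8": "i8", "utf16": "i16", "latin1+utf16": "i8"}


def to_code_units(text: str, encoding: str) -> list:
    """Return the array elements for `text` under the given encoding."""
    if encoding == "utf8":
        return list(text.encode("utf-8"))
    if encoding == "utf16":
        raw = text.encode("utf-16-le")
        return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
    raise ValueError(f"encoding not covered by this sketch: {encoding}")


print(to_code_units("\u00e9", "utf8"))       # prints: [195, 169]
print(to_code_units("\U0001F600", "utf16"))  # prints: [55357, 56832]
```

A single character can therefore span several array elements, so the array length is generally not the number of characters in the string.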

`record` and `tuple`

Component record and tuple types lower to references to core struct types where the component type's fields are point-wise lowered into core storage types that become the core struct's fields.

The reference may be either nullable or non-nullable; calls will trap if the reference is null at runtime. The referenced type may be an arbitrary subtype of another struct type and may be defined within an arbitrary rec group.

;; Component types.
(type $point (tuple s32 s32))
(type $triangle (record (field "x" $point)
                        (field "y" $point)
                        (field "z" $point)))
(type $sig (func (param "a" $triangle)))

;; A valid core type lowering of those component types.
(core type $point' (struct (field i32) (field i32)))
(core type $triangle' (struct (field $x' (ref $point'))
                              (field $y' (ref $point'))
                              (field $z' (ref $point'))))
(core type $sig' (func (param (ref $triangle'))))
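The point-wise check for records and tuples might look roughly like the following. This is a sketch under stated assumptions: the tuple-based encodings of component and core types are invented for illustration, only a handful of scalars are covered, and subtyping and rec groups are ignored because the rules above leave them unconstrained.

```python
# Hypothetical encodings used only in this sketch:
#   component types: "s32" | ("tuple", [types]) | ("record", [(name, type)])
#   core types:      "i32" | ("ref", nullable, ("struct", [(mutable, storage_type)]))
VALUE_TYPE = {"bool": "i32", "u8": "i32", "s32": "i32", "u32": "i32", "f32": "f32"}
PACKED_STORAGE_TYPE = {"bool": "i8", "u8": "i8"}


def matches(component_type, core_type, position="value"):
    """Return True if component_type may lower to core_type at this position."""
    if isinstance(component_type, str):
        expected = VALUE_TYPE[component_type]
        if position == "storage":
            expected = PACKED_STORAGE_TYPE.get(component_type, expected)
        return core_type == expected

    kind, members = component_type
    if kind == "record":
        members = [member_type for _name, member_type in members]
    elif kind != "tuple":
        raise ValueError(f"type kind not covered by this sketch: {kind}")

    # Nullability of the reference and mutability of the fields are both
    # left to the component, so neither is constrained here.
    if not (isinstance(core_type, tuple) and core_type[0] == "ref"):
        return False
    _ref, _nullable, target = core_type
    if not (isinstance(target, tuple) and target[0] == "struct"):
        return False
    fields = target[1]
    if len(fields) != len(members):
        return False
    return all(
        matches(member, storage_type, "storage")
        for member, (_mutable, storage_type) in zip(members, fields)
    )


POINT = ("tuple", ["s32", "s32"])
TRIANGLE = ("record", [("x", POINT), ("y", POINT), ("z", POINT)])

REF_POINT = ("ref", False, ("struct", [(False, "i32"), (False, "i32")]))
REF_TRIANGLE = ("ref", False, ("struct", [(False, REF_POINT)] * 3))

print(matches(TRIANGLE, REF_TRIANGLE))  # prints: True
```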

Discussion: I have not implemented width- or depth-subtyping for records and tuples, but it is intended that GC lowering can be straightforwardly extended to allow such subtyping in the future. We will, of course, need to take care to ensure that variance is sound.

`variant`

Component variant types are matched against a reference to a struct type with no fields.

When passing a case value, the value must be an instance of a subtype of the specified core struct type. That subtype must have fields that correspond to the case's payload type lowered into a storage type, if any. All case subtypes must be defined in the same rec group, in the same order as the variant's cases.

;; Component type and signature.

(type $animal (variant (case $cat "cat" u32)
                       (case $dog "dog" u16)
                       (case $rabbit "rabbit" u16)))

(type $sig (func (param "animal" $animal)))

;; A valid core lowering of that component type and signature.

(core type $animal' (sub (struct)))
(core rec
  (type $cat' (sub $animal' (struct (field (mut i32)))))
  (type $dog' (sub $animal' (struct (field (mut i16)))))
  (type $rabbit' (sub $animal' (struct (field (mut i16))))))

(core type $sig' (func (param (ref $animal'))))

Discussion: The requirement that all case subtypes are defined in the same rec group allows discriminating between cases with structurally identical payloads.
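A rough model of that discrimination is sketched below. Each case type is identified by its rec group and its position within it, so two cases with identical payloads still compare unequal. The names `CoreStruct`, `Trap`, and `case_name` are hypothetical, and trapping on an unrecognized type is an assumption made by analogy with the option and result rules.

```python
from typing import NamedTuple


class CoreStruct(NamedTuple):
    rec_group: str  # hypothetical label standing in for the rec group's identity
    index: int      # position of the type within that rec group
    fields: tuple   # storage types of the payload fields


class Trap(Exception):
    """Models a runtime trap at the component boundary."""


CASE_NAMES = ["cat", "dog", "rabbit"]
CASE_TYPES = [
    CoreStruct("animal-cases", 0, ("i32",)),  # $cat'
    CoreStruct("animal-cases", 1, ("i16",)),  # $dog'
    CoreStruct("animal-cases", 2, ("i16",)),  # $rabbit'
]


def case_name(value_type: CoreStruct) -> str:
    """Map the runtime type of a value to the variant case it represents."""
    for name, case_type in zip(CASE_NAMES, CASE_TYPES):
        if value_type == case_type:
            return name
    raise Trap("value is not an instance of any case subtype")


# $dog' and $rabbit' have the same payload but remain distinguishable.
print(case_name(CASE_TYPES[1]), case_name(CASE_TYPES[2]))  # prints: dog rabbit
```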

Discussion: The way we define the exact shape of each case type above is less flexible than other GC lowerings, for example record lowerings, where an arbitrary type in an arbitrary rec group can be specified and the only requirement upon the specified type is that it structurally match the component type. The core case types cannot all be defined in the same rec group according to the above rules, for example. Unlike other types, for sum types we don't have a core type in the (core-type ...) canonical option's specified core function type that the user can use to specify exactly which type a particular case of a variant should be lowered into. Therefore, we define the above rules prescribing the exact types of each lowered case value.

An alternative approach would be to define a new canonical option that allows the user to specify the exact core types that each component case value is lowered into:

(case-core-type $my-component-case $my-core-type)

The $my-component-case would name a component variant's case and $my-core-type would name a core struct type whose fields match the associated case's payload's lowering and which is a subtype of the struct type that appears in the (core-type ...) canonical option's core function type's corresponding argument or nested type reference location.

(result and option types would probably require defining additional canonical options as well, because they do not have name-able cases).

It should be noted that this approach is still not quite as flexible as record lowering, because all instances of the case value would always be lowered to the same core type, whereas two record arguments in the same function call, for example, can be lowered to two different core struct types so long as they both structurally match. In practice, this constraint would probably not be onerous. The primary reason this approach was considered and rejected is its verbosity: every sum type in the transitive type reference closure would require specifying a (case-core-type ...) canonical option.

`list<T>`

Component list<T> types are lowered to a reference to a core array. The array's element type must match the lowering of T into a storage type.

;; Component type.
(type $bools (list bool))
(type $sig (func (param "x" $bools)))

;; A valid lowering to core types.
(core type $bools' (array (mut i8)))
(core type $sig' (func (param (ref null $bools'))))

`flags` and `enum`

Component flags and enum types are lowered in the same manner as in the linear memory version of the canonical ABI, regardless of whether we are lowering into a value type or a storage type.

`option`

Component option types are lowered to a reference to an empty core struct type. This struct type must not be final.

When passing a none value, the value must be an instance of a subtype of the specified core struct type which does not define any additional fields. This type must be defined as the first type in a rec group consisting of this type followed by the some value's type.

When passing a some value, the value must be an instance of a subtype of the specified struct type that has a field containing the option's inner type lowered into a storage type. This type must be defined as the second type in a rec group consisting of the none value's type followed by the some value's type.

If the value is not an instance of one of the two above subtypes, then the call will trap at runtime.

;; Component types.
(type $opt (option u32))
(type $sig (func (param "x" $opt)))

;; A valid core type lowering.
(core type $opt' (sub (struct)))
(core rec
  (type $opt'-none (sub $opt' (struct)))
  (type $opt'-some (sub $opt' (struct (field (mut i32))))))
(core type $sig' (func (param (ref null $opt'))))
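The runtime side of these rules can be sketched with Python classes standing in for the three core types. `Opt`, `OptNone`, `OptSome`, `Trap`, and `lift_option` are hypothetical names, and the exact-type comparison is a simplification of the "instance of one of the two subtypes" check.

```python
class Trap(Exception):
    """Models a runtime trap at the component boundary."""


class Opt:
    """Models $opt', the empty, non-final base struct."""


class OptNone(Opt):
    """Models $opt'-none: no additional fields."""


class OptSome(Opt):
    """Models $opt'-some: one field holding the lowered payload."""

    def __init__(self, payload):
        self.payload = payload


def lift_option(value):
    """Interpret a core value passed where (ref null $opt') is expected."""
    if type(value) is OptNone:
        return ("none",)
    if type(value) is OptSome:
        return ("some", value.payload)
    # Null references and bare instances of the base type match neither case.
    raise Trap("value is neither the none subtype nor the some subtype")


print(lift_option(OptNone()), lift_option(OptSome(7)))  # prints: ('none',) ('some', 7)
```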

Discussion: Note that we do not represent (option T) component types with (ref null T') core types, where a null reference represents the none value and a non-null reference represents some values. There are two reasons for this:

Scalar types are not heap types, so (option u32) could not become (ref null u32) because the latter is invalid. We would have to require the definition of a (struct $s (field (mut i32))) type and then lower to (ref null $s) instead. This adds undesirable special-casing to the lowering rules.

Using null references to represent none values is not composable. When dealing with (option (option T)), if null references represented none values, then a null value would be ambiguous: does it mean none or some(none)? This ambiguity and composability issue could be resolved with additional special-casing, similar to the previous point, but such special-casing implies additional undesirable complexity.
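The ambiguity can be demonstrated with two toy encoders. Both are hypothetical and operate on options written as ("none",) or ("some", payload): `encode_nullable` models the rejected null-based scheme, where some(x) is just the encoding of x, and `encode_struct` models the struct-based scheme, where every option level gets its own wrapper.

```python
def is_option(value) -> bool:
    """True for the toy option encodings ("none",) and ("some", payload)."""
    return isinstance(value, tuple) and len(value) > 0 and value[0] in ("none", "some")


def encode_nullable(value):
    """Rejected scheme: none is a null reference, some(x) is x's encoding."""
    if not is_option(value):
        return value
    if value[0] == "none":
        return None
    return encode_nullable(value[1])


def encode_struct(value):
    """Struct-based scheme: each option level is a distinct wrapper value."""
    if not is_option(value):
        return value
    if value[0] == "none":
        return ("opt-none",)
    return ("opt-some", encode_struct(value[1]))


outer_none = ("none",)
some_of_none = ("some", ("none",))

print(encode_nullable(outer_none), encode_nullable(some_of_none))  # prints: None None
print(encode_struct(outer_none) == encode_struct(some_of_none))    # prints: False
```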

`result`

Component result types are lowered to a reference to a core struct type without any fields. This struct type must not be final.

When passing a success value, the value must be an instance of a subtype of the specified struct type. This subtype must define fields corresponding to the lowering of the result's success case into a storage type, if any. This type must be defined as the first core type in a rec group consisting of this subtype followed by the result's error-case subtype.

When passing an error value, the value must be an instance of a subtype of the specified core struct type. This subtype must define fields corresponding to the lowering of the result's error case into a storage type, if any. This type must be defined as the second core type in a rec group consisting of the result's success-case subtype followed by this subtype.

If the value is not an instance of one of the two above subtypes, then the call will trap at runtime.

;; Component type.
(type $res (result u32 (error bool)))
(type $sig (func (param "x" $res)))

;; A valid core type lowering.
(core type $res' (sub (struct)))
(core rec
  (type $res'-success (sub $res' (struct (field i32))))
  (type $res'-error (sub $res' (struct (field i8)))))
(core type $sig' (func (param (ref null $res'))))

`own`, `borrow`, `future`, `stream`, and `error-context`

These component types are lowered into references to core extern heap types.

The reference may be either nullable or non-nullable; calls will trap if the reference is null at runtime.

;; Component types and signature.

(type $file (resource (rep i32)))
(type $future-bool (future bool))
(type $stream-u32 (stream u32))

(type $sig (func (param "a" (own $file))
                 (param "b" (borrow $file))
                 (param "c" $future-bool)
                 (param "d" $stream-u32)))

;; A valid core type lowering.

(core type $sig' (func (param (ref extern) (ref null extern) (ref extern) (ref null extern))))
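Because the core signature may declare these parameters as nullable while a null reference still traps, some check has to happen at call time. A minimal sketch of such a check follows; `Trap` and `check_handle_args` are hypothetical names, and Python's None stands in for a null extern reference.

```python
class Trap(Exception):
    """Models a runtime trap at the component boundary."""


def check_handle_args(args):
    """Reject null references among handle arguments, whatever the declared nullability."""
    for index, handle in enumerate(args):
        if handle is None:
            raise Trap(f"argument {index} is a null reference")
    return list(args)


# Four opaque handles for the (own, borrow, future, stream) parameters.
print(check_handle_args(["file", "file", "future", "stream"]))
```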

[1] Object. Haha -- get it?? Like GC objects. GET IT!? HAHA WHY AREN'T YOU LAUGHING?!?!?!?!?
