Pre-Proposal: Wasm GC Support in the Canonical ABI
This issue proposes extensions to the Component Model's Canonical ABI for Wasm GC support and describes some of the motivation for particular choices. I am in the process of implementing these extensions in `wasm-tools` and `wasmtime`. My goals are to kick off discussion of how best to integrate GC and the canonical ABI, build consensus, and eventually get these extensions merged into the component model spec itself.
Emoji
Most important things first, I propose that we use the 🛸 emoji (an unidentified flying object[^1]) to represent the Wasm GC extension to the component model in the `Explainer.md`.
The Problem of Choosing Which Core Type to Lower Into
With the introduction of Wasm GC into the component model, a component type can lower to many different core types.
Consider this `record`:
record point {
    x: f32,
    y: f32,
}
The first choice is whether to lower this to linear memory or GC, so we've already entered the world of component-to-core type lowering being one-to-many, and therefore we must give components some way of choosing which version is desired.
But let's assume we've somehow opted into using GC instead of linear memory. A naive translation of the `point` record into a core Wasm GC `struct` type might look something like this:
(type $point (struct (field f32) (field f32)))
Making the fields immutable means that these objects can be passed between components without copying. But if one component's "at rest" representation of a point is mutable, that component is forced to make a copy itself. And if both sides need mutability, then both sides are forced to copy, and we end up with two copies instead of the single copy we would otherwise get when using the linear memory version of the canonical ABI!
Alternatively, we could make the fields mutable:
(type $point (struct (field (mut f32)) (field (mut f32))))
But now if our two components don't require mutability, the engine cannot pass these objects between them without copying because it cannot know that they won't actually be mutated after they're sent across the component boundary (at least, not without a prohibitively expensive global program analysis).
Things get even worse when we consider `rec` groups. To recap, core Wasm types are deduplicated structurally at the granularity of whole `rec` groups. Therefore, two otherwise-identical types that appear in structurally-different `rec` groups are not the same type. Likewise, repetitions of otherwise-identical types within the same `rec` group are not the same type. In the following example, `$a` and `$b` are the same type, but they are different from `$c`, `$d`, and `$e`, which are all their own unique types:
;; Note: `$a` is defined within an implicit singleton `rec` group.
(type $a (struct (field i32)))
;; `$b` is the same type as `$a`, its `rec` group is just explicitly
;; written out.
(rec (type $b (struct (field i32))))
;; `$c` is a distinct type from `$a` because its `rec` group is different.
(rec (type $c (struct (field i32)))
     (type (array i8)))
;; `$d` and `$e` are distinct from `$a` because their `rec` group is
;; different. They are distinct from each other because types are
;; effectively nominal within a `rec` group.
(rec (type $d (struct (field i32)))
     (type $e (struct (field i32))))
If the "at rest" representation inside one of our components is in a non-singleton `rec` group, then the component is forced to copy from the canonical ABI `struct` into its "at rest" `struct`, which is identical other than the `rec` group it is defined within.
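The identity rules above can be modeled in a few lines of Python. This is only an illustrative sketch, not how any engine implements type canonicalization: the tuple encoding of types and the `type_id` helper are hypothetical, and references between types inside a group are ignored for brevity.

```python
# A rec group is modeled as a sequence of structural type descriptions.
# A type's identity is the pair (whole rec group, index within that group).

def type_id(rec_group, index):
    """Return a hashable identity for the type at `index` in `rec_group`."""
    return (tuple(rec_group), index)

struct_i32 = ("struct", ("field", "i32"))
array_i8 = ("array", "i8")

a = type_id([struct_i32], 0)              # implicit singleton rec group
b = type_id([struct_i32], 0)              # explicit singleton rec group
c = type_id([struct_i32, array_i8], 0)    # same struct, different rec group
d = type_id([struct_i32, struct_i32], 0)  # first of two identical entries
e = type_id([struct_i32, struct_i32], 1)  # second of two identical entries

print(a == b)  # True: same group structure, same index
print(a == c)  # False: the enclosing rec groups differ
print(d == e)  # False: same group, different position
```

Under this model, a component whose "at rest" type corresponds to `c` cannot reuse an object of type `a` directly, which is the forced copy described above.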
To summarize these observations:
- There are many core types that a single component type could lower into.
- Minimizing copying requires that the objects going into and coming out of the canonical ABI match the "at rest" representation inside each component.
Therefore, in order to minimize copies, the canonical ABI should (as much as possible) avoid prescribing one particular canonical GC type representation for a given component type; components should be able to choose the core GC types that component types lower to.
New Canonical Options
- We add a `gc` canonical option, which switches the canonical ABI from linear memory mode to GC mode when present.
- To allow lowered component functions' types to match the component's inner core module's "at rest" representation as often as possible, we add a canonical option to specify the core function type used during function lowering:
(canon lower (func $f) ... (core-type $my_core_function_type))
The `core-type` option must be present if and only if the `gc` option is present.
When present, the component function's type and the given core function type are recursively traversed together, checking that each component type can be lowered to the associated core type.
This allows components to configure which Wasm GC type lowering they desire, minimizing copies in the overall system.
Encoding
`(core-type <idx>)` canonical option:
`gc` canonical option:
GC Type Lowerings
We recurse over the component signature: for each argument and return, and then for each nested field and element of its type, we check that the corresponding part of the core signature (as specified by the `(core-type ...)` canonical option) matches according to the following rules.
Some of the lowerings differ depending on whether we are lowering into either
- a core value type (for direct arguments and returns), or
- a struct field or array element storage type (for nested types like the `u8` in a `list<u8>`).
The lowerings are summarized in the following table; more details follow.
Component Type | Value Type | Storage Type |
---|---|---|
bool | i32 | i8 |
s8 | i32 | i8 |
u8 | i32 | i8 |
s16 | i32 | i16 |
u16 | i32 | i16 |
s32 | i32 | - |
u32 | i32 | - |
s64 | i64 | - |
u64 | i64 | - |
f32 | f32 | - |
f64 | f64 | - |
char | i32 | - |
string when string-encoding=utf8 | (ref null? (array (mut? i8))) | - |
string when string-encoding=utf16 | (ref null? (array (mut? i16))) | - |
string when string-encoding=latin1+utf16 | (ref null? (array (mut? i8))) | - |
(record ...T) | (ref null? (struct ...(field (mut? T')))) | - |
(variant ...) | (ref null? (struct)) | - |
(list T) | (ref null? (array (mut? T'))) | - |
(tuple ...T) | (ref null? (struct ...(field (mut? T')))) | - |
(flags ...) | i32 | - |
(enum ...) | i32 | - |
(option ...) | (ref null? (struct)) | - |
(result ...) | (ref null? (struct)) | - |
(own ...) | externref | - |
(borrow ...) | externref | - |
(future ...) | externref | - |
(stream ...) | externref | - |
error-context | externref | - |
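As a rough illustration of the scalar rows of the table, the lookup can be written as a small Python function. This is a sketch under one assumption: a `-` in the Storage Type column is read as "same as the value type", which matches the prose for `{s,u}{32,64}`, `f{32,64}`, and `char`. The names `lower_scalar`, `PACKED_STORAGE`, and `VALUE_TYPES` are hypothetical.

```python
# Scalars that pack more tightly when stored in a struct field or array element.
PACKED_STORAGE = {"bool": "i8", "s8": "i8", "u8": "i8", "s16": "i16", "u16": "i16"}

# Value-type lowering for every scalar component type in the table.
VALUE_TYPES = {
    "bool": "i32", "s8": "i32", "u8": "i32", "s16": "i32", "u16": "i32",
    "s32": "i32", "u32": "i32", "s64": "i64", "u64": "i64",
    "f32": "f32", "f64": "f64", "char": "i32",
}

def lower_scalar(component_type, storage=False):
    """Return the core type a scalar component type lowers to."""
    if storage and component_type in PACKED_STORAGE:
        return PACKED_STORAGE[component_type]
    return VALUE_TYPES[component_type]

# Direct parameters use value types.
print([lower_scalar(t) for t in ("u8", "s16", "bool")])
# Nested fields use storage types.
print([lower_scalar(t, storage=True) for t in ("bool", "s8", "u16")])
```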
`bool` and `{s,u}{8,16}`
These types have different lowerings depending on whether we are lowering them to a value type (arguments and results) or a storage type (a field inside another component type). When lowered to a value type, these become `i32` values. When lowered to a storage type, these become `i8` values (`bool`, `{s,u}8`) or `i16` values (`{s,u}16`).
Example when lowering into values:
;; Component signature.
(type $sig (func (param "x" u8) (param "y" s16) (param "z" bool)))
;; Core signature.
(core type $sig' (func (param i32 i32 i32)))
Example when lowering into storage types:
;; Component types.
(type $tup (tuple bool s8 u16))
;; A valid core type lowering of those component types.
(core type $tup' (struct (field i8) (field i8) (field i16)))
`{s,u}{32,64}`, `f{32,64}`, and `char`
These component types are lowered to the same core type as in the linear memory version of the canonical ABI, regardless of whether we are lowering into a value type or a storage type.
`string`
`string` types are lowered to references to core `array`s of code units. When using the UTF-8 encoding, that means `(array i8)`. When using the UTF-16 encoding, that means `(array i16)`. The compact latin1 and UTF-16 hybrid encoding lowers to an array of the raw encoded bytes: `(array i8)`.
The reference may be either nullable or non-nullable; calls will trap if the reference is null at runtime. The referenced type may be an arbitrary subtype of another `array` type and may be defined within an arbitrary `rec` group.
;; Component signature.
(type $sig (func (param "s" string)))
;; A valid core lowering, assuming UTF-8 encoding.
(core type $string' (array i8))
(core type $sig' (func (param (ref null $string'))))
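To make the array contents concrete, here is a small Python sketch of the elements such an array would hold for the UTF-8 and UTF-16 encodings. The helper name `string_to_code_units` is hypothetical, and the latin1+utf16 hybrid encoding is deliberately left out.

```python
def string_to_code_units(s, encoding):
    """Return the list of array elements for `s` under the given string encoding."""
    if encoding == "utf8":
        # One element per byte: fits an (array i8).
        return list(s.encode("utf-8"))
    if encoding == "utf16":
        # One element per 16-bit code unit: fits an (array i16).
        raw = s.encode("utf-16-le")  # little-endian, no byte-order mark
        return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
    raise ValueError(f"unsupported encoding in this sketch: {encoding}")

print(string_to_code_units("hé", "utf8"))   # [104, 195, 169]
print(string_to_code_units("hé", "utf16"))  # [104, 233]
```

Note that the array length is the number of code units, not the number of characters: a character outside the Basic Multilingual Plane occupies four elements in UTF-8 and two in UTF-16.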
`record` and `tuple`
Component `record` and `tuple` types lower to references to core `struct` types where the component type's fields are point-wise lowered into core storage types that become the core `struct`'s fields.
The reference may be either nullable or non-nullable; calls will trap if the reference is null at runtime. The referenced type may be an arbitrary subtype of another `struct` type and may be defined within an arbitrary `rec` group.
;; Component types.
(type $point (tuple s32 s32))
(type $triangle (record (field "x" $point) (field "y" $point) (field "z" $point)))
(type $sig (func (param "a" $triangle)))
;; A valid core type lowering of those component types.
(core type $point' (struct (field i32) (field i32)))
(core type $triangle' (struct (field $x' (ref $point')) (field $y' (ref $point')) (field $z' (ref $point'))))
(core type $sig' (func (param (ref $triangle'))))
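The point-wise matching for records and tuples can be sketched as a recursive check. This Python model is an assumption-laden illustration, not the wasm-tools implementation: the tuple encodings of component and core types and the names `matches` and `scalar_lowering` are hypothetical, and subtyping and `rec` groups are ignored.

```python
PACKED = {"bool": "i8", "s8": "i8", "u8": "i8", "s16": "i16", "u16": "i16"}
VALUE = {"s32": "i32", "u32": "i32", "s64": "i64", "u64": "i64",
         "f32": "f32", "f64": "f64", "char": "i32"}

def scalar_lowering(ty, storage):
    """Lower a scalar component type to a core value or storage type."""
    if ty in PACKED:
        return PACKED[ty] if storage else "i32"
    return VALUE[ty]

def matches(component, core, storage=False):
    """Check whether `component` may lower to `core`.

    Component types: a scalar name, ("tuple", [types]) or ("record", [(name, type)]).
    Core types: a scalar name or ("ref", nullable, ("struct", [(mutable, type)])).
    """
    if isinstance(component, str):
        return core == scalar_lowering(component, storage)
    kind, fields = component
    if kind not in ("record", "tuple"):
        raise NotImplementedError(kind)
    field_types = [f[1] for f in fields] if kind == "record" else list(fields)
    if not (isinstance(core, tuple) and core[0] == "ref"):
        return False
    _, _nullable, target = core  # either nullability is accepted
    if target[0] != "struct" or len(target[1]) != len(field_types):
        return False
    # Either mutability is accepted; nested types are matched as storage types.
    return all(matches(ft, core_ty, storage=True)
               for ft, (_mutable, core_ty) in zip(field_types, target[1]))

point = ("tuple", ["s32", "s32"])
triangle = ("record", [("x", point), ("y", point), ("z", point)])
point_core = ("ref", False, ("struct", [(False, "i32"), (False, "i32")]))
triangle_core = ("ref", False, ("struct", [(False, point_core)] * 3))
print(matches(triangle, triangle_core))  # True
```

Because nullability and mutability are ignored by the check, a component is free to pick whichever combination matches its own "at rest" representation.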
Discussion: I have not implemented width- or depth-subtyping for `record`s and `tuple`s, but it is intended that GC lowering can be straightforwardly extended to allow such subtyping in the future. We will, of course, need to take care to ensure that variance is sound.
`variant`
Component `variant` types are matched against a reference to a `struct` type with no fields.
The reference may be either nullable or non-nullable; calls will trap if the reference is null at runtime. The referenced type may be an arbitrary subtype of another `struct` type and may be defined within an arbitrary `rec` group.
When passing a `case` value, the value must be an instance of a subtype of the specified core `struct` type. That subtype must have fields that correspond to the case's payload type lowered into a storage type, if any. All case subtypes must be defined in the same `rec` group, in the same order as the `variant`'s `case`s.
;; Component type and signature.
(type $animal (variant (case $cat "cat" u32) (case $dog "dog" u16) (case $rabbit "rabbit" u16)))
(type $sig (func (param "animal" $animal)))
;; A valid core lowering of that component type and signature.
(core type $animal' (sub (struct)))
(core rec
  (type $cat' (sub $animal' (struct (field (mut i32)))))
  (type $dog' (sub $animal' (struct (field (mut i16)))))
  (type $rabbit' (sub $animal' (struct (field (mut i16))))))
(core type $sig' (func (param (ref $animal'))))
Discussion: The requirement that all `case` subtypes are defined in the same `rec` group allows discriminating between `case`s with structurally identical payloads.
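The following Python sketch illustrates why position within the `rec` group is enough to tell cases apart. It is a simplified model with hypothetical names (`case_layout_is_valid`, `discriminate`): each case subtype is reduced to its supertype name and its list of field storage types, and field mutability is ignored.

```python
# Storage-type lowerings for the scalar payloads used in this sketch.
STORAGE = {"bool": "i8", "s8": "i8", "u8": "i8", "s16": "i16", "u16": "i16",
           "s32": "i32", "u32": "i32", "s64": "i64", "u64": "i64"}

def case_layout_is_valid(cases, base, rec_group):
    """Check that `rec_group` lists one subtype of `base` per case, in case order.

    cases: list of (case name, payload component type or None).
    rec_group: list of (supertype name, list of field storage types).
    """
    if len(cases) != len(rec_group):
        return False
    for (_name, payload), (supertype, fields) in zip(cases, rec_group):
        expected = [] if payload is None else [STORAGE[payload]]
        if supertype != base or fields != expected:
            return False
    return True

def discriminate(cases, rec_index):
    """Map the position of a value's type in the rec group back to its case name."""
    return cases[rec_index][0]

animal = [("cat", "u32"), ("dog", "u16"), ("rabbit", "u16")]
animal_rec = [("$animal'", ["i32"]), ("$animal'", ["i16"]), ("$animal'", ["i16"])]

print(case_layout_is_valid(animal, "$animal'", animal_rec))  # True
# "dog" and "rabbit" have identical payloads but different positions.
print(discriminate(animal, 1), discriminate(animal, 2))      # dog rabbit
```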
Discussion: The way we define the exact shape of each `case` type above is less flexible than other GC lowerings, for example `record` lowerings, where an arbitrary type in an arbitrary `rec` group can be specified and the only requirement upon the specified type is that it structurally match the component type. The core `case` types cannot each be defined in their own `rec` group under the above rules, for example. Unlike other types, for sum types we don't have a core type in the `(core-type ...)` canonical option's specified core function type that the user can use to specify exactly which type a particular `case` of a `variant` should be lowered into. Therefore, we define the above rules prescribing the exact types of each lowered `case` value.
An alternative approach would be to define a new canonical option that allows the user to specify the exact core types that each component `case` value is lowered into:
(case-core-type $my-component-case $my-core-type)
The `$my-component-case` would name a component `variant`'s `case` and `$my-core-type` would name a core `struct` type whose fields match the associated `case`'s payload's lowering and which is a subtype of the `struct` type that appears in the `(core-type ...)` canonical option's core function type's corresponding argument or nested type reference location.
(`result` and `option` types would probably require defining additional canonical options as well, because they do not have name-able cases.)
It should be noted that this approach is still not quite as flexible as `record` lowering, because all instances of the `case` value would always be lowered to the same core type, whereas two `record` arguments in the same function call, for example, can be lowered to two different core `struct` types so long as they both structurally match. In practice, this constraint would probably not be onerous. The primary reason this approach was considered and rejected is its verbosity: every sum type in the transitive type reference closure would require specifying a `(case-core-type ...)` canonical option.
`list<T>`
Component `list<T>` types are lowered to a reference to a core `array`. The `array`'s element type must match the lowering of `T` into a storage type.
The reference may be either nullable or non-nullable; calls will trap if the reference is null at runtime. The referenced type may be an arbitrary subtype of another `array` type and may be defined within an arbitrary `rec` group.
;; Component type.
(type $bools (list bool))
(type $sig (func (param "x" $bools)))
;; A valid lowering to core types.
(core type $bools' (array (mut i8)))
(core type $sig' (func (param (ref null $bools'))))
`flags` and `enum`
Component `flags` and `enum` types are lowered in the same manner as in the linear memory version of the canonical ABI, regardless of whether we are lowering into a value type or a storage type.
`option`
Component `option` types are lowered to a reference to an empty core `struct` type. This `struct` type must not be `final`.
The reference may be either nullable or non-nullable; calls will trap if the reference is null at runtime. The referenced type may be an arbitrary subtype of another `struct` type and may be defined within an arbitrary `rec` group.
When passing a none value, the value must be an instance of a subtype of the specified core `struct` type which does not define any additional fields. This type must be defined as the first type in a `rec` group consisting of this type followed by the some value's type.
When passing a some value, the value must be an instance of a subtype of the specified `struct` type that has a field containing the `option`'s inner type lowered into a storage type. This type must be defined as the second type in a `rec` group consisting of the none value's type followed by the some value's type.
If the value is not an instance of one of the two above subtypes, then the call will trap at runtime.
;; Component types.
(type $opt (option u32))
(type $sig (func (param "x" $opt)))
;; A valid core type lowering.
(core type $opt' (sub (struct)))
(core rec
  (type $opt'-none (sub $opt' (struct)))
  (type $opt'-some (sub $opt' (struct (field (mut i32))))))
(core type $sig' (func (param (ref null $opt'))))
Discussion: Note that we do not represent `(option T)` component types with `(ref null T')` core types, where a null reference represents the none value and a non-null reference represents some values. There are two reasons for this:
- Scalar types are not heap types, so `(option u32)` could not become `(ref null u32)` because the latter is invalid. We would have to require the definition of a `(struct $s (field (mut u32)))` type and then lower to `(ref null $s)` instead. This adds undesirable special-casing to the lowering rules.
- Using null references to represent none values is not composable. When dealing with `(option (option T))`, if null references represented none values, then a null value would be ambiguous: does it mean `none` or `some(none)`? This ambiguity and composability issue could be resolved with additional special-casing, similar to the previous point, but such special-casing implies additional undesirable complexity.
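The composability problem can be demonstrated with a toy Python model. Everything here is hypothetical and only meant to illustrate the argument: `Some` stands for a component-level some value, Python's `None` stands both for the component-level none and for a null reference, and the tuples stand in for GC struct instances.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Some:
    value: object

def lower_with_null(opt):
    """Rejected scheme: none becomes null, some(x) becomes the lowering of x."""
    if opt is None:
        return None
    inner = opt.value
    # A nested option is lowered by the same scheme; other payloads are boxed.
    if inner is None or isinstance(inner, Some):
        return lower_with_null(inner)
    return ("box", inner)

def lower_with_subtypes(opt):
    """Proposed scheme: distinct none and some struct subtypes."""
    if opt is None:
        return ("none",)
    inner = opt.value
    if inner is None or isinstance(inner, Some):
        return ("some", lower_with_subtypes(inner))
    return ("some", inner)

# With null-as-none, `none` and `some(none)` collapse to the same value.
print(lower_with_null(None) == lower_with_null(Some(None)))          # True
# With dedicated subtypes, they stay distinct.
print(lower_with_subtypes(None) == lower_with_subtypes(Some(None)))  # False
```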
`result`
Component `result` types are lowered to a reference to a core `struct` type without any fields. This `struct` type must not be `final`.
The reference may be either nullable or non-nullable; calls will trap if the reference is null at runtime. The referenced type may be an arbitrary subtype of another `struct` type and may be defined within an arbitrary `rec` group.
When passing a success value, the value must be an instance of a subtype of the specified `struct` type. This subtype must define fields corresponding to the lowering of the `result`'s success case into a storage type, if any. This type must be defined as the first core type in a `rec` group consisting of this subtype followed by the `result`'s error-case subtype.
When passing an error value, the value must be an instance of a subtype of the specified core `struct` type. This subtype must define fields corresponding to the lowering of the `result`'s `error` case into a storage type, if any. This type must be defined as the second core type in a `rec` group consisting of the `result`'s success-case subtype followed by this subtype.
If the value is not an instance of one of the two above subtypes, then the call will trap at runtime.
;; Component type.
(type $res (result u32 (error bool)))
(type $sig (func (param "x" $res)))
;; A valid core type lowering.
(core type $res' (sub (struct)))
(core rec
  (type $res'-success (sub $res' (struct (field i32))))
  (type $res'-error (sub $res' (struct (field i8)))))
(core type $sig' (func (param (ref null $res'))))
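The runtime behavior described for `result` values (trap on null, trap on anything that is not one of the two case subtypes) can be sketched in Python. This is a hypothetical model, not engine code: a GC reference is represented as `None` or a `(type name, payload)` pair, and the `Trap` exception stands in for a Wasm trap.

```python
class Trap(Exception):
    """Stands in for a Wasm trap."""

# The two case subtypes, in rec-group order: success first, then error.
RESULT_REC_GROUP = ("$res'-success", "$res'-error")

def lift_result(ref):
    """Classify a lowered result value, trapping on invalid inputs."""
    if ref is None:
        raise Trap("null reference")
    type_name, payload = ref
    if type_name == RESULT_REC_GROUP[0]:
        return ("ok", payload)
    if type_name == RESULT_REC_GROUP[1]:
        return ("error", payload)
    raise Trap(f"{type_name} is not one of the two result subtypes")

print(lift_result(("$res'-success", 5)))  # ('ok', 5)
print(lift_result(("$res'-error", 1)))    # ('error', 1)
```

An instance of the base type `$res'` itself is neither case, so in this model it traps just like a null reference.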
`own`, `borrow`, `future`, `stream`, and `error-context`
These component types are lowered into references to core `extern` heap types.
The reference may be either nullable or non-nullable; calls will trap if the reference is null at runtime.
;; Component types and signature.
(type $file (resource (rep i32)))
(type $future-bool (future bool))
(type $stream-u32 (stream u32))
(type $sig (func (param "a" (own $file)) (param "b" (borrow $file)) (param "c" $future-bool) (param "d" $stream-u32)))
;; A valid core type lowering.
(core type $sig' (func (param (ref extern) (ref null extern) (ref extern) (ref null extern))))
[^1]: Object. Haha -- get it?? Like GC objects. GET IT!? HAHA WHY AREN'T YOU LAUGHING?!?!?!?!?