Wasm needs a better memory management story · Issue #1397 · WebAssembly/design
Hi all,
after a video call with Google last week, I was encouraged to start a conversation here about the issues we at Unity have with Wasm memory allocation.
The short summary is that Wasm currently has severe limitations that make many applications infeasible to deploy reliably on mobile browsers. I stress the word reliably: things may work on some devices for some percentage of the users you deploy to, depending on how much memory your wasm page needs, but as your application's memory needs grow, the percentage of users you can reach can fall dramatically.
These issues already occur when the Wasm page uses only a fraction of the device's total RAM (e.g. at 300MB-500MB).
These issues have been raised as browser issues, but the underlying theme is recognizing that the wasm spec is not robust enough for mobile deployment to customers.
These troubles stem from the following limitations:
- No way to control in a guaranteed fashion when new memory commit vs address space reserve occurs.
- No way to uncommit previously used memory pages.
- No way to shrink the allocated Wasm Memory.
- No virtual memory support (leading applications to either expect to always be able to grow, or have to implement memory defrag solutions)
- If Memory is Shared, then application needs to know the Maximum memory size ahead of time, or gratuitously reserve all that it can.
So basically the Wasm memory story is "you can only grab more memory, with no guarantee whether the memory you got is a reserve or a commit".
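To make the limitations concrete, here is a sketch of essentially the entire size-management surface the JS API exposes today (the page counts are illustrative, not recommendations):

```javascript
// A sketch of the entire size-management surface of the JS API today.
const PAGE_SIZE = 65536; // Wasm pages are 64 KiB

// Non-shared Memory: maximum is optional; whether the `initial` pages are
// committed or merely reserved is not observable or guaranteed by the spec.
const mem = new WebAssembly.Memory({ initial: 16 });

// grow() is the only size operation: it can only increase the size, and
// again gives no commit-vs-reserve guarantee. There is no shrink().
mem.grow(16); // 16 -> 32 pages

// Shared Memory: a maximum MUST be supplied up front, forcing the developer
// to either know the bound in advance or reserve gratuitously.
const shared = new WebAssembly.Memory({ initial: 16, maximum: 1024, shared: true });

console.log(mem.buffer.byteLength / PAGE_SIZE); // 32
```

That is the whole toolbox: one constructor and one grow operation, with the commit/reserve decision left entirely to the engine.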
These are not newly recognized issues; the memory model has been the same since the MVP, and we have been dealing with these ever since the early asm.js days. But applications are becoming more complex, developers' expectations of what types of applications they can deploy on which devices are growing, and developers are aiming to ship to paying customers, where reliability needs to be near 100%. As a result, we are now seeing hard ceilings from this issue in the wild.
Note that listing the limitations above does not imply that the fix would be for the wasm spec to somehow add support for all of them; the point is to set the stage that these are the limitations that exist, since it is their combination that causes the headache for developers.
The way Wasm VM implementations seem to tackle these issues is to try to be smart/automatic under the hood about reserve vs commit behavior, especially around shared vs non-shared memory. However, it is still the application developer's responsibility to navigate the app through the low-memory landscape, and this leads to developers needing to "decipher" the VM's commit-vs-reserve behavior patterns outside the spec. For an example of the vendor-specific suggestions this leads to, see https://bugs.chromium.org/p/chromium/issues/detail?id=1175564#c7 .
On desktop, the Wasm spec memory issues have so far fallen in the "awkward" category at most, because i) all OSes and browsers have completed the migration to 64-bit already, ii) desktops can afford large 16GB+ RAM sizes (and RAM is expandable on many desktops), and iii) desktops have large disks for the OS to swap pages out to, so even large numbers of committed pages may not be the end of the world (just "awkward"), especially if they go unused for the most part.
On mobile, none of that is true.
Note that the wasm memory64 proposal does not relate to or solve this problem. That proposal is about letting applications use more than 4GB of memory, whereas this issue is about Wasm applications not being able to safely manage much smaller amounts of memory on mobile devices. (If anything, the opposite is true: attempting to deploy wasm64 on mobile devices would cause even more issues.)
Currently, allocating more than ~300MB of memory is not reliable in Chrome on Android without resorting to Chrome-specific workarounds, nor in Safari on iOS. As per the suggestions in the Chromium thread, applications should either know up front at compile time how much memory they will need, or gratuitously reserve everything they can. Neither of these suggestions is viable.
Why Wasm requires developers to know the needed memory size at compile time
The Wasm spec says that one can conveniently set the initial memory size to what is needed to launch, and then grow when the situation demands it. Setting a maximum is optional, to allow for unbounded growth. On paper this suggests that developers need not know how much memory they need at compile time.
Reality is quite different, for the following reasons:
- in the wild we have reports that memory allocation success rates can be better when one initially allocates K MB, versus first allocating less and later trying to grow to K MB. The conversation in https://bugs.chromium.org/p/chromium/issues/detail?id=1175564#c7 also suggests this.
- if shared memory is used, one does need to know an upper bound for the maximum memory usage.
- since an application needs to account for the largest memory usage it may ever need (or it will fail at some point in its lifetime), in practice initial == maximum memory,
- one cannot set a gratuitous upper bound, since that can cause the allocation to fail,
- one cannot probe for the largest upper bound that works in practice, since the probing can suffocate the browser or cause other JS allocations to fail.
In practice, especially on memory-constrained devices, the current spec requires developers to somehow "just know" how much memory will be needed.
Why expecting developers to set memory size at compile time is not feasible
With respect to memory usage patterns, there are generally three types of apps/app workloads:
- app workloads that use unknown amounts of memory (AutoCAD/OpenOffice/etc. document editors with "bring your own workload"),
- app workloads that use varying amounts of memory ("game menu needs 100MB, game level 1 800MB, game level 2 400MB, etc."),
- app workloads that need a known, constant amount of memory.
App developers cannot know the wasm memory size of apps of the first type. To accommodate every user's workload, they must generally reserve everything they can, and this has problems:
- if one sets a huge 4GB max memory size, the VM may not allow that allocation, failing the app from starting even for users that would have only needed 1GB,
- if one probes for the largest max memory size the VM will accept, the probing can cause the browser to kill the page immediately because it thinks the page is using too much memory; or, if the probing succeeds, it can cause the app to fail later on some other JS memory allocation, since the wasm allocation took up all the memory available to the web page. (Or it can cause the browser itself to fail, as we saw with the Chrome GPU thread on Epic's UE4 Zen Garden demo.)
- even if an application does find a suitable size that avoids the above issues, after the user unloads their document the web page is unable to release that memory back to the system. On desktop the thinking may be that this does not matter, but on mobile it is critical to be able to release unused memory, or the OS will be more eager to kill you when task switching.
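The probing approach criticized above can be sketched as follows. To be clear, this is the anti-pattern, not a recommendation, and the page bounds are made-up defaults:

```javascript
// Sketch of the probing anti-pattern: binary-search the largest
// WebAssembly.Memory the engine will hand out. The bounds are illustrative.
function probeMaxPages(loPages, hiPages) {
  let best = 0;
  while (loPages <= hiPages) {
    const mid = Math.floor((loPages + hiPages) / 2);
    try {
      // Each successful probe allocates (and discards) a real Memory, which
      // is exactly what can trip the browser's OOM killer or starve later
      // JS allocations of address space.
      new WebAssembly.Memory({ initial: mid });
      best = mid;
      loPages = mid + 1;
    } catch (e) {
      hiPages = mid - 1;
    }
  }
  return best; // largest initial size that succeeded, in 64 KiB pages
}
```

Even when this loop returns a number, the spec gives no guarantee that the same allocation will still succeed a moment later, or that it leaves enough memory for the rest of the page.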
App developers of type 2) share many of the problems that apps of type 1) have. One might argue that they should be able to find the maximum size needed over the app's lifetime and allocate that, but finding that limit can be hard work, and you may not be able to do it with 100% certainty.
Developers of apps of type 3) might certainly be expected to choose the right amount of memory and be happy with it. Initially it sounds like they can profile their apps to come up with a suitable initial memory size and never grow. However, this has issues:
- sometimes you don't know whether your app really is of type 3). Hence you might allocate an initial K MB but choose a maximum of K+delta MB to account for unexpected growth. This can cause your app to fail when it does need to grow, since the mobile device might fail the growth (though it might have succeeded had you chosen initial: K+delta in the first place). The same goes for apps of type 2).
- because profiling memory usage can be hard, or something developers don't know how to do, application developers may choose to just allocate everything they can to "remove the problem", without being aware of the consequences. We routinely see this in practice: e.g. on itch.io you can find simple 2D games that run with a 1.5GB Wasm heap of which most is unused. There is no way to tell whether that is wasted committed memory or just a reservation, because the spec gives no guarantees. Then the developers complain that web browsers/wasm are crap when their game doesn't work on mobile.
Android app switching is a major Wasm usability pain
The documentation at https://developer.android.com/topic/performance/memory-overview at the very bottom of the page states:
> Note: The less memory your app consumes while in the cache, the better its chances are not to be killed and to be able to quickly resume.
It is a common game development QA practice to test "fast app switching", which can kill the game's UX and player interest if it does not work. For example, a user playing a game gets a WhatsApp message, quickly switches over to WhatsApp, types a reply, then switches back to the game and expects it to still be running. Or they switch over to email, or Instagram, or whatever, and come back a few minutes later.
The less memory your application consumes, the better the chances that the page will not need to reload. With native applications this prompts developers to push their memory usage down as far as possible when the app is switched out. Mobile devices do not swap memory out to disk (at least not the way desktops do); instead they kill background apps when they run out of memory.
For wasm apps running in a browser, this means that if an app has an extra gigabyte of its Wasm heap going unused because it cannot release it back to the OS, the browser becomes a prime target to be killed, and when the user task-switches back to the app's page, the page reloads from scratch, defeating fast switching.
Safari even kills the page in the foreground if it allocates too much, but you have no way of knowing how much "too much" is.
Some applications need address space, not memory
Natively compiled wasm applications behave very similarly to native applications. Native applications often need to reserve a lot of address space in order to get access to a chunk of linearly consecutive memory (when existing memory allocations cannot provide a contiguous block). Wasm applications sometimes need that too. Currently the only way to do so is to .grow() by a large amount. This means that whatever smaller bits of fragmented memory a wasm app has can go unused, yet remain committed. This causes wasm apps to use more committed memory than their native counterparts.
The size of this overhead depends on how much fragmentation the wasm app causes. Most native applications have not needed to care about this for ages, but for wasm it can suddenly be a huge issue. Note that the memory64 proposal again does not resolve this, because it does not bring virtual memory to wasm; it just changes the ISA to accept 64-bit addresses (to the best of my knowledge).
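A toy illustration of the fragmentation effect (the allocator, class name, and sizes are all made up for this example, not how any real wasm allocator works): when free space exists only as scattered holes, one large contiguous request forces the linear memory to grow anyway, and the holes stay committed.

```javascript
// Toy first-fit allocator over a simulated linear memory. Fragmented free
// holes cannot serve one large contiguous request, so the only recourse is
// to grow the memory; the fragments remain committed but unused.
class LinearHeap {
  constructor(pages) {
    this.pages = pages;   // current memory size, in 64 KiB Wasm pages
    this.free = [];       // free holes: { start, len } in bytes
  }
  allocContiguous(len) {
    const hole = this.free.find(h => h.len >= len);
    if (hole) {           // a single hole is big enough: carve from it
      const start = hole.start;
      hole.start += len;
      hole.len -= len;
      return start;
    }
    // No single hole fits: grow the linear memory, even though the TOTAL
    // free space may well exceed `len`.
    const start = this.pages * 65536;
    this.pages += Math.ceil(len / 65536);
    return start;
  }
  totalFreeBytes() {
    return this.free.reduce((sum, h) => sum + h.len, 0);
  }
}

const heap = new LinearHeap(16); // 1 MiB heap
heap.free = [{ start: 0, len: 300000 }, { start: 500000, len: 300000 }];
// 600000 bytes are free in total, but no single 512 KiB hole exists,
// so a 512 KiB request forces the heap to grow by 8 pages anyway:
heap.allocContiguous(524288);
console.log(heap.pages);         // 24
```

A native process would instead mmap fresh address space and let untouched reservations cost nothing; in wasm the grown pages count against the app whether or not the fragments are ever reused.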
Summarising the problems
Reiterating, the main problems that we currently see:
- the wasm spec expects developers to know the required memory size up front, which is not feasible for the reasons described above,
- wasm apps may need to run with large overallocated memories, leading to browser failures, JS allocation failures, or, if lucky, "just" Android app-switching UX problems,
- wasm apps consume more memory than their native counterparts because of memory fragmentation, the lack of virtual memory, and the lack of a way to unmap memory pages.
What can be done about the problem?
In a recent video call with ARM, we discussed the (lack of) adoption of Unity3D on Wasm on ARM mobile devices, and the short summary is that these memory issues are a hard wall for the feasibility of Unity3D on Wasm on Android. There have been conversations in #1396 and #1300 about how to shrink memory, but no concrete progress.
On the concrete bugs front, if Chrome eventually migrates to a 64-bit process on Android, it can help Wasm applications larger than 300MB work in Chrome. (However, an issue here may be that manufacturers are still releasing 32-bit-only Android hardware in 2020, because of old inventory stock or whatever; we have no idea.) If Safari fixes its eager page-kill behavior, maybe that will help developers gauge the max limits on iPhones. But those fixes will not help with the fact that a committed memory page is still a committed memory page, and a mobile device has to carry it around somewhere.
Besides that, here are some ideas:
- Would it be possible to make the commit vs reserve behavior explicit in Wasm? Maybe as a browser-coordinated extension, if not in the core spec? This would give application developers guarantees as to what the best-practice initial vs maximum vs grow semantics should be. The current situation, where one browser vendor recommends probing the max amount of memory that can be reserved while another expects apps to allocate only the minimum needed amount or be killed if they exceed it, strongly suggests that the spec is missing something to connect these expectations.
- Would it be possible to add support for unmapping memory pages in Wasm? Then e.g. Emscripten could implement unmapping of memory pages in its dlmalloc() and emmalloc() implementations, fixing the memory commit issues, the related Safari "high memory consumption" process killing, and the Android task-switch killing troubles.
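To make that idea concrete, here is a purely hypothetical sketch. No `discardPages` operation exists in any spec or engine (it is stubbed out below), and the free-path shape is an invented illustration, not how dlmalloc or emmalloc actually work:

```javascript
// HYPOTHETICAL: no discard/unmap operation exists in the Wasm spec or in any
// engine today. The stub marks where such a call would go if an allocator
// could return whole free pages to the OS.
const PAGE = 65536;

function discardPages(memory, firstPage, pageCount) {
  // Imagined semantics: the pages stay addressable (reads would yield
  // zeros), but their physical backing is released, reducing committed
  // memory. Stubbed as a no-op because nothing like it exists yet.
}

// Invented free() path: when a freed region covers whole pages, hand those
// pages back instead of keeping them committed.
function onFree(memory, startByte, lenBytes) {
  const firstPage = Math.ceil(startByte / PAGE);
  const endPage = Math.floor((startByte + lenBytes) / PAGE);
  const count = Math.max(0, endPage - firstPage);
  if (count > 0) discardPages(memory, firstPage, count);
  return count; // whole pages that could have been uncommitted
}
```

With an operation like this, the large-but-unused heaps described earlier would stop counting against the app in the eyes of the OS task killer.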
- Would it be possible to make a softer version of the WebAssembly.Memory maximum field? If an app allocates a Memory with maximum=4GB, which risks the rest of the browser/JS losing its address space (in 32-bit contexts), then maybe the browser could start reclaiming the highest parts of that reserved address space for its own purposes if the wasm app hasn't .grow()n the memory into its own use yet?
Then, if one allocated a Memory with its maximum probed as high as it will go, but later allocated a large regular ArrayBuffer, maybe the browser could just steal some of that maximum back, if the Wasm app hasn't .grow()n into it. Likewise, if there were a .shrink() operation an app could use, then paired with this kind of address-space-stealing logic, the Wasm app and the rest of the browser could coordinate to "trade" address space, depending on how much of it was actually committed in the wasm heap versus not actually used.
I hope the response here will not be "this should be left to implementation details", since when I raised these concerns as a browser implementation bug, the message was that maybe the wasm spec should address it. And currently browsers are certainly not providing implementations that are uniform enough to let developers succeed with Wasm on mobile devices.
Thanks if you read all the way to the end of this long post!