Compiling to WebAssembly with Binaryen

This page explains how you can write a compiler to WebAssembly using Binaryen. First, let's get some FAQs out of the way.

Why compile to WebAssembly?

WebAssembly is a cross-browser standard for executable code. By compiling to it, you can run your code on the web, without plugins.

As well as working in browsers, WebAssembly also works as a general-purpose cross-platform binary format. For example, you can use standalone WebAssembly runtimes and the upcoming WASI system interface to create CLI applications.

Why compile to WebAssembly using Binaryen?

There are already a few ways to compile to WebAssembly, and more will probably appear. Different approaches can have different benefits and tradeoffs. Specifically, Binaryen aims to be

Binaryen keeps things simple and fast by using an internal IR that is almost identical to WebAssembly. This makes sense for two main reasons:

Therefore even with a single IR in Binaryen we should be able to emit fairly good code. And with just one IR, you can avoid a substantial amount of overhead that most compilers have.

In addition, Binaryen's IR is designed to be lightweight and fast:

Note that we said Binaryen uses WebAssembly as its main IR. Binaryen already has optional support for a more CFG-style input IR as well, for convenience; others may follow, but a fast one-IR path will remain.

What do I need to have in order to use Binaryen to compile to WebAssembly?

Binaryen expects input that is generally compatible with being compiled to WebAssembly. Specifically, you should be able to emit data for it in the following form:

(local.set $temp1 (i32.const 42))
(local.set $temp2 (call $foo (local.get $temp1)))
(local.set $temp3 (i32.eqz (local.get $temp2)))
(call $bar (local.get $temp3))

;; instead of

(call $bar (i32.eqz (call $foo (i32.const 42))))
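As an illustration of how a compiler front end might produce the flat form shown above, here is a minimal sketch in plain C that does not use Binaryen at all. The Expr and Emitter types and the flatten_* functions are hypothetical names invented for this example; it handles only operations with at most one operand and simply prints WebAssembly text into a buffer.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical mini-AST: an operation with at most one operand. */
typedef struct Expr {
  const char *op;             /* e.g. "i32.const 42", "call $foo", "i32.eqz" */
  const struct Expr *operand; /* NULL for leaves */
} Expr;

/* Output buffer plus a counter used to name fresh temporaries. */
typedef struct {
  char *buf;
  size_t cap;
  size_t len;
  int next_temp;
} Emitter;

/* Appends formatted text to the buffer, truncating if it does not fit. */
static void emitf(Emitter *e, const char *fmt, ...) {
  if (e->cap == 0) return;
  va_list args;
  va_start(args, fmt);
  int n = vsnprintf(e->buf + e->len, e->cap - e->len, fmt, args);
  va_end(args);
  if (n > 0) {
    size_t room = e->cap - e->len - 1;
    e->len += ((size_t)n < room) ? (size_t)n : room;
  }
}

/* Emits code that computes x into a fresh temp; returns the temp's number. */
static int flatten_value(Emitter *e, const Expr *x) {
  if (x->operand == NULL) {
    int t = ++e->next_temp;
    emitf(e, "(local.set $temp%d (%s))\n", t, x->op);
    return t;
  }
  int arg = flatten_value(e, x->operand);
  int t = ++e->next_temp;
  emitf(e, "(local.set $temp%d (%s (local.get $temp%d)))\n", t, x->op, arg);
  return t;
}

/* Emits a top-level statement whose own result is not stored anywhere. */
static void flatten_statement(Emitter *e, const Expr *x) {
  if (x->operand == NULL) {
    emitf(e, "(%s)\n", x->op);
    return;
  }
  int arg = flatten_value(e, x->operand);
  emitf(e, "(%s (local.get $temp%d))\n", x->op, arg);
}

/* Flattens (call $bar (i32.eqz (call $foo (i32.const 42)))) into out. */
void flatten_example(char *out, size_t cap) {
  const Expr forty_two = {"i32.const 42", NULL};
  const Expr foo = {"call $foo", &forty_two};
  const Expr eqz = {"i32.eqz", &foo};
  const Expr bar = {"call $bar", &eqz};
  Emitter e = {out, cap, 0, 0};
  if (cap > 0) out[0] = '\0';
  flatten_statement(&e, &bar);
}
```

Calling flatten_example(buf, sizeof buf) on a 512-byte buffer and printing buf gives the four flat lines from the example above. A real compiler would also have to declare the $temp locals in the enclosing function, which this sketch leaves out.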

Technical details

There are several ways to use Binaryen, which we will describe in this section: from JS, C, and C++.

JS API - binaryen.js

See the binaryen.js-API docs for details. Code samples are in test/binaryen.js.

C API

The C API is in a single header, binaryen-c.h. That header, together with the libbinaryen library that is built in lib/, is everything you need.

The hello world test is a good starting point. There is also a kitchen-sink test, which should cover practically all of the API.

When compiling to Binaryen, you'll probably do something like what you see in those examples, which follows this pattern:

For a complete example of a compiler using the C API, look at the mir2wasm project, which compiles Rust into WebAssembly.

CFG API

As mentioned earlier, Binaryen's native IR is an AST, but it can also receive input as an arbitrary control flow graph, which is a very common case. Binaryen can then "reloop" that code into structured control flow. This works for any CFG, even irreducible ones.

To do so, use the CFG/Relooper part of the C API, as follows:

Note that the relooper output is not optimized by default: you will see redundant blocks and so forth. If you optimize your code (using BinaryenModuleOptimize()) then you should see nice control flow.

See binaryen-c.h and src/CFG/Relooper.h for more technical details, and the test suite for concrete examples. For a full compiler, see the mir2wasm project mentioned earlier, which uses the CFG API in addition to the C API.

Debugging C API usage using tracing

The C API (including the CFG API) has a tracing option that prints out the C API calls for each command you issue. The result is a runnable program that does the same things you were doing when you ran the trace. This lets you easily generate a standalone testcase from your compiler with no dependency on the compiler itself: the testcase just makes the same Binaryen C API calls that you did.

See BinaryenSetAPITracing in binaryen-c.h for more details. There is also an example of this in test/example/c-api-kitchen-sink.c; check.py verifies that tracing emits a proper program for it, both by matching against the known correct output and by building and running it.

C++ API

C++ is what Binaryen is written in, and you can extend it in that language. Note though that the C API is likely going to be more stable, as internal APIs may change.

Catching fatal errors

The Binaryen tools handle many errors, including some common I/O errors, by immediately exiting. This behavior also applies to some of the APIs that users of Binaryen as a library might call. When Binaryen encounters such an error it uses the Fatal type, which performs the early exit. You can change this behavior at compile time by defining THROW_ON_FATAL, in which case Fatal throws a std::runtime_error that clients can catch.

Running the generated WebAssembly

The end result of compilation is a WebAssembly binary. You can run that in a browser, giving it its imports, and receiving and calling its exports. That's all there is to it.

However, you may also want to use the Emscripten compiler infrastructure. Emscripten lets you "link" with JavaScript libraries to do useful things, such as rendering WebGL, running a browser main loop, handling a filesystem, and providing bindings to JS so it's easy to call into your compiled code. It also provides a full set of compiled libraries like libc. To use that, you need to provide Emscripten with your WebAssembly file as well as a metadata file, and call emcc. TODO: There are a few minor details to be finished on the Emscripten side for this to just work, but this is basically what already happens with the LLVM WebAssembly backend and Emscripten; we just need to generalize it a little.