2024-04-27

Thoughts on WASM

WASM has this great veil of misunderstanding around it, and I don't want to try to remove it, because in the process I might end up creating more confusion. All I want to do is give you some ideas and experiences I've had writing a simple library in C, compiling it to WASM, and using it from JavaScript. Because of timing, some soft requirements, and pure curiosity, I've ended up writing fast-solidity-parser. It's a parser for the Solidity programming language, written from scratch with a focus on speed. While the core code is easily portable, and the library could in theory be used from any language that supports FFI, the main focus was running it from JavaScript. I've gone from never-ever using WASM in my life to learning some interesting stuff about it.

A small primer on WASM might be in order. WASM (WebAssembly) is a stack-based, low-level language aimed at an imaginary machine. That means there is no CPU that executes WASM instructions (I'm going to call them opcodes from here on out) directly. The idea behind WASM is a very simple and small instruction set that you could easily write an interpreter for, or compile to something else. You can think of it like the bytecode in Java or C#. But as you can surmise from the name, the standard originated on the web. That didn't stop people from using it in places that have nothing to do with the web; for example, Zig uses a WASM build of its compiler to bootstrap itself.
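To make "stack-based" concrete, here is a toy interpreter for three instructions. It is only a sketch: the enum values are made up, and real WASM encodes i32.const as 0x41, i32.add as 0x6A and i32.mul as 0x6C, with LEB128-encoded immediates. What it does show faithfully is the execution model: every instruction pops its operands off a stack and pushes its result back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instruction set for illustration, not the real WASM encoding. */
typedef enum { OP_I32_CONST, OP_I32_ADD, OP_I32_MUL } Opcode;

typedef struct {
    Opcode op;
    uint32_t imm; /* operand for OP_I32_CONST, ignored otherwise */
} Instr;

#define STACK_MAX 64

/* Executes straight-line code and returns the single value left on the stack.
   uint32_t is used because WASM i32 arithmetic wraps modulo 2^32. */
uint32_t run_stack_program(const Instr *code, size_t n) {
    uint32_t stack[STACK_MAX];
    size_t sp = 0; /* number of values currently on the stack */
    for (size_t i = 0; i < n; i++) {
        switch (code[i].op) {
        case OP_I32_CONST: /* push the immediate */
            assert(sp < STACK_MAX);
            stack[sp++] = code[i].imm;
            break;
        case OP_I32_ADD: /* pop two values, push their sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_I32_MUL: /* pop two values, push their product */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        }
    }
    assert(sp == 1);
    return stack[0];
}
```

The program `{CONST 2, CONST 3, ADD, CONST 4, MUL}` corresponds to the text format `i32.const 2 i32.const 3 i32.add i32.const 4 i32.mul` and leaves 20 on the stack. There are no registers to allocate and no types to guess, which is exactly why WASM is easy to interpret or compile.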

The reason WASM was born is that JavaScript is, at its core, a scripting language. Let's set the scene. Every big web company knows that lower latency means higher engagement, which means higher profits. Google had a problem: most of its profits come from people browsing the web and seeing its ads. If the web gets so slow that it's unusable, fewer people are going to see those ads. The solution? Chrome. Specifically the V8 engine, which over the years started doing incredible things with JavaScript. If you really want, you can even say that the web is what it is today because of V8. It can take JavaScript and JIT it. To JIT means to compile at runtime to native machine instructions for a given architecture, which greatly increases JavaScript performance. But to do a good job of compiling to machine instructions, the engine needs to know whether a given variable is a number, an object, a string, or something completely different. There is a lot of observing, guessing, and recompiling when things turn out to be different. If you were on the V8 team, JavaScript is not what you'd want as input. You'd want something set in stone and predictable. The solution? As always, a layer of indirection, called WebAssembly. Because WASM types are known up front, it can be compiled into host machine instructions without guessing and without throwing the result away later. Plus, in the process you don't need to do any of the checks that JavaScript semantics demand (type checks, property lookups, etc.).

Why did I call WebAssembly an indirection layer? You need to understand that almost nobody programs in WebAssembly directly; instead, people compile their C, C++, Rust, or Zig into WASM. This suddenly gives you the ability to run programs like graphviz, git, sqlite, and anything else that was previously native-only, in the browser. Suppose you want to take a dive into this world: what are the benefits and drawbacks, and what do you have to think about while exploring? Personally, I think you should be mindful of the following:

Performance
Manual memory management and garbage collection
Interfacing with JavaScript
Porting an existing program
Debugging
The skill ceiling

Performance

One camp says that speed is the only reason one might want to use WASM, while the other camp says that WASM is actually slower than plain JavaScript. So which is true? Both are true, and at the same time both are false. If you think there is only one answer to this question, you're objectively in the wrong. To get more concrete data about it I recommend reading this blog post. But here are my two cents.

Let's begin with startup latency, which is where WebAssembly shines. Loading a module has a cost: the bytecode has to be translated to native machine instructions, but that happens in one step. In my experience this latency is small, so small in fact that my library synchronously loads the module on the first call to the JavaScript wrapper. For a module of approximately 100KB it takes around 2ms (M2 Pro). After that you are ready to rock and roll: full performance, no warm-up period, no stutters. JavaScript, on the other hand, is a complex language, and it takes some effort to parse. Even after parsing, engines like V8 still run it in interpreted mode first, which means lower performance. Only after some time will the engine gather enough information to take a shot at JITing it. And there is another problem for JavaScript: JITing is not free. Profiling the library mentioned above, I was stunned that JITing around 1000 lines of pure JavaScript took around 40% of the total time in certain benchmarks.

This leads us to runtime performance. While it's hard to judge, I think the ceiling is way higher for WASM. If you know how to use it, you can achieve speedups of 10-50x depending on the use case. Why? The first reason is that JITing only gets you so far. Even if you manage to create a perfect JIT, you still need to respect the rules of JavaScript. For example, accessing a property that does not exist on an object must return undefined. This does not come for free: an additional check has to be inserted for every .prop access. Now imagine how this sort of thing adds up given all the strange rules in JavaScript. In some cases the engine can prove that the property will never be undefined, but that's not guaranteed; it's only a tool. Meanwhile, in the WASM world you just do whatever you please. Dereference a NULL pointer? Why not. Index outside the bounds of an array? It's basically the same thing. Pointer arithmetic gone crazy? Of course! And the kicker is that this is the default operating mode of your CPU anyway. So unless you check array bounds yourself, the instructions being executed are lean. This lets you write C code and see how it is going to translate to simple instructions. You get some reduction in the number of instructions executed; I'd say around 1.5-2x on average.

But wait, you said 10-50x! Have you heard about SIMD, my dear friend? WASM gives you access to 128-bit SIMD opcodes, which operate on 16 bytes at once, so for tasks like text processing they can speed your code up by as much as 16x. I'm not going to go into the details of SIMD, how it works, or where else you could use it. If you're curious, good: read about it.
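Real WASM SIMD code would use the 128-bit v128 type, for example through clang's wasm_simd128.h header, which only compiles for a wasm32 target. As a portable stand-in that runs anywhere, the sketch below uses SWAR (SIMD within a register): it counts occurrences of a byte eight at a time inside a plain 64-bit integer. It shows the same idea of comparing many bytes per instruction, but it is not the 16-lane version described above and not code taken from fast-solidity-parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES 0x0101010101010101ULL /* 0x01 in every byte */
#define LOW7 0x7F7F7F7F7F7F7F7FULL /* low 7 bits of every byte */

/* Counts bytes equal to c, processing 8 bytes per loop iteration. */
size_t count_byte_swar(const uint8_t *buf, size_t len, uint8_t c) {
    assert(buf != NULL || len == 0);
    const uint64_t pattern = ONES * c; /* c broadcast into every byte */
    size_t count = 0, i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t word;
        memcpy(&word, buf + i, 8);     /* alignment-safe 8-byte load */
        uint64_t x = word ^ pattern;   /* matching bytes become 0x00 */
        /* High bit of each byte of t is set iff that byte of x is zero.
           Adding LOW7 to 7-bit values never carries into the next byte. */
        uint64_t t = ~(((x & LOW7) + LOW7) | x | LOW7);
        /* Move each flag to bit 0 of its byte, then sum all bytes into the top byte. */
        count += (size_t)(((t >> 7) * ONES) >> 56);
    }
    for (; i < len; i++)               /* scalar tail for the last < 8 bytes */
        count += buf[i] == c;
    return count;
}
```

The zero-byte test avoids the cheaper `(x - ONES) & ~x & 0x80..80` trick on purpose: that one can flag bytes next to a real match because of borrow propagation, which is fine for "is there a match?" but wrong for counting.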

Manual memory management and garbage collection

By default, WASM gives you only two opcodes for managing memory: memory.size and memory.grow. That's it. If you're used to calling malloc() and free() left and right, you'll need to BYOM: Bring Your Own Malloc. emscripten tries to make WASM development easier for newcomers by implementing the whole libc, including a malloc implementation, so if you want to get your feet wet, give it a shot. I'd argue that most applications can be written without malloc() and free(), which can be substituted with arena allocators. Whatever you choose, you get one benefit: no garbage collector pauses. This can become really useful if you want to optimize one hotspot in your JavaScript application that does a bunch of small memory operations. One single con: the 4GB limit. WASM uses 32-bit addresses, and the 64-bit alternative (the memory64 proposal) isn't generally available yet, so you're stuck with 4GB. And before you take this point and run away with it, show me how your JavaScript application handles 4GB worth of data. One small pro of this con is that pointers are only 4 bytes :)
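Here is a minimal bump (arena) allocator, the kind of thing that can replace most malloc()/free() pairs. It is an illustrative sketch, not a drop-in library: allocations are carved out of a caller-supplied buffer, and everything is freed at once with arena_reset. In a wasm32 build the buffer could be a static array in linear memory, or a region obtained by growing memory (clang exposes memory.grow as __builtin_wasm_memory_grow); that part is left out so the code also runs natively.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A bump allocator over one fixed buffer. Individual frees are not supported. */
typedef struct {
    uint8_t *base; /* start of the backing buffer */
    size_t cap;    /* size of the backing buffer in bytes */
    size_t used;   /* bytes handed out so far, including alignment padding */
} Arena;

void arena_init(Arena *a, void *buf, size_t cap) {
    a->base = buf;
    a->cap = cap;
    a->used = 0;
}

/* Returns size bytes aligned to align (a power of two), or NULL if full. */
void *arena_alloc(Arena *a, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t cur = (uintptr_t)(a->base + a->used);
    uintptr_t aligned = (cur + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t offset = (size_t)(aligned - (uintptr_t)a->base);
    if (offset > a->cap || size > a->cap - offset)
        return NULL;
    a->used = offset + size;
    return a->base + offset;
}

/* Frees every allocation at once, e.g. after one parse has been serialized. */
void arena_reset(Arena *a) { a->used = 0; }
```

For a parser the lifetime story is simple: allocate AST nodes from the arena while parsing, serialize the result, then reset. There is no per-node free and nothing for a garbage collector to scan.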

Interfacing with JavaScript

This one is a con. In practice, WASM functions only take and return numbers, so anything richer, like strings or byte arrays, has to be copied into or out of WASM's linear memory. Let's study an example from the parser I mentioned above. After parsing, I have the resulting AST in WASM memory, and I want to build a JavaScript object in WASM and return it. How do I do that? I can't. What I have to do is serialize the entire tree in WASM and deserialize it in JS. This interfacing is the slowest part of WASM, especially if you decide to serialize to and from JSON. For speed you'll need to create a simple binary format. There is a copy every time you want to send something to WASM, and if the result you want is not a plain number, float, or ArrayBuffer, you have to do the serialize/deserialize dance. In the case of the parser, half of the time is spent deserializing the tree. There are numerous reasons why, for example constant GC stalls because a lot of memory is allocated. To make this as painless as possible, you need to find a clean line to divide along: a standalone module or library. Do as much as you can in WASM land and as little as possible in JS land. If you want to send some complex object to WASM, good luck. If the schema of the object is not known beforehand, don't even try. The simplest data transforms work best:

Transform type	In	Out
compression/decompression	bytes	bytes
parsing	string	AST
validation	string	bool
math	float[]	float[]
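As a sketch of what "a simple binary format" for the parsing row can look like, assuming a hypothetical Node type rather than the parser's real AST: each node is written in pre-order as one kind byte followed by a little-endian u32 child count. The JavaScript side can then walk the same bytes with a DataView over the WASM memory (getUint8, and getUint32 with littleEndian set to true) without any string parsing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical AST node, for illustration only. */
typedef struct Node {
    uint8_t kind;
    uint32_t child_count;
    const struct Node *children;
} Node;

/* Writes v as 4 little-endian bytes at out[pos] and returns the next position. */
static size_t put_u32_le(uint8_t *out, size_t pos, uint32_t v) {
    out[pos]     = (uint8_t)v;
    out[pos + 1] = (uint8_t)(v >> 8);
    out[pos + 2] = (uint8_t)(v >> 16);
    out[pos + 3] = (uint8_t)(v >> 24);
    return pos + 4;
}

/* Bytes needed for the subtree: 5 per node (1 kind byte + 4 count bytes). */
size_t serialized_size(const Node *n) {
    size_t size = 5;
    for (uint32_t i = 0; i < n->child_count; i++)
        size += serialized_size(&n->children[i]);
    return size;
}

/* Pre-order: [kind:u8][child_count:u32 LE], then each child in order.
   out must hold at least serialized_size(n) bytes from pos. Returns the end position. */
size_t serialize(const Node *n, uint8_t *out, size_t pos) {
    assert(n != NULL && out != NULL);
    out[pos++] = n->kind;
    pos = put_u32_le(out, pos, n->child_count);
    for (uint32_t i = 0; i < n->child_count; i++)
        pos = serialize(&n->children[i], out, pos);
    return pos;
}
```

Fixing the byte order explicitly keeps the format independent of the host; WASM itself is little-endian, so on that target these stores are plain byte copies.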

Porting an existing program

Imagine that you want to compile git to WASM and run it in the browser. Easy, right? Not so fast, cowboy. As I said above, there is no libc, so all of your calls to printf, malloc, and fopen have nothing to link against. But wait, how can you even call fopen if there is no file system in the browser? The browser won't let a web page open a file on the host computer; that would be a catastrophe. If you take the emscripten approach, it emulates a file system for you. Still, you have to realize that most applications written with the idea of being run in a desktop environment might be cumbersome to port. I presume that some applications compiled with emscripten might take a big performance hit because they make heavy use of libc. And after all this bad, bad, bad, it's still amazing in a sense that you can do it at all. Imagine telling somebody in the year 2005 that they would be able to run sqlite in a browser window. I believe that WASM is an example of a good level of indirection. It allows you to import and use a vast library of code that has been written and tested beforehand. For example, graphviz is, from what I've read, the best graph drawing program out there. And it's hard to replace because, quoting some random user from the internet,

apparently graph drawing is almost as hard as building a fusion reactor

You'd want to reuse code that solves a problem others struggle to solve.
Not to mention that you don't need to wait for some consortium to agree on supporting zstd in JavaScript. You can just compile the main implementation and use it at near-native speed. WASM is the ultimate freedom in this regard.
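Going back to the missing fopen: one way around it, shown here as a hypothetical sketch rather than what emscripten does, is to design the library so that it never touches a file system. The module exposes a buffer in its linear memory, the JavaScript host copies the file's bytes into it (for example with `new Uint8Array(memory.buffer, ptr, len).set(bytes)`), and then calls a function with the length. The names input_ptr and count_lines are made up for illustration; when building for wasm32 with clang you would also have to export them, for example through the linker's --export flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Instead of fopen(path), the host writes input directly into linear memory. */
#define INPUT_CAPACITY (1u << 16)

static uint8_t input_buffer[INPUT_CAPACITY];

/* Hypothetical export: the address the host should copy input bytes to. */
uint8_t *input_ptr(void) { return input_buffer; }

/* Hypothetical export: counts lines in the first len bytes of the buffer. */
uint32_t count_lines(uint32_t len) {
    assert(len <= INPUT_CAPACITY);
    uint32_t lines = 0;
    for (uint32_t i = 0; i < len; i++)
        lines += input_buffer[i] == '\n';
    if (len > 0 && input_buffer[len - 1] != '\n')
        lines++; /* a last line without a trailing newline still counts */
    return lines;
}
```

The same code links natively without any libc I/O, which also makes it easy to test outside the browser.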

Debugging

It's really bad at the moment. Print debugging is about all you can do, and the stack traces produced are incomprehensible because DWARF support is still very alpha, from what I understand. Maybe in the future you'll be able to attach the Chrome debugger and step through your C/Rust/Zig sources, but for now be ready for a lot of logging. Alternatively, you can build for two targets, WASM and native, and attach a debugger to the native build. Still, it's not ideal. I'd write small to medium libraries in WASM, but writing a whole application in WASM looks like it would become a big pain as it grows.
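Building for two targets is easier if logging is the only thing that differs between them. A sketch, assuming clang: on wasm32 (where clang defines `__wasm__`) the message is handed to a function imported from the JavaScript host, while the native build writes to stderr, so the same code can be stepped through in a regular debugger. The import names env/host_log are assumptions; the JavaScript side would have to supply a matching function in its import object that reads the bytes out of memory.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __wasm__
/* wasm32: imported from the JS host. Module/field names are assumptions. */
__attribute__((import_module("env"), import_name("host_log")))
void host_log(const char *msg, size_t len);
#else
#include <stdio.h>
/* Native build: print straight to stderr so a normal debugger session sees it. */
static void host_log(const char *msg, size_t len) {
    fwrite(msg, 1, len, stderr);
    fputc('\n', stderr);
}
#endif

/* Logs a NUL-terminated message on either target and returns its length. */
size_t debug_log(const char *msg) {
    assert(msg != NULL);
    size_t len = 0;
    while (msg[len] != '\0')
        len++;
    host_log(msg, len);
    return len;
}
```

Everything else in the library stays target-agnostic, so a bug reproduced in the browser can usually be reproduced and debugged in the native build with the same inputs.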

The skill ceiling

If you don’t know what you’re doing and you…
…expect to write the same code as in JS, turn away.
…want to learn something, just compile to your native target.
But if you know what you're doing and have been around the low-level programming block a couple of times, there is value to be mined here. Just like with JS, how much performance and memory you can squeeze out of your code comes down to two factors: the amount of time you have and your knowledge. But WASM lets you go further, because it doesn't require you to follow all the rules of JS, only the basic principles of programming. The heights WASM allows you to reach are beyond the reach of pure JS. Take Figma, which is universally praised for being a fast in-browser application. Their team is vocal about using WASM for performance-sensitive work. But the bigger point to make about WASM is about realizing potential. A common myth states that people only use 10% of their brain. Well, for CPUs it's true: people only use a fraction of a percent of their CPU. WASM lets you at least try to use a high double-digit percentage of one core, which blows everything else out of the water.