WATaBoy: JIT-ing Game Boy Instructions to WebAssembly
WATaBoy is a Game Boy emulator built on a "JIT-to-Wasm" architecture: Game Boy (SM83) instructions are compiled into WebAssembly (Wasm) bytecode at runtime. The browser's own JIT engine then converts that Wasm into native machine code, which sidesteps JIT restrictions on platforms like iOS while achieving performance that exceeds that of a native interpreter.
Performance Benchmarks
JIT-to-Wasm compilation outperforms both Wasm-based interpreters and native interpreters in CPU-bound emulation tasks. In benchmarks emulating Pokémon Blue, the JIT-to-Wasm implementation was approximately 1.2x faster than a native interpreter and 1.5x faster than an interpreter running in Wasm.
Performance varied across browser engines, with Safari showing the highest performance, followed by Chrome and Firefox. This suggests that the WebKit-only environment of iOS is not a performance bottleneck for this architecture.
The JIT-to-Wasm Architecture
WebAssembly follows a Harvard architecture: code and data live in separate spaces, so a module cannot write bytecode into its linear memory and jump to it. Instead, the emulator must interact with the browser's embedder (JavaScript) to compile and link new code.
Wasm Code Generation
WATaBoy uses the wasm-encoder crate in Rust to emit Wasm bytecode. The process involves building a Wasm module containing specific instructions (e.g., an add function) and exporting it so the embedder can access it.
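As a rough illustration of what such a module looks like at the byte level, the sketch below hand-encodes a module that exports an add function. It deliberately does not use wasm-encoder, so that it stays dependency-free; the helper names section and build_add_module are hypothetical, and the sketch only handles sections shorter than 128 bytes.

```rust
/// Append one section: id byte, payload length, payload.
/// Assumption: payloads stay below 128 bytes, so the length fits in a
/// single LEB128 byte. A real encoder would emit full LEB128.
fn section(out: &mut Vec<u8>, id: u8, payload: &[u8]) {
    assert!(payload.len() < 128, "sketch only handles single-byte sizes");
    out.push(id);
    out.push(payload.len() as u8);
    out.extend_from_slice(payload);
}

/// Build a module exporting `add(i32, i32) -> i32`.
fn build_add_module() -> Vec<u8> {
    let mut module = Vec::new();
    module.extend_from_slice(b"\0asm"); // magic number
    module.extend_from_slice(&[1, 0, 0, 0]); // binary format version 1

    // Type section: one function type, (i32, i32) -> i32.
    section(&mut module, 1, &[0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f]);
    // Function section: one function using type index 0.
    section(&mut module, 3, &[0x01, 0x00]);
    // Export section: export function 0 under the name "add".
    section(&mut module, 7, &[0x01, 0x03, b'a', b'd', b'd', 0x00, 0x00]);

    // Function body: no locals; local.get 0; local.get 1; i32.add; end.
    let body = [0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b];
    let mut code = vec![0x01, body.len() as u8]; // one body, then its size
    code.extend_from_slice(&body);
    section(&mut module, 10, &code);

    module
}

fn main() {
    let bytes = build_add_module();
    println!("{} bytes: {:02x?}", bytes.len(), bytes);
}
```

A crate such as wasm-encoder takes care of the section sizes and LEB128 encoding that this sketch fixes by hand.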
Compiling and Linking
To execute the generated bytecode, WATaBoy follows a three-step process:
- Compile & Instantiate: The bytecode is passed to the JavaScript embedder, which compiles and instantiates it into a new Wasm instance.
- Link: The resulting function is added to the main module's indirect function table.
- Dispatch: The function is called using the call_indirect Wasm instruction, which invokes the function at a specific index in the table.
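The link and dispatch steps can be modelled in plain Rust to show the shape of the idea. This is only a model under stated assumptions: a Vec stands in for the indirect function table, a function pointer stands in for a compiled Wasm function, and the Cpu, JitTable, and inc_a_block names, as well as the lookup keyed by program counter, are hypothetical rather than taken from WATaBoy.

```rust
use std::collections::HashMap;

/// Hypothetical CPU state handed to every compiled block.
#[derive(Default, Debug)]
struct Cpu {
    a: u8,
    pc: u16,
}

/// Stand-in for a function obtained by compiling and instantiating a module.
type BlockFn = fn(&mut Cpu);

/// Models a growable function table plus a map from Game Boy address
/// to table index (the map is an assumption of this sketch).
#[derive(Default)]
struct JitTable {
    slots: Vec<BlockFn>,
    index_by_pc: HashMap<u16, usize>,
}

impl JitTable {
    /// "Link": append the function to the table and remember its index.
    fn link(&mut self, pc: u16, block: BlockFn) -> usize {
        let index = self.slots.len();
        self.slots.push(block); // the table grows by one slot
        self.index_by_pc.insert(pc, index);
        index
    }

    /// "Dispatch": call through the table by index, in the spirit of
    /// `call_indirect`. Returns false if no block is linked for this pc.
    fn dispatch(&self, cpu: &mut Cpu) -> bool {
        match self.index_by_pc.get(&cpu.pc) {
            Some(&index) => {
                (self.slots[index])(cpu);
                true
            }
            None => false,
        }
    }
}

/// Pretend result of compiling a one-byte `INC A` instruction.
fn inc_a_block(cpu: &mut Cpu) {
    cpu.a = cpu.a.wrapping_add(1);
    cpu.pc = cpu.pc.wrapping_add(1);
}

fn main() {
    let mut table = JitTable::default();
    let mut cpu = Cpu { a: 41, pc: 0x0150 };
    let index = table.link(0x0150, inc_a_block);
    let ran = table.dispatch(&mut cpu);
    println!("index={index} ran={ran} cpu={cpu:?}");
}
```

In the emulator itself, the table is the main module's indirect function table rather than a Vec, and the JavaScript embedder performs the compile and instantiate step.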
To support this, the Rust project requires specific LLD flags: --export-table to allow JavaScript access to the indirect function table and --growable-table to allow the table to expand as new JIT-compiled functions are added.
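One possible way to pass these flags is from a Cargo build script. The source does not say how WATaBoy configures them, so the build.rs below is an assumption; the link_directives helper is hypothetical, while cargo:rustc-link-arg is Cargo's standard build-script directive for forwarding an argument to the linker.

```rust
// build.rs (sketch): forward the two linker flags named in the text.

/// Flags to hand to the Wasm linker.
const LINK_ARGS: [&str; 2] = ["--export-table", "--growable-table"];

/// Format each flag as a Cargo build-script directive.
fn link_directives() -> Vec<String> {
    LINK_ARGS
        .iter()
        .map(|arg| format!("cargo:rustc-link-arg={arg}"))
        .collect()
}

fn main() {
    // Cargo reads these lines from the build script's standard output.
    for line in link_directives() {
        println!("{line}");
    }
}
```

The same flags could instead be supplied through rustc's -C link-arg option.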
Maintaining Cycle Accuracy
To remain cycle-accurate while using JIT compilation, WATaBoy employs several techniques inspired by GameRoy:
- Interrupt Prediction: Predicting when interrupts will occur to determine when a JIT block must end.
- Interpreter Fallback: Falling back to the interpreter whenever a JIT block might be interrupted.
- Lazy Evaluation: Lazily evaluating non-CPU components accessed via Memory-Mapped I/O (MMIO).
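The three techniques above can be sketched together on a deliberately simplified component. Everything here is hypothetical: LazyTimer is not the real Game Boy timer (it simply counts up once every period cycles and overflows at 256), and choose_engine is an illustrative policy, not WATaBoy's or GameRoy's actual code.

```rust
/// Hypothetical, simplified timer: increments once every `period` CPU
/// cycles and requests an interrupt when its 8-bit counter overflows.
struct LazyTimer {
    counter: u8,
    period: u64,
    /// Cycle count at which `counter` was last brought up to date.
    synced_at: u64,
    interrupt_pending: bool,
}

impl LazyTimer {
    /// Lazy evaluation: catch up to `now` only when the timer is touched.
    fn sync(&mut self, now: u64) {
        let ticks = (now - self.synced_at) / self.period;
        self.synced_at += ticks * self.period; // stays aligned to a tick
        let total = self.counter as u64 + ticks;
        self.counter = (total % 256) as u8;
        if total >= 256 {
            self.interrupt_pending = true;
        }
    }

    /// MMIO read handler: sync first, then return the current value.
    fn read(&mut self, now: u64) -> u8 {
        self.sync(now);
        self.counter
    }

    /// Interrupt prediction: the cycle of the next overflow, computed
    /// arithmetically instead of stepping the timer cycle by cycle.
    fn next_interrupt_at(&self) -> u64 {
        let ticks_left = 256 - self.counter as u64;
        self.synced_at + ticks_left * self.period
    }
}

#[derive(Debug, PartialEq)]
enum Engine {
    Jit,
    Interpreter,
}

/// Interpreter fallback: run a compiled block only if it ends no later
/// than the predicted interrupt; otherwise step with the interpreter.
fn choose_engine(now: u64, block_cycles: u64, timer: &LazyTimer) -> Engine {
    if now + block_cycles <= timer.next_interrupt_at() {
        Engine::Jit
    } else {
        Engine::Interpreter
    }
}

fn main() {
    let mut timer = LazyTimer { counter: 250, period: 16, synced_at: 0, interrupt_pending: false };
    println!("next interrupt at cycle {}", timer.next_interrupt_at());
    println!("block of 40 cycles at 0: {:?}", choose_engine(0, 40, &timer));
    println!("block of 40 cycles at 80: {:?}", choose_engine(80, 40, &timer));
    println!("read at 100: {} pending={}", timer.read(100), timer.interrupt_pending);
}
```

The point of the design is that the timer does no work while compiled code runs; its state is reconstructed from the elapsed cycle count when something reads it or when the scheduler needs to know how long a block may safely run.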
Limitations and Future Work
While the JIT-to-Wasm approach is promising, it has significant limitations compared to native JITs:
- Codegen Tooling: Current Wasm bytecode generation is largely bespoke. The author notes a need for tools similar to DynASM or Cranelift that allow developers to write human-readable WAT that translates to bytecode at compile time.
- Memory Access: Certain low-level optimizations, such as Dolphin's "hardware fastmem," are impossible in Wasm because an invalid memory access traps and cannot be recovered from within the Wasm runtime.
- PPU Bottlenecks: Current profiling shows that the Pixel Processing Unit (PPU) emulation still consumes the majority of runtime, largely due to unimplemented interrupt predictions.
Community Insights
Discussion among developers highlights the trade-off between interpreter overhead and Wasm overhead. As one contributor noted:
"WASM overhead is about 20%, interpreter overhead is about 1000%. What's cool here is to have a GameBoy JIT runtime at all."
Other developers suggested that for simpler JITs, JavaScript's eval() or new Function() could be used to generate arithmetic-heavy functions at runtime, though the Wasm approach provides a more robust technical foundation for cross-platform emulation.