Fuzzing Microvium Embedded JavaScript Engine

So I decided to fuzz Microvium. If you haven’t heard of it, Microvium (https://github.com/coder-mike/microvium) is a tiny JavaScript engine designed for embedded systems - think IoT devices, microcontrollers, stuff where you’ve got maybe 16KB of RAM to work with.

What is Microvium?

Microvium is a JavaScript engine for microcontrollers. It allows you to write your logic in JavaScript, compile it to a compact bytecode format, and run it on tiny devices with minimal overhead.

The architecture is pretty interesting:

  • Compiler (TypeScript/Node.js) - Runs on your dev machine, compiles JS to bytecode
  • Runtime (C) - Tiny VM that runs on the device, executes the bytecode
  • Snapshot format (.mvm-bc files) - The compiled bytecode that gets loaded

The bytecode files are typically 1-10KB, and the C runtime adds maybe 8-16KB to your firmware. For embedded systems, that’s actually pretty reasonable.

Reviewing the Package: What Should We Fuzz?

Before you start fuzzing, you need to understand what you’re fuzzing. I spent some time digging through the Microvium codebase to figure out the attack surface. The codebase is split into two main parts:

  • A TypeScript compiler (src-to-il/) that turns JavaScript into bytecode
  • A C runtime (native-vm/) that executes that bytecode

The C runtime (dist-c/microvium.c) has several key functions:

  • mvm_restore() - Loads bytecode snapshots into memory
  • mvm_call() - Executes exported functions
  • mvm_runGC() - Garbage collection
  • mvm_resolveExports() - Resolves function exports from bytecode

The bytecode format itself is structured with a header containing:

  • Version info
  • CRC checksums
  • Section offsets (pointers to different parts of the bytecode)
  • Export tables
  • ROM heap data

Looking at this, the most obvious target is mvm_restore(). It parses untrusted binary data (the bytecode file), does pointer arithmetic, and copies memory around. Classic fuzzing target: if there’s going to be a bug, it’s probably here.

Looking at the bytecode format, I found a header structure:

typedef struct mvm_TsBytecodeHeader {
  uint8_t bytecodeVersion;
  uint8_t headerSize;
  // ...
  uint16_t bytecodeSize;
  uint16_t crc;
  uint16_t sectionOffsets[8];  // Offsets to different bytecode sections
  // ...
} mvm_TsBytecodeHeader;

Those sectionOffsets? They’re read directly from the bytecode file and used to locate data in memory. No validation.
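Because these fields come straight from the file, it’s worth decoding them independently when triaging inputs. Below is a minimal C sketch, not Microvium code: it assumes little-endian 16-bit fields at the offsets shown in the header tables later in this post (bytecodeVersion at 0x00, headerSize at 0x01, bytecodeSize at 0x04, crc immediately after it at 0x06, and eight sectionOffsets starting at 0x0C), and it skips the fields the struct above elides. DecodedHeader and decode_header are hypothetical names. Reading byte by byte avoids depending on the host’s endianness or on struct padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HDR_VERSION          0x00
#define HDR_HEADER_SIZE      0x01
#define HDR_BYTECODE_SIZE    0x04
#define HDR_CRC              0x06
#define HDR_SECTION_OFFSETS  0x0C
#define SECTION_COUNT        8
#define HDR_MIN_SIZE         (HDR_SECTION_OFFSETS + 2 * SECTION_COUNT) /* 28 bytes */

typedef struct {
    uint8_t  bytecodeVersion;
    uint8_t  headerSize;
    uint16_t bytecodeSize;
    uint16_t crc;
    uint16_t sectionOffsets[SECTION_COUNT];
} DecodedHeader;

/* Multi-byte header fields are stored little-endian. */
static uint16_t read_u16_le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer is too short to contain a header.
   Only decodes; it does not check that the values make sense. */
static int decode_header(const uint8_t *buf, size_t len, DecodedHeader *out) {
    assert(buf != NULL && out != NULL);  /* caller bug, not bad input */
    if (len < HDR_MIN_SIZE) return -1;
    out->bytecodeVersion = buf[HDR_VERSION];
    out->headerSize      = buf[HDR_HEADER_SIZE];
    out->bytecodeSize    = read_u16_le(buf + HDR_BYTECODE_SIZE);
    out->crc             = read_u16_le(buf + HDR_CRC);
    for (int i = 0; i < SECTION_COUNT; i++)
        out->sectionOffsets[i] = read_u16_le(buf + HDR_SECTION_OFFSETS + 2 * i);
    return 0;
}
```

Run over the 46-byte CVE-1 crash file shown later, this reports bytecodeSize = 46 but sectionOffsets[7] = 0xB22A (45,610), which is the kind of mismatch worth spotting at a glance.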

Building the Harness

For Microvium, I needed something that would:

  1. Take arbitrary binary data (the fuzzed bytecode)
  2. Try to restore a VM from it
  3. Try to call exported functions
  4. Run garbage collection
  5. Clean up

Here’s what the core of the harness looked like (simplified):

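#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include "microvium.h"

// Minimal import resolver: this harness provides no host functions
static mvm_TeError resolveImport(mvm_HostFunctionID hostFunctionID, void* context,
                                 mvm_TfHostFunction* out_hostFunction) {
    (void)hostFunctionID; (void)context; (void)out_hostFunction;
    return MVM_E_UNRESOLVED_IMPORT;
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    mvm_VM* vm = NULL;

    // Try to restore VM from fuzzed bytecode
    mvm_TeError err = mvm_restore(&vm, data, size, NULL, resolveImport);

    if (err != MVM_E_SUCCESS || !vm) {
        return 0;  // Invalid bytecode, move on
    }

    // Try to resolve and call exports. The ID is arbitrary; it only
    // resolves if the (possibly mutated) export table contains it.
    mvm_VMExportID exportID = 1234;
    mvm_Value exportValue;
    err = mvm_resolveExports(vm, &exportID, &exportValue, 1);

    if (err == MVM_E_SUCCESS) {
        // Try calling with various argument counts
        for (int argCount = 0; argCount <= 4; argCount++) {
            mvm_Value args[4] = {0};
            mvm_Value result;
            mvm_call(vm, exportValue, &result, args, argCount);
        }
    }

    // Trigger GC
    mvm_runGC(vm, false);

    // Cleanup
    mvm_free(vm);
    return 0;
}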

The harness is intentionally permissive. It doesn’t care if operations fail - that’s expected with random input.

Compile with AddressSanitizer and libFuzzer:

clang -fsanitize=address,fuzzer,undefined \
      -O2 -g \
      fuzz_target.c microvium.c \
      -o fuzzer

The sanitizers are crucial here. AddressSanitizer catches memory corruption bugs, UndefinedBehaviorSanitizer catches… well, undefined behavior.

Building a Corpus

For Microvium, I needed valid bytecode files. I created a corpus builder that would:

  1. Write simple JavaScript programs covering different features
  2. Compile them to bytecode using the Microvium compiler
  3. Save the bytecode files as seeds

I wrote 20 test programs covering things like:

  • Basic arithmetic
  • Loops and conditionals
  • Arrays and objects
  • Closures and recursion
  • Edge cases like empty functions

For example:

// 01-arithmetic.js
vmExport(1, f);
function f() {
  var r = 1 + 2 * 3;
  return r;
}
// 02-loops.js
vmExport(1, factorial);
function factorial(n) {
  var result = 1;
  for (var i = 2; i <= n; i++) {
    result *= i;
  }
  return result;
}
// ... 18 more programs

Basic operations:

  • Arithmetic (1 + 2 * 3)
  • Loops (for, while)
  • Conditionals (if/else, ternary)
  • Arrays and objects
  • String manipulation
  • Recursion (fibonacci, GCD)
  • Closures

Edge cases:

  • Empty functions
  • Large arrays
  • Deep nesting
  • Many exports
  • Bitwise operations

I also added some hand-crafted edge cases:

  • Empty file (0 bytes)
  • Single byte
  • All zeros (64 bytes)
  • All 0xFF bytes (64 bytes)
  • Sequential bytes 0x00-0xFF

Then I compiled them all to bytecode. This gave me a corpus of 17 valid bytecode files, ranging from 46 bytes to a few hundred bytes. Small files are great for fuzzing because the fuzzer can explore the input space more quickly.

Fuzzing with AFL++

I started with AFL++ because I’m familiar with it. Set it up in Docker:

FROM aflplusplus/aflplusplus
COPY . /workspace
WORKDIR /workspace/fuzz
# Compile with AFL instrumentation
RUN afl-clang-fast -fsanitize=address \
    fuzz_target_afl.c ../dist-c/microvium.c \
    -I../dist-c -o fuzzer-afl

Then run it:

afl-fuzz -i corpus -o findings -m none -D -- ./fuzzer-afl @@

The -m none flag is important - it disables the memory limit because ASAN uses a lot of virtual memory.

AFL++ started running… and ran for about 2 hours. It found some interesting paths, but no crashes.

After some debugging, I figured out the problem: AFL++ is great at finding new code paths, but the Microvium bytecode format has a CRC check, and AFL’s mutations were breaking it.

AFL tracks which basic blocks get executed and tries to find inputs that hit new blocks. But Microvium’s bytecode parser does a lot of early validation, so the fuzzer spent most of its time generating bytecode that failed the CRC check or header validation and got rejected immediately.

The bugs I eventually found were deeper in the code, past several layers of parsing. AFL’s mutations weren’t sophisticated enough to generate bytecode that was “valid enough” to reach those code paths while still being “invalid enough” to trigger the bugs.
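One way to keep byte-level mutations from dying at the checksum is to recompute it inside the harness, on a copy of the input, before calling mvm_restore(). The sketch below assumes the crc field is CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF, no reflection, no final XOR), stored little-endian at offset 0x06 and covering every byte after the field; the variant and the covered range are assumptions to check against microvium.c. crc16_ccitt and fix_header_crc are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HDR_CRC_OFFSET 0x06  /* crc follows bytecodeSize (0x04) */

/* CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, MSB-first, no final XOR.
   Bit-at-a-time for readability rather than speed. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len) {
    assert(data != NULL || len == 0);
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Recomputes the checksum over everything after the crc field (assumed to
   run to the end of the buffer, i.e. len == bytecodeSize) and stores it
   little-endian. Returns 0 on success, -1 if the buffer is too short. */
static int fix_header_crc(uint8_t *buf, size_t len) {
    const size_t start = HDR_CRC_OFFSET + 2;
    if (len < start) return -1;
    uint16_t crc = crc16_ccitt(buf + start, len - start);
    buf[HDR_CRC_OFFSET]     = (uint8_t)(crc & 0xFF);
    buf[HDR_CRC_OFFSET + 1] = (uint8_t)(crc >> 8);
    return 0;
}
```

In a libFuzzer-style harness this would mean copying data into a writable buffer, calling fix_header_crc() on the copy, and passing the copy to mvm_restore(). The trade-off is that the CRC-rejection path itself is no longer exercised, which costs little since it is a single comparison.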

Fuzzing with Fuzzilli

Next, I tried Fuzzilli. This is a JavaScript engine fuzzer that Google built for finding bugs in V8, JavaScriptCore, and SpiderMonkey. It’s structure-aware - instead of mutating bytes, it generates semantically valid JavaScript programs.

The idea was: let Fuzzilli generate JavaScript, compile it with Microvium’s compiler, and fuzz the result. This would test both the compiler and the runtime end-to-end.

I spent a good amount of time on the setup:

  1. Created a custom Microvium profile for Fuzzilli (using Claude)
  2. Implemented the REPRL (Read-Eval-Print-Reset-Loop) protocol
  3. Set up the Docker environment
  4. Integrated everything

The integration looked promising:

// fuzzilli-reprl.js - REPRL target
const Microvium = require('./dist/lib');
while (true) {
    const jsCode = readFuzzilliInput();

    // Compile JS to bytecode
    const snapshot = Microvium.Snapshot.fromSource(jsCode);
    const bytecode = snapshot.getData();

    // Execute on C runtime
    const vm = NativeVM.restore(bytecode);
    vm.call(exportedFunction);
    vm.runGC();
}

But Claude missed something obvious. Fuzzilli uses coverage feedback to guide its JavaScript generation. It needs to know which parts of the engine are being executed so it can evolve its test cases toward unexplored code paths. This requires the target to be compiled with LLVM’s SanitizerCoverage.

The problem? My Fuzzilli target was a Node.js script. The Microvium compiler is written in TypeScript, compiled to JavaScript, and runs in Node. There’s no way to get native code coverage of the compiler from that.

When Fuzzilli starts up, it checks for coverage:

// From Fuzzilli's libcoverage/coverage.c
if (num_edges == 0) {
    fprintf(stderr, "[LibCoverage] Coverage bitmap size could not be determined...\n");
    exit(-1);  // Fuzzilli exits here
}

And that’s exactly what happened. Fuzzilli printed the error and immediately exited. No fuzzing, no bugs, no glory.
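The mechanism behind that check is simple. With -fsanitize-coverage=trace-pc-guard, clang gives every instrumented edge a 32-bit guard, calls __sanitizer_cov_trace_pc_guard_init once per module at startup, and calls __sanitizer_cov_trace_pc_guard on each edge; those two signatures come from clang’s SanitizerCoverage documentation. The minimal sketch below uses simplified bookkeeping and is not Fuzzilli’s actual libcoverage code, but it shows where an edge count comes from, and why it stays at zero when the code under test was never built with the instrumentation, as with a Node.js script.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_EDGES (1u << 16)

static uint32_t num_edges = 0;          /* edges registered so far */
static uint8_t  edge_hit[MAX_EDGES];    /* 1 once an edge has executed */

/* Called by compiler-inserted code at startup with the module's guard range.
   Without the instrumentation, nothing calls it and num_edges stays 0. */
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
    assert(start <= stop);
    if (start == stop || *start) return;   /* empty, or already initialized */
    for (uint32_t *g = start; g < stop; g++) {
        if (num_edges + 1 >= MAX_EDGES) break;  /* bitmap full: leave guard 0 */
        *g = ++num_edges;                       /* non-zero ID per edge */
    }
}

/* Called on every instrumented edge; a guard of 0 means "not tracked". */
void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    if (*guard) edge_hit[*guard] = 1;
}
```

In this simplified model, the num_edges == 0 check in the libcoverage snippet above amounts to asking whether any instrumented module ever called the init hook.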

I could have built a C-based Fuzzilli target, but that would only fuzz the runtime, not the compiler, and the whole point was end-to-end testing. Covering the compiler would have meant building a C-based JavaScript evaluator with SanitizerCoverage instrumentation, a massive undertaking just to test the compiler. So Fuzzilli was a dead end.

I moved on. I also found the Fuzzilli codebase quite old and struggled to build it.

Fuzzing with libFuzzer

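libFuzzer is simpler than AFL++ or Fuzzilli, but it’s effective: coverage-guided, fast, and the sanitizer integration is seamless. It’s LLVM’s in-process fuzzer, and building with -fsanitize=fuzzer adds inline edge counters plus comparison tracing (trace-cmp). That lets libFuzzer see both operands of comparisons the input fails, such as a checksum or magic-value check, and splice the expected values back into later mutations, which is probably how it got past the header validation that stalled AFL++.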

The setup was cleaner too:

FROM ubuntu:22.04
RUN apt-get update && apt-get install -y clang
COPY . /workspace
WORKDIR /workspace/fuzz
# Compile with libFuzzer and sanitizers
RUN clang -fsanitize=fuzzer,address,undefined \
    fuzz_target.c ../dist-c/microvium.c \
    -I../dist-c -o fuzzer

Then just run it:

./fuzzer corpus/ -max_len=65536 -timeout=10

AddressSanitizer (ASAN) detects memory errors like buffer overflows. UndefinedBehaviorSanitizer (UBSAN) catches undefined behavior like null pointer dereferences.

Within the first 5 minutes, libFuzzer found a crash.

=================================================================
==1==ERROR: AddressSanitizer: heap-buffer-overflow
READ of size 45568 at 0xfdb695800e7e thread T0
    #0 in __asan_memcpy
    #1 in memcpy_long /dist-c/microvium.c:8505:3
    #2 in mvm_restore /dist-c/microvium.c:5395:3
0xfdb695800e7e is located 0 bytes to the right of 46-byte region

It tried to read 45,568 bytes from a 46-byte buffer. Over the next 4 hours, libFuzzer found 3 more unique crashes.

Finding and Reviewing Crashes

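libFuzzer saves each crashing input as a crash-<SHA1> file, in the current directory unless -artifact_prefix points elsewhere (for example, -artifact_prefix=crashes/). Each crash is a small bytecode file that reliably triggers the bug.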

I used xxd to hexdump each crash file and compared it against valid bytecode to understand what triggered each bug.

Valid bytecode header (16-bit fields are stored little-endian, so 46 appears in the file as 2e 00):

Offset  Hex value  Field              Value
0x00    0x08       bytecodeVersion    8
0x01    0x1C       headerSize         28
0x04    0x002E     bytecodeSize       46 bytes
0x0C    0x001C     sectionOffsets[0]  28
0x0E    0x001C     sectionOffsets[1]  28
0x10    0x001C     sectionOffsets[2]  28
0x12    0x002A     sectionOffsets[3]  42

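CVE-1 crash file (same layout, values read little-endian from the hexdump below):

Offset  Hex value  Field              Value
0x10    0x491C     sectionOffsets[2]  18,716
0x12    0x0000     sectionOffsets[3]  0
0x14    0x00C4     sectionOffsets[4]  196
0x16    0x2F1C     sectionOffsets[5]  12,060
0x18    0x002A     sectionOffsets[6]  42
0x1A    0xB22A     sectionOffsets[7]  45,610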

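The fuzzer had corrupted the section offsets in the bytecode header, and four of them now point past the end of the 46-byte file. The size of the read in the ASAN report matches the gap between the last two exactly: sectionOffsets[7] - sectionOffsets[6] = 0xB22A - 0x002A = 45,610 - 42 = 45,568 bytes. So mvm_restore() apparently derived a section length from two attacker-controlled offsets and copied that many bytes out of a 46-byte buffer: a massive heap over-read.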

The VM blindly trusted these offsets and used them in memcpy() without any bounds checking.
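A simplified model makes the arithmetic concrete. It assumes each section runs up to the next section’s offset (and the last one up to bytecodeSize), computed in 16-bit arithmetic; that is inferred from the crash numbers, not taken from microvium.c, and section_len_unchecked is a hypothetical name. Plugging in the crash file’s offsets reproduces the 45,568-byte read from the ASAN report, and an end offset that sits before its start wraps around to a huge length instead of going negative.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_COUNT 8

/* Model only: section i spans [offsets[i], next), where next is the following
   section's offset, or bytecodeSize for the last section. Nothing is checked,
   mirroring the bug: the result is used directly as a copy length. */
static uint16_t section_len_unchecked(const uint16_t offsets[SECTION_COUNT],
                                      uint16_t bytecodeSize, int i) {
    assert(i >= 0 && i < SECTION_COUNT);
    uint16_t end = (i == SECTION_COUNT - 1) ? bytecodeSize : offsets[i + 1];
    return (uint16_t)(end - offsets[i]);  /* wraps modulo 65536 if end < start */
}
```

With the crash file’s offsets {28, 28, 0x491C, 0, 0x00C4, 0x2F1C, 42, 0xB22A} and bytecodeSize = 46, section 6 comes out at 45,568 bytes, and section 2 (which "ends" at offset 0) wraps to 46,820. This is also why checking each offset against the file size is not enough on its own: the offsets must also be non-decreasing.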

Crash 1: CVE-1 - Heap Buffer Overflow in mvm_restore()

File: crash-7066f3dea063a8399d4e8cb5777cd5b236b105ee (46 bytes)

Looking at the hexdump:

00000000: 081c 0000 2e00 5010 0300 0000 1c00 1c00  ......P.........
00000010: 1c49 0000 c400 1c2f 2a00 2ab2 2a80 2a00  .I...../*.*.*.*.
00000020: 2e00 2d00 0100 0101 0000 1900 0100       ..-...........

I compared this to a valid bytecode file. The difference? Bytes 0x10-0x1B - the section offsets.

Valid bytecode:
Offset 0x12: 0x002A = 42 (within 46-byte file)
Offset 0x14: 0x002A = 42 (within 46-byte file)

Malicious bytecode:
Offset 0x10: 0x491C = 18,716 (way beyond file!)
Offset 0x14: 0x00C4 = 196 (beyond file!)
Offset 0x16: 0x2F1C = 12,060 (way beyond file!)
Offset 0x1A: 0xB22A = 45,610 (way beyond file!)

The code in mvm_restore() trusts these offsets and uses them to size a memcpy_long() call:

// microvium.c:5395
memcpy_long(destination, source, size); // No bounds checking!

When mvm_restore() copies a section whose length was derived from these offsets (45,568 bytes, according to the ASAN report) out of a 46-byte buffer… boom. Heap buffer overflow.

Impact: Out-of-bounds heap read. At minimum it crashes the VM (DoS), and because an attacker controls how far past the buffer the copy reads, it can also leak sensitive data from adjacent heap memory.

Code location: microvium.c:5395

Crash 2: CVE-2 - Heap Buffer Overflow in vm_resolveExport()

File: crash-8800363e5293536ed58e24cf4336d09e47f45a79 (46 bytes)

Similar issue, different location:

ERROR: AddressSanitizer: heap-buffer-overflow
READ of size 2 at 0xff050d400e80 thread T0
    #0 in LongPtr_read2_aligned /dist-c/microvium.c:6170:21
    #1 in vm_resolveExport /dist-c/microvium.c:6813:31
    #2 in mvm_resolveExports /dist-c/microvium.c:6835:23
0xff050d400e80 is located 2 bytes to the right of 46-byte region

This one reads 2 bytes, starting 2 bytes past the end of the buffer, while resolving the export table. Smaller overflow, but still bad - it can leak adjacent heap data.

Code location: microvium.c:6813
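A defensive version of such a read checks the remaining length before touching memory. A minimal sketch, not Microvium’s LongPtr_read2_aligned: read_u16_checked is a hypothetical helper that treats the bytecode as a plain byte buffer and assumes little-endian storage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a little-endian uint16 at buf[offset] only if both bytes lie inside
   the buffer. Returns false (and leaves *out untouched) otherwise. */
static bool read_u16_checked(const uint8_t *buf, size_t len, size_t offset,
                             uint16_t *out) {
    assert(buf != NULL && out != NULL);
    /* Written so that a huge offset cannot overflow the comparison. */
    if (offset > len || len - offset < 2) return false;
    *out = (uint16_t)(buf[offset] | (buf[offset + 1] << 8));
    return true;
}
```

For the CVE-2 input, a 2-byte read at offset 48 of the 46-byte buffer would return false instead of reading past the allocation. The check is written as len - offset < 2 rather than offset + 2 > len so it cannot wrap around for very large offsets.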

Crash 3: CVE-3 - Assertion Failure in vm_getHandleTargetOrNull()

File: crash-9fdcc3d9b0f2fb1f87b088f37d4856013645bfac (46 bytes)

fuzzer: dist-c/microvium.c:7551: Assertion `false' failed.
    #9  in vm_getHandleTargetOrNull /dist-c/microvium.c:7551:3
    #10 in vm_resolveIndirections /dist-c/microvium.c:7564:19
    #11 in mvm_call /dist-c/microvium.c:2162:23

This one’s different - it’s a reachable assert(false). Looking at the code:

// microvium.c:7551
switch (handleType) {
  case VM_HANDLE_TYPE_X: ...
  case VM_HANDLE_TYPE_Y: ...
  default:
    assert(false); // Reachable with malformed bytecode!
    return NULL;
}

In production builds (where assert() is typically disabled), this would return NULL and likely cause a null pointer dereference. But even in debug builds, crashing on bad input is a DoS vulnerability.
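The fix for this class of bug is to treat an unknown tag as invalid input rather than as an impossible state. A sketch with hypothetical names (HandleType, Err and decode_handle_type are illustrative, not Microvium’s types): the default case returns an error the caller must propagate, so behavior is identical with or without NDEBUG, while assert() stays reserved for genuine programmer errors such as a NULL output pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative types; Microvium's real handle types differ. */
typedef enum { HANDLE_TYPE_X, HANDLE_TYPE_Y } HandleType;
typedef enum { ERR_OK = 0, ERR_INVALID_BYTECODE } Err;

/* raw comes from bytecode, so any value is possible. */
static Err decode_handle_type(unsigned raw, HandleType *out) {
    assert(out != NULL);              /* caller bug: fine to assert */
    switch (raw) {
    case HANDLE_TYPE_X: *out = HANDLE_TYPE_X; return ERR_OK;
    case HANDLE_TYPE_Y: *out = HANDLE_TYPE_Y; return ERR_OK;
    default:            return ERR_INVALID_BYTECODE;  /* bad input: report it */
    }
}
```

The price is that every caller on the path (vm_resolveIndirections and mvm_call in the CVE-3 trace) has to check and forward the error instead of assuming success.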

Impact: Denial of Service

Code location: microvium.c:7551

Crash 4: CVE-4 - Assertion Failure in ShortPtr_decode()

File: crash-53559a3b76bfdc9ac6506e93ad1763e92416a2fd (78 bytes)

fuzzer: dist-c/microvium.c:6009: Assertion `false' failed.
    #9  in ShortPtr_decode /dist-c/microvium.c:6009:5
    #10 in gc_processShortPtrValue /dist-c/microvium.c:6270:37
    #11 in gc_processValue /dist-c/microvium.c:6453:5
    #12 in mvm_runGC /dist-c/microvium.c:6547:5

Same pattern - reachable assert(false) during garbage collection. If GC crashes, your entire VM is toast.

Code location: microvium.c:6009

List of Findings

Here’s the summary:

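ID     Type                         Location
CVE-1  Heap Buffer Overflow (Read)  mvm_restore(), microvium.c:5395
CVE-2  Heap Buffer Overflow (Read)  vm_resolveExport(), microvium.c:6813
CVE-3  Reachable Assertion          vm_getHandleTargetOrNull(), microvium.c:7551
CVE-4  Reachable Assertion          ShortPtr_decode(), microvium.c:6009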

Key Takeaways from fuzzing

What worked:

  • libFuzzer + ASAN = perfect combo for finding memory corruption
  • Starting with a good corpus of valid inputs is crucial
  • Fuzzing the bytecode parser was the right target

What didn’t:

  • AFL++ couldn’t find the bugs (probably due to CRC/structure)
  • Fuzzilli can’t work without coverage instrumentation

The bugs:

  • All stemmed from trusting bytecode header fields without validation
  • Attackers can hex-edit bytecode files to trigger crashes
  • Even though JavaScript can’t directly trigger these (the compiler generates valid bytecode), malicious bytecode files are a real attack vector

Real-world impact:

  • IoT devices loading untrusted bytecode could be crashed or compromised
  • Supply chain attacks via malicious npm packages
  • OTA updates could be weaponized

This was a great exercise in fuzzing embedded systems. The bugs were hiding in plain sight - classic “trust user input” mistakes that fuzzing exposed in a few hours.

Root Cause: Trusting Bytecode

All four vulnerabilities stem from the same architectural flaw: the VM trusts bytecode completely.

Microvium assumes bytecode is well-formed because it comes from the official compiler. Apart from the CRC and basic header checks, there’s no signature verification and no structural validation of the offsets. Just parse and execute.

This is dangerous because:

  1. Direct bytecode manipulation: An attacker can hex-edit .mvm-bc files to inject malicious offsets
  2. Supply chain attacks: Compromised build servers could inject malicious bytecode during compilation
  3. No defense in depth: One vulnerability in the compiler or VM compromises everything

The Microvium workflow is:

JavaScript → Compiler → Bytecode → Device → Execution

If an attacker controls any step after JavaScript, they can exploit these bugs.

Can JavaScript Trigger These?

I spent time researching whether normal JavaScript code could trigger these crashes through the compiler.

Short answer: Probably not through the standard compiler.

The TypeScript compiler calculates section offsets based on actual content it generates. Unless there’s a compiler bug, it should always produce valid offsets.

But: An attacker doesn’t need to go through the compiler. They can:

  1. Compile legitimate JavaScript
  2. Hex-edit the .mvm-bc file
  3. Modify bytes 0x10-0x1B (section offsets)
  4. Distribute the malicious bytecode

I verified this with a simple test:

# Start with valid bytecode
cp corpus/addition.bin evil.mvm-bc
# Overwrite the low byte of sectionOffsets[3] (offset 0x12 = 18; fields are little-endian) with 0xC4
printf '\304' | dd of=evil.mvm-bc bs=1 seek=18 count=1 conv=notrunc
# Test (triggers heap overflow)
./fuzzer evil.mvm-bc -runs=1

Boom. Instant crash.

Recommendations

For the Microvium team, I’d recommend:

Immediate (Critical):

  1. Add bounds checking in mvm_restore():
for (int i = 0; i < 8; i++) {
    if (header->sectionOffsets[i] > bytecodeSize) {
        return MVM_E_INVALID_BYTECODE;
    }
}
  2. Replace assertions with error returns:
// Instead of: assert(false);
return MVM_E_INVALID_POINTER_TYPE;
  3. Implement bytecode signature verification:
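// Application-level protection: verify_signature(), PUBLIC_KEY and
// MVM_E_UNTRUSTED_BYTECODE are placeholders for your own scheme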
if (!verify_signature(bytecode, signature, PUBLIC_KEY)) {
    return MVM_E_UNTRUSTED_BYTECODE;
}
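The bounds check in the first recommendation can be tightened. The sketch below assumes the header layout from the tables above (headerSize at 0x01, bytecodeSize at 0x04, eight little-endian sectionOffsets from 0x0C) and that sections are stored in index order, as in the valid header shown earlier; header_is_sane is a hypothetical name. Besides rejecting offsets past the end, it rejects offsets that point into the header or decrease, since a decreasing pair yields a wrapped, enormous section length even when both offsets are individually in range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTION_COUNT        8
#define HDR_HEADER_SIZE      0x01
#define HDR_BYTECODE_SIZE    0x04
#define HDR_SECTION_OFFSETS  0x0C
#define HDR_MIN_SIZE         (HDR_SECTION_OFFSETS + 2 * SECTION_COUNT)

static uint16_t rd16le(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

/* Structural checks to run before any section is located or copied. */
static bool header_is_sane(const uint8_t *buf, size_t len) {
    assert(buf != NULL);
    if (len < HDR_MIN_SIZE) return false;                 /* header must fit */
    uint8_t  headerSize   = buf[HDR_HEADER_SIZE];
    uint16_t bytecodeSize = rd16le(buf + HDR_BYTECODE_SIZE);
    if (headerSize < HDR_MIN_SIZE) return false;          /* offsets table must fit */
    if (bytecodeSize > len || headerSize > bytecodeSize) return false;
    uint16_t prev = headerSize;                           /* sections follow the header */
    for (int i = 0; i < SECTION_COUNT; i++) {
        uint16_t off = rd16le(buf + HDR_SECTION_OFFSETS + 2 * i);
        if (off < prev || off > bytecodeSize) return false;  /* in order, in bounds */
        prev = off;
    }
    return true;
}
```

The CRC is no substitute for checks like these: anyone who can edit the file can also recompute a 16-bit checksum.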

I contacted the maintainer by email but got no reply, so I ended up posting the findings as a GitHub issue: https://github.com/coder-mike/microvium/issues/87

This post is licensed under CC BY 4.0 by the author.