So I decided to fuzz Microvium (https://github.com/coder-mike/microvium). If you haven’t heard of it, Microvium is a tiny JavaScript engine designed for embedded systems - think IoT devices, microcontrollers, stuff where you’ve got maybe 16KB of RAM to work with.
What is Microvium?
Microvium is a JavaScript engine for microcontrollers. It allows you to write your logic in JavaScript, compile it to a compact bytecode format, and run it on tiny devices with minimal overhead.
The architecture is pretty interesting:
- Compiler (TypeScript/Node.js) - Runs on your dev machine, compiles JS to bytecode
- Runtime (C) - Tiny VM that runs on the device, executes the bytecode
- Snapshot format (.mvm-bc files) - The compiled bytecode that gets loaded
The bytecode files are typically 1-10KB, and the C runtime adds maybe 8-16KB to your firmware. For embedded systems, that’s actually pretty reasonable.
Reviewing the Package: What Should We Fuzz?
Before you start fuzzing, you need to understand what you’re fuzzing. I spent some time digging through the Microvium codebase to figure out the attack surface. The codebase is split into two main parts:
- A TypeScript compiler (src-to-il/) that turns JavaScript into bytecode
- A C runtime (native-vm/) that executes that bytecode
The C runtime (dist-c/microvium.c) has several key functions:
- mvm_restore() - Loads bytecode snapshots into memory
- mvm_call() - Executes exported functions
- mvm_runGC() - Garbage collection
- mvm_resolveExports() - Resolves function exports from bytecode
The bytecode format itself is structured with a header containing:
- Version info
- CRC checksums
- Section offsets (pointers to different parts of the bytecode)
- Export tables
- ROM heap data
Looking at this, the most obvious target is mvm_restore(). It parses untrusted binary data (the bytecode file), does pointer arithmetic, and copies memory around. Classic fuzzing target: if there’s going to be a bug, it’s probably here.
Looking at the bytecode format, I found a header structure:
typedef struct mvm_TsBytecodeHeader {
  uint8_t bytecodeVersion;
  uint8_t headerSize;
  // ...
  uint16_t bytecodeSize;
  uint16_t crc;
  uint16_t sectionOffsets[8]; // Offsets to different bytecode sections
  // ...
} mvm_TsBytecodeHeader;
Those sectionOffsets? They’re read directly from the bytecode file and used to locate data in memory. No validation.
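To make the layout concrete, here is a minimal sketch of decoding those header fields from raw bytes. The field offsets (bytecodeSize at 0x04, sectionOffsets starting at 0x0C) and the little-endian byte order are taken from the header tables and hexdump later in this post; DecodedHeader and decode_header are names made up for illustration and are not part of Microvium.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTION_COUNT 8          /* matches sectionOffsets[8] in the struct */
#define OFF_BYTECODE_SIZE 0x04   /* assumed offset, from the header table */
#define OFF_SECTION_OFFSETS 0x0C /* assumed offset, from the header table */

/* Hypothetical decoded view of the header; not a Microvium type. */
typedef struct {
  uint8_t bytecodeVersion;
  uint8_t headerSize;
  uint16_t bytecodeSize;
  uint16_t sectionOffsets[SECTION_COUNT];
} DecodedHeader;

/* Read a little-endian 16-bit value; the caller guarantees off + 1 < len. */
static uint16_t read_u16_le(const uint8_t *buf, size_t off) {
  return (uint16_t)(buf[off] | ((uint16_t)buf[off + 1] << 8));
}

/* Decode the header fields; returns false if the buffer is too short. */
bool decode_header(const uint8_t *buf, size_t len, DecodedHeader *out) {
  const size_t needed = OFF_SECTION_OFFSETS + 2 * SECTION_COUNT; /* 28 bytes */
  if (buf == NULL || out == NULL || len < needed) return false;
  out->bytecodeVersion = buf[0];
  out->headerSize = buf[1];
  out->bytecodeSize = read_u16_le(buf, OFF_BYTECODE_SIZE);
  for (size_t i = 0; i < SECTION_COUNT; i++) {
    out->sectionOffsets[i] = read_u16_le(buf, OFF_SECTION_OFFSETS + 2 * i);
  }
  return true;
}
```

Note that this only decodes the fields. Nothing here checks that an offset actually lands inside the buffer, which is exactly the gap the fuzzer ends up exploiting.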
Building the Harness
For Microvium, I needed something that would:
- Take arbitrary binary data (the fuzzed bytecode)
- Try to restore a VM from it
- Try to call exported functions
- Run garbage collection
- Clean up
Here’s what the core of the harness looked like (simplified):
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  mvm_VM* vm = NULL;

  // Try to restore VM from fuzzed bytecode
  mvm_TeError err = mvm_restore(&vm, data, size, NULL, resolveImport);
  if (err != MVM_E_SUCCESS || !vm) {
    return 0; // Invalid bytecode, move on
  }

  // Try to resolve and call exports
  mvm_VMExportID exportID = 1234;
  mvm_Value exportValue;
  err = mvm_resolveExports(vm, &exportID, &exportValue, 1);
  if (err == MVM_E_SUCCESS) {
    // Try calling with various argument counts
    for (int argCount = 0; argCount <= 4; argCount++) {
      mvm_Value args[4] = {0};
      mvm_Value result;
      mvm_call(vm, exportValue, &result, args, argCount);
    }
  }

  // Trigger GC
  mvm_runGC(vm, false);

  // Cleanup
  mvm_free(vm);
  return 0;
}
The harness is intentionally permissive. It doesn’t care if operations fail - that’s expected with random input.
Compile with AddressSanitizer and libFuzzer:
clang -fsanitize=address,fuzzer,undefined \
-O2 -g \
fuzz_target.c microvium.c \
-o fuzzer
The sanitizers are crucial here. AddressSanitizer catches memory corruption bugs, UndefinedBehaviorSanitizer catches… well, undefined behavior.
Building a Corpus
For Microvium, I needed valid bytecode files. I created a corpus builder that would:
- Write simple JavaScript programs covering different features
- Compile them to bytecode using the Microvium compiler
- Save the bytecode files as seeds
I wrote 20 test programs covering things like:
- Basic arithmetic
- Loops and conditionals
- Arrays and objects
- Closures and recursion
- Edge cases like empty functions
Example:
// 02-loops.js
vmExport(1, factorial);
function factorial(n) {
  var result = 1;
  for (var i = 2; i <= n; i++) {
    result *= i;
  }
  return result;
}
// ... 18 more programs
Basic operations:
- Arithmetic (1 + 2 * 3)
- Loops (for, while)
- Conditionals (if/else, ternary)
- Arrays and objects
- String manipulation
- Recursion (fibonacci, GCD)
- Closures
Edge cases:
- Empty functions
- Large arrays
- Deep nesting
- Many exports
- Bitwise operations
I also added some hand-crafted edge cases:
- Empty file (0 bytes)
- Single byte
- All zeros (64 bytes)
- All 0xFF bytes (64 bytes)
- Sequential bytes 0x00-0xFF
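The hand-crafted seeds above are easy to generate programmatically. The sketch below writes all five into a directory; the file names and the write_edge_case_seeds helper are invented for this example, and the single-byte seed uses 0x00 only because the post doesn't say which byte was used.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write len bytes to dir/name; returns 0 on success, -1 on failure. */
static int write_seed(const char *dir, const char *name,
                      const uint8_t *data, size_t len) {
  char path[512];
  int n = snprintf(path, sizeof path, "%s/%s", dir, name);
  if (n < 0 || (size_t)n >= sizeof path) return -1;
  FILE *f = fopen(path, "wb");
  if (f == NULL) return -1;
  size_t written = len ? fwrite(data, 1, len, f) : 0;
  int close_failed = fclose(f);
  return (written == len && close_failed == 0) ? 0 : -1;
}

/* Emit the five hand-crafted seeds; file names are made up for this sketch. */
int write_edge_case_seeds(const char *dir) {
  uint8_t buf[256] = {0};
  int rc = 0;

  rc |= write_seed(dir, "edge-empty.bin", buf, 0);       /* 0 bytes */
  rc |= write_seed(dir, "edge-single-byte.bin", buf, 1); /* one 0x00 byte */

  memset(buf, 0x00, 64);
  rc |= write_seed(dir, "edge-zeros.bin", buf, 64);      /* 64 x 0x00 */

  memset(buf, 0xFF, 64);
  rc |= write_seed(dir, "edge-ff.bin", buf, 64);         /* 64 x 0xFF */

  for (int i = 0; i < 256; i++) buf[i] = (uint8_t)i;     /* 0x00..0xFF */
  rc |= write_seed(dir, "edge-sequential.bin", buf, 256);

  return rc ? -1 : 0;
}
```

Calling write_edge_case_seeds("corpus") next to the compiled seeds would drop these files alongside the valid bytecode.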
Then I compiled them all to bytecode. This gave me a corpus of 17 valid bytecode files, ranging from 46 bytes to a few hundred bytes. Small files are great for fuzzing because the fuzzer can explore the input space more quickly.
Fuzzing with AFL++
I started with AFL++ because I’m familiar with it. Set it up in Docker:
FROM aflplusplus/aflplusplus
COPY . /workspace
WORKDIR /workspace/fuzz
# Compile with AFL instrumentation
RUN afl-clang-fast -fsanitize=address \
fuzz_target_afl.c ../dist-c/microvium.c \
-I../dist-c -o fuzzer-afl
Then run it:
afl-fuzz -i corpus -o findings -m none -D -- ./fuzzer-afl @@
The -m none flag is important - it disables the memory limit because ASAN uses a lot of virtual memory.
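The Dockerfile compiles a fuzz_target_afl.c that isn't shown here. With the @@ placeholder, AFL++ passes the path of each test case on the command line, so the wrapper mostly has to read that file and hand the bytes to the harness. A minimal sketch of that file-reading part, assuming a 64KB input cap (read_input_file and MAX_INPUT_SIZE are names chosen for this example):

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_INPUT_SIZE 65536u /* arbitrary cap chosen for this sketch */

/* Read the file at `path` into a heap block of exactly the input's size, so
 * that ASan can flag any read past the end of the input.
 * Returns 0 on success; the caller must free(*out_data). */
int read_input_file(const char *path, uint8_t **out_data, size_t *out_size) {
  FILE *f = fopen(path, "rb");
  if (f == NULL) return -1;

  uint8_t *scratch = malloc(MAX_INPUT_SIZE);
  if (scratch == NULL) { fclose(f); return -1; }
  size_t size = fread(scratch, 1, MAX_INPUT_SIZE, f);
  int read_failed = ferror(f);
  fclose(f);

  uint8_t *exact = read_failed ? NULL : malloc(size ? size : 1);
  if (exact != NULL) memcpy(exact, scratch, size);
  free(scratch);
  if (exact == NULL) return -1;

  *out_data = exact;
  *out_size = size;
  return 0;
}

/* Hypothetical main() for fuzz_target_afl.c, reusing the harness entry point:
 *
 *   int main(int argc, char **argv) {
 *     uint8_t *data; size_t size;
 *     if (argc != 2 || read_input_file(argv[1], &data, &size) != 0) return 1;
 *     LLVMFuzzerTestOneInput(data, size);
 *     free(data);
 *     return 0;
 *   }
 */
```

Copying into an exact-size allocation matters: if the harness parsed the input straight out of the 64KB scratch buffer, a small over-read would land in valid memory and ASan would stay silent.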
AFL++ started running… and ran for about 2 hours. It found some interesting paths, but no crashes.
After some debugging, I figured out the problem. AFL++ is great at finding new code paths, but the Microvium bytecode format has a CRC check, and AFL’s mutations were breaking it, so most inputs got rejected immediately at the CRC validation stage.
AFL tracks which basic blocks get executed and tries to find inputs that hit new blocks. But Microvium’s bytecode parser has a lot of early validation that rejects malformed input. The fuzzer was spending most of its time generating bytecode that failed basic CRC checks or header validation and got rejected immediately.
The bugs I eventually found were deeper in the code, past several layers of parsing. AFL’s mutations weren’t sophisticated enough to generate bytecode that was “valid enough” to reach those code paths while still being “invalid enough” to trigger the bugs.
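One common way around a checksum gate is to recompute the checksum after every mutation, so that mutated inputs still pass the check. I didn't do this for the AFL++ run, so treat the following as a sketch under assumptions: that the CRC is CRC-16/CCITT-FALSE, that it covers everything after the crc field (offset 8 onward), and that it is stored little-endian at offset 6, right after bytecodeSize. None of those details are confirmed here and they would need to be checked against microvium.c.

```c
#include <stddef.h>
#include <stdint.h>

#define OFF_CRC 6       /* assumed: crc follows bytecodeSize (offset 4) */
#define OFF_CRC_START 8 /* assumed: checksum covers everything after crc */

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, no reflection. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len) {
  uint16_t crc = 0xFFFF;
  for (size_t i = 0; i < len; i++) {
    crc ^= (uint16_t)((uint16_t)data[i] << 8);
    for (int bit = 0; bit < 8; bit++) {
      crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                           : (uint16_t)(crc << 1);
    }
  }
  return crc;
}

/* Recompute the checksum of a mutated image and store it little-endian.
 * Returns 0 on success, -1 if the buffer is too short to hold a crc field. */
int fix_crc(uint8_t *image, size_t len) {
  if (image == NULL || len < OFF_CRC_START) return -1;
  uint16_t crc = crc16_ccitt(image + OFF_CRC_START, len - OFF_CRC_START);
  image[OFF_CRC] = (uint8_t)(crc & 0xFF);
  image[OFF_CRC + 1] = (uint8_t)(crc >> 8);
  return 0;
}
```

A fix-up like this would typically run as the last step of a custom mutator (AFL++ and libFuzzer both support them), after the fuzzer has applied its own byte-level mutations.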
Fuzzing with Fuzzilli
Next, I tried Fuzzilli. This is a JavaScript engine fuzzer that Google built for finding bugs in V8, JavaScriptCore, and SpiderMonkey. It’s structure-aware - instead of mutating bytes, it generates semantically valid JavaScript programs.
The idea was: let Fuzzilli generate JavaScript, compile it with Microvium’s compiler, and fuzz the result. This would test both the compiler and the runtime end-to-end.
I spent a good amount of time on the setup:
- Created a custom Microvium profile for Fuzzilli (using Claude)
- Implemented the REPRL (Read-Eval-Print-Reset-Loop) protocol
- Set up the Docker environment
- Integrated everything
The integration looked promising:
// fuzzilli-reprl.js - REPRL target
const Microvium = require('./dist/lib');
while (true) {
  const jsCode = readFuzzilliInput();
  // Compile JS to bytecode
  const snapshot = Microvium.Snapshot.fromSource(jsCode);
  const bytecode = snapshot.getData();
  // Execute on C runtime
  const vm = NativeVM.restore(bytecode);
  vm.call(exportedFunction);
  vm.runGC();
}
But Claude missed something obvious. Fuzzilli uses coverage feedback to guide its JavaScript generation. It needs to know which parts of the engine are being executed so it can evolve its test cases toward unexplored code paths. This requires the target to be compiled with LLVM’s SanitizerCoverage.
The problem? My Fuzzilli target was a Node.js script. The Microvium compiler is written in TypeScript, compiled to JavaScript, and runs in Node. There’s no way to get native code coverage from that.
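For context, this is roughly what the target side of SanitizerCoverage looks like in trace-pc-guard mode. The two __sanitizer_cov_* callbacks are the hooks LLVM documents for this mode; the bitmap, its size, and the two helper functions are simplifications invented for this sketch (Fuzzilli's real implementation shares the bitmap with the fuzzer process).

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_EDGES 65536u /* arbitrary bitmap capacity for this sketch */

static uint8_t edge_bitmap[MAX_EDGES / 8];
static uint32_t num_edges;

/* Called once per instrumented module; gives each edge guard a unique index. */
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
  if (start == stop || *start != 0) return; /* empty or already initialised */
  for (uint32_t *g = start; g < stop && num_edges < MAX_EDGES - 1; g++) {
    *g = ++num_edges; /* index 0 is reserved to mean "disabled" */
  }
}

/* Called on every instrumented edge; records that the edge was hit. */
void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  uint32_t index = *guard;
  if (index == 0) return;
  edge_bitmap[index / 8] |= (uint8_t)(1u << (index % 8));
}

/* Helpers a fuzzer front end could poll (hypothetical names). */
uint32_t coverage_edge_count(void) { return num_edges; }

int coverage_edge_hit(uint32_t index) {
  return (index < MAX_EDGES) && ((edge_bitmap[index / 8] >> (index % 8)) & 1);
}
```

Only code that clang compiled with -fsanitize-coverage=trace-pc-guard ever calls these hooks. JavaScript running inside Node never does, so the edge count stays at zero.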
When Fuzzilli starts up, it checks for coverage:
// From Fuzzilli's libcoverage/coverage.c
if (num_edges == 0) {
fprintf(stderr, "[LibCoverage] Coverage bitmap size could not be determined...\n");
exit(-1); // Fuzzilli exits here
}
And that’s exactly what happened. Fuzzilli printed the error and immediately exited. No fuzzing, no bugs, no glory.
I could have built a C-based Fuzzilli target, but that would only fuzz the runtime, not the compiler. The whole point was end-to-end testing. So Fuzzilli was a dead end. I would have needed to build a C-based JavaScript evaluator with SanitizerCoverage instrumentation, which would have been a massive undertaking just to test the compiler.
So I moved on. I also found the Fuzzilli codebase to be quite old, and I struggled to build it.
Fuzzing with libFuzzer
libFuzzer is LLVM’s in-process, coverage-guided fuzzer. It’s simpler to set up than AFL++ or Fuzzilli, it’s fast, and the sanitizer integration is seamless. It uses inline coverage instrumentation, which gives it detailed feedback about which code paths each input reaches.
The setup was cleaner too:
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y clang
# Compile with libFuzzer and sanitizers
RUN clang -fsanitize=fuzzer,address,undefined \
fuzz_target.c ../dist-c/microvium.c \
-I../dist-c -o fuzzer
Then just run it:
./fuzzer corpus/ -max_len=65536 -timeout=10
AddressSanitizer (ASAN) detects memory errors like buffer overflows. UndefinedBehaviorSanitizer (UBSAN) catches undefined behavior like null pointer dereferences.
Within the first 5 minutes, libFuzzer found a crash.
=================================================================
==1==ERROR: AddressSanitizer: heap-buffer-overflow
READ of size 45568 at 0xfdb695800e7e thread T0
#0 in __asan_memcpy
#1 in memcpy_long /dist-c/microvium.c:8505:3
#2 in mvm_restore /dist-c/microvium.c:5395:3
0xfdb695800e7e is located 0 bytes to the right of 46-byte region
It tried to read 45,568 bytes from a 46-byte buffer. Over the next 4 hours, libFuzzer found 3 more unique crashes.
Finding and Reviewing Crashes
libFuzzer saves crashing inputs to the crashes/ directory. Each crash is a small bytecode file that reliably triggers the bug.
I used xxd to hexdump each crash file and compared it against valid bytecode to understand what triggered each bug.
Valid bytecode header:
| Offset | Bytes (file order) | Field | Value |
|---|---|---|---|
| 0x00 | 08 | bytecodeVersion | 8 |
| 0x01 | 1c | headerSize | 28 |
| 0x04 | 2e 00 | bytecodeSize | 46 bytes |
| 0x0C | 1c 00 | sectionOffsets[0] | 28 |
| 0x0E | 1c 00 | sectionOffsets[1] | 28 |
| 0x10 | 1c 00 | sectionOffsets[2] | 28 |
| 0x12 | 2a 00 | sectionOffsets[3] | 42 |
CVE-1 crash file:
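| Offset | Bytes (file order) | Field | Value |
|---|---|---|---|
| 0x10 | 1c 49 | sectionOffsets[2] | 18,716 |
| 0x16 | 1c 2f | sectionOffsets[5] | 12,060 |
| 0x18 | 2a 00 | sectionOffsets[6] | 42 |
| 0x1A | 2a b2 | sectionOffsets[7] | 45,610 |
The fuzzer had corrupted the section offsets in the bytecode header. The fields are little-endian (bytecodeSize is stored as 2e 00 and decodes to 46), so sectionOffsets[6] is still 42 but sectionOffsets[7] has become 45,610. The span between them is 45,568 bytes, which matches the READ of size 45568 that ASan reported against the 46-byte buffer - massive heap overflow.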
The VM blindly trusted these offsets and used them in memcpy() without any bounds checking.
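For comparison, here is a sketch of the kind of up-front validation that would reject such a header before any memcpy() happens. It assumes sections are laid out in ascending order after the header, which is consistent with the valid file but is my assumption rather than something taken from the Microvium source. section_offsets_valid is a made-up name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTION_COUNT 8

/* Returns true only if every section offset is usable for an image of
 * actual_len bytes. Inputs are already-decoded header fields. */
bool section_offsets_valid(const uint16_t offsets[SECTION_COUNT],
                           uint8_t header_size, uint16_t bytecode_size,
                           size_t actual_len) {
  /* The size claimed by the header must not exceed what was really supplied. */
  if ((size_t)bytecode_size > actual_len) return false;
  if (header_size > bytecode_size) return false;

  uint16_t previous = header_size; /* sections start after the header */
  for (size_t i = 0; i < SECTION_COUNT; i++) {
    if (offsets[i] < previous) return false;      /* into header or backwards */
    if (offsets[i] > bytecode_size) return false; /* points past the image */
    previous = offsets[i];
  }
  return true;
}
```

Checking bytecodeSize against the real buffer length first is important, because bytecodeSize comes from the same untrusted header as the offsets.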
Crash 1: CVE-1 - Heap Buffer Overflow in mvm_restore()
File: crash-7066f3dea063a8399d4e8cb5777cd5b236b105ee (46 bytes)
Looking at the hexdump:
00000000: 081c 0000 2e00 5010 0300 0000 1c00 1c00 ......P.........
00000010: 1c49 0000 c400 1c2f 2a00 2ab2 2a80 2a00 .I...../*.*.*.*.
00000020: 2e00 2d00 0100 0101 0000 1900 0100 ..-...........
I compared this to a valid bytecode file. The difference? Bytes 0x0C-0x1B - the section offsets.
Valid bytecode:
Offset 0x12: 0x002A = 42 (within 46-byte file)
Offset 0x14: 0x002A = 42 (within 46-byte file)
Malicious bytecode (little-endian, read from the hexdump above):
Offset 0x10: 0x491C = 18,716 (way beyond file!)
Offset 0x14: 0x00C4 = 196 (beyond file!)
Offset 0x1A: 0xB22A = 45,610 (way beyond file!)
The code in mvm_restore() trusts these offsets and uses them in memcpy():
// microvium.c:5395
memcpy_long(destination, source, size); // No bounds checking!
When mvm_restore() copies the section that starts at offset 42 and supposedly ends at offset 45,610, it reads 45,568 bytes from a 46-byte buffer… boom. Heap buffer overflow.
Impact: Memory corruption, potential RCE. An attacker can control how far past the buffer we read, potentially leaking sensitive data from the heap.
Code location: microvium.c:5395
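The arithmetic behind the READ size in the ASan report can be checked directly from the hexdump. The sketch below hard-codes the eight section offsets from bytes 0x0C-0x1B of the crash file, read as little-endian 16-bit words, and computes the length each section would have if it ran up to the next section's offset. That "runs up to the next offset" rule is my assumption about how section sizes are derived, and the helper names are made up.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTION_COUNT 8

/* Section offsets decoded from bytes 0x0C-0x1B of the crash file's hexdump. */
static const uint16_t crash_offsets[SECTION_COUNT] = {
  0x001C, 0x001C, 0x491C, 0x0000, 0x00C4, 0x2F1C, 0x002A, 0xB22A
};

/* Length of section i if it runs up to the next section's offset
 * (assumption: the last section runs to end_of_image).
 * Returns -1 when the next offset is smaller than the start. */
long implied_section_length(const uint16_t *offsets, size_t i,
                            uint16_t end_of_image) {
  uint16_t start = offsets[i];
  uint16_t end = (i + 1 < SECTION_COUNT) ? offsets[i + 1] : end_of_image;
  return (end >= start) ? (long)(end - start) : -1;
}

/* True when section i lies entirely within an image of image_size bytes. */
int section_fits(const uint16_t *offsets, size_t i, uint16_t image_size) {
  long len = implied_section_length(offsets, i, image_size);
  return len >= 0 && (long)offsets[i] + len <= (long)image_size;
}
```

For section 6, implied_section_length(crash_offsets, 6, 46) gives 45,610 - 42 = 45,568, the same number ASan printed.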
Crash 2: CVE-2 - Heap Buffer Overflow in vm_resolveExport()
File: crash-8800363e5293536ed58e24cf4336d09e47f45a79 (46 bytes)
Similar issue, different location:
ERROR: AddressSanitizer: heap-buffer-overflow
READ of size 2 at 0xff050d400e80 thread T0
#0 in LongPtr_read2_aligned /dist-c/microvium.c:6170:21
#1 in vm_resolveExport /dist-c/microvium.c:6813:31
#2 in mvm_resolveExports /dist-c/microvium.c:6835:23
0xff050d400e80 is located 2 bytes to the right of 46-byte region
This one reads 2 bytes past the buffer when resolving the export table. Smaller overflow, but still bad - can leak adjacent heap data.
Code location: microvium.c:6813
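A bounds-checked lookup avoids this class of over-read. The sketch below assumes each export table entry is a 2-byte ID followed by a 2-byte value, both little-endian; that layout and the find_export helper are illustrative assumptions, not Microvium's actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bounds-checked little-endian 16-bit read; false if it would leave the buffer. */
static bool read_u16_checked(const uint8_t *buf, size_t len, size_t off,
                             uint16_t *out) {
  if (off > len || len - off < 2) return false;
  *out = (uint16_t)(buf[off] | ((uint16_t)buf[off + 1] << 8));
  return true;
}

/* Look up export `id` in the table spanning [table_start, table_end).
 * Assumed entry layout for this sketch: 2-byte ID, then 2-byte value. */
bool find_export(const uint8_t *buf, size_t len,
                 size_t table_start, size_t table_end,
                 uint16_t id, uint16_t *out_value) {
  /* Reject table bounds that do not fit inside the buffer. */
  if (table_start > table_end || table_end > len) return false;
  for (size_t off = table_start; table_end - off >= 4; off += 4) {
    uint16_t entry_id, entry_value;
    if (!read_u16_checked(buf, len, off, &entry_id) ||
        !read_u16_checked(buf, len, off + 2, &entry_value)) {
      return false;
    }
    if (entry_id == id) {
      *out_value = entry_value;
      return true;
    }
  }
  return false;
}
```

The table bounds would come from the section offsets in the header, so a corrupted offset makes the lookup fail cleanly instead of walking off the end of the buffer.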
Crash 3: CVE-3 - Assertion Failure in vm_getHandleTargetOrNull()
File: crash-9fdcc3d9b0f2fb1f87b088f37d4856013645bfac (46 bytes)
fuzzer: dist-c/microvium.c:7551: Assertion `false' failed.
#9 in vm_getHandleTargetOrNull /dist-c/microvium.c:7551:3
#10 in vm_resolveIndirections /dist-c/microvium.c:7564:19
#11 in mvm_call /dist-c/microvium.c:2162:23
This one’s different - it’s a reachable assert(false). Looking at the code:
// microvium.c:7551
switch (handleType) {
  case VM_HANDLE_TYPE_X: ...
  case VM_HANDLE_TYPE_Y: ...
  default:
    assert(false); // Reachable with malformed bytecode!
    return NULL;
}
In production builds (where assert() is typically disabled), this would return NULL and likely cause a null pointer dereference. But even in debug builds, crashing on bad input is a DoS vulnerability.
Impact: Denial of Service
Code location: microvium.c:7551
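The usual fix for a reachable assert is to turn the "impossible" branch into an error the caller has to handle. Here is a sketch of that pattern with made-up types: ResolveError, the HANDLE_KIND_* constants and resolve_handle are all hypothetical stand-ins, not Microvium identifiers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for this sketch; not Microvium's real enums. */
typedef enum { RESOLVE_OK = 0, RESOLVE_E_INVALID_BYTECODE = 1 } ResolveError;
enum { HANDLE_KIND_ROM = 0, HANDLE_KIND_RAM = 1 };

/* Resolve a handle to an address inside the ROM or RAM region.
 * Unknown kinds and out-of-range offsets are reported, never asserted. */
ResolveError resolve_handle(uint8_t kind, uint16_t offset,
                            const uint8_t *rom, size_t rom_len,
                            const uint8_t *ram, size_t ram_len,
                            const uint8_t **out_target) {
  *out_target = NULL;
  switch (kind) {
    case HANDLE_KIND_ROM:
      if (offset >= rom_len) return RESOLVE_E_INVALID_BYTECODE;
      *out_target = rom + offset;
      return RESOLVE_OK;
    case HANDLE_KIND_RAM:
      if (offset >= ram_len) return RESOLVE_E_INVALID_BYTECODE;
      *out_target = ram + offset;
      return RESOLVE_OK;
    default:
      /* Was assert(false): unknown kinds come from the input, so report them. */
      return RESOLVE_E_INVALID_BYTECODE;
  }
}
```

The cost is that every caller up the stack has to check and propagate the error code, which is a bigger refactor than swapping one line, but it behaves the same in debug and release builds.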
Crash 4: CVE-4 - Assertion Failure in ShortPtr_decode()
File: crash-53559a3b76bfdc9ac6506e93ad1763e92416a2fd (78 bytes)
fuzzer: dist-c/microvium.c:6009: Assertion `false' failed.
#9 in ShortPtr_decode /dist-c/microvium.c:6009:5
#10 in gc_processShortPtrValue /dist-c/microvium.c:6270:37
#11 in gc_processValue /dist-c/microvium.c:6453:5
#12 in mvm_runGC /dist-c/microvium.c:6547:5
Same pattern - reachable assert(false) during garbage collection. If GC crashes, your entire VM is toast.
Code location: microvium.c:6009
List of Findings
Here’s the summary:
| ID | Type |
|---|---|
| CVE-1 | Heap Buffer Overflow (Read) |
| CVE-2 | Heap Buffer Overflow (Read) |
| CVE-3 | Reachable Assertion |
| CVE-4 | Reachable Assertion |
Key Takeaways from fuzzing
What worked:
- libFuzzer + ASAN = perfect combo for finding memory corruption
- Starting with a good corpus of valid inputs is crucial
- Fuzzing the bytecode parser was the right target
What didn’t:
- AFL++ couldn’t find the bugs (probably due to CRC/structure)
- Fuzzilli can’t work without coverage instrumentation
The bugs:
- All stemmed from trusting bytecode header fields without validation
- Attackers can hex-edit bytecode files to trigger crashes
- Even though JavaScript can’t directly trigger these (the compiler generates valid bytecode), malicious bytecode files are a real attack vector
Real-world impact:
- IoT devices loading untrusted bytecode could be crashed or compromised
- Supply chain attacks via malicious npm packages
- OTA updates could be weaponized
This was a great exercise in fuzzing embedded systems. The bugs were hiding in plain sight - classic “trust user input” mistakes that fuzzing exposed in a few hours.
Root Cause: Trusting Bytecode
All four vulnerabilities stem from the same architectural flaw: the VM trusts bytecode completely.
Microvium assumes bytecode is well-formed because it comes from the official compiler. No signature verification, no runtime validation, nothing. Just parse and execute.
This is dangerous because:
- Direct bytecode manipulation: An attacker can hex-edit .mvm-bc files to inject malicious offsets
- Supply chain attacks: Compromised build servers could inject malicious bytecode during compilation
- No defense in depth: One vulnerability in the compiler or VM compromises everything
The Microvium workflow is:
JavaScript → Compiler → Bytecode → Device → Execution
If an attacker controls any step after JavaScript, they can exploit these bugs.
Can JavaScript Trigger These?
I spent time researching whether normal JavaScript code could trigger these crashes through the compiler.
Short answer: Probably not through the standard compiler.
The TypeScript compiler calculates section offsets based on actual content it generates. Unless there’s a compiler bug, it should always produce valid offsets.
But: An attacker doesn’t need to go through the compiler. They can:
- Compile legitimate JavaScript
- Hex-edit the .mvm-bc file
- Modify bytes 0x0C-0x1B (section offsets)
- Distribute the malicious bytecode
I verified this with a simple test:
# Start with valid bytecode
cp corpus/addition.bin evil.mvm-bc
# Corrupt one section offset
printf '\xc4' | dd of=evil.mvm-bc bs=1 seek=18 count=1 conv=notrunc
# Test (triggers heap overflow)
./fuzzer evil.mvm-bc -runs=1
Boom. Instant crash.
Recommendations
For the Microvium team, I’d recommend:
Immediate (Critical):
- Add bounds checking in mvm_restore():
for (int i = 0; i < 8; i++) {
  if (header->sectionOffsets[i] > bytecodeSize) {
    return MVM_E_INVALID_BYTECODE;
  }
}
- Replace assertions with error returns:
// Instead of: assert(false);
return MVM_E_INVALID_POINTER_TYPE;
- Implement bytecode signature verification:
// Application-level protection
if (!verify_signature(bytecode, signature, PUBLIC_KEY)) {
return MVM_E_UNTRUSTED_BYTECODE;
}
I contacted the maintainer via email but got no reply, so I ended up posting the findings as a GitHub issue: https://github.com/coder-mike/microvium/issues/87