Peter Stuifzand

Bytecode interpreter

A bytecode interpreter is essentially a loop and a lookup table. The interpreter starts at the beginning of an array of bytes. Each byte in this array is an index into the lookup table, and each entry in the table is the piece of code to execute for that byte. The core of such an interpreter may look like this:

typedef unsigned char byte;

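// the simplest program: a single bytecode 0, which ends execution
byte program[] = {
    0,
};

byte* ip = program;                   // instruction pointer

// lookup_table is defined below; each function returns the next ip, or 0 to stop
while ((ip = (lookup_table[*ip])(ip)) != 0) {
    // does nothing else, but could
}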

The expression in the while loop is a bit complex, but we can break it into parts. First there is ip, the instruction pointer. It points to the instruction that needs to be executed.

The lookup_table is an array of the functions that correspond to the bytecodes. The typedef for those functions, and the table itself, look like this:

typedef byte* (*bytecode_function)(byte* ip);

byte* func_end(byte* ip) {
    return 0;
}

bytecode_function lookup_table[] = {
    func_end,
};

The typedef and the array definition together define the lookup_table. A complete table would have 256 entries, one for each value a byte can hold; the example above fills in only the first. A bytecode_function returns the next ip. This way a function can change the ip and jump to other places in the program. If the function returns a NULL pointer, the loop ends.
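One way to give the table all 256 entries is to fill it at startup, so that a byte without a function of its own cannot index past the end of the array. The following is a minimal sketch, not part of the original code: func_invalid, init_table, run, and the invalid_seen flag are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char byte;
typedef byte* (*bytecode_function)(byte* ip);

enum { TABLE_SIZE = 256 };            /* one slot per possible byte value */

int invalid_seen = 0;                 /* hypothetical flag, set by func_invalid */

/* bytecode 0: end the program by returning a null instruction pointer */
byte* func_end(byte* ip) {
    (void)ip;
    return NULL;
}

/* hypothetical fallback for every bytecode that has no function of its own */
byte* func_invalid(byte* ip) {
    (void)ip;
    invalid_seen = 1;
    return NULL;
}

bytecode_function lookup_table[TABLE_SIZE];

/* fill every slot with the fallback, then install the real functions */
void init_table(void) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        lookup_table[i] = func_invalid;
    }
    lookup_table[0] = func_end;
}

/* returns 0 if the program ended normally, -1 if it hit an undefined bytecode */
int run(byte* program) {
    assert(program != NULL);
    byte* ip = program;
    invalid_seen = 0;
    while ((ip = lookup_table[*ip](ip)) != NULL) {
        /* nothing else to do per instruction */
    }
    return invalid_seen ? -1 : 0;
}
```

Call init_table() once before run(). Stopping on an undefined bytecode is only one possible policy; a real interpreter might report the offending byte instead.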

The instruction pointer ip is dereferenced to give the current byte at that place in the program. This byte is the bytecode that is used to look up the function that needs to be executed.
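The loop condition packs the fetch, the lookup, and the call into one expression. Written out as separate steps it might look like the sketch below. The step and run functions and the func_nop bytecode are assumptions added for illustration; they do not appear in the original code.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char byte;
typedef byte* (*bytecode_function)(byte* ip);

/* bytecode 0: end the program */
byte* func_end(byte* ip) {
    (void)ip;
    return NULL;
}

/* bytecode 1 (hypothetical): do nothing and move on to the next byte */
byte* func_nop(byte* ip) {
    return ip + 1;
}

bytecode_function lookup_table[] = {
    func_end,
    func_nop,
};

enum { TABLE_SIZE = sizeof lookup_table / sizeof lookup_table[0] };

/* one iteration of the loop, written as separate steps */
byte* step(byte* ip) {
    byte opcode = *ip;                                 /* fetch the current byte */
    assert(opcode < TABLE_SIZE);                       /* this sketch defines only two bytecodes */
    bytecode_function handler = lookup_table[opcode];  /* look up its function */
    return handler(ip);                                /* call it; the result is the next ip */
}

/* runs until a function returns NULL; returns the number of instructions executed */
int run(byte* program) {
    int executed = 0;
    byte* ip = program;
    while (ip != NULL) {
        ip = step(ip);
        executed++;
    }
    return executed;
}
```

For the program { 1, 1, 0 } this executes two no-ops and then the end instruction, so run reports three executed instructions.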

The last part is the ip argument to the function. This argument lets the function look at the bytes around the current instruction. The bytes that follow the bytecode are the arguments to the function.
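To illustrate how a function can read its arguments through ip, here is a small sketch with a value stack. The bytecodes OP_PUSH, OP_ADD, and OP_JUMP, the stack, and the run function are all hypothetical and chosen only for this example. Each function returns the address of the next instruction, so it also decides how many argument bytes to skip.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char byte;
typedef byte* (*bytecode_function)(byte* ip);

/* hypothetical bytecodes for this sketch */
enum { OP_END, OP_PUSH, OP_ADD, OP_JUMP, OP_COUNT };

enum { STACK_SIZE = 16 };
int stack[STACK_SIZE];
int sp = 0;                           /* number of values on the stack */

void push(int value) { assert(sp < STACK_SIZE); stack[sp++] = value; }
int pop(void) { assert(sp > 0); return stack[--sp]; }

/* END: stop the interpreter */
byte* func_end(byte* ip) { (void)ip; return NULL; }

/* PUSH n: ip[1] is the argument; skip the bytecode and its argument */
byte* func_push(byte* ip) { push(ip[1]); return ip + 2; }

/* ADD: no argument bytes; replaces the top two values with their sum */
byte* func_add(byte* ip) {
    int b = pop();
    int a = pop();
    push(a + b);
    return ip + 1;
}

/* JUMP n: skip n bytes beyond the end of this instruction */
byte* func_jump(byte* ip) { return ip + 2 + ip[1]; }

/* entries are in the same order as the enum above */
bytecode_function lookup_table[OP_COUNT] = {
    func_end,
    func_push,
    func_add,
    func_jump,
};

/* runs the program and returns the value on top of the stack, or -1 if it is empty */
int run(byte* program) {
    byte* ip = program;
    sp = 0;
    while (ip != NULL) {
        assert(*ip < OP_COUNT);       /* only four bytecodes are defined here */
        ip = lookup_table[*ip](ip);
    }
    return sp > 0 ? stack[sp - 1] : -1;
}
```

In the program { OP_PUSH, 2, OP_PUSH, 3, OP_JUMP, 2, OP_PUSH, 100, OP_ADD, OP_END } the jump skips over the PUSH 100 instruction, so ADD sees only 2 and 3 and run returns 5.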

© 2023 Peter Stuifzand