January 31, 2023

Crafting a compiler in Rust : Chapter 14

Have you ever wondered how programming languages like Python, Java or Ruby work? Do you have an interest in building your own language? If so, this series of posts is for you. We dive into building a novel programming language from first principles. Get ready to get your hands dirty. It’s going to be a fun ride. Access the final product here: https://github.com/daviddexter/karis-programming-language

Compiling conditions involves handling two keywords: IF and ELSE. For each case, we need to figure out the condition to be evaluated, whether it is true or false. That means we need to access the variables in the condition and their corresponding scope. Secondly, we need to figure out how the condition should terminate; otherwise, we risk having an infinite condition. Thirdly, we need to figure out how to handle chained conditions, that is, a condition with multiple if-else branches.

The beauty of Karis is that the syntax rules and the compiler take care of these problems for us. Consider the following example:

if x < y {
    return x + y;
} else x == y {
    return x - y;
} else {
    return x * y;
};

In the example above, each branch has an obvious terminator, the RETURN statement. This makes our work predictable. Also, the variables x and y are scoped to the block where the condition statement is defined. Typically this will be in a function. The Node object in the parser takes care of telling us whether there is an alternative condition in the event the first condition does not evaluate to true. Let’s see how we implement this.

Compilation implementation

The logic for both IF and ELSE is the same in principle, with minor differences in opcodes. We will focus on IF to demonstrate the implementation.

IdentifierKind::IF => {
    let condition_binding_key = random_string_id();
    let condition_binding_key_as_bytes = condition_binding_key.as_bytes().to_vec();
    let wrk = worker.borrow();
    let if_scope_id = wrk.generate_scope_id_from_scope(SymbolScope::Local);
    let mut condition_instructions = Vec::new();
}

In Line 2 we create a random, unique name for the condition. This is important so that we can identify the condition in the symbols table. In Line 5 we create a scope id for the condition in context so that, in the event there are variables defined within the condition block, they are scoped to the condition itself.
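The random_string_id helper is not shown in this lesson. A minimal sketch of what such a helper could look like, assuming the rand crate is available, is below:

use rand::{distributions::Alphanumeric, Rng};

// Hypothetical sketch: generate a short random alphanumeric identifier
// used to name anonymous constructs such as condition blocks.
fn random_string_id() -> String {
    rand::thread_rng()
        .sample_iter(&Alphanumeric)
        .take(12)
        .map(char::from)
        .collect()
}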

// get the condition first
let condition = self.right_child.as_ref().unwrap();
if let Some(instructions) = left_or_right(condition, worker.clone(), scope.clone(), if_scope_id) {
    let inst = instructions.get(0).unwrap();
    condition_instructions.push(inst.clone());
    condition_instructions.push(vec![OpCode::OpJumpTo as u8]);
    let block_children = self.block_children.as_ref().unwrap();
    for body_item in block_children.iter() {
        let inst = body_item.compile(worker.clone(), SymbolScope::Local, if_scope_id);
        if let Some(all_insts) = inst {
            for i in all_insts.iter() {
                condition_instructions.push(i.clone())
            }
        }
    }
}

In Line 2 we get the condition node and pass it to the left_or_right function. In Line 5 we add the generated instructions to the condition_instructions vector, which will store all instructions for the condition. Notice that Line 6 adds an OpJumpTo instruction. This instruction tells the virtual machine that if the condition evaluates to true, it should proceed to execute the children in the truthy block. From Line 8 to Line 12 we iterate over the children, compile them, and then add their respective instructions to the condition_instructions vector.

// check if there is an alternate condition (ELSE)
if let Some(alternate_condition) = &self.alternate {
    if let Some(else_condition) = alternate_condition.compile(worker.clone(), scope.clone(), if_scope_id){
        // we add an alternate condition marker
        condition_instructions.push(vec![OpCode::OpJumpToAlternate as u8]);
        for i in else_condition.iter() {
            condition_instructions.push(i.clone())
        }
    }
}

In Line 2 we check if there is an alternate condition. If so, we compile the alternate and push the corresponding instructions to the condition_instructions vector. Because alternate conditions start with the prefix ELSE, the compilation logic will be called recursively until no alternate condition is found.

wrk.add_symbol(condition_binding_key_as_bytes.clone(), condition_instructions.clone());
let insts = wrk.instructions_for_condition_statement(OpCode::OpAddIfCondition, scope, if_scope_id, condition_binding_key_as_bytes);
Some(vec![insts])

We finalize the procedure by adding the instructions to the symbols table and returning the appropriate instructions to the caller as seen in Line 3. For the ELSE node, the final opcode will be an OpJumpToAlternate instead of an OpAddIfCondition. With that, we are set to compile condition statements.

The @main block is the entry point for a Karis program. That means that if a main block is not present, the compiler will not be able to compile the program. At the moment, we do not have a mechanism to enforce this rule, since all we have accomplished is adding instructions to the instructions vector or adding symbols to the symbols table. That changes now. We start by intercepting the MAIN node.

IdentifierKind::MAIN => {
    let wrk = worker.borrow();
    let main_scope_id = wrk.generate_scope_id_from_scope(SymbolScope::Main);
    let program = self.block_children.as_ref().unwrap().get(0).unwrap().as_ty_program().unwrap().clone();
    let program_items = program.body;
    if program_items.is_empty() {
        panic!("Program main function should not be empty")
    }
}

In Line 3 we generate a unique scope for the main block. Because this is the main block, this scope will be passed down to every instruction within the block. That means variables within the scope are accessible by any instruction, with the exception of block-scoped variables. From Line 4 to Line 7 we unpack the children of the main block and assert that they are not empty. In the event no children are present, we return an error indicating that the syntax is incorrect. Ideally such a case will be caught by the parser, but there is no harm in a little defensive programming.

wrk.add_main();
for item in program_items {
    match item {
        Objects::TyNode(node) => {
            node.compile(worker.clone(), SymbolScope::Local, main_scope_id)
        }
        _ => unreachable!(),
    };
}
None

In Line 1 we add a simple instruction to the CompileWorker’s instruction vector to indicate that the program’s entry point has been found. We then iterate over each program item and compile them individually. With the main block set to be compiled, the next step is to update the compile function of the CompileWorker.

let worker = Rc::new(RefCell::new(self));
let global_scope_id = self.global_scope_id();
self.program_ast.compile(worker.clone(), SymbolScope::Global, global_scope_id);

In Line 2 we get access to the global scope, which we pass to the compile function of the AST. As has been evident, recursion plays a critical role in every compile call. When the compile function completes its work, we expect the instruction vector to contain instructions for the program. At this point, we need to validate that the entry point is present in the instructions vector.

let instructions = self.instructions.clone();
let instructions = instructions.as_ref().clone();
let instructions = instructions.into_inner();
// check that there is a main instruction. If not, end the program with an error
let mut main_count = 0;
instructions.iter().for_each(|item| {
    item.iter().for_each(|val| {
        let val = *val;
        let op: OpCode = val.into();
        if op == OpCode::OpMain {
            main_count += 1;
        }
    })
});

From Line 1 to Line 3 we access the current instructions. In Line 5 we define a main_count variable that will be used to assert that a main instruction is present. We expect only one main instruction.
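As an aside, the same count can be expressed with iterator adapters. Here is a minimal equivalent sketch that relies on the same From<u8> conversion for OpCode used above:

// Equivalent count using iterator adapters instead of nested for_each.
let main_count = instructions
    .iter()
    .flatten()
    .filter(|&&val| OpCode::from(val) == OpCode::OpMain)
    .count();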

if main_count == 1 {
    let constants = self.constants.clone();
    let constants = constants.as_ref().clone();
    let constants = constants.into_inner();

    let symbols_table = self.symbols_table.clone();
    let symbols_table = symbols_table.as_ref().clone();
    let symbols_table = symbols_table.into_inner();

    ByteCode {
        instructions,
        constants,
        symbols_table,
        global_scope_id,
    }
} else {
    panic!("Program main function not found in scope")
}

If no main instruction is found, we crash the program. Otherwise, we assemble the required attributes and build a ByteCode object, which is returned. The returned ByteCode is written to a file that will be consumed by the virtual machine.
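The writing step itself is not shown in this lesson. A minimal sketch of how the ByteCode could be persisted, assuming ByteCode derives serde’s Serialize and that the bincode crate is available, might look like this:

use std::fs;

// Hypothetical sketch: serialize the ByteCode and write it to disk so
// the virtual machine can load it later. The file name is illustrative.
fn write_bytecode(byte_code: &ByteCode) -> std::io::Result<()> {
    let encoded: Vec<u8> = bincode::serialize(byte_code).expect("failed to serialize bytecode");
    fs::write("program.krbin", encoded)
}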

So far, we have all the building blocks to extend the compiler. That is what we seek to do by introducing the ability to compile the built-in print function. The print function is just like any other function that the compiler can handle. The only difference is that it comes pre-named in the compiler. Since printing means something has been passed to the function, we first access the call_params property of the PRINT node. This tells the compiler what should be printed on the screen.

IdentifierKind::PRINT => {
    if let Some(caller_params) = &self.call_params {
        let mut caller_instructions = Vec::new();
        // ... the rest of this arm is built up below
    } else {
        None
    }
}

In Line 3 we instantiate an empty vector that will be utilized later to store instructions for the print intent. If no call parameters are provided, we return None in the else branch, which ends the process.

if caller_params.is_empty() || caller_params.len() > 1 {
    eprintln!("Invalid number of print parameters : {:?}", caller_params.len());
    process::exit(0x0100)
}

As we are accustomed to by now, we add validation to check that the call parameter vector is not empty and that it does not contain more than one item. Why is the second check important? Because the print function expects exactly one argument: a variable or a function call.

let caller_param = caller_params.first().unwrap();
call_parameter_map(caller_param, worker.clone(), &mut caller_instructions, scope.clone(), scope_id, OpCode::OpPrint);

In Line 1 we unwrap the call parameter. At this point, it is safe to do so because we are sure there is a value. We then pass the parameter to the call_parameter_map function to generate the instructions for the call parameter.

let binding_key = random_string_id();
let binding_key_as_bytes = binding_key.as_bytes().to_vec();
let wrk = worker.borrow();
wrk.add_symbol(binding_key_as_bytes.clone(), caller_instructions.clone());
let print_instructions = wrk.instructions_for_builtin(scope, scope_id, binding_key_as_bytes);
wrk.add_builtin_instructions(print_instructions.clone());
Some(vec![print_instructions])

The print function has no binding name associated with it. However, we generate a random binding name for it so that we can add it to the symbols table. After adding it to the symbols table, we call the instructions_for_builtin function to generate another vector of instructions that point to how the print function can be accessed. Unlike most functions, the instruction to print is added to the CompileWorker’s instructions vector. That is what we do in Line 6. With that, the compiler is now capable of printing the variables we pass to it.

We are going to write four tests that each validate a specific scenario. Let’s first set up the testing infrastructure. This entails creating a new module in the compile.rs file. This module will contain all tests that will assert the internal workings of the compiler.

#[cfg(test)]
mod compile_tests {
    use lexer::lexer::Lexer;
    use parser::parser::Parser;
    use super::CompileWorker;
}

In Line 2 we add a new module called compile_tests. Inside it, we import the necessary modules that will be used within the test cases. The first test we add will assert two things: the assignment of a variable and the presence of the main entry point.
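As a side note, every test below repeats the same lexer-parser-compiler dance, so one could factor it into a small helper. The sketch below is not part of the original code and assumes ByteCode is exposed alongside CompileWorker:

// Hypothetical helper: lex, parse and compile a source string, dumping
// the AST to ast_file just like each test below does.
fn compile_program(source: &str, ast_file: &str) -> super::ByteCode {
    let lx = Lexer::new(String::from(source));
    let mut parser = Parser::new(lx);
    let ast = parser.parse(Some(ast_file)).unwrap();
    let worker = CompileWorker::new(ast);
    worker.compile()
}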

#[test]
fn should_compile1() {
    let lx = Lexer::new(String::from(
        "
            @main fn(){
                let num @int = 1;
            }@end;
        ",
    ));
    let mut parser = Parser::new(lx);
    let ast = parser.parse(Some("should_compile1.json")).unwrap();
    let worker = CompileWorker::new(ast);
    let byte_code = worker.compile();
    assert_eq!(byte_code.instructions.len(), 1);
    assert_eq!(byte_code.constants.len(), 1);
    let st = byte_code.symbols_table;
    assert_eq!(st.0.len(), 1);
}

In test should_compile1 we compile a program that contains only one variable. By examining the sample program, we can intuitively see that the constant pool will have one entry, and the instructions vector will have one entry, as will the symbols table. Let’s run the test now.

cp -r programs/karis-programming-language-7-15-a work/ && cd work/karis-programming-language-7-15-a && cargo check && cargo test -p compiler compile::compile_tests::should_compile1

Nice. It passes just as we expected. For the next test, we try to assert that a function definition is compiled correctly, in addition to returning variables. Let’s see what that test looks like.

#[test]
fn should_compile2() {
    let lx = Lexer::new(String::from(
        "
        let num @int = fn() {
            let one @int = 1;
            return one;
        };

        @main fn (){
            let num0 @int = num();
        }@end;
        ",
    ));
    let mut parser = Parser::new(lx);
    let ast = parser.parse(Some("should_compile2.json")).unwrap();
    let worker = CompileWorker::new(ast);
    let byte_code = worker.compile();

    assert_eq!(byte_code.instructions.len(), 1);
    assert_eq!(byte_code.constants.len(), 1);
    let st = byte_code.symbols_table;
    assert_eq!(st.0.len(), 3);
}

The expectation is the same as in the previous test, with the exception of the symbols table, which will now have three entries. Unfortunately, it fails. Looking at the sample program, we can see that the problem occurs in the return statement. We want to return a variable; however, our current implementation does not account for that. Let’s fix that. We add a mechanism to catch the VARIABLE node.

IdentifierKind::VARIABLE => {
    let variable_name = self.variable_name.as_ref().unwrap();
    let variable_name_as_bytes = variable_name.as_bytes().to_vec();
    let wrk = worker.borrow();
    let instructions = wrk.generate_opcode_with_scope_and_vec_parameter(
        OpCode::OpGetBinding, scope, scope_id, variable_name_as_bytes);
    Some(vec![instructions])
}

In Line 5 and Line 6 we use the variable name, represented as a vector of bytes, to generate another vector of instructions that tell the virtual machine how to access the actual value of the variable. Notice we use the OpGetBinding opcode. The name is intentional, as it reflects the perspective of the virtual machine. Consider yourself the virtual machine: if you have instructions for a variable, the question should be, how can I access the value of the variable? Hence GET BINDING. When we rerun the test now, it passes, meeting our expectations. Next, we add another test that will assert a function definition, a function call and an arithmetic operation. Let’s see what it looks like.

#[test]
fn should_compile3() {
    let lx = Lexer::new(String::from(
    "
    let summation @int = fn(x @int, y @int) {
        return x + y;
    };

    @main fn(){
        let a @int = 10;
        let sum @int = summation(a, 20);
    }@end;
    ",
    ));
    let mut parser = Parser::new(lx);
    let ast = parser.parse(Some("should_compile3.json")).unwrap();
    let worker = CompileWorker::new(ast);
    let byte_code = worker.compile();
    assert_eq!(byte_code.instructions.len(), 1);
    assert_eq!(byte_code.constants.len(), 2);
    let st = byte_code.symbols_table;
    assert_eq!(st.0.len(), 3);
}

In the sample program, we define the summation function that takes two arguments and sums them up. The output of that operation is assigned to the variable sum. The last test we add will assert that a function with conditional statements can be compiled. The sample program is a more realistic representation of a real-world function.

#[test]
fn should_compile4b() {
    let lx = Lexer::new(String::from(
        "
            let minmax_or_product @int = fn(x @int, y @int){
                if x < y{
                   return x + y;
                }else x > y{
                    return x - y;
                };

                return x * y;
            };

            @main fn(){
                minmax_or_product();
            }@end;
    ",
    ));
    let mut parser = Parser::new(lx);
    let ast = parser.parse(Some("should_compile4b.json")).unwrap();
    let worker = CompileWorker::new(ast);
    let byte_code = worker.compile();

    assert_eq!(byte_code.instructions.len(), 1);
    assert_eq!(byte_code.constants.len(), 0);
    let st = byte_code.symbols_table;
    assert_eq!(st.0.len(), 2);
}

There is nothing too complex about the function minmax_or_product. If we are able to compile the sample program, it will validate that our compiler does exactly what it is meant to do.

We are now in the execution stage. This is where the virtual machine we bootstrapped earlier gets fleshed out with actual implementations. We begin with arithmetic operations. The execute function of the virtual machine currently has a TODO. Let’s introduce some changes.

let command = instruction.first().unwrap();
let command = command.first().unwrap();
let command = *command;
let command: OpCode = command.into();

The first step is to identify and retrieve the first command from the instructions vector in the provided argument. This command is what will guide the direction of our logic moving forward. Since the command is in byte form, we convert it to an OpCode type, which we do in Line 4.
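The conversion works because OpCode implements From<u8>. A sketch of what that implementation might look like follows; the mappings for 10 and 21 match the byte dumps discussed later in this chapter, while the rest of the opcode set is elided:

// Sketch only: the real compiler defines the full opcode set.
impl From<u8> for OpCode {
    fn from(byte: u8) -> Self {
        match byte {
            10 => OpCode::OpConstant,
            21 => OpCode::OpGetBinding,
            // ... remaining opcodes elided ...
            _ => OpCode::OpNull,
        }
    }
}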

match command {
    OpCode::OpAdd | OpCode::OpMinus | OpCode::OpMultiply  | OpCode::OpDivide  | OpCode::OpModulus => {
        let op_instruction = instruction.first().unwrap();
        let instructions = op_instruction.get(2..op_instruction.len()).unwrap();
    }
}

In Line 1 we use the command to match what needs to be done. In this instance, we intend to handle all arithmetic operations as seen in Line 2. In the event the command matches the intent, we retrieve the actual operation instructions in Line 4. At this point, we don’t regard the scope since that will be handled in assignments.

let separator = instructions.iter().find_position(|i| **i == (OpCode::OpNull as u8)).unwrap();
let seperator_index = separator.0;

For arithmetic operations, the left operand and the right operand are separated by a Null opcode. We therefore find the position of the OpNull byte in the instructions vector.
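The find_position method comes from the itertools crate. A quick standalone illustration of how it locates a separator byte:

use itertools::Itertools;

fn main() {
    // 0 stands in for OpCode::OpNull in this illustration.
    let instructions: Vec<u8> = vec![12, 7, 0, 9, 4];
    let separator = instructions.iter().find_position(|i| **i == 0).unwrap();
    assert_eq!(separator.0, 2); // index of the separator byte
}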

let left = instructions.get(0..seperator_index).unwrap().to_vec();
let left = vec![left];

let right = instructions.get(seperator_index + 1..instructions.len()).unwrap().to_vec();
let right = vec![right];

let left_value = match self.executor(&left, params.clone()) {
    Ok(obj) => match obj {
        CompileObject::Integer(number) => number,
        _ => 0_isize,
    },
    Err(_) => 0_isize,
};

let right_value = match self.executor(&right, params.clone()) {
    Ok(obj) => match obj {
        CompileObject::Integer(number) => number,
        _ => 0_isize,
    },
    Err(_) => 0_isize,
};

Once we have the separator, we use it to dissect the instructions vector into two parts: the left and the right. From Line 1 to Line 2 we retrieve the left-hand side of the operation. From Line 4 to Line 5 we retrieve the right-hand side. Arithmetic operations are only valid for integer values. In Karis, integer values map to the isize type in Rust. Why isize? So that we can support both negative and positive integers. From Line 7 to Line 13 we call the executor method, passing it the left operand. The executor will use the provided instructions to retrieve the actual value from the constant pool. We expect an integer to be returned; otherwise, we return a zero. We do the same thing for the right-hand side operand, as seen from Line 15 to Line 21.

let result = match command {
    OpCode::OpAdd => left_value + right_value,
    OpCode::OpMinus => left_value - right_value,
    OpCode::OpMultiply => left_value * right_value,
    OpCode::OpDivide => left_value / right_value,
    OpCode::OpModulus => left_value % right_value,
    _ => 0_isize,
};
Ok(CompileObject::Integer(result))

With the actual values available, we proceed to do the operation intended. From Line 2 to Line 6 we carry out the operations and then return the result wrapped in a CompileObject type. This result will be consumed by another operation: either a print or an assignment to a variable.

Executing conditional operations is somewhat more involved than arithmetic operations. Recall that conditional operations evaluate boolean values in addition to integers and strings. For each primitive type, the question of how to evaluate such types until they produce a boolean result is going to be our major concern. Consider this example:

let name1 @string = "karis"
let name2 @string = "karis programming language"
let result @boolean = name1 || name2;

At first glance, it’s not exactly obvious how to come up with the result. Should we consider the length or should we consider the similarity to the content? Such edge cases are what we need to be aware of when executing conditional operations.

Execution implementation

Let’s now move to implementing the operations. We pick up from where we left off. We add another matching pattern. This time for all opcodes that are related to conditional operations.

OpCode::OpGreaterThan | OpCode::OpGreaterThanOrEqual | OpCode::OpLessThan | OpCode::OpLessThanOrEqual
| OpCode::OpEqualTo | OpCode::OpNotEqualTo | OpCode::OpAND | OpCode::OpOR | OpCode::OpLAND | OpCode::OpLOR => {
    let op_instruction = instruction.first().unwrap();
    let instructions = op_instruction.get(2..op_instruction.len()).unwrap();
    let separator = instructions.iter().find_position(|i| **i == (OpCode::OpNull as u8)).unwrap();
    let seperator_index = separator.0;
    let left = instructions.get(0..seperator_index).unwrap();
    let right = instructions.get(seperator_index + 1..instructions.len()).unwrap();
}

As seen in Line 1 and Line 2, we inject the opcodes of all known conditional operations. Similar to arithmetic operations, we find the index of the separator opcode and dissect the instructions into the left and right operands, as evidenced from Line 5 to Line 8. Conditional operations can be run on variables or literals. In the case of variables, we need a procedure to extract the actual literal value. We therefore introduce a helper closure that will do this for us.

let variable_or_literal_func = |side_instructions: Vec<u8>, operation_params: Option<Vec<(CompileObject, CompileObject)>>| -> CompileObject {
    let instructions = vec![side_instructions];
    if let Ok(obj) = self.executor(&instructions, operation_params) {
        obj
    } else {
        CompileObject::Null
    }
};

We now use the closure on both the left and right operands to retrieve their respective values.

let lhs = variable_or_literal_func(left.to_vec(), params.clone());
let rhs = variable_or_literal_func(right.to_vec(), params);

Conditional operations are only valid if the type of the operands is the same. Otherwise, it’s like comparing stones with berries. We, therefore, introduce a check to assert we have the same value types.

if lhs.object_type() != rhs.object_type() {
    return Err(KarisError {
        error_type: KarisErrorType::InvalidExecution,
        message: "Comparison type mismatch".to_string(),
    });
}
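To make the guard concrete, here is a small illustration. It assumes the Integer and String variants of CompileObject seen elsewhere in this chapter:

// Illustration only: an integer and a string report different object
// types, so the guard above rejects the comparison with InvalidExecution.
let lhs = CompileObject::Integer(1);
let rhs = CompileObject::String("karis".to_string());
assert!(lhs.object_type() != rhs.object_type());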

The next step is to carry out the actual condition operation. Let’s narrow our focus to a simple case, OpGreaterThan.

match command {
    OpCode::OpGreaterThan => {
        let obj_type = lhs.object_type();
    }
}

In Line 3 we use the left operand to figure out the type. It does not matter whether we use the left or right operand, since at this point we have already asserted that they are of the same type. We need the type information so that we can select the correct procedure.

let result = match obj_type {
    INTEGER_OBJECT_TYPE => {
        let lhs_value = lhs.as_integer().unwrap();
        let rhs_value = rhs.as_integer().unwrap();
        lhs_value > rhs_value
    }
    STRING_OBJECT_TYPE => {
        let lhs_value = lhs.as_string().unwrap();
        let rhs_value = rhs.as_string().unwrap();
        lhs_value.len() > rhs_value.len()
    }
    BOOLEAN_OBJECT_TYPE => {
        let lhs_value = lhs.as_boolean().unwrap();
        let rhs_value = rhs.as_boolean().unwrap();
        lhs_value > rhs_value
    }
    _ => false,
};
Ok(CompileObject::Boolean(result))

For an integer type, getting the greater-than value is quite straightforward, as seen in Line 5. For string values, we compare their lengths and report whether the left one is longer. For boolean values, the comparison follows the usual ordering in which true is greater than false. For instance:

>>> True > False
True
>>> False > True
False
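The Python snippet above mirrors Rust’s own ordering of booleans, which is what the boolean arm relies on:

fn main() {
    // bool implements Ord in Rust, with false ordered before true.
    assert!(true > false);
    assert!(!(false > true));
}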

Good. We now have the infrastructure to add more patterns of conditional operations.
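For instance, a sketch of an OpEqualTo arm following the same shape could look like this (illustrative, not the exact code from the repository):

OpCode::OpEqualTo => {
    let obj_type = lhs.object_type();
    let result = match obj_type {
        // equality is well defined for every primitive type
        INTEGER_OBJECT_TYPE => lhs.as_integer().unwrap() == rhs.as_integer().unwrap(),
        STRING_OBJECT_TYPE => lhs.as_string().unwrap() == rhs.as_string().unwrap(),
        BOOLEAN_OBJECT_TYPE => lhs.as_boolean().unwrap() == rhs.as_boolean().unwrap(),
        _ => false,
    };
    Ok(CompileObject::Boolean(result))
}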

We are now in a position where we can introduce changes that will bring life to the virtual machine. Currently, we have no mechanism to know what a variable’s value is. This limits what the virtual machine can do. Let’s begin by examining the instructions produced by a variable binding. Consider the following snippet of code:

let num @int = 100;

When we feed it to the compiler, the following is returned:

[[10, 1, 233, 3, 248, 0, 248]]

From the opcodes we defined, the code 10 translates to the intent to access a constant value. In this example, the constant value is 100. However, when we pass the variable num to a function or a print call, it does not carry the instructions of the constant value. Instead, another vector of instructions is produced, beginning with the byte 21. From the example, we can deduce that there is a relationship between a constant value and a binding. For us to get the value of a constant, we first need to resolve the binding, which will then point us to where the actual constant value is located. Let’s demonstrate this in code.

Execution implementation

We begin with binding.

OpCode::OpGetBinding => {
    let op_instruction = instruction.first().unwrap();
    let binding_name = op_instruction.get(5..op_instruction.len()).unwrap().to_vec();
}

In Line 2 we retrieve the first element from the instructions vector. As evidenced in the example above, instructions come in the form of a vector of byte vectors. That’s quite a mouthful, but we get the gist. Then we slice the instruction from position six (index 5) until the end. This gives us the name that was used in the binding. On some occasions, the OpTerminal byte is unintentionally present in the resulting vector. Hence we need to remove it.

let binding_name = binding_name.iter().filter(|elem| **elem != OpCode::OpTerminal as u8).copied().collect::<Vec<u8>>();

We leverage Rust’s support for functional programming to chain functions that remove the OpTerminal byte and return the cleaned vector as the new binding name.

let binding_value =
    // fetch the binding from the symbols table. If not present, fetch it from params
    if let Some(symbol) = self.byte_code.symbols_table.0.get(&binding_name) {
        let symbol = &symbol.0;
        if let Ok(obj) = self.executor(symbol, params) {
            obj
        } else {
            CompileObject::Null
        }
    } else if let Some(parameters) = params {
        let parameter = parameters.iter().filter(|elem| {
            let name_binding = elem.0.as_variable().unwrap();
            let name_binding = name_binding.clone();
            name_binding == binding_name
        }).collect::<Vec<&(CompileObject, CompileObject)>>();

        let parameter = parameter.first().unwrap();
        parameter.1.clone()
    } else {
        CompileObject::Null
    };

Ok(binding_value)

A variable binding can exist in two places: as part of the symbols table, or passed as a parameter, as in the case of function calls. However, that is from the operational view. In an actual sense, all variables must resolve to an entry in the symbols table; otherwise they will be deemed nonexistent. In Line 3 we try to retrieve the binding from the symbols table, passing in the byte-form binding name as the key. If there exists such an entry, we pass it to the executor function so that we can get the actual value it binds to, as seen in Line 5. From Line 11 to Line 15 we filter the parameters already available to the caller to determine if the binding name exists in them. Since we have already set checks in place, we are certain that if the binding name is not present in the symbols table then it is definitely in the parameter vector. Hence it is safe to unwrap, as seen in Line 17. Finally, we wrap and return the binding value in a Result type.

For constants, the implementation simply extracts the location and fetches the value from the constant pool. Nothing too fancy to do in this regard.

OpCode::OpConstant => {
    let op_instruction = instruction.first().unwrap();
    let location = op_instruction.get(5).unwrap();
    let location = *location as usize;
    let obj = self.byte_code.constants.get(location).unwrap();
    let obj = obj.clone();
    Ok(obj)
}

To execute return statements and function calls we need to understand and handle three opcodes. These opcodes are:

- OpGetCallerParameter indicates retrieving values specific to a function call. Internally, it will execute other opcodes until the actual value is made available.
- OpReturn indicates an intent to return a value or values back to the caller, typically to a variable binding.
- OpFunctionCaller indicates the intent of calling a function.

Let’s begin with the simplest: OpGetCallerParameter and OpReturn.

OpGetCallerParameter and OpReturn execution implementations

Consider the following example of a function call in Line 2:

let a @int = 1;
let b @int = sum(a, 20);

This example shows us that we need to deal with three problems: the first is accessing a value behind a variable binding, the second is accessing a constant value, and the third is mapping the call parameters to what the function definition expects. Luckily, we have implicitly addressed these three problems in the compiler. All the virtual machine needs to do is invoke the appropriate executor logic.

When a function is compiled, each of its arguments is marked with the OpGetCallerParameter instruction to tell the virtual machine how to find it. It does not matter whether it’s a variable or a constant value. Let’s see that in the code.

OpCode::OpGetCallerParameter => {
    let op_instruction = instruction.first().unwrap();
    let binding_name = op_instruction.get(5..op_instruction.len()).unwrap();

    // get the binding value from symbols table
    let binding_symbol = self.byte_code.symbols_table.0.get(binding_name).unwrap();
    let binding_instructions = &binding_symbol.0;
    let binding_instructions = binding_instructions.get(0).unwrap();
    let binding_instructions = binding_instructions.clone();
    let binding_instructions = vec![binding_instructions];
    self.executor(&binding_instructions, params)
}

In Line 3 we retrieve the binding name and then use it to fetch instructions from the symbols table. Within the instructions vector, there will be instructions that guide the virtual machine to call the executor function for either the variable or the constant. Similarly, for return statements, we pick the exact instruction we want to execute and then feed it to the executor function, as seen below.

OpCode::OpReturn => {
    let op_instruction = instruction.first().unwrap();
    let return_instructions = op_instruction.get(5..op_instruction.len() - 1).unwrap();
    let return_instructions = return_instructions.to_vec();
    let return_instructions = vec![return_instructions];
    self.executor(&return_instructions, params)
}

Function calls implementation

Executing function calls involves leveraging the infrastructure we have created so far and then composing them into one coherent flow. Let’s see that in action.

OpCode::OpFunctionCaller => {
    let op_instruction = instruction.first().unwrap();
    let binding_name = op_instruction.get(5..op_instruction.len() - 1).unwrap();
    let caller_parameters = instruction.get(1..instruction.len());
}

We begin by dissecting the instructions vector to find the name of the function to be executed, as seen in Line 3. In Line 4 we follow this up by getting the instructions for the call parameters.

#[allow(clippy::useless_vec)]
let caller_parameters = caller_parameters.unwrap_or(&vec![vec![OpCode::OpNull as u8]]).to_vec();

In real-world scenarios, not all functions have arguments. Therefore, we add a mechanism to safeguard the subsequent flow from throwing errors by making sure the caller_parameters vector always has something, even if it’s a Null.

// get the function to execute from symbols table
let function_definition = self.byte_code.symbols_table.0.get(binding_name).unwrap();
let function_definition_instructions = &function_definition.0;

// retrieve function parameters
let separator = function_definition_instructions.iter().find_position(|v| v[0] == OpCode::OpNull as u8).unwrap();
let seperator_index = separator.0;

let empty_things = Vec::new();
let function_parameters = match function_definition_instructions.get(0..seperator_index) {
    Some(p) => p,
    None => &empty_things,
};

In Line 2 we use the binding name we retrieved earlier to fetch the function to be executed from the symbols table. Recall that the instructions of a function have a separator between the function arguments and its body. In Line 6 we locate the separator and then use it to split off the function parameters, as seen in Line 10.

let caller_parameters: Vec<CompileObject> = caller_parameters
                    .iter()
                    .map(|param| {
                        let param_type = param.get(5).unwrap();
                        let param_type = CallerParamType::from(*param_type);
                        match param_type {
                            CallerParamType::Literal => {
                                let param_location = param.get(7).unwrap();
                                let param_location = *param_location as usize;
                                let param_object_value = self.byte_code.constants.get(param_location).unwrap();
                                param_object_value.clone()
                            }
                            CallerParamType::Variable => {
                                let param_instructions = param.get(7..param.len() - 1).unwrap();
                                let param_instructions = param_instructions.to_vec();
                                let param_instructions = vec![param_instructions];
                                if let Ok(obj) = self.executor(&param_instructions, params.clone())
                                {
                                    obj
                                } else {
                                    CompileObject::Null
                                }
                            }
                        }
                    })
                    .collect();

The next step is to map the caller parameters to actual values. A caller parameter can either be a literal or a variable. In the case of a literal parameter, we access its underlying value from the constant pool, as seen in Line 10. For a variable parameter, we call the executor function with its corresponding instructions, as seen in Line 17.

let function_def_parameters: Vec<CompileObject> = function_parameters
                    .iter()
                    .map(|param| {
                        let binding_name = param.get(5..param.len() - 1).unwrap();
                        CompileObject::Variable(binding_name.to_vec())
                    })
                    .collect();

let mut params = Vec::new();
for param in zip(function_def_parameters, caller_parameters) {
    params.push(param);
}

With the call parameters ready, we dig into the function definition to determine its arguments. We then merge both the call parameters and function parameters to produce a parameter vector which looks like this:

Vec<(CompileObject(x),CompileObject(10))>
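Concretely, for a call like summation(a, 20) from the earlier test, the zipped vector pairs each definition-side parameter name with the resolved caller value. An illustrative construction, with assumed values:

// Illustration only: what the zipped params vector holds when `a`
// resolves to 10 and the second argument is the literal 20.
let params: Vec<(CompileObject, CompileObject)> = vec![
    (CompileObject::Variable(b"x".to_vec()), CompileObject::Integer(10)),
    (CompileObject::Variable(b"y".to_vec()), CompileObject::Integer(20)),
];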

The next step is to move to the function body and execute each instruction.

// retrieve function block items
let block_items = match function_definition_instructions.get(seperator_index..function_definition_instructions.len()) {
    Some(p) => p,
    None => &empty_things, // the likelihood of this is zero
};

let mut caller_result = Ok(CompileObject::Null);

for item in block_items.iter() {
    let code = item.first().unwrap();
    match OpCode::from(*code) {
        OpCode::OpReturn => {
            let item = item.clone();
            let item = vec![item];
            caller_result = self.executor(&item, Some(params.clone()));
            break;
        }
        OpCode::OpNull => {}
        _ => {
            let item = item.clone();
            let item = vec![item];
            caller_result = self.executor(&item, Some(params.clone()));
        }
    }
}

caller_result

From Line 2 to Line 5 we slice the function definition instructions to fetch the instructions for its body items. In Karis, functions without a body are disallowed. Hence we are certain that these instructions are always present. The next logical step is to iterate over the body instructions and execute them one by one until the last statement. In some cases, however, the execution can end early if a return statement is found, as in the case of conditions. We mutate caller_result with the result of each execution and then return it as the final result of executing the function. With that, we are set to execute functions in the VM.

Karis has only one built-in function: print. Similar to executing custom functions, we will leverage a lot of the infrastructure we have laid out. Let’s look at the implementation.

OpCode::OpPrint => {
    let op_instruction = instruction.first().unwrap();
    let print_type = op_instruction.get(5).unwrap();
    let print_value: CallerParamType = CallerParamType::from(*print_type);
}

In Line 2 we pick the first vector from the instructions vector. The returned vector contains what we need to print. In Line 3 we pick the byte that holds information about the type to be printed; for this intent, the type will be either a literal value or a variable. In Line 4 we convert the typing byte to a CallerParamType so that we can use it in a match pattern.

match print_value {
    CallerParamType::Literal => {
        let loc = op_instruction.get(7).unwrap();
        let loc = *loc as usize;
        let value = self.byte_code.constants.get(loc).unwrap();
        match value {
            CompileObject::Integer(val) => println!("{:?}", val),
            CompileObject::String(val) => println!("{:?}", val),
            CompileObject::Boolean(val) => println!("{:?}", val),
            _ => {}
        }
    }
    CallerParamType::Variable => {
        let variable_access_instructions =
        op_instruction.get(7..op_instruction.len()).unwrap();
        let access_command = variable_access_instructions.first().unwrap();
        let access_command = *access_command;
        let access_command: OpCode = access_command.into();
        if access_command == OpCode::OpGetCallerParameter {
            // get the access instructions
            let access_key = variable_access_instructions.get(5..variable_access_instructions.len() - 1).unwrap();
            if let Some(symbol) = self.byte_code.symbols_table.0.get(access_key) {
                let symbols: Vec<Vec<u8>> = symbol.0.clone();
                if let Ok(response) = self.executor(&symbols, params) {
                    println!("{:?}", response)
                }
            }
        }
    }
};

From Line 2 to Line 12 we handle the case where the print type is a literal. For such, we use the instructions for the literal value to retrieve its underlying value from the constants pool. The value returned is of type CompileObject, which allows us to match it against the expected primitives. Line 7 demonstrates how to accomplish the printing: we use Rust’s println macro and pass it the value.

For variables, we handle them slightly differently. Instead of going to the constants pool, we first need to resolve what the variable points to by executing the command OpGetCallerParameter, as seen in Line 19. We do this because the print intent is essentially a function call; hence getting the value of a variable bound to a function call requires handling the OpGetCallerParameter instruction. In Line 21 we pick the binding key of the variable from the instruction vector and then use it to retrieve the actual value from the symbols table. Finally, we pass the value to the println macro to do the actual printing.

