Crafting a compiler in Rust : Chapter 10

Have you ever wondered how programming languages like Python, Java or Ruby work? Do you have an interest in building your own language? If so, this series of posts is for you. We dive into building a novel programing language from first principles. Get ready to have your hands dirty. It’s going to be a fun ride. Access the final product here : https://github.com/daviddexter/karis-programming-language

Congratulations!!!. We have made it this far. We are now entering a new realm where we will be considered with giving meaning to the output of our parser. Without this step, a code snippet like let sum @int = 1 + 2; will just be a series of characters. However. once evaluation kicks in, sum will be equal to three.

The output of the evaluation process depends on the language design. Consider the sample below:

let num @int = 10;
if (num){
    return true;
}
return false;

Line 2 the num parameter is a placed in condition. In some languages, it may designed to return either true or `false. In others, it may requires an explicit check against another parameter like:

let num @int = 10;
if (num != 0){
    return true;
}
return false;

For our use case, we will choose the explicit approach. This approach makes our intent clear and allows us not to have several implicit edge cases to worry about. There will be similar small decisions that we will undertake. This should inform you that evaluation implementation is different for most programming languages.

The question that needs an answer is; now that we have an AST, how do we generate meaning from it? We will solve this problem by using the classic approach of traversing the AST, node by node while generating the meaning of each node on the fly. What we seek to build is a “tree-walking interpreter”. While tree-walking interpreters are inherently slow, they are easier to implement. Remember we are here to learn. Once we have a work evaluator, we can figure out to finetune the process to make it more efficient and robust. Let’s get started.

Before we dive into the specifics of tree-walking interpreters, let’s review what an interpreter is in the context of programming languages. An interpreter is a program that reads source code, parses it, and executes it. Unlike a compiler, which translates source code into machine code ahead of time, an interpreter performs this process at runtime.

A tree-walking interpreter is a type of interpreter that evaluates a program by recursively traversing a tree-like data structure representing the program’s syntax. This approach is commonly used in the implementation of programming languages, especially for dynamically-typed languages.

How do tree-walking interpreters work?

A tree-walking interpreter works by recursively traversing an abstract syntax tree (AST) that represents the program’s syntax. An AST is a tree-like data structure that describes the syntactic structure of a program in a way that is easier for a computer to process.

The interpreter begins at the root of the AST and traverses down each branch, evaluating the nodes as it goes. Each node in the AST represents a different syntactic element of the program, such as an expression, statement, or function call.

As the interpreter traverses the AST, it maintains a runtime environment that stores information about the program’s state. This environment includes information about variables, functions, and other program components.

When the interpreter encounters a node that represents an expression, it evaluates the expression and returns a result. For example, if the node represents a mathematical expression, such as 2 + 2, the interpreter would evaluate the expression and return the result, 4.

When the interpreter encounters a node that represents a statement, such as an assignment statement, it performs the necessary operations to modify the runtime environment accordingly. For example, if the statement is x = 2, the interpreter would set the value of the variable x to 2 in the runtime environment.

Object representation

We begin by updating the Cargo.toml file at the root of the project and adding the entry evaluator. Next we create a new library by running the command:

cargo new --lib evaluator

We already have a way to represent objects in the parser. However, we need a different way of representing objects in the evaluator. From the evaluator’s perspective, we only care about how the output will be represented. This limits our scope to the primary data types and functions.

Let’s create a new module named evaluate.rs. This is where all of our work is going to live. Next, we introduce an enum type to serve as the evaluation object.

#[derive(Default, Debug, Clone, EnumAsInner)]
pub enum EvaluationObject {
    Integer(isize),
    Boolean(bool),
    String(String),
    Array(Vec<EvaluationObject>),
    ReturnValue(Rc<EvaluationObject>),
    Function(Node),

    AlternateCondition,
    #[default]
    Empty,
}

Line 3 to Line 5 points to primitive types int, bool and string. For arrays, we use an array of the evaluation object. Similarly, returning types point to other evaluation objects, as seen in Line 7 However, to conform with Rust ownership semantics, we wrap the evaluation object inside a reference counter giving it the ability to have multiple “owners”. The default value of an evaluation object will be a type value as seen in Line 11

We want the output of the evaluator to be printed. Consider that we will interact with the evaluator via the terminal and we want to see the result of the evaluation printed on the terminal. As such, we need to implement the Display trait for the evaluation object.

impl fmt::Display for EvaluationObject {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            EvaluationObject::Integer(i) => write!(f, "{}", i),
            EvaluationObject::Boolean(b) => write!(f, "{}", b),
            EvaluationObject::String(s) => write!(f, "{}", s),
            EvaluationObject::ReturnValue(obj) => write!(f, "{}", obj),
            EvaluationObject::Array(list) => {
                let mut items = Vec::new();

                for item in list.iter() {
                    let m = format!("{}", item);
                    items.push(m);
                }

                write!(f, "{:?}", items)
            }
            EvaluationObject::Function(_)
            | EvaluationObject::Empty
            | EvaluationObject::AlternateCondition => Ok(()),
        }
    }
}

With that, we have the first building block of the evaluator.

From RPPL to REPL

We a read-parse-print-loop that reads and parses the source code into an AST and optionally prints it on the screen. Use the following command to see it in action:

cargo run -p console -- rppl -p testdata/test01.kr

As a bonus, it can always render a pictorial representation of the AST. Our next move is to read the source code, parse it, evaluate it and print it. This REPL will be the entry point for the evaluator. We open main.rs in the console package and append some changes:

let matches = Command::new(KARIS_WELCOME_MESSAGE)
        .version("v0.1.0")
        .propagate_version(true)
        .subcommand_required(true)
        .arg_required_else_help(true)
        /*Trimmed */
        .subcommand(Command::new("repl").about("Read-Evaluate-Print-Loop for Karis"))
        .get_matches();

In Line 7 we introduce a new subcommand called repl. When this subcommand is called, it starts the REPL program. The next change will be a catch for this command.

Some(("repl", _sub_matches)) => evaluate_from_input(),

As you can see, we call an evaluate_from_input function that does the actual work of the REPL. Let’s add the function.

fn evaluate_from_input() -> Result<(), KarisError> {
    println!("{}", KARIS_INTERACTIVE_MESSAGE.cyan());
    println!("use SHIFT+ENTER to add a new line");
    println!("use SHIFT+UP to move up");
    println!("use SHIFT+DOWN to add a new line");
    println!(" ");
}

The REPL interface should be considered consumer-facing. As such, user experience is important. Thus, from Line 2 to Line 5, we welcome the user to the REPL and detail a few commands that can be used with the REPL program. To build out the actual REPL terminal, we will leverage the rustyline package which provides essential functionalities to construct a REPL. We add it to our package by running the command:

cargo add -p console rustyline

Next, we will set up some infrastructure that will power the REPL editor.

let helper = EditorHelper {
    completer: FilenameCompleter::new(),
    highlighter: MatchingBracketHighlighter::new(),
    hinter: HistoryHinter {},
    colored_prompt: "".to_owned(),
    validator: MatchingBracketValidator::new(),
};

let mut editor = Editor::with_config(config)?;
editor.set_helper(Some(helper));

10 editor.bind_sequence(
        KeyEvent(KeyCode::Left, Modifiers::CTRL),
        Cmd::Move(Movement::BackwardWord(1, Word::Big)),
);
editor.bind_sequence(
        KeyEvent(KeyCode::Right, Modifiers::CTRL),
        Cmd::Move(Movement::ForwardWord(1, At::AfterEnd, Word::Big)),
);
editor.bind_sequence(KeyEvent(KeyCode::Enter, Modifiers::SHIFT), Cmd::Newline);
editor.bind_sequence(
        KeyEvent(KeyCode::Up, Modifiers::SHIFT),
        Cmd::Move(Movement::LineUp(1)),
);
editor.bind_sequence(
        KeyEvent(KeyCode::Down, Modifiers::SHIFT),
        Cmd::Move(Movement::LineUp(1)),
);

Line 1 to Line 7 define helper for the editor such as file completion and source code highlighting. Line 8 does the actual instantiation of the editor then we inject the helpers defined before in Line 9. From Line 10 to Line 26, we add bindings to help the user of the editor. For instance, the first binding will be a combination of CTRL+UP which will move the editor’s cursor up by one character.

We want the editor to run continuously until the user opts to close it. As such we add the editor in a loop construct for it to continuously read input.

loop {
    let prompt = format!("{}", PROMPT.yellow());

    match editor.readline(prompt.as_str()) {
        Ok(input) => {
                6 editor.add_history_entry(input.clone());

                8 let lx = Lexer::new(input);
                let parser = Parser::new(lx);
                let mut evaluator = Evaluator::new(parser);
                evaluator.repl_evaluate_program(resolver.clone());
        }
            Err(ReadlineError::Interrupted) | Err(ReadlineError::Eof) => {
                println!("Exiting...");
                break;
            }

            Err(err) => {
                eprintln!("Error: {:?}", err);
                break;
        }
    }
}

In Line 8 and Line 9 we pass the input to the lexer and the parser respectively. At this point, the AST should be ready. We then pass the AST to the evaluator in Line 10 the call repl_evaluate_program to evaluate it. We do this for every input made available in the editor. Notice we don’t have the Evaluator yet. Let’s create it.

pub trait Evaluate {
    fn eval(
        &self,
        scope: Rc<RefCell<ScopeBindingResolver>>,
        global: Option<Rc<RefCell<ScopeBindingResolver>>>,
    ) -> Result<EvaluationObject, KarisError>;
}

We first define an Evaluate trait. The trait has only one function called eval. Recall that in a tree-walking interpreter, we evaluate nodes recursively. As such, each node type will need to implement the Evaluate trait. The eval function takes a scope parameter and a global parameter. These parameters represent both local and global scopes of the binding found in the source code.

pub struct Evaluator {
    parser: Parser,
}

impl Evaluator {
    pub fn new(parser: Parser) -> Self {
        Self { parser }
    }

    pub fn repl_evaluate_program(&mut self, scope: Rc<RefCell<ScopeBindingResolver>>) {
        match self.parser.parse(Some("repl_evaluate_program.json")) {
            Ok(program) => match program.eval(scope, None) {
               Ok(EvaluationObject::Integer(val)) => println!("{:?}", val),
                Ok(EvaluationObject::String(val)) => println!("{:?}", val),
                Ok(EvaluationObject::Boolean(val)) => println!("{:?}", val),
                Ok(_) => {}
                Err(_) => todo!(),
            },
            Err(err) => println!("{}", err.to_string().red()),
        }
    }
}

In Evaluator type in Line 1 holds the AST that will be evaluated. In Line 13 to Line 15 we print the value of every evaluation. Karis does not support complex constructs like structs and enums. As such we are always guaranteed that the eventual value will always be an integer, string, or boolean. We now have REPL ready. To test it, run the command:

cargo run -p console repl

However, adding any input will crash the program. This is expected because we have not implemented a means to evaluate AST.

Primitive object representation

Karis has three main primitives; integers, strings and booleans. When we evaluate a primitive, we expect the actual value to be printed on the terminal. Consider the following snippet:

let num @int = 123;
let name @string = "Karis";
let valid @bool = true;

Running this should print “123”, “Karis” and true respectively. Let’s examine a node that represents such a primitive.

pub struct Node {
    pub identifier_kind: Option<IdentifierKind>,
    pub left_child: Option<Either<LiteralObjects, Box<Objects>>>,
}

In Line 2, the identifier kind should be INTLITERAL, STRINGLITERAL or BOOLEANLITERAL. This is the only way we know we are handling a primitive type. In Line 3, the child of the literal object will be the actual value of the literal type. Since we are using an Either type, we should only be considered with the left-hand side of the either-type. If the left-hand side is empty, it indicates that something has broken down from the lexer to the parser.

Evaluation Implementation

We will begin by implementing the trait Evaluate since each node object must satisfy it.

impl Evaluate for Objects {
    fn eval(
        &self,
        scope: Rc<RefCell<ScopeBindingResolver>>,
        global: Option<Rc<RefCell<ScopeBindingResolver>>>,
    ) -> Result<EvaluationObject, KarisError> {
        match self {
            Self::TyProgram(program) => program.eval(scope, global),
            Self::TyNode(node) => node.eval(scope, global),
            _ => unreachable!("cannot evaluate UNKNOWN or CONSUMABLE"),
        }
    }
}

In Line 7 we check the object in focus and match it against a TyProgram or TyNode. Recall that TyProgram is always the root of the AST. Since the program is a composition of smaller nodes, we call the eval function of the TyProgram type. Similarly, we do the same thing for the TyNode type on Line 9.

impl Evaluate for Program {
    fn eval(
        &self,
        scope: Rc<RefCell<ScopeBindingResolver>>,
        global: Option<Rc<RefCell<ScopeBindingResolver>>>,
    ) -> Result<EvaluationObject, KarisError> {
        let mut evaluation_result = Ok(EvaluationObject::Empty);

        for item in self.body.iter() {
            if let Ok(val) = item.eval(scope.clone(), global.clone()) {
                match val {
                    EvaluationObject::ReturnValue(r) => {
                        let v = r.as_ref().clone();
                        evaluation_result = Ok(v)
                    }
                    _ => evaluation_result = Ok(val),
                }
            }
        }

        evaluation_result
    }
}

Above we implement the trait for TyProgram type. In Line 9 we iterate over the program body items and for each node, we call its respective eval method. Notice in Line 12 we intercept the result of the evaluation and assign it to the evaluation_result variable defined in Line 7. This is what will be returned when the evaluation completes successfully.

impl Evaluate for Node {
    fn eval(
        &self,
        scope: Rc<RefCell<ScopeBindingResolver>>,
        global: Option<Rc<RefCell<ScopeBindingResolver>>>,
    ) -> Result<EvaluationObject, KarisError> {
        7 let kind = self.identifier_kind.unwrap();

        let scope_clone = scope.clone();

        match kind {
            11 IdentifierKind::INTLITERAL => {
                let lit = self.left_child.as_ref().unwrap().as_ref().left().unwrap();
                let int_value = lit.as_obj_integer_value().unwrap();
                let int_lit = int_value.value.unwrap();
                Ok(EvaluationObject::Integer(int_lit))
            }
            IdentifierKind::STRINGLITERAL => {
                let lit = self.left_child.as_ref().unwrap().as_ref().left().unwrap();
                let string_value = lit.as_obj_string_value().unwrap();
                let string_lit = string_value.value.as_ref().unwrap();
                Ok(EvaluationObject::String(string_lit.clone()))
            }
            IdentifierKind::BOOLEANLITERAL => {
                let lit = self.left_child.as_ref().unwrap().as_ref().left().unwrap();
                let bool_value = lit.as_obj_boolean_value().unwrap();
                let bool_lit = bool_value.value.unwrap();
                Ok(EvaluationObject::Boolean(bool_lit))
            }

            _ => todo!(""),
        }
    }
}

The node object is where the bulk of evaluation logic will be situated. For us to know how to properly evaluate a node, we first need to identify it. In Line 7 we retrieve the object kind and pattern match it against all node types supported by the lexer. For example, integer literals types are caught by the pattern on Line 7. Once we have it, we retrieve the left-hand side child and extract its underlying value. With the value at hand, we wrap it in an evaluation object of type Integer and then return it to the caller. The logic is similar for both string literals and boolean literals. With that, the evaluator is ready to handle primitive types.

Overview of prefix and infix operators evaluation logic

Similar to modern programming languages, prefix and infix operators in Karis appear before and between operands respectively. For Karis in particular, prefix and infix operators don’t appear in dangling situations. Rather they must be associated with a variable of a specific name identifier. For the operators themselves, we need to handle both the left side and right side then merge the two depending on the instruction of the operand. Let’s put this into context. Consider a math expression like below:

let sum @int = 1 + 2;

From an evaluation perspective, this is how we show view the expression:

LET         SUM  @int        =             1           +                2;
^           ^^^^^^^^^
binding     variables      assignment     left-hand       infix            right-hand
intent                      symbol     side             operator        side
                                       operand                          operand

As we can be seen, we need to handle a number of isolated cases:

Handle how a binding should be evaluated
Handle how a variable should be evaluated
Handle how an assignment should be evaluated
Handle how the left-hand and right-hand operands should be evaluated
handle how to merge the result for operand evaluation in relation to the infix operator

Let’s begin

Bindings, variables, and assignments

Handling a binding is done implicitly by evaluating a variable and an assignment.

IdentifierKind::VARIABLE => {
    let variable = self.variable_name.as_ref().unwrap();
    let scope_inner = scope_clone.borrow();

    if let Some(val) = scope_inner.get(variable) {
        return Ok(val.clone());
    }
    Ok(EvaluationObject::Empty)
}

Evaluating a variable is quite simple. All that is required is to retrieve the variable name and add it to the scope as seen in Line 2 to Line 6. Depending on where the variable is defined, the scope can either be local or global.

IdentifierKind::ASSIGN => {
    let binding = self.left_child.as_ref().unwrap().as_ref().right().unwrap().as_ty_node().unwrap();
    let binding_key = binding.variable_name.as_ref().unwrap();

    let rhs = self.right_child.as_ref().unwrap();
    if let Ok(rhs) = left_or_right(rhs, scope, global) {
        let mut scope_inner_mut = scope_clone.borrow_mut();
        scope_inner_mut.insert(binding_key.to_string(), rhs);
    }
    Ok(EvaluationObject::Empty)
}

For the assignment, we get the name that will be referenced and evaluate the expression on the right. In Line 3 we retrieve the binding name. Typically this is a variable name. In Line 5 we leverage a helper function named left_or_right to evaluate the expression. We use a helper function because it will utilize extensively for prefix and infix operators. Let’s introduce a helper function.

fn left_or_right(
    object: &Either<LiteralObjects, Box<Objects>>,
    scope: Rc<RefCell<ScopeBindingResolver>>,
    global: Option<Rc<RefCell<ScopeBindingResolver>>>,
) -> Result<EvaluationObject, KarisError> {
    match object {
        Left(left) => match left {
            LiteralObjects::ObjintegerValue(int) => {
                let int_lit = int.value.unwrap();
                Ok(EvaluationObject::Integer(int_lit))
            }
            LiteralObjects::ObjBooleanValue(bool) => {
                let bool_lit = bool.value.unwrap();
                Ok(EvaluationObject::Boolean(bool_lit))
            }
            LiteralObjects::ObjStringValue(string) => {
                let string_lit = string.value.as_ref().unwrap();
                Ok(EvaluationObject::String(string_lit.clone()))
            }
        },
        Right(right) => right.eval(scope, global),
    }
}

The function checks against the passed object to match whether it’s a literal or a node object. For literals, it’s easy, we just return a wrapped value. For node objects, we recursively call the eval function of the object until a value has been returned.

Operators’ evaluation

Operator evaluation takes the same logic regardless of the operator. We evaluate left and right then merge the result depending on the operator’s instruction.

IdentifierKind::PLUS => {
    let left = self.left_child.as_ref().unwrap();
    let left = left_or_right(left, scope.clone(), global.clone());
    let left = match left.unwrap() {
        EvaluationObject::Integer(int) => int,
        _ => 0,
    };

    9 let right = self.right_child.as_ref().unwrap();
    let right = left_or_right(right, scope, global);
    let right = match right.unwrap() {
                    EvaluationObject::Integer(int) => int,
                    _ => 0,
    15 };

    let sum = left + right;
                Ok(EvaluationObject::Integer(sum))
}

From Line 2 to Line 7 we extract the node and evaluate it. We do the same thing for the right-hand side operand as seen on Line 9 to Line 15. In this case, the instruction is to add two operands. That is what we do on Line 17. If the instruction was a product, we would that line to :

let product = left * right;

That is all to it about prefix and infix operators.

December 11, 2022