Crafting a compiler in Rust : Chapter 11

Have you ever wondered how programming languages like Python, Java or Ruby work? Do you have an interest in building your own language? If so, this series of posts is for you. We dive into building a novel programing language from first principles. Get ready to have your hands dirty. It’s going to be a fun ride. Access the final product here : https://github.com/daviddexter/karis-programming-language

Overview of conditions evaluation logic

In Karis, the main condition clauses are IF and ELSE, combined with logical infix operators that return a boolean value. Intuitively, this tells us that we first need to capture the condition clauses, evaluate the logical infix operators, and then depending on the condition result evaluate the alternative condition or return a new result. Let’s examine what the AST of a condition expression looks like so that we can get a clear picture of what we intend to accomplish. Considering the expression below:

if x < y{
    return x + y;
}else x == y {
    return x - y;
} else {
    return x * y;
};

The above will produce an AST that looks like this:

"TyNode": {
                    "variable_name": null,
                    "return_type": null,
                    "identifier_kind": "IF",
                    "alternate": {
                      "TyNode": {
                        "variable_name": null,
                        "return_type": null,
                        "identifier_kind": "ELSE",
                        "alternate": {
                          "TyNode": {
                            "variable_name": null,
                            "return_type": null,
                            "identifier_kind": "ELSE",
                            "alternate": null,
                            "block_children": [
                              {
                                "TyProgram": {
                                  "body": [
                                    {
                                      "TyNode": {
                                        "variable_name": null,
                                        "return_type": null,
                                        "identifier_kind": "RETURN",
                                        "alternate": null,
                                        "block_children": null,
                                        "right_child": {
                                          "TyNode": {
                                            "variable_name": null,
                                            "return_type": null,
                                            "identifier_kind": "ASTERISK",
                                            "alternate": null,
                                            "block_children": null,
                                            "left_child": {
                                              "TyNode": {
                                                "variable_name": "x",
                                                "return_type": null,
                                                "identifier_kind": "VARIABLE",
                                                "alternate": null,
                                                "block_children": null
                                              }
                                            },
                                            "right_child": {
                                              "TyNode": {
                                                "variable_name": "y",
                                                "return_type": null,
                                                "identifier_kind": "VARIABLE",
                                                "alternate": null,
                                                "block_children": null
                                              }
                                            }
                                          }
                                        }
                                      }
                                    }
                                  ],
                                  "non_root": true
                                }
                              }
                            ],
                            "right_child": {
                              "TyProgram": {
                                "body": [],
                                "non_root": true
                              }
                            }
                          }
                        },
                        "block_children": [
                          {
                            "TyProgram": {
                              "body": [
                                {
                                  "TyNode": {
                                    "variable_name": null,
                                    "return_type": null,
                                    "identifier_kind": "RETURN",
                                    "alternate": null,
                                    "block_children": null,
                                    "right_child": {
                                      "TyNode": {
                                        "variable_name": null,
                                        "return_type": null,
                                        "identifier_kind": "MINUS",
                                        "alternate": null,
                                        "block_children": null,
                                        "left_child": {
                                          "TyNode": {
                                            "variable_name": "x",
                                            "return_type": null,
                                            "identifier_kind": "VARIABLE",
                                            "alternate": null,
                                            "block_children": null
                                          }
                                        },
                                        "right_child": {
                                          "TyNode": {
                                            "variable_name": "y",
                                            "return_type": null,
                                            "identifier_kind": "VARIABLE",
                                            "alternate": null,
                                            "block_children": null
                                          }
                                        }
                                      }
                                    }
                                  }
                                }
                              ],
                              "non_root": true
                            }
                          }
                        ],
                        "right_child": {
                          "TyProgram": {
                            "body": [
                              {
                                "TyNode": {
                                  "variable_name": null,
                                  "return_type": null,
                                  "identifier_kind": "EQ",
                                  "alternate": null,
                                  "block_children": null,
                                  "left_child": {
                                    "TyNode": {
                                      "variable_name": "x",
                                      "return_type": null,
                                      "identifier_kind": "VARIABLE",
                                      "alternate": null,
                                      "block_children": null
                                    }
                                  },
                                  "right_child": {
                                    "TyNode": {
                                      "variable_name": "y",
                                      "return_type": null,
                                      "identifier_kind": "VARIABLE",
                                      "alternate": null,
                                      "block_children": null
                                    }
                                  }
                                }
                              }
                            ],
                            "non_root": true
                          }
                        }
                      }
                    },
                    "block_children": [
                      {
                        "TyProgram": {
                          "body": [
                            {
                              "TyNode": {
                                "variable_name": null,
                                "return_type": null,
                                "identifier_kind": "RETURN",
                                "alternate": null,
                                "block_children": null,
                                "right_child": {
                                  "TyNode": {
                                    "variable_name": null,
                                    "return_type": null,
                                    "identifier_kind": "PLUS",
                                    "alternate": null,
                                    "block_children": null,
                                    "left_child": {
                                      "TyNode": {
                                        "variable_name": "x",
                                        "return_type": null,
                                        "identifier_kind": "VARIABLE",
                                        "alternate": null,
                                        "block_children": null
                                      }
                                    },
                                    "right_child": {
                                      "TyNode": {
                                        "variable_name": "y",
                                        "return_type": null,
                                        "identifier_kind": "VARIABLE",
                                        "alternate": null,
                                        "block_children": null
                                      }
                                    }
                                  }
                                }
                              }
                            }
                          ],
                          "non_root": true
                        }
                      }
                    ],
                    "right_child": {
                      "TyProgram": {
                        "body": [
                          {
                            "TyNode": {
                              "variable_name": null,
                              "return_type": null,
                              "identifier_kind": "LT",
                              "alternate": null,
                              "block_children": null,
                              "left_child": {
                                "TyNode": {
                                  "variable_name": "x",
                                  "return_type": null,
                                  "identifier_kind": "VARIABLE",
                                  "alternate": null,
                                  "block_children": null
                                }
                              },
                              "right_child": {
                                "TyNode": {
                                  "variable_name": "y",
                                  "return_type": null,
                                  "identifier_kind": "VARIABLE",
                                  "alternate": null,
                                  "block_children": null
                                }
                              }
                            }
                          }
                        ],
                        "non_root": true
                      }
                    }
                  }

Notice that the condition node has two important properties,

right_child, this is the actual condition that will be first evaluated
block_children, these are the nodes that will be evaluated if the condition is truthy
alternate, this is the node that will be evaluated if the condition is not truthy

Let’s begin.

Evaluation Implementation

The first step is to interpret the condition intent. Therefore we extend the current node match pattern to include a condition.

IdentifierKind::IF | IdentifierKind::ELSE => {
    todo!("")
}

Notice we capture both the IF and ELSE in one pattern match. This is so because the internal mechanism of evaluation is exactly the same. Plus we are guaranteed that any recursive call will capture either condition intent.

let condition_statement_result = Ok(EvaluationObject::Empty);

let local_binding_resolver = scope.clone();

let condition = self.right_child.as_ref().unwrap();
let condition = left_or_right(condition, scope, global);
let condition = match condition.unwrap() {
    EvaluationObject::Boolean(b) => b,
    _ => false,
};

In Line 1 we instantiate a default result for the condition. While it may be nice to have a condition block that returns, that is not always the case in a real-world programming scenario. As such, we need to return to the evaluator so that it can complete its work.

In Line 3 we clone the scope and assign it to a local variable called local_binding_resolver. When this resolver is at hand, we can now retrieve variables from the scope. In some cases, a condition block can have its own variables. Such variables will be added to the scope meaning that variables with the same name will be overwritten. This characteristic gives Karis a block-scope mechanism without explicitly building it.

From Line 5 to Line 10 we extract the actual condition node and evaluate it using the left_or_right helper function. In Karis a condition evaluation much always return a boolean value, hence we default the result to false as seen on Line 9.

let program_func_worker = |program: Program| -> Result<EvaluationObject, KarisError> {
    let mut result = Ok(EvaluationObject::Empty);
    let program_items = program.body;
    for item in program_items {
        match item {
            Objects::TyNode(node) => {
                let kind = node.identifier_kind.unwrap();
                    match kind {
                        9 IdentifierKind::ASSIGN | IdentifierKind::PRINT => {
                            result = node.eval(
                                local_binding_resolver.clone(),
                                Some(local_binding_resolver.clone()), // points to the global binding scope
                            );
                        }
                        IdentifierKind::RETURN => {
                            result = node.eval(
                                local_binding_resolver.clone(),
                                Some(local_binding_resolver.clone()), // points to the global binding scope
                            );
                            break;
                        }
                        _ => {}
                    }
            }
            _ => unreachable!(),
        }
    }
    result
};

We introduce a closure program_func_worker that will be responsible for evaluating the condition body if the condition is truthy. In Line 9 we intercept an assignment and a print instruction. In this case, it writes to the evaluation to the local_binding_resolver while referencing the global scope. However, for return statements, it writes to the local_binding_resolver while referencing the same local scope.

if condition {
    let program = self.block_children.as_ref().unwrap().get(0)
                    .unwrap().as_ty_program().unwrap().clone();

    return program_func_worker(program);
}

let has_alternate = self.alternate.as_ref().is_some();

if !condition && has_alternate {
    let alternate = self.alternate.as_ref().unwrap();
    let alternate = alternate.as_ty_node().unwrap();

    let program = alternate.block_children.as_ref().unwrap()
                .get(0).unwrap().as_ty_program().unwrap().clone();

    return program_func_worker(program);
}

condition_statement_result

In the final stage, we check the condition and call the closure if required. Otherwise, we retrieve the alternative condition node and do the same thing. The process continues recursively until the entire condition evaluation is complete.

Returning a value after evaluation means that the value should be printed on the screen. Afterall, we are building an evaluator for the REPL, hence it makes sense to output it’s result on the screen. With that in mind, evaluating a return statement involves the two steps:

Extract the node that should be evaluated
Pass the node to the evaluation procedure, wrap the result in an evaluator object of type ReturnValue and return it

Let’s use this in action.

IdentifierKind::RETURN => {
    let right = self.right_child.as_ref().unwrap();
    let resp = left_or_right(right, scope, global.clone());
    match resp {
        Ok(res) => Ok(EvaluationObject::ReturnValue(Rc::new(res))),
        Err(_) => unreachable!(),
    }
}

In Line 2 we access the right child of the current node which points to what we expect to be returned. This can be either a literal or an expression. For example:

return 1;
return 1 + 3;
return function();

As you can see, their various variations of returned statements. As languages grow in complexity, variations of return statements are possible. In our case, these three variations are enough to drive the point home. Once we have the result of the evaluation, we wrap it as seen on Line 5. You may ask, how does this result get printed? Well, the solution was already done while implementing the Evaluate from the Program type. Let’s take another look.

impl Evaluate for Program {
    fn eval(&self,scope: Rc<RefCell<ScopeBindingResolver>>,global: Option<Rc<RefCell<ScopeBindingResolver>>>) -> Result<EvaluationObject, KarisError> {
        let mut evaluation_result = Ok(EvaluationObject::Empty);

        for item in self.body.iter() {
            if let Ok(val) = item.eval(scope.clone(), global.clone()) {
                match val {
                    EvaluationObject::ReturnValue(r) => {
                        let v = r.as_ref().clone();
                        evaluation_result = Ok(v)
                    }
                    _ => evaluation_result = Ok(val),
                }
            }
        }

        evaluation_result
    }
}

Line 7 captures the result of the evaluation and matches against the EvaluationObject::ReturnValue. If the result matches, we extract the wrapped value and assign it to the evaluation_result variable which is returned when the function completes. It is this result that is captured by the repl_evaluate_program function of the Evaluator and prints it on the screen.

Overview of functions evaluation logic

Functions are the primary building block of complex logic in programming languages. The same is true for Karis. To correctly evaluate a function on the fly, we need to separate two concerns, function definition and function call. As the phrase indicates, the function definition is all about outlining a function flow statement by statement. This involves defining function variables, conditional routes and even calling other functions or the same function. On the other hand, the function call is about executing a function that has already been defined. That means a function must first exist for it to be called.

In the context of evaluation, a function call ranks higher than a function definition. Why is this so? The primary objective of an evaluation is to get results. If a function is defined but not called, it is of no value to the evaluator. It’s wasted space. Therefore, we are going to put our focus on handling function calls which will implicitly handle function definition for free.

Let’s get started.

Evaluation implementation

Consider the following example:

let add @int = fn(x @int, y @int){
    return x + y;
};

let result @int = add(10,20);

In Line 1, we see that we bind function to a variable named add. This function takes two arguments, x and y each of type int. Inside the function, we sum the two arguments and return the result. In Line 4 we call the function and assign the result to a variable named result. Pretty basic example.

The above example generates the following AST:

{
  "TyProgram": {
    "body": [
      {
        "TyNode": {
          "variable_name": null,
          "return_type": null,
          "identifier_kind": "ASSIGN",
          "alternate": null,
          "block_children": null,
          "left_child": {
            "TyNode": {
              "variable_name": "add",
              "return_type": "Int",
              "identifier_kind": "LET",
              "alternate": null,
              "block_children": null
            }
          },
          "right_child": {
            "TyNode": {
              "variable_name": null,
              "return_type": null,
              "identifier_kind": "FUNCTION",
              "alternate": null,
              "block_children": [
                {
                  "TyNode": {
                    "variable_name": null,
                    "return_type": null,
                    "identifier_kind": "RETURN",
                    "alternate": null,
                    "block_children": null,
                    "right_child": {
                      "TyNode": {
                        "variable_name": null,
                        "return_type": null,
                        "identifier_kind": "PLUS",
                        "alternate": null,
                        "block_children": null,
                        "left_child": {
                          "TyNode": {
                            "variable_name": "x",
                            "return_type": null,
                            "identifier_kind": "VARIABLE",
                            "alternate": null,
                            "block_children": null
                          }
                        },
                        "right_child": {
                          "TyNode": {
                            "variable_name": "y",
                            "return_type": null,
                            "identifier_kind": "VARIABLE",
                            "alternate": null,
                            "block_children": null
                          }
                        }
                      }
                    }
                  }
                }
              ],
              "func_params": {
                "lit": [],
                "obj": [
                  {
                    "TyNode": {
                      "variable_name": "x",
                      "return_type": "Int",
                      "identifier_kind": "VARIABLE",
                      "alternate": null,
                      "block_children": null
                    }
                  },
                  {
                    "TyNode": {
                      "variable_name": "y",
                      "return_type": "Int",
                      "identifier_kind": "VARIABLE",
                      "alternate": null,
                      "block_children": null
                    }
                  }
                ]
              }
            }
          }
        }
      },
      {
        "TyNode": {
          "variable_name": null,
          "return_type": null,
          "identifier_kind": "ASSIGN",
          "alternate": null,
          "block_children": null,
          "left_child": {
            "TyNode": {
              "variable_name": "result",
              "return_type": "Int",
              "identifier_kind": "LET",
              "alternate": null,
              "block_children": null
            }
          },
          "right_child": {
            "TyNode": {
              "variable_name": "add",
              "return_type": null,
              "identifier_kind": "CALLER",
              "alternate": null,
              "block_children": null,
              "call_params": {
                "lit": [
                  {
                    "ObjintegerValue": {
                      "value": 10
                    }
                  },
                  {
                    "ObjintegerValue": {
                      "value": 20
                    }
                  }
                ],
                "obj": []
              }
            }
          }
        }
      }
    ],
    "non_root": false
  }
}

As you can see, the caller node is responsible for passing the actual values of x and y. Therefore the logic will follow the following procedure:

Retrieve the function definition from the scope
Align arguments specified in the function definition with those specified in the function call
Iterate through every node in the function definition evaluating it while passing the actual argument values

let m = hashbrown::HashMap::new();
let global_scope = match global {
    Some(m) => m,
    None => Rc::new(RefCell::new(m)),
};

In Line 1 we instantiate an empty HashMap. Next, we check the global scope to see if it is empty. If not empty, return it. Otherwise, return the empty scope wrapped in a RefCell to allow interior mutability.

let global_scope_inner = global_scope.borrow();
// use the global_scope to tell if the function is called outside `main` or another function
if global_scope_inner.is_empty() {
    let err:Result<EvaluationObject, KarisError> =  Err(KarisError {
        error_type: KarisErrorType::MissingFunctionInScope,
        message: "A function can only be called inside `@main` or inside another function".to_string(),
    });
    panic!("{:?}", err)
}

Next we do some defensive programming and check whether scope is empty. If so we panic since it means there may be no function to call.

let empty_params = Vec::new();
let call_params = match &self.call_params {
    Some(p) => p,
    None => &empty_params,
};

In Line 1 to Line 5 we instantiate an array to hold parameters passed from the function caller. We don’t assume that there will always be a parameter, hence we check from the call node on Line 2 otherwise we return an empty array.

let scope_inner = scope_clone.borrow();
let retrieve_function_def = |from_global: bool, function_name: &str| {
    if from_global {
        global_scope_inner.get(function_name).unwrap()
    } else {
        scope_inner.get(function_name).unwrap()
    }
};

In Line 2 we define a reusable closure that will retrieve the actual function definition from the scope. Next we will define another closure that will do the heavy lifting.

let function_construct = |from_global: bool, function_name: &str| {
    todo!("")
}

Let’s add some logic.

let func_def = retrieve_function_def(from_global, function_name);
let func_def = func_def.as_function().unwrap();
let func_params = match &func_def.func_params {
                        Some(p) => p,
                        None => &empty_params,
};

if call_params.len() != func_params.len() {
    let err: Result<EvaluationObject, KarisError> = Err(KarisError {
        error_type: KarisErrorType::IncorrectFunctionCall,
        message: format!("Unexpected number of arguments for function `{}`. Wanted {} but got {}. If you passed an expression, bind it to a varible first",
                                function_name, func_params.len(),call_params.len()
        ),
    });
    panic!("{:?}", err)
}

In Line 1 we call the closure function defined earlier to retrieve the function definition then get it’s name as seen in Line 2. We follow this by retrieving params from the function definition. We then check that the size of function parameters and call parameters match, otherwise we return an error.

if func_def.block_children.is_none() {
                        let err: Result<EvaluationObject, KarisError> = Err(KarisError {
                            error_type: KarisErrorType::IncorrectFunctionCall,
                            message: format!(
                                "Function body is empty. Nothing to run`{}`",
                                function_name
                            ),
                        });
                        panic!("{:?}", err)
}

if func_def.block_children.as_ref().unwrap().is_empty() {
                        let err: Result<EvaluationObject, KarisError> = Err(KarisError {
                            error_type: KarisErrorType::IncorrectFunctionCall,
                            message: format!(
                                "Function body is empty. Nothing to run`{}`",
                                function_name
                            ),
                        });
                        panic!("{:?}", err)
}

We continue to check that the function is correctly defined by checking whether its body is empty. Karis does not allow empty functions. If an empty function is encountered, an error is returned.

let mut params = Vec::new();
for param in zip(func_params, call_params) {
    params.push(param);
}

In Line 2 we utilize the zip function to combine function parameters with call parameters. Consider this example:

Function parameters : (x,y)
Call parameters : (10,20)

What zip will do is combine the first element from the first tuple with the first element from the second tuple. The same is true for other consecutive elements. Thus the returned tuple combination will be: ((x,10), (y,20))

The next step is to parse the parameters to the actual value.

let mut local_binding_resolver = hashbrown::HashMap::new();
// add params to local_binding_resolver
for (from_func, from_call) in params {
    if from_func.is_right() == from_call.is_right() {
        let from_func_param = from_func.clone();
        let from_call_param = from_call.clone();

        let func_node = from_func_param.right().unwrap().as_ty_node().unwrap().clone();

        let call_node = from_call_param.right().unwrap().as_ty_node().unwrap().clone();

        let func_key = func_node.variable_name.unwrap_or_default();
        // use the call node to retrieve the actual value from scope
        let call_key = call_node.variable_name.unwrap_or_default();
        let call_value = scope_inner.get(&call_key).unwrap_or(&EvaluationObject::Empty);
        // insert in function scope
        17 local_binding_resolver.insert(func_key, call_value.clone());
    }

    if from_func.is_right() == from_call.is_left() {
        let func_call = from_func.clone();
        let func_node = func_call.right().unwrap().as_ty_node().unwrap().clone();

        let from_call = from_call.clone();
        let call_lit = from_call.left().unwrap();
        let call_lit = match call_lit {
            LiteralObjects::ObjintegerValue(int) => {
                                    let int_lit = int.value.unwrap();
                                    EvaluationObject::Integer(int_lit)
            }
            LiteralObjects::ObjBooleanValue(bool) => {
                                    let bool_lit = bool.value.unwrap();
                                    EvaluationObject::Boolean(bool_lit)
            }
            LiteralObjects::ObjStringValue(string) => {
                                    let string_lit = string.value.as_ref().unwrap();
                                    EvaluationObject::String(string_lit.clone())
            }
        };

        // insert in function scope
        let key = func_node.variable_name.unwrap_or_default();
        local_binding_resolver.insert(key, call_lit);
    }
}

In Line 4 we check if both the function parameter and call parameter are objects. This may occur if an expression or a function call is passed as a value. In Line 19 we add the value to the local binding so that it can be accessed later downstream. In Line 20 we check if the call parameter is a literal while the function parameter is an object. This is the most common case. Notice that the function parameter is always an object, that is because a function definition does not allow literal values.

let global_scope_closure = global_scope.clone();
// evaluate the function itself
evaluate_function(
    func_def.clone(),
    Rc::new(RefCell::new(local_binding_resolver)),
    Some(global_scope_closure),
)

Finally, we pass the function definition, local binding and global scope to the evaluate_function.

fn evaluate_function(
    func: Node,
    scope: Rc<RefCell<ScopeBindingResolver>>,
    global: Option<Rc<RefCell<ScopeBindingResolver>>>,
) -> Result<EvaluationObject, KarisError> {
    let mut main_result = Ok(EvaluationObject::Empty);

    for item in func.block_children.unwrap() {
        if let Ok(res) = item.eval(scope.clone(), global.clone()) {
            match res {
                EvaluationObject::ReturnValue(result) => {
                    let v = result.as_ref().clone();
                    main_result = Ok(v);
                    break;
                }
                _ => main_result = Ok(res),
            }
        }
    }

    main_result
}

The objective of every function is to return a result. Hence we evaluate_function recursively call eval for every node. If the result is an EvaluationObject::ReturnValue the result is returned and the function returns terminates. With that, we are ready to evaluate the functions. One unintentional side effect of this implementation is that function callers don’t have arithmetic expressions correctly. For example, we can not pass the expression n + 5 as an argument to the function caller. To work correctly, we need to assign the expression to a variable and then pass that variable to the caller.

Overview of the main block

The main block is the main entry point of a Karis program. Without the main block, functions can not be executed. Let’s see how the AST of the main block looks like. Consider the following program:

let add @int = fn(x @int, y @int){
            return x + y;
};

@main fn(){
    let result0 @int = add(10,20);
    print(result0);
}@end;

When we pull the AST of the main block only, it looks like this:

{
  "TyNode": {
    "variable_name": null,
    "return_type": null,
    "identifier_kind": "MAIN",
    "alternate": null,
    "block_children": [
      {
        "TyProgram": {
          "body": [
            {
              "TyNode": {
                "variable_name": null,
                "return_type": null,
                "identifier_kind": "ASSIGN",
                "alternate": null,
                "block_children": null,
                "left_child": {
                  "TyNode": {
                    "variable_name": "result0",
                    "return_type": "Int",
                    "identifier_kind": "LET",
                    "alternate": null,
                    "block_children": null
                  }
                },
                "right_child": {
                  "TyNode": {
                    "variable_name": "add",
                    "return_type": null,
                    "identifier_kind": "CALLER",
                    "alternate": null,
                    "block_children": null,
                    "call_params": {
                      "lit": [
                        {
                          "ObjintegerValue": {
                            "value": 10
                          }
                        },
                        {
                          "ObjintegerValue": {
                            "value": 20
                          }
                        }
                      ],
                      "obj": []
                    }
                  }
                }
              }
            },
            {
              "TyNode": {
                "variable_name": null,
                "return_type": null,
                "identifier_kind": "PRINT",
                "alternate": null,
                "block_children": null,
                "call_params": {
                  "lit": [],
                  "obj": [
                    {
                      "TyNode": {
                        "variable_name": "result0",
                        "return_type": null,
                        "identifier_kind": "VARIABLE",
                        "alternate": null,
                        "block_children": null
                      }
                    }
                  ]
                }
              }
            }
          ],
          "non_root": true
        }
      }
    ]
  }
}

As you can see, the property of focus is “block_children” which represents the children of the main block. To evaluate it, we, therefore, iterate over the children and evaluate them individually.

Evaluation implementation

We begin by intercepting the MAIN` token kind. This signals that our program is a valid executable.

IdentifierKind::MAIN => {
    todo!("")
}

Next, we create a local scope that will contain all variables defined within the main block. The scope will be passed to each function call where the actual scope values will be retrieved.

let mut main_result = Ok(EvaluationObject::Empty);
let local_binding_resolver = hashbrown::HashMap::new();
let local_binding_resolver = Rc::new(RefCell::new(local_binding_resolver));

The next step is to extract the program node and iterate over it’s children.

let program = self.block_children.as_ref().unwrap().get(0).unwrap().as_ty_program().unwrap().clone();
let program_items = program.body;
for item in program_items {
    match item {
        Objects::TyNode(node) => {
            let kind = node.identifier_kind.unwrap();
            7 match kind {
                IdentifierKind::ASSIGN
                | IdentifierKind::PRINT
                | IdentifierKind::RETURN => {
             11       main_result = node.eval(local_binding_resolver.clone(), Some(scope.clone()));
                }
                _ => unreachable!("invalid syntax"),
            }
        }
        _ => unreachable!(),
    }
}

main_result

From Line 7 to Line 11 we match the kind of the current node that is been evaluated. Inside the main block, we don’t expect functions or conditions to be defined. That is a design choice. We only expect assignments, printing statements or return statements. Otherwise, we return an error and crash the program.

That’s all that is needed for the main block.

Consider the following code snippet:

print(10);

When we run this program, we expect the number ten to be printed on the screen. When we examine the AST for the snippet, it looks like this:

{
  "TyProgram": {
    "body": [
      {
        "TyNode": {
          "variable_name": null,
          "return_type": null,
          "identifier_kind": "PRINT",
          "alternate": null,
          "block_children": null,
          "call_params": {
            "lit": [
              {
                "ObjintegerValue": {
                  "value": 10
                }
              }
            ],
            "obj": []
          }
        }
      }
    ],
    "non_root": false
  }
}

In Line 8 we see the identifier PRINT. This is the token that we use to tell our parser, evaluator and eventually the compiler that this is a built-in function. There is nothing really special about built-in functions other than the fact that they come already baked in the language. It is the job of the language author to determine how much function will be consumed. In our case, print is global. It does not belong to any object and it can print anything we pass to it.

Evaluation Implementation

Let’s begin with the usual blueprint.

IdentifierKind::PRINT => {
    todo!("")
}

We introduce a new pattern to the node. Every time we get a token of kind PRINT, we want to run some logic that prints something on the screen.

let params = self.call_params.as_ref().unwrap().first().unwrap();
let scope_inner = scope_clone.borrow();

let mut message = String::new();

In Line 1 we access the call parameters that are passed to the print call. This is what we expect to be printed on the screen. In Line 2 we get a copy of the scope where we retrieve values of variables. In Line 4 we instantiate an empty message that represents the actual print value. Downstream login will update the message with the appropriate print value.

match params {
    Left(lit) => match lit {
        LiteralObjects::ObjintegerValue(i) => {
            let msg = format!("{:?}", i.value.unwrap_or_default());
            message = msg;
        }
        LiteralObjects::ObjBooleanValue(b) => {
            let msg = format!("{:?}", b.value.unwrap_or_default());
            message = msg;
        }
        LiteralObjects::ObjStringValue(s) => {
            let msg = format!("{:?}", s.value.clone());
            message = msg;
        }
    },
        Right(obj) => {
        todo!("")
    }
}

In Line 1 we match the type of the parameter to check whether it’s a literal or an object. A literal is either a string, integer or boolean. A object on the other hand is a variable that points to an actual literal value or a call to another function Let’s begin with a literal value. In Line 2, we get to assert that the parameter is a literal, we recheck the parameter to get what type of literal value it represents. For either instance of literal, we parse the value into a string using Rust’s format macro as seen in Line 4, then we update the message variable.

 match params {
    Left(lit) => match lit {
        /*Trimmed for readability */
    },
    Right(obj) => {
        6 let node = obj.as_ty_node().unwrap();
        let kind = node.identifier_kind.unwrap();
        match kind {
            9 IdentifierKind::CALLER => {
                if let Ok(value) = node.eval(scope, global) {
                    let msg = format!("{}", value);
                    message = msg;
                }
            }
            _ => {
                let variable_name = node.variable_name.as_ref().unwrap();
                    if let Some(value) = scope_inner.get(variable_name) {
                        let msg = format!("{}", value);
                        message = msg;
                    } else {
                        error!("missing variable {}", variable_name);
                    }
                }
            }
    }
}

In Line 6 we extract the node object and examine its kind. We expect the object to be either a variable or a call to a function. In Line 9 we catch the caller type and call the appropriate evaluation method. Once we have the result from the evaluation, we format the result to a printable string and update the message variable. The catch-all pattern in Line 15 retrieves the variable name and formats it to a printable string. The side effect consequence is that print does not work well with expressions. For instance, print(1 + 2); only prints the one instead of the sum of three. Let’s add a few tests to verify that the print function evaluates correctly.

#[test]
fn should_evaluate_print() {
        let lx = Lexer::new(String::from(
            "
        print(10);
        ",
        ));

        let global_binding_resolver = hashbrown::HashMap::new();
        let mut parser = Parser::new(lx);
        let res = parser.parse(Some("should_evaluate_print.json"));
        let mut evaluator = Evaluator::new(parser);
        evaluator.repl_evaluate_program(Rc::new(RefCell::new(global_binding_resolver)));

        assert!(res.is_ok());
}

And now we run the test, it works. Nice!!!

December 17, 2022

Crafting a compiler in Rust : Chapter 11

Overview of conditions evaluation logic

Evaluation Implementation

Overview of functions evaluation logic

Evaluation implementation

Overview of the main block

Evaluation implementation

Evaluation Implementation

Share