Compilers: Program 4

Typechecking Jest

Due: Friday, April 24 at 5pm

Read this assignment and the Jest type rules in their entirety.

The goal of this programming assignment is to implement the type checker for Jest and thereby gain more experience writing code that analyzes programs by navigating recursive tree structures using implicit state.


Introduction

Download the starter project, unzip it and familiarize yourself with the files associated with this project:

src/Ast.hs              minor change (Operator supports equality)
src/AstUnparse.hs       very minor change
src/BuildAst.hs         effectively solution for last assignment (+ basis)
src/JestBasis.hs        new
src/Parse.hs            as before
src/ParseTree.hs        as before
src/Region.hs           as before
src/Scan.x              as before
src/SemTy.hs            augmented (AnyTy, isSpecific, matches, show)
src/Tag.hs              as before
src/Token.hs            as before
src/TypeEnvToString.hs  new
src/TypeCheck.hs        new
src/TypeState.hs        new
app/Main.hs             similar, now invokes typeCheck
examples/               new - folder for Jest examples to use for testing

Examine the supplied code, in particular, the modules SemTy, TypeCheck and TypeState. You only need work in TypeCheck and SemTy.

Try building and running the project. As it stands, it is not good for much. It can handle simple assignment statements, assuming that the variable being assigned to was given a type in its declaration and that what is being assigned is simple: It can successfully typecheck assignments of Boolean literals to explicitly declared variables such as:

let /*<Bool>*/ typeChecksOk = true;

To the extent that it can check statements, it can check sequences of statements (tcStmts) so it can handle compound statements and function bodies (but not returns). It can also handle while statements (and break and continue, too, as they require no typechecking at all).

As far as expressions are concerned, it can handle Boolean literals, identifiers, and unary operations, field access for non-records (e.g., .length for arrays and strings).

Some helper code has been started to guide your work (see tcBinExp and tcFunCall).

Here is a more complicated example of what can be run out of the box:

% stack run typecheck
let /*<Bool>*/ b = true;
let /*<[Bool]>*/ uninitializedArrayOfBools;
while (b) {
    let /*<Int>*/ len = uninitializedArrayOfBools.length;
    b = !b;
}

Program parsed successfully.

AST built without error.

Program type checks.

The typechecker can be run in one of two modes; by default, it runs the typechecker on the AST derived from the source input (whether from standard input or from a filename specified as the final command-line argument). Optionally the user may add -v (which if used with a filename, must appear before the filename) to indicate “verbose” output: if the supplied Jest code successfully typechecks, the resulting type environment is displayed. Here are more examples to illustrate the different ways the typechecker can be run. The examples are using the completed typechecker and feature the the mutually recursive even-odd example which puts a lot of different aspects of the typechecker to the test (including functions, arrays, and conditionals).


Part I: typechecking basic expressions

For this part of the assignment focus on tcExp in TypeCheck.hs.

  1. Complete type checking for all literals (integers, floating-point numbers and strings). You can test other kinds of simple assignments now as long as the the variables have been declared with explicit types:

     let /*<Bool>*/ b = true;
     let /*<Float>*/ f = 3.14;
     let /*<Int>*/ i = -907;
     let /*<String>*/ s = 'Help!';

    and since unary operators were implemented out of the box:

     let /*<Int>*/ i = 327;
     i = -(-i);
  2. Out of the box, initialized variables declared without an explicit types cause errors:

     let b = false;
    
     type mismatch in assignment: expected <any> but found Bool

    Modify matches in SemTy.hs so that if either argument is AnyTy it matches (returns true) regardless of the other type. That will allow uninitialized variables to be declared without explicit types, though not yet fully correctly:

     % stack run typecheck -- -v
     let i = 47;
    
     ... Program type checks... Type environment:
     ...
     i declared at line 1 ...: <any>

    (Later, we will have the statement checker update the type environment to reflect the specific type of the assignment.)

  3. Add a case for binary-operator expressions (A.Binary) to tcExp: determine the types of the left and right operands and then call tcBinExp (partially implemented for you). That should allow you to handle the + operator in its various forms:

     let i = 47 + 53;
     let f = 3.14 + 6.28;
     let s = 'hello ' + 'world!';

    It should also catch misuses of +:

     let x = 3 + 'hello';
     ...
     type mismatch in plus right operand: expected Int but found String
  4. Complete tcBinExp so that you can correctly check all binary operations. There are 12 other cases aside from +, but they can be grouped into four categories:

    Ast.Operator now derives the Eq type class which means cases can be combined by testing for memebership in a list as in:

     op `elem` [A.Eq, A.NotEq, A.Less, ...] = ...

    You might notice the use of | in definining tcBinExp that is Haskell’s “guard” syntax for functions. It makes it easier to define different cases for functions that correspond to if-else chains in other languages. You should rely on tcNumericMatch, tcNumericStringMatch and, of course, tcMatch.

    Explain (in README.md): why does tcNumericMatch work the way it does? Or, put another way, why is the type rule for arithmetic operations have two basic versions: one that expect two integer operands and produces an integer result and one that expects two float operands and produces a float result? Why not have one rule that simple says that each operand could be an integer or a float and therefore to allow expressions like 2 * 3.14?

    top


    Part II: typechecking basic statements

  5. Implement typechecking for do ... while and if statements. (This is meant to be easy: follow the model of the While case.) Try typechecking ulam.js and ulam-errors.js.

  6. Complete typechecking for assignments. To do this, first augment isSpecific in SemTy.hs so that it correctly returns true if the supplied type is specific (it makes no mention of AnyTy nor NullTy) or false otherwise. Now modify the assignment case in tcStmt so that it effectively has (at least) two cases. If the type of the left side is specific, then basically proceed as before: match the expected type (left type) to the actual type (right type); if the left side has unspecified type (AnyTy) and it is simply an identifier then adjust the type environment (use setTy defined in TypeState) so that the identifier is now mapped to the right-side type. In the latter case, you should verify that the right-side type is specific. Otherwise, generate a type error. For example (once basic arrays are handled):

     let x = [];
     ...
     specific type required for initial assignment at line 1 cols 1-11

    To really complete the case where an identifier is being remapped from AnyTy to a specific type, you should also update the second part of the type environment. Recall that the type environment is a pair of maps, the first being our normal notion of the type environment (mapping symbol identifiers to types), the second being a map from node identifiers corresponding to expressions to the inferred types for those expressions. It is not essential to handle this for this homework, but to do so you can simply retypecheck the left-side expression.

    Example:

     % stack run typecheck -- -v
     let x = 3;
     ...
     Program type checks.
     Type environment:
     ...
     x declared at line 1 cols 1-10: Int   <--- notice it is Int not <any>
    
     expression at line 1 col 9: Int       <--- type of RHS of assignment (3)
     expression at line 1 cols 1-10: Int   <--- type of LHS of assignment (x)

    Explain (in README.md): why is it important to update the symbol-type environment? Why not just leave the type of an unspecified identifer as AnyTy since it will match other expressions?


    Part III: typechecking function calls and return statements

  7. Complete tcFunCall so that it checks that the type of the function expression and the supplied arguments have the same arity. (For example, if f takes three arguments then the argument list passed to f should have length 3.) Assuming it does, each pair of types - the expected parameter and the type of the corresponding supplied argument should match. I recommend using zip (standard in Haskell) to combine two lists: the list of parameter types (paramTys) and the list of argument expressions (args). Then you can write a helper function which can be applied across that zipped list. tcFunCall should return the return type that corresponds to the function expression (retTy). Procedures should now typecheck properly. For example:

     console.log('Hello!');
     ...
     Program type checks.
    
     console.log(3);
     ...
     type mismatch in argument: expected String but found Int
  8. Add the function-call case to tcExp. This is intended to be easy: You can follow the model of procedure calls in tcStmt, except for the return types: it should be an error if the tcFunCall returns VoidTy. Example:

    let s = Jest.intToString(4103);
  9. Add the return-statement case to tcStmt. Ast.Return includes two pieces of information: a maybe expression (i.e. Just e if something is returned or Nothing for return;) and the identifier that corresponds to the enclosing function. Use findTy to get the type of the enclosing function. You can assume it is a function type (if not, you should use fatalError) and then match the return type of that function type to the type of the returned expression if there is one or, if not, match the expected return type to VoidTy. Try typechecking functions.js and function-errors.js.

    Explain (in README.md): Do expressions passed in as arguments to functions have to have specific types?


    Part IV: typechecking arrays and records

  10. Add the case for subscript expressions (A.Subscript) to tcExp. There are two elements of an a subscript expression: the “base” (the array) and the “index”. The base needs to be an array type of some kind. The integer needs to be an integer. The inferred type should be the element type of the array, so if the inferred type for the base is [t] then the returned type should be t. At this point you should be able to check subscripts, though they get more interesting after you can check array constructions (the next problem). Examples:

     % stack run typecheck -- -v
     let /*< [[Float]] >*/ bogusArrayOfArrayOfFloats;
     let floats = bogusArrayOfArrayOfFloats[47];
     let z = floats[3];
     let y = bogusArrayOfArrayOfFloats[47][31];
     ... Program type checks ...
     bogusArrayOfArrayOfFloats declared at line 1 cols 1-48: [[Float]]
     floats declared at line 3 cols 1-43: [Float]
     z declared at line 4 cols 1-18: Float
     y declared at line 5 cols 1-42: Float
    
     % stack run typecheck -- -v
     let /*< [[Float]] >*/ bogusArrayOfArrayOfFloats;
     let floats = bogusArrayOfArrayOfFloats[47];
     let z = floats[3];
     let /*<Float>*/ wrong0 = z[9];
     let wrong1 = floats[z];
     ... Type errors: 
     subscript on non-array at line 4 ...
     type mismatch in array index: expected Int but found Float at line 5 ...
  11. Add the case for array constructions (A.Array) to tcExp. Array constructions are lists of zero or more expressions (really tagged expressions), corresponding to Jest code like: [3, x, z*y]. The type rules for arrays require that the types of all the elements match each other (they do not have to be identical - see below). The returned type is then of the form ArrayTy t where t is the a type that matches all the types of the expressions in the list and that is at least as specific as all those types. This sounds a bit more complicated than it is. To simplify a bit, think of the expressions each having the same type, t, and the array construction having type [t]. The tricky part (it is not really that tricky) is how to try to find the type that matches the expressions. My recommendation is to write a helper function: either one that explicitly recurses over the list of expressions, or a function that can be supplied to a monadic list-folding function such as foldM. The important thing to identify: what is the base case? In other words, what is the type of the empty array? Once done, you should be able to produce results like these:

     % stack run typecheck -- -v
     let words = ['the', 'quick', 'fox'];
     let primes = [2, words.length, 5, 7, 11, 13];
     let numss = [[], primes, [1, 3, primes.length, 10], []];
    
     ... Program type checks ...
    
     words declared at line 1 cols 1-36: [String]
     primes declared at line 2 cols 1-45: [Int]
     numss declared at line 3 cols 1-56: [[Int]]
    
     stack run typecheck -- -v ~/compilers/jest/array-errors.js
     let /*< [Bool] >*/ primes = [2, 3, 5, 7];
     let words = ['the', 'quick', primes.length, 'fox'];
     ...
     Type errors found:
     type mismatch in assignment: expected [Bool] but found [Int]
     type mismatch in array elements: expected String but found Int
  12. Complete the case for record-member access (A.Access) to tcExp. What has been implemented for you already are the special cases where the accessor is a “built-in” field (length) or method (e.g., push) for strings and arrays. You need to add a case where the type of ob is of the form RecordTy ftm where ftm is the “field-type map” that maps field names (strings) to types. It is not hard to implement this, but it will help to have a specific example in mind:

     let /*< {name: String, age: Int} >*/ rec;
     let nm = rec.name;  // infer that nm's type is String
     let ag = rec.age;   // infer that ag's type is Int
     let wrong0 = rec.bogus;   // error: no such field

    So, you need to lookup the field (fd) in the field-type map of the type of the object (record) expression. You can use M.lookup which returns a (Maybe SemTy). Nothing is the last case above - no such field.

  13. Add the case for record constructions (A.Record) to tcExp. The null “literal” itself is easy - its type is NullTy. Record expressions are formed around lists of pairs of strings (field names) and tagged expressions (the values being assigned to those fields). As with array constructions you will want to write a helper function to traverse the list. In this case, you want to build up a mapping of strings to types by using M.insert. For each pair in the list, use the first part as the key and the inferred type of the second part (the tagged expression, so recursively typecheck using tcTexp) as the value to insert in the map. Examples:

     % stack run typecheck -- -v
     let person = { name: 'Tucker', age: 10 };
     let /*< { name: String, ages: [Int] } >*/ item = { name: 'Abby', ages: [4, 9] };
     let nested = { p: person, it: item};
     nested.p.name = item.name;
     ++person.age;
     ... Program type checks ...
     person declared at ...: { age: Int, name: String }
     item declared at ...: { ages: [Int], name: String }
     nested declared at ...: { it: { ages: [Int], name: String }, p: { age: Int, name: String } }
     ...
     expression at line 4 ...: { ages: [Int], name: String }
     ...
     expression at line 5 ...: { age: Int, name: String }
     expression at line 5 ...: Int
     expression at line 5 ...: Int
     expression at line 5 ...: { age: Int, name: String }
  14. Add the case for null to tcExp (easy — its type is NullTy). Complete isSpecific and matches if you have not done so before, following the guidelines from the type rules: Two types match if:

    The following should now typecheck:

     let person = { name: 'Tucker', age: 10 };
     let /*<{ name: String, item: String }>*/ gift = null;
     let nested = { p: person, g: gift};
     nested.p = null;

    But this should produce an error:

     % stack run typecheck
     let r = null;
     ...
     Type errors found:
     specific type required for initial assignment at line 1

    Explain (in README.md): What motivation do we have for bothering to distinguish between specific and nonspecific types? What sort of restrictions would we need to place on Jest if we wanted avoid including anything like AnyTy or NullTy in SemTy and how onerous (if at all) would they be on a would-be Jest programmer? One argument in favor of having nonspecific types as part of our semantic type system is that it might allow us (one day, in principle) to extend Jest to have new and more powerful type constructs. Can you think of examples of what that would mean and why it might be useful?


    Conclusion: testing your work

  15. Test your code thoroughly, not just on the examples provided, but on your own examples, too. You are welcome to add examples to the examples folder in your project before submitting. If you do so, please indicate what you have added in the README and document your Jest code.

top


Submitting your work

Submit your entire project as a zipped archive (so attach the one .zip file), but run

stack purge

before zipping it up. Name the zip file something that indicates it is yours, so something like mike-types.zip. It is essential that you modify README.md (found in the main project folder) as part of the assignment. It should include at least, your name and answers to any of the parts of the assignment that ask you to “explain” something. You should also include the status of the assignment as submitted: how much did you complete, known bugs, etc. The .md extension implies markdown format, but you can just use plain text in the file.

The body of your email should be left blank. Any specifics about your assignment you think I should require (for example, known bugs or what you did not have time to complete) should be included in proofread comments near the top of your file.

top