Compilers: Program 2

Due: Friday, March 13

Syntactic analysis for Jest using parser combinators

Before starting to implement the parser: Read this assignment in its entirety and reread the Jest reference manual. Familiarize yourself with the basics using Haskell’s monadic parsing library (Text.Parsec)


Regression testing

As you add capabilities to your parser, use regression testing: rerun your parser on valid (and invalid) Jest syntax that you were previous able to parse (or identify syntax errors for) to make sure it still correctly parses (or identifies syntax errors).

Example: as a first pass on assignment parsing, the following might parse correctly (as it should):

x = 3;

and

3 = x;

might generate a syntax error - as it should! Then, after your add the ability to handle “assignment operators” to allow for:

x += 3;

you should make sure that

x = 3;

still parses correctly and that

3 = x;

still produces a syntax error.


What you need to do

Download the starter project, and try compiling and running it. As it stands, the parser is not good for much: it can only successfully parse very simple assignment statements (where the right-hand side is just a literal), break statements, simple while statements and a few other odds and ends.

  1. Sketch an initial grammar for Jest on paper. Test the starter grammar as in this transcript.

  2. Add continue statements. This a trivial warmup - follow the example of break. Now you should be able to parse:

     continue;

    Note: break and continue are only legal inside of loops. However, we are not testing for that property using our parser (we will test for it as a form of static analysis in a subsequent stage of our compiler).

  3. Add identifier expressions (you can use tIdExp) and parenthesized expressions - both should be made part of tTerm. Then you should be able to parse assignments like:

     x = y;
     z = (34);
  4. Add increment and decrement statements such as:

       ++slc;
       --thingsWeWantLessOf;

    Remember that ++ and -- correspond to their own tokens. Later, you may need to revise these rules because any expression that is valid on the left-hand side of a regular assignment statement (an lvalue) is valid as the operand following ++ or --.

  5. Add support for handling the binary operations + and * - this is largely implemented for you, you just need to “connect the dots” and make use of the Text.Parsec.Expr module (already imported) and its buildExpressionParser function. (Follow the model we used for parsing ReCalc expressions.) Now you can parse expressions like this:

      $ stack run parser -- -e
      3 + 4 * 5
    
      Expression parsed successfully.
      Unparses to:
    
      3 + (4 * 5)

    If you connected the supplied combinators as expected, you should also be able to handle division and the “not” (!) operator. So the following should parse correctly:

      a +   !   b*c

    as:

      a + ((!b)*c)

    Note, unary operations like ! and the unary (- - also known as negation) are nonassociative in our parser. (It does not have to be that way, but that is the way Haskell’s combinators work with buildExpressionParser.) So the following should produce syntax errors:

      !!b  // "not not b"

    although it would make sense to read as:

     !(!b)

    The latter (i.e., the version with parentheses) should work fine. (We will revisit the nonassociativity restriction when we add array subscripting and record-member access later.)

  6. Add binary operation rules for all remaining arithmetic, logic and comparison operators. Follow the precedence rules from the Jest reference manual.

    At this point you can parse crazy expressions like:

     a + b / c * d < e || ! f && g - h % i > j &&  k - - m + n !== q

    as:

     ((a + ((b / c) * d)) < e) || (((! f) && ((g - (h % i)) > j)) && (((k - (- m)) + n) !== q))

    Note that the comparison operators (<, etc.) do not associate in Jest, so this is a syntax error:

     a < b < c
  7. Add production rules for do-while statements - they are quite similar to while statements. So the following should now parse:

     do {
         sum = sum + i;
         ++i;
     } while (i <= n);

    and so should more nested loops like:

     do {
         i = 0;
         while (i < j) {
             x = x / 2;
             ++i;
         }
         ++j;
     } while (j !== n);
  8. Add assignment-operator statements. Examples:

     product *= n; 
     long_string += short_string;
     thingsWeWantLessOf -= 7;

    Again, anything that can be on the left-hand side of a regular assignment can be on the left-hand side here. This might pose a challenge for your parser - remember to run regression tests! (If it does, consider using the try combinator.)

  9. Add if statements. They are much like while statements except the else branch is optional. (I suggest using the optionMaybe combinator to help.)

  10. Add return statements. Whether an expression is returned is optional.

  11. Add atomic type annotations and enable initialized declarations (by modifying pProgram). This should allow for an optional type as in these examples:

     let /*<Bool>*/ this_assignment_is_fun = true;
     let types_required_here = false;
     const /*<Int>*/ NUMBER_OF_OWLS = 97;
     let /*<String>*/ s = 'hello';

    Note: basic Jest types (Int, etc.) are lexically just identifiers. This has two implications: first, just what it says (there are no separate tokens for type names such as “Bool”, “Int”, etc.). Second, syntactically any identifier is valid where a type identifier is expected. Invalid types will be identified in a subsequent phase of the compiler. So:

     let /*<bogus_type>*/ x = 33;

    is legal Jest syntax (though not valid in terms of types).

  12. Add uninitialized variable declarations of the form:

    let /*<Int*>/ a, b, c, d;

    (Careful to make sure to run regression tests!)

  13. Modify tBlock to include local declarations and add compound statements. Make sure that declarations now work in while, if and do-while statements.

  14. Add function definitions. They are legal at the global level, but not inside other blocks.

  15. Add function-call expressions. Surprising though it may seem, they can be thought of as postfix operations:

     <some function expression> ( <zero or more arguments>)

    as in:

     coolFun(a, 99)

more challenging tasks

Try to implement as many of these as you can.

top


Helpful resources about monadic parsing and the Parsec library

top


Submitting your work

You should submit only your parser file (Parse.hs). Submit it as an attachment replying to the official email announcing this assignment.

The body of your email should be left blank. Any specifics about your assignment you think I should require (for example, known bugs or what you did not have time to complete) should be included in proofread comments near the top of your file.

top