Compilers: Program 1

Due: Friday, February 21 at 3pm

Lexical analysis for Jest using the Alex lexer generator

Before starting to implement the scanner: Read this assignment in its entirety and the Jest reference manual, and familiarize yourself with the basics of using Alex (the Haskell lexer generator).

The most challenging aspect of the assignment will be understanding Jest and getting familiar with Alex and running it using stack (the Haskell project management and build tool) . Once you understand those two concepts, building the scanner should become a relatively straightforward task (hence, the short window between the assigned and due dates).


What you need to do

Download the starter project and unzip it. Using a terminal command line, navigate into the folder (jest-scanner) and compile it using stack:

 $ stack build

(This will run Alex and compile the other files in the project.)

You can then run the fledgling scanner as:

$ stack exec jest-scanner-exe

The scanner expects “standard input”, so upon executing it you can type some simple Jest (or any text) to see how it responds. For example:

$ stack exec jest-scanner-exe
while (true) { break; }

and it should produce output like:

while  //  While  [1:1]
(  //  LParen  [1:7]
true  //  TrueLit  [1:8]
)  //  RParen  [1:12]
{  //  LBrace  [1:14]
break  //  Break  [1:16]
;  //  Semicolon  [1:21]
}  //  RBrace  [1:23]

Use ctrl-D to exit the program - that sends the “end of file” signal to the scanner. You can also run the program using a file as input by using I/O redirection as in:

$ stack exec jest-scanner-exe < while.js

assuming while.js is a file in your current directory (folder). That should produce line-by-line output of tokens, as above, but does not require that you type ctrl-D when done.

You can also ask the program to print the tokens in one line, using the -l flag, as in:

$ stack exec jest-scanner-exe -- -l

Out of the box, the scanner cannot handle much beyond the example above. Read over the Scan.x file (in the src subfolder of the project) to see what has been implemented for you and to get a send of the organization of the scanner specification file. (You should also look at src/Token.hs, src/Unscan.x, and app/Main.hs.)

  1. Add the complete set of reserved words, operators, punctuation, and boolean and integer literals to the lexical specification. Try to be organized about where and how you add entries to the Alex grammar file. While the file generated by Alex is not intended (at least not fully intended) to be read by humans, the .x specification file itself is. You may wish to use macros to make your regular expressions more readable. Use comments where appropriate. All that is involved here is understanding which tokens are required to parse Jest. That is implied fairly clearly in the Jest reference manual (and quite explicitly in Token.hs and Unscan.hs. You should successfully be able to handle programs such as this that make use of many features of Jest, but do not mention identifiers, floating-point literals, string literals, nor comments.

  2. Add single-line (//) comments and try this example. (Once you have this working, you can begin testing for idempotence.)

  3. Add identifiers. Now you can try some more interesting examples like recursive factorial:

    /*< (Int): Int >*/
    function factorial(n) {
        if (n === 0) {
        return 1;
        } else {
        return n * factorial(n - 1);


    const N = 10;
    let /*<Int>*/ i;
    let sum = 0;
    i = 1;
    while (i <= N) {
        sum += i;
  4. Add floating point literals. Include exponential format, so you should be able to scan all of the following (and more):

  5. Add string literals, which are very much like JavaScript strings, but contain 8-bit ASCII characters (as opposed to 16-bit Unicode characters). Remember, one feature of JavaScript (preversed in Jest) is that strings can be surrounded by either double or single quotes. For now, you need not handle escaped quotes (strings with quotes in them). You should try out your scanner on this example. Later you will add escaped quotes and address a subtlety that arised from embedded escape symbols more generally (see below.)

  6. Add block (/*...*/) comments. Make sure these do not interfere with your program's ability to scan types. (At this stage do not worry about nested comments.)

  7. Strings are not quite as simple as presented above. Consider:

      "this string has a newline \n embedded in it"

    up to now, we have tokenized that string quite literally. But, in practice, we want to replace the \\n with an actual newline. Likewise for \\t, \\r\, \\\", \\\' and \\\\. We should also remove the beginning and ending quote symbols. Implement proper handling of escape sequences including escaped quotes by adding action code to the rule(s) for string-literal formation. Testing requires a slight variation on our usual idempotence method. Explain why in a comment in the relevant section of your code.

  8. Unlike in JavaScript (or in C, C++, or Java), block comments can be nested. Thus,

      /*  this /* is */ nested  */

    is legal in Jest as the whole thing is treated as a comment. There is good reason why we want this feature in Jest. Explain why in a comment in the relevant section of your code.

    In order to implement nested comments, first realize that it is not easy to compactly specify such comments as regular expressions. Why? Go back and ponder the final question on the first problem set. Explain the connection in a comment in the relevant section of your code. (You are not being asked to implement nested comments for the regular part of this assignment.)

    Challenge Problem

*. Implement nested block comments. You can test nested comments on this example. Use Alex “user states”. (Discuss with me before attempting.)


You will need Alex and Stack.


All the files you need are included in this archive.

Submitting your work

You should submit only your Alex grammar file (Scan.x). Submit it as an attachment replying to the official email announcing this assignment. Leave your file named as is (Scan.x).

The body of your email should be left blank. Any specifics about your assignment you think I should require (for example, known bugs or what you did not have time to complete) should be included in proofread comments near the top of your file.