# Introduction to Computer Programming

## Program 5: Monkey Shakespeare

### Due date: Tuesday, November 21 at 5pm

#### Instructions

• Before writing any code, read this entire document (and the starter file) thoroughly!
• Unless otherwise indicated, use only the Python operations and commands we have been using thus far in class and lab.
• You will need to download two files for this assignment: the starter file and the REPL.
• Write all your code in the supplied starter file. Work in areas indicated by the comments marked with `...` and once you have completed work on that area, remove those `...` comments.
• When ready to submit your work, rename your `hw5.py` file as `hw5_`yourusername`.py` where yourusername is your `@gm.slc.edu` username. Submit by replying to the email from the instructor announcing the assignment (it should have the subject: “ICP17: Programming Assignment 5” and attach the one relevant file. Do not attach any other files. Do not expect the instructor to read anything in the body of the message.
• You are welcome to download a runnable solution module called `hw5x`.

#### Range tables

For the purpose of this assignment, we define a range table to be a list of triples of the form `(lo, hi, x)` where `lo` and `hi` are nonnegative integers, `lo` is less than `hi` - so `(lo, hi)` mark a nonempty Python range and the the ranges in the list are consecutive meaning that the higher part of one triple becomes the lower part of next triple and the first lower element is 0. The idea is that the list represents the frequencies of the items (`x`) that occur int he table - the larger the range, the more likely the item.

### The assignment itself

1. Download the two files: the starter file and the REPL. The first is the starter file where all your code will go. The second is the REPL which has been written for you and requires no action on your part - it calls the code in the other file. Out of the box, if the two files are in the same folder and you run the REPL file it will start the top-level function `markov_repl` which will give you a basic menu of options for reading and analyzing a text file and for generating Markov sentences based on those analyses.

2. Complete the function `normalize_char(c)` so that if `c` is whitespace, a letter, or an apostrophe (`'`) it is returned; if it a sentence ending form of punctuation (`!`, `?`, `.`) is replaced by a "normalized sentence ending" - which in this case should be period surround by a space on either side; all other characters should be replaced by a single space. Examples:

``````>>> normalize_char('x')
'x'
>>> normalize_char('\$')
' '
>>> normalize_char("'")
"'"
>>> normalize_char('?')
' . '``````
3. Complete `replace_by_fun(s, f)` so that it returns a new string where every character is replaced by the result of calling `f` on s. You can assume `s` is a string and `f` is a function that takes as input a character (i.e., a string of length 1) and returns a string. Examples:

``````>>> replace_by_fun('Hello!', lambda x:x)
'Hello!'
>>> replace_by_fun('lions, tigers, bears - oh my!', normalize_char)
'lions  tigers  bears   oh my . '``````
4. Complete `parse_into_words(s)` so that it returns a list of lowercase "words" in s where s has been normalized (by using the two previous problems). I recommend using `.lower`, `.strip`, and `.split`. Example:

``````>>> parse_into_words('To be -- or not to be. That is the question?')
['to', 'be', 'or', 'not', 'to', 'be', '.', 'that', 'is', 'the', 'question', '.']``````
5. Complete `find_next(seq, x, start=0)` so that it returns the position in sequence (i.e., lists, tuple or string) `seq` where the next occurrence of `x` occurs, starting a position `start`. If `x` does not occur from any point between `start` and the end of`seq`, it should return `None`. Examples:

``````>>> words = ['to', 'be', 'or', 'not', 'to', 'be', '.', 'that', 'is', 'the', 'question', '.']
>>> find_next(words, 'be')
1
>>> find_next(words, 'be', 2)
5
>>> find_next(words, 'be', 8) is None
True
>>> find_next(words, '.')
6
>>> find_next(words, '.', 7)
11``````
6. Complete `build_firsts(words)` so that it returns a bag (i.e., a dictionary) that consists of the number of times that pairs of words occur as the first two words of sentences in `words` (which can be assumed to be a list of words). The beginning of a sentence is either the first words in the list, or the words immediately following `'.'` in the list. Example:

``````>>> words = ['to', 'be', 'or', '.', 'to', 'be', '.', 'that', 'is', 'the', 'question', '.']
>>> build_firsts(words)
{('to', 'be'): 2, '('that', 'is'): 1}``````
7. Complete `build_range_table(bag)` so it returns a range table derived from `bag`. Example:

``````>>> words = ['to', 'be', 'or', '.', 'to', 'be', '.', 'that', 'is', 'the', 'question', '.']
>>> bag = build_firsts(words)
>>> build_range_table(bag)
[(0, 2, ('to', 'be')), (2, 3, ('that', 'is'))]``````
8. Complete `find_match(items, x, matches)` so that it returns the first position in `items` where `matches` when called on an item in `items` and `x` returns True. This is just like linear search, but depends on the function `matches` rather than equality. If no matches are found, it returns `None`. Assume that `matcher` is defined as:

``````def matcher(triple, x):
return triple[1] > x``````

then here are some examples:

``````>>> triples = [(0, 2, ('to', 'be')), (2, 3, ('that', 'is'))]
>>> find_match(triples, 1, matcher)
0
>>> find_match(triples, 2, matcher)
1
>>> find_match(triples, 3, matcher) is None
True``````

9. Complete `choose_from_rt(rt)` so that it randomly selects an item from range table `rt` according to the weights implied by the ranges. You can examine the upper part of the range of the last item in the table to determine over what range to select a random number. You can assume the lower part of the first range is 0. Call `find_match`. You can test your code by using the REPL and choose `f`irst words option.

10. Complete `build_all_bags(words)` so that it returns a dictionary that maps pairs of words to bags that represent the frequency of the words that follow each pair. So, for instance if "to be" is a pair of words that exists consecutively in `words`, twice followed by "or" and once followed by "that" then the its entry in the the result of this function might look like:

``{..., ('to', 'be'): {'or': 2, 'that':1}, ... }``
11. Complete `build_all_rts(bags)` so that assuming `bags` is a dictionary such that would be generated by the previous problem, it returns a dictionary that maps pairs of words to the range table correspond to the bag in `bags`.

12. Complete `markov_sentence(first_rt, all_rt, max_length=50)` so that it returns a Markov-generated sentence consisting of a pair of words from `first_rt` (the range table for first words) and words in the range tables in `all_rt`. Test your code using the REPL.

#### Extra credit

• Modify `find_match` so that it uses a form of binary search instead of linear search.

• Generalize your code so that it works not only on pairs of words at a time, but more generally on n-tuples. If n is 1, the smaller n, the more random the generated "sentences"; the larger n, the more the sentences will conform to the original text.