Introduction to Computer Programming

Program 3: Cryptology (and strings)

Due date: Thursday, October 11 at 5pm


Instructions


Background

The primary goal of this assignment is to use some very rudimentary aspects of Python strings to implement monoalphabetic substitution ciphers: how to encrypt and decrypt messages using them (and, to a limited extent, how to cryptanalyze them using frequency analysis). To better understand these principles you will need to read pp. 1-25 of Simon Singh's The Code Book. (The excerpt is available via e-reserve on MySLC.)

We represent keys as 26-letter strings of uppercase letters where each letter appears exactly once. (In other words, a key can be thought of as a permutation of the uppercase alphabet.) Examples include:

To manually use a key to encrypt a string, we can line up the alphabet with the key, like this:

abcdefghijklmnopqrstuvwxyz
COMPUTERSINABDFGHJKLQVWXYZ

then to encrypt "headache" using that key, we find the cipher letter corresponding to each plain letter so we get "RUCPCMRU" like this:

headache
RUCPCMRU

Guidelines

Unless explicitly indicated otherwise:


To begin

Download the three files for this assignment: the starter file, the solution file, and the main REPL. Place them all in the same folder. The first is the starter file where all your code will go. The second is a solution version of that file. The third is the REPL (and a supporting function or two) which has been written for you and requires no action on your part - it calls the code in the your file.

If the files are in the same folder and you run the REPL file out of the box, it will start the top-level function crypto_repl which will give you a basic menu of options for creating keys, encrypting text, decrypting text, and performing frequency analysis. Several of the features have been implemented for you. Experiment with the REPL before starting to write code.

In general, you will want to work in order through the problems. However, you can always fall back on the hw3x version of a solution so that being stuck on one problem will not prevent you from trying others elsewhere in the assignment. For example, if you want to temporarily have occurrences work using the solution version, then make sure your hw3.py has an import hw3x command near the top of the file, and you can use:

def occurrences(s, ch):
    return hw3x.occurrences(s, ch)

The assignment itself

  1. Complete the function occurrences(s, ch) so that it returns the number of times that character ch occurs in string s. ch can be any character (it need not be a letter). This is a case sensitive function. Use a for loop over s. Examples:

    >>> occurrences('hello', 'l')
    2
    >>> occurrences('To be or not to be.', ' ')
    5
    >>> occurrences('Case Sensitive', 's')
    2
    >>> occurrences('Case Sensitive', 'S')
    1
  2. Complete are_anagrams(s, t) so that it returns True if t is a rearrangement of s (and False otherwise). Call occurrences and use a while loop. Examples:

    >>> are_anagrams('evil', 'live')
    True
    >>> are_anagrams('Kroy', 'York')  # case sensitive!
    False
    >>> are_anagrams('cool?', '?oclo')
    True

    Once the first two problems are working, the supplied is_valid_key should work.

  3. Complete scramble(s) so that it returns a new string that consists of the same characters as s, but in a random order. Use a for loop, random.randrange and string slicing. Examples might include:

    In []: scramble('evil')
    Out[]: 'veil'
    In []: scramble('Is now the time?')
    Out[]: 'ho nesmt eIw i?t'
    In []: are_anagrams(scramble('whatever'), 'whatever')
    Out[]: True

    Once complete, the supplied function make_random_key() should work so it returns a string consisting of a randomized arrangement of all 26 uppercase letters. For example:

    In []: make_random_key()
    Out[]: 'RNTSAJKBXEGIWFLOVMPDZQUHCY'

    You can test this out using the the r)andom key option from the k)ey menu of the REPL.

  4. Examine encode(upletter) - it has been completed for you. Complete decode(code_number) so that it returns the uppercase letter corresponding to code_number where 0 is 'A', 1 is 'B' and so on through 25 being 'Z'. You can assume code_number is not negative and less than 26. You can (and should) solve this in a single line. Then complete make_shift_key(shift) so that it returns a key string (an arrangement of all 26 uppercase letters) that corresponds to shifting the alphabet forward by shift places. You can again assume shift is not negative and that it is less than 26. Use a for loop, call decode and make user of the % operator. Example:

    >>> make_shift_key(3)
    'DEFGHIJKLMNOPQRSTUVWXYZABC'

    You can test this out using the the s)hift key option from the k)ey menu of the REPL.

  5. Complete encrypt(key, plaintext) so that it returns the encrypted version of plaintext using key as a monoalphabetic substitution key. It should only apply the substitution to uppercase letters; all other symbols should remain as is. Call encode. Examples:

    >>> encrypt('DEFGHIJKLMNOPQRSTUVWXYZABC', 'NOW IS THE TIME!')
    'QRZ LV WKH WLPH!'
    >>> encrypt('DEFGHIJKLMNOPQRSTUVWXYZABC', 'Careful')
    'Fareful'

    You can test this out using the the e)ncrypt option from the main menu of the REPL (assuming that both some text and a key have been specified).

  6. Complete find_first(s, ch) so that it returns the index of string s where character ch first occurs. Returns None if ch does not occur in s. Examples:

    >>> find_first('hello', 'l')
    2
    >>> find_first('Computer Science', 'c')   # case sensitive!
    10
    >>> find_first('Computer Science', 'C')
    0
    >>> find_first('hello', 'x') is None
    True
  7. Complete invert_key(key) so that it returns a new key that is the inverse of key. Call find_first and decode. Examples:

    >>> invert_key('DEFGHIJKLMNOPQRSTUVWXYZABC')
    'XYZABCDEFGHIJKLMNOPQRSTUVW'
    >>> invert_key('RNTSAJKBXEGIWFLOVMPDZQUHCY')
    'EHYTJNKXLFGORBPSVADCWQMIZU'

    (Hint: make use of string.ascii_uppercase.)

    By completing invert_key, the supplied decrypt function and the decryption option of the REPL should now work.

  8. Complete make_phrase_key(phrase) so that it returns a key string built from phrase. Case is ignored in forming the key; all other (i.e., nonalphabetic) symbols are ignored. If the key would not be a full 26 symbols, the remainder of the unused alphabet, in order, should be added to the string. Examples:

    >>> make_phrase_key('')
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    >>> make_phrase_key('RNTSAJKBXEGIWFLOVMPDZQUHCY')
    'RNTSAJKBXEGIWFLOVMPDZQUHCY'
    >>> make_phrase_key('The quick brown fox jumps over the lazy dog.')
    'THEQUICKBROWNFXJMPSVLAZYDG'
    >>> make_phrase_key('Computer science')
    'COMPUTERSINABDFGHJKLQVWXYZ'

    You can test this out using the the p)hrase option from the k)ey menu of the REPL.

  9. Complete remove(s, ch) so that it returns a new string that is like s but without any occurrences of character ch. Examples:

    >>> remove('hello', 'l')
    'heo'
    >>> remove('Computer Science', 'c')   # case sensitive!
    'Computer Siene'
  10. Complete most_frequent(s) so that it returns the character in s that occurs the most times. If more than one character occur that many times, it returns the first one that occurs. It should return None if s is empty. Call occurrences. Examples:

    >>> most_frequent('') is None
    True
    >>> most_frequent('hello')
    'l'
    >>> most_frequent('abba')
    'a'
    >>> most_frequent('computer science')
    'c'
    >>> most_frequent('Computer Science')   # case sensitive!
    'e'

Challenge problem

Complete visualize_frequencies(s, scale, symbol) so that it returns a string that represents a histogram of the frequencies of the symbols occurring in s. The returned string consists of one line per distinct character in s. Each line in the returned string should consists of a character from s and zero or more of symbol. The number of times symbol occurs on each line indicates the relative frequency of that character scaled according to the scale parameter. The most commonly occurring character will have scale symbols on its line. The lines of the string should be in order from the most commonly occurring character to the least. You should call remove and most_frequent. An example:

In []: print(visualize_frequencies('abBBBBBbbccddddee', 20, '#'))
B: ####################
d: ################
b: ############
c: ########
e: ########
a: ####

By completing visualize_frequencies, the frequency analysis feature of the crypto REPL should now work.