Homework 5 - Fall 2023

Introduction to Computer Science: The Way of the Program — Homework 5

Due by 11:59pm Tuesday, October 10

Reading

Study the code examples we discussed in class this week.
Read Chapter 6 of Python Programming for next week.

Programming Exercises — Part 1

Include the following programs from Lab 5 in your assign5.py file:
- wordlen()
- linenums()
- airports()
- people()
- movie() — the version that includes the average rating
- viewers() — the version that includes the youngest and oldest ages
Below is a version of countletters() that reports the letter frequencies of all letters from A to Z in a file (this was exercise 3 from Lab 5):
```
def countletters():
    filename = input("Enter filename: ")
    f = open(filename, "r")
    allChars = f.read()
    f.close()
    allChars = allChars.upper()  # convert to uppercase
    alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    for letter in alphabet:
        num = allChars.count(letter)
        print("{:6} {}'s".format(num, letter))
```
Modify this program so that it reports the most frequent letter (and its letter count) at the end, as shown below. Hint: build up a list of the letter counts in the for-loop, and then use Python's max(list) function to find the largest value in the final list. To determine the corresponding letter, you can use the list method L.index(x), which returns the position of an element x in the list L.
```
>>> countletters()
Enter filename: haunting.txt
    31 A's
     5 B's
     6 C's
     ...
    12 Y's
     0 Z's
Most frequent letter E occurs 42 times
```
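If the max/index idea in the hint is new to you, try it on its own first. The sketch below uses a short made-up list of counts, not the real counts from haunting.txt. It assumes the letters string and the counts list line up position by position:

```python
# Made-up counts for the letters A, B, C, D, E (illustration only)
letters = "ABCDE"
counts = [31, 5, 6, 12, 42]

biggest = max(counts)          # largest value in the list
pos = counts.index(biggest)    # position of that value in the list
print("Most frequent letter {} occurs {} times".format(letters[pos], biggest))
# prints: Most frequent letter E occurs 42 times
```

If two letters tie for the highest count, index() returns the first matching position, so the letter that comes earlier in the alphabet is reported.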

Reading Data from Web Pages

The next few exercises involve reading information from web pages. IMPORTANT: If you are using a Mac, you will first need to run the following command (just once) on your computer: Applications → Python 3.11 → Install Certificates.command. If you have a different version of Python installed, look in the folder for that version instead of Python 3.11. If you are using Windows, you don't need to run this command at all.

To open a web page via a URL address, we use the function urlopen from the urllib.request module, instead of the usual open function for files. Thus you will need to put import urllib.request at the top of your code. When you read information from a web page using the read() method, you get back a "bytes" object instead of a string. This object looks a lot like a string, but to use it in your program, you must first convert it to a string by calling its decode() method, which will return a normal string. For example, to read in all of the characters from a web page, you would do this:

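```
import urllib.request

u = urllib.request.urlopen("http://....some URL address....")  # open the web page
rawBytes = u.read()            # read the whole page as a bytes object
u.close()
allChars = rawBytes.decode()   # convert the bytes to a normal string
```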
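You can experiment with bytes and decode() without going online. One way to do this is with io.BytesIO from the standard library, which behaves like an open web page whose read() method returns bytes. This substitution is for illustration only and is not part of the assignment:

```python
import io

# io.BytesIO stands in for the object returned by urlopen (illustration only)
u = io.BytesIO(b"<location>Test Airport</location>")
rawBytes = u.read()
u.close()

print(type(rawBytes))               # <class 'bytes'>
allChars = rawBytes.decode()        # bytes -> str (UTF-8 by default)
print(type(allChars))               # <class 'str'>
print(allChars.find("<location>"))  # 0
```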

If you want to read in all of the lines from a web page, you can use the readlines() method, but this will give you back a list of bytes objects, instead of a list of strings, so you will need to call decode() for each individual line. For example, to loop through each line, you could do something like this:

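```
import urllib.request

u = urllib.request.urlopen("http://....some URL address....")
rawLines = u.readlines()    # a list of bytes objects, one per line
u.close()
for line in rawLines:
    line = line.decode()    # convert this line to a normal string
    ...
```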
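The same offline stand-in works for readlines(). In this minimal sketch, io.BytesIO holds two made-up lines. Each line comes back as a separate bytes object and is decoded one at a time:

```python
import io

# Stand-in for a web page with two lines (illustration only)
u = io.BytesIO(b"first line\nsecond line\n")
rawLines = u.readlines()    # list of bytes objects, one per line
u.close()

lines = []
for line in rawLines:
    line = line.decode()          # convert each bytes line to a str
    lines.append(line.strip())    # drop the trailing newline

print(lines)  # ['first line', 'second line']
```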

Programming Exercises — Part 2

Information about current weather conditions at John F. Kennedy International Airport in New York City (airport code JFK) is available on the web at the URL http://w1.weather.gov/xml/current_obs/KJFK.xml. Likewise, weather conditions in Los Angeles (airport code LAX) are available at http://w1.weather.gov/xml/current_obs/KLAX.xml. In general, we can retrieve weather information from most U.S. airports by using the URL string http://w1.weather.gov/xml/current_obs/K---.xml with the --- replaced by the 3-letter airport code. The extra K in front of the airport code must always be included. (Airports in Alaska and Hawaii use P in place of K, and Canadian airports use C.)

Opening the URL with urllib.request.urlopen, reading in the contents with read, and converting the contents to a string using decode will yield a complicated-looking string of XML code. This string contains all of the relevant weather information, with different pieces of information surrounded by different XML tags. Specifically, a substring describing the airport location is surrounded by the tags <location> and </location>, and the current temperature information is surrounded by the tags <temperature_string> and </temperature_string>. You can use the find method to locate these tags within the XML string and then extract the relevant information between them.
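The find-based extraction can be written as a small helper. The function name between() and the XML fragment below are made up for illustration; the real weather page contains many more tags:

```python
def between(text, startTag, endTag):
    """Return the text between startTag and endTag, or "" if either tag is missing."""
    start = text.find(startTag)
    if start == -1:
        return ""
    start = start + len(startTag)      # skip past the opening tag itself
    end = text.find(endTag, start)     # search only after the opening tag
    if end == -1:
        return ""
    return text[start:end]

# A tiny made-up XML fragment (not real weather data)
xml = "<location>Somewhere, XX</location><temperature_string>50.0 F (10.0 C)</temperature_string>"
print(between(xml, "<location>", "</location>"))                      # Somewhere, XX
print(between(xml, "<temperature_string>", "</temperature_string>"))  # 50.0 F (10.0 C)
```

Passing a start position as the second argument to find makes the search for the closing tag begin after the opening tag.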

Write a program called currentTemp() that asks the user for a standard 3-letter airport code, constructs the appropriate URL string for that airport, opens the web page and reads in the XML string, and then extracts the location and temperature information from the XML string, printing it out as shown in the examples below.
```
>>> currentTemp()
Enter a 3-letter airport code: JFK
Current temperature at New York, Kennedy International Airport, NY is 70.0 F (21.1 C)

>>> currentTemp()
Enter a 3-letter airport code: LAX
Current temperature at Los Angeles, Los Angeles International Airport, CA is 61.0 F (16.1 C)

>>> currentTemp()
Enter a 3-letter airport code: BMG
Current temperature at Bloomington, Monroe County Airport, IN is 76.0 F (24.4 C)
```
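Building the URL is plain string concatenation. The minimal sketch below uses a hypothetical helper named weatherURL() and the K prefix that applies to most U.S. airports. Calling upper() so that lowercase input also works is an extra convenience and is not required by the assignment:

```python
def weatherURL(code):
    """Build the current-observations URL for a 3-letter U.S. airport code."""
    # K is the prefix for most U.S. airports; Alaska/Hawaii (P) and Canada (C) differ.
    return "http://w1.weather.gov/xml/current_obs/K" + code.upper() + ".xml"

print(weatherURL("jfk"))  # http://w1.weather.gov/xml/current_obs/KJFK.xml
```

You can then pass the resulting string to urllib.request.urlopen, as in the earlier examples.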

A summary of all of the earthquakes that have occurred in the world during the past 24 hours is available online from the U.S. Geological Survey website at http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv. Each line contains information about a particular seismic event somewhere in the world, organized into 22 comma-separated fields. For example:

2020-10-09T23:05:37.531Z,57.5999,-155.146,7.4,3.3,ml,,,,0.72,ak,ak020d0etbxp,2020-10-09T23:09:46.156Z,"41 km W of Karluk, Alaska",earthquake,,0.2,,,automatic,ak,ak

The relevant fields for our purposes are the following:

| Field | Interpretation | Example |
|---|---|---|
| 0 | Combined date and time | `2020-10-09T23:05:37.531Z` |
| 1 | Latitude (degrees) | `57.5999` |
| 2 | Longitude (degrees) | `-155.146` |
| 3 | Depth (kilometers) | `7.4` |
| 4 | Magnitude | `3.3` |
| 13 | Place | `"41 km W of Karluk, Alaska"` |
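The place field itself contains a comma, so splitting the sample line on commas produces 23 pieces rather than 22. This shifts every field after the place by one position. Python's standard csv module understands quoted fields and is one alternative to the quote-based approach described below. This is a substitution for illustration only; the assignment does not require it:

```python
import csv

line = ('2020-10-09T23:05:37.531Z,57.5999,-155.146,7.4,3.3,ml,,,,0.72,ak,'
        'ak020d0etbxp,2020-10-09T23:09:46.156Z,"41 km W of Karluk, Alaska",'
        'earthquake,,0.2,,,automatic,ak,ak')

naive = line.split(",")
print(len(naive))                   # 23: the comma inside the place name adds a piece

fields = next(csv.reader([line]))   # csv.reader accepts any iterable of lines
print(len(fields))                  # 22
print(fields[4], fields[13])        # 3.3 41 km W of Karluk, Alaska
```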

The combined date/time field is encoded as year-month-dayThour:minute:secondZ, with a single T separating the date and time portions of the field. The trailing Z indicates that the time is Universal Time (UTC), which is four hours ahead of U.S. Eastern Daylight Time (EDT). For example, the date/time string above specifies 23:05 UTC on October 9, 2020. Notice also that the location of the earthquake is surrounded by double quote marks, which makes extracting the location information easy.

Write a program called latestquake() that opens the earthquake page, reads in the information about the most recent earthquake, and prints it out exactly as shown below, with the location followed by the time and date. Hint: the most recent earthquake is always on the second line of the page (the first line contains the field names and can be skipped), so there is no need to use a for-loop for this problem.

```
A magnitude 3.3 earthquake occurred 41 km W of Karluk, Alaska at 23:05 UTC on 2020-10-09
```
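The sketch below is one way to produce this output from the sample line, following the double-quote hint above. The helper name formatQuake() is hypothetical. The sketch assumes the place is the only double-quoted field on the line, as in the example:

```python
def formatQuake(line):
    """Format one line of the USGS CSV feed in the latestquake() style.

    Assumes the place field is the only double-quoted field on the line.
    """
    pieces = line.split('"')            # [before the place, place, after the place]
    place = pieces[1]
    fields = pieces[0].split(",")       # fields 0-12 (plus a trailing empty string)
    date, clock = fields[0].split("T")  # '2020-10-09' and '23:05:37.531Z'
    hhmm = clock[:5]                    # keep hours:minutes only
    mag = fields[4]
    return "A magnitude {} earthquake occurred {} at {} UTC on {}".format(mag, place, hhmm, date)

line = ('2020-10-09T23:05:37.531Z,57.5999,-155.146,7.4,3.3,ml,,,,0.72,ak,'
        'ak020d0etbxp,2020-10-09T23:09:46.156Z,"41 km W of Karluk, Alaska",'
        'earthquake,,0.2,,,automatic,ak,ak')
print(formatQuake(line))
# prints: A magnitude 3.3 earthquake occurred 41 km W of Karluk, Alaska at 23:05 UTC on 2020-10-09
```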

NOTE: if you have problems opening the earthquake URL on your computer with urllib.request.urlopen, you can use the text file quakeData.txt as your data source instead. This file is an ordinary text file that contains a "snapshot" of the earthquake data from the web page, downloaded on October 6, 2023. However, if you get a "UnicodeDecodeError" when reading data from this file, you may need to open it using a different character encoding, like this:

```
f = open("quakeData.txt", "r", encoding="utf-8")
```
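The skip-the-header idea from the hint can be written so that it works with either data source. The minimal sketch below uses a hypothetical helper named mostRecentLine(). It substitutes io.StringIO with made-up data for quakeData.txt:

```python
import io

def mostRecentLine(f):
    """Return the second line of an open file or web page, as a stripped str."""
    f.readline()              # skip the header line with the field names
    line = f.readline()       # the most recent earthquake
    if isinstance(line, bytes):
        line = line.decode()  # urlopen gives bytes; open() gives str
    return line.strip()

# io.StringIO stands in for quakeData.txt here (made-up two-line sample)
sample = io.StringIO("time,latitude,longitude\n2020-10-09T23:05:37.531Z,57.5999,-155.146\n")
print(mostRecentLine(sample))  # 2020-10-09T23:05:37.531Z,57.5999,-155.146
```

With the real file, you would write `with open("quakeData.txt", "r", encoding="utf-8") as f:` and call mostRecentLine(f) inside the with block. A urlopen response should also work, since it provides readline() as well and returns bytes.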

EXTRA CREDIT PROBLEMS (OPTIONAL)

You should finish the other problems first before working on these.

Write a program called maxquake() that prints out the largest earthquake that has occurred in the past 24 hours, in the same format as latestquake(). Hint: use four variables called magList, placeList, timeList, and dateList, all initialized to empty lists []. As you loop through the data for each earthquake, add the earthquake's information to each of the lists "in parallel". You can then use Python's max(list) function to find the largest magnitude in magList, and from there retrieve the information from the other lists at the corresponding positions. Don't forget to convert the earthquake magnitudes from strings to numbers before storing them in magList!
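Here is a minimal sketch of the parallel-list idea. It uses made-up magnitudes and places instead of real CSV data, and shows only two of the four lists. The last two lines show why the conversion to a number matters: compared as strings, "-1.2" counts as larger than "-0.5":

```python
# Made-up (magnitude, place) pairs standing in for parsed CSV lines
data = [("2.1", "Place A"), ("4.7", "Place B"), ("0.8", "Place C")]

magList = []
placeList = []
for mag, place in data:
    magList.append(float(mag))   # convert to a number before storing
    placeList.append(place)      # stored at the same position as its magnitude

biggest = max(magList)
i = magList.index(biggest)
print(biggest, placeList[i])     # 4.7 Place B

print(max(["-1.2", "-0.5"]))                # -1.2  (string comparison)
print(max([float("-1.2"), float("-0.5")]))  # -0.5  (numeric comparison)
```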
Write a program called plotquakes() that graphically plots the locations of all earthquakes that have occurred in the past 24 hours on an 800 × 475 pixel map of the world, available as the GIF image file: worldmap.gif. Create an 800 × 475 graphics window and use setCoords to set the x-coordinates of the window to the range -180 to +180 (longitude) and the y-coordinates to the range -90 to +90 (latitude). Then create an Image object from the worldmap.gif file and draw it centered in the window at (0, 0). The latitude and longitude values for each earthquake are given in fields 1 and 2, respectively. Important note: the latitude value corresponds to the y-coordinate in the graphics window, while the longitude value corresponds to the x-coordinate. Draw a small colored circle for each earthquake. If you prefer, you can choose the color (or the size) of each circle based on the magnitude of the earthquake, but this is not required. Your program's output might look something like this:
Write a program called letterhist() that asks the user for the name of a text file and then counts the number of occurrences of each letter of the alphabet in the file (ignoring upper/lower case). The program should display the results graphically as a frequency histogram showing the total number of A's, B's, C's, etc. The letter corresponding to a bar should appear at the bottom, and the actual letter count should appear just above the top of the bar (use a smaller font size for this). You should also include some blank space around the border of the histogram to improve readability. For example, a histogram created from the file haunting.txt is shown below:

Once your program works for haunting.txt, try it out on Alice's Adventures in Wonderland (alice.txt), Huckleberry Finn (huckfinn.txt), and Moby Dick (moby.txt). Make sure that it draws the histograms for these larger files correctly. If it doesn't, that means that you've built some implicit assumptions about the expected size of the input into your graphics code, so you'll need to go back and make the code more general.
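One way to avoid building in assumptions about the input size is to scale every bar relative to the largest count. The minimal sketch below uses a hypothetical helper named barHeights(), which returns bar heights in whatever units your window uses:

```python
def barHeights(counts, maxHeight):
    """Scale letter counts so the tallest bar is exactly maxHeight units high."""
    biggest = max(counts)
    if biggest == 0:                 # no letters at all: every bar has height 0
        return [0] * len(counts)
    return [c / biggest * maxHeight for c in counts]

print(barHeights([10, 40, 20, 0], 200))  # [50.0, 200.0, 100.0, 0.0]
```

Another general approach is to base the window's y-coordinate range (via setCoords) on the largest count. That way the same drawing code works for both small and large files.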

How much do the overall shapes of the histograms vary for these three files? How well do they match the distribution pattern for haunting.txt? Although these texts are by different authors, they're all in English, and English has a characteristic average distribution of letter frequencies. The larger your sample size, the more your histogram will approach the average distribution for English. We can use this idea to easily break simple encryption schemes that are based on shifting all letters by the same amount in the alphabet in a circular fashion (like the substEncrypt program we wrote in class). For example, by comparing the histograms for the encrypted file mystery.txt and Moby Dick, can you figure out how to decode the file? If so, do it!
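Suppose the encryption is a simple circular shift of every letter by the same amount (a Caesar cipher; the substEncrypt program from class may differ in details). Then decoding is just shifting back. The sketch below also guesses the shift by assuming that the most frequent letter in the encrypted text stands for E. This is only a heuristic and works best on long texts. Both function names are hypothetical:

```python
def shiftLetters(text, k):
    """Shift every letter k places around the alphabet; other characters are unchanged."""
    result = []
    for ch in text:
        if "A" <= ch <= "Z":
            result.append(chr((ord(ch) - ord("A") + k) % 26 + ord("A")))
        elif "a" <= ch <= "z":
            result.append(chr((ord(ch) - ord("a") + k) % 26 + ord("a")))
        else:
            result.append(ch)
    return "".join(result)

def guessShift(text):
    """Guess the shift by assuming the most frequent letter stands for E."""
    upper = text.upper()
    counts = [upper.count(letter) for letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"]
    top = counts.index(max(counts))   # 0 for A, 1 for B, ...
    return (top - 4) % 26             # 4 is the position of E

secret = shiftLetters("meet the beekeeper", 3)
print(secret)                                     # phhw wkh ehhnhhshu
print(shiftLetters(secret, -guessShift(secret)))  # meet the beekeeper
```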

Turning in Your Homework

Save all of your program definitions in a single Python file called assign5.py. Make sure to include your name and the assignment number in a comment at the top of your file. Submit your file electronically using the Homework Upload Site. Please DO NOT email your file to me.
If you have questions about anything, don't hesitate to ask!