py-sqlite

Databases

Programming Assignment:

Python-SQL interaction

Due date: Friday, March 10 at 5pm

Instructions

Read this entire document thoroughly.
Work on your own. (Or if you clear it with me at the outset, you may work with one partner.)
Choose one (and only one!) of the two problems: family trees or flights.
Design your project in detail before writing any code.
Document your code and make it readable and consistent.
Make your code robust.
Thoroughly test your code.
Include a readme file that contains an overview of your project, how to run it, and examples of the kinds of output it produces.
Submit one zipped archive of a folder consisting of three or more files: the readme, the SQLite database, and however many Python files you wrote to make your project run.
When ready to submit your work, reply to the email officially announcing the assignment and attach the one zipped file.

Fun with family trees

Create a family-tree database schema with at least two tables such as these:

CREATE TABLE People (
    PRIMARY KEY(id),
    id             INTEGER NOT NULL,
    first_name     TEXT    NOT NULL,
    last_name      TEXT,
    year_of_birth  INTEGER
)
CREATE TABLE ParentChild (
    PRIMARY KEY(parent, child),
    parent INTEGER,
    child  INTEGER,
    FOREIGN KEY parent REFERENCES People(id),
    FOREIGN KEY child  REFERENCES People(id)
)

Populate the database however you see fit, but it must have at least 15 people and at least five generations represented (so there must be at least one pair of people such that one is the great-great-grandparent of the other. Not every pair of people in the table must be related. The information about the people can be entirely fictional.

Implement code to disambiguate between people with the same first name. For instance:

Enter the person's first name: Althea

There is more than one person named "Althea" in the database.
Enter the number of the person you meant:

    1. Althea Atlantis  (no birth year on record)
    2. Althea Gibson    born in 1927
    3. Althea           born in 1943
    4. Althea Reinhardt born in 1996

For simplicity, you can assume that the combination of a person’s first name, last name and year of birth are enough to determine that person’s internal id number uniquely, though you need not enforce that in your code nor in your schema.

Implement code to support queries of this form:

find the youngest grandchild of a given person (or indicate if they do not have any grandchildren at this time)
list all ancestors of a given person

Implement at least one other kind of query that you find worthy. Document what the query does (e.g., “finds all second cousins of a person”) in your readme file and explain why you think it is a query worth implementing. In particular, aim for a query that would not be easy to answer in a single SQL query, but that is not too complicated in Python.

Include a basic REPL to test your code. Your REPL should not expose internal id numbers to the user.

Optional challenge: add the ability for the user to specify two people and determine whether they are related.

Flight of fancy

Consider the following “flat” representation of flight schedules:

airline hub   flight#   plane date  depart etd  arrival eta        ...
AA      ORD   13A       737   2/26  ORD    8am  LGA     11am       ...
DA      ATL   571D      767   2/26  JFK    9am  ATL     10:30am    ...
...

We have discussed in class why such flat representations are less than ideal. Design your own schema for flights. Keep it simple, but at a minimum it must include:

two tables
a foreign key constraint
for a given flight, at least: the airline, the departure city and the arrival city (either explicitly or through the relationships among tables)

Create your own data. It need not be realistic (though if it corresponds loosely to reality it may be easier for you to make sure your code is working correctly). At a minimum there should be:

at least two airlines
at least five different cities
at least 15 different flights

For simplicity, you can assume that city names (or their airport codes) are unique.

Implement code to support queries of this form:

Given an airline, list all the cities, in alphabetical order, for which that airline has flights departing from.
Given an airline and two cities, indicate whether or not it is possible to fly between those cities (not necessarily nonstop!) using only flights on that airline.

Implement at least one other kind of query that you find worthy. Document what the query does (e.g., “list all airlines that land in all cities”) in your readme file and explain why you think it is a query worth implementing. In particular, aim for a query that would not be easy to answer in a single SQL query, but that is not too complicated in Python.

Include a basic REPL to test your code.

Optional challenge: Given two cities find the sequence of flights that will get you from one city to the other in the least amount of time. Or, if you did not choose to include times in your schema (which is fine), find the the fewest number of flights required to get from the first city to the second.