Databases

Programming Assignment:

Python-SQL interaction

Due date: Friday, March 10 at 5pm

Instructions


Fun with family trees

Create a family-tree database schema with at least two tables such as these:

CREATE TABLE People (
    PRIMARY KEY(id),
    id             INTEGER NOT NULL,
    first_name     TEXT    NOT NULL,
    last_name      TEXT,
    year_of_birth  INTEGER
)
CREATE TABLE ParentChild (
    PRIMARY KEY(parent, child),
    parent INTEGER,
    child  INTEGER,
    FOREIGN KEY parent REFERENCES People(id),
    FOREIGN KEY child  REFERENCES People(id)
)

Populate the database however you see fit, but it must have at least 15 people and at least five generations represented (so there must be at least one pair of people such that one is the great-great-grandparent of the other. Not every pair of people in the table must be related. The information about the people can be entirely fictional.

Implement code to disambiguate between people with the same first name. For instance:

Enter the person's first name: Althea

There is more than one person named "Althea" in the database.
Enter the number of the person you meant:

    1. Althea Atlantis  (no birth year on record)
    2. Althea Gibson    born in 1927
    3. Althea           born in 1943
    4. Althea Reinhardt born in 1996

For simplicity, you can assume that the combination of a person’s first name, last name and year of birth are enough to determine that person’s internal id number uniquely, though you need not enforce that in your code nor in your schema.

Implement code to support queries of this form:

Implement at least one other kind of query that you find worthy. Document what the query does (e.g., “finds all second cousins of a person”) in your readme file and explain why you think it is a query worth implementing. In particular, aim for a query that would not be easy to answer in a single SQL query, but that is not too complicated in Python.

Include a basic REPL to test your code. Your REPL should not expose internal id numbers to the user.

Optional challenge: add the ability for the user to specify two people and determine whether they are related.


Flight of fancy

Consider the following “flat” representation of flight schedules:

airline hub   flight#   plane date  depart etd  arrival eta        ...
AA      ORD   13A       737   2/26  ORD    8am  LGA     11am       ...
DA      ATL   571D      767   2/26  JFK    9am  ATL     10:30am    ...
...

We have discussed in class why such flat representations are less than ideal. Design your own schema for flights. Keep it simple, but at a minimum it must include:

Create your own data. It need not be realistic (though if it corresponds loosely to reality it may be easier for you to make sure your code is working correctly). At a minimum there should be:

For simplicity, you can assume that city names (or their airport codes) are unique.

Implement code to support queries of this form:

Implement at least one other kind of query that you find worthy. Document what the query does (e.g., “list all airlines that land in all cities”) in your readme file and explain why you think it is a query worth implementing. In particular, aim for a query that would not be easy to answer in a single SQL query, but that is not too complicated in Python.

Include a basic REPL to test your code.

Optional challenge: Given two cities find the sequence of flights that will get you from one city to the other in the least amount of time. Or, if you did not choose to include times in your schema (which is fine), find the the fewest number of flights required to get from the first city to the second.