Databases

Program 2: Connecting Python to SQL via its CLI

Due date: Friday, March 15 at 10:30am

Instructions


More fun with family trees

Start with a very similar family-tree database schema as to what we have considered in class:

CREATE TABLE People (
  person_id      INTEGER PRIMARY KEY,
  year_of_birth  INTEGER NOT NULL,
  first_name     VARCHAR(30) NOT NULL,
  last_name      VARCHAR(30) NOT NULL
);
CREATE TABLE Children (
  parent INTEGER REFERENCES People(person_id),
  child  INTEGER REFERENCES People(person_id),
  PRIMARY KEY(parent, child)
);

Populate the database however you see fit, but it must have at least 15 people and at least five generations represented (so there must be at least one pair of people such that one is the great-great-grandparent of the other. Not every pair of people in the table must be related. The information about the people can be entirely fictional. (Note the two real changes from the class example: every person must have a last name - it was optional in class - and every person has a year of birth.)

Implement code to disambiguate between people with the same first name. For instance:

Enter the person's first name: Regulus

There is more than one person named "Regulus" in the database.
Enter the number of the person you meant:

    1. Regulus Black born in 1957
    2. Regulus Smith born in 1999
    3. Regulus Black born in 2013

For simplicity, you can assume that the combination of a person's first name, last name and year of birth are enough to determine that person's internal id number uniquely, though you need not enforce that in your code nor in your schema.

Implement code to support queries of this form:

Implement at least one other kind of query that you find worthy. Document what the query does (e.g., "finds all second cousins of a person") in your readme file and explain why you think it was a query worth implementing. In particular, aim for a query that would not be easy to answer in a single SQL query, but that is not too complicated in Python.

Include a basic REPL to test your code. Your REPL should not expose internal id numbers to the user.

Optional challenge: add the ability for the user to specify two people and determine whether they are related.


Flight of fancy

In class we discussed why the following "flat" representation of flight schedules may not be ideal:

airline hub   flight#   plane date  depart etd  arrival eta        ...
AA      ORD   13A       737   2/26  ORD    8am  LGA     11am       ...
DA      ATL   571D      767   2/26  JFK    9am  ATL     10:30am    ...
...

Design your own schema for flights. Keep it simple, but at a minimum it must include:

Create your own data. It need not be realistic in any way. However, at a minimum there should be:

For simplicity, you can assume that city names (or their airport codes) are unique.

Implement code to support queries of this form:

Implement at least one other kind of query that you find worthy. Document what the query does (e.g., "list all airlines that land in all cities") in your readme file and explain why you think it was a query worth implementing. In particular, aim for a query that would not be easy to answer in a single SQL query, but that is not too complicated in Python.

Include a basic REPL to test your code.

Optional challenge: Given two cities find the sequence of flights that will get you from one city to the other in the least amount of time. Or, if you did not choose to include times in your schema (which is fine), find the the fewest number of flights required to get from the first city to the second.