Hello World! An Invitation to Computer Science

Laboratory 1: Timing is Everything

Report due the week of February 25 at the start of your group conference that week.

Before proceeding - read this entire document thoroughly!

You should work on this lab with your partner and submit one lab report with both your names on it. You will need to draw graphs for this report. They can be computer generated or hand drawn; just make sure they are legible. Because I am accepting hand-drawn graphs, you and your partner should submit this lab as hard copy at the start of your lab two weeks from now (i.e., either Tuesday, February 26 or Wednesday, February 27).


Goals


What you need to do

  1. Binary v. sequential search

    Precisely measuring how long it takes for a computer program to execute is more difficult than it might first appear. This is especially true for computations that are relatively fast - for example, those that finish in well less than a second. One technique we often use in such cases is to repeat the same computation over and over, measure the total time elapsed, and then divide by the number of repetitions.
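    The repeat-and-divide technique described above can be sketched in a few lines of Python. This is only an illustration, not the program you will download: the function name average_seconds and the sample computation are made up for this example, and time.perf_counter is simply one reasonable clock to use.

```python
import time

def average_seconds(computation, repetitions):
    """Run computation() `repetitions` times; return the average seconds per run."""
    start = time.perf_counter()
    for _ in range(repetitions):
        computation()
    elapsed = time.perf_counter() - start
    # Total elapsed time divided by the number of repetitions.
    return elapsed / repetitions

# A computation far too quick to time reliably in a single run.
per_run = average_seconds(lambda: sum(range(1000)), 1000)
print("average seconds per run:", per_run)
```

    Timing many runs together spreads the clock's limited resolution over all of them, which is why the average is more trustworthy than any single measurement.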

    As an example, download this Python program and load it into Python. If you type:

      testSequentialSearch(1000, 100)
    
    it will try 100 different worst-case searches (i.e. searches for items not on the list) using sequential search and report back the total time elapsed. If you wish to compute the average time per search, then just divide the result by the number of repetitions, in this case 100. You can also test binary search in this manner:
      testBinarySearch(1000,1000)
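
    For reference, here is a minimal sketch of the two search strategies, written so that each reports how many list items it examined. It is an assumption about how such searches are typically written, not the code in the downloaded program; the names sequential_search and binary_search are hypothetical.

```python
def sequential_search(items, target):
    """Scan items left to right; return (found, number of items examined)."""
    comparisons = 0
    for item in items:
        comparisons += 1
        if item == target:
            return True, comparisons
    return False, comparisons

def binary_search(sorted_items, target):
    """Repeatedly halve a sorted list; return (found, number of items examined)."""
    low, high = 0, len(sorted_items) - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1
        if sorted_items[mid] == target:
            return True, probes
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return False, probes

numbers = list(range(1000))          # already in sorted order
print(sequential_search(numbers, -1))  # -1 is not on the list: a worst case
print(binary_search(numbers, -1))
```

    Searching for an item that is not on the list forces both functions to do as much work as they ever will for a list of that length.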
    

    1. Your first task in the lab is to find the list length for which the total time elapsed to do 1000 worst-case searches is roughly 1 second. Once you have that, collect data by keeping the second input (the number of repetitions) constant at 1000 and varying the first input (the length of the list), multiplying it by 2, 3, 4 and 5. Use the average time of a single sequential search for these five list lengths to form your first data set. Record the data in a table like:

      length of list    milliseconds for failed search
      1500              1.02
      3000              1.94
      4500              ...
      6000              ...
      7500              ...

      Now draw a graph of your data. How close is it to being a straight line? Is its shape what you expect? Explain.

    2. Now consider binary search. For the same sized lists, the reported times should be much smaller than for sequential search. However, if you pick too large a list size the amount of time you have to wait for the process to complete may try your patience. If binary search is so fast, why does it seem to take so long here?

    3. In order not to wait too long for the binary search test, choose a list length and a number of repetitions such that: the list length is at least 100,000; the total amount of time you have to wait until the computation is completed is no more than about ten seconds by your watch (or wall clock, but not by the computer's clock); and the total elapsed time reported is roughly one second. Collect data similar to the way you did for sequential search: using the number of repetitions you just settled on, collect data for the list length you have chosen, and then for 2, 3, 4 and 5 times that list length. Produce a table and graph, as above, for the average amount of time per search.

    4. Collect additional data for lists 8, 16 and 32 times the original length. Make a second graph consisting of just the original data point and the data for the 2, 4, 8, 16, and 32 multiples, but label the y-axis 0, 1, 2, 3, 4, 5 for the six different points of data (this is a log-scaled graph). How does its shape compare to the shape of the graph for sequential search? What does this say about the computational complexity of binary search?
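
    The labels 0 through 5 on the log-scaled graph are the base-2 logarithms of the six multiples of the original list length. The short check below shows the correspondence; the variable names are chosen for this illustration only.

```python
import math

multiples = [1, 2, 4, 8, 16, 32]   # the original length and its doublings
labels = [int(math.log2(m)) for m in multiples]

for multiple, label in zip(multiples, labels):
    print("list length x", multiple, "-> label", label)
```

    Each step of one unit along a log-scaled axis therefore stands for a doubling of the list length, not for a fixed number of additional items.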

  2. Comparing and contrasting sorting algorithms

    Experiment with the supplied software that illustrates the workings of a variety of sorting algorithms. (There are some quirks with the software, so make sure you are careful when collecting data.)

    1. Generate timing data for the following sorting algorithms: bubble sort, insertion sort, merge sort, quicksort, and selection sort. For each algorithm, collect data for at least five different values of n (the length of the list being sorted). In each case, choose the smallest value of n such that it requires at least half a second for the sort to complete. For each value of n and each algorithm, you should collect the data three times; record all three values and use the average of the values in your graph. Present your timing data in a table or series of tables. Draw graphs of your timing data. Ideally these should be presented in one graph for easy comparison. However, the important thing is that the graphs should be clear and consistent whether you use one figure or five.

    2. For the largest value of n that you used to record the data, rerun the tests with the list already sorted. How much difference does that seem to make? For which algorithms does it make the most difference? How does this illustrate the principles of best- v. average-case complexity? For the algorithm(s) where the difference is most pronounced try to explain what it is about the algorithm(s) that makes for the discrepancy.

    3. Which of the sorting algorithms are quadratic (O(n^2))? Which are faster - and are the faster ones linear or somewhere in between?

    4. What kinds of tests can you run that might call into question the accuracy of the timer? Explain.
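
    One way to reason about whether an algorithm is quadratic is a doubling test: double n and see by what factor the work grows. The sketch below counts comparisons rather than measuring time, and it uses a textbook insertion sort written for this example (the function name insertion_sort_comparisons is hypothetical); the supplied software may implement its sorts differently, so treat your own timing data as the evidence that counts.

```python
def insertion_sort_comparisons(values):
    """Sort a copy of values with insertion sort; return the comparison count."""
    items = list(values)
    comparisons = 0
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        # Shift larger items right until the slot for `current` is found.
        while j >= 0:
            comparisons += 1
            if items[j] <= current:
                break
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current
    return comparisons

# Reverse-ordered input, so the result does not depend on random data.
small = insertion_sort_comparisons(range(100, 0, -1))
large = insertion_sort_comparisons(range(200, 0, -1))
print(small, large, large / small)
```

    If doubling n roughly quadruples the work, the growth is consistent with a quadratic function; a factor close to two suggests something nearer to linear.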

  3. Sorting in Python

    Python includes an operation for sorting a list, as illustrated in this example. Download it and load it into Python.

    Experiment with it by typing something like:
      timeSort(1000)
    
    which will report how long it takes to sort a randomly-generated list of the specified size (in this case, 1000 items).
    1. Use a data-collection technique similar to the one above: choose five values of n (the length of the list), where the minimum time is at least half a second; get three times for each value of n, take the average and produce a table and graph of your data.

    2. How do your results for this sort procedure compare to your results from the previous set of problems (comparing the five sorting algorithms)? Is it a fair comparison? Explain.

    3. Based on the shape of the graphs and the timing data you collect, which of the algorithms from the first section are candidates to be "underneath the hood" of Python's built-in sort procedure?
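
    The three-trials-and-average procedure can be sketched as follows. This is not the downloaded timeSort program; the names time_builtin_sort and average_of_trials are made up for this illustration, and the list of random floating-point numbers is an assumption about what a "randomly-generated list" might contain.

```python
import random
import time

def time_builtin_sort(n):
    """Return the seconds taken to sort a fresh random list of n numbers."""
    values = [random.random() for _ in range(n)]
    start = time.perf_counter()
    values.sort()                     # Python's built-in list sort
    return time.perf_counter() - start

def average_of_trials(n, trials=3):
    """Time the sort `trials` times; return all the times and their average."""
    times = [time_builtin_sort(n) for _ in range(trials)]
    return times, sum(times) / len(times)

times, mean = average_of_trials(100000)
print("individual times:", times)
print("average:", mean)
```

    Note that the list is built before the clock starts, so only the sort itself is timed.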

  4. Primality testing

    Recall that a number is prime if it is divisible only by 1 and itself. (For example, 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 are all prime; none of the other numbers less than 30 are.) Being able to efficiently determine whether or not a number is prime turns out to be of crucial importance in the world of cryptology (code making and code breaking) and, therefore, to Internet security. Here is a Python program that tests if a number is prime and reports how long it takes to conduct the test. Download it and load it into Python. When you run the program, it asks for a value and returns two pieces of information: first whether or not the supplied number is prime; and second how many microseconds (millionths of a second) it took to compute that fact.

    1. Experiment with the prime-testing program. Based on the algorithm and the timing information you generate, for which kinds of input do you witness best-case behavior? For which do you witness worst-case behavior? Explain.

    2. For a collection of at least five values greater than 10,000 and each separated by at least 1000, where for each value the algorithm seems to exhibit its worst-case behavior, collect data in a table and draw a graph of the value vs. the time required to test for primality.

    3. What kind of function do you think accurately describes the worst-case complexity of this test?
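
    For orientation, here is one common way to test primality, known as trial division, together with a timer that reports microseconds. It is only a sketch under the assumption that candidate divisors are tried in increasing order up to the square root of n; the downloaded program may use a different method, so base your answers on its behavior. The names is_prime and timed_is_prime are hypothetical.

```python
import time

def is_prime(n):
    """Trial division: try divisors 2, 3, 4, ... up to the square root of n."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False              # found a divisor, so n is not prime
        divisor += 1
    return True

def timed_is_prime(n):
    """Return (is n prime?, microseconds the test took)."""
    start = time.perf_counter()
    result = is_prime(n)
    microseconds = (time.perf_counter() - start) * 1_000_000
    return result, microseconds

print(timed_is_prime(10007))
print(timed_is_prime(10008))
```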

  5. Pizza toppings

    1. It is a cold and dreary day outside and you have been walking for hours. You wander into Peano's Pizzeria and look at the list of toppings available for the day: anchovy, pepperoni, spinach. How many different kinds of pizza can you order? Assume you can have anything from no toppings to a combination of all three. List all the possibilities.

    2. Think about this problem if you add another topping, say meatball. Now how many different kinds of pizza can you order? Add pineapple to the list; now how many combinations can you form?

    3. If you were to draw a graph (you don't have to, just think about it) - with the number of toppings on the x-axis and the total number of combinations that you can choose from those toppings on the y-axis, what shape would the graph take? What mathematical function captures the growth rate shown by this graph?

    4. Your buddy Alfonso Gorithm is a very big fan of Peano's Pizza. He looks at the takeout menu and sees that they offer 20 different toppings! Al decides to embark on a quest to eat one whole pizza for each combination of toppings that Peano's takeout menu offers. If he eats one pizza per day, roughly how many days will it take for him to work his way through all his options?

    Extra credit: write pseudocode for an algorithm that given a list of n toppings, prints out every possible combination of toppings. (Warning: this is quite challenging!)