CS 425 Homework 1, Fall 2015

Due Wednesday, September 16 at 10:45 AM. No late submissions accepted.

Instructions

Complete the following problems. In order to receive credit, you must show enough work to make it clear how you arrived at your answer. Where applicable, you may write programs to solve these problems; in that case, include your source code and program output in your writeup.

You may work in groups of up to three students and turn in one solution for the group. Make sure that all group members' names are on the submitted documents.

You may submit your assignment either via hard copy (in class or office hours) or as a PDF or .doc/.docx/.txt via Moodle. Your source code for Problem 2 MUST be submitted via Moodle, even if the rest of your assignment is submitted in hard copy. It is your responsibility to verify that your submitted files are readable. Corrupted or unreadable files will not receive any credit.

Problem 1: Hardware [10 points]

Barlas, Ch. 1, Exercise 1 (p. 26), but skip the last question. You can find the current list of the most powerful supercomputers in the world at Top500.org. Please cite any sources you use to answer these questions.

Problem 2: Scalability analysis: mergesort [35 points]

This problem is based on Barlas, Ch. 1, Exercise 7 (p. 26), with the following changes:

Download the starter code (mergesort_starter.tgz). If you type "make", you will get three executables:
1. gensort.x: Creates your input data file. See the code for how to run it.
2. checksort.x: Verifies your output data file. Again, see the code.
3. mergesort.x: Based on mergesort.cpp, which is currently empty. All of your code should go into mergesort.cpp.
Your mergesort implementation should take the names of the input and output files as command-line arguments. Please feel free to borrow liberally from the provided gensort.cpp and checksort.cpp files for inspiration. For full credit, your program should exit with error if either of the two files can't be opened.
The first number in the input (or output) file should be the number of values that follow. I strongly recommend that you use this number to dynamically allocate an array of unsigned int to store the input integers.
Your implementation of the mergesort algorithm does not have to be original, and it can be either a top-down or a bottom-up implementation. It does not have to be an in-place sort. If your code is not original, you must cite your source in a comment.
Instead of gprof or valgrind as suggested in the textbook, you should take your timing estimates using the C++ chrono library and its highest-resolution clock.
Be sure to experiment with a variety of dataset sizes. Start with a very small number (like 50) and scale it up until either alpha stabilizes or you are unable to allocate enough space for the array(s). Generate two graphs: both with the dataset size on the x-axis (possibly on a log scale), and one each with alpha and the maximum possible speedup on the y-axis.
The starter code works on the CS department (cwolf) server, as should your code.

Please submit your mergesort.cpp as a separate file from your main writeup on Moodle.

Problem 3: Task decomposition analysis: quicksort [25 points]

This problem is based on Barlas, Ch. 1, Exercise 5 (p. 54), with the following changes:

Use [3, 4, 15, 11, 2, 0, 12, 9] as the input array.

Use this QuickSort pseudocode:

quicksort(input, low, high)
  if low < high
    pos = partition(input, low, high)
    Task t = new Task(quicksort(input, low, pos - 1))
    t.run()
    quicksort(input, pos + 1, high)
    waitUntilDone(t)

Use the Lomuto partition scheme (pseudocode on Wikipedia), in which the pivot is the final element of the array.
Note that t.run() is non-blocking, meaning that the rest of the code continues running in parallel with the newly generated task. The line waitUntilDone(t) will halt the current thread until task t finishes (i.e. exits its final Quicksort call)
You don't necessarily have to do a Gantt chart, but you should produce some visualization of the different calls to Quicksort and the tasks that generate them (some tasks will generate more than one recursive call), as well as the timing and dependencies among them. A sample node in your diagram might be:
```
Quicksort([3, 4, 15, 11, 2, 0, 12, 9], 0, 7) // generated by Task 1
```
Calculate an upper bound on speedup, and state your assumptions.

Problem 4: Performance metrics [30 points]

For all parts of this problem, show your work, and use one of the Amdahl's or Gustafson's Law formulations in Section 1.5.

[10 points] Barlas Chapter 1, Exercise 4 (p. 26).
[10 points] Barlas Chapter 1, Exercise 6.
[10 points] Barlas Chapter 1, Exercise 8.