CS 425 Homework 6/7, Fall 2015


Submission instructions and due date

You are welcome to work in groups of 3 and submit one assignment for the group. Submit your files as a .tar, .zip, or .tgz file on Moodle by Wednesday, December 9 at 10:45 AM.

Your submitted files should include, at a minimum:

This assignment is your final project, and it will count as 2 homework grades.

Specification

Your group will write serial and parallel implementations of an algorithm that is NOT embarrassingly parallel. You will need to describe the problem and justify your parallelization choices with performance measurements.

Algorithm choice and serial implementation (35 points)

You are free to choose an algorithm, as long as it is not embarrassingly parallel. "Embarrassingly parallel" in this context means that it is a trivial task to divide work among threads in a way that is load-balanced and minimizes communication among threads. Our usual parallel sum example falls into this category: each thread or process can be statically assigned an equal chunk of the array and can compute a local sum without any dependence on other threads. Furthermore, if all threads are assigned the same number of array elements, they will all take the same amount of time. The final reduction step requires more communication and does not utilize all threads equally, but the problem is still considered embarrassingly parallel.

If you can't think of a problem, I recommend the traveling salesman problem -- see the link for more.

You should clearly describe your algorithm and provide a working, well-written serial implementation. Please cite any references you use. You may copy publicly available code if you document this very carefully, but you should not copy someone else's prose verbatim (unless you clearly mark it as a direct quote) nor paraphrase overly closely.. This portion will be graded as follows:

Parallel implementation and performance results (65 points)

Next, you should parallelize this program using pthreads, C++ threads, OpenMP, and/or MPI. You should use speedup measurements to justify your parallelization choices. Your program should maintain correctness for any number of threads up to 32. The number of threads (or processes) should be a command-line input to your program.

You should insert timing code into your program as appropriate, and present execution time and speedup data in graphical form for 1, 2, 4, 8, and 16 threads for at least 2 appropriate input data sets. Your grade will be assigned as follows:

Sample algorithm: Tree search

The traveling salesman problem is one of the most famous intractable problems in computer science. In the worst case, it cannot be solved in a reasonable amount of time for even relatively small numbers of cities. However, there are some strategies for speeding it up in practice.

Your mission, should you choose to accept it, is to implement a branch-and-bound solution that uses iterative breadth-first tree search. The branch-and-bound algorithm on Wikipedia provides a pretty good description, as do these lecture notes.

I have sample code for a depth-first, serial, recursive version in ~srivoire/cs425/pickup/tree_search/, which consists of three files:

Hints for your version: