CS 425 Homework 3, Fall 2015
Summary
You will parallelize and optimize a matrix multiplication program using OpenMP. As always, you may work in groups of up to 3.
Due date
Submit your tarfile by Friday, Oct. 16 at 10:45 AM.
Deliverables
Turn in a tarfile (created with -cvf flags) containing the following:
- The provided stub files (if you've changed them, that's OK)
- Your source file(s)
- A makefile that compiles all 4 executables (see below)
- A writeup in .pdf format.
- A groupmembers.txt file that lists all group members' names (if applicable).
Getting started
Copy these two files into your project directory on cwolf:
- ~srivoire/cs425/pickup/matmult-stub.h
- ~srivoire/cs425/pickup/matmult-stub.cpp
You are responsible for defining the matmult function prototyped matmult-stub.h in your own C++ file(s). I will test your code with my own copies of the two provided files, so your code needs to work with these files as written in order to get credit.
The main program in matmult-stub.cpp does the following:
- Initializes two NxN matrices A and B with random data and a matrix C with zeroes
- Multiplies A and B using your matmult function and checks for correctness
- Multiplies A and B repeatedly without reinitializing C in order to time the performance of your function
- Computes and prints timing information for your function.
To get started, you can base your code on the sequential correctness-checking code from matmult-stub.cpp.
Correctness
You must not sacrifice correctness for parallel performance. Your code must pass the correctness check for:
- Any value of N that does not cause the array-initializing functions in the provided code to segfault
- Any value of P up to 64. You may assume that P is less than N.
Optimizations
You may optimize your code either for cwolf or for one of the machines in Darwin 25 or 28. You should specify which one you chose in your writeup. Note that on Linux, including on cwolf, you can get all kinds of information about the processor by typing
cat /proc/cpuinfo
If you use cwolf, since you do not have exclusive access, you must run all performance measurements three times and report the individual measurements and the average.
Hints for optimizing your code
- You have a choice of three loops to parallelize – which one is best?
- You can control OpenMP’s assignment of iterations to threads with the schedule directive.
- You can conceptually subdivide your matrix into smaller pieces in order to maximize the percentage of memory accesses that are fulfilled by the cache. This will require some research about the cache size of your machine, as well as some algorithmic restructuring. A good online resource is: http://mathworld.wolfram.com/BlockMatrix.html
- You can play with g++’s compiler flags to see if they provide any added benefit.
You must attempt at least 4 optimizations; compiler flags alone can be at most one of them. The correctness of your code and the quality of your optimizations are worth 50% of your assignment grade.
Writeup
Your writeup is worth 50% of your assignment grade and must contain the following information for each optimization:
- A summary of the optimization and why you expected it to improve on the baseline code.
- Performance numbers for N = 100 and 250 and for two different thread counts: 1 thread, and a number of threads of your choice.
- An analysis of the performance numbers. Why do you think you got these results?
Your writeup should also summarize which of your optimizations performed best, and why.
Submission
Submit your tarfile (as described above) on cwolf:
~srivoire/bin/submit 425
and choose L05 for the assignment. Check for your name on the confirmation page (which will be for "Lab 5").
Collaboration and Citation
You are welcome to discuss optimization strategies with each other and others, but the course collaboration policy about not sharing your code with others is in full effect, and your writeup must also be your own work. You should feel free to do research on optimizing matrix multiplication and on OpenMP; however, you should cite your sources in your writeup in order to avoid committing academic misconduct.