You are welcome to work in groups of 3 and submit one assignment for the group. Submit your files as a .tar, .zip, or .tgz file on Moodle by Monday, Nov. 23 at 10:45 AM.
In this problem, you will look at the vector addition
for (int i = 0; i < N; i++) { A[i] += B[i]; }
for two arrays A and B of size N.
You should read Section 4.2.1 of CUDA by Example before proceeding.
Submit your answer to this problem in PDF form as part of your tarfile.
Although you will be writing a CUDA program, I don't expect you to have access to CUDA hardware or software tools, and I will be lenient on small CUDA syntax errors. However, your host code (normal C/C++) should be correct and debugged.
If you don't have access to CUDA, surround all of the CUDA-specific parts of your code with
#ifdef CUDA ... C code ... #endif
When I test your code on CUDA-capable hardware, I will define CUDA (and again, it's OK if there are small errors).
You may have already learned how to compute the distance between two points in 2-dimensional space. If point p has components px and py, and point q has components qx and qy, then the distance between p and q is
$$\sqrt{(p_{x}-q_{x})^{2} + (p_{y}-q_{y})^{2}}$$In N-dimensional space, we just tack on terms for the additional components:
$$\sqrt{(p_{1}-q_{1})^{2} + (p_{2}-q_{2})^{2} + \dots + (p_{N}-q_{N})^{2}}$$The easiest way to think about a point in N-dimensional space for very large N is probably not visualizing it physically. Rather, think of each dimension representing some quantity that is independent of all the other dimensions. For example, we could think about each point representing a person's various likes and dislikes: the first dimension might represent how much you like pizza on a scale from 1 to 10, the second dimension might represent your love of Beyoncé on a scale from 1 to 10, and so on for N different preferences. Finding the distance between two points would, in this case, give us a measure of the overall similarity or dissimilarity between two people's likes and dislikes.
You are going to compute the Euclidean distance between two N-dimensional points, where N is at least 1 million. This computation consists of a fully data-parallel phase (computing the square of the difference of the components for each dimension), a reduction (adding together all of these squares), and finally taking the square root of the reduced sum.
Your code should contain the following:
Your host program should make sure that the following happens:
Host result: ______
CUDA result with host reduction: ______
CUDA result with device reduction: ______
For the CUDA kernel that does the second reduction, Mark Harris's CUDA reduction slides are a FANTASTIC resource. You only need to understand the first and second versions of his code. Remember to have a text file explaining any divergence among warps. You don't need to actually minimize divergence, but you should explain whether any exists or not.
Design your code to work for values of N that are both large and small, as well as values that do not evenly divide the number of thread blocks or threads.
Tar or zip these files along with your PDF for Problem 1, and submit on Moodle.