In this lab, you will find the Euclidean distance between two points in N-dimensional space using a combination of GPU-based and host-based computation. Along the way, you'll explore more of the CUDA API.
You may have already learned how to compute the distance between two points in 2-dimensional space. If point p has components px and py, and point q has components qx and qy, then the distance between p and q is sqrt[(px-qx)^2 + (py-qy)^2].
In N-dimensional space, we just tack on terms for the additional components:
sqrt[(p1-q1)^2 + (p2-q2)^2 + ... + (pN-qN)^2].
For prettier formatting of these equations, see the Wikipedia article on Euclidean distance.
The easiest way to think about a point in N-dimensional space for very large N is probably not to visualize it physically. Instead, think of each dimension as representing some quantity that is independent of all the other dimensions. For example, each point could represent a person's various likes and dislikes: the first dimension might represent how much you like pizza on a scale from 1 to 10, the second dimension might represent how much you like Grey's Anatomy, and so on for N different preferences. Finding the distance between two points would then give us a measure of the overall similarity or dissimilarity between two people's likes and dislikes.
We are going to compute the Euclidean distance between two N-dimensional points, where N is at least 1 million. This computation consists of a fully data-parallel phase (computing the square of the difference of the components for each dimension), a reduction (adding together all of these squares), and finally taking the square root of the reduced sum. For this assignment, you are required to do the data-parallel phase on the GPU, but you may do the reduction on the host. We will soon find that parallel reduction in CUDA requires more work from the programmer than it does in OpenMP and TBB.
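One way to structure this split is sketched below. The kernel name sqDiff and the variable names are our own choices; your code may organize this differently:

```cuda
/* Data-parallel phase on the GPU: one thread per dimension computes
   the square of the difference of the corresponding components. */
__global__ void sqDiff(const float *p, const float *q, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                     /* guard: the last block may have extra threads */
        out[i] = (p[i] - q[i]) * (p[i] - q[i]);
}

/* Host-side outline (memory allocation and error checking omitted):
   1. cudaMemcpy p and q to the device.
   2. Launch sqDiff with enough blocks to cover all n components.
   3. cudaMemcpy out back to the host.
   4. Sum the squares in a sequential loop (the reduction), then take sqrtf(sum). */
```

Since the assignment allows the reduction on the host, step 4 can be an ordinary sequential loop over the copied-back array.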
Our server is up and running! It works best if you log into your cwolf account and then ssh to rickroll.cs.sonoma.edu from cwolf. All your cwolf files are visible on rickroll. You may also do this assignment at home if you have a CUDA-capable graphics card; if you do, mention in your initial comment that you worked at home and note the model of your graphics card (e.g. NVIDIA GeForce 9800 GT).
To compile programs, type nvcc -o outputfile sourcefile.cu
If you run into problems other than syntax errors, the likely culprit is your environment variables. You may need to source your ~/.bash_profile again.
Your file should be called yourlastnameL9.cu and should contain the following:
Check that your code works for both large and small values of N, including values of N that are not evenly divisible by the number of threads per block (or by the total number of threads).
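A common way to handle values of N that don't divide evenly is to round the block count up with ceiling division and bounds-check inside the kernel. A sketch, with our own variable names:

```cuda
int threads = 256;                          /* threads per block (a typical choice) */
int blocks = (n + threads - 1) / threads;   /* ceiling division: enough blocks to cover all n */
/* With this launch configuration, the last block may contain threads whose
   index is >= n; a test such as `if (i < n)` inside the kernel keeps those
   extra threads from reading or writing past the end of the arrays. */
```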
You should insert code to time your sequential and parallel computations:
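For the GPU phase, CUDA events are one common timing mechanism; the variable names below are our own. The sequential computation can be timed by bracketing it the same way, or with an ordinary host timer such as gettimeofday:

```cuda
cudaEvent_t start, stop;
float ms;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);
/* ... kernel launch (and any memcpy you want included in the measurement) ... */
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);               /* wait until the GPU reaches the stop event */
cudaEventElapsedTime(&ms, start, stop);   /* elapsed time in milliseconds */
```

Because kernel launches are asynchronous, the cudaEventSynchronize call is what guarantees the measurement actually covers the GPU work.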
In your initial comment, include the following time measurements (and list the value of N you used for them):
Make sure that your file is correctly named and then copy it to ~srivoire/cs385/submit. Wait 2 minutes, and then check that it was correctly submitted by visiting http://rivoire.cs.sonoma.edu/cs385/lab9sub.txt.