CS 351 Homework 1/2

Links: [Course Home] [Schedule] [Resources] [Study Guide] [Moodle]

Due Sunday, September 24 at 5:00 PM. No late submissions accepted. Solutions will be posted to Moodle by Monday at 2 PM. This assignment is worth 200 points.


Complete the following problems. In order to receive credit, you must show enough work to make it clear how you arrived at your answer. If applicable, you may write programs to solve these problems; in that case, include your source code and program output in your writeup.

You should submit your assignment as a single PDF via Moodle. It is OK to submit a PDF consisting of photos of handwritten pages, as long as your work is clearly legible and the lighting is reasonable. An app like Genius Scan can help with this. Corrupted or unreadable files will not receive any credit, and submissions in formats other than PDF may lose points.

You may work in groups of up to three students. Please refer to the course collaboration policy.

Submitting the assignment: Please have just one group member submit the assignment, and put all group members' names in the submitted document. Because Moodle has no idea who your group members are, the assignment will show up as missing/unsubmitted for the other group members until it's graded. This is annoying but harmless.

Rules for writing assembly

Problem 1: Processor design background [30 points]

Read this article:

Max Chafkin and Ian King, "How Intel Makes a Chip," Bloomberg News, June 9, 2016.

A PDF is also posted to Moodle.

Yes, actually read it. It is the most painless (least painful?) distillation of a lot of helpful background information about processor fabrication and the processor industry. The perspective you gain from it will pay off in this class.

  1. [10 points] Explain why it is so important to test processors thoroughly, citing information from the article.
  2. [10 points] Show a graph of transistor count on a chip vs. year, covering at least the years 1980 through 2010, and cite your source. How does this relate to the transistor feature sizes (65 nm, etc.) discussed in the article?
  3. [10 points] Transistors are the building blocks of processors. According to the article, what are some examples of structures on chip that consist of transistors?

Problem 2: Latency and throughput [20 points]

Refer to (but don't read all of!) this article:

Lee et al., "Implementing Linearizability at Large Scale and Low Latency," in Proceedings of the 25th Symposium on Operating Systems Principles (SOSP), 2015.

Read only the captions under the figures. Pick one figure that illustrates something about latency, and one that illustrates something about throughput. For each graph, explain:

Problem 3: Power, performance, and energy [45 points]

Intel multicore processors have a feature called Turbo Boost, in which some of the cores can run faster than their rated speed if the overall processor is well below its power and thermal limits. To oversimplify a bit, some cores can run faster if other cores are inactive.

This problem concerns a hypothetical 4-core processor, based loosely on the Intel Core i7-4712HQ, a mobile processor which was first launched in 2014. This hypothetical processor has the following characteristics:

# active coresFrequency of active coresTotal static powerTotal dynamic power (max.)Supply voltage
42.26 GHz18 W42 W1.00 V
23.06 GHz20 W?1.25 V
13.20 GHz23 W?1.52 V

Make the following overly simplistic assumptions:

You have a CPU-bound program that takes 10 seconds to run on a single core with Turbo Boost enabled. A single routine that can be easily parallelized over multiple cores takes up 50 5 of those seconds. Assume that you can get perfect linear speedup from parallelizing this routine; i.e. splitting it across 2 cores would result in it taking 25 2.5 seconds.

  1. [10 points] How long will your program take to run when the parallel portion is split across 2 cores? 4 cores? Assume that you can run the sequential portion of the program with just 1 core at the 3.2 GHz clock speed.
  2. [10 points] What is the peak power consumption of this processor when it runs in 2-core mode? What about 1-core mode?
  3. [15 points] What is the total energy consumption of this processor when it runs your program with 1 core? What about when the parallel portion is split across 2 or 4 cores? Which is best?
  4. [10 points] Assume that the rest of your computer (motherboard, disks, memory, etc.) uses a constant amount of power. Is it possible for the most energy-efficient configuration to change when the rest of the system's power is taken into account? If so, which values of this constant would lead to a different result (i.e. "if the rest of the system uses between 50 and 75 watts, the most energy-efficient configuration changes to X")?

Problem 4: Arithmetic and logical instructions [15 points]

Write a sequence of LEGv8 instructions that is equivalent to the following C code, where a, b, and c are unsigned integers. Assume that a is in $s0 X19, b is in $s1 X20, and c is in $s2 X21. For this problem, use only the arithmetic and logical instructions covered in class; do not use branches, jumps, or division/mod operations.

a += b + 2 - c;

Problem 5: Machine code [15 points]

Show how the following sequence of LEGv8 instructions is represented in 32-bit binary machine code. Label the fields of each instruction.

    CBZ X10, exit
    ADD X1, X2, X3
    STUR X10, [SP, 8]

Problem 6: Control flow and memory operations [30 points]

Assume that a 500-element signed integer array (call it A) is currently at the top of the stack. Write assembly code to make room on the stack for a new 2-element integer array (call it B).
The elements of B should be as follows:

Problem 7: Functions and the call stack [45 points]

  1. [15 points] Write a function in LEGv8 assembly that takes an integer x as input and returns x % 8 (the integer remainder when x is divided by 8). You should not use the multiply or divide instructions.
  2. [30 points] Translate the following C function, which copies only the alphabetical characters from one string to another and returns the length of the new string, to LEGv8 assembly.

    The isalpha function is part of the cctype library, which you can assume has been included.
    int copy_alpha(char* dst_ptr, char* src_ptr) {
        /* Arguments: 
         * - dst_ptr should point to the memory where the string is to be copied
         *   (the programmer needs to have allocated enough memory before calling this function)
         * - src_ptr should point to a null-terminated string
         * What this function does:
         * - Copies ONLY alphabetical characters from the source string to the destination string.
         * - Null-terminates the destination string
         * - Returns the length of the destination string
        char* dst_start = dst_ptr;   // copy dst_ptr so we can track string length
        while (*src_ptr) {
            if (isalpha(*src_ptr)) {
                *dst_ptr = *src_ptr; // copy the character
                dst_ptr++;           // advance the destination pointer
            src_ptr++; // advance the source pointer
        *dst_ptr = 0; // add the null terminator
        return (dst_ptr - dst_start); // length = distance from null terminator to 1st character