Cache Write Policies


Introduction: Cache Reads

So far, we've traced sequences of memory reads that work as follows, if you'll let me anthropomorphize a little bit:

  1. The processor asks the memory subsystem, "Hey, do you have the data at Address XXX?"
  2. The L1 cache tackles this question first by checking the valid bit and tag of whatever block(s) could possibly contain Address XXX's data.
  3. One of two things will happen:
       - Hit: the valid bit is set and the tag matches, so L1 has the data and sends it straight back to the processor.
       - Miss: L1 doesn't have the data, so it asks L2 (which repeats this whole process with L3, and so on), keeps a copy of the block that comes back, and then sends the data along to the processor.
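
If it helps to see that little dialogue written down, here's a minimal sketch in Python of the read path for a made-up direct-mapped L1. The sizes, the names, and the dictionary standing in for L2 are all just for illustration, not a description of any real hardware.

    # Minimal sketch of the read path for a direct-mapped L1 cache.
    # The sizes and the dictionary standing in for L2 are illustrative only.

    BLOCK_SIZE = 16    # bytes per block (made up)
    NUM_BLOCKS = 64    # number of L1 blocks (made up)

    # Each L1 block holds a valid bit, a tag, and the block's data.
    l1 = [{"valid": False, "tag": None, "data": None} for _ in range(NUM_BLOCKS)]
    l2 = {}            # stand-in for the next level: block address -> data

    def read(addr):
        block_addr = addr // BLOCK_SIZE
        tag, index = divmod(block_addr, NUM_BLOCKS)
        block = l1[index]
        if block["valid"] and block["tag"] == tag:
            return block["data"]                      # hit: answer immediately
        # Miss: ask L2 for the block, keep a copy, then answer.
        data = l2.get(block_addr, bytes(BLOCK_SIZE))
        block.update(valid=True, tag=tag, data=data)
        return data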

Throughout this process, we make some sneaky implicit assumptions that are valid for reads but questionable for writes. We will label them Sneaky Assumptions 1 and 2:

  1. Sneaky Assumption 1: Keeping a copy of the data in L1 can't cause any trouble, because that copy always matches the copy in L2 (and the levels below it).
  2. Sneaky Assumption 2: The processor is waiting on this data, so on a miss the cache has no choice but to fetch the block and store it.

Why these assumptions are valid for reads:

  1. Reads don't modify anything, so however many copies of the data exist in the hierarchy, they all stay identical.
  2. The whole point of a read is to hand data back to the processor, so the cache has to go fetch the block anyway; it might as well keep a copy around.

Why these assumptions are questionable for writes:

  1. A write changes the data, so the instant L1 updates its copy, that copy no longer matches the one in L2. Somebody has to deal with that inconsistency (Design Decision #1).
  2. A write doesn't hand anything back to the processor, so the cache isn't actually forced to store the data at all; it just has to make sure the new value isn't lost (Design Decision #2).

In short, cache writes present both challenges and opportunities that reads don't, which opens up a new set of design decisions.

Design Decision #1: Keeping Track of Modified Data

More wild anthropomorphism ahead...

Imagine you're an L1 cache (although this discussion generalizes to other levels as well). The processor sends you a write request for address XXX, whose data you're already storing (a write hit). As requested, you modify the data in the appropriate L1 cache block.

Oh no! Now your version of the data at Address XXX is inconsistent with the version in subsequent levels of the memory hierarchy (L2, L3, main memory...)! Since you care about preserving correctness, you have only two real options:

  1. Write-through: tell L2 about the new value right away, so the two levels never stay inconsistent for long.
  2. Write-back: hang onto the new value yourself for now, and only tell L2 about it later, when this block gets kicked out of L1.

Write-Through Implementation Details (naive version)

With write-through, every time you see a store instruction, you need to initiate a write to L2. To be absolutely sure you're consistent with L2 at all times, you also need to wait for this write to complete, which means paying the L2 access time.

What this means is that a write hit actually acts like a miss, since you'll need to access L2 (and possibly other levels too, depending on what L2's write policy is and whether the L2 access is a hit or miss).
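
As a rough sketch (with made-up access times, and dictionaries standing in for the caches), the naive store-hit path looks something like this:

    # Sketch of naive write-through on a store hit: update L1, then immediately
    # write to L2 and wait for it. Access times and structures are made up.

    L1_TIME, L2_TIME = 1, 10      # pretend access times, in cycles

    l1 = {}                       # block address -> data (assume these are hits)
    l2 = {}                       # stand-in for the next level

    def store_hit_write_through(block_addr, data):
        cycles = L1_TIME
        l1[block_addr] = data     # update our own copy...
        l2[block_addr] = data     # ...and immediately write it through to L2
        cycles += L2_TIME         # stall until L2 confirms the write
        return cycles             # even a "hit" ends up paying the L2 access time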

This is no fun and a serious drag on performance.

Write-Through Implementation Details (smarter version)

Instead of sitting around until the L2 write has fully completed, you add a little bit of extra storage to L1 called a write buffer. The write buffer's job is to keep track of all the pending updates to L2, so that L1 can move on with its life.

This optimization is possible because those write-through operations don't actually need any information from L2; L1 just needs to be assured that the write will go through.

However, the write buffer is finite -- we're not going to be able to just add more transistors to it if it fills up. If the write buffer does fill up, then L1 actually will have to stall and wait for some writes to go through.
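
Here's a sketch of that idea, with a small Python deque standing in for the hardware write buffer. The buffer depth and the helper names are made up for illustration:

    # Sketch of write-through with a write buffer: stores are handed to a small
    # queue that drains to L2 in the background, and L1 only stalls when the
    # queue is full. The depth and the deque are illustrative, not real hardware.
    from collections import deque

    WRITE_BUFFER_DEPTH = 4        # made up

    write_buffer = deque()
    l1, l2 = {}, {}

    def store_hit_buffered(block_addr, data):
        l1[block_addr] = data
        if len(write_buffer) >= WRITE_BUFFER_DEPTH:
            drain_one()                          # buffer full: stall until a slot frees
        write_buffer.append((block_addr, data))  # hand the write off and move on

    def drain_one():
        # In hardware this happens on its own whenever L2 is free.
        addr, data = write_buffer.popleft()
        l2[addr] = data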

The bottom line: from a performance perspective, we'll treat a write hit to a write-through cache like a read hit, as long as the write buffer has available space. When the write buffer is full, we'll treat it more like a read miss (since we have to wait to hand the data off to the next level of cache).

Write-Back Implementation Details

As long as we're getting write hits to a particular block, we don't tell L2 anything. Instead, we just set a bit of L1 metadata (the dirty bit -- technical term!) to indicate that this block is now inconsistent with the version in L2.

So everything is fun and games as long as our accesses are hits. The problem is whenever we have a miss -- even if it's a read miss -- and the block that's being replaced is dirty.

Whenever we have a miss to a dirty block and bring in new data, we actually have to make two accesses to L2 (and possibly lower levels):

  1. A write, to send the evicted dirty block's contents down to L2 so that update isn't lost.
  2. A read, to fetch the new block that the processor actually asked about.

What this means is that some fraction of our misses -- the ones that overwrite dirty data -- now have this outrageous double miss penalty.
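
Here's a rough sketch of that bookkeeping, again with made-up sizes and a dictionary standing in for L2:

    # Sketch of write-back bookkeeping: a write hit just sets the dirty bit,
    # and replacing a dirty block costs two L2 accesses (write back the old
    # block, fetch the new one). Sizes and structures are made up.

    NUM_BLOCKS = 64
    l1 = [{"valid": False, "dirty": False, "tag": None, "data": None}
          for _ in range(NUM_BLOCKS)]
    l2 = {}   # block address -> data

    def write_hit(index, data):
        l1[index]["data"] = data
        l1[index]["dirty"] = True      # now inconsistent with L2, but L2 isn't told

    def handle_miss(block_addr):
        tag, index = divmod(block_addr, NUM_BLOCKS)
        victim = l1[index]
        l2_accesses = 1                # we always have to fetch the new block
        if victim["valid"] and victim["dirty"]:
            # Extra access: write the dirty victim back so its update isn't lost.
            old_addr = victim["tag"] * NUM_BLOCKS + index
            l2[old_addr] = victim["data"]
            l2_accesses += 1           # this is the double miss penalty
        victim.update(valid=True, dirty=False, tag=tag, data=l2.get(block_addr))
        return l2_accesses

To put a made-up number on "outrageous": if an L2 access costs something like 20 cycles, a miss that replaces a clean block pays roughly 20 cycles of penalty, while a miss that replaces a dirty block pays roughly 40.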

Design Decision #2: Write Misses: Should You Care?

Again, pretend (without loss of generality) that you're an L1 cache. You get a write request from the processor. Your only obligation to the processor is to make sure that subsequent read requests to this address see the new value rather than the old one. That's it.

If this write request happens to be a hit, you'll handle it according to your write policy (write-back or write-through), as described above. But what if it's a miss?

As long as someone further down the hierarchy hears about this write, you're not actually obligated to personally make room for the data in L1. You can just pass it to the next level without storing it yourself.

(As a side note, it's also possible to refuse to make room for the new data on a read miss. But that requires you to be pretty smart about which reads you want to cache and which reads you want to send to the processor without storing in L1. That's a very interesting question but beyond the scope of this class.)

So you have two basic choices: make room for the new data on a write miss, or don't.

Write-allocate

A write-allocate cache makes room for the new data on a write miss, just like it would on a read miss.

Here's the tricky part: if cache blocks are bigger than the amount of data you're being asked to write, now you have a dilemma. Do you go ask L2 for the data in the rest of the block (which you don't even need yet!), or not? This leads to yet another design decision:

  1. Fetch-on-write: go get the rest of the block from L2, so that the whole block in L1 is valid.
  2. No-fetch-on-write: skip that extra fetch, and instead keep extra metadata recording which pieces of the block actually hold valid data.

In this class, I won't ask you about the sizing or performance of no-fetch-on-write caches. I might ask you conceptual questions about them, though.
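
To make the dilemma concrete, here's a sketch of the two choices for a made-up 16-byte block. The byte-level bookkeeping in the no-fetch-on-write version is just one way to illustrate the extra metadata you'd need, not how any particular cache does it:

    # Sketch of the fetch-on-write decision for a write-allocate cache whose
    # blocks are bigger than one store. The 16-byte block and the byte-level
    # valid set are illustrative assumptions.

    BLOCK_SIZE = 16
    l2 = {}   # block address -> bytes

    def allocate_fetch_on_write(block_addr, offset, value):
        # Ask L2 for the whole block, even though we only want to change one byte.
        data = bytearray(l2.get(block_addr, bytes(BLOCK_SIZE)))
        data[offset] = value
        return {"valid": True, "dirty": True, "data": data}

    def allocate_no_fetch_on_write(block_addr, offset, value):
        # Skip the L2 read; instead, track which bytes of the block are real.
        data = bytearray(BLOCK_SIZE)
        data[offset] = value
        return {"valid": True, "dirty": True, "data": data, "valid_bytes": {offset}}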

No-write-allocate

This is just what it sounds like! If you have a write miss in a no-write-allocate cache, you simply notify the next level down (similar to a write-through operation). You don't kick anything out of your own cache.

Generally, write-allocate makes more sense for write-back caches and no-write-allocate makes more sense for write-through caches, but the other combinations are possible too.
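
As one last sketch, here's roughly how the two typical pairings behave on a store. The flat dictionaries and the policy strings are, once again, just for illustration; eviction and block fetching are left out (see the sketches above):

    # Sketch of the two typical pairings on a store. The flat dictionaries and
    # policy strings are just for illustration; eviction and block fetching are
    # left out.

    l1_data, l1_dirty = {}, set()   # block address -> data, plus dirty bookkeeping
    l2 = {}                         # stand-in for the next level down

    def store(block_addr, data, policy):
        hit = block_addr in l1_data
        if policy == "write-back + write-allocate":
            l1_data[block_addr] = data      # on a miss, this is the "allocate" step
            l1_dirty.add(block_addr)        # L2 only hears about it at eviction time
        elif policy == "write-through + no-write-allocate":
            if hit:
                l1_data[block_addr] = data  # keep our own copy up to date...
            l2[block_addr] = data           # ...but L2 always hears about the write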

Help! Nothing Makes Sense!

Maybe these other sources will help: