So far, we've traced sequences of memory addresses through caches that handle them as follows, if you'll let me anthropomorphize a little bit:
Hit: If the L1 determines that it has Address XXX's data, it sends that data straight back to the processor, and that's the end of the story.
Miss: If the L1 determines that it doesn't have Address XXX's data, then it starts talking to the next level of the hierarchy (probably an L2 cache). The conversation between the L1 and L2 looks a lot like the conversation between the processor and the L1 we've outlined so far.
But eventually, the data makes its way from some other level of the hierarchy to both the processor that requested it and the L1 cache. The L1 cache then stores the new data, possibly replacing some old data in that cache block, on the hypothesis that temporal locality is king and the new data is more likely to be accessed soon than the old data was.
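If it helps to see that read path written down, here's a minimal sketch in Python. Everything in it (the Cache class, a plain dict standing in for the next level of the hierarchy) is invented for illustration; a real cache is a fixed-size hardware structure with sets and tags, not a dictionary.

    class Cache:
        """Toy model of the read path described above."""
        def __init__(self, capacity, next_level):
            self.blocks = {}              # block address -> data currently cached
            self.capacity = capacity      # how many blocks fit in this cache
            self.next_level = next_level  # e.g. a dict standing in for L2/memory

        def read(self, addr):
            if addr in self.blocks:              # hit: just hand the data back
                return self.blocks[addr]
            data = self.next_level[addr]         # miss: go ask the next level
            if len(self.blocks) >= self.capacity:
                # Make room by unceremoniously evicting some old block
                # (here, simply the oldest one we inserted).
                self.blocks.pop(next(iter(self.blocks)))
            self.blocks[addr] = data             # keep a copy, betting on locality
            return data

    memory = {0: "a", 64: "b", 128: "c"}         # pretend main memory
    l1 = Cache(capacity=2, next_level=memory)
    l1.read(0); l1.read(64); l1.read(128)        # the third read evicts block 0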
Throughout this process, we make some sneaky implicit assumptions that are valid for reads but questionable for writes. We will label them Sneaky Assumptions 1 and 2:
Sneaky assumption 1: It's OK to unceremoniously replace old data in a cache, since we know there is a copy somewhere else further down the hierarchy (main memory, if nowhere else).
Sneaky assumption 2: If the access is a miss, we absolutely need to go get that data from another level of the hierarchy before our program can proceed.
Why these assumptions are valid for reads:
Sneaky assumption 1: Bringing data into the L1 (or L2, or whatever) just means making a copy of the version in main memory. If we lose this copy, we still have the data somewhere.
Sneaky assumption 2: If the request is a load, the processor has asked the memory subsystem for some data. In order to fulfill this request, the memory subsystem absolutely must go chase that data down, wherever it is, and bring it back to the processor.
Why these assumptions are questionable for writes:
Sneaky assumption 1: Let's think about the data that's being replaced (the technical term is evicted) when we bring in the new data. If some of the accesses to the old data were writes, it's at least possible that the version of the old data in our cache is inconsistent with the versions in lower levels of the hierarchy. We would want to be sure that the lower levels know about the changes we made to the data in our cache before just overwriting that block with other stuff.
Sneaky assumption 2: If the request is a store, the processor is just asking the memory subsystem to keep track of something -- it doesn't need any information back from the memory subsystem. So the memory subsystem has a lot more latitude in how to handle write misses than read misses.
In short, cache writes present both challenges and opportunities that reads don't, which opens up a new set of design decisions.
More wild anthropomorphism ahead...
Imagine you're an L1 cache (although this discussion generalizes to other levels as well). The processor sends you a write request for address XXX, whose data you're already storing (a write hit). As requested, you modify the data in the appropriate L1 cache block.
Oh no! Now your version of the data at Address XXX is inconsistent with the version in subsequent levels of the memory hierarchy (L2, L3, main memory...)! Since you care about preserving correctness, you have only two real options:
Option 1: Write-through. You and L2 are soulmates. Inconsistency with L2 is intolerable to you. You feel uncomfortable when you and L2 disagree about important issues like the data at Address XXX. To deal with this discomfort, you immediately tell L2 about this new version of the data.
Option 2: Write-back. You have a more hands-off relationship with L2. Your discussions are on a need-to-know basis. You quietly keep track of the fact that you have modified this block. If you ever need to evict the block, that's when you'll finally tell L2 what's up.
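Here's how those two options look on a write hit, in the same toy Python style. The class names and the dirty set are made up for illustration; the dirty set stands in for per-block dirty bits, and a plain dict again stands in for L2.

    class WriteThroughCache:
        """Option 1: every write hit is immediately propagated to L2."""
        def __init__(self, next_level):
            self.blocks = {}
            self.next_level = next_level

        def write_hit(self, addr, data):
            self.blocks[addr] = data         # update our copy...
            self.next_level[addr] = data     # ...and immediately tell L2 too

    class WriteBackCache:
        """Option 2: L2 only hears about the write when the block is evicted."""
        def __init__(self, next_level):
            self.blocks = {}
            self.dirty = set()               # blocks that disagree with L2
            self.next_level = next_level

        def write_hit(self, addr, data):
            self.blocks[addr] = data         # update our copy only
            self.dirty.add(addr)             # quietly note that L2 is now stale

        def evict(self, addr):
            if addr in self.dirty:           # only now does L2 find out
                self.next_level[addr] = self.blocks[addr]
                self.dirty.discard(addr)
            del self.blocks[addr]

The only real difference is when L2 hears about the store: on every single write (write-through), or only when the block is finally evicted (write-back).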
With write-through, every time you see a store instruction, that means you need to initiate a write to L2. In order to be absolutely sure you're consistent with L2 at all times, you need to wait for this write to complete, which means you need to pay the access time for L2.
What this means is that a write hit actually acts like a miss, since you'll need to access L2 (and possibly other levels too, depending on what L2's write policy is and whether the L2 access is a hit or miss).
This is no fun and a serious drag on performance.
Instead of sitting around until the L2 write has fully completed, you add a little bit of extra storage to L1 called a write buffer. The write buffer's job is to keep track of all the pending updates to L2, so that L1 can move on with its life.
This optimization is possible because those write-through operations don't actually need any information from L2; L1 just needs to be assured that the write will go through.
However, the write buffer is finite -- we're not going to be able to just add more transistors to it if it fills up. If the write buffer does fill up, then L1 actually will have to stall and wait for some writes to go through.
The bottom line: from a performance perspective, we'll treat a write hit to a write-through cache like a read hit, as long as the write buffer has available space. When the write buffer is full, we'll treat it more like a read miss (since we have to wait to hand the data off to the next level of cache).
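A toy cycle-count model makes that bottom line concrete. The latencies and buffer size below are invented numbers, not anything from real hardware.

    from collections import deque

    L1_TIME, L2_TIME = 1, 10     # made-up access times, in cycles
    BUFFER_SIZE = 4              # made-up write buffer capacity

    write_buffer = deque()       # pending updates on their way to L2

    def write_hit_cost(addr, data):
        """Cycles L1 spends on a write hit in a write-through cache."""
        if len(write_buffer) >= BUFFER_SIZE:
            # Buffer full: stall until the oldest pending write drains to L2.
            write_buffer.popleft()
            cost = L1_TIME + L2_TIME         # behaves like a miss
        else:
            cost = L1_TIME                   # behaves like a read hit
        write_buffer.append((addr, data))    # hand the write off and move on
        return cost

    # In real hardware the buffer also drains in the background whenever the
    # L2 port is free; that part isn't modeled here.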
With a write-back cache, on the other hand, as long as we're getting write hits to a particular block, we don't tell L2 anything. Instead, we just set a bit of L1 metadata (the dirty bit -- technical term!) to indicate that this block is now inconsistent with the version in L2.
So everything is fun and games as long as our accesses are hits. The problem is whenever we have a miss -- even if it's a read miss -- and the block that's being replaced is dirty.
Whenever we have a miss to a dirty block and bring in new data, we actually have to make two accesses to L2 (and possibly lower levels): one to write the evicted dirty block back to L2, so its updates aren't lost, and one to fetch the new block we actually missed on.
What this means is that some fraction of our misses -- the ones that overwrite dirty data -- now have this outrageous double miss penalty.
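Here's what that looks like in the toy model: a miss first writes back the dirty victim, then fetches the new block, so it touches L2 twice. The access counter is only there to make the double penalty visible; as before, the names are invented.

    class WriteBackCache:
        def __init__(self, capacity, next_level):
            self.blocks = {}
            self.dirty = set()
            self.capacity = capacity
            self.next_level = next_level
            self.next_level_accesses = 0     # counts trips to L2

        def _install(self, addr):
            """Handle a miss on addr: evict if necessary, then fetch the block."""
            if len(self.blocks) >= self.capacity:
                victim = next(iter(self.blocks))     # pick a block to evict
                if victim in self.dirty:
                    # Trip #1: write the dirty victim back to L2 first,
                    # otherwise its updates would be lost forever.
                    self.next_level[victim] = self.blocks[victim]
                    self.next_level_accesses += 1
                    self.dirty.discard(victim)
                del self.blocks[victim]
            # Trip #2 (or #1, if the victim was clean): fetch the new block.
            self.blocks[addr] = self.next_level[addr]
            self.next_level_accesses += 1

        def read(self, addr):
            if addr not in self.blocks:      # note: even a *read* miss can
                self._install(addr)          # force a write-back of the victim
            return self.blocks[addr]

        def write(self, addr, data):
            if addr not in self.blocks:      # (write-allocate; more on that below)
                self._install(addr)
            self.blocks[addr] = data
            self.dirty.add(addr)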
Again, pretend (without loss of generality) that you're an L1 cache. You get a write request from the processor. Your only obligation to the processor is to make sure that any subsequent read requests to this address see the new value rather than the old one. That's it.
If this write request happens to be a hit, you'll handle it according to your write policy (write-back or write-through), as described above. But what if it's a miss?
As long as someone hears about this data, you're not actually obligated to personally make room for it in L1. You can just pass it to the next level without storing it yourself.
(As a side note, it's also possible to refuse to make room for the new data on a read miss. But that requires you to be pretty smart about which reads you want to cache and which reads you want to send to the processor without storing in L1. That's a very interesting question but beyond the scope of this class.)
So you have two basic choices: make room for the new data on a write miss, or don't.
A write-allocate cache makes room for the new data on a write miss, just like it would on a read miss.
Here's the tricky part: if cache blocks are bigger than the amount of data requested, now you have a dilemma. Do you go ask L2 for the data in the rest of the block (which you don't even need yet!), or not? This leads to yet another design decision: a fetch-on-write cache goes and gets the rest of the block from L2 right away, while a no-fetch-on-write cache skips that fetch and instead has to keep track of which parts of the block actually hold valid data.
In this class, I won't ask you about the sizing or performance of no-fetch-on-write caches. I might ask you conceptual questions about them, though.
No-write-allocate is just what it sounds like! If you have a write miss in a no-write-allocate cache, you simply notify the next level down (similar to a write-through operation). You don't kick anything out of your own cache.
Generally, write-allocate makes more sense for write-back caches and no-write-allocate makes more sense for write-through caches, but the other combinations are possible too.
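To wrap up, here's the write-miss decision in the same toy style. The block is modeled as a list of words this time, so the fetch in the write-allocate case actually has a point: the store only touches one word, but the block holds several. The write-allocate version below is fetch-on-write; a no-fetch-on-write variant would skip that fetch and instead track which words of the block are valid. All the names are invented for illustration.

    BLOCK_WORDS = 16   # made-up block size, in words

    class WriteAllocateCache:
        """Write-back + write-allocate: a write miss brings the block in."""
        def __init__(self, next_level):
            self.blocks = {}                # block number -> list of words
            self.dirty = set()
            self.next_level = next_level    # dict: block number -> list of words

        def write(self, block, offset, value):
            if block not in self.blocks:
                # Miss: fetch the whole block first (fetch-on-write), since
                # the store only overwrites one word of it.
                self.blocks[block] = list(self.next_level[block])
            self.blocks[block][offset] = value   # apply the write locally
            self.dirty.add(block)                # L2's copy is now stale

    class NoWriteAllocateCache:
        """Write-through + no-write-allocate: a write miss bypasses L1."""
        def __init__(self, next_level):
            self.blocks = {}
            self.next_level = next_level

        def write(self, block, offset, value):
            if block in self.blocks:
                self.blocks[block][offset] = value   # hit: update our copy too
            self.next_level[block][offset] = value   # either way, pass it along

Note how the usual pairings fall out naturally here: the write-allocate cache is already tracking dirty blocks (write-back), while the no-write-allocate cache is already forwarding every write (write-through).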