
1

ECE 587 Advanced Computer Architecture I

Chapter 10: Cache Coherence in Multiprocessor Architecture

Herbert G. Mayer, PSU

Status 7/15/2015

2

Syllabus

Definitions
write-once Policy and the MESI Protocol
MOESI Extension
MESIF Extension
Life and Fate of a Cache Line: 7 MESI Scenarios
  Scenario 1: A reads line, then B reads same line
  Scenario 2: A writes line once, then B reads same line
  Scenario 3: A writes line multiple times, then B reads same line
  Scenario 4: A reads line, then B writes same line w/o reading
  Scenario 5: A reads + writes line, then B writes same line
  Scenario 6: A writes line repeatedly, then B writes w/o reading
  Scenario 7: A and B have read the same line, then B writes to it
Differences in Pentium Pro processor L2 cache management
Bibliography

3

Goals and Context

We discuss shared-memory MP systems, in which each processor has a second-level (L2) cache in addition to an internal first-level (L1) cache

Many discussions are specific to the cache implementation on the Intel® Pentium® processor family, with focus on the MESI protocol

Reference point is a text by MindShare Inc., literature reference [1]

The 7 case studies are taken from [1]

MESI is used on the Pentium Pro processor; the acronym stands for: MESI = modified, exclusive, shared, invalid

4

Cache Coherence Problem

The problem addressed and solved with the MESI protocol is the coherence of memory and caches on shared-memory MP systems

Even in a UP system, in which the processor has a data cache, memory and cache must have coherent data; if not, a mechanism is needed to ensure that at critical moments data integrity is re-established

Stale memory should be short-term and must be handled safely

5

Cache Coherence Problem

On shared-memory MP architectures this problem of data consistency (cache coherence) is magnified by the number of processors sharing memory

. . . and by the fact that without an additional L2 cache, performance could drop noticeably

In an MP system with N processors and 2 levels of cache there can be N*2+1 copies of the same data

The +1 stems from the original copy of the data in memory. These copies must all be coherent
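For example, with N = 4 processors, each having an L1 and an L2 cache, one paragraph can exist in up to 4*2+1 = 9 places at the same time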

6

Definitions

7

Definitions

Allocate-on-Write

If a store instruction experiences a cache miss, and as a result a cache line is filled, then the allocate-on-write cache policy is used

If the write miss causes the paragraph from memory to be streamed into a data cache line, we say the cache uses allocate-on-write

Pentium processors, for example, do not use allocate-on-write

Antonym: write-by

8

Definitions

Back-Off

If processor P1 issues a store to a data address shared with another processor P2, and P2 has cached and modified the same data, a chance for data inconsistency arises

To avoid this, the cache with the modified P2 line must snoop for all accesses, read or write, to guarantee delivery of the newest data

Once the snoop detects the access request from P1, P1 must be prevented from getting ownership of the data; this is accomplished by temporarily preventing bus access

This bus denial for the sake of preserving data integrity is called back-off

9

Definitions

Blocking Cache

Let a cache miss result in streaming-in of a line

If during that stream-in no more accesses can be made to this cache until the data transfer is complete, then this cache is called blocking

Antonym: non-blocking

Generally, a blocking cache yields lower performance than a non-blocking one

10

Definitions

Bus Master

Only one of the devices connected to a system bus has the right to send signals across the bus; this ownership is called being the bus master

Initially the Memory and IO Controller (MIOC) is the bus master; it also is possible that a chipset includes a special-purpose bus arbiter

Over time, all processors, and for the processors their caches, request to become bus master for some number of bus cycles

The MIOC can grant this right, yet each of the processors (more specifically: its cache) can request a back-off, even if otherwise the right to be bus master would be granted

11

Definitions

Directory

The collection of all tags is referred to as the cache directory

In addition to the directory and the actual data there may be further overhead bits in a data cache

Dirty Bit

The dirty bit is a data structure associated with a cache line. This bit expresses whether a write hit has occurred on a system applying write-back

Synonym: Modified bit

There may be further overhead bits in a data cache

12

Definitions

Invalid

State in the MESI protocol

This I state (possibly implemented via a special-purpose bit) indicates that the associated cache line is invalid, and consequently holds no valid data

It is desirable to have I lines: allows the stream-in of a paragraph without evicting another cache line

The Invalid (I) state is always set after a system reset

13

Definitions

Exclusive

State in the MESI protocol. The E state indicates that the current cache is not aware of any other cache sharing the same information, and that the line is unmodified

E allows that in the future another cache line may contain a copy of the same information, in which case the E state must transition to another state

It is possible that a higher-level cache (L1, for example, viewed from an L2) may actually have a shared copy of the line in exclusive state; however, that level of sharing is transparent to other potentially sharing agents outside the current processor

14

Definitions

MESI

Acronym for Modified, Exclusive, Shared and Invalid

This is an ancient protocol to ensure cache coherence on the family of Pentium processors. A protocol is necessary if multiple processors have copies of common data with the right to modify

Through the MESI protocol data coherence is ensured no matter which of the processors performs writes

AKA the Illinois protocol due to its origin at the University of Illinois at Urbana-Champaign

15

Definitions

Modified

State in the MESI protocol

The M state implies that the cache line found by a write hit was exclusive, and that the current processor has modified the data

The modified state expresses: currently not shared, exclusively owned data have been modified

In a UP system, this is generally expressed by the dirty bit

16

Definitions

Paragraph

Conceptual, aligned, fixed-size area of the logical address space that can be streamed into the cache

The holding area in the cache of paragraph size is called a line

In addition to the actual data, a line in cache has further information, including the dirty and valid bit (in UP systems), the tag, LRU information, and in MP systems the MESI bits

The MESI M state corresponds to the dirty bit in a UP system

17

Definitions

Shared

State in the MESI protocol

The S state expresses that the hit line is present in more than one cache. Moreover, the current cache (with the shared state) has not modified the line after stream-in

Another cache of the same processor may be such a sharing agent. For example, in a two-level cache, the L2 cache will hold all data present in the L1 cache

Similarly, another processor's L2 cache may share data with the current processor's L2 cache

18

Definitions

Snarfing scenario 1: c has a modified line, a wants to read

A snooping cache c detects that another bus agent a wants to read some paragraph into a's cache line, of which c has a modified M copy

M of the MESI protocol implies exclusivity; no other cache will have a copy at this time

Instead of c 1.) causing a to back-off, 2.) then c streaming-out the line, 3.) a streaming-in that written paragraph, and 4.) both a and c ending up in S state, snarfing does the following:

c streams-out the line and switches to S, but does not cause a to back-off. Instead, a reads the line from the bus during the stream-out process. This saves a full memory access and saves the back-off delay

19

Definitions

Snarfing scenario 2: c has a modified line, a wants to write

A snooping cache c detects that another bus agent a wants to stream-in a paragraph due to allocate-on-write, of which c has a modified M copy. Note: a does not have a copy, but uses allocate-on-write, hence makes it known that the line will be streamed-in and then modified, once present

Instead of c 1.) causing a to back-off, 2.) then streaming-out the line, 3.) switching to I invalid, and 4.) letting a stream-in the line and modify it, snarfing does the following:

c streams-out the line, c switches to I, but does not cause a to back-off. Instead, a reads the line directly from the bus during the stream-out process. This saves a complete memory access, and saves the back-off delay

Now a has the modified copy (modified by c), as does memory, and E is the proper state for a. Now a can further modify the line, resulting in a state transition from E to M. c no longer holds the line
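A minimal C sketch contrasting the two snarfing scenarios above (pure state bookkeeping; the bus transfer and the data themselves are not modeled):

#include <stdio.h>

typedef enum { I, S, E, M } Mesi;

/* Cache c holds the line in M and snoops a request from agent a.
 * Instead of forcing a back-off, c streams the line out and a snarfs it
 * from the bus; only the resulting states differ between the scenarios. */
void snarf(Mesi *c_state, Mesi *a_state, int a_will_write)
{
    /* c streams out the modified line; memory is updated as a side effect */
    if (a_will_write) {          /* scenario 2: a uses allocate-on-write   */
        *c_state = I;            /* c gives up the line                    */
        *a_state = E;            /* a owns the only cached copy, can go M  */
    } else {                     /* scenario 1: a only wants to read       */
        *c_state = S;
        *a_state = S;            /* both keep identical, unmodified copies */
    }
}

int main(void)
{
    Mesi c = M, a = I;
    snarf(&c, &a, 0);  printf("read:  c=%d a=%d (1==S)\n", c, a);
    c = M; a = I;
    snarf(&c, &a, 1);  printf("write: c=%d a=%d (0==I, 2==E)\n", c, a);
    return 0;
}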

20

Definitions

Snooping

After a line write hit in a cache using write-back, the data in cache and memory are no longer identical. In accordance with the write-back policy, memory will be written eventually, but until then memory is stale

The modifier (the cache that wrote) must pay attention to other bus masters trying to access the same line. If this is detected, action must be taken to ensure data integrity

This paying attention is called snooping. The right action may be forcing a back-off, or snarfing, or yet something else that ensures data coherence

Snooping starts with the lowest-order cache, here the L2 cache. If appropriate, L2 lets L1 snoop for the same address, because L1 may have further modified the line

21

Definitions

Squashing

Starting with a read-miss:

In a non-blocking cache, a subsequent memory access may be issued after a read-miss, even if that previous miss results in a stream-in that is still under way

That subsequent memory access will be a miss again, which is queued. Whenever an access references an address for which a request is already outstanding, the duplicate request to stream-in can be skipped

Not entering this in the queue is called squashing

The second and any further outstanding memory accesses can be resolved once the first stream-in results in the line being present in the cache
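A minimal C sketch of squashing in a queue of outstanding misses (the structure and sizes are illustrative, not the Pentium Pro's actual implementation):

#include <stdbool.h>
#include <stdio.h>

#define MAX_OUTSTANDING 8
#define LINE_SHIFT 5            /* assume 32-byte paragraphs/lines */

static unsigned long pending[MAX_OUTSTANDING];  /* line addresses awaiting stream-in */
static int n_pending = 0;

/* Record a read miss; return true if a new stream-in request is issued,
 * false if the request is squashed because the line is already outstanding. */
bool record_miss(unsigned long addr)
{
    unsigned long line = addr >> LINE_SHIFT;
    for (int i = 0; i < n_pending; i++)
        if (pending[i] == line)
            return false;               /* squash: duplicate outstanding miss */
    if (n_pending < MAX_OUTSTANDING)
        pending[n_pending++] = line;    /* queue a new stream-in */
    return true;
}

int main(void)
{
    printf("%d\n", record_miss(0x1000));  /* 1: new miss, stream-in queued */
    printf("%d\n", record_miss(0x1008));  /* 0: same line, squashed        */
    printf("%d\n", record_miss(0x2000));  /* 1: different line, queued     */
    return 0;
}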

22

Definitions

Strong Write Order

A policy ensuring that memory writes occur in the same order as the store operations in the executing object code

Antonym: Weak order

The advantage of weak ordering can be a speed gain, allowing a compiler or cache policy to schedule instructions out of order; this requires some other policy to ensure data integrity

23

Definitions

Stream-In

The movement of a paragraph from memory into a cache line

Since line length generally exceeds the bus width (i.e. exceeds the number of bytes that can be moved in a single bus transaction), a stream-in process requires multiple bus transactions

It is possible that the byte actually needed arrives last (or first) in a cache line during a sequence of bus transactions

Antonym: Stream-out

24

Definitions

Stream-Out

The movement of one line of modified data from cache into a memory paragraph

Antonym: Stream-in

Note that unmodified data don't need to be streamed-out from cache to memory; they are already present in memory

25

Definitions

Weak Write Order

A memory-write policy allowing (a compiler or cache) that memory writes may occur in a different order than their originating store operations

Antonym: Strong Write Order

The advantage of weak ordering is potential speed gain

26

Definitions

write-back

Cache write policy that keeps a line of data (a paragraph) in the cache even after a write

The changed state must be remembered via the dirty bit, AKA Modified state

Memory is temporarily stale in such a case

Upon retirement (eviction), any dirty line must be copied back into memory; this is called write-back

Advantage: only one stream-out, no matter how many write hits occurred to that same line

27

Definitions

write-by

Cache write policy in which the cache is not accessed on a write miss, even if there are cache lines in I state

A cache using write-by "hopes" that soon there may be a load, which will result in a miss and then stream-in the appropriate line; if not, it was not necessary to stream-in the line in the first place

Antonym: allocate-on-write

28

Definitions

write-once

Cache write policy that starts out as write-through and changes to write-back after the first write hit to a line

Typical policy imposed onto a higher-level L1 cache by the L2 cache

Advantage: the L1 cache places no unnecessary traffic onto the system bus upon a cache-write hit

The lower-level L2 cache can remember that a write has occurred by setting the MESI state to modified

29

Definitions

write-through

Cache write policy that writes data to memory upon a write hit. Thus, cache and main memory are in synch

Disadvantage: repeated memory access traffic on the bus

30

write-once Policy in the MESI Protocol

31

Introduction to write-once

The MESI protocol is one implementation of enforcing data integrity among caches sharing data; the write-once write policy is a method to keep the protocol performing efficiently by avoiding superfluous data traffic on the system bus

First we'll discuss write-once, then the MESI protocol

We'll also mention the MOESI protocol and the MESIF protocol, but the focus is MESI

32

write-once Policy

write-through has the advantage of keeping cache and memory continually consistent. Drawback is the added traffic placed on the system bus

write-back has the advantage of postponing unnecessary bus traffic until the last possible moment and doing so just once, even if many writes to a cache line occurred. The drawback is the temporary inconsistency between cache line and memory

To avoid catastrophe, a dirty bit must mark the fact that at least one write happened to an exclusively owned line

write-once combines the advantages of both. For efficiency, multi-level caches generally use write-back for L2. write-once is a refinement used for L1 caches. Both use the MESI protocol to preserve data consistency

33

write-once Policy

In write-once, L1 starts out using write-through, and any line shared with an L2 cache is marked S, for shared. The corresponding copy in L2 is also marked S

If a write hit occurs, the modified data are written through from the L1 to the L2 cache; L2 in turn marks its line as modified, M

This transition is used by L2 to cause L1's write policy to change from write-through to write-back. Also, the L1 line is marked E, for exclusive

May look strange, but is safe, since the M information is recorded in the L2 cache, the first to initiate snooping
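A minimal C sketch of the L1 side of the write-once policy just described (the L2 hook is a placeholder; this is a reading of the text above, not the actual Pentium control logic):

#include <stdio.h>

typedef enum { I, S, E, M } MesiState;

typedef struct {
    MesiState state;
    int write_back_mode;   /* 0 = write-through (initial), 1 = write-back */
} L1Line;

/* hypothetical hook: forward the write to the L2 cache */
static void write_through_to_l2(unsigned long addr) { printf("L2 sees write to %#lx\n", addr); }

/* L1 behavior on a write hit under the write-once policy */
void l1_write_hit(L1Line *line, unsigned long addr)
{
    if (!line->write_back_mode) {
        /* first write: write through, L2 marks its copy M,
         * L1 switches to write-back and goes to E */
        write_through_to_l2(addr);
        line->state = E;
        line->write_back_mode = 1;
    } else {
        /* later writes stay local: E -> M, L2 is not updated again */
        line->state = M;
    }
}

int main(void)
{
    L1Line line = { S, 0 };          /* line streamed in, shared with L2           */
    l1_write_hit(&line, 0x1000);     /* first write hit: goes to E, write-back mode */
    l1_write_hit(&line, 0x1004);     /* second write hit: goes to M, no bus traffic */
    printf("final L1 state: %d (3 == M)\n", line.state);
    return 0;
}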

34

write-once Policy

Subsequently, when the same processor modifies the same data further, the L1 cache experiences a write hit

This time, however, the L1 cache is in write-back mode, and changes from E to M

L1 does not change the L2 cache again. Further writes keep the L1 state in M; L2 doesn't see further writes

Of two lines with the same paragraph addresses, the one in the L1 cache is more current than the one in L2

Both record the M state; both may have different data

35

write-once Policy

When another processor issues a read from the same paragraph address, the L2 cache with a modified copy of that line snoops and asks the other to back-off. As a result, the line will be written back into memory

First, however, L2 must check if L1 has modified the data as well

This is detectable by the M state of L1; if so, the data are first flushed from L1 to L2, and then to memory

Finally, both L1 and L2 change to S. Otherwise, if the L1 cache has not further modified the data, indicated by E, L1 and L2 are already in synch, only the L2 line needs to be written to memory, and both L1 and L2 transition to S

36

MESI Protocol Detail

37

MESI Protocol

On a Pentium family processor, each cache line may have its own write policy, independent of other lines even in the same set

The complete, total state of a cache line therefore is expressed in the write policy used and the MESI state bits associated with each line

These bits are: M for modified, E for exclusive, S for shared, and I for invalid. Initially a line holds no information, so its state is I
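As a rough illustration of this per-line state, a line in such a cache might be modeled as below (field names and sizes are illustrative, not the Pentium's actual layout):

#include <stdint.h>
#include <stdio.h>

typedef enum { MESI_I, MESI_S, MESI_E, MESI_M } MesiState;

/* One data-cache line: the cached paragraph plus the per-line control state. */
typedef struct {
    uint32_t  tag;          /* which paragraph this line holds           */
    MesiState mesi;         /* M, E, S or I; all lines are I after reset */
    uint8_t   write_back;   /* per-line write policy: 1 = write-back     */
    uint8_t   data[32];     /* the cached paragraph itself               */
} CacheLine;

int main(void)
{
    CacheLine line = { 0, MESI_I, 0, {0} };   /* state after system reset */
    printf("line state after reset: %d (0 == I)\n", line.mesi);
    return 0;
}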

38

MESI Protocol

39

MESI Protocol

During system reset, the MESI bits for the Pentium's L1 and L2 caches are set to I

This marks all lines empty and will force any cache read or write access to miss. At the first read, lines will be streamed into L2 and a portion of those streamed into L1

Since L1 has a copy of what is in L2, L1 is set to S and L2 to E

E holds as long as no other processor's cache shares the same data. This transition is shown in the figure below

Note in the coming example: the other processor B has not yet made any data accesses, hence its cache lines remain I

40

MESI Protocol Generic Diagram

[Figure: Initial Read of cache line into A after reset, B inactive. Processor A's L1 Cache: I changes to S (shared with L2, prepare for write-once); Processor A's L2 Cache: I changes to E (no other processor uses this line). Processor B's L1 and L2 Caches: no activity, set to I (invalid) after reset. All caches are connected via the System Bus.]

41

MESI States

The table below explains the 4 MESI states. Note that both caches have MESI bits, and both use write-back most of the time on the Pentium family

MESI State Description

Modified The line has been changed by the cache; a store has taken place. It is implied that the data are exclusive. M alerts the cache to take action on a snoop hit. In that case the line is written back and the state is adjusted, generally to Shared. Alternatively, the cache can snarf. Done on Pentium® Pro, not on Pentium.

Exclusive The line is owned by just this processor. Except for memory, a shared copy may only exist in a higher order cache of the same processor. For example, L2 may label a line E that exists also in its L1. But no other processor’s cache holds a copy of this same line.

Shared The line is present in at least one other cache, maybe in several. However, all of these are identical copies. No other line with these data has performed a write. Shared implies unmodified.

Invalid The line is not valid in this cache. Typical state after system reset, and thus the line is ready for receiving new data.

42

MESI States

Every cache line will be in one of these 4 states

State is influenced by the owning processor's actions (loads and stores) or by a bus snoop, when another processor addresses the same line

The latter is supported by special pins and connections (lines) between L2 caches on the Pentium processor

These lines are HIT# (for: cache hit) and HITM# (for: cache hit modified)
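A minimal C sketch of how a snooped bus access might change a line's MESI state, condensed from the descriptions above (write-back cache assumed; the HIT#/HITM# signalling and the bus interface are omitted, and write_back_line is a placeholder):

#include <stdio.h>

typedef enum { I, S, E, M } Mesi;

/* placeholder for streaming a modified line back to memory */
static void write_back_line(void) { puts("stream-out modified line"); }

/* New state of a line when another bus master touches the same address.
 * snoop_write != 0 means the other master writes (or signals intent to write). */
Mesi on_snoop_hit(Mesi current, int snoop_write)
{
    if (snoop_write) {                    /* our copy becomes stale: invalidate  */
        if (current == M) write_back_line();
        return I;
    }
    switch (current) {                    /* snooped read: keep a shared copy    */
    case M: write_back_line();            /* supply fresh data to memory first   */
            return S;
    case E: return S;                     /* someone else now shares the line    */
    default: return current;              /* S stays S, I stays I                */
    }
}

int main(void)
{
    printf("%d\n", on_snoop_hit(M, 0));   /* 1 == S after write-back */
    printf("%d\n", on_snoop_hit(E, 1));   /* 0 == I                  */
    return 0;
}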

43

MOESI Extension

The MOESI protocol is based on MESI, but offers a 5th state: O for Owned

O implies modified and shared; the goal of the processor that has modified the line is to defer the eventual, mandatory write-back

To ensure data integrity with the sharing processor, direct cache-to-cache data transfer is initiated

So done on AMD64; see detail below from AMD

44

MOESI Extension

Owned

A cache with some line in state O is a sharing cache with a valid copy, but has the exclusive right to modify

It broadcasts those changes to all other sharing caches

The Owned state allows dirty sharing of data, i.e., a modified cache block can be moved around various caches without updating main memory

Cache line state may be changed from O to M after invalidating all shared copies, implying exclusiveness

Or it may transition from O to S by writing the modifications back to main memory

A line in O must respond to snoop requests
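A minimal sketch of the two exits from the Owned state described above (the bus operations are placeholders, not AMD64's actual interface):

#include <stdio.h>

typedef enum { INVALID, SHARED, EXCLUSIVE, OWNED, MODIFIED } MoesiState;

/* placeholder bus operations */
static void invalidate_all_sharers(void) { puts("invalidate shared copies"); }
static void write_back_to_memory(void)   { puts("write modifications to memory"); }

/* Leave the Owned state, either toward M or toward S, as described above. */
MoesiState leave_owned(MoesiState s, int want_exclusive)
{
    if (s != OWNED) return s;
    if (want_exclusive) {
        invalidate_all_sharers();     /* O -> M: line is exclusive again   */
        return MODIFIED;
    } else {
        write_back_to_memory();       /* O -> S: memory is now up to date  */
        return SHARED;
    }
}

int main(void)
{
    printf("%d\n", leave_owned(OWNED, 1));   /* 4 == MODIFIED */
    printf("%d\n", leave_owned(OWNED, 0));   /* 1 == SHARED   */
    return 0;
}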

45

MOESI Extension, References

1. http://en.wikipedia.org/wiki/MOESI_protocol

2. http://www.revolvy.com/main/index.php?s=MOESI%20protocol

3. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0425/ch03s12s01.html

46

MESIF Extension

The MESIF protocol was developed by Intel for NUMA architectures, with the usual 4 states of MESI, but one added state F, for forward

F is a shared S state variation, stating this cache must act as the designated responder for line requests

The protocol ensures that, if any cache holds a line in the S state, at most one other cache holds it in F

But multiple others may hold the line in S

NUMA: non-uniform memory access. Memory design on MP systems, in which memory access time depends on location; local memory is fast, other memory slow

47

MESIF Extension Detail

With the MESI protocol, a cache line request received by multiple caches holding an S line is serviced inefficiently. It may either be satisfied from slow main memory, or all sharing caches could respond, flooding the requestor

With MESIF, a cache line request is serviced only by the cache in the F state. This allows the requestor to receive a copy at cache-to-cache speeds, like in the MOESI protocol, while minimizing multicast packets

Because a cache may unilaterally invalidate a line in S or F, it is possible that no cache has a copy in the F state, even though copies in the S state exist

In that case, a request for the line is resolved by slow streaming-in, not via fast cache-to-cache traffic

F can be viewed as a virtual token being passed around: to minimize the chance of an F line being discarded, the most recent requestor of a line receives F; when a cache in state F responds, it hands over the F token to the new cache, saving the stream-in

48

MESIF Extension Detail

The key difference from the MESI protocol is that a line streamed in for reading results in F. The only way to enter the S state is to satisfy a read request from another cache

There are other techniques for satisfying read requests from shared caches while suppressing redundant replies, but having only a single designated cache respond makes it easier to invalidate all copies

And if only one cache is left, it transitions to E
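A minimal sketch of the F-token hand-over described above (state names and the function are illustrative; a real MESIF responder also supplies the data, which is not modeled here):

#include <stdio.h>

typedef enum { I_ST, S_ST, E_ST, M_ST, F_ST } MesifState;

/* When a read request arrives, only the cache holding the line in F responds;
 * it hands the F "token" to the requestor and keeps a plain shared copy. */
void service_read(MesifState *responder, MesifState *requestor)
{
    if (*responder == F_ST) {
        *requestor = F_ST;   /* most recent requestor becomes the designated responder */
        *responder = S_ST;   /* old holder keeps an ordinary shared copy               */
    }
}

int main(void)
{
    MesifState cache_a = F_ST, cache_b = I_ST;
    service_read(&cache_a, &cache_b);
    printf("A=%d (1==S), B=%d (4==F)\n", cache_a, cache_b);
    return 0;
}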

49

MESIF Extension, References

1. http://en.wikipedia.org/wiki/MESIF_protocol

2. http://www.revolvy.com/main/index.php?s=MESIF%20protocol&item_id=3737582

50

Life & Fate of a Cache Line: 7 MESI Scenarios

51

MESI Scenarios

This section shows typical state transitions of the L1 and L2 caches, each transition characterized by the respective title

We ignore MOESI and MESIF

A bulleted list explains the initial state, the figure shows the transition, and a trailing bulleted list highlights the key points as a result of the transition

First we show 3 situations, in which processor B wishes to read a line of which processor A has a copy. Processor A has performed 0, 1, or more writes to that line

52

MESI Scenarios

Table of 3 Scenarios: B Reads some Line That is Also in A

Scenario 1: A holds the line from memory but has not modified it. A L1 state: S, A L2 state: E

Scenario 2: A has written the line once, using write-through. A L1 state: E, A L2 state: M

Scenario 3: A has written the same line more than once; uses write-back write policy after the first write. A L1 state: M, A L2 state: M

53

MESI Scenarios

Next follow 4 situations, in which processor B writes to a line of which A has a copy

Again, A has modified the line 0, 1 or more times, and B has read the shared data in one case before it attempts the write

Assume here the caches use write-by, NOT allocate-on-write

54

MESI Scenarios

Table of 4 more Scenarios: B Writes some Line That is Also in A

Scenario 4: A holds a memory paragraph but has not modified it. Then B writes to that same line. A L1 state: S, A L2 state: E

Scenario 5: A has read a line, then writes the same line once, L1 using write-through. Then B writes to that same line. A L1 state: E, A L2 state: M

Scenario 6: A has written some line more than once; L1 uses write-back write policy after the first write. Then B writes to the same address. A L1 state: M, A L2 state: M

Scenario 7: A and B have read the same paragraph, then B writes to that same line. A L1 state: S, A L2 state: S

55

Scenario 1

Scenario 1: A reads line, then B reads same line

Initial state and actions taken:

1. A has read a paragraph, placed it into a line, but not modified it
2. A's L1 is in S state
3. A's L2 is in E state, i.e. no other processor has a copy, yet its own L1 cache does have a copy; but that is transparent to other processors
4. See figure: "Initial Read of cache line into ..."
5. B has not read any data at all, thus B's L1 and L2 are both in I
6. B next intends to read the same line

56

Scenario 1

[Figure: A Reads Line, then B Reads Same Line. Processor A's L1 Cache: remains S (no snoop needed); Processor A's L2 Cache: E changes to S (snoop detects read by other master, CHIT# asserted). Processor B's L1 Cache: I changes to S; Processor B's L2 Cache: I changes to S (copy in L2). All caches are connected via the System Bus.]

57

Scenario 1

A's L2 snoops and detects a snoop hit on read

A's L2 does not request back-off, since the request by the other bus master is for read and its own state is E, not M

A's L1 is in S state; according to the write-once policy it stays S

A's L2 transitions from E to S. It is aware another copy will soon exist, in processor B

The whole line is streamed into B's L2 and then into the L1 cache

B's L1 transitions from I to S

B's L2 transitions from I to S; since A holds a copy of that same paragraph, B's L2 state cannot be E

4 lines have copies of the memory paragraph, none modified

58

Scenario 2

Scenario 2: A writes line once, then B reads same line

Initial state and actions taken, snarfing NOT used here yet:

1. A has read a line, but not modified it
2. A's L1 is in S and L2 in E state, since no other processor's cache has a copy
3. B has not read any data at all, thus B's L1 and L2 are in I
4. A now writes the line, experiencing a write hit!
5. A's L1 transitions to E, switches to write-back due to write-once
6. A's L2 transitions from E to M; the new data are in L2, not in memory; memory is stale
7. B now intends to read the same line, is still in I state

59

Scenario 2

[Figure: A Writes Line Once, then B Reads Same Line. Processor A's L1 Cache: E changes to S (L1 writes back); Processor A's L2 Cache: M changes to S (snoop detects read by other bus master, CHIT# and CHITM# asserted, INV sampled low). Processor B's L1 Cache: I changes to S; Processor B's L2 Cache: I changes to S (copy in L2). All caches are connected via the System Bus.]

60

Scenario 2

B's L1 and L2 experience a read miss; L2 sends a read request to the bus

A's L2 snoops, sees a read snoop hit, and forces B to back-off

A's L1 notices due to the E state that it already has the newest data and has not subsequently modified them

A's L1 transitions from E to S, since the L2 cache already has a copy

A's L2 writes back data to memory, transitions from M to S, releases back-off

B's L2 streams in the whole line, transitions from I to S

B's L1 gets a copy of the line, transitions from I to S
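To make the transitions concrete, here is a small C trace that hard-codes Scenario 2 exactly as described above (it replays the states from the text rather than computing them from a full protocol model):

#include <stdio.h>

typedef enum { I, S, E, M } Mesi;
typedef struct { Mesi l1, l2; } Proc;

static const char *name(Mesi m)
{
    static const char *names[] = { "I", "S", "E", "M" };
    return names[m];
}

static void show(const char *step, Proc a, Proc b)
{
    printf("%-22s A:L1=%s L2=%s  B:L1=%s L2=%s\n",
           step, name(a.l1), name(a.l2), name(b.l1), name(b.l2));
}

int main(void)
{
    Proc a = { I, I }, b = { I, I };     /* after reset everything is I        */

    a.l1 = S; a.l2 = E;                  /* A reads: L1 shared with L2, L2 E   */
    show("A reads line", a, b);

    a.l1 = E; a.l2 = M;                  /* A writes once (write-once): L1 to  */
    show("A writes line once", a, b);    /* E + write-back mode, L2 to M       */

    /* B reads: A's L2 snoops the read, forces back-off, writes the line back
       to memory and drops to S; A's L1 drops E to S; B streams in, both S.   */
    a.l2 = S; a.l1 = S;
    b.l2 = S; b.l1 = S;
    show("B reads same line", a, b);
    return 0;
}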

61

Scenario 3

Scenario 3: A reads and writes line multiple times, then B reads line

Initial state + actions: snarfing NOT used here, assuming write-by:

1. A's L1 has read, then written the line using write-through, transitions to E
2. A's L2 transitions to M; the new data are in L2, not in memory
3. A's L1 changes to write-back due to the write-once policy
4. A writes again, hits the L1 cache
5. A's L1 cache transitions from E to M, A's L2 cache remains in M
6. The modified data did not get copied to memory, as L2 uses write-back; now 3 different copies of the same paragraph exist!!!
7. B intends to read the same line, is in I state; what happens with the caches?

62

Scenario 3

[Figure: A Writes Line Multiple Times, then B Reads Same Line. Processor A's L1 Cache: M changes to S (L1 writes back); Processor A's L2 Cache: M changes to S (L2 snoop detects read by other bus master, CHIT# and CHITM# asserted, INV sampled low, BOFF# used). Processor B's L1 Cache: I changes to S; Processor B's L2 Cache: I changes to S (copy in L2). All caches are connected via the System Bus.]

63

Scenario 3

B's L1 and L2 experience a read miss, send a read request to the bus

A's L2 snoops, experiences a read snoop hit, realizes that B would get stale data, and forces B to back-off

A's L1 has the newest data, visible through the M state

A's L1 writes the modified data back into L2

A's L2 writes the data back to memory

A and memory are in synch now; instead of 3 copies, now there exists 1!

A's L1 state transitions from M to S, L2 from M to S, releases back-off

B's L2 streams in the whole line on the second try; transitions I to S

B's L1 gets a copy of the line, transitions I to S; all 4 are in state S

64

Scenario 4

Scenario 4: B writes line that is unmodified in A

Initial state and actions taken, snarfing not used, assume write-by:

1. A has read a line from memory, but not modified it
2. A's L1 is in S state
3. A's L2 is in E mode, as L2 believes no other processor has a copy
4. B has not read any data at all, thus B's L1 and L2 are both in I
5. B next intends to write to that same line; note the use of write-by!

65

Scenario 4

[Figure: B Writes Line of Which A Has Copy. Processor A's L1 Cache: S changes to I; Processor A's L2 Cache: E changes to I (L2 snoop detects write by other bus master, INV sampled high, CHIT#/HIT# asserted, BOFF# used). Processor B's L1 and L2 Caches: write miss, memory write on the bus; remain I (write-by, no allocation). All caches are connected via the System Bus.]

66

Scenario 4

B's L1 and L2 experience a data cache write miss; L2 initiates a memory write bus cycle on the system bus to update memory; we use write-by, not allocate-on-write

B's L1 remains in I, L2 similarly stays in I

A's L2 detects the memory write bus cycle, snoops the address, which hits

A's L2 is in E state, which says that a write in B is about to update memory of which A has an exclusive copy; but A no longer has a valid copy, thus it transitions to I

A's L1 also transitions to I

Note that the Pentium does not use allocate-on-write, hence all lines end up in state I

Snarfing could be used for a much better policy; see Pentium Pro

67

Scenario 5

Scenario 5: A reads + writes line, then B writes to the same line

Initial state and actions taken, snarfing not used, assume write-by:

1. A has read a line from memory, L1 is in S and L2 in E state
2. A writes the line: A's L1 writes through, updates L2
3. A's L1 transitions to E, switches policy to write-back, L2 transitions to M
4. B has not read any data at all, thus B's L1 and L2 are both in I
5. B next intends to write to that same line, using write-by

68

Scenario 5

[Figure: A Reads + Writes Line, then B Writes Same Line. Processor A's L1 Cache: E changes to I; Processor A's L2 Cache: M changes to I (L2 snoop detects write by other bus master, INV sampled high, CHIT#/HIT# asserted, BOFF# used). Processor B's L1 and L2 Caches: memory write miss; remain I (write-by, no allocation). All caches are connected via the System Bus.]

69

Scenario 5

B's L1 and L2 experience a cache write miss, initiate a write to memory

B's L1 remains in I, L2 similarly

A's L2 detects the memory write bus cycle, snoops the address, which matches

A's L2 is in M state, which says that a write in B would create stale memory

A's L2 causes B to back-off from writing, checks if L1 made further writes

A's L1 is not in M; L2 writes back data to memory

A's L2 transitions from M to I, knowing that B will write the data

A's L2 releases back-off, forces L1 to transition from E to I as well

B completes the write-by, does not fill a cache line; L1 and L2 end up in I

Note: If B would use allocate-on-write, it could snarf while A writes back, then modify the snarfed line and mark it as M; A would end up in I

70

Scenario 6

Scenario 6: A reads then writes line repeatedly, B writes same without prior read

A has written a line repeatedly, switched from write-through to write-back; we use the write-by write policy:

1. A's L1 is in M state, and L2 is in M mode as well
2. B next intends to write to that same line

71

Scenario 6

[Figure: A Writes Line Repeatedly, then B Writes Same Line. Processor A's L1 Cache: M changes to I; Processor A's L2 Cache: M changes to I (L2 snoop detects write by other bus master, INV sampled high, CHIT#/HIT# asserted, BOFF# used). Processor B's L1 and L2 Caches: memory write miss; remain I (write-by, no allocation). All caches are connected via the System Bus.]

72

Scenario 6

B's L1 experiences a write miss, initiates a write

B's L2 similarly experiences a cache write miss, initiates a write on the bus

B's L1 and L2 are both invalid (I) for that address, and remain I due to write-by

A's L2 detects the write bus cycle initiated by B; the snooped address matches

A's L2 is in M state, i.e. a write by B would access stale memory

A's L2 causes B to back-off writing, checks if L1 made further writes

A's L1 is in M state, thus has newer data than L2

A's L1 writes back data to L2, transitions to I; L2 writes to memory

A's L2 transitions to I, knowing that B will write the data. Note that memory is now NOT stale after the current write-back by A, and B has NOT yet written

A's L2 releases back-off, so B can resume (retry) the write to memory

B completes the write according to write-by, still has no line in cache; B's L1 and L2 end up in I

Note: in some other protocol B could snarf the line, using allocate-on-write. After snarfing, B could modify the line, transition to M, leave A in I, and memory would be safe due to B's M state

73

Scenario 7

Scenario 7: A and B have read the same line, then B writes to line

A and B have read the same paragraph:

1. A's L1 is in state S, and L2 is in S mode as well; memory is up to date
2. B's L1 and L2 are in S; next B intends to write to that same line

74

Scenario 7

[Figure: A and B read same line, then B writes to that line. Processor A's L1 Cache: S changes to I; Processor A's L2 Cache: S changes to I (L2 snoop detects write by other bus master, INV sampled high, CHIT#/HIT# asserted, BOFF# used). Processor B's L1 Cache: remains S; Processor B's L2 Cache: S changes to E (memory write on the bus). All caches are connected via the System Bus.]

75

Scenario 7

B's L2 experiences a write hit, and initiates a memory write, because the line is S; note that L1 would not write through, if it were in state E

B's L2 transitions to E, knowing other snooping caches transition to I

B's L2 actually writes, though it generally uses write-back; but it was in S; write-back is only used in state E

B's L1 transitions to S, so that L2 would "know" of subsequent writes

A's L2 snoops the address, sees the write by B, transitions to I

A's L2 instructs L1 to snoop, which also hits and causes a transition to I

Since neither A's L1 nor L2 has modified the line, the write in B can proceed

The states are: A's L1 and L2 are in I, B's L2 is in E and L1 in S

You probably expected B's write to be held back and B to end up in state E for L1 and M for L2; but the write does take place in the MESI protocol

Seems inefficient!

76

Pentium Pro L2 Cache

Pentium® Xeon™ is designed for 4-processor MP configurations

The L2 cache is in a separate cavity on the Pentium Pro, but in the same package, wire-bonded

L1 and L2 caches snoop simultaneously, hence L1 and L2 can be in E state simultaneously!!

The L1 caches on the Pentium Pro (before Klamath) are twice the size, 8 KB each for the code and data cache

L2 in the Pentium Pro performs snarfing

On the Pentium Pro, the L2 unified cache is 4-way set-associative, the L1 data cache is 2-way, and the instruction cache 4-way set-associative

Streaming into cache uses toggle mode, or critical-quad-first mode. This resolves the access that caused the miss in the shortest possible time

The Pentium Pro has no instruction delimiter bit per instruction byte in the I-cache

The Pentium Pro squashes

77

Pentium Pro L2 Cache

Table Showing Toggle Mode in Pentium Pro, Critical-Quad-First Mode

Address        First Quad   Second Quad   Third Quad   Fourth Quad
0x0..0x7       0x0          0x8           0x10         0x18
0x8..0xf       0x8          0x0           0x18         0x10
0x10..0x17     0x10         0x18          0x0          0x8
0x18..0x1f     0x18         0x10          0x8          0x0
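The table can be reproduced by a simple rule: the i-th quad transferred is the critical quad XORed with i. A small C sketch, assuming the 32-byte line and 8-byte quads implied by the table (this is a reading of the table, not taken from Intel documentation):

#include <stdio.h>

/* Toggle-mode (critical-quad-first) burst order for a 32-byte line made of
 * four 8-byte quads: the quad containing the miss address comes first, and
 * the i-th transfer fetches quad (critical ^ i), reproducing the table above. */
void burst_order(unsigned miss_offset, unsigned order[4])
{
    unsigned critical = (miss_offset >> 3) & 3;   /* which quad caused the miss */
    for (unsigned i = 0; i < 4; i++)
        order[i] = (critical ^ i) << 3;           /* byte offset of i-th quad   */
}

int main(void)
{
    unsigned order[4];
    burst_order(0x0c, order);                     /* miss in quad 0x8..0xf      */
    for (int i = 0; i < 4; i++)
        printf("%#x ", order[i]);                 /* prints 0x8 0 0x18 0x10     */
    printf("\n");
    return 0;
}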

78

Bibliography

1. Don Anderson and T. Shanley, MindShare [1995]. Pentium™ Processor System Architecture, Addison-Wesley Publishing Company, Reading MA, PC System Architecture Series. ISBN 0-201-40992-5

2. Pentium Pro Developer's Manual, Volume 1: Specifications, 1996, one of a set of 3 volumes

3. Pentium Pro Developer's Manual, Volume 2: Programmer's Reference Manual, Intel document, 1996, one of a set of 3 volumes

4. Pentium Pro Developer's Manual, Volume 3: Operating Systems Writer's Manual, Intel document, 1996, one of a set of 3 volumes

5. Y. Sheffer: http://webee.technion.ac.il/courses/044800/lectures/MESI.pdf

6. MOESI protocol: http://en.wikipedia.org/wiki/MOESI_protocol

7. MESIF protocol: http://en.wikipedia.org/wiki/MESIF_protocol