C SINGH, JUNE 7-8, 2010, IWW 2010, ISTANBUL, TURKEY
Advanced Computer Architecture, UNIT 1

Advanced Computer Architecture, Lecture 9
By Rohit Khokher
Department of Computer Science, Sharda University, Greater Noida, India


Cache

• Small amount of fast memory
• Sits between normal main memory and CPU
• May be located on CPU chip or module


Cache/Main Memory Structure


Cache operation – overview

• CPU requests contents of a memory location
• Check cache for this data
• If present, get from cache (fast)
• If not present, read required block from main memory to cache
• Then deliver from cache to CPU
• Cache includes tags to identify which block of main memory is in each cache slot


Cache Read Operation - Flowchart


Cache Design

• Size
• Mapping Function
• Replacement Algorithm
• Write Policy
• Block Size
• Number of Caches


Size

• Cost
– More cache is expensive

• Speed
– More cache is faster (up to a point)
– Checking cache for data takes time


Comparison of Cache Sizes

| Processor | Type | Year of Introduction | L1 cache | L2 cache | L3 cache |
|---|---|---|---|---|---|
| IBM 360/85 | Mainframe | 1968 | 16 to 32 KB | — | — |
| PDP-11/70 | Minicomputer | 1975 | 1 KB | — | — |
| VAX 11/780 | Minicomputer | 1978 | 16 KB | — | — |
| IBM 3033 | Mainframe | 1978 | 64 KB | — | — |
| IBM 3090 | Mainframe | 1985 | 128 to 256 KB | — | — |
| Intel 80486 | PC | 1989 | 8 KB | — | — |
| Pentium | PC | 1993 | 8 KB/8 KB | 256 to 512 KB | — |
| PowerPC 601 | PC | 1993 | 32 KB | — | — |
| PowerPC 620 | PC | 1996 | 32 KB/32 KB | — | — |
| PowerPC G4 | PC/server | 1999 | 32 KB/32 KB | 256 KB to 1 MB | 2 MB |
| IBM S/390 G4 | Mainframe | 1997 | 32 KB | 256 KB | 2 MB |
| IBM S/390 G6 | Mainframe | 1999 | 256 KB | 8 MB | — |
| Pentium 4 | PC/server | 2000 | 8 KB/8 KB | 256 KB | — |
| IBM SP | High-end server/supercomputer | 2000 | 64 KB/32 KB | 8 MB | — |
| CRAY MTA | Supercomputer | 2000 | 8 KB | 2 MB | — |
| Itanium | PC/server | 2001 | 16 KB/16 KB | 96 KB | 4 MB |
| SGI Origin 2001 | High-end server | 2001 | 32 KB/32 KB | 4 MB | — |
| Itanium 2 | PC/server | 2002 | 32 KB | 256 KB | 6 MB |
| IBM POWER5 | High-end server | 2003 | 64 KB | 1.9 MB | 36 MB |
| CRAY XD-1 | Supercomputer | 2004 | 64 KB/64 KB | 1 MB | — |


Mapping Function

A mapping function decides which line of cache a given main memory block occupies. Because there are fewer lines of cache than there are main memory blocks, an algorithm is needed to make this assignment.

There are three cache mapping functions, i.e., methods of addressing to locate data within a cache:

• Direct
• Fully Associative
• Set Associative

Each of these depends on two concepts, described next.


First Concept

RAM is divided into blocks of memory locations. In other words, memory locations are grouped into blocks of 2^n locations, where n represents the number of bits used to identify a word within a block. These n bits are found at the least significant end of the physical address. The example below has n = 2, indicating that each block of memory contains 2^2 = 4 memory locations.


Therefore, in this example, the two least significant bits of an address indicate the location within a block, while the remaining bits indicate the block number. The table below shows an example with a 20-bit address and four words per block. Notice that for each group of four words, the word bits take on each of the four possible values allowed with 2 bits, while the block identification bits remain constant.

| Block | Address | Block identification bits | Word bits |
|---|---|---|---|
| Block 0 | 0x00000 | 0000 0000 0000 0000 00 | 00 |
| | 0x00001 | 0000 0000 0000 0000 00 | 01 |
| | 0x00002 | 0000 0000 0000 0000 00 | 10 |
| | 0x00003 | 0000 0000 0000 0000 00 | 11 |
| Block 1 | 0x00004 | 0000 0000 0000 0000 01 | 00 |
| | 0x00005 | 0000 0000 0000 0000 01 | 01 |
| | 0x00006 | 0000 0000 0000 0000 01 | 10 |
| | 0x00007 | 0000 0000 0000 0000 01 | 11 |
| Block 2 | 0x00008 | 0000 0000 0000 0000 10 | 00 |
| | 0x00009 | 0000 0000 0000 0000 10 | 01 |
| | 0x0000A | 0000 0000 0000 0000 10 | 10 |
| | 0x0000B | 0000 0000 0000 0000 10 | 11 |
| Block 3 | 0x0000C | 0000 0000 0000 0000 11 | 00 |
| | 0x0000D | 0000 0000 0000 0000 11 | 01 |
| | 0x0000E | 0000 0000 0000 0000 11 | 10 |
| | 0x0000F | 0000 0000 0000 0000 11 | 11 |
| … | … | … | … |
| Block 2^18 - 1 | 0xFFFFC | 1111 1111 1111 1111 11 | 00 |
| | 0xFFFFD | 1111 1111 1111 1111 11 | 01 |
| | 0xFFFFE | 1111 1111 1111 1111 11 | 10 |
| | 0xFFFFF | 1111 1111 1111 1111 11 | 11 |

And so on, until we get to the last block: with 18 block identification bits, that is block 2^18 - 1.
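The split in the table above can be sketched in a few lines of code. This is an illustrative sketch, not part of the original lecture; the function name `split_address` is ours.

```python
# Sketch: split a 20-bit physical address into its block identification
# bits and word bits, with n = 2 word bits (4 words per block).
N_WORD_BITS = 2  # 2^2 = 4 memory locations per block

def split_address(addr: int, word_bits: int = N_WORD_BITS):
    """Return (block_number, word_within_block) for a physical address."""
    word = addr & ((1 << word_bits) - 1)   # least significant word bits
    block = addr >> word_bits              # remaining bits are the block id
    return block, word

# Address 0x00006 falls in block 1, word 2, matching the table above.
print(split_address(0x00006))  # -> (1, 2)
```

Note that the block number is simply the address with the word bits shifted off, which is why the block identification bits stay constant across each group of four addresses.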


Second Concept

The cache is organized into lines, each of which contains enough space to store exactly one block of data plus a tag uniquely identifying where that block came from in memory.


Direct Mapping

This is the simplest form of mapping. Each block from main memory maps to only one possible line of cache memory. Because there are more blocks of main memory than there are lines of cache, many blocks in main memory map to the same line in cache memory.

To implement this function, use the following formula:

α = β % γ

where α is the cache line number, β is the block number in main memory, γ is the total number of lines in cache memory, and % is the modulus operator.
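The formula above is a single modulus operation. A minimal sketch, with an assumed example cache size of 128 lines:

```python
# Direct-mapping formula: cache line = block number % number of lines.
CACHE_LINES = 128  # assumed example value

def cache_line(block_number: int, lines: int = CACHE_LINES) -> int:
    return block_number % lines

# Blocks 5, 133, and 261 all map to line 5 when there are 128 lines,
# showing how many different memory blocks share one cache line.
print(cache_line(5), cache_line(133), cache_line(261))  # -> 5 5 5
```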


The address for this mapping is broken into three parts:

• (s - r) MSB bits: the tag, stored in a cache line alongside the block held in that line
• r middle bits: identify the one cache line in which the block is always stored
• w LSB bits: identify each word within the block

| Tag (s - r bits) | Line (r bits) | Word (w bits) |

This means that:

• The number of addressable units = 2^(s+w) words or bytes
• The block size (cache line width, not including the tag) = 2^w words or bytes
• The number of blocks in main memory = 2^s (i.e., all the bits that are not in w)
• The number of lines in cache = m = 2^r
• The size of the tag stored in each line of the cache = (s - r) bits
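The three-way field split can be sketched directly with shifts and masks. The field widths below (w = 2 word bits, r = 14 line bits, 8 tag bits, i.e., a 24-bit address) are assumed example values, not figures from the lecture.

```python
# Sketch: decode a direct-mapped cache address into (tag, line, word).
W, R = 2, 14  # assumed word and line field widths

def decode(addr: int):
    word = addr & ((1 << W) - 1)           # w LSB bits
    line = (addr >> W) & ((1 << R) - 1)    # r middle bits select the line
    tag = addr >> (W + R)                  # (s - r) MSB bits
    return tag, line, word

tag, line, word = decode(0xABCDEF)
# Reassembling the fields recovers the original address.
assert (tag << (W + R)) | (line << W) | word == 0xABCDEF
```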


Disadvantage

There is a fixed cache location for any given block in main memory.

If two blocks of memory that share the same cache line are referenced repeatedly and alternately, each reference misses and the two blocks continuously evict each other, resulting in slower memory access because of the time taken to access main memory.


Associative Mapping

• A main memory block can load into any line of cache
• The memory address is interpreted as a tag and a word
• The tag uniquely identifies a block of memory
• Every line's tag is examined for a match
• Cache searching gets expensive
• Address layout in this mapping: | Tag | Word |


• The number of addressable units = 2^(s+w) words or bytes
• The block size (cache line width, not including the tag) = 2^w words or bytes
• The number of blocks in main memory = 2^s (i.e., all the bits that are not in w)
• The number of lines in cache is not dependent on any part of the memory address
• The size of the tag stored in each line of the cache = s bits
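The cost of associative searching can be seen in a software sketch: every line's tag must be compared (hardware does this in parallel with comparators, which is what makes it expensive). The cache representation below is an assumption for illustration.

```python
# Sketch of a fully associative lookup: the tag is checked against
# every line, since any line may hold any block.
W = 2  # assumed word bits

def lookup(cache, addr):
    """cache: list of (valid, tag, block_data) lines; returns word or None."""
    tag = addr >> W
    for valid, line_tag, data in cache:
        if valid and line_tag == tag:
            return data[addr & ((1 << W) - 1)]  # hit: select word in block
    return None  # miss: the block must be fetched from main memory

cache = [(True, 0x3FF, ["a", "b", "c", "d"])]
print(lookup(cache, (0x3FF << 2) | 1))  # -> b
```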


Set Associative

• Cache is divided into a number of sets
• Each set contains a number of lines
• A given block maps to any line in a given set
– e.g., block B can be in any line of set i

• e.g., 2 lines per set
– 2-way associative mapping
– A given block can be in one of 2 lines in only one set


Replacement Algorithms

For direct mapping, where there is only one possible line for a block of memory, no replacement algorithm is needed. For associative and set-associative mapping, however, an algorithm is needed, and for maximum speed it is implemented in hardware. Four of the most common algorithms are:

• Least recently used (LRU): replaces the candidate line that has been in the cache longest with no reference to it.
• First in first out (FIFO): replaces the candidate line that has been in the cache the longest.
• Least frequently used (LFU): replaces the candidate line that has had the fewest references.
• Random replacement: randomly chooses a line to replace from among the candidate lines. Studies have shown that this yields only slightly inferior performance to the other algorithms.
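The LRU policy above can be sketched in software with an ordered map, where position records recency (real caches track this with a few bits per line instead). The class and capacity are illustrative assumptions.

```python
from collections import OrderedDict

# Sketch of least-recently-used replacement for a small fully
# associative cache; capacity is an assumed example value.
class LRUCache:
    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self.lines = OrderedDict()  # tag -> block data, oldest first

    def access(self, tag, fetch):
        if tag in self.lines:
            self.lines.move_to_end(tag)     # hit: mark most recently used
            return self.lines[tag]
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used
        self.lines[tag] = fetch(tag)        # miss: load from main memory
        return self.lines[tag]

cache = LRUCache(2)
cache.access(1, lambda t: "block1")
cache.access(2, lambda t: "block2")
cache.access(1, lambda t: "block1")  # touching 1 makes 2 the LRU line
cache.access(3, lambda t: "block3")  # evicts tag 2, not tag 1
print(sorted(cache.lines))  # -> [1, 3]
```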


Write Policy

A write policy matters because, if changes are made to a line in cache memory, the corresponding changes must be made to the block in main memory before the line is removed from the cache. A complication is that more than one device may have access to main memory (e.g., I/O modules). If several processors on the same bus each have their own cache, the problem becomes more complex: a change in any cache or in main memory could invalidate the others.

WRITE THROUGH

The simplest technique is called "write through". Both main memory and the cache are written whenever a write operation is performed, ensuring that main memory is always valid.

The main disadvantage of this technique is that it may generate substantial main memory traffic, causing a bottleneck and decreasing performance.


WRITE BACK

An alternative technique, known as "write back", minimizes main memory writes. Updates are made only in the cache, and an update bit associated with the line is set. Main memory is updated when the line is replaced in the cache, and only if its update bit has been set.

The problem with this technique is that all accesses to main memory must go through the cache so that parts of main memory are not invalidated, which can itself cause a bottleneck.


Line Size

When a block of data is retrieved from main memory and put into the cache, the desired word and a number of adjacent words are retrieved.

As the block size increases from a very small size, the hit ratio will at first increase because of the principle of locality of reference, which says that words in the vicinity of a referenced word are likely to be referenced in the near future.

As the block size increases further, however, the hit ratio begins to decrease, because the probability of using the newly fetched information becomes less than the probability of reusing the information that had to be replaced to make room for it.


Number of Caches

Two aspects of this are multilevel caches and unified versus split caches.

Multilevel
Due to increased logic density, it has become possible to have a cache on the same chip as the processor.

This reduces execution time, as less activity over an external bus is needed. Even though an on-chip cache exists, it is typically desirable to have an off-chip cache as well.

This means that if a miss occurs on the level 1 (on-chip) cache, instead of retrieving the data from the slower main memory, the information may be retrieved from the level 2 cache, which, although slower than the level 1 cache, is still appreciably faster than main memory.

Some level 2 caches are now stored on-chip, with a level 3 cache implemented off-chip.


Unified/Split

Two types of words are stored in a cache: data and instructions. It has become common to split the cache in two to separate them.

Two potential advantages of a unified cache are:

• A higher hit rate than split caches, because the load between instruction fetches and data fetches is balanced automatically.
• Only one cache needs to be designed and implemented.

The key advantage of the split cache design is that it eliminates contention for the cache between the instruction fetch/decode unit and the execution unit. This is important for designs that rely on instruction pipelining.
