Transcript of Simulations of Memory Hierarchy LAB 2: CACHE LAB.

Page 1: Simulations of Memory Hierarchy LAB 2: CACHE LAB.

Simulations of Memory Hierarchy

LAB 2: CACHE LAB

Page 2: Simulations of Memory Hierarchy LAB 2: CACHE LAB.

OVERVIEW

• Objectives

• Cache Set-Up

• Command line parsing

• Least Recently Used (LRU)

• Matrix Transposition

• Cache-Friendly Code

Page 3: Simulations of Memory Hierarchy LAB 2: CACHE LAB.

OBJECTIVE

• There are two parts to this lab:

• Part A: Cache Simulator

• Simulate a cache table using the LRU algorithm

• Part B: Optimizing Matrix Transpose

• Write “cache-friendly” code that minimizes cache misses in the implementation of a matrix transpose function

• When submitting your lab, please submit the handin.tar file as described in the instructions.

Page 4: Simulations of Memory Hierarchy LAB 2: CACHE LAB.

MEMORY HIERARCHY

• Pick your poison: smaller, faster, and costlier, or larger, slower, and cheaper

Page 5: Simulations of Memory Hierarchy LAB 2: CACHE LAB.

CACHE ADDRESSING

• X-bit memory addresses (in Part A, X <= 64 bits); a bit-extraction sketch follows this list

• Block offset: b bits

• Set index: s bits

• Tag bits: X – b – s

• Cache is a collection of S=2^s cache sets

• Cache set is a collection of E cache lines

• E is the associativity of the cache

• If E=1, the cache is called “direct-mapped”

• Each cache line stores a block of B=2^b bytes of data
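
A rough sketch of how these fields fall out of an address (the function name split_address and its parameters are illustrative, not something the lab requires):

    /* Sketch: split an address into its three fields, given s and b. */
    void split_address(unsigned long long addr, int s, int b,
                       unsigned long long *tag,
                       unsigned long long *set_index,
                       unsigned long long *block_offset)
    {
        *block_offset = addr & ((1ULL << b) - 1);        /* low b bits          */
        *set_index    = (addr >> b) & ((1ULL << s) - 1); /* next s bits         */
        *tag          = addr >> (b + s);                 /* remaining X - b - s */
    }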

Page 6: Simulations of Memory Hierarchy LAB 2: CACHE LAB.

ADDRESS ANATOMY

Page 7: Simulations of Memory Hierarchy LAB 2: CACHE LAB.

CACHE TABLE BASICS

• Parameters:

• Number of sets (S)

• Block size in bytes (B)

• Lines per set (E)

• Note that the total capacity of this cache would be S*B*E bytes (one illustrative data layout is sketched after this list)

• Blocks are the fundamental units of the cache
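
One possible, purely illustrative way to represent such a cache in the simulator (the lab does not prescribe any particular data structure):

    /* Illustrative layout: one record per cache line. */
    typedef struct {
        int valid;                    /* does this line currently hold a block? */
        unsigned long long tag;       /* tag bits of the cached block           */
        unsigned long long last_used; /* "time" stamp for LRU                   */
    } cache_line_t;

    /* The cache itself is then S sets of E lines each: cache[set][line].
       It models S*B*E bytes of capacity, even though the simulator never
       stores the B data bytes themselves. */
    cache_line_t **cache;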

Page 8: Simulations of Memory Hierarchy LAB 2: CACHE LAB.

CACHE TABLE CORRESPONDENCE WITH ADDRESS

Page 9: Simulations of Memory Hierarchy LAB 2: CACHE LAB.

Example for a 32-bit address

Page 10: Simulations of Memory Hierarchy LAB 2: CACHE LAB.

CACHE SET LOOK-UP

• Determine the set index and the tag bits based on the memory address (the full look-up logic is sketched after this list)

• Locate the corresponding cache set and determine whether or not there exists a valid cache line with a matching tag

• If a cache miss occurs:

• If there is an empty cache line, utilize it

• If the set is full then a cache line must be evicted
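
A rough sketch of this look-up for one access, assuming the illustrative cache_line_t layout from the earlier sketch, simulator counters hits/misses/evictions, and a hypothetical find_lru_line() helper (sketched on the LRU slide):

    cache_line_t *set = cache[set_index];   /* set_index and tag come from the address */
    int hit = 0, empty = -1;
    for (int i = 0; i < E; i++) {
        if (set[i].valid && set[i].tag == tag) { hit = 1; hits++; break; }
        if (!set[i].valid && empty < 0) empty = i;   /* remember a free line */
    }
    if (!hit) {
        misses++;
        int victim = (empty >= 0) ? empty : find_lru_line(set, E);
        if (empty < 0) evictions++;                  /* had to evict a valid line */
        set[victim].valid = 1;
        set[victim].tag   = tag;
    }
    /* In every case, stamp the touched line's last_used field (see the LRU slide). */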

Page 11: Simulations of Memory Hierarchy LAB 2: CACHE LAB.

TYPES OF CACHE MISSES

• Compulsory Miss:

• First access to a block has to be a miss

• Conflict Miss:

• Level k cache is large enough, but multiple data objects all map to the same level k block

• Capacity Miss:

• Occurs when the working set of blocks (blocks of memory being used) is larger than the cache

Page 12: Simulations of Memory Hierarchy LAB 2: CACHE LAB.

PART A: CACHE SIMULATION

Page 13: Simulations of Memory Hierarchy LAB 2: CACHE LAB.

YOUR OWN CACHE SIMULATOR

• NOT a real cache

• Block offsets are NOT used but are important in understanding the concept of a cache

• s, b, and E given at runtime

Page 14: Simulations of Memory Hierarchy LAB 2: CACHE LAB.

FUNCTIONS TO USE FOR COMMAND LINE PARSING

• int getopt(int argc, char *const *argv, const char *options); see the parsing sketch after this list

• See: http://www.gnu.org/software/libc/manual/html_node/Example-of-Getopt.html#Example-of-Getopt

• long long int strtoll(const char* str, char** endptr, int base)

• See: http://www.cplusplus.com/reference/cstdlib/strtoll/
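
A minimal, self-contained parsing sketch that uses both functions; the flag letters (-s, -E, -b, -t) and the tracefile variable follow the usual Cache Lab interface but are assumptions here, so check the lab handout:

    #include <stdio.h>
    #include <stdlib.h>   /* strtoll */
    #include <unistd.h>   /* getopt, optarg */

    int main(int argc, char *argv[])
    {
        int s = 0, E = 0, b = 0;
        char *tracefile = NULL;
        int opt;

        while ((opt = getopt(argc, argv, "s:E:b:t:")) != -1) {
            switch (opt) {
            case 's': s = (int) strtoll(optarg, NULL, 10); break;
            case 'E': E = (int) strtoll(optarg, NULL, 10); break;
            case 'b': b = (int) strtoll(optarg, NULL, 10); break;
            case 't': tracefile = optarg; break;
            default:
                fprintf(stderr, "usage: %s -s <s> -E <E> -b <b> -t <tracefile>\n", argv[0]);
                return 1;
            }
        }

        printf("s=%d E=%d b=%d trace=%s\n", s, E, b, tracefile ? tracefile : "(none)");
        return 0;
    }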

Page 15: Simulations of Memory Hierarchy LAB 2: CACHE LAB.

LEAST RECENTLY USED (LRU) ALGORITHM

• A least-recently-used algorithm should be used to determine which cache line to evict when a set is full

• Each cache line will need some sort of “time” field, which should be updated each time that cache line is referenced

• If a cache miss occurs in a full cache set, the cache line with the oldest “time” value (the least recently used line) should be evicted; a counter-based sketch follows this list
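
One common counter-based sketch (tick and find_lru_line are illustrative names, and other LRU bookkeeping schemes are equally valid):

    /* A global "time" that increases on every memory access; the line with
       the smallest stamp is the least recently used one. */
    static unsigned long long tick = 0;

    /* On every hit, and whenever a line is filled: set[i].last_used = ++tick; */

    static int find_lru_line(cache_line_t *set, int E)
    {
        int lru = 0;
        for (int i = 1; i < E; i++)
            if (set[i].last_used < set[lru].last_used)
                lru = i;
        return lru;
    }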

Page 16: Simulations of Memory Hierarchy LAB 2: CACHE LAB.

PART B: OPTIMIZING MATRIX TRANSPOSE

Page 17: Simulations of Memory Hierarchy LAB 2: CACHE LAB.

WHAT IS A MATRIX TRANSPOSITION?

• The transpose of a matrix A is denoted Aᵀ

• The rows of Aᵀ are the columns of A, and the columns of Aᵀ are the rows of A

• Example:
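
If A is the 2×3 matrix with rows (1, 2, 3) and (4, 5, 6), then Aᵀ is the 3×2 matrix with rows (1, 4), (2, 5), and (3, 6).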

Page 18: Simulations of Memory Hierarchy LAB 2: CACHE LAB.

GENERAL MATRIX TRANSPOSITION
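
As a baseline, the general element-by-element version can be sketched as follows (transpose_naive and its signature are illustrative; the prototype in the lab's trans.c may differ):

    /* Transpose an N x M matrix A into an M x N matrix B, one element at a time. */
    void transpose_naive(int N, int M, int A[N][M], int B[M][N])
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < M; j++)
                B[j][i] = A[i][j];
    }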

Page 19: Simulations of Memory Hierarchy LAB 2: CACHE LAB.

CACHE-FRIENDLY CODE

• In order to have fewer cache misses, you must make good use of:

• Temporal locality: reuse the current cache block if possible (avoid conflict misses [thrashing])

• Spatial locality: reference the data of close storage locations

• Tips:

• Cache blocking

• Optimized access patterns

• Your code should look ugly if done correctly

Page 20: Simulations of Memory Hierarchy LAB 2: CACHE LAB.

CACHE BLOCKING

• Partition the matrix in question into sub-matrices

• Divide the larger problem into smaller sub-problems

• Main idea:

• Iterate over blocks as you perform the transpose, as opposed to the naive algorithm, which goes index by index, row by row

• Determining the size of these blocks will take some thought and experimentation; a blocked sketch follows this list
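
A blocked version might look roughly like the sketch below; BSIZE and transpose_blocked are illustrative names, and the right block size depends on the cache parameters:

    #define BSIZE 8   /* illustrative; tune for the given s, E, b */

    /* Transpose in BSIZE x BSIZE tiles so the rows touched by each tile
       stay resident in the cache while that tile is being copied. */
    void transpose_blocked(int N, int M, int A[N][M], int B[M][N])
    {
        for (int ii = 0; ii < N; ii += BSIZE)
            for (int jj = 0; jj < M; jj += BSIZE)
                for (int i = ii; i < ii + BSIZE && i < N; i++)
                    for (int j = jj; j < jj + BSIZE && j < M; j++)
                        B[j][i] = A[i][j];
    }

For example, with 4-byte ints and B = 32, eight consecutive ints fill exactly one cache block, which is one reason a block size like 8 is worth experimenting with.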

Page 21: Simulations of Memory Hierarchy LAB 2: CACHE LAB.

QUESTIONS TO PONDER

• What would happen if, instead of accessing each index in row order, you alternated by jumping from row to row within the same column?

• What would happen if you declared only 4 local variables as opposed to 12 local variables?

• Is it possible to get rid of the local variables altogether?

• What happens when accessing elements along the diagonal?

• What happens when the program is run in a different directory?

Page 22: Simulations of Memory Hierarchy LAB 2: CACHE LAB.

(XKCD)