The Storage Hierarchy
Transcript of The Storage Hierarchy
From fastest to slowest:

- Registers (fast, expensive, few)
- Cache memory
- Main memory (RAM)
- Hard disk
- Removable media (CD, DVD, etc.)
- Internet (slow, cheap, plentiful)
What is Main Memory?
Main memory is where data reside for a program that is currently running. It is often called RAM (Random Access Memory) because you can load from or store into any main memory location at any time. It is much slower than cache, and therefore much cheaper and much bigger.
What Main Memory Looks Like
You can think of main memory as one big, long 1D array of bytes, addressed 0, 1, 2, 3, … up to, for example, 536,870,911 (the last byte of a 512 MB machine).
Cache
Caching is a technique built into the memory subsystem of your computer. Its main purpose is to speed the computer up while keeping its price low: caching lets the computer get through your tasks more rapidly.
![Page 6: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/6.jpg)
Librarian Example

A librarian sits behind a desk and fetches the books you ask for from a set of stacks in the storeroom. You ask for the book Moby Dick. The librarian goes into the storeroom, gets the book, returns to the counter, and hands you the book. Later, you come back to return the book; the librarian takes it, carries it back to the storeroom, and then returns to the counter to wait for another customer.

Now suppose the next customer also asks for Moby Dick. The librarian has to go back to the storeroom to fetch the very book he just put away and hand it to the client.
![Page 7: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/7.jpg)
Librarian Example Cont.

Under this model, the librarian has to make a complete round trip to fetch every book, even very popular ones that are requested frequently. Is there a way to improve the performance of the librarian?
![Page 8: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/8.jpg)
Librarian Example with Cache

Let's give the librarian a backpack that can hold 10 books (in computer terms, the librarian now has a 10-book cache). Into this backpack he will put the books the clients return to him, up to a maximum of 10.

The day starts, and the librarian's backpack is empty. Our first client arrives and asks for Moby Dick. The librarian has to go to the storeroom to get the book, and he gives it to the client.
![Page 9: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/9.jpg)
Librarian Example with Cache (cont.)

Later, the client returns the book to the librarian. Instead of walking back to the storeroom, the librarian puts the book in his backpack and stays at the counter (checking first that the bag isn't full).

Another client arrives and asks for Moby Dick. Before going to the storeroom, the librarian checks whether the title is in his backpack. He finds it! All he has to do is take the book out of the backpack and hand it over. There is no journey to the storeroom, so the client is served far more quickly.
![Page 10: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/10.jpg)
Not in Backpack

What if the client asks for a title that is not in the cache (the backpack)? In that case, the librarian is actually less efficient with a cache than without one, because he takes the time to search his backpack first.

Even in this simple example, though, the latency (the waiting time) of searching the cache is negligible: the cache is small (10 books), so the time it takes to notice a miss is only a tiny fraction of the time a journey to the storeroom takes.
![Page 11: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/11.jpg)
What is Cache?

- A special kind of memory where data reside that are about to be used or have just been used.
- Very fast, therefore very expensive and very small (typically 100 to 10,000 times as expensive as RAM per byte).
- Data in cache can be loaded into or stored from registers at speeds comparable to the speed of performing computations.
- Data that are not in cache (but are in main memory) take much longer to load or store.
- Cache sits near the CPU: either inside the CPU or on the motherboard that the CPU sits on.
![Page 12: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/12.jpg)
The Relationship Between Main Memory & Cache
![Page 13: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/13.jpg)
13
RAM is Slow

The speed of data transfer between main memory and the CPU is much slower than the speed of calculating, so the CPU spends most of its time waiting for data to come in or go out.
![Page 14: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/14.jpg)
From Cache to the CPU
Typically, data move between cache and the CPU at speeds relatively near to that of the CPU performing calculations.
Why Have Cache?

Cache is much closer to the speed of the CPU, so the CPU doesn't have to wait nearly as long for stuff that's already in cache: it can do more operations per second!
![Page 16: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/16.jpg)
Important Facts about Cache

- Cache technology is the use of a faster but smaller memory type to accelerate a slower but larger memory type.
- When using a cache, you must check the cache to see if an item is in there. If it is, it's called a cache hit. If not, it's called a cache miss, and the computer must wait for a round trip to the larger, slower memory area.
- A cache has some maximum size that is much smaller than the larger storage area.
![Page 17: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/17.jpg)
Facts about Cache (cont.)

- It is possible to have multiple layers of cache.
- In our librarian example, the smaller but faster memory type is the backpack, and the storeroom represents the larger and slower memory type. This is a one-level cache.
- There might be another layer of cache: a shelf behind the counter that can hold 100 books. The librarian can check the backpack, then the shelf, and then the storeroom. This would be a two-level cache.
![Page 18: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/18.jpg)
How Cache Works

When you request data from a particular address in main memory, here's what happens:

1. The hardware checks whether the data for that address are already in cache. If so, it uses them.
2. Otherwise, it loads from main memory the entire cache line that contains the address.
![Page 19: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/19.jpg)
If It's in Cache, It's Also in RAM

If a particular memory address is currently in cache, then it's also in main memory (RAM). That is, all of a program's data are in main memory, but some are also in cache.
![Page 20: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/20.jpg)
Cache Use Jargon

- Cache hit: the data that the CPU needs right now are already in cache (remember the librarian example).
- Cache miss: the data that the CPU needs right now are not currently in cache.

If all of your data are small enough to fit in cache, then when you run your program you'll get almost all cache hits (except at the very beginning), which means your performance could be excellent!

Sadly, this rarely happens in real life: most problems of scientific or engineering interest are bigger than just a few MB.
![Page 21: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/21.jpg)
Cache Lines

- A cache line is a small, contiguous region in cache, corresponding to a contiguous region in RAM of the same size, that is loaded all at once. Typical cache line sizes: 32 to 1024 bytes.
- Cache reuse: a program is much more efficient if all of its data and instructions fit in cache; try to use what's in cache a lot before touching anything that isn't (e.g., tiling).
![Page 22: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/22.jpg)
Improving Your Cache Hit Rate
Many scientific codes use a lot more data than can fit in cache all at once.
Therefore, you need to ensure a high cache hit rate even though you’ve got much more data than cache.
So, how can you improve your cache hit rate? Use the same solution as in Real Estate:
Location, Location, Location!
![Page 23: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/23.jpg)
Cache Example 1

Assume: cache line size (CLS) = 32 bytes, cache size (CS) = 64 lines, direct mapping, and each element is 4 bytes long.

```fortran
real a(1024, 100)
...
do i = 1, 1024
  do j = 1, 100
    a(i,j) = a(i,j) + 1
  end do
end do
```

Fortran arrays are column-major, so consecutive values of the inner index j are 1024 elements (4096 bytes) apart in memory: every access touches a different cache line. Cache misses = 1024 * 100 = 102,400.
![Page 24: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/24.jpg)
Example 1 Using Loop Interchange

Interchange i and j:

```fortran
real a(1024, 100)
...
do j = 1, 100
  do i = 1, 1024
    a(i,j) = a(i,j) + 1
  end do
end do
```

Now the inner loop runs down a column, which is contiguous in memory, so each 32-byte cache line serves 8 consecutive elements. This takes advantage of spatial locality: cache misses = 102,400 / 8 = 12,800, the number of cache lines occupied by the array a.
![Page 25: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/25.jpg)
Tiling
![Page 26: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/26.jpg)
26
- Tile: a small rectangular subdomain of a problem domain. Sometimes called a block or a chunk.
- Tiling: breaking the domain into tiles.
- Tiling strategy: operate on each tile to completion, then move to the next tile.
- Tile size can be set at runtime, according to what's best for the machine that you're running on.
![Page 27: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/27.jpg)
Example 2

```fortran
do j = 1, 100
  do i = 1, 4096
    a(i) = a(i)*a(i)
  end do
end do
```

Cache misses = (4096/8) * 100 = 51,200. The cache holds 512 elements (64 lines of 8 elements each), so after accessing a(1..512) the cache is full, and a cache line has to be written back to memory before each new line can be brought in; by the end of a pass, the start of the array has long since been evicted. Each element a(i) is reused 100 times, so we can apply tiling to reduce the number of cache misses.
![Page 28: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/28.jpg)
Example 2 Using Tiling

```fortran
do ii = 1, 4096, B
  do j = 1, 100
    do i = ii, min(ii+B-1, 4096)
      a(i) = a(i)*a(i)
    end do
  end do
end do
```

The cache can accommodate 512 elements of a. Thus, if we choose B = 512, each tile is loaded into cache once and then reused for all 100 passes, so cache misses = 4096/8 = 512, one per cache line of the array.
![Page 29: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/29.jpg)
29
A Sample Application: Matrix-Matrix Multiply

Let A, B and C be matrices of sizes nr × nc, nr × nk and nk × nc, respectively:

$$A = \begin{pmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & \cdots & a_{1,nc} \\
a_{2,1} & a_{2,2} & a_{2,3} & \cdots & a_{2,nc} \\
a_{3,1} & a_{3,2} & a_{3,3} & \cdots & a_{3,nc} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{nr,1} & a_{nr,2} & a_{nr,3} & \cdots & a_{nr,nc}
\end{pmatrix}$$

$$B = \begin{pmatrix}
b_{1,1} & b_{1,2} & b_{1,3} & \cdots & b_{1,nk} \\
b_{2,1} & b_{2,2} & b_{2,3} & \cdots & b_{2,nk} \\
b_{3,1} & b_{3,2} & b_{3,3} & \cdots & b_{3,nk} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
b_{nr,1} & b_{nr,2} & b_{nr,3} & \cdots & b_{nr,nk}
\end{pmatrix}$$

$$C = \begin{pmatrix}
c_{1,1} & c_{1,2} & c_{1,3} & \cdots & c_{1,nc} \\
c_{2,1} & c_{2,2} & c_{2,3} & \cdots & c_{2,nc} \\
c_{3,1} & c_{3,2} & c_{3,3} & \cdots & c_{3,nc} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
c_{nk,1} & c_{nk,2} & c_{nk,3} & \cdots & c_{nk,nc}
\end{pmatrix}$$
![Page 30: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/30.jpg)
30
Matrix Multiply: Naïve Version

```fortran
SUBROUTINE matrix_matrix_mult_naive (dst, src1, src2, &
 &                                   nr, nc, nq)
  IMPLICIT NONE
  INTEGER,INTENT(IN) :: nr, nc, nq
  REAL,DIMENSION(nr,nc),INTENT(OUT) :: dst
  REAL,DIMENSION(nr,nq),INTENT(IN)  :: src1
  REAL,DIMENSION(nq,nc),INTENT(IN)  :: src2

  INTEGER :: r, c, q

  DO c = 1, nc
    DO r = 1, nr
      dst(r,c) = 0.0
      DO q = 1, nq
        dst(r,c) = dst(r,c) + src1(r,q) * src2(q,c)
      END DO
    END DO
  END DO
END SUBROUTINE matrix_matrix_mult_naive
```
![Page 31: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/31.jpg)
31
Multiplying Within a Tile

```fortran
SUBROUTINE matrix_matrix_mult_tile (dst, src1, src2, nr, nc, nq, &
 &                  rstart, rend, cstart, cend, qstart, qend)
  IMPLICIT NONE
  INTEGER,INTENT(IN) :: nr, nc, nq
  REAL,DIMENSION(nr,nc),INTENT(OUT) :: dst
  REAL,DIMENSION(nr,nq),INTENT(IN)  :: src1
  REAL,DIMENSION(nq,nc),INTENT(IN)  :: src2
  INTEGER,INTENT(IN) :: rstart, rend, cstart, cend, qstart, qend

  INTEGER :: r, c, q

  DO c = cstart, cend
    DO r = rstart, rend
      IF (qstart == 1) dst(r,c) = 0.0
      DO q = qstart, qend
        dst(r,c) = dst(r,c) + src1(r,q) * src2(q,c)
      END DO
    END DO
  END DO
END SUBROUTINE matrix_matrix_mult_tile
```
![Page 32: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/32.jpg)
32
The Advantages of Tiling

- It lets your code exploit data locality better, to get much more cache reuse: your code runs faster!
- It's a relatively modest amount of extra coding (typically a few wrapper functions and some changes to loop bounds).
- If you don't need tiling (because of the hardware, the compiler or the problem size), you can turn it off by simply setting the tile size equal to the problem size.
![Page 33: The Storage Hierarchy](https://reader035.fdocuments.in/reader035/viewer/2022062301/56815a92550346895dc80a96/html5/thumbnails/33.jpg)
33
Will Tiling Always Work?

Tiling WON'T always work. Why? Well, tiling works well when:

- the order in which the calculations occur doesn't matter much, AND
- there are lots and lots of calculations to do for each memory movement.

If either condition is absent, then tiling may not help.