University of Amsterdam
Computer Systems – the impact of caches
Arnoud Visser




Introduction

Different sorts of memory

• On-die: 0/1/10 cycles

• On-board: 100 cycles

• On-disk: 10,000 cycles

• Off-machine: 1,000,000 cycles


The CPU-Memory Gap

• The increasing gap between disk, DRAM, SRAM, and CPU speeds.

[Figure: disk seek time, DRAM access time, SRAM access time, and CPU cycle time in ns (logarithmic axis, 1 to 100,000,000) plotted against year, 1980 to 2000.]


Storage Trends: bigger, not faster

(Culled from back issues of Byte and PC Magazine)

DRAM

metric             1980    1985    1990   1995    2000    2000:1980
$/MB               8,000   880     100    30      1       8,000
access (ns)        375     200     100    70      60      6
typical size (MB)  0.064   0.256   4      16      64      1,000

Disk

metric             1980    1985    1990   1995    2000    2000:1980
$/MB               500     100     8      0.30    0.05    10,000
access (ms)        87      75      28     10      8       11
typical size (MB)  1       10      160    1,000   9,000   9,000


SRAM

metric       1980     1985    1990   1995   2000   2000:1980
$/MB         19,200   2,900   320    256    100    190
access (ns)  300      150     35     15     2      100

typical size (MB): 0.008 0.016 0.032

Processor trends: faster

                  1980    1985   1990   1995   2000    2000:1980
processor         8080    286    386    Pent   P-III
clock rate (MHz)  1       6      20     150    750     750
cycle time (ns)   1,000   166    50     6      1.3     750
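The cycle-time row follows from the clock-rate row: a clock of f MHz ticks f million times per second, so one cycle lasts 1000/f ns. Below is a minimal sketch of that arithmetic, using the clock rates from the table; the function names `cycle_time_ns` and `print_cycle_times` are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Cycle time in nanoseconds for a clock rate given in MHz.
 * 1 MHz is 10^6 cycles per second, so one cycle lasts 1000 / MHz ns. */
double cycle_time_ns(double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return 1000.0 / clock_mhz;
}

/* Print the cycle time for each clock rate listed in the processor table. */
void print_cycle_times(void)
{
    const char *processor[] = { "8080", "286", "386", "Pent", "P-III" };
    const double clock_mhz[] = { 1.0, 6.0, 20.0, 150.0, 750.0 };
    const int count = (int)(sizeof(clock_mhz) / sizeof(clock_mhz[0]));

    for (int i = 0; i < count; i++)
        printf("%-6s %6.0f MHz  %8.2f ns\n",
               processor[i], clock_mhz[i], cycle_time_ns(clock_mhz[i]));
}
```

Calling `print_cycle_times()` from a `main` function prints one line per processor.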


Intel Processors: Cache SRAM

Processor               Year        L1          L2
486                     1989-1994   8K          -
Pentium                 1993        8K   8K     -
Pentium Pro             1995-1999   8K   8K     256K-1M
Pentium II              1997        16K  16K    512K ½
Celeron A               1998        16K  16K    128K
Pentium III Coppermine  2000        16K  16K    256K
Pentium 4 Willamette    2000        12K  8K     256K
Pentium 4 Northwood     2002        12K  8K     512K

http://www.intel.com/pressroom/kits/quickreffam.htm


Memory Hierarchy

Toward L0: smaller, faster, and costlier (per byte) storage devices. Toward L5: larger, slower, and cheaper (per byte) storage devices.

L0: Registers. CPU registers hold words retrieved from cache memory.

L1: On-chip L1 cache (SRAM). L1 cache holds cache lines retrieved from the L2 cache.

L2: Off-chip L2 cache (SRAM). L2 cache holds cache lines retrieved from memory.

L3: Main memory (DRAM). Main memory holds disk blocks retrieved from local disks.

L4: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.

L5: Remote secondary storage (distributed file systems, Web servers).


Pay the price

• To access large amounts of data in a cost-effective manner, the bulk of the data must be stored on disk

SRAM  4 MB:   ~$500
DRAM  1 GB:   ~$200
Disk  80 GB:  ~$110


Locality

• Principle of Locality:

– Programs tend to reuse data and instructions near those they have used recently, or that were recently referenced themselves.

– Temporal locality: Recently referenced items are likely to be referenced in the near future.

– Spatial locality: Items with nearby addresses tend to be referenced close together in time.


Locality Example

    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;

• Data
– Reference array elements in succession (stride-1 reference pattern): Spatial locality
– Reference sum each iteration: Temporal locality

• Instructions
– Reference instructions in sequence: Spatial locality
– Cycle through loop repeatedly: Temporal locality


Power Programmer

• Claim: Being able to look at code and get a qualitative sense of its locality is a key skill for a professional programmer.

• Good locality?

    int sumarrayrows(int a[M][N])
    {
        int i, j, sum = 0;

        /* Inner loop walks along a row: consecutive addresses (stride 1). */
        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }


Stride-N example

• Question: Does this function have good locality?

    int sumarraycols(int a[M][N])
    {
        int i, j, sum = 0;

        /* Inner loop walks down a column: each access is N elements further on. */
        for (j = 0; j < N; j++)
            for (i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }


Matrix M=2, N=3

int sumarrayrows()

Address       0    4    8    12   16   20
Contents      a00  a01  a02  a10  a11  a12
Access order  1    2    3    4    5    6

int sumarraycols()

Address       0    4    8    12   16   20
Contents      a00  a01  a02  a10  a11  a12
Access order  1    3    5    2    4    6
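The two access orders in the table can be reproduced by recording the step at which each element is visited. The sketch below does that for the 2x3 case. The helper names (`row_major_order`, `col_major_order`, `byte_offset`, `print_trace`) are illustrative only, and the offsets match the addresses in the table only if `int` is 4 bytes.

```c
#include <stddef.h>
#include <stdio.h>

#define M 2
#define N 3

/* Record the 1-based step at which each element is read when the
 * matrix is traversed row by row (i outer, j inner). */
void row_major_order(int order[M][N])
{
    int step = 1;
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            order[i][j] = step++;
}

/* Same, but traversing column by column (j outer, i inner). */
void col_major_order(int order[M][N])
{
    int step = 1;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++)
            order[i][j] = step++;
}

/* Byte offset of a[i][j] from the start of the array; C stores
 * two-dimensional arrays row by row. */
size_t byte_offset(int i, int j)
{
    return ((size_t)i * N + (size_t)j) * sizeof(int);
}

/* Print offset and access step for every element, in memory order. */
void print_trace(const char *name, int order[M][N])
{
    printf("%s\n", name);
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            printf("  a%d%d  offset %2zu  access %d\n",
                   i, j, byte_offset(i, j), order[i][j]);
}
```

In the column-wise trace, consecutive steps land N elements apart in memory, which is why the stride grows with the row length.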


Expect: Stride-1 is better!

– int A[2][4] (32 bytes)

[Figure: throughput (MB/s, axis 0 to 600) versus stride (words, 1 to 16).]


Reality: small matrices fit in cache

– int A[32][32] (4 KB)

[Figure: throughput (MB/s, axis 0 to 5000) versus stride (words, 1 to 16).]


Reality: the performance drop between L1 and L2 cache is not dramatic

– int A[180][180] (128 KB)

[Figure: throughput (MB/s, axis 0 to 6000) versus stride (words, 1 to 16).]


Reality: Only when DRAM is accessed can the penalty be seen

– int A[512][512] (1 MB)

[Figure: throughput (MB/s, axis 0 to 1800) versus stride (words, 1 to 16).]
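The working-set sizes quoted with these four matrices follow from the array dimensions. Here is a small sketch of that arithmetic, assuming a 4-byte `int` as in the address table earlier; `working_set_bytes` and `print_working_sets` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Working set size in bytes of an int matrix with the given dimensions. */
size_t working_set_bytes(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);
    return rows * cols * sizeof(int);
}

/* Print the working set of each matrix used in the stride measurements. */
void print_working_sets(void)
{
    const size_t dims[][2] = { {2, 4}, {32, 32}, {180, 180}, {512, 512} };
    const size_t count = sizeof(dims) / sizeof(dims[0]);

    for (size_t k = 0; k < count; k++)
        printf("int A[%zu][%zu]: %zu bytes\n",
               dims[k][0], dims[k][1],
               working_set_bytes(dims[k][0], dims[k][1]));
}
```

With a 4-byte `int` this gives 32, 4,096, 129,600, and 1,048,576 bytes; the 180x180 case is a little under 128 KB (131,072 bytes).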


Memory Mountain

Pentium 4, 2.4 GHz
8 KB L1 d-cache
12 KB L1 i-cache
512 KB L2 cache

[Figure: read throughput (MB/s, axis 0 to 5000) as a surface over stride (words, s1 to s15) and working set size (bytes, 2k to 8m). Annotations: ridges of temporal locality (L1, L2, Mem) and slopes of spatial locality.]
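A surface like the memory mountain is typically produced by timing a read loop over a range of working-set sizes and strides. The following is a rough sketch of such a kernel, not the code behind the measurements shown here. The names (`init_data`, `sum_with_stride`, `read_throughput_mbs`, `MAXELEMS`), the use of `clock()`, and the repeat count are all assumptions, and a serious benchmark would need a finer timer and more care with compiler optimizations.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Largest working set: 2M ints (8 MB if int is 4 bytes). Assumed bound. */
#define MAXELEMS ((size_t)1 << 21)

static int data[MAXELEMS];      /* the array that gets read */

/* Fill the array so that reads return known values. */
void init_data(void)
{
    for (size_t i = 0; i < MAXELEMS; i++)
        data[i] = 1;
}

/* Read the first elems elements of data, skipping stride elements
 * between reads, and return the sum of what was read. */
long sum_with_stride(size_t elems, size_t stride)
{
    assert(stride > 0 && elems <= MAXELEMS);
    long sum = 0;
    for (size_t i = 0; i < elems; i += stride)
        sum += data[i];
    return sum;
}

/* Rough read throughput in MB/s for one (working set, stride) point.
 * Returns 0.0 if the run was too short for clock() to resolve. */
double read_throughput_mbs(size_t size_bytes, size_t stride, int repeats)
{
    assert(size_bytes <= sizeof(data) && stride > 0 && repeats > 0);
    size_t elems = size_bytes / sizeof(int);
    volatile long sink = sum_with_stride(elems, stride);  /* warm-up pass */

    clock_t start = clock();
    for (int r = 0; r < repeats; r++)
        sink = sum_with_stride(elems, stride);
    clock_t stop = clock();
    (void)sink;

    size_t reads = (elems + stride - 1) / stride;          /* reads per pass */
    double seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    double megabytes = (double)repeats * (double)reads * sizeof(int) / 1e6;
    return seconds > 0.0 ? megabytes / seconds : 0.0;
}
```

Sweeping `size_bytes` from 2 KB to 8 MB and `stride` from 1 to 16 words, as on the axes of the figure, gives one throughput value per grid point.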


Summary

• As long as your data fits in the cache, and your program shows good locality, good performance is guaranteed.


Assignment

• Practice Problem 6.9 (p. 624): 'Order three functions to the spatial locality enjoyed by each.'

• Practice Problem 6.22 (p. 659): 'Estimate the time, in CPU cycles, to read an 8-byte word, from the different L1-d of an i7 processor