Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems...
![Page 1: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/1.jpg)
Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems
Mrinmoy Ghosh, Hsien-Hsin S. Lee
School of Electrical and Computer Engineering Georgia Institute of Technology
Atlanta, GA
![Page 2: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/2.jpg)
2
Multi-Level Inclusion in Caches
• Definition of MLI:
  – Cache line present in lower-level cache ⇒ cache line present in higher-level cache
• Use of MLI:
  – Facilitates efficient cache coherence implementation
  – Shields lower-level caches from snoop requests
• Implementing MLI:
  – “I” bit in cache tags
  – Higher-level cache gets info about clean evictions
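The inclusion property can be read as a set relation. The sketch below is an illustration only: it models each cache as a plain set of line addresses (an assumption, since real caches track tags per set and way) and checks that every line held in the lower-level cache (L1) is also held in the higher-level cache (L2). The name `holds_mli` is hypothetical.

```python
def holds_mli(l1_lines: set, l2_lines: set) -> bool:
    """Return True if every line cached in L1 is also cached in L2."""
    return l1_lines <= l2_lines


l2 = {0x100, 0x140, 0x180}
print(holds_mli({0x100, 0x140}, l2))  # L1 contents are a subset of L2
print(holds_mli({0x100, 0x200}, l2))  # 0x200 is in L1 but missing from L2
```

When the relation holds, a snoop that misses in the L2 tags cannot hit in L1, which is how L2 shields L1 from snoop traffic.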
![Page 3: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/3.jpg)
3
IBM Power 4 Cache Hierarchy
• 1.5MB L2 shared by 2 cores, with a 32MB L3
• Inclusion maintained between L1 and L2
• Inclusion indication can be false
[Figure: block diagram with L1 tag, L1$, L2 cache with inclusion bits, Level 3 cache, and snoop bus]
![Page 4: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/4.jpg)
4
Another Approach: Piranha CMP (Compaq)
• 8 cores (64KB I$ + 64KB D$ each, 1MB shared L2)
• Aggregate L1 = 1MB = L2
• No inclusion maintained
[Figure: block diagram with L1 tag, L1$, L2 cache, L2 controller holding duplicate L1 tag and state, and snoop bus]
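The aggregate L1 figure on this slide follows from the per-core cache sizes. A quick check, assuming the 64KB I$ and 64KB D$ are per core (the constant names are illustrative):

```python
CORES = 8
L1_I_KB = 64    # instruction cache per core
L1_D_KB = 64    # data cache per core
L2_KB = 1024    # 1MB shared L2

aggregate_l1_kb = CORES * (L1_I_KB + L1_D_KB)
print(aggregate_l1_kb, aggregate_l1_kb == L2_KB)
```

With the L1s together as large as the L2, an inclusive L2 would hold little beyond copies of L1 contents, which is consistent with the slide's note that no inclusion is maintained.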
![Page 5: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/5.jpg)
5
Power Implication in MLI Caches
• The same active information kept in both caches
• With locality, L2 is rarely accessed
• Caches: larger, deeper
• Moore’s law: more transistors for insurance?
[Figure: several L1 tag / L1$ pairs above an L2 cache whose entries are marked with 1s]
![Page 6: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/6.jpg)
6
Prior Architectural Art in Saving Cache Leakage
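• Cache Decay [ISCA-28]
  – Gated Vdd control
  – Could lead to more power
• Drowsy Cache [ISCA-29][MICRO-35]
  – Drowsy control selects between Vdd (1 V) and Vdd Low (0.3 V)
  – Could impact access latency
[Figure: memory cell circuits showing bit lines (BL), word line (WL), gated-Vdd control, and drowsy supply selection]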
![Page 7: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/7.jpg)
7
Virtual Exclusion
![Page 8: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/8.jpg)
8
Virtual Exclusion: L1 Cache Line Fill
[Figure: core with L1 cache holding line 0x12341212ff001122301498ab34123445; 2-way L2 cache (tag RAM and data array per way) on the shared bus; L2 tag entry fields Tag, V, D, I; gated-Vdd control = 0]
![Page 9: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/9.jpg)
9
Virtual Exclusion: L1 Eviction
[Figure: core with L1 cache; 2-way L2 cache (tag RAM and data array per way) on the shared bus; L2 tag entry fields Tag, V, D, I; evicted line 0xffddeeaa109900110000001111111100; gated-Vdd control = 1, Drowsy = 1, Vdd_low]
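The two Virtual Exclusion slides (L1 line fill and L1 eviction) can be read as a small state machine on each L2 line. The sketch below is one interpretation of the diagram labels, not the authors' implementation: it assumes the L2 tag entry (Tag, V, D, I) stays powered, that the L2 data cells are gated off while L1 holds the line, and that every L1 eviction hands the line back to L2, which stores it in drowsy mode. All class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class L2Line:
    """One L2 line; the tag entry stays on, the data cells may be gated off."""
    tag: int
    data: Optional[bytes] = None
    valid: bool = True         # V bit
    dirty: bool = False        # D bit
    in_l1: bool = False        # I (inclusion) bit
    data_powered: bool = True  # gated-Vdd control (False means control = 0)
    drowsy: bool = False       # True means cells are held at Vdd_low


def on_l1_fill(line: L2Line) -> Optional[bytes]:
    """Hand the data to L1, then gate off the L2 data cells."""
    data = line.data
    line.in_l1 = True
    line.data_powered = False  # gated-Vdd control = 0
    line.drowsy = False
    line.data = None           # contents are lost once Vdd is gated
    return data


def on_l1_eviction(line: L2Line, data: bytes, modified: bool) -> None:
    """Power the L2 data cells back on in drowsy mode and store the line."""
    line.data_powered = True   # gated-Vdd control = 1
    line.drowsy = True         # Drowsy = 1, supply at Vdd_low
    line.data = data
    line.dirty = line.dirty or modified
    line.in_l1 = False


line = L2Line(tag=0x1234, data=bytes.fromhex("12341212ff001122301498ab34123445"))
to_l1 = on_l1_fill(line)
on_l1_eviction(line, bytes.fromhex("ffddeeaa109900110000001111111100"), modified=True)
print(line)
```

Because the tag entry and the I bit remain valid throughout, the L2 still answers tag lookups for the line, so inclusion holds at the tag level even though the data is held in only one place at a time.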
![Page 10: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/10.jpg)
10
Protocol Change: Snoop Forwarding
[Figure: core with L1 cache; 2-way L2 cache (tag RAM and data array per way) on the shared bus; L2 tag entry fields Tag, V, D, I; a snoop request from the shared bus reaches the L2 and is forwarded to L1]
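A minimal sketch of the snoop-forwarding decision, under the assumption that a snoop is forwarded to L1 whenever the L2 tag matches, the I bit is set, and the L2 data cells are gated off. The function name, the dictionary layout, and the returned strings are hypothetical and only illustrate the control flow.

```python
def route_snoop(l2_tags: dict, tag: int) -> str:
    """Decide where a snoop request from the shared bus is serviced."""
    entry = l2_tags.get(tag)
    if entry is None:
        return "miss"           # tag not in L2, so L1 is not disturbed
    if entry["i_bit"] and not entry["data_powered"]:
        return "forward_to_l1"  # only L1 holds the data
    return "serve_from_l2"


l2_tags = {
    0x12: {"i_bit": True, "data_powered": False},   # line lives in L1
    0x34: {"i_bit": False, "data_powered": True},   # line lives in L2
}
for tag in (0x12, 0x34, 0x56):
    print(hex(tag), route_snoop(l2_tags, tag))
```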
![Page 11: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/11.jpg)
11
Protocol Change: Write Invalidation
[Figure: core with L1 cache; 2-way L2 cache (tag RAM and data array per way) on the shared bus; L2 tag entry fields Tag, V, D, I; labeled signals: L1 cache write notification, invalidation request]
![Page 12: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/12.jpg)
12
Modified Cache Decay
![Page 13: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/13.jpg)
13
Modified Cache Decay for MLI: L2 Line Fill
• Decay of counter continues even if line is in L1 Cache
[Figure: core, L1 cache, 2-way L2 cache (tag RAM and data array per way), shared bus, and memory; L2 tag entry fields Tag, DC (decay counter), I; L2 linefill from memory of line 0x12341212ff001122301498ab34123445]
![Page 14: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/14.jpg)
14
Modified Cache Decay for MLI: L1 Eviction
• Decay of counter unaffected by L1 eviction
[Figure: core, L1 cache, 2-way L2 cache (tag RAM and data array per way), shared bus, and memory; L2 tag entry fields Tag, DC, I; a line is evicted from L1]
![Page 15: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/15.jpg)
15
Modified Cache Decay for MLI: L2 Hit
• Access hits L2 Cache
[Figure: core, L1 cache, 2-way L2 cache (tag RAM and data array per way), shared bus, and memory; L2 tag entry fields Tag, DC, I; line 0x12341212ff001122301498ab34123445]
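The three Modified Cache Decay slides (L2 line fill, L1 eviction, L2 hit) describe how the per-line decay counter (DC) behaves. The sketch below is a simplified model under stated assumptions: `DECAY_INTERVAL` is an arbitrary illustrative value, an L2 hit is assumed to restart the countdown as in ordinary cache decay, and what happens to the L1 copy when an L2 line is turned off is not modeled. All names are hypothetical.

```python
from dataclasses import dataclass

DECAY_INTERVAL = 4  # hypothetical number of ticks before a line is turned off


@dataclass
class DecayLine:
    counter: int = DECAY_INTERVAL  # the DC field
    in_l1: bool = False            # the I bit
    powered: bool = True

    def tick(self) -> None:
        """Advance time; the counter runs even while the line is in L1."""
        if self.powered and self.counter > 0:
            self.counter -= 1
            if self.counter == 0:
                self.powered = False  # line is gated off

    def on_l1_eviction(self) -> None:
        """L1 eviction clears the I bit and leaves the counter untouched."""
        self.in_l1 = False

    def on_l2_hit(self) -> None:
        """Assumption: a hit on a powered line restarts the countdown."""
        if self.powered:
            self.counter = DECAY_INTERVAL
            self.in_l1 = True


line = DecayLine(in_l1=True)
line.tick()
line.tick()
line.on_l1_eviction()
print(line)
```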
![Page 16: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/16.jpg)
16
Hybrid Virtual Exclusion
• Observation:
  – With cache decay, L2 lines start decaying while L1 has high locality
• Hybrid Virtual Exclusion does
  – Virtual Exclusion when L1 has high locality
  – Start decaying after L1 eviction
![Page 17: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/17.jpg)
17
Hybrid Virtual Exclusion: L2 Line Fill
• L1 & L2 virtually exclusive
[Figure: core, L1 cache, 2-way L2 cache (tag RAM and data array per way), shared bus, and memory; L2 tag entry fields Tag, DC, I; L2 linefill from memory of line 0x12341212ff001122301498ab34123445; gated-Vdd control = 0]
![Page 18: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/18.jpg)
18
Hybrid Virtual Exclusion: L1 Eviction
• Decay starts only after line is evicted from L1
[Figure: core, L1 cache, 2-way L2 cache (tag RAM and data array per way), shared bus, and memory; L2 tag entry fields Tag, DC, I; line 0x12341212ff001122301498ab34123445 is evicted from L1]
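The hybrid policy combines the two earlier mechanisms: the L2 data cells are off while L1 holds the line, and the decay counter only runs once L1 has evicted it. The sketch below is an interpretation of the slides, with `DECAY_INTERVAL` chosen arbitrarily and all names hypothetical.

```python
from dataclasses import dataclass

DECAY_INTERVAL = 4  # hypothetical number of ticks before a line decays


@dataclass
class HybridLine:
    valid: bool = True
    in_l1: bool = False            # the I bit
    data_powered: bool = True      # gated-Vdd control
    counter: int = DECAY_INTERVAL  # the DC field

    def on_fill(self) -> None:
        """Line is filled into L1; L1 and L2 become virtually exclusive."""
        self.in_l1 = True
        self.data_powered = False  # gated-Vdd control = 0

    def on_l1_eviction(self) -> None:
        """L2 takes the data back and the decay countdown starts."""
        self.in_l1 = False
        self.data_powered = True
        self.counter = DECAY_INTERVAL

    def tick(self) -> None:
        """The counter is frozen while L1 holds the line."""
        if self.in_l1 or not self.valid:
            return
        self.counter -= 1
        if self.counter == 0:
            self.data_powered = False  # decayed: line is turned off
            self.valid = False


line = HybridLine()
line.on_fill()
for _ in range(10):
    line.tick()            # no decay while the line is in L1
line.on_l1_eviction()
line.tick()
print(line)
```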
![Page 19: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/19.jpg)
19
Experimental Framework

| Parameter | Value |
| --- | --- |
| Single processor model | UltraSPARC T1-like (Niagara) |
| L1 data/instruction cache | 2-way 16KB, 64-byte line |
| L2 caches | 8-way 256KB, 512KB |
| L1 access | 1 cycle |
| L2 access (shared for multi-core, private for SMP) | 10 cycles (normal), 12 cycles (drowsy) |
| Memory access | 200 cycles |
| DRAM | 256MB (conservative base) |
| Energy baseline | Drowsy cache scheme |

• M5 simulator from Michigan
• System-level emulation
• Power models integrated into M5
  – eCACTI from UC Irvine (leakage + dynamic)
  – Micron DRAM datasheet
• 2P, 4P, & 8P SMP
• Dual-, quad-, & oct-core multicore
• Benchmark workload
  – SPLASH-2 (ran to completion)
  – SPEC 2000
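To see what the 2-cycle drowsy penalty means in practice, the latencies listed for this framework can be plugged into the standard average memory access time formula. This is an illustrative calculation only: the miss rates below are made-up placeholders, not measurements from the study, and the formula is the textbook two-level AMAT rather than anything reported by the authors.

```python
L1_CYCLES = 1
L2_NORMAL_CYCLES = 10
L2_DROWSY_CYCLES = 12
MEMORY_CYCLES = 200


def amat(l1_miss_rate: float, l2_miss_rate: float, l2_cycles: int) -> float:
    """Two-level average memory access time in cycles."""
    return L1_CYCLES + l1_miss_rate * (l2_cycles + l2_miss_rate * MEMORY_CYCLES)


# Hypothetical miss rates, chosen only to make the arithmetic concrete.
normal = amat(0.05, 0.20, L2_NORMAL_CYCLES)
drowsy = amat(0.05, 0.20, L2_DROWSY_CYCLES)
print(f"normal L2: {normal:.2f} cycles, all-drowsy L2: {drowsy:.2f} cycles")
```

The gap between the two results scales with the L1 miss rate, so the drowsy penalty matters less when L1 locality is high.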
![Page 20: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/20.jpg)
20
Leakage Energy Reduction (2-way SMP)
[Figure: bar chart, y-axis from -5% to 55%; series: Decay, Virtual Ex, Hybrid; benchmarks: Barnes, Cholesky, FFT, FMM, LUContig, LUNoncontig, OceanContig, OceanNoncont, Radix, Raytrace, WaterNSquared, WaterSpatial, Average]
![Page 21: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/21.jpg)
21
Leakage Energy Reduction (Various SMPs)
[Figure: bar chart, y-axis from 0% to 50%; series: Decay, Virtual Ex, Hybrid; configurations: 256-2P, 256-4P, 256-8P, 512-2P, 512-4P, 512-8P]
• Average of SPLASH2 benchmarks
![Page 22: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/22.jpg)
22
Leakage Energy Reduction (4-way Multi-Core)
[Figure: bar chart, y-axis from -5% to 65%; series: Decay, Virtual Exclusion, Hybrid; benchmarks: Barnes, Cholesky, FFT, FMM, LUContig, LUNoncontig, OceanContig, OceanNoncont, Radix, Raytrace, WaterNSquared, WaterSpatial, Average]
![Page 23: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/23.jpg)
23
Leakage Energy Reduction (Various Multi-Cores)
[Figure: bar chart, y-axis from -5% to 25%; series: Decay, Virtual Exclusion, Hybrid; configurations: 256 2P, 256 4P, 256 8P, 512 2P, 512 4P, 512 8P, Mean]

| Configuration | SPEC 2000 benchmark mix |
| --- | --- |
| 2-way Multicore | bzip, gzip |
| 4-way Multicore | bzip, gzip, crafty, gap |
| 8-way Multicore | 2x (bzip, gzip, crafty, gap) |
![Page 24: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/24.jpg)
24
Conclusions
• Prior art can violate Multi-Level Inclusion for cache coherence protocols
• Virtual Exclusion
  – Maintain correctness for Multi-Level Inclusion
  – Low-overhead architectural approach
  – Enhanced Cache Decay to work correctly with MLI
• Significant energy savings over a drowsy cache baseline
  – Symmetric Multiprocessors (46% for 8-way, SPLASH2)
  – Multi-Core processors (35% for 4-way, SPLASH2)
![Page 25: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/25.jpg)
Thank You!
Georgia Tech ECE MARS Labs
http://arch.ece.gatech.edu
![Page 26: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/26.jpg)
BACKUP
![Page 27: Virtual Exclusion: An Architectural Approach to Reducing Leakage Energy in Multiprocessor Systems Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical.](https://reader033.fdocuments.in/reader033/viewer/2022050714/56649c785503460f9492dd09/html5/thumbnails/27.jpg)
27
Prior Architectural Art in Saving Cache Leakage
• Cache Decay [ISCA-28]
  – Use Gated-Vdd
  – Turn off cache lines when not used for a while
  – Can lead to more power consumption
  – Did not consider cache coherence
• Drowsy Cache [ISCA-29][MICRO-35]
  – Maintain state in low-leakage drowsy mode
  – Has latency implication