Virtualized and Flexible ECC for Main Memory
Doe Hyun Yoon and Mattan Erez
Dept. Electrical and Computer Engineering
The University of Texas at Austin
ASPLOS 2010
Memory Error Protection
• Applying ECC uniformly – ECC DIMMs
– Simple and transparent to programmers
• Error protection level
– Fixed, design-time decision
• Chipkill-correct used in high-end servers
– Constrains memory module design space
• Allows only x4 DRAMs
• Lower energy efficiency than x8 DRAMs
• Virtualized ECC – objectives
– To provide flexible memory error protection
– To relax design constraints of chipkill
Virtualized ECC
• Two-tiered error protection
• Tier-1 Error Code (T1EC)
– Simple error code for detection or light-weight correction
• Tier-2 Error Code (T2EC)
– Strong error correcting code
• Store T2EC within the memory namespace itself
– OS manages T2EC
• Flexible memory error protection
– Different T2EC for different data pages
– Stronger protection for more important data
Virtualized ECC – Example
[Figure: virtual pages i, j, and k in the virtual address space map to physical page frames i, j, and k, which hold data and T1EC (virtual page to physical frame mapping). A physical frame to ECC page mapping links page frames j and k to ECC pages j and k, which hold T2EC for Chipkill and T2EC for Double Chipkill. The error protection level ranges from low to high.]
VIRTUALIZED ECC
Observations on Memory Errors
• Per-system error rate is still low
– Most of the time, we try to detect errors and find none
• Detecting errors is the common-case operation
– Need a low-latency, low-complexity error detection mechanism: T1EC
• Correcting errors is the uncommon-case operation
– Correction can be complex and take a long time
– But we still need to keep error correction info somewhere: Virtualized T2EC
Uniform ECC
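[Figure: a virtual address (VPN + offset) in virtual memory is translated to a physical address (PFN + offset). The PA selects a page frame in physical memory, where data is stored together with its ECC.]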
Virtualized ECC
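[Figure: a virtual address (VPN + offset) in virtual memory is translated to a physical address (PFN + offset). The PA selects a page frame in physical memory, which holds data and T1EC. The OS manages a PFN to EPN (ECC page number) translation, and the offset is scaled according to the T2EC size to form the ECC address (EA). The EA selects the T2EC within an ECC page.]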
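The PA to EA translation on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 4 KB page size, the dictionary standing in for the OS-managed PFN to EPN table, and the names `pa_to_ea` and `pfn_to_epn` are all assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (not stated on the slides)
LINE_SIZE = 64    # assumed cache line size in bytes


def pa_to_ea(pa, pfn_to_epn, t2ec_bytes_per_line):
    """Translate a physical address (PA) to an ECC address (EA).

    pfn_to_epn is a hypothetical stand-in for the OS-managed
    PFN -> EPN (ECC page number) mapping.
    """
    pfn, offset = divmod(pa, PAGE_SIZE)
    epn = pfn_to_epn[pfn]
    # Scale the page offset according to the T2EC size:
    # each data line owns t2ec_bytes_per_line bytes in the ECC page.
    line_index = offset // LINE_SIZE
    return epn * PAGE_SIZE + line_index * t2ec_bytes_per_line


# Hypothetical mapping: page frame 2 keeps its T2EC in ECC page 7.
mapping = {2: 7}
ea = pa_to_ea(2 * PAGE_SIZE + 3 * LINE_SIZE, mapping, t2ec_bytes_per_line=4)
print(hex(ea))
```

With 4B of T2EC per 64B data line, 16 consecutive data lines fall into the same 64B T2EC line under this scaling.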
Virtualized ECC operation
• Read: fetch data and T1EC
– Don’t need T2EC in most cases
• Write: update data, T1EC, and T2EC
• ECC Address Translation Unit: fast PA to EA translation
• T2ECs of consecutive data lines map to a T2EC line
• T2EC lines can be partially valid
• Update only valid T2EC to DRAM
[Figure: an LLC holding data lines with T1EC, the ECC Address Translation Unit, and DRAM ranks 0 and 1 (line addresses 0x0000 to 0x05c0) with regions labeled "T2EC for Rank 0 data" and "T2EC for Rank 1 data". Numbered steps in the example: (1) Rd: 0x00c0, (2) Wr: 0x0200, (3) PA: 0x0200, (4) EA: 0x0540, (5) Wr: 0x0540.]
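The idea of a partially valid T2EC line, where only the valid entries are written to DRAM, can be illustrated with a small model. This is a sketch under assumptions: the class `T2ECLine`, its methods, and the dictionary used as a stand-in for DRAM are hypothetical and do not describe the actual hardware.

```python
from dataclasses import dataclass, field


@dataclass
class T2ECLine:
    """Hypothetical model of one cached T2EC line.

    Only slots that were updated are present in `slots`,
    so the line can be partially valid.
    """
    slots_per_line: int
    slots: dict = field(default_factory=dict)  # slot index -> T2EC bytes

    def update(self, slot, t2ec):
        """Record the new T2EC for one data line (marks the slot valid)."""
        if not 0 <= slot < self.slots_per_line:
            raise IndexError("slot out of range")
        self.slots[slot] = t2ec

    def write_back(self, dram, line_addr, slot_bytes):
        """Write only the valid slots to DRAM; return how many were written."""
        for slot, t2ec in sorted(self.slots.items()):
            dram[line_addr + slot * slot_bytes] = t2ec
        written = len(self.slots)
        self.slots.clear()
        return written


# DRAM modeled as a dict: address -> stored T2EC (illustrative values).
dram = {0x0540: b"old0", 0x0544: b"old1"}
line = T2ECLine(slots_per_line=16)
line.update(1, b"new1")           # a write to one data line updates its T2EC
line.write_back(dram, 0x0540, 4)  # slot 0 was never valid, so it is untouched
```

Because invalid slots are skipped on write-back, the T2EC line never has to be read from DRAM before it is updated, which matches the one-way write-back traffic noted in the talk.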
Penalty with V-ECC
• Increased data miss rate
– T2EC lines in LLC reduce effective LLC size
• Increased traffic due to T2EC write-back
– One-way write-back traffic
• Not on the critical path
CHIPKILL-CORRECT
Chipkill-correct
• Single Device-error Correct, Double Device-error Detect
– Can tolerate a DRAM failure
– Can detect a second DRAM failure
• Chipkill requires x4 DRAMs
• x8 chipkill is impractical
– But, x8 DRAM is more energy efficient
Baseline x4 Chipkill
• Two x4 ECC DIMMs
– 128-bit data + 16-bit ECC (redundancy overhead: 12.5%)
– 4 check symbol error code using 4-bit symbols
• Access granularity
– 64B in DDR2 (min. burst 4 x 128 bit)
– 128B in DDR3 (min. burst 8 x 128 bit)
[Figure: two DIMMs of 18 x4 DRAMs each, forming a 144-bit wide data bus.]
x8 Chipkill
• x8 chipkill with the same access granularity
– 152-bit wide data path
• 128-bit data + 24-bit ECC
• Redundancy overhead: 18.75%
– Need a custom-designed DIMM
• Increases the system cost a lot
[Figure: 19 x8 DRAMs forming a 152-bit wide data bus.]
x8 Chipkill w/ Standard DIMMs
• Increase access granularity
– 128B in DDR2 (min. burst 4 x 256 bit)
– 256B in DDR3 (min. burst 8 x 256 bit)
[Figure: 35 x8 DRAMs forming a 280-bit wide data bus.]
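The access granularity and redundancy figures quoted for the chipkill configurations follow from simple arithmetic: data width times minimum burst length, and ECC bits divided by data bits. The helper names below are illustrative, not from the talk.

```python
def access_granularity_bytes(data_bits, burst_length):
    """Bytes of data moved by one minimum-length burst."""
    return data_bits * burst_length // 8


def redundancy_overhead(data_bits, ecc_bits):
    """ECC bits as a fraction of data bits."""
    return ecc_bits / data_bits


# 128-bit data path (baseline x4 chipkill)
print(access_granularity_bytes(128, 4))  # DDR2, burst 4 -> 64
print(access_granularity_bytes(128, 8))  # DDR3, burst 8 -> 128

# 256-bit data path (x8 chipkill with standard DIMMs)
print(access_granularity_bytes(256, 4))  # DDR2, burst 4 -> 128
print(access_granularity_bytes(256, 8))  # DDR3, burst 8 -> 256

# Redundancy overhead
print(redundancy_overhead(128, 16))      # baseline x4 -> 0.125
print(redundancy_overhead(128, 24))      # x8, 152-bit path -> 0.1875
```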
V-ECC for Chipkill
• Use 3 check symbol error codes
– Single Symbol-error Correct and Double Symbol-error Detect
• T1EC
– 2 check symbols
– Detect up to 2 symbol errors
• T2EC
– 3rd check symbol
– Combined T1EC/T2EC provides Chipkill
V-ECC: ECC x4 configuration
• Use 8-bit symbol error code
– 2 bursts out of a x4 DRAM form an 8-bit symbol
• Modern DRAMs have minimum burst of 4 or 8
• 1 x4 ECC DIMM + 1 x4 Non-ECC DIMM
• Each DRAM access in DDR2 (burst 4)
– 64B data, 4B T1EC
– 2B T2EC is virtualized within memory namespace
• 32 T2ECs per 64B cache line
[Figure: one x4 ECC DIMM (18 x4 DRAMs) and one x4 non-ECC DIMM (16 x4 DRAMs) form a 136-bit wide data bus carrying data and T1EC. T2EC is virtualized within memory.]
V-ECC: ECC x8 configuration
• Use 8-bit symbol error code
• 2 x8 ECC DIMMs
• Each DRAM access in DDR2 (burst 4)
– 64B data, 8B T1EC
– 4B T2EC is virtualized
• 16 T2ECs per 64B cache line
[Figure: two x8 ECC DIMMs (9 x8 DRAMs each) form a 144-bit wide data bus carrying data and T1EC. T2EC is virtualized within memory.]
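The per-access sizes for the ECC x4 and ECC x8 configurations can be checked from the bus widths and burst length. The sketch below assumes that all bus bits beyond the 128 data bits carry T1EC, which is consistent with the 4B and 8B T1EC figures on the slides. The function names are illustrative.

```python
def per_access_bytes(bus_bits, data_bits, burst_length):
    """Return (data bytes, T1EC bytes) moved by one burst.

    Assumes every bus bit that is not a data bit carries T1EC.
    """
    data = data_bits * burst_length // 8
    t1ec = (bus_bits - data_bits) * burst_length // 8
    return data, t1ec


def t2ecs_per_line(line_bytes, t2ec_bytes):
    """How many T2ECs fit in one cache line."""
    return line_bytes // t2ec_bytes


# ECC x4: 136-bit bus, DDR2 burst 4, 2B T2EC per access
print(per_access_bytes(136, 128, 4))  # -> (64, 4)
print(t2ecs_per_line(64, 2))          # -> 32

# ECC x8: 144-bit bus, DDR2 burst 4, 4B T2EC per access
print(per_access_bytes(144, 128, 4))  # -> (64, 8)
print(t2ecs_per_line(64, 4))          # -> 16
```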
Flexible Error Protection
• Single HW with V-ECC can provide
– Chipkill-detect, Chipkill-correct, and Double chipkill-correct
– Use different T2EC for different pages
• Reliability – Performance tradeoff
• Maximize performance/power efficiency with Chipkill-Detect
• Stronger protection at the cost of additional T2EC access
T2EC size per 64B of data:

| | Chipkill-Detect | Chipkill-Correct | Double Chipkill-Correct |
|---|---|---|---|
| ECC x4 | 0B | 2B | 4B |
| ECC x8 | 0B | 4B | 8B |
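To make the cost of each protection level concrete, the T2EC sizes above can be turned into T2EC storage per data page. This is a rough sketch: the 4 KB page size and 64B line size are assumptions, and the table and function names are made up for the example.

```python
# T2EC bytes per 64B data line, taken from the slide's table.
T2EC_BYTES_PER_LINE = {
    ("ECC x4", "chipkill-detect"): 0,
    ("ECC x4", "chipkill-correct"): 2,
    ("ECC x4", "double-chipkill-correct"): 4,
    ("ECC x8", "chipkill-detect"): 0,
    ("ECC x8", "chipkill-correct"): 4,
    ("ECC x8", "double-chipkill-correct"): 8,
}


def t2ec_bytes_per_page(config, level, page_bytes=4096, line_bytes=64):
    """T2EC storage needed to protect one data page.

    page_bytes=4096 is an assumed page size, not a value from the talk.
    """
    lines = page_bytes // line_bytes
    return lines * T2EC_BYTES_PER_LINE[(config, level)]


for (config, level) in T2EC_BYTES_PER_LINE:
    print(config, level, t2ec_bytes_per_page(config, level))
```

Under these assumptions a page protected with Chipkill-Detect needs no T2EC storage at all, while stronger levels cost proportionally more.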
EVALUATION
Simulator/Workload
• GEMS + DRAMsim
– An out-of-order SPARC V9 core
– Exclusive two-level cache hierarchy
– DDR2 800MHz – 12.8GB/s (128-bit wide data path)
• 1 channel 4 ranks
• Power model
– WATTCH for processor power – scaled to 45nm
– CACTI for cache power – cacti 45nm
– Micron model for DRAM power – commodity DRAMs
• Workloads
– 12 data intensive applications from SPEC CPU 2006 and PARSEC
– Microbenchmarks: STREAM and GUPS
Normalized Execution Time
• Less than 1% penalty on average
• Performance penalty
– Spatial locality
– Write-back traffic
[Figure: normalized execution time (y-axis 0.94 to 1.10) for Baseline x4, ECC x4, and ECC x8 across SPEC 2006 (bzip2, hmmer, mcf, libq, omnet, milc, lbm, sphinx3), PARSEC (canneal, dedup, fluid, freq), their average, and the STREAM and GUPS microbenchmarks.]
System Energy Efficiency
• Energy Delay Product (EDP) gain
– ECC x4: 1.1% on average
– ECC x8: 12.0% on average
[Figure: plot (y-axis 0.60 to 1.10) for Baseline x4, ECC x4, and ECC x8 across SPEC 2006 (bzip2, hmmer, mcf, libq, omnet, milc, lbm, sphinx3), PARSEC (canneal, dedup, fluid, freq), their average, and the STREAM and GUPS microbenchmarks. Annotations on the plot read 1.23, 20%, 17%, 10%, and 12%.]
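For reference, energy delay product is energy multiplied by execution time, and a gain can be expressed as the relative reduction against a baseline. The sketch below assumes that definition of "gain"; the slides do not spell it out, and the input numbers are hypothetical placeholders, not measurements from the talk.

```python
def edp(energy, delay):
    """Energy delay product."""
    return energy * delay


def edp_gain(base_energy, base_delay, new_energy, new_delay):
    """Relative EDP reduction versus the baseline (assumed definition)."""
    return 1.0 - edp(new_energy, new_delay) / edp(base_energy, base_delay)


# Hypothetical normalized values: 12% less energy at equal execution time.
gain = edp_gain(base_energy=1.0, base_delay=1.0, new_energy=0.88, new_delay=1.0)
print(f"{gain:.1%}")
```

Since EDP multiplies the two terms, a small execution-time penalty can be outweighed by a larger energy saving.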
Flexible Error Protection
[Figure: normalized execution time (y-axis 0.96 to 1.12) and normalized EDP (y-axis 0.60 to 1.00) for Chipkill-Detect, Chipkill-Correct, and Double Chipkill-Correct across SPEC 2006 (bzip2, hmmer, mcf, libq, omnet, milc, lbm, sphinx3), PARSEC (canneal, dedup, fluid, freq), their average, and the STREAM and GUPS microbenchmarks.]
Conclusion
• Virtualized ECC
– Two-tiered error protection, virtualized T2EC
• Improved system energy efficiency with chipkill
– Reduce DRAM power consumption by 27%
– Improve system EDP by 12%
• Performance penalty – 1% on average
• Error protection even for Non-ECC DIMMs
– Can be used for GPU memory error protection
• Flexibility in error protection
– Adaptive error protection level by user/system demand
– Cost of error protection is proportional to protection level