Self-* Systems CSE 598B Paper title: Dynamic ECC tuning for caches Presented by: Niranjan Soundararajan


Self-* Systems CSE 598B

Paper title: Dynamic ECC tuning for caches

Presented by: Niranjan Soundararajan


Abstract

On-chip caches are increasing in size.
– Protection is needed in order to store correct data.
– ECC serves as an efficient means to protect the data.

ECC has its own overhead.
– Area: extra space for its logic.
– Latency: ECC computations take time.

This work deals with reducing the latency involved in ECC computation.
– Track the frequently accessed cache lines.
– Dynamically turn ECC computation on and off for specific cache lines.


Background

Information redundancy

Data in the processing core is protected by schemes such as RMT (Redundant Multi-threading) [1][2]. ECC protection is easy to implement for on-chip caches. Also, the size of the caches prevents them from being replicated [3].

Current evaluations show raw FIT (Failures In Time) rates for latches and SRAM cells varying between 0.001 and 0.01 FIT/bit. This value increases with elevation: at 1.5 km the FIT/bit is 3.5x larger, while at 10 km (airplane altitude) it is 100x larger [4][5][6][7].
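As a rough illustration of what those multipliers mean, the sketch below scales the quoted FIT/bit range by the quoted altitude factors. Treating the factors as relative to the baseline range is an assumption of this sketch, and the names `BASELINE_FIT_PER_BIT`, `ALTITUDE_MULTIPLIER` and `fit_range_at` are hypothetical.

```python
# Raw FIT/bit range quoted on the slide (low end, high end).
BASELINE_FIT_PER_BIT = (0.001, 0.01)

# Multipliers quoted on the slide; treating them as relative to the
# baseline range above is an assumption of this sketch.
ALTITUDE_MULTIPLIER = {"baseline": 1.0, "1.5 km": 3.5, "10 km": 100.0}


def fit_range_at(altitude: str) -> tuple:
    """Scale the low and high ends of the raw FIT/bit range by the altitude factor."""
    factor = ALTITUDE_MULTIPLIER[altitude]
    low, high = BASELINE_FIT_PER_BIT
    return (low * factor, high * factor)


for altitude in ALTITUDE_MULTIPLIER:
    low, high = fit_range_at(altitude)
    print(f"{altitude}: {low:g} to {high:g} FIT/bit")
```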


Background

As processor power dissipation becomes more and more important, supply voltages get reduced. This will greatly increase the FIT/bit [8][9].

“As an example, consider a 32 MB data cache. This cache has 2^22 quad-words. Let us assume that an SRAM cell has an average FIT rate of 0.001. The single-bit FIT rate for the entire cache is 0.001 * 2^22 * 72 = 3.02 * 10^5, i.e. the MTTF is 10^9 / (3.02 * 10^5) = 3311 hours” [3].
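The arithmetic in the quoted example can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the 72 bits per quad-word is read here as 64 data bits plus 8 check bits, which is an assumption about what the quote intends.

```python
def cache_fit(cache_bytes: int, fit_per_bit: float = 0.001,
              bits_per_quadword: int = 72) -> float:
    """Single-bit FIT rate of a whole cache built from 8-byte quad-words."""
    quadwords = cache_bytes // 8  # 32 MB / 8 B = 2**22 quad-words
    return fit_per_bit * quadwords * bits_per_quadword


def mttf_hours(fit: float) -> float:
    """FIT counts failures per 10**9 hours, so MTTF = 10**9 / FIT."""
    return 1e9 / fit


fit = cache_fit(32 * 2**20)
print(f"FIT  = {fit:.3g}")                      # about 3.02e+05
print(f"MTTF = {mttf_hours(fit):.0f} hours")    # about 3311 hours
```

At 3311 hours, the example cache would see a single-bit fault roughly once every 138 days on average.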

Consider the case of large multiprocessor systems with tens of megabytes of caches. Protection becomes an important issue if the systems are involved in critical computations like space research and flight control.


Background

Taken together, these numbers show that cache data must be protected. ECC is the best way to protect SRAM.

This work addresses some of the problems related to applying ECC for caches that need to operate at low latency like the L1 caches.


Motivation

ECC overhead [10]:
– Increase in area due to circuitry: 11% (approx. 15 mm^2)
– Increase in latency: 10% (approx. 5 ns)

Applications show temporal locality in accessing cache lines. By dynamically turning ECC on and off for cache lines, the latency of cache accesses is reduced. Since the frequency of operations is high, the time between individual accesses to a frequently used line is short.
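One way to picture the latency argument is a simple expected-value model: if only a fraction of accesses go through the ECC path, the average access latency is the base latency plus that fraction of the ECC delay. The model, the function name and the sample fractions below are assumptions made for illustration; only the 5 ns ECC delay (10% of a 50 ns access) comes from the figures quoted for [10].

```python
def average_access_latency(base_ns: float, ecc_ns: float, ecc_fraction: float) -> float:
    """Average latency when only `ecc_fraction` of accesses pay the ECC delay."""
    if not 0.0 <= ecc_fraction <= 1.0:
        raise ValueError("ecc_fraction must be between 0 and 1")
    return base_ns + ecc_ns * ecc_fraction


# Illustrative numbers: 50 ns base access, 5 ns ECC delay.
for fraction in (1.0, 0.5, 0.25):
    latency = average_access_latency(50.0, 5.0, fraction)
    print(f"ECC on {fraction:.0%} of accesses: {latency:.2f} ns")
```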


Motivation

The chance of an error affecting data is lower:
– Due to the frequency of operations.
– The number of cache lines with high temporal locality is small compared to the total number of cache lines.


Benchmarks

[Figure: two plots, wupwise and vpr-route, of cache line accessed (y-axis, 0 to 600) against cycle time (x-axis, 0 to 80000 for wupwise and 0 to 70000 for vpr-route).]


Benchmarks

[Figure: two plots, Parser and Perlbmk, of cache line accessed (y-axis, 0 to 600) against cycle time (x-axis, 270000 to 330000 for Parser and 274000 to 364000 for Perlbmk).]


Benchmarks

[Figure: two plots, Gcc and Gzip, of cache line accessed (y-axis, 0 to 600) against cycle time (x-axis, 460000 to 560000 for Gcc and 195000 to 245000 for Gzip).]


Self-Tuning Implementation

Keep track of cache line accesses. After every 5000 cycles, tune the cache lines by turning ECC on or off for each one.

Overhead:
– Keeping track of cache line access: simple, fast counters make implementation easy.
– Tuning ECC for lines: simple average computation and turning ECC on for lines with more activity than the average.
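A minimal sketch of the counter-and-average scheme might look like the following. It assumes one access per cycle and that every line starts with ECC enabled, neither of which is stated in the slides. All names are hypothetical. Which way the switch goes for the busier lines is left as a parameter (`ecc_on_for_hot`), defaulting to off for the busy lines as reported in the results.

```python
from typing import List

TUNE_INTERVAL = 5000  # cycles between tuning passes, as stated in the slides


def tune_ecc(access_counts: List[int], ecc_on_for_hot: bool = False) -> List[bool]:
    """Return one ECC-enabled flag per line, split at the average access count."""
    average = sum(access_counts) / len(access_counts)
    flags = []
    for count in access_counts:
        is_hot = count > average  # more activity than the average
        flags.append(ecc_on_for_hot if is_hot else not ecc_on_for_hot)
    return flags


def simulate(trace: List[int], num_lines: int, ecc_on_for_hot: bool = False) -> List[bool]:
    """Replay a trace of accessed line indices, one access per cycle (assumed)."""
    counts = [0] * num_lines
    ecc_enabled = [True] * num_lines  # assumption: every line starts protected
    for cycle, line in enumerate(trace, start=1):
        counts[line] += 1
        if cycle % TUNE_INTERVAL == 0:
            ecc_enabled = tune_ecc(counts, ecc_on_for_hot)
            counts = [0] * num_lines  # start a fresh counting window
    return ecc_enabled


# Line 0 dominates the 5000-cycle window, so it is the only "hot" line.
trace = [0] * 4000 + [1] * 600 + [2] * 400
print(simulate(trace, num_lines=4))
```

The per-window work is one increment per access plus one sum and one comparison per line at each tuning pass, which is the low overhead the slide refers to.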


Implementation

Implementation is simplified if:
– Counters are maintained for a set of cache lines.
– ECC tuning is done at this granularity.

Granularity can be 10, 20 … 100 lines.
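The coarser granularity can be sketched as a mapping from line index to group index, with one counter per group. The contiguous grouping, the function names and the 512-line cache size are assumptions chosen for illustration, not details from the slides.

```python
from typing import List


def group_of(line_index: int, granularity: int) -> int:
    """Map a cache line to its counter group (contiguous grouping is assumed)."""
    return line_index // granularity


def group_counts(line_accesses: List[int], num_lines: int, granularity: int) -> List[int]:
    """Accumulate accesses into one counter per group of `granularity` lines."""
    num_groups = (num_lines + granularity - 1) // granularity  # round up
    counts = [0] * num_groups
    for line in line_accesses:
        counts[group_of(line, granularity)] += 1
    return counts


# Hypothetical 512-line cache: compare counter counts at two granularities.
accesses = [0, 5, 9, 10, 511]
for granularity in (10, 100):
    counts = group_counts(accesses, num_lines=512, granularity=granularity)
    print(f"granularity {granularity}: {len(counts)} counters")
```

A larger group size cuts the number of counters and ECC control bits, at the cost of switching ECC for lines in a group that may not share the same access pattern.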


Self-tuning Results

From the graphs we see the temporal locality. Based on these results, ECC was turned off for the lines with high locality.

BENCHMARK    RELATIVE ACCESS FREQUENCY    OVERHEAD
WUPWISE      8.3                          0.5%
VPR-ROUTE    5.9                          2.5%
PARSER       11.9                         2.7%
PERLBMK      6.3                          2.5%
GCC          2.7                          1.9%
GZIP         4.7                          3.2%


Conclusion

ECC is indispensable as chip reliability decreases and maintaining correct data becomes an issue.

The processor-memory bottleneck is a persistent issue. Increasing cache latency through ECC protection makes it worse.

This work tries to reduce the latency of ECC-protected caches using a scheme that dynamically turns ECC on and off.


References

[1] S. S. Mukherjee, M. Kontz, and S. K. Reinhardt, “Detailed Design and Evaluation of Redundant Multithreading Alternatives,” ISCA, 2002.

[2] S. S. Mukherjee, C. T. Weaver, J. Emer, S. K. Reinhardt, and T. Austin, “A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor,” MICRO, December 2003.

[3] S. S. Mukherjee, T. Fossum, J. Emer, and S. K. Reinhardt, “Cache Scrubbing in Microprocessors: Myth or Necessity?” 10th International Symposium on Pacific Rim Dependable Computing (PRDC), Papeete, Tahiti, March 2004.

[4] J. F. Ziegler, “Terrestrial cosmic rays,” IBM J. of Research and Development, Vol. 40, No. 1, pp. 19–39, Jan. 1996.

[5] Y. Tosaka, S. Satoh, K. Suzuki, T. Suguii, H. Ehara, G. A. Woffinden, and S. A. Wender, “Impact of Cosmic Ray Neutron Induced Soft Errors on Advanced Submicron CMOS Circuits,” Symposium on VLSI Technology Digest of Technical Papers, 1996.


References

[6] T. Karnik, B. Bloechel, K. Soumyanath, V. De, and S. Borkar, “Scaling trends of Cosmic Rays induced Soft Errors in static latches beyond 0.18µ,” Symposium on VLSI Circuits Digest of Technical Papers, 2001.

[7] S. Hareland, J. Maiz, M. Alavi, K. Mistry, S. Walstra, and C. Dai, “Impact of CMOS Scaling and SOI on soft error rates of logic processes,” Symposium on VLSI Technology Digest of Technical Papers, 2001.

[8] R. Baumann, “Soft Errors in Commercial Semiconductor Technology: Overview and Scaling Trends,” IEEE 2002 Reliability Physics Tutorial Notes, Reliability Fundamentals, pp. 121_01.1 – 121_01.14, April 7, 2002.

[9] P. Shivakumar, M. Kistler, S. W. Keckler, D. Burger, and L. Alvisi, “Modeling the Effect of Technology Trends on the Soft Error Rate of Combinatorial Logic,” Dependable Systems and Networks, 2002.

[10] H. L. Kalter et al., “A 50-ns 16-Mb DRAM with a 10 ns data rate and on-chip ECC,” IEEE J. Solid-State Circuits, vol. 25, pp. 1118–1128, Oct. 1990.