Pushing Performance, Efficiency and Scalability of Microprocessors
![Page 1: Pushing Performance, Efficiency and Scalability of Microprocessors](https://reader033.fdocuments.in/reader033/viewer/2022042703/5681400a550346895dab4440/html5/thumbnails/1.jpg)
Pushing Performance, Efficiency and Scalability of Microprocessors
CERCS IAB Meeting, Fall 2006
Gabriel Loh
![Page 2: Pushing Performance, Efficiency and Scalability of Microprocessors](https://reader033.fdocuments.in/reader033/viewer/2022042703/5681400a550346895dab4440/html5/thumbnails/2.jpg)
Research Overview
• Funding from state of GA, Intel, MARCO
• Currently 2 PhD students, 2 MS
– Active undergrad research as well
• Collaborations
– Universities: PSU, UO, Rutgers
– Industry: Intel, IBM
![Page 3: Pushing Performance, Efficiency and Scalability of Microprocessors](https://reader033.fdocuments.in/reader033/viewer/2022042703/5681400a550346895dab4440/html5/thumbnails/3.jpg)
Research Focus
• “Near-term” microprocessor design issues
– ~5-year time scale
– Power/performance/complexity
– Traditional uniprocessor performance
– Multi-core performance
• “Longer-term”
– Keeping Moore’s Law alive for the longer term
– Primarily, 3D integration for now
![Page 4: Pushing Performance, Efficiency and Scalability of Microprocessors](https://reader033.fdocuments.in/reader033/viewer/2022042703/5681400a550346895dab4440/html5/thumbnails/4.jpg)
Scaling Performance and Efficiency
• Multi-cores are here, but single-thread perf still matters
– Intel Core 2 Duo is multi-core, but…
– Single core is more OOO than ever
  • Larger instruction window, improved branch prediction, speculative load-store ordering, wider pipe and decoders
– But power also really matters
  • Lower clock speeds, different channel length transistors, more uop fusion, …
![Page 5: Pushing Performance, Efficiency and Scalability of Microprocessors](https://reader033.fdocuments.in/reader033/viewer/2022042703/5681400a550346895dab4440/html5/thumbnails/5.jpg)
Research Focus
• Maximum performance within bounds
– Bounds = power, area, TDP, …
• Single-core performance helps multi-core performance, too
– For future multi-core systems, need to strike a good balance between 1T and MT
• Most of our research is at the uarch level
– Caches, branch predictors, instruction schedulers, memory queue design, memory dependence prediction, etc.
![Page 6: Pushing Performance, Efficiency and Scalability of Microprocessors](https://reader033.fdocuments.in/reader033/viewer/2022042703/5681400a550346895dab4440/html5/thumbnails/6.jpg)
Highlight: Traditional Caching [MICRO’06]
• Well known that different apps respond differently to different replacement policies
• Previous work in the OS domain has described adaptive replacement with provable bounds on performance
• Adapted techniques for on-chip caches
![Page 7: Pushing Performance, Efficiency and Scalability of Microprocessors](https://reader033.fdocuments.in/reader033/viewer/2022042703/5681400a550346895dab4440/html5/thumbnails/7.jpg)
Idea…
![Page 8: Pushing Performance, Efficiency and Scalability of Microprocessors](https://reader033.fdocuments.in/reader033/viewer/2022042703/5681400a550346895dab4440/html5/thumbnails/8.jpg)
Adaptive Cache Implementation
• Theoretical Guarantees
– Miss rate provably bounded to be within a factor of two of the better algorithm
In practice, it’s much better
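The adaptive idea can be sketched in a few lines. This is a minimal illustration, not the MICRO’06 design: it assumes LRU and LFU as the two component policies, models a single fully associative set, and uses hypothetical class names (`LRUSet`, `LFUSet`, `AdaptiveSet`). Each policy runs in a tag-only shadow array; on a real miss, the cache evicts a block that the policy with fewer misses so far would not be holding.

```python
from collections import Counter, OrderedDict


class LRUSet:
    """Tag-only model of one set with least-recently-used replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.tags = OrderedDict()  # oldest entry first

    def access(self, tag):
        """Return True on a hit; on a miss, insert tag (evicting if full)."""
        if tag in self.tags:
            self.tags.move_to_end(tag)
            return True
        if len(self.tags) >= self.ways:
            self.tags.popitem(last=False)  # drop least recently used
        self.tags[tag] = True
        return False

    def contains(self, tag):
        return tag in self.tags


class LFUSet:
    """Tag-only model of one set with least-frequently-used replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.counts = {}

    def access(self, tag):
        if tag in self.counts:
            self.counts[tag] += 1
            return True
        if len(self.counts) >= self.ways:
            victim = min(self.counts, key=self.counts.get)
            del self.counts[victim]
        self.counts[tag] = 1
        return False

    def contains(self, tag):
        return tag in self.counts


class AdaptiveSet:
    """Real set that imitates whichever shadow policy has missed less."""

    def __init__(self, ways):
        self.ways = ways
        self.data = set()  # tags actually resident
        self.shadow = {"lru": LRUSet(ways), "lfu": LFUSet(ways)}
        self.misses = Counter()  # per-policy shadow miss counts
        self.real_misses = 0

    def access(self, tag):
        # Every access updates both shadow tag arrays.
        for name, policy in self.shadow.items():
            if not policy.access(tag):
                self.misses[name] += 1
        if tag in self.data:
            return True
        self.real_misses += 1
        if len(self.data) >= self.ways:
            best = min(self.shadow, key=lambda n: self.misses[n])
            # Evict a resident block the better policy no longer holds.
            # The fallback to all of self.data is only a safety net.
            stale = [t for t in self.data if not self.shadow[best].contains(t)]
            self.data.remove(min(stale) if stale else min(self.data))
        self.data.add(tag)
        return False


if __name__ == "__main__":
    cache = AdaptiveSet(ways=4)
    trace = [0, 1, 0, 1, 2, 3, 4, 5, 0, 1] * 20
    for t in trace:
        cache.access(t)
    print("real misses:", cache.real_misses, "shadow misses:", dict(cache.misses))
```

The sketch uses lifetime miss counts to pick the policy to imitate; a hardware design would more plausibly use a bounded history, and nothing here establishes the factor-of-two bound stated on the slide.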
![Page 9: Pushing Performance, Efficiency and Scalability of Microprocessors](https://reader033.fdocuments.in/reader033/viewer/2022042703/5681400a550346895dab4440/html5/thumbnails/9.jpg)
Current Research
• Working on multi-core generalizations of adaptive caching and other ways to manage shared resources
• Uniprocessor microarchitecture
– Scalable memory scheduling [MICRO’06]
– Memory dependence prediction [HPCA’06]
– Branch prediction […]
– And more…
![Page 10: Pushing Performance, Efficiency and Scalability of Microprocessors](https://reader033.fdocuments.in/reader033/viewer/2022042703/5681400a550346895dab4440/html5/thumbnails/10.jpg)
Longer-Term Processor Scaling
• Limitations/Obstacles
– Wire scaling
  • Latency/performance
  • Power
– Feature size
  • Lithography, parametric variations
– Off-chip communication
![Page 11: Pushing Performance, Efficiency and Scalability of Microprocessors](https://reader033.fdocuments.in/reader033/viewer/2022042703/5681400a550346895dab4440/html5/thumbnails/11.jpg)
3D Integration
• Wire
– Power/perf.
• Off-chip
• Feature size
– Limitations, variations
[Figure: die/wafer stacking, with labels Active Layer 1, Active Layer 2, Metal Layers 1, Metal Layers 2, Die-to-Die Vias]
Less RC → faster, lower-power
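A first-order view of why shorter wires help, assuming an unrepeated wire modeled as a distributed RC line with per-unit-length resistance $r$ and capacitance $c$ (these symbols are illustrative and not from the slides; driver and die-to-die via parasitics are ignored):

```latex
\begin{align*}
  R_{\text{wire}} &= rL, \qquad C_{\text{wire}} = cL \\
  t_{\text{wire}}(L) &\propto R_{\text{wire}} C_{\text{wire}} = rcL^{2} \\
  t_{\text{wire}}\!\left(\tfrac{L}{2}\right) &= \tfrac{1}{4}\, t_{\text{wire}}(L) \\
  E_{\text{switch}} &\propto C_{\text{wire}} V^{2}
    \;\Rightarrow\;
    E_{\text{switch}}\!\left(\tfrac{L}{2}\right) = \tfrac{1}{2}\, E_{\text{switch}}(L)
\end{align*}
```

Under these assumptions, halving a wire's length cuts its RC delay to roughly a quarter and its switching energy to roughly half; real gains are smaller once gate delay and via overhead are included.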
![Page 12: Pushing Performance, Efficiency and Scalability of Microprocessors](https://reader033.fdocuments.in/reader033/viewer/2022042703/5681400a550346895dab4440/html5/thumbnails/12.jpg)
Example: Caches
Simplified 2D SRAM Array
3D Bitline Stacking: wordline length halved
• In our studies, WL was critical for latency
3D Wordline Stacking: bitline length halved
• BL reduction has greater impact on power savings
• Split decoder → no activity stacking
We’ve studied a wide variety of other CPU building blocks
![Page 13: Pushing Performance, Efficiency and Scalability of Microprocessors](https://reader033.fdocuments.in/reader033/viewer/2022042703/5681400a550346895dab4440/html5/thumbnails/13.jpg)
Uarch-level 3D design
Example: 4-die significance-partitioned datapath
Use uarch prediction mechanism for early determination of width
Smaller footprint → faster and lower-power
Width-based gating → even lower power, close to original power density
Overall: 47% performance gain at only 2 degree temperature increase
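One way to picture width prediction is a small PC-indexed table of saturating counters. The sketch below is hypothetical: the table size, the 16-bit threshold, the 2-bit counters, and the names `WidthPredictor` and `dies_active` are assumptions for illustration, not details from the slides or the cited papers.

```python
class WidthPredictor:
    """PC-indexed 2-bit saturating counters predicting narrow results.

    Assumed layout: a 64-bit datapath split across 4 dies, 16 bits per die,
    with the least-significant 16 bits on one die.
    """

    def __init__(self, entries=1024, low_bits=16, num_dies=4):
        self.entries = entries
        self.low_bits = low_bits
        self.num_dies = num_dies
        self.table = [0] * entries  # counter values 0..3; >= 2 means "narrow"

    def _index(self, pc):
        # Drop the low 2 bits (assumes 4-byte aligned instruction addresses).
        return (pc >> 2) % self.entries

    def is_narrow(self, value):
        """True if value fits in low_bits as a signed two's-complement number."""
        limit = 1 << (self.low_bits - 1)
        return -limit <= value < limit

    def predict_narrow(self, pc):
        return self.table[self._index(pc)] >= 2

    def dies_active(self, pc):
        """Dies that must be powered for this instruction under the prediction."""
        return 1 if self.predict_narrow(pc) else self.num_dies

    def update(self, pc, value):
        """Train on the actual result once it is known."""
        i = self._index(pc)
        if self.is_narrow(value):
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


if __name__ == "__main__":
    predictor = WidthPredictor()
    pc = 0x400
    for result in (5, -3, 12, 1 << 40, 7):
        print(hex(pc), "dies active:", predictor.dies_active(pc))
        predictor.update(pc, result)
```

A real pipeline would also need a recovery path for the case where a value predicted narrow turns out to be wide; the sketch only shows prediction and training.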
![Page 14: Pushing Performance, Efficiency and Scalability of Microprocessors](https://reader033.fdocuments.in/reader033/viewer/2022042703/5681400a550346895dab4440/html5/thumbnails/14.jpg)
3D Research Summary
• Circuit-level [ICCD’05, ISVLSI’06, ISCAS’06, GLSVLSI’06]
• Uarch-level [MICRO’06 (w/ ),HPCA’07]
• Tutorial papers [JETC’06]
• Tutorial [MICRO’06]
• Tools [DATE’06,TCAD’07] w/ GTCAD &
• Parametric Variations w/ Jim Meindl
• Funding, equip from ,
![Page 15: Pushing Performance, Efficiency and Scalability of Microprocessors](https://reader033.fdocuments.in/reader033/viewer/2022042703/5681400a550346895dab4440/html5/thumbnails/15.jpg)
Summary
• loh@cc
• http://www.cc.gatech.edu/~loh
• Lots of exciting work going on here