Peter M. Kogge: CSE Dept. University of Notre Dame [email protected] Kanad Ghose: CS Dept.
May 23-24, 2000, Scottsdale, AZ (Kickoff_may_2000.ppt)
Morphable Computer Architectures for Highly Energy Aware Systems:
PACC Kickoff: May 23, 24, 2000; Scottsdale, AZ
Peter M. Kogge: CSE Dept., University of Notre Dame; [email protected]
Kanad Ghose: CS Dept., SUNY-Binghamton; [email protected]
Nikzad “Benny” Toomarian: Center for Integrated Space Microsystems (CISM), Jet Propulsion Lab; [email protected]
MORPH: Dynamic Low Energy Architectures
Figure: project timeline (0, 6 mo, 1 yr, 18 mo, 2 yr) with tasks Profiles, Baseline, Morphable Node, Data Placement, Adaptive Algorithms, Run-time, Demo & Eval
New Ideas
• Multi-cluster microarchitecture to allow dynamic changes in energy expended per cycle
• Energy efficient ISA extensions to process data with less energy
• Energy efficient morphable memory hierarchies
• Adaptive algorithms to select the best configuration
• Energy aware run-time which can reconfigure the system
MORPH Adds An “Energy Gear” to Embedded Systems
IMPACT
• Changes focus to energy management, not power management
• Adds extra degrees of freedom to dynamic energy control
• Provides an inherently more energy efficient architecture
• Designed with real embedded missions in mind
Why is PACC Important?
Real world: limited energy sources
• Renewable energy: 12-15 watts at high noon
• Fixed capacity batteries for off-peak sunlight or emergencies in shade
Multiple operational modes, all compute/energy constrained
• Movement: collision avoidance
• Spectroscopy: data gathering vs. analysis
• Communication: compression vs. transmission
Today:
• Select computers for peak performance needs
• Limited ability to “downshift”
The Future at the Low End: Microexplorers
Figure: microexplorer mass projected to shrink from 10 kg (1997) to 1 kg (2002?), 100 gm (2007?), and 10 gm (2012?); subsystems shown: communication, computing, sensors, advanced mobility, power, navigation, structure, temperature control
Extremely limited energy sources => Peak computing only when absolutely necessary
Figure: mission concepts, including distributed sensors, penetrators, integrated inflatable sailcraft, nano-rovers, nano-spacecraft, hydrobot, RLV, and atmospheric probes
“Larger” Systems Have More Diverse Energy/Performance Profiles
Recasting The Classical Power Equation
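$$\text{Power} = \tfrac{1}{2} \times C \times \underbrace{(N \times F)}_{\text{logic transitions/sec}} \times V^2$$
where Power is energy/sec = energy/cycle x cycles/sec, N is transitions/cycle, and F is cycles/sec, so
$$\text{EnergyPerCycle} = \tfrac{1}{2} \times C \times N \times V^2$$
EPC is independent of clock rate! Lowering EPC is our focus!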
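A quick numeric check of EPC = 1/2 x C x N x V^2 and Power = EPC x F, using made-up values for C, N, and V (placeholders, not measured data). It shows power doubling with F while EPC, and the energy needed to finish a fixed number of cycles, stays the same.

```python
def energy_per_cycle(c_farads: float, n_transitions: float, v_volts: float) -> float:
    """EPC = 1/2 * C * N * V^2, in joules per cycle."""
    return 0.5 * c_farads * n_transitions * v_volts ** 2


def power(epc_joules: float, f_hz: float) -> float:
    """Power = EPC * F, in watts."""
    return epc_joules * f_hz


# Hypothetical values, chosen only to keep the arithmetic easy to follow.
C_SWITCHED = 1e-12    # capacitance switched per transition (1 pF)
N_PER_CYCLE = 1e4     # logic transitions per cycle
V_SUPPLY = 2.0        # supply voltage (volts)

epc = energy_per_cycle(C_SWITCHED, N_PER_CYCLE, V_SUPPLY)   # 0.5 * 1e-12 * 1e4 * 4 = 2e-8 J
task_cycles = 1e9                                          # a fixed amount of work

for f in (50e6, 100e6):
    p = power(epc, f)
    seconds = task_cycles / f
    print(f"F={f/1e6:.0f} MHz  EPC={epc:.1e} J  P={p:.2f} W  task energy={p * seconds:.1f} J")
```

Halving F halves power but doubles run time, so the task energy is unchanged; only lowering EPC itself reduces it.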
Why is This Important?
Power = EPC x F
Performance = IPC x F
Today’s designs: Performance/Power = IPC/EPC
• EPC & IPC are fixed at design time (other than voltage scaling)
• THUS: the ratio is fixed at design time
• The only runtime “knobs” are V and F
Real embedded scenarios:
• Short periods of very high peak performance need => high IPC
• Followed by long periods of much lower performance need
Result: long periods of lower performance still running at inefficient EPC!!
F = cycles/second
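The sketch below models a conventional core whose IPC and EPC are fixed (voltage held constant), with hypothetical values and a hypothetical two-phase mission. Lowering F to match the low-demand phase cuts power, but performance/power stays at IPC/EPC and every instruction still costs EPC/IPC joules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedDesign:
    """A conventional core: IPC and EPC are both fixed at design time."""
    ipc: float   # instructions per cycle
    epc: float   # joules per cycle (voltage held constant)

    def performance(self, f_hz: float) -> float:
        return self.ipc * f_hz              # instructions per second

    def power(self, f_hz: float) -> float:
        return self.epc * f_hz              # watts

    def energy_per_instruction(self) -> float:
        return self.epc / self.ipc          # joules per instruction; F cancels out


design = FixedDesign(ipc=2.0, epc=4e-9)     # hypothetical values

# Hypothetical mission profile: (instructions needed, seconds available).
phases = [(4e9, 10.0),       # short burst of peak demand
          (1e9, 1000.0)]     # long stretch of low demand

for instructions, seconds in phases:
    f_needed = instructions / (design.ipc * seconds)    # downshift F to just meet demand
    ratio = design.performance(f_needed) / design.power(f_needed)
    print(f"F={f_needed/1e6:.1f} MHz  perf/power={ratio:.2e} instr/J")

total_energy = sum(n * design.energy_per_instruction() for n, _ in phases)
print(f"total energy = {total_energy:.1f} J")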
This Project: A “Morphable” System Architecture
Today’s microarchitectures: EPC = IPC^k, where k >> 1
Our approach:
• Inherently lower EPC (lower k)
• With variable IPC (in turn varying EPC)
Thus IPC/EPC can be varied dynamically
• Lowering IPC lowers EPC even more, since EPC falls as IPC^k with k > 1
Result: additional “knobs” for run-time energy management
• Adjust configuration so IPC x F matches performance needs
• Reap energy savings of lower EPC
Allow systems to change the “Energy Gear” on demand!
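A minimal sketch of “changing the energy gear”, assuming the slide's model EPC = a x IPC^k with a hypothetical coefficient a, exponent k, maximum clock rate, and a small set of selectable IPC settings. Because energy per instruction is EPC/IPC = a x IPC^(k-1), which grows with IPC when k > 1, the cheapest choice is the lowest IPC that still meets demand at an allowed F.

```python
from __future__ import annotations

# Hypothetical model parameters (placeholders, not project data).
A = 1e-9         # EPC in joules at IPC = 1
K = 1.9          # exponent in EPC = A * IPC**K, assumed > 1
F_MAX = 100e6    # highest allowed clock rate (Hz)
GEARS = [0.5, 1.0, 2.0, 4.0]   # selectable IPC settings of the morphable core


def epc(ipc: float) -> float:
    return A * ipc ** K


def energy_per_instruction(ipc: float) -> float:
    return epc(ipc) / ipc        # = A * ipc**(K - 1), rising with IPC when K > 1


def choose_gear(required_ips: float) -> tuple[float, float]:
    """Return (ipc, f): the lowest-IPC gear that meets demand with f <= F_MAX."""
    for ipc in sorted(GEARS):
        if ipc * F_MAX >= required_ips:
            return ipc, required_ips / ipc    # clock just fast enough for the demand
    raise ValueError("demand exceeds peak performance")


for demand in (40e6, 150e6, 350e6):
    ipc, f = choose_gear(demand)
    print(f"{demand/1e6:.0f} MIPS -> IPC {ipc}, F {f/1e6:.1f} MHz, "
          f"{energy_per_instruction(ipc):.2e} J/instr")
```

Under this model with k = 1.9, the 4.0 gear costs 8^0.9, roughly 6.5x, more energy per instruction than the 0.5 gear, so it is worth engaging only for peak bursts.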
The Team
SUNY-BINGHAMTON
• Morphable caches, register files
• Energy-efficient VLIW architectures
• Supporting compiler techniques
UNIVERSITY OF NOTRE DAME
• Morphable multi-cluster architecture
• “At the sense amps” ISA extension
• Runtime with hooks for dynamic morphing control
JET PROPULSION LABORATORY
• Scenarios & benchmarks
• Baseline characterizations
• Runtime adaptation algorithms
Energy Aware Data Placement
Overall Goals:
• Architectures with variable IPC, EPC
• Tools & S/W to manage morphing
• Realistic demonstrations
Peter Kogge, Vincent Freeh, Jay Brockman
Nikzad Toomarian, Mohammed Mojarradi, Savio Chau
Kanad Ghose
Project Components
• Morphable, inherently low-EPC design
• Memory system allowing both width and placement shaping
• Dynamic algorithms to select the best “shape” for the current energy/performance profile
• Augmented run-time to allow dynamic reconfiguration
Our Background
NSF MIPS: Inherently Low Power Architectures
• The multi-cluster microarchitecture
• Cache-In-Memory
• Energy efficient caches
IEEC Binghamton: Reducing power on interconnects
DARPA Processing-In-Memory Projects: HTMT & DIVA
• Utilizing wide-bandwidth on-chip storage macros
• Data placement in deep memory hierarchies
• Multi-threading
NASA X2000: highly scalable low power systems for deep space missions
Evolvable Computing Program: adaptive algorithms to select system parameters to meet some mission objective
How Power Explodes with Conventional Designs
Starting A Solution: Multi-Cluster Architecture
Figure: (a) Simple Pipeline: Fetch, Decode, Register File, Data Cache. (b) Classical Superscalar: Fetch, Decode, Rename, Issue Window, Register File, Bypass, Data Cache, memory disambiguation. (c) New Multi Cluster: Fetch, Decode, and Rename and steering feeding replicated clusters, each ("One Cluster") with its own Issue Window, Register File, Bypass, Data Cache, RAW, RAB, and memory disambiguation.
Problem: single large, centralized register files with many ports
Solution: multiple smaller register files with few ports
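For a single core of issue width IW:
$$\frac{\text{EPC}}{\text{IPC}} \sim (IW)^k, \quad k \text{ as high as } 1.9$$
With w clusters, each of issue width IW/w:
$$w\left(\frac{IW}{w}\right)^k = w^{1-k}\,(IW)^k \ll (IW)^k$$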
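The slide's scaling argument can be checked numerically. Under the model EPC/IPC ~ (IW)^k, splitting an IW-wide machine into w clusters of width IW/w gives a relative cost of w(IW/w)^k / (IW)^k = w^(1-k). The sketch below assumes k = 1.9 and ignores inter-cluster communication and steering overhead, so it overstates the savings a real design would see.

```python
def relative_cost(issue_width: int, clusters: int, k: float = 1.9) -> float:
    """EPC/IPC of `clusters` clusters of width IW/w, relative to one IW-wide core.

    Model: cost ~ width**k, so w * (IW/w)**k / IW**k == w**(1 - k).
    """
    if issue_width % clusters:
        raise ValueError("issue width must divide evenly among clusters")
    per_cluster = issue_width // clusters
    return clusters * per_cluster ** k / issue_width ** k


for w in (1, 2, 4, 8):
    print(f"IW=8, w={w}: relative EPC/IPC = {relative_cost(8, w):.3f}")
```

With k = 1.9, two clusters cost about 0.54 of the monolithic design and four about 0.29, before any overhead for moving values between clusters.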
Multi-Cluster vs Conventional Results
Figure: multi-cluster configurations (labeled 1x4, 1x6, 1x8, 2x4, 2x6, 4x2, 4x4) compared with a conventional design
Up to 1/2 the energy at the same IPC, or 20% better IPC at the same energy
Insertion into PACC
• Implement CPU as a nominal 4-cluster configuration
• Modify instruction issue to target a variable number of clusters
  – Equivalent need for separating memory disambiguation units
• Make this a run-time settable parameter
  – Unused clusters turned off
• Additional CPU options
  – Implement a selected subset of “wide word” & VLIW-like operations within a cluster
  – Utilize unused clusters for additional concurrent threads
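A toy model of making the active cluster count a run-time settable parameter: clusters beyond the setting are treated as turned off, and instruction issue only targets the active ones. The class, its method names, and the round-robin steering are hypothetical placeholders; a real steering policy would follow register dependences to limit inter-cluster traffic.

```python
from __future__ import annotations


class MorphableCore:
    """Toy model of a nominal 4-cluster core with a run-time settable cluster count."""

    NUM_CLUSTERS = 4

    def __init__(self) -> None:
        self.active = self.NUM_CLUSTERS    # all clusters powered on at reset
        self._next = 0                     # next cluster to receive an instruction

    def set_active_clusters(self, n: int) -> None:
        """Run-time knob: clusters n..NUM_CLUSTERS-1 are treated as turned off."""
        if not 1 <= n <= self.NUM_CLUSTERS:
            raise ValueError(f"n must be between 1 and {self.NUM_CLUSTERS}")
        self.active = n
        self._next = 0

    def steer(self, instructions: list[str]) -> dict[int, list[str]]:
        """Distribute instructions round-robin over the active clusters only."""
        queues: dict[int, list[str]] = {c: [] for c in range(self.active)}
        for insn in instructions:
            queues[self._next].append(insn)
            self._next = (self._next + 1) % self.active
        return queues


core = MorphableCore()
core.set_active_clusters(2)                        # downshift: clusters 2 and 3 gated off
print(core.steer(["i0", "i1", "i2", "i3", "i4"]))  # {0: ['i0', 'i2', 'i4'], 1: ['i1', 'i3']}
```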
Another Starting Point: Low Energy Caches & Register Files
Approach: exploit locality to reduce energy requirements of on-chip storage resources:
Example: multiple line buffers:
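The line-buffer figure did not survive extraction. As a hedged illustration of the idea, the toy model below keeps the most recently accessed cache lines in a few small buffers, so a read that falls in a buffered line skips the energy-hungry access to the full cache array. The class name, buffer count, line size, and LRU replacement are assumptions for illustration.

```python
from collections import OrderedDict


class LineBufferedCache:
    """Toy model: a few line buffers in front of a cache data array.

    A read whose line is already buffered is served from the buffer;
    any other read costs a full array access and claims a buffer (LRU).
    Cache hits/misses in the array itself are not modeled.
    """

    def __init__(self, num_buffers: int = 4, line_bytes: int = 32) -> None:
        self.num_buffers = num_buffers
        self.line_bytes = line_bytes
        self.buffers = OrderedDict()    # line number -> None, least recently used first
        self.buffer_hits = 0
        self.array_accesses = 0

    def read(self, addr: int) -> None:
        line = addr // self.line_bytes
        if line in self.buffers:
            self.buffer_hits += 1                   # low-energy path
            self.buffers.move_to_end(line)          # mark most recently used
        else:
            self.array_accesses += 1                # high-energy path
            self.buffers[line] = None
            if len(self.buffers) > self.num_buffers:
                self.buffers.popitem(last=False)    # evict least recently used


cache = LineBufferedCache(num_buffers=4, line_bytes=32)
for addr in range(0, 128, 4):                       # 32 sequential 4-byte reads over 4 lines
    cache.read(addr)
print(cache.array_accesses, cache.buffer_hits)      # 4 28
```

Sequential code and data streams are where this pays off: only one read in eight touches the array here. A pattern that ping-pongs among more lines than there are buffers gains nothing.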
Storage System Morphs
Exploit locality to reduce dynamic AND static energy dissipation of on-chip storage resources:
• Selective substrate biasing to reduce leakage: reverse body bias removed when the storage component is accessed
• Clustered data placement to maximize accesses to each partition within on-chip and off-chip RAMs
• Compiler/OS prefetching to avoid/reduce turn-on delay
Changeable Widths of Interconnect & Storage Resources
• Sub-banking for caches and on-chip/off-chip RAM
• FU-driven selection of the activation width of the dispatch buffer, reservation stations, and data register files
• Operand-width-driven activation of FU slices
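Operand-width-driven activation of FU slices can be illustrated with a toy rule: enable only as many 8-bit adder slices as the wider operand, plus one carry bit, needs. The slice width, the 32-bit datapath, and the function name are assumptions; real hardware would detect operand width with leading-zero logic rather than Python's `int.bit_length`.

```python
SLICE_BITS = 8       # assumed width of one adder slice
DATAPATH_BITS = 32   # assumed full datapath width


def slices_for_add(a: int, b: int) -> int:
    """Number of SLICE_BITS-wide slices to enable for the unsigned add a + b.

    Width = wider operand + 1 bit for the carry-out, capped at the full datapath.
    """
    if a < 0 or b < 0:
        raise ValueError("toy model handles unsigned operands only")
    bits = max(a.bit_length(), b.bit_length()) + 1
    needed = -(-bits // SLICE_BITS)                  # ceiling division
    return min(needed, DATAPATH_BITS // SLICE_BITS)


print(slices_for_add(3, 5), slices_for_add(300, 1), slices_for_add(0xFFFFFFFF, 1))  # 1 2 4
```

Small loop counters and pixel values keep three of the four slices idle, which is the saving the slide targets.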
ISA Extensions with Energy Reduction Potential
• VLIW-like multiple move instructions
  – Use compiler to optimize number of moves/energy
  – Useful for many signal processing loops, numerical computations
• “Wide word” multiple operations per instruction
  – Utilize existing bandwidth more completely
• Inclusion of simultaneous multi-threading extensions
  – Allow for pipelines without costly hazard detection/forwarding
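The slide does not define the “wide word” operations. One common reading is packed sub-word arithmetic, where a single wide operation performs several narrow ones. The software sketch below (SIMD within a register) adds four independent 8-bit lanes with one 32-bit add plus masking; the lane layout and function names are assumptions, and this is not the project's ISA extension.

```python
from __future__ import annotations

LOW7 = 0x7F7F7F7F   # low 7 bits of every byte lane
HIGH = 0x80808080   # top bit of every byte lane


def pack(lanes: list[int]) -> int:
    """Pack four 8-bit values into one 32-bit word, lane 0 in the low byte."""
    word = 0
    for i, v in enumerate(lanes):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack(word: int) -> list[int]:
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def add_bytes(x: int, y: int) -> int:
    """Four independent 8-bit adds (mod 256) in one 32-bit operation.

    Adding only the low 7 bits of each lane cannot carry into the next lane;
    XOR then restores each lane's top bit.
    """
    return ((x & LOW7) + (y & LOW7)) ^ ((x ^ y) & HIGH)


print(unpack(add_bytes(pack([1, 2, 3, 250]), pack([1, 1, 1, 10]))))  # [2, 3, 4, 4]
```

Lane 3 wraps (250 + 10 = 260 mod 256 = 4) without disturbing its neighbors, so one pass over the existing 32-bit datapath does four useful adds.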
Run-Time Considerations
Application must have freedom to provide:
• Expected energy/performance of code
• Requests for levels of service
But only the run-time sees the global picture:
• All currently running applications & their requests
• Existing energy/power resources and mission profiles
• Measurements on current activities
Run-time modifications: changing the “energy gear”
• Number of clusters per thread
• Number of threads
• Active width of on-chip storage resources & substrate biases
• Active width of off-chip memory & interfaces
• Placement of data within the hierarchy
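A minimal sketch of the run-time's role, assuming a toy performance/power model: it sums the applications' requested service levels and picks the lowest-power setting of three of the listed knobs (clusters per thread, threads, active on-chip storage width) that meets demand within a power budget. The `Gear` class, the knob values, the 4-cluster limit, and all model coefficients are hypothetical; off-chip width, substrate biases, and data placement are left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Gear:
    clusters_per_thread: int
    threads: int
    storage_width: float     # fraction of on-chip storage width kept active (0.5 or 1.0)

    # Placeholder model: the coefficients below are invented for illustration.
    def performance(self) -> float:
        return self.threads * self.clusters_per_thread ** 0.8 * (0.5 + 0.5 * self.storage_width)

    def power(self) -> float:
        return 0.2 + 0.5 * self.threads * self.clusters_per_thread + 0.3 * self.storage_width


def choose_gear(requests: list[float], power_budget: float) -> Gear | None:
    """Lowest-power gear whose performance covers the summed requests, or None."""
    demand = sum(requests)
    candidates = [
        Gear(c, t, w)
        for c, t, w in product((1, 2, 4), (1, 2), (0.5, 1.0))
        if c * t <= 4                      # nominal 4 clusters shared by all threads
    ]
    feasible = [g for g in candidates
                if g.performance() >= demand and g.power() <= power_budget]
    return min(feasible, key=Gear.power, default=None)


print(choose_gear([0.3, 0.4], power_budget=5.0))
```

Exhaustive search is fine for a handful of knobs; returning `None` signals that the requests cannot be met and the run-time must negotiate lower service levels.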
Determining the Gear: Reconfiguration Algorithms
Approach: Use powerful parallel searches (e.g., genetic algorithms, neural nets), possibly including hardware, to determine the configuration that gives optimal performance.
Payoff:
• Achieve high autonomy on board spacecraft
• The best schedule for highest science return with lowest power consumption
• Maintain functionality under changes in operating conditions
Objective: Develop a reconfigurable computing capability that will allow:
• Self-reconfiguration and adaptation to unforeseen conditions
• Faster, cheaper development cycles
Outgrowth of JPL’s Evolvable Computing Program
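Genetic algorithms are one of the search methods named above. As a hedged sketch, the code below evolves an 8-bit configuration (each bit a hypothetical on/off knob) with tournament selection, one-point crossover, bit-flip mutation, and elitism, scoring each configuration by made-up science return under an energy budget. The all-off configuration is seeded into the first population so that elitism guarantees a feasible answer.

```python
from __future__ import annotations

import random

# Hypothetical 8-knob configuration (e.g., clusters, cache banks, line buffers).
N_KNOBS = 8
ENERGY_COST = [3, 3, 2, 2, 1, 1, 1, 1]     # made-up energy units per enabled knob
SCIENCE_GAIN = [5, 1, 4, 1, 2, 0, 3, 0]    # made-up science return per enabled knob
ENERGY_BUDGET = 8


def fitness(genome: list[int]) -> float:
    energy = sum(c for c, g in zip(ENERGY_COST, genome) if g)
    if energy > ENERGY_BUDGET:
        return -1.0                               # over budget: penalized
    science = sum(s for s, g in zip(SCIENCE_GAIN, genome) if g)
    return science - 0.1 * energy                 # prefer science, then lower energy


def evolve(pop_size: int = 20, generations: int = 40,
           mutation_rate: float = 0.05, seed: int = 0) -> list[int]:
    rng = random.Random(seed)
    pop = [[0] * N_KNOBS]                         # all-off "safe" configuration
    pop += [[rng.randint(0, 1) for _ in range(N_KNOBS)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = pop[:2]                        # elitism: best two survive unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: best of 3 random individuals, twice.
            a, b = (max(rng.sample(pop, 3), key=fitness) for _ in range(2))
            cut = rng.randrange(1, N_KNOBS)
            child = a[:cut] + b[cut:]             # one-point crossover
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


best = evolve()
print(best, fitness(best))
```

With only 8 knobs, exhaustively checking all 256 configurations is trivial and gives the exact optimum; a GA earns its keep only when the configuration space is too large to enumerate.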
Program Plan
Figure: project timeline (0, 6 mo, 1 yr, 18 mo, 2 yr) with tasks Profiles, Baseline, Morphable Node, Data Placement, Adaptive Algorithms, Run-time, Demo & Eval
Optional 3rd year:
• High level design & demo on FPLA or MOSIS
• Prototype of run-time
• Investigation of needed program development environment
• Demo in JPL test bed
• Analysis for insertion into real JPL mission
Expected Deliverables
• Benchmark suite & corresponding mission energy profiles
• Detailed morphable architecture
• System simulator with energy & performance projections & evaluation against profiles
• Demonstration of data placement & architectural adaptation algorithms
• Specification of energy aware run-time & API
“Just enough energy”