Peter M. Kogge: CSE Dept. University of Notre Dame [email protected] Kanad Ghose: CS Dept.

May 23-24, 2000, Scottsdale, AZ (Kickoff_may_2000.ppt)

1

Morphable Computer Architectures for Highly Energy Aware Systems

PACC Kickoff: May 23, 24, 2000; Scottsdale, AZ

Peter M. Kogge: CSE Dept. University of Notre Dame [email protected]

Kanad Ghose: CS Dept., SUNY-Binghamton; [email protected]

Nikzad “Benny” Toomarian: Center for Integrated Space Microsystems (CISM)

Jet Propulsion Lab; [email protected]


2

MORPH: Dynamic Low Energy Architectures

[Schedule chart: tasks Profiles, Baseline, Morphable Node, Data Placement, Adaptive Algorithms, Run-time, Demo & Eval; time axis: 0, 6 mo, 1 yr, 18 mo, 2 yr]

New Ideas
• Multi-cluster microarchitecture to allow dynamic changes in energy expended per cycle
• Energy efficient ISA extensions to process data more energy efficiently
• Energy efficient morphable memory hierarchies
• Adaptive algorithms to select best configuration
• Energy aware run-time which can reconfigure system

MORPH Adds an “Energy Gear” to Embedded Systems

IMPACT
• Changes focus to energy, not power, management
• Adds extra degrees of freedom to dynamic energy control
• Provides an inherently more energy efficient architecture
• Designed with real embedded missions in mind


3

Why is PACC Important?

• Real world: limited energy sources
  • Renewable energy: 12-15 watts at high noon
  • Fixed capacity batteries for off-peak sunlight or emergencies in shade
• Multiple operational modes, all compute/energy constrained
  • Movement: collision avoidance
  • Spectroscopy: data gathering vs analysis
  • Communication: compression vs transmission
• Today:
  • Select computers for peak performance needs
  • Limited ability to “downshift”


4

The Future at the Low End: Microexplorers

[Figure: microexplorer mass roadmap: 10 kg (1997), 1 kg (2002?), 100 gm (2007?), 10 gm (2012?). Subsystems shown: communication, computing, sensors, advanced mobility, power, navigation, structure, temperature control.]

Extremely limited energy sources => Peak computing only when absolutely necessary


5

Distributed Sensors Penetrators

Integrated Inflatable Sailcraft

Nano-Rovers

Nano-Spacecraft

Hydrobot

RLV

Atmospheric Probes

“Larger” Systems Have More Diverse Energy/Performance Profiles


6

Recasting The Classical Power Equation

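$$\text{Power} = \tfrac{1}{2} \times C \times (\text{logic transitions/sec}) \times V^2$$

Power is energy/sec = energy/cycle x cycles/sec, and logic transitions/sec = transitions/cycle x cycles/sec. Dividing both sides by cycles/sec, with N = logic transitions/cycle:

$$\text{EnergyPerCycle (EPC)} = \tfrac{1}{2} \times C \times N \times V^2$$

EPC is independent of clock rate! Lowering EPC is our focus!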
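A small numeric sketch may make the clock-rate independence concrete. The capacitance, transition count, voltage, and frequencies below are illustrative assumptions, not values from the project; only the two formulas (EPC = 1/2 x C x N x V^2 and Power = EPC x F) come from the slides.

```python
# Worked example: EPC = 1/2 * C * N * V^2 and Power = EPC * F.
# All numeric values are assumed for illustration only.

def energy_per_cycle(c_farads: float, n_transitions: float, v_volts: float) -> float:
    """Energy per cycle in joules: 1/2 * C * N * V^2."""
    return 0.5 * c_farads * n_transitions * v_volts ** 2


def power(epc_joules: float, f_hz: float) -> float:
    """Power in watts: energy per cycle times cycles per second."""
    return epc_joules * f_hz


C = 1e-15   # assumed capacitance switched per transition (1 fF)
N = 1e6     # assumed logic transitions per cycle
V = 1.8     # assumed supply voltage in volts

epc = energy_per_cycle(C, N, V)
for f in (100e6, 200e6):
    # EPC is the same at both clock rates; only power scales with F.
    print(f"F = {f / 1e6:.0f} MHz: EPC = {epc:.3e} J/cycle, Power = {power(epc, f):.3f} W")
```

Doubling F doubles power but leaves the energy spent on a fixed number of cycles unchanged, which is why the slides treat EPC rather than power as the quantity to reduce.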


7

Why is This Important?

• Power = EPC x F
• Performance = IPC x F
• Today’s designs: Performance/Power = IPC/EPC
  • EPC & IPC are fixed at design time (other than voltage scaling)
  • THUS: ratio is fixed at design time
  • Only runtime “knobs” are V and F
• Real embedded scenarios:
  • Short periods of very high peak performance need => high IPC
  • Followed by long periods of much lower performance need
• Result: long periods of lower performance still running at inefficient EPC!!

F = cycles/second


8

This Project: A “Morphable” System Architecture

• Today’s microarchitectures: EPC = IPC^k where k >> 1
• Our approach:
  • Inherently lower EPC (lower k)
  • With variable IPC (in turn varying EPC)
• Thus IPC/EPC can be varied dynamically
  • Lowering IPC lowers EPC even more
• Result: additional runtime “knobs” for run-time energy management
  • Adjust configuration so IPC x F matches performance needs
  • Reap energy savings of lower EPC

Allow systems to change the “Energy Gear” on demand!
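The idea of matching IPC x F to the performance need can be sketched as a simple selection over configurations. This is a minimal sketch under stated assumptions: the `Gear` class, the `pick_gear` helper, the three gears, and their IPC/EPC numbers are all hypothetical, chosen only so that EPC grows faster than IPC as the slides describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gear:
    """One hypothetical hardware configuration ("energy gear")."""
    name: str
    ipc: float      # instructions per cycle (assumed value)
    epc_nj: float   # energy per cycle in nanojoules (assumed value)


def pick_gear(gears, required_ips, f_max_hz):
    """Return (gear, f_hz, power_w) meeting required_ips at lowest power, or None."""
    best = None
    for g in gears:
        f = required_ips / g.ipc          # slowest clock that still meets demand
        if f > f_max_hz:
            continue                      # this gear cannot reach the demand
        p = g.epc_nj * 1e-9 * f           # Power = EPC x F
        if best is None or p < best[2]:
            best = (g, f, p)
    return best


# Assumed gears: EPC grows faster than IPC, so low gears cost less per instruction.
GEARS = [
    Gear("1-cluster", ipc=1.0, epc_nj=1.0),
    Gear("2-cluster", ipc=1.8, epc_nj=2.4),
    Gear("4-cluster", ipc=3.0, epc_nj=6.0),
]

for demand in (200e6, 1.2e9):
    gear, f, p = pick_gear(GEARS, demand, f_max_hz=500e6)
    print(f"{demand:.1e} instr/s -> {gear.name} at {f / 1e6:.0f} MHz, {p:.2f} W")
```

With these assumed numbers the low demand is served most cheaply by the smallest gear, while only the widest gear can reach the peak demand within the clock limit.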


9

The Team

SUNY-BINGHAMTON
• Morphable Caches, RFs
• Energy Eff VLIW archs
• Supporting compiler techniques

UNIVERSITY OF NOTRE DAME
• Morphable multi-cluster architecture
• “At the sense amps” ISA extension
• Runtime with hooks for dynamic morphing control

JET PROPULSION LABORATORY
• Scenarios & benchmarks
• Baseline characterizations
• Runtime adaptation algorithms

Energy Aware Data Placement

Overall Goals:
• Architectures with variable IPC, EPC
• Tools & S/W to manage morphing
• Realistic demonstrations

Peter Kogge, Vincent Freeh, Jay Brockman

Nikzad Toomarian, Mohammed Mojarradi, Savio Chau

Kanad Ghose


10

Project Components

• Morphable, inherently low EPC design
• Memory system allowing both width and placement shaping
• Dynamic algorithms to select best “shape” for current energy/performance profile
• Augmented run-time to allow dynamic reconfiguration


11

Our Background

• NSF MIPS: Inherently Low Power Architectures
  • The Multi-cluster microarchitecture
  • Cache-In-Memory
  • Energy Efficient Caches
• IEEC Binghamton: Reducing power on interconnects
• DARPA Processing-In-Memory Projects: HTMT & DIVA
  • Utilizing wide bandwidth on-chip storage macros
  • Data placement in deep memory hierarchies
  • Multi-threading
• NASA X2000: highly scalable low power systems for deep space missions
• Evolvable Computing Program: adaptive algorithms to select system parameters to meet some mission objective


12

How Power Explodes with Conventional Designs


13

Starting A Solution: Multi Cluster Architecture

[Figure: three pipeline organizations.
(a) Simple Pipeline: Fetch, Decode, Register File, Data Cache.
(b) Classical Superscalar: Fetch, Decode, Rename, Issue Window, Register File, Bypass, Data Cache, memory disambiguation.
(c) New Multi Cluster: Fetch, Decode, Rename and steering, feeding multiple clusters; One Cluster = Issue Window, Register File, Bypass, Data Cache, RAW, RAB, memory disambiguation.]

• Problem: single large centralized register files with many ports
• Solution: multiple smaller register files with few ports

For issue width IW:

$$\text{EPC}/\text{IPC} \sim (IW)^k, \quad k \text{ as high as } 1.9$$

With w clusters, each of issue width IW/w:

$$w \, (IW/w)^k \ll (IW)^k$$
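To see how large the gap between w(IW/w)^k and (IW)^k can be, the two expressions can be evaluated directly. This is a rough sketch in arbitrary units: it assumes the same exponent k applies inside each cluster and ignores any inter-cluster communication cost, neither of which the slides state. The issue width of 8 is an illustrative choice; k = 1.9 is the upper value quoted above.

```python
# Compare the centralized cost term (IW)^k with the clustered term w * (IW/w)^k.
# Units are arbitrary; IW = 8 is an assumed example width.

def centralized_cost(iw: float, k: float) -> float:
    """Cost term for one monolithic core of issue width iw."""
    return iw ** k


def clustered_cost(iw: float, w: int, k: float) -> float:
    """Cost term for w clusters, each of issue width iw / w."""
    return w * (iw / w) ** k


IW = 8
K = 1.9

for w in (1, 2, 4):
    ratio = clustered_cost(IW, w, K) / centralized_cost(IW, K)
    # Algebraically the ratio is w ** (1 - k), independent of IW.
    print(f"w = {w}: clustered / centralized = {ratio:.3f}")
```

Because the ratio reduces to w^(1-k), the benefit grows with both the number of clusters and the exponent k, and vanishes as k approaches 1.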


14

Multi-Cluster vs Conventional Results

[Plot: conventional vs multi-cluster configurations; labels: 1x4, 1x6, 1x8, 2x4, 2x6, 4x2, 4x4, and Conventional.]

Up to 1/2 the energy at same IPC, or 20% better IPC at same energy


15

Insertion into PACC

• Implement CPU as nominal 4 cluster configuration
• Modify Instruction Issue to target variable # of clusters
  • Equivalent need for separating memory disambiguation units
  • Make this a runtime settable parameter
  • Unused clusters turned off
• Additional CPU options
  • Implement selected subset of “wide word” & VLIW-like operations within a cluster
  • Utilize unused clusters for additional concurrent threads


16

Another Starting Point: Low Energy Caches & Register Files

Approach: exploit locality to reduce energy requirements of on-chip storage resources:

Example: multiple line buffers:


17

Storage System Morphs

• Exploit locality to reduce dynamic AND static energy dissipations of on-chip storage resources:
  • Selective substrate biasing to reduce leakage – reverse body bias removed when storage component is accessed
  • Clustered data placement to maximize access to each partition within on-chip and off-chip RAMs
  • Compiler/OS prefetching to avoid/reduce turn-on delay
• Changeable Widths of Interconnect & Storage Resources
  • Sub-banking for caches and on-chip/off-chip RAM
  • FU-driven selection of activation width of dispatch buffer and reservation stations, data register files
  • Operand-width driven activation of FU slices


18

ISA Extensions with Energy Reduction Potential

• VLIW-like multiple move instructions
  • Use compiler to optimize number of moves/energy
  • Useful for many signal processing loops, numerical computations
• “Wide word” multiple operation per instruction
  • Utilize existing bandwidth more completely
• Inclusion of simultaneous multi-threading extensions
  • Allow for pipelines without costly hazard detection/forwarding


19

Run-Time Considerations

• Application must have freedom to provide
  • expected energy/performance of code
  • requests for levels of service
• But, only run-time sees global picture
  • All current running applications & their requests
  • Existing energy/power resources and mission profiles
  • Measurements on current activities
• Run-time modifications: changing the “energy gear”
  • Number of clusters per thread
  • Number of threads
  • Active width of on-chip storage resources & substrate biases
  • Active width of off-chip memory & interfaces
  • Placement of data within hierarchy
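One way to picture the run-time knobs listed on this slide is as a single configuration record that the run-time validates before applying. The sketch below is purely illustrative: the `EnergyGear` class, its field names, the bit widths, and the `validate` method are hypothetical and do not describe an API from the project; the limit of 4 clusters follows the nominal configuration mentioned earlier in the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyGear:
    """Hypothetical record of the run-time knobs named on the slide."""
    clusters_per_thread: int
    threads: int
    onchip_width_bits: int     # active width of on-chip storage (assumed unit)
    offchip_width_bits: int    # active width of off-chip memory interface (assumed unit)
    reverse_body_bias: bool    # substrate bias applied to idle storage

    def validate(self, total_clusters: int = 4) -> None:
        """Raise ValueError if the gear cannot fit the available clusters."""
        if self.clusters_per_thread < 1 or self.threads < 1:
            raise ValueError("need at least one cluster and one thread")
        if self.clusters_per_thread * self.threads > total_clusters:
            raise ValueError("gear requests more clusters than the CPU has")


# A low gear (one narrow thread) and a high gear (one thread on all clusters).
low = EnergyGear(1, 1, onchip_width_bits=32, offchip_width_bits=16, reverse_body_bias=True)
high = EnergyGear(4, 1, onchip_width_bits=128, offchip_width_bits=64, reverse_body_bias=False)
for gear in (low, high):
    gear.validate()
    print(gear)
```

Data placement within the hierarchy is left out of the record here because it is a property of individual data structures rather than a single global setting.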


20

Determining the Gear: Reconfiguration Algorithms

• Approach:
  • Use powerful parallel searches (e.g. genetic algorithms, neural nets, etc.), possibly including hardware, to determine the optimal performance.
• Payoff:
  • Achieve high autonomy on-board spacecraft
  • The best schedule for highest science return with lowest power consumption
  • Maintain functionality under changes in operating conditions
• Objective: Develop reconfigurable computing capability which will allow:
  • Self-reconfiguration and adaptation to unforeseen conditions
  • Faster, cheaper development cycles

Outgrowth of JPL’s Evolvable Computing Program
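As a toy illustration of the genetic-algorithm style of search mentioned in the approach, the sketch below evolves (cluster count, clock frequency) pairs toward the lowest-power configuration that still meets a performance requirement. Everything specific here is an assumption made for the example: the cluster counts, frequency steps, IPC table, penalty value, and the relation EPC = IPC^k with k = 1.9 used as the power model. It is not the project's algorithm.

```python
import random

CLUSTERS = (1, 2, 4)                 # assumed selectable cluster counts
FREQS_MHZ = (50, 100, 200, 400)      # assumed clock steps
IPC = {1: 1.0, 2: 1.7, 4: 2.8}       # assumed IPC for each cluster count
K = 1.9                              # exponent in the assumed model EPC = IPC^k


def fitness(cfg, required_mips):
    """Lower is better: power in arbitrary units, plus a penalty if too slow."""
    clusters, f_mhz = cfg
    perf = IPC[clusters] * f_mhz          # performance = IPC x F
    power = IPC[clusters] ** K * f_mhz    # power = EPC x F
    penalty = 1e6 if perf < required_mips else 0.0
    return power + penalty


def evolve(required_mips, generations=30, pop_size=8, seed=0):
    """Elitist genetic search; returns the best configuration found."""
    rng = random.Random(seed)
    pop = [(rng.choice(CLUSTERS), rng.choice(FREQS_MHZ)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, required_mips))
        parents = pop[: pop_size // 2]            # keep the better half
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = rng.sample(parents, 2)
            child = (a[0], b[1])                  # crossover: clusters from a, clock from b
            if rng.random() < 0.3:                # mutation: redraw the whole configuration
                child = (rng.choice(CLUSTERS), rng.choice(FREQS_MHZ))
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda c: fitness(c, required_mips))


best = evolve(required_mips=300)
print("selected (clusters, MHz):", best, "cost:", fitness(best, 300))
```

A search space this small could be enumerated exhaustively; the evolutionary form only becomes interesting once cache widths, thread counts, and data placement are added as further genes.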


21

Program Plan

[Schedule chart: tasks Profiles, Baseline, Morphable Node, Data Placement, Adaptive Algorithms, Run-time, Demo & Eval; time axis: 0, 6 mo, 1 yr, 18 mo, 2 yr]

Optional 3rd year:
• high level design & demo on FPLA or MOSIS
• prototype of run-time
• investigation of needed program development environment
• demo in JPL test bed
• analysis for insertion into real JPL mission


22

Expected Deliverables

• Benchmark suite & corresponding mission energy profiles
• Detailed morphable architecture
• System simulator with energy & performance projections & evaluation against profiles
• Demonstration of data placement & architectural adaptation algorithms
• Specification of energy aware run-time & API


23

Some Recent References

Zyuban, Victor and Peter M. Kogge, “Inherently Lower-Power High-Performance Superscalar Architectures,” submitted to IEEE Trans. on Computers.

Zyuban, Victor and Peter M. Kogge, “Optimization of High-Performance Super-Scalar Architectures for Energy-Delay Product,” accepted for ISLPED 2000.

K. Ghose, “Reducing Energy Requirements for Instruction Issue and Dispatch in Superscalar Processors,” accepted for ISLPED 2000.

K. Ghose and M. B. Kamble, “Reducing Power in Superscalar Caches Using Subbanking, Multiple Line Buffers and Bit-Line Segmentation,” ISLPED’99, pp. 70-75.

Zyuban, Victor and Peter M. Kogge, “The Energy Complexity of Register Files,” ISLPED’98, pp. 305-310.

K. Ghose and M. B. Kamble, “Energy-efficient Cache Organizations for Superscalar Processors,” Workshop on Power-Driven Microarchitecture, in conjunction with ISCA’98.

Zyuban, Victor and Peter M. Kogge, “Split Register File Architecture for Inherently Lower Power Architectures,” Workshop on Power-Driven Microarchitecture, in conjunction with ISCA’98.

Zawodny, Jason T., Jay B. Brockman, Peter M. Kogge, Eric Johnson, “Cache-In-Memory: A Lower Power Alternative,” Workshop on Power-Driven Microarchitecture, in conjunction with ISCA’98.

M. B. Kamble and K. Ghose, “Analytical Energy Dissipation Models for Low Power Caches,” ISLPED’97, pp. 143-148.

M. B. Kamble and K. Ghose, “Energy-Efficiency of VLSI Caches: A Comparative Study,” IEEE 10th Int’l. Conf. on VLSI Design, Jan. 1997, pp. 261-267.


24

“Just enough energy”
