Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… ·...

46
Reduce FPGA Power With Automatic Optimization & Power-Efficient Design Vaughn Betz & Sanjay Rajput

Transcript of Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… ·...

Page 1: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Reduce FPGA Power With Automatic Optimization &

Power-Efficient Design

Vaughn Betz & Sanjay Rajput

Page 2: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Silicon vs. Software Comparison

(1) The Xilinx ISE XPower Tool Crashes

100%

80%

60%

40%

20%

0%

20%

-40%

-60%

-80%

-100%Undere

stim

ation

Overe

stim

ation

3%

3%

20%

14%

13% 20%

20%

26%

7%

34%

28%

130%

187%

25%

5% 1

4%

N/A(1)

N/A(1)

-100%

-86%

-86%

-86%

-84%

-84%

-82%

-75% -6

3%

-55% -4

2%

-14%

-17%

-19%

-5%

-1%

-5%

-2%

-12% -6% -3%

-6%

100%

80%

60%

40%

20%

0%

20%

-40%

-60%

-80%

-100%

Perc

ent

Err

or

Undere

stim

ation

Overe

stim

ation

3%

3%

20%

14%

13% 20%

20%

26%

7%

34%

28%

130%

187%

25%

5% 1

4%

N/A(1)

N/A(1)

-100%

-86%

-86%

-86%

-84%

-84%

-82%

-75% -6

3%

-55% -4

2%

-14%

-17%

-19%

-5%

-1%

-5%

-2%

-12% -6% -3%

-6%

Previous Power Net Seminar…

Quartus II PowerPlay: The Most Accurate Power Estimation Tool

Quartus® II PowerPlay

ISE XPower

Page 3: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Accurate Models Lead to Intelligent Decisions

� Intelligent decisions on power &

thermal management

− Accurate power models are critical

� For FPGA software to make intelligent power

optimization decisions

− Accurate power models are critical

Page 4: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Questions We’ll Answer Today

� How do I estimate my design’s power?

� How does power optimization work?

� How effective is Quartus II PowerPlay in reducing power?

� What are design techniques for reducing power?

� How can I ensure I’m using the best CAD settings for power?

Page 5: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Power Analysis Overview

Getting the Best Accuracy

Page 6: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Early Power

Estimate

Power Estimation

Post Place & Route Power

Estimates

� Accurate power estimates & analysis allow− Time to optimize system

− FPGA CAD software to optimize design power− Design decisions that reduce power

Actual Power(Too Late!)

Page 7: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Early Power Estimator

Page 8: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Early Power Estimator Accuracy

0.00

0.20

0.40

0.60

0.80

1.00

1.20

1.40

Stratix® II design

EP

E/Q

uart

us d

yn

am

ic p

ow

er

� Dynamic power accuracy limited by− Lack of place & route data

− User entry: Resources, toggle rates, enable rates

+/-20% Error vs. Power Analyzer With Perfect User Data Entry

Page 9: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Quartus II Power Analyzer

� More accurate than Early Power Estimator− Has precise resource counts

− Has power models for each RAM/ALM/DSP mode

− Considers exact routing used

� Best accuracy: Simulate for toggle rates

� Reduced accuracy: Vectorless analysis for

toggle rates

Page 10: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Example: Power vs. RAM Mode

8700+ Designs Covering All FPGA Elements

M4K RAM M512 RAM MRAM

Dyn

am

ic p

ow

er

(mW

/Mh

z)

0

1

2

3

4

5

6

7

8

9

10

RAM configuration

Quartus II EstimateEP2S60 Measurement

Page 11: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Quartus II Automatic Power Optimization

Settings, Algorithms and Results

Page 12: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Power Optimization Overview

Timing

Critical?

Yes

No

Min

Delay

Min

Area

Timing-Driven Compiler

Timing

Critical?

Yes

No

Min

Delay

Min

Area

Power-Driven Compiler

PowerCritical?

Yes

No

Min

Power

Page 13: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

PowerPlay Power Optimization

� Power-driven synthesis & fitting

� Normal effort (default)− Does not increase compile time

− No reduction in design performance

− No need for toggle rate data from simulation

� Extra effort − Larger power reduction

− May increase compile time

− All timing optimization enabled, but performance loss possible

− Best with toggle rate data from simulation

Page 14: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

PowerPlay Power Optimization

Automatic, but less accurate

Requires testbench,

more accurate

Evaluate Power

RTLSimulation

Power-Driven Fit

Design

Power Report

VectorlessEstimation

HardwareMeasurement

Gate-Level Simulation+ Power Analyzer

Estimate Toggle Rates

Normalor

Extra effort

Power-DrivenSynthesis

Page 15: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Three Components of Power

67%

22%

11%

Core Dynamic

Core Static

I/O

Dynamic Power DominantFocus of Power Optimization

99 Customer Designs,Stratix II Devices

Page 16: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Core Dynamic Power

*Digital Signal Processing (DSP) Block power: 5% of Dynamic power for designs that use DSP blocks

99 Customer Designs, Stratix II Devices

Routing38%

ALMCombinational

19%

ALMRegisters

18%

RAM Blocks14%

Clock Networks9%

DSP Blocks2%*

Page 17: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Power-Optimal RAM Mapping

�Power inefficient to uselarge physical RAM for small memory or vice versa

�Tri-matrix memory gives 3 physical RAM sizes

�Quartus II automatically picks best physical RAM

LargeRAM

Small RAM

MRAM

M4KMediumRAM M4K

M512

Page 18: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

RAM Enable Optimization

� Shut RAM down when unused � less power

− Convert read/write enable to clock enable

RAM

Core

Read Addr

Read Data

Read En

Write Addr

Write Data

Write En

Read Clk Write Clk

Page 19: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Power-Optimized RAM Mapping

Power Efficient Option

16

2:4Decoder

4 256x16 M4K RAMs

Default Option

16

4 1Kx4 M4K RAMs

1K X 16 RAM

Page 20: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

RAM Power Reduction (Stratix II)� Normal effort: 6% average savings

� Extra effort: 21% average savings

-90%

-80%

-70%

-60%

-50%

-40%

-30%

-20%

-10%

0%

10%

Design

RA

M d

yn

am

ic p

ow

er

ch

an

ge

Normal Effort

Extra Effort

Page 21: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Power-Driven Place & Route� Minimize capacitance of high-toggling signals

� Without violating timing constraints

Power Optimize

100 Million Toggle/s20 Million Toggle/s

Page 22: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Clock Shut Down Hardware

Blue: Clock

Required

Only Red

Partsof Clock

NetworkToggle

� Stratix II: Can shut down clock at 3 levels of tree− Top-level: Shut down 1/8 of clock tree

− Next-level: 1/500 of clock tree

− Bottom-level: 1/5000 of clock tree

Page 23: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Placement to Reduce Clock Power

Power Optimize

Group Clocks for Maximum Shutdown

Clocking Legal, Timing Optimized

Page 24: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Experimental Flow

Source HDL

Compare Dynamic Power

Synthesis & Fittingwith PowerPlay

Synthesis & Fitting

33 Designs:Customer & Internal Altera

Quartus II 5.1Quartus II 5.0 SP1

AL14

Page 25: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Slide 24

AL14 I don't see any results. Is it slide 25 and 26?Amy Lee, 02/12/2005

Page 26: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Stratix II Results: Normal EffortLogic

IntensiveDSP

IntensiveRAM

Intensive Balanced

-60%

-50%

-40%

-30%

-20%

-10%

0%

Dyn

am

ic P

ow

er

Red

ucti

on

vs.

Qu

art

us 5

.0

Page 27: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Stratix II Results: Extra Effort

0%

-60%

-50%

-40%

-30%

-20%

-10%

Dyn

am

ic P

ow

er

Red

ucti

on

vs.

Qu

art

us 5

.0

Logic Intensive

DSP Intensive

RAMIntensive Balanced

Page 28: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Stratix II Results Summary

PowerOptimization Flow

Dynamic Power vs.

QII 5.0

Fmax

Change

Compile Time

Change

Effort: Normal -15% -- --

Effort: ExtraToggle rates: Simulation

-21% -2% +9%

Effort: ExtraToggle rates: Vectorless

-18% -2% +9%

Effort: ExtraToggle rates: SimulationEasy timing constraint

-25% -- +100%

Page 29: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Designing for Lower Power

Six Tips for Reducing Core Power

Page 30: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

1. Use Hard IP Blocks

� Synthesis tool inferring all the hard blocks it should?− Check with floor-plan editor or HDL viewer

� DSP Blocks− Less power than logic elements except for small

multiplies (e.g. 5x5)

− Use all the DSP logic (not just multipliers):� Multiplier-accumulator, complex-multiplier, finite

impulse response sample chaining, etc.

− Use altmult_accum MegaFunction if synthesis not inferring

Page 31: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

1. Use Hard IP Blocks (Cont.)

� RAM blocks− Usually inferred by synthesis

− Use altsyncram MegaFunction if necessary

� Shift registers− Many toggling signals: Power inefficient

− Medium to large shift registers: Implement in FIFOs

− Use altshift_taps MegaFunction if necessary

Page 32: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

2. Synthesize for Area

� Less logic usually means less power

− Fewer signals to toggle

− If sufficient timing margin, tell synthesis tool to optimize for area

Page 33: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

3. Help Shut Down RAM Blocks

� Specify read-enable & write-enable signals on

your RAMs whenever possible

− PowerPlay will convert to clock enables

− Completely shuts down RAM on many cycles

� Leave RAM Block Type = Auto− Power optimizer will choose best RAM block

Page 34: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

4. Use Clock Enables on Logic

� When clock enable is low1. Register + downstream logic don’t toggle

2. Clock inside a logic array block (LAB) is shut off

� Even when clock enable is not needed for

functionality, it can save power

Page 35: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

5. Minimize Glitching

� Some logic produces many transitions/cycle

− For example, CRC/parity, combinational multipliers

− Find “Glitchy” logic via power analyzer report

− Register inputs & outputs of “Glitchy” logic

− Insert pipeline registers if possible

Downstream

Logic

DQ

Page 36: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

6. Shut Down Idle Clocks

� Entire clock domain unused in some cycles

− Use the altclkctrl MegaFunction to safely gate the clock

− Shuts down entire clock tree � lower power than

a clock enable on all registers

Page 37: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Power Optimization Advisor & Design Space Explorer

“Your FAEs in Quartus”

Page 38: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Power Optimization Advisor

� Explains power analysis best practices

� Power optimization suggestions− In priority order

� Highlights recommended settings not enabled

on your design

� Button to correct settings

� Located in Tools Menu

Page 39: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Power Optimization Advisor

Page 40: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Design Space Explorer

� Searches Quartus options

to find best implementation

− “Search for Lowest Power”

− Finds settings that minimize power & meet timing

− quartus_sh --dse or Tools Menu

Page 41: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Summary & Wrap Up

Page 42: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Altera’s PowerPlay Suite

� Most accurate power analysis tools

� Automatic power optimization− Industry first

− Highly effective

− 20% to 25% dynamic power reduction on average

� Power Optimization Advisor guides you to

lower power

Better Tools: Lower Power & No Surprises

Page 43: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Stratix II vs. Virtex-4 Power

Logic intensive

DSP intensive

RAMintensive Balanced

-70%

-60%

-50%

-40%

-30%

-20%

-10%

0%

10%

20%

Dyn

am

ic P

ow

er:

Str

ati

x II vs

. V

irte

x-4

Page 44: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

Power & Altera Hardware

� Stratix II− Lowest dynamic power & lowest total power

for 90nm high-density FPGAs

� Cyclone™ II− Lowest power of any FPGA

� HardCopy® II− Smooth migration to lower power

− Typical: 2X to 3X reduction vs. FPGA

Page 45: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Copyright © 2005 Altera Corporation

� Quartus II Handbook Volume 2, Chapter 9. -

Power Optimization

www.altera.com/quartus2

� Stratix II power page

www.altera.com/stratix2power

� Cyclone II power page

www.altera.com/cyclone2power

� Power management resource center

www.altera.com/power

Resources & More Information

Page 46: Reduce FPGA Power With Automatic Optimization & Power ...vaughn/papers/power_optimization_2… · Automatic power optimization − Industry first − Highly effective − 20% to 25%

Thank You

For More InformationVisit www.altera.com