Architecture and Synthesis for Power-Efficient FPGAs

Architecture and Synthesis for Power-Efficient FPGAs. Jason Cong, University of California, Los Angeles. Partially supported by NSF Grants CCR-0096383 and CCR-0306682, and Altera under the California MICRO program. Joint work with Deming Chen, Lei He, Fei Li, Yan Lin.


Page 1

Architecture and Synthesis for Power-Efficient FPGAs

Jason Cong
University of California, Los Angeles

Partially supported by NSF Grants CCR-0096383 and CCR-0306682, and Altera under the California MICRO program

UCLA

Joint work with Deming Chen, Lei He, Fei Li, Yan Lin

Page 2

Outline

Introduction
Understanding Power Consumption in FPGAs
Architecture Evaluation and Power Optimization
Low Power Synthesis
Conclusions

Page 3

Why? FPGA is Known to be Power Inefficient!

FPGA consumes 50-100X more power
Why do we care about power optimization for FPGAs?!

Source: [Zuchowski, et al, ICCAD02]

Page 4

FPGA Advantages

Short TAT (total turnaround time)
No or very low NRE

Page 5

ASICs Become Increasingly Expensive

Traditional ASIC designs are facing a rapid increase in NRE and mask-set costs at 90nm and below

Source: EETimes

[Figure: total cost for mask set ($M, axis from $0.0 to $2.5) and cost per mask ($K, axis from 0 to $60) at the 250nm, 180nm, 130nm, and 100nm nodes; cost per mask is labeled 7.5, 12, 40, and 60.]

Process (um)   Single mask cost ($K)   # of masks   Mask set cost ($K)
2.0            1.5                     12           18
…
0.8            1.5                     12           18
0.6            2.5                     12           30
0.35           4.5                     16           72
0.25           7.5                     20           150
0.18           12                      26           312
0.13           40                      30           1,000
0.10           60                      34           2,000
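
The mask-set cost in the table is roughly the single-mask cost multiplied by the number of masks. A minimal Python sketch of that arithmetic, using the figures from the slide (the variable names are illustrative and not part of the original deck):

```python
# Per-node figures copied from the slide's table (costs in $K).
process_um = [2.0, 0.8, 0.6, 0.35, 0.25, 0.18, 0.13, 0.10]
single_mask_cost_k = [1.5, 1.5, 2.5, 4.5, 7.5, 12, 40, 60]
num_masks = [12, 12, 12, 16, 20, 26, 30, 34]
listed_set_cost_k = [18, 18, 30, 72, 150, 312, 1000, 2000]

# Mask-set cost is approximately (cost per mask) x (number of masks).
computed_set_cost_k = [c * n for c, n in zip(single_mask_cost_k, num_masks)]

for node, computed, listed in zip(process_um, computed_set_cost_k, listed_set_cost_k):
    print(f"{node:>5} um: computed {computed:>7.1f} $K, listed {listed:>5} $K")
```

The products match the listed values down to 0.18 um. For 0.13 um and 0.10 um the products are 1,200 and 2,040, so the listed 1,000 and 2,000 appear to be rounded figures.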

Page 6

Our Research

Power-Efficient FPGAs

Circuit Design

Fabric Design

System Design

Synthesis Tools

Page 7

Outline

Introduction
Understanding Power Consumption in FPGAs
Architecture Evaluation and Power Optimization
Low Power Synthesis
Conclusions

Page 8

FPGA Architecture

[Figure: FPGA composed of programmable IO, programmable logic, and programmable routing. A logic block has I inputs, N outputs, and a clock, and contains N BLEs (BLE #1 to BLE #N). Each BLE has a K-input LUT followed by a D flip-flop (D FF), with a clock input and an output (Out).]

Page 9

Evaluation Framework – fpgaEva-LP

[Figure: fpgaEva-LP flow. Blocks shown: BLIF; Logic Optimization (SIS); Tech-Mapping (RASP); Timing-Driven Packing (TV-Pack); Placement & Routing (VPR); Arch Spec; SLIF; Delay; Area; BC-Netlist Generator; BC-Netlist; Power Simulator; Power.]

fpgaEva-LP [Li, et al, FPGA’03]

Page 10

BC-Netlist Generator

Mapped Netlist Layout

Buffer Extraction

Netlist Generation for Logic Clusters

Capacitance Extraction

Delay Calculation

BC-Netlist

Back-annotation

Page 11

Mixed-level Power Model – Overview

Dynamic power
  Switching power
  Short-circuit power
  Related to signal transitions
    Functional switch
    Glitch

Static power
  Sub-threshold leakage
  Gate leakage
  Reverse-biased leakage
  Depending on the input vector

Power sources   Logic block    Interconnect & clock
Dynamic         Macro-model    Switch-level model
Static          Macro-model    Macro-model

Page 12

Cycle-Accurate Power Simulator

[Figure: simulator flow. Inputs: mixed-level power model, post-layout extracted delay & capacitance, BC-netlist, and random vector generation. These feed the cycle-accurate power simulation with glitch analysis, which repeats while "All cycles finished?" is No and reports power values once it is Yes.]

$$E_{cycle} = \sum_{i \in active} E_a(n_i) + \sum_{j \in idle} E_s(n_j)$$
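
The per-cycle energy on this slide reads as a sum over active nodes plus a sum over idle nodes. A minimal sketch of that accumulation follows. It assumes E_a is the energy of a node that switches in the cycle and E_s is the static energy of a node that stays idle. The function name, node names, and energy values are hypothetical.

```python
def cycle_energy(active_nodes, idle_nodes, active_energy, static_energy):
    """Energy of one cycle: active-node energy plus idle-node static energy."""
    e_active = sum(active_energy(n) for n in active_nodes)
    e_idle = sum(static_energy(n) for n in idle_nodes)
    return e_active + e_idle

# Hypothetical per-node energies in picojoules (not taken from the slides).
E_A = {"n1": 4.0, "n2": 2.5}
E_S = {"n3": 0.5, "n4": 0.25}
total = cycle_energy(["n1", "n2"], ["n3", "n4"], E_A.get, E_S.get)
print(total)
```

A simulator in this style would repeat the call once per input vector and sum or average the results over all cycles.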

Page 13

Power Breakdown

Interconnect power is dominant

Cluster Size = 12, LUT Size = 4
  Logic Block Power: 19%
  Interconnect Power: 59%
  Clock Power: 22%

Cluster Size = 12, LUT Size = 6
  Logic Block Power: 40%
  Interconnect Power: 45%
  Clock Power: 15%

Page 14

Power Breakdown (cont’d)

Leakage power becomes increasingly important (100nm)

Cluster Size = 12, LUT Size = 4
  Leakage Power: 42%
  Dynamic Power: 58%

Cluster Size = 12, LUT Size = 6
  Leakage Power: 52%
  Dynamic Power: 48%

Page 15

Outline

Introduction
Understanding Power Consumption in FPGAs
Architecture Evaluation and Power Optimization
  Architecture Parameter Selection
  Dual-Vdd/Dual-Vt FPGA Architecture
Low Power Synthesis with Dual-Vdd
Conclusion

Page 16

Total Power along LUT and Cluster Size Changes

[Figure: total FPGA power (normalized geometric mean, axis from 1 to 2) versus LUT size (3 to 7), with one curve each for cluster size = 4, 6, 8, 10, and 12.]

Routing architecture: segmented wire with length of 4, and 50% tri-state buffers in routing switches

Page 17

Routing Architecture Evaluation

Page 18

Architecture of Low-power and High-performance

Application: Low-power (E3t)
  Best FPGA architecture: cluster size 10, LUT size 4, wire segment length 4, 25% buffered routing switches
  Energy (E): 0.9653   Delay (t): 0.9904   E3t: 0.8909   Et3: 1.0080

Application: High-performance (Et3)
  Best FPGA architecture: cluster size 12, LUT size 4, wire segment length 4, 100% buffered routing switches
  Energy (E): 1.0502   Delay (t): 0.8865   E3t: 1.0268   Et3: 0.7865

Arch. parameter selection leads to a 10% power/delay trade-off
Uniform FPGA fabrics provide limited power-performance tradeoff
Need to explore heterogeneous FPGA fabrics, e.g. dual-Vt and dual-Vdd fabrics
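
The two figures of merit on this slide can be read as products of energy and delay with different exponents. The sketch below assumes that E3t means E^3 * t and Et3 means E * t^3. It also assumes the normalized energy/delay pairs are 0.9653/0.9904 for the low-power architecture and 1.0502/0.8865 for the high-performance one. The function names are illustrative.

```python
def e3t(energy, delay):
    """Energy-weighted metric E^3 * t (emphasizes low power)."""
    return energy ** 3 * delay

def et3(energy, delay):
    """Delay-weighted metric E * t^3 (emphasizes high performance)."""
    return energy * delay ** 3

# Normalized (energy, delay) pairs as read from the slide (see assumptions above).
low_power = (0.9653, 0.9904)
high_perf = (1.0502, 0.8865)

print(round(e3t(*low_power), 4))
print(round(e3t(*high_perf), 4))
```

Under these assumptions the E3t products land within 0.001 of the slide's E3t values (0.8909 and 1.0268). The sketch only checks that column; the slide's Et3 figures are not reproduced by a simple product of the listed energy and delay.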

Page 19

Outline

Introduction
Understanding Power Consumption in FPGAs
Architecture Evaluation and Power Optimization
  Architecture Parameter Selection
  Dual-Vdd/Dual-Vt FPGA Architecture [Li, et al, FPGA’04]
Low Power Synthesis with Dual-Vdd
Conclusion

Page 20

Dual-Vdd LUT Design

Dual-Vdd technique makes use of the timing slack to reduce power
  VddH devices on critical path => performance
  VddL devices on non-critical paths => power
  Assume uniform Vdd for one LUT

Threshold voltage Vt should be adjusted carefully for different Vdd levels
  To compensate delay increase
  To avoid excessive leakage power increase
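
The power benefit of VddL devices comes from the quadratic dependence of switching power on supply voltage. The sketch below uses the standard CMOS switching-power model, which the slide does not state explicitly. The activity, capacitance, and frequency values are hypothetical; only the 1.3 V and 0.8 V levels appear elsewhere in the deck.

```python
def switching_power(activity, capacitance_f, vdd_v, freq_hz):
    """Standard CMOS switching-power model: P = 0.5 * alpha * C * Vdd^2 * f."""
    return 0.5 * activity * capacitance_f * vdd_v ** 2 * freq_hz

# Hypothetical operating point; only the Vdd values come from the slides.
ACTIVITY, CAP_F, FREQ_HZ = 0.2, 1e-12, 100e6
p_high = switching_power(ACTIVITY, CAP_F, 1.3, FREQ_HZ)
p_low = switching_power(ACTIVITY, CAP_F, 0.8, FREQ_HZ)
ratio = p_low / p_high
print(f"VddL/VddH switching power ratio: {ratio:.3f}")
```

Under this model a device at 0.8 V switches with roughly 38% of the power it would use at 1.3 V. The sketch ignores the delay increase, leakage, and level-converter overhead that the deck treats separately.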

Page 21

Vdd/Vt-Scaling for LUTs

Three scaling schemes
  Constant-Vt scaling
  Fixed-Vdd/Vt-ratio scaling
  Constant-leakage scaling

[Figure: delay (ns, axis from 0.1 to 0.7) versus Vdd (1.3v, 1.0v, 0.9v, 0.8v) for constant Vt, fixed-Vdd/Vt-ratio, and constant leakage.]

[Figure: leakage power (uW, axis from 0 to 10) versus Vdd (1.3v, 1.0v, 0.9v, 0.8v) for constant Vt, fixed-Vdd/Vt-ratio, and constant leakage.]

Constant-leakage scaling obtains a good tradeoff
  Useful for both single-Vdd scaling and dual-Vdd design

Page 22

Dual-Vt LUT Design

LUT is divided into two parts
  Part I: configuration cells => high Vt
  Part II: MUX tree and input buffers => normal Vt (decided by constant-leakage Vdd-scaling)

Configuration SRAM cells
  Content remains unchanged after configuration
  Read/write delay is not related to FPGA performance

Use high Vt ~40% of Vdd
  Maintain signal integrity
  Reduce SRAM leakage by 15X and LUT leakage by 2.4X
  Increase configuration time by 13%

Page 23

Pre-Defined Dual-Vt Fabric

Power saving
  11.6% for combinational circuits
  14.6% for sequential circuits

Table 1: Combinational circuits

Circuit   arch-SVST (Single Vt)   arch-SVDT (Dual Vt)
          power (watt)            power saving
alu4      0.0798                  8.5%
apex2     0.108                   9.3%
apex4     0.0536                  12.3%
des       0.234                   10.7%
ex1010    0.179                   17.3%
ex5p      0.059                   11.6%
misex3    0.0753                  9.4%
pdc       0.256                   14.7%
seq       0.0927                  9.4%
spla      0.180                   12.4%
Avg.                              11.6%

Table 2: Sequential circuits

Circuit   arch-SVST (Single Vt)   arch-SVDT (Dual Vt)
          power (watt)            power saving
bigkey    0.148                   12.3%
clma      0.632                   14.8%
diffeq    0.0391                  19.7%
dsip      0.134                   14.5%
elliptic  0.140                   16.3%
frisc     0.190                   19.2%
s298      0.0736                  13.4%
s38417    0.307                   11.7%
s38484    0.261                   10.2%
tseng     0.0351                  14.0%
Avg.                              14.6%

Page 24

Dual-Vdd FPGA Fabric

Granularity: logic block (i.e., cluster of LUTs)
  Smaller granularity => intuitively more power saving
  But a larger implementation overhead

Layout pattern: pre-defined dual-Vdd pattern
  Row-based or interleaved pattern
  Ratio of VddL/VddH blocks is 2:1 (benchmark profiling)

Interconnect uses uniform VddH

[Figure: fabric layout with L-blocks (VddL) and H-blocks (VddH).]

Page 25

Simple Design Flow for Dual-Vdd Fabric

Based on traditional design flow, but with new steps

Step I: LUT mapping (FlowMap) + P & R assuming uniform VddH (using VPR)

Step II: Dual-Vdd assignment based on sensitivity

Step III: Timing-driven P & R considering pre-defined dual-Vdd pattern (modified VPR)

Page 26

Comparison Between Vdd-Scaling and Dual-Vdd

For high clock frequency, dual Vdd achieves ~6% total power saving (~18% logic power saving)
For low clock frequency, single-Vdd scaling is better
Still a large gap between ideal dual-Vdd and real case
  Ideal dual-Vdd is the result without layout pattern constraint

[Figure: power (watt, axis from 0.03 to 0.09) versus max. clock frequency (MHz, 65 to 125) for circuit alu4. Curves: arch-SVDT (Vdd scaling), arch-DVDT (ideal case), and arch-DVDT (pre-defined Vdd). Points are labeled with supply voltages: 1.5v, 1.3v, 1.0v, and 0.9v, and the VddH/VddL pairs 1.5v/1.0v, 1.3v/1.0v, 1.3v/0.9v, 1.3v/0.8v, 1.0v/0.9v, 1.0v/0.8v, and 0.9v/0.8v.]

Page 27

Vdd-Programmable Logic Block

Power switches for Vdd selection and power gating
One-bit control is needed for Vdd selection, but two-bit control for power gating

Page 28

Experimental Results with Vdd-Programmable Blocks

Power vs. performance
Circuit: alu4

[Figure: total power (watt, axis from 0.03 to 0.09) versus clock frequency (MHz, 65 to 125). Curves: arch-SV (Vdd scaling), arch-DV (configurable Vdd), arch-DV (ideal case), and arch-DV (pre-defined Vdd). Points are labeled with supply voltages: 1.5v, 1.3v, and 1.0v, and the VddH/VddL pairs 1.5v/1.0v, 1.5v/0.8v, 1.3v/0.9v, 1.3v/0.8v, 1.0v/0.9v, 1.0v/0.8v, and 0.9v/0.8v.]

Page 29

Outline

Introduction
Understanding Power Consumption in FPGAs
Architecture Evaluation and Power Optimization
Low Power Synthesis
Conclusions

Page 30

Low Power Synthesis for Dual Vdd FPGAs

FPGA architecture with dual-Vdds adds new layout constraints for synthesis tools
Novel synthesis tools are required to support the architecture
  Technology mapping [Chen, et al, FPGA’04]
  Circuit clustering [Chen, et al, ISLPED’04]

Page 31

Technology Mapping for Low-Power FPGAs with Dual Vdds

Cut Enumeration:

Topological order from PIs to POs.

[Figure: example network with nodes a, b, c, d, e, f, g, w, x, y, and z. Cuts are annotated with delay and power (Delay 1, Power 1; Delay 2, Power 2; Delay 2, Power 2.5; Delay 2, Power 3.2; Delay 2, Power 3.5). Nodes are annotated with their optimal delay and power (Optimal Delay = 1, Power = 1, shown twice; Optimal Delay = 1, Power = 1.5; Optimal Delay = 2, Power = 2.5).]

Represent 1 case: single high Vdd case
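
Cut enumeration in topological order can be sketched as follows. This is a generic K-feasible cut enumeration, not necessarily the exact procedure of the cited mapper, and it omits the delay and power values that the slide attaches to each cut. The function name and the small example network are hypothetical.

```python
from itertools import product

def enumerate_cuts(fanins, topo_order, k):
    """Return {node: set of cuts}; each cut is a frozenset of at most k nodes."""
    cuts = {}
    for node in topo_order:              # PIs first, POs last
        node_cuts = {frozenset([node])}  # trivial cut: the node itself
        ins = fanins.get(node, [])
        if ins:
            # Merge one cut from every fanin; keep merges that fit a K-input LUT.
            for combo in product(*(cuts[i] for i in ins)):
                merged = frozenset().union(*combo)
                if len(merged) <= k:
                    node_cuts.add(merged)
        cuts[node] = node_cuts
    return cuts

# Hypothetical two-level network: x = f(a, b), y = g(x, c).
fanins = {"x": ["a", "b"], "y": ["x", "c"]}
order = ["a", "b", "c", "x", "y"]
cuts = enumerate_cuts(fanins, order, k=3)
print(sorted(sorted(c) for c in cuts["y"]))
```

Because nodes are visited from PIs to POs, every fanin's cut set is complete before it is merged. A mapper would then evaluate delay and power per cut and keep the best values at each node.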

Page 32

Dual-Vdd Cases

Consider:
  Converter delay & power
  VddL LUT delay & power
  VddH LUT delay & power

[Figure: example network with nodes a, b, c, d, e, w, x, y, and z, with an input LUT and a target LUT marked.]

Cases   Input LUT   Target LUT   Converter
1       VddL        VddL         No
2       VddL        VddH         Yes
3       VddH        VddL         No
4       VddH        VddH         No

Four extra cases for dual-Vdd consideration

Produce these four cases for each cut and node
More tradeoff solution points
  Smaller power requires larger delay
  Smaller delay requires larger power
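
The converter column of the case table follows one rule: a level converter is required only when a VddL LUT drives a VddH LUT. A minimal sketch of that rule (the function name is illustrative):

```python
from itertools import product

def needs_converter(input_vdd, target_vdd):
    """A level converter is needed only when a VddL LUT drives a VddH LUT."""
    return input_vdd == "VddL" and target_vdd == "VddH"

# Enumerate the four input/target combinations in the table's order.
cases = [
    (inp, tgt, needs_converter(inp, tgt))
    for inp, tgt in product(["VddL", "VddH"], repeat=2)
]
for number, (inp, tgt, conv) in enumerate(cases, start=1):
    print(number, inp, tgt, "Yes" if conv else "No")
```

A mapper following the slide would evaluate all four combinations for each cut and add the converter's delay and power only in case 2.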

Page 33

Mapping Solution Generation

From POs to PIs
Critical path driven by VddH LUT
Non-critical paths can be driven by VddL LUT, guided by low power

[Figure: example network with nodes a, b, c, d, e, f, g, w, x, y, and z, mapped with a mix of low-Vdd LUTs and high-Vdd LUTs.]

Page 34

Two Types of Required Times

[Figure: node R (to be mapped) and mapped LUTs x and y, one using VddL and one using VddH, with req’d times 1.8 and 2.0 propagated back along the critical path. Two scenarios are shown, "If R is using VddH" and "If R is using VddL". Where a converter is on the path, the req’d time becomes 1.7 = 2.0 - 0.3. The resulting req’d time of R is 1.7 in one scenario and 1.8 in the other. Remaining figure label: 33.2.]

Each node maintains two req’d times:
  Propagated separately
  Interact with each other

Page 35

Experimental Results

SVmap (single high Vdd) compared to Emap [Lamoureux, ICCAD03]

Mapping area   Total edges   Est'ed power   Real power
-4.04%         0.56%         -1.29%         -2.10%

Mapping area considerably better
Estimated power very close to the real power reported after P&R

DVmap (dual Vdd) compared to SVmap

Mapper   Vdd           Result
SVmap    v1.3
DVmap    v1.3 - v0.8   -11.63%
DVmap    v1.3 - v0.9   -10.72%
DVmap    v1.3 - v1.0   -9.44%

v1.3 as VddH and v0.8 as VddL is the best combination

Page 36

Circuit Clustering with Dual Vdds

Given:
  A mapped FPGA design
  An FPGA architecture with dual-Vdd configurable logic blocks

Goal:
  Cluster the LUTs into logic blocks
  Assign voltages to the logic blocks
  such that the design has
    Optimal delay
    Minimum power

Constraints:
  Logic Block Inputs ≤ K
  Logic Block Size ≤ M
  Logic Block Outputs ≤ M
  LUT delay = dL or dH
  Inter-block edge delay = D

[Figure: example logic block of LUTs with Input = 5, Size = 3, Output = 2.]
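
The three structural constraints (inputs ≤ K, size ≤ M, outputs ≤ M) can be checked per candidate cluster. The sketch below is one possible formulation. It assumes a LUT with no fanout is a primary output, and the netlist and all names are hypothetical.

```python
def cluster_is_feasible(cluster, fanins, fanouts, max_inputs, max_size):
    """Check the input, size, and output limits for one candidate logic block."""
    members = set(cluster)
    # Inputs: distinct signals produced outside the cluster and used inside it.
    inputs = {src for lut in members for src in fanins.get(lut, [])
              if src not in members}
    # Outputs: member LUTs with no fanout (assumed POs) or a fanout outside.
    outputs = {
        lut for lut in members
        if not fanouts.get(lut) or any(dst not in members for dst in fanouts[lut])
    }
    return (len(inputs) <= max_inputs
            and len(members) <= max_size
            and len(outputs) <= max_size)

# Hypothetical netlist: u = f(a, b), v = g(c, d), w = h(u, v, e), z = k(u).
fanins = {"u": ["a", "b"], "v": ["c", "d"], "w": ["u", "v", "e"], "z": ["u"]}
fanouts = {"u": ["w", "z"], "v": ["w"], "w": []}
ok = cluster_is_feasible(["u", "v", "w"], fanins, fanouts, max_inputs=5, max_size=3)
print(ok)
```

The example cluster has 5 inputs, 3 LUTs, and 2 outputs (u and w), matching the counts shown in the slide's figure. Lowering max_inputs to 4 makes it infeasible.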

Page 37

Cluster Enumeration – An Example

[Figure: LUT t with fanins r and s, which are fed by nodes m, n, o, p, and q; the common nodes shared by r and s are marked.]

To get a cluster of size 6 on LUT t:
  Get 1 node on r, 4 on s, then merge with t, ..., and
  Get 2 nodes on r, 3 on s, ...
  Get 3 on r, ...

Dynamic programming, from PIs to POs

Page 38

Solution Generation

[Figure: LUT t with fanins r and s, which are fed by nodes m, n, o, p, and q; clusters s1 and s2 are marked.]

Solution propagation similar to [Vaishnav, ICCAD’99]

Delay, power and voltage (which form solution points) propagate through the clusters and nodes iteratively

Try to get solutions for cluster t1
  Get solutions for s
  Get solutions for r

Page 39

Solution Curve on Node r

Good solutions: for any two delay-power-vdd points (D1, P1, V1) and (D2, P2, V2)
  if D1 > D2, then P1 < P2
  if D1 < D2, then P1 > P2

Good delay-power-vdd points

Delay   Power   Vdd
5       10      H
5.4     9.8     L
6.1     8.24    H
6.2     8       L

[Figure: the corresponding solution curve, power (axis from 0 to 12) versus delay (axis from 0 to 7), with points labeled H and L.]
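
The "good solution" condition says that a slower point must use less power than every faster point. Keeping only such points is a standard non-dominated (Pareto) filter, sketched below. It treats Vdd as a label carried with each point rather than as part of the dominance test, which is an assumption. The two extra candidate points are hypothetical.

```python
def prune_inferior(points):
    """Keep only non-dominated (delay, power, vdd) points, sorted by delay."""
    kept = []
    # Sort by delay, then power, so the best point at each delay comes first.
    for delay, power, vdd in sorted(points, key=lambda p: (p[0], p[1])):
        # A point survives only if it uses less power than every faster point.
        if not kept or power < kept[-1][1]:
            kept.append((delay, power, vdd))
    return kept

# The four good points from the slide, plus two hypothetical inferior ones.
candidates = [
    (5.0, 10.0, "H"), (5.4, 9.8, "L"), (6.1, 8.24, "H"), (6.2, 8.0, "L"),
    (5.5, 10.5, "H"),   # slower and more power than (5.0, 10.0): pruned
    (6.2, 9.0, "L"),    # same delay as (6.2, 8.0) but more power: pruned
]
curve = prune_inferior(candidates)
print(curve)
```

After pruning, only the slide's four points remain, ordered by increasing delay and decreasing power.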

Page 40

Solution Propagation

[Figure: delay-power-vdd curve for r and delay-power-vdd curve for s, each plotting power (axis from 0 to 12) versus delay (axis from 0 to 7), with points labeled H and L.]

Consider:
  Converter
  VddL LUT
  VddH LUT
  Edge delay

All the good solutions are generated

All the inferior solutions are pruned away

[Figure: delay-power-vdd curve for cluster t1, plotting power (axis from 0 to 18) versus delay (axis from 0 to 9), with points labeled L and H.]

Page 41

Two Theorems

The algorithm gets the minimum number of solution points, W, optimally for each node

W is upper bounded by $2(L+1)^2$, where L = level(v) for node v

The algorithm is delay and power optimal for trees and delay optimal for directed acyclic graphs (DAGs) with dual-Vdd FPGAs

Page 42

Experimental Results Summary

Dual-Vdd clustering results compared to the single high-Vdd clustering results

Clustering   Vdd           Result
Single Vdd   v1.3
Dual Vdd     v1.3 - v0.8   -20.3%
Dual Vdd     v1.3 - v0.9   -19.5%
Dual Vdd     v1.3 - v1.0   -18.4%

v1.3 as high Vdd and v0.8 as low Vdd is the best combination among the three

Page 43

Outline

Introduction
Understanding Power Consumption in FPGAs
Architecture Evaluation and Power Optimization
Low Power Synthesis
Conclusions

Page 44

Conclusions

FPGA power consumption
  Majority on programmable interconnects
  Leakage is significant

FPGA architecture optimization for power
  Architecture parameter tuning has a limited impact
  Using high Vt for configuration SRAM cells is helpful
  Using programmable dual Vdd for logic blocks is helpful

Power-efficient FPGA architectures introduce interesting CAD problems
  Dual-Vdd mapping
  Dual-Vdd clustering
  Up to 20% power saving reported using these algorithms