System-on-a-Chip: A Case for Heterogeneous Architectures

Jan M. Rabaey

BWRC

University of California @ Berkeley
http://bwrc.eecs.berkeley.edu

With contributions from Richard Newton and many others

Page 2

Design at a Crossroad: Silicon technology tracking Moore’s Law

Silicon in 2010

Die Area: 2.5 x 2.5 cm
Voltage: 0.6 - 0.9 V
Technology: 0.07 µm
15 times denser than today
2.5 times power density
5 times clock rate

Memory           Density (Gbits/cm2)   Access Time (ns)
DRAM             8.5                   10
DRAM (Logic)     2.5                   10
SRAM (Cache)     0.3                   1.5

Logic            Density (Mgates/cm2)   Max. Ave. Power (W/cm2)   Clock Rate (GHz)
Custom           25                     54                        3
Std. Cell        10                     27                        1.5
Gate Array       5                      18                        1
Single-Mask GA   2.5                    12.5                      0.7
FPGA             0.4                    4.5                       0.25

Page 3

Design at a Crossroad: Applications beat Moore’s Law

[Figure: algorithmic complexity (log scale) versus time, 1982 to 2012. The complexity of the cellular generations 1G, 2G, and 3G rises faster than Moore’s Law as applied to processors in Si (factor 2 every 18 months).]

Source: R. Subramanian, MorphICs Tech. Inc

Page 4

Design at a Crossroad: The Productivity Gap

[Figure: logic transistors per chip (K) and design productivity (transistors per staff-month), both on log scales, versus year, 1981 to 2009, with process generations marked from 2.5µ through .35µ to .10µ.]

Source: SEMATECH

Page 5

Design at a Crossroad: System-on-a-Chip

[Figure: example system-on-a-chip combining a multi-spectral imager, an analog block, RAM, a 64 SIMD processor array + SRAM (image conditioning, 100 GOPS), a 500 k gates FPGA + 1 Gbit DRAM (preprocessing), and a µC system + 2 Gbit DRAM (recognition).]

● Embedded applications where cost, performance, and energy are the real issues!
● DSP and control intensive
● Mixed-mode
● Combines programmable and application-specific modules
● Software plays crucial role

Page 6

The Distributed Approach to Information Processing

Page 7

The Changing Metrics

● Power and/or Energy have become dominant drivers
  – Limiting factor for performance and reliability in wall-plugged applications
  – Enabler for wide-spread use of distributed computing and data access
● Energy reduction requires joint optimization process between application and implementation

Page 8

The Changing Metrics

● Cost of fabrication facilities and mask making has increased significantly
  – NRE cost of new design has increased significantly
● Physical effects (parasitics, reliability issues, power management) are increasingly significant in the design process
  – These must now be considered explicitly at the circuit level
● Design complexity and “context complexity” are sufficiently high that design verification is a major limitation on time-to-market

Page 9

The Changing Metrics

[Figure: the competing metrics Flexibility, Power, and Cost.]

Performance as a Functionality Constraint (“Just-in-Time Computing”)

Page 10

The System-on-a-Chip Nightmare

[Image: “Femme se coiffant”, Pablo Ruiz Picasso, 1940]

Page 11

The System-on-a-Chip Nightmare

The “Board-on-a-Chip” Approach

[Figure: DMA, CPU, DSP, Mem Ctrl., and MPEG blocks plus modules labeled C, I, O, and O, connected by a system bus, a bridge to a peripheral bus, control wires, and custom interfaces.]

Page 12

System-on-a-Chip: A Renaissance in Design

Convergence of:

Applications: Multimedia, Consumer, Communications
Implementation Fabrics: Silicon substrate, Silicon fabrics
Design Methodology: Hard + Soft

Aart De Geus, DAC’99

Page 13

An Architectural Renaissance

Embedded ARM-8 Microprocessor (Hard IP)

Tensilica Synthesized and Configurable µProcessor (Soft IP)

Courtesy of ARM, Tensilica Inc

Page 14

An Architectural Renaissance

“Very-Short Instruction Word” Processors

V-IRAM: An integrated Vector Processor for Media Processing [Patterson et al.]

[Figure: CPU, IO, 8 vector pipes (+ 1 spare), and a cross-bar switch, with two blocks each labeled Memory (512 Mbits / 64 MBytes).]

0.13 µm CMOS, 1 GHz, 16 GFLOPS, 64 GOPS (projected)

Page 15

An Architectural Renaissance

MorphICs Dynamically Reconfigurable Architecture (DRA) Processor

[Figure: a DSP core, an MCU core, and memory alongside the DRA processor, which takes the place of fixed logic for WCDMA, CDMA, IS-136, GSM, …]

DRA Processor: software programmable, hardware reconfigurable

Software download:
WCDMA (mode, param)
CDMA (mode, param)
WTDMA (mode, param)
TDMA (mode, param)

• SIM Card
• Handset Memory
• POS Programming
• Network Download
• OTA Download

Realizes cost, size and power targets similar to traditional core + hardwired

Page 16

An Architectural Renaissance

Philips Nexperia NX-2700: A programmable HDTV media processor

Combines Trimedia VLIW with configurable media co-processors

Page 17

Architectural Choices

[Figure: implementation styles placed on a Flexibility versus 1/Efficiency plane: general-purpose µP running software (µP + Prog Mem), programmable DSP (µP + Prog Mem with MAC unit and address generator), hardware-reconfigurable processor (µP + Prog Mem with satellite processors), and direct-mapped hardware (dedicated logic).]

Page 18

The Energy-Flexibility Gap

[Figure: energy efficiency in MOPS/mW (or MIPS/mW), log scale from 0.1 to 1000, versus flexibility (coverage).]

Embedded processors: SA110, 0.4 MIPS/mW
ASIPs, DSPs: 2 V DSP, 3 MOPS/mW
Reconfigurable processor/logic: Pleiades, 10-80 MOPS/mW
Dedicated HW

Page 19

An Architectural Renaissance

Example: “The Silicon Backplane” (Sonics, Inc)

[Figure: the bus-and-bridge arrangement of DSP, CPU, MPEG, Mem Ctrl., DMA, C, I, and O blocks is replaced by DSP, MPEG, CPU, DMA, C, MEM, I, and O cores that each connect through the Open Core Protocol™ to a SiliconBackplane Agent™.]

Communications-based Design
Guaranteed Bandwidth
Arbitration

Page 20

Programming the Platform

[Figure: design layers Application, Algorithm, Architecture, and Microarchitecture, linked by Software Implementation, Compilation, Estimation and Evaluation, and Component Assembly and Synthesis.]

What is the Programmer’s Model?

Primary focus of SIA GSRC Design and Test Focus Center

Page 21

Fast Design Space Exploration

Example: Retargetable estimation [Ghazal]

[Figure: the application (generic C code) passes through a profiler into a retargetable estimator. The architect, as the designer’s input, makes architectural choices that set the architecture parameters of a parameterized architecture model. Output: estimate, profile.]
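The estimation flow on this slide can be illustrated with a deliberately simple cost model. The sketch below is hypothetical and is not the estimator attributed to [Ghazal]: it assumes the profiler reduces the C application to operation counts per class, and that the parameterized architecture model supplies a cycle and energy cost per class plus a clock rate. All type names, operation classes, and units are illustrative assumptions.

```c
#include <stddef.h>

/* Hypothetical operation classes reported by the profiler. */
enum { OP_MUL, OP_ADD, OP_MEM, OP_BRANCH, OP_CLASSES };

/* Profile: how often the application executes each operation class. */
typedef struct {
    unsigned long count[OP_CLASSES];
} op_profile;

/* Architecture parameters (assumed): cost of one operation per class. */
typedef struct {
    double cycles[OP_CLASSES];
    double energy_nj[OP_CLASSES];
    double clock_mhz;
} arch_params;

typedef struct {
    double cycles;
    double energy_nj;
    double time_us;
} cost_estimate;

/* Estimate = sum over classes of (count * per-operation cost). */
cost_estimate estimate_cost(const op_profile *p, const arch_params *a) {
    cost_estimate e = { 0.0, 0.0, 0.0 };
    for (size_t c = 0; c < OP_CLASSES; c++) {
        e.cycles    += (double)p->count[c] * a->cycles[c];
        e.energy_nj += (double)p->count[c] * a->energy_nj[c];
    }
    /* clock_mhz is cycles per microsecond, so this yields microseconds. */
    e.time_us = e.cycles / a->clock_mhz;
    return e;
}
```

Retargeting in this sketch means calling `estimate_cost` again with a different `arch_params` while keeping the same profile, which is what makes a sweep over architectural choices cheap.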

Page 22

A Case Study: The Integrated CMOS Radio

Page 23

Trends in Wireless Systems

● Towards better spectrum utilization
  – using aggressive signal and protocol processing
    • Examples: multi-user detection, multi-antenna arrays
  – adaptive, multi-functional networks
    • Example: IMT-2000 / UMTS (3G)
● Towards ubiquitous wireless networking
  – Example: Bluetooth, HomeRF, FireFly

Resulting requirements: high performance, low energy, adaptivity, and flexibility

Page 24

Issues in Single-Chip Radio Design

[Figure: radio functions arranged by protocol layer (Physical + RF, MAC/Data Link, Network, Application) and by data and time granularity (nsec, µsec, msec, sec; bits, packets, streams, source data).]

Data: Data Acquisition, Data Encoding, Data Formatting, Mod/Demod, UI
Control: Synchronization, Slot Allocation, Call Setup

Page 25

The Software Radio

[Figure: antenna connected to an A/D converter and a D/A converter, followed by a DSP.]

● Idea: Digitize (wideband) signal at antenna and use signal processing to extract desired signal
● Leverages advances in technology, circuit design, and signal processing
● Software solution enables flexibility and adaptivity, but at huge price in power and cost
● 16 bit A/D converter at 2.2 GHz dissipates 1 to 10 W

Page 26

The Mostly Digital Radio

[Figure: RF input (fc = 2 GHz) passes through an RF filter and, inside the chip boundary, an LNA. The signal is mixed with cos[2π(2GHz)t] and sin[2π(2GHz)t] to form I and Q branches, each digitized by an A/D at 50 MS/s and fed to the digital baseband receiver. The A/Ds mark the analog/digital boundary.]
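The quadrature mixing in this receiver can be sketched numerically. The code below is only a discrete-time illustration, not the analog circuit on the slide: it assumes the carrier is sampled at exactly four samples per period, so the cos and sin local oscillators reduce to the sequences {1, 0, -1, 0} and {0, 1, 0, -1}, and it uses a plain average as a stand-in for the low-pass filtering ahead of the A/Ds. The type and function names are made up for the example.

```c
#include <stddef.h>

/* One period of the local oscillator at four samples per carrier cycle. */
static const double LO_COS[4] = { 1.0, 0.0, -1.0, 0.0 };
static const double LO_SIN[4] = { 0.0, 1.0, 0.0, -1.0 };

typedef struct {
    double i;   /* in-phase branch */
    double q;   /* quadrature branch */
} iq_sample;

/* Mix n input samples with cos and sin of the carrier and average the
 * products; the average plays the role of the low-pass filter. */
iq_sample quadrature_downconvert(const double *x, size_t n) {
    iq_sample out = { 0.0, 0.0 };
    if (n == 0) {
        return out;
    }
    for (size_t k = 0; k < n; k++) {
        out.i += x[k] * LO_COS[k % 4];
        out.q += x[k] * LO_SIN[k % 4];
    }
    out.i /= (double)n;
    out.q /= (double)n;
    return out;
}
```

A carrier in phase with the cosine oscillator lands entirely in the I branch and one in phase with the sine oscillator lands entirely in the Q branch, which is why the two slow baseband streams are enough to recover amplitude and phase.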

Page 27

The Software-Definable Radio

[Figure: multi-model analog RF and A/D on the analog side; on the digital side a uC core (ARM), a DSP core, logic, and bit-level accelerators, with the implementation fabric for the remaining functions marked “?????????????”.]

Functions shown: phone book, Java VM, ARQ, keypad, display, control, timing recovery, filters, adaptive antenna algorithms, equalizers, MUD

Page 28

A Trend Towards GP DSPs?

[Figure: required MegaMACs (log scale, 100 to 10K) versus time, with applications moving from fixed-function to general-purpose DSP implementations. Applications shown: ADSL, cable modem, VDSL, HDD (read), LAN (phy), VON, VB modems, HDD (servo), wireless baseband, G-lite.]

Source: TI

Page 29

Single-Chip DSPs are Lagging ...

[Figure: DSP performance in MegaMACs (log scale, 1 to 10000) versus year, 1980 to 2000.]

DSP Trend: x 1.4/year
Moore’s law: x 1.58/year

Source: TI

While algorithms are beating Moore’s law!
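The two per-year factors on this slide can be checked against the "factor 2 every 18 months" form of Moore's law used earlier in the deck. As a rough sketch (the function names are illustrative, and only whole years are handled), compounding x 1.58/year over three years gives about x 3.94, close to the x 4 that two 18-month doublings imply, and after ten years the Moore's-law trend is roughly 3.4 times ahead of the x 1.4/year DSP trend.

```c
/* Compound a per-year growth factor over a whole number of years. */
double compound(double per_year, unsigned years) {
    double total = 1.0;
    for (unsigned y = 0; y < years; y++) {
        total *= per_year;
    }
    return total;
}

/* How far a faster trend has pulled ahead of a slower one after `years`. */
double trend_gap(double fast_per_year, double slow_per_year, unsigned years) {
    return compound(fast_per_year, years) / compound(slow_per_year, years);
}
```

For example, `trend_gap(1.58, 1.4, 10)` gives the accumulated shortfall of single-chip DSP performance relative to the technology trend over a decade.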

Page 30

And Computation Seems Almost for Free

12 x 12 Complex Multiplier

0.25 µm CMOS process
● Area: 0.18 mm2
● Perf: 25 MCMACS/sec @ 1V
● Energy: 40 MCMACs/mW

[Figure: (a + jb)(c + jd) = p_r + jp_i, with the real output p_r formed from the operand pairs (a, c) and (b, d) and the imaginary output p_i from (a, d) and (b, c).]
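The arithmetic behind the complex multiplier is the standard four-multiplier form: p_r = ac - bd and p_i = ad + bc. The sketch below shows it in C under the assumption that the "12 x 12" operands are 12-bit two's-complement values; the type and function names are illustrative and do not describe the actual circuit.

```c
#include <stdint.h>

typedef struct {
    int32_t re;
    int32_t im;
} cplx32;

/* (a + jb) * (c + jd) using four real multiplies and two add/subtracts.
 * Inputs are assumed to lie in the 12-bit range -2048..2047, so every
 * product fits in 24 bits and every sum fits comfortably in 32 bits. */
cplx32 cmul12(int16_t a, int16_t b, int16_t c, int16_t d) {
    cplx32 p;
    p.re = (int32_t)a * c - (int32_t)b * d;
    p.im = (int32_t)a * d + (int32_t)b * c;
    return p;
}

/* Complex multiply-accumulate: acc += (a + jb) * (c + jd). */
void cmac12(cplx32 *acc, int16_t a, int16_t b, int16_t c, int16_t d) {
    cplx32 p = cmul12(a, b, c, d);
    acc->re += p.re;
    acc->im += p.im;
}
```

`cmac12` is the operation counted in the slide's MCMACS figures; the caller is responsible for keeping the accumulator within 32 bits over long sums.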

Page 31

Energy Trends in DSPs

[Figure: DSP power in mW/MIPS (log scale, 0.001 to 1000) versus year, 1982 to 2008. Data points: 32010 @ 5v, C15 @ 5v, C15 @ 3v, C25 @ 5v, C52 @ 5v, C52 @ 3v, C5x @ 2v, ATT16xx @ 2.7V, 2v DSP, 1v DSP, 0.5v DSP.]

Gene’s Law: factor 1.6 reduction per year

Source: TI
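Gene's Law as stated here, a factor 1.6 reduction in mW/MIPS per year, is easy to turn into a projection. The sketch below is only a back-of-the-envelope helper with made-up function names: it assumes the factor stays constant and counts whole years. Under that assumption a 100x reduction takes 10 years and a 1000x reduction takes 15.

```c
/* Whole years of a constant per-year reduction needed to reach at least
 * the target reduction factor. Returns 0 if the reduction factor does not
 * exceed 1, since the target would then never be reached. */
unsigned years_to_reduce(double reduction_per_year, double target) {
    unsigned years = 0;
    double achieved = 1.0;
    if (reduction_per_year <= 1.0) {
        return 0;
    }
    while (achieved < target) {
        achieved *= reduction_per_year;
        years++;
    }
    return years;
}

/* Project an energy figure (for example in mW/MIPS) forward in time. */
double project_energy(double start, double reduction_per_year, unsigned years) {
    double value = start;
    for (unsigned y = 0; y < years; y++) {
        value /= reduction_per_year;
    }
    return value;
}
```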

Page 32

The Implementation Trade-off

[Figure: digital baseband receiver. Data in feeds an acquisition and timing recovery block and a series of signal update blocks, each with an adaptive pilot correlator and an adaptive data correlator, producing channel coefficient estimates C0 ... CL-1, the signal Sk, and data out.]

300 million multiplications/sec
357 million add-sub’s/sec

Page 33

Adaptive Multi-User Detection: A Direct Mapping Approach

[Figure: correlator layout]

Power and area are dominated by MACs and multiplies
Only 36% of power of DSP-processor solution going into arithmetic

Page 34

Reconfigurable Computing: Merging Efficiency and Versatility

Spatially programmed connection of processing elements.

“Hardware” customized to specifics of problem.

Direct map of problem specific dataflow, control.

Circuits “adapted” as problem requirements change.

Page 35

A New Look at Architectures: Heterogeneous Reconfiguration

Reconfigurable Logic (array of CLBs): bit-level operations, e.g. encoding

Reconfigurable Datapaths (adder, buffer, reg0, reg1, mux): dedicated data paths, e.g. filters, AGU

Reconfigurable Arithmetic (MAC, address generators, memories): arithmetic kernels, e.g. convolution

Reconfigurable Control (data memory, program memory, instruction decoder & controller, datapath): RTOS, process management

Page 36

Multi-granularity Reconfigurable Architecture: The Berkeley Pleiades Architecture

[Figure: a control processor and a set of satellite processors (arithmetic processors, configurable datapath, configurable logic, dedicated arithmetic) attached through network interfaces to a communication network, with a configuration bus carrying configuration from the control processor.]

• Computational kernels are “spawned” to satellite processors
• Control processor supports RTOS and reconfiguration
• Order(s) of magnitude energy-reduction over traditional programmable architectures

Page 37

Matching Computation and Architecture

[Figure: a convolution mapped onto two address generators, two memories, and two MACs under a control processor (further labels: L, CG).]

Two models of computation: communicating processes + data-flow

Two architectural models: sequential control + data-driven

Page 38

Example: Covariance Matrix Computation

for (i = 1; i <= length; i++) {
    for (k = i; k <= length; k++) {
        phi[i][k] = phi[i-1][k-1]
                  + in[NP-i] * in[NP-k]
                  - in[NA-1-i] * in[NA-1-k];
    }
}

[Figure: the loop mapped onto two address generators (AddrGen), the memories Mem:in and Mem:phi, a multiplier (MPY), and two ALUs.]

Page 39

Reconfigurable Kernels for W-CDMA

● Dominant kernel M(M^T X) requires array of MACs and segmented memories
● Additional operations such as sqrt(x), 1/x, and Trellis decoding may be implemented using FPGA or cordic satellite

Page 40

Data-driven Synchronization Based on Finite Streams

● “Smart” satellites able to handle data inputs of different types
● Support of multi-dimensional signal processing
● Introduction of data types: scalars, vectors, matrices

[Figure: MPY and MAC satellites consuming scalar (1) and vector (n) input streams.]

Page 41

Impact of Architectural Choice

Example: 16 point Complex Radix-2 FFT (Final Stage)

Processor      Normalized Energy/stage [nJ]   Normalized Delay/stage [s]   Normalized Energy*Delay/stage [Js*e-14]
StrongARM      1870                           21u                          3970
TMS320C2xx     131                            10u                          137
TMS320LC54x    49                             3.8u                         18.5
Pleiades       13                             570n                         0.75
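For reference, the kernel being compared is small. The final stage of a 16-point radix-2 decimation-in-time FFT is eight butterflies, each pairing element k with element k + 8, for a total of 32 real multiplies and 48 real additions. The sketch below shows that stage in C; it is a generic textbook formulation, not the code that was benchmarked, and the twiddle factors are passed in by the caller so the example needs no math library.

```c
#include <stddef.h>

typedef struct {
    double re;
    double im;
} cplx;

/* Radix-2 butterfly: (a, b) -> (a + w*b, a - w*b).
 * One complex multiply (4 real multiplies, 2 real adds) plus
 * two complex add/subtracts (4 real adds). */
void butterfly(cplx *a, cplx *b, cplx w) {
    cplx t = { w.re * b->re - w.im * b->im,
               w.re * b->im + w.im * b->re };
    b->re = a->re - t.re;
    b->im = a->im - t.im;
    a->re += t.re;
    a->im += t.im;
}

/* Final stage of a 16-point decimation-in-time FFT: eight butterflies
 * pairing x[k] with x[k + 8]. w[k] is the caller-supplied twiddle
 * factor exp(-j*2*pi*k/16). */
void fft16_final_stage(cplx x[16], const cplx w[8]) {
    for (size_t k = 0; k < 8; k++) {
        butterfly(&x[k], &x[k + 8], w[k]);
    }
}
```

The regular pairing and fixed twiddle table are what make this stage a natural fit for address generators feeding a multiplier and adders, as opposed to a sequential instruction stream.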

Page 42

Adaptive Multi-User Detector for W-CDMA: Pilot Correlator Unit Using LMS

[Figure: datapath split into a Filter section and a Coefficient Update section, built from AG, MEM, MUL, MAC, ACC, ADD, and SUB units (several marked “alt”), with signals s_r, s_i, y_r, y_i, Zmf_r, and Zmf_i.]

Page 43

Architecture Comparison

LMS Correlator at 1.67 MSymbols Data Rate
Complexity: 300 Mmult/sec and 357 Macc/sec

Note: TMS implementation requires 36 parallel processors to meet data rate - validity questionable

16 Mmacs/mW!
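At 1.67 MSymbols/sec, 300 Mmult/sec works out to roughly 180 real multiplications per symbol. To show where such a workload comes from, the sketch below gives one iteration of a generic textbook complex LMS filter: a filtering pass followed by a coefficient update. It is an assumption-laden illustration, not the correlator datapath from the previous slide; the names, the step size `mu`, and the convention y = sum of w[k] * x[k] are choices made for the example.

```c
#include <stddef.h>

typedef struct {
    double re;
    double im;
} cplx;

/* One complex LMS iteration over `taps` coefficients.
 * Filter:  y = sum_k w[k] * x[k]
 * Error:   e = d - y
 * Update:  w[k] += mu * e * conj(x[k])
 * Returns the error computed before the update. */
cplx lms_step(cplx *w, const cplx *x, size_t taps, cplx d, double mu) {
    cplx y = { 0.0, 0.0 };
    for (size_t k = 0; k < taps; k++) {
        y.re += w[k].re * x[k].re - w[k].im * x[k].im;
        y.im += w[k].re * x[k].im + w[k].im * x[k].re;
    }
    cplx e = { d.re - y.re, d.im - y.im };
    for (size_t k = 0; k < taps; k++) {
        w[k].re += mu * (e.re * x[k].re + e.im * x[k].im);
        w[k].im += mu * (e.im * x[k].re - e.re * x[k].im);
    }
    return e;
}
```

Each tap costs four real multiplies in the filter and four more plus the `mu` scaling in the update, so the multiplication rate grows linearly with both the number of taps and the symbol rate.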

Page 44

Maia: Reconfigurable Baseband Processor for Wireless

• 0.25um tech: 4.5mm x 6mm
• 1.2 Million transistors
• 40 MHz at 1V
• 1 mW VCELP voice coder
• Hardware
  • 1 ARM-8
  • 8 SRAMs & 8 AGPs
  • 2 MACs
  • 2 ALUs
  • 2 In-Ports and 2 Out-Ports
  • 14x8 FPGA

Page 45

Fast Design Space Exploration: Interconnect Models

[Figure: three interconnect styles. Multi-bus (N inputs, B buses, M outputs), mesh (modules on a grid), and hierarchical mesh (clusters of modules).]

Model:
• Interconnect energy and delay model
• Algorithm mapping
• Graph-based place and route

Page 46

Design Methodology and Flow

● Requires architecture exploration over heterogeneous implementation fabrics
● Should support refinement and co-design of hardware and software, as well as behavior and architecture
● Should consider all important metrics, and present PDA (Power-Delay-Area) perspective

Page 47

Software Methodology Flow

[Figure: flow from Algorithms (C++) through Kernel Detection, Estimation/Exploration, and Partitioning to Software Compilation and Reconfig. Hardware Mapping, followed by Interface Code Generation.]

Inputs and tools shown alongside the flow:
Behavioral C++ module libraries
SUIF + C-IF
Power & timing estimation of various kernel implementations
PDA models
Premapped kernels
µproc & accelerator

Page 48

Hardware-Software Exploration

[Figure: exploration tool view with a macromodel call highlighted.]

Page 49

Implementation Fabrics for Protocols

A protocol = Extended FSM

Example: Intercom TDMA MAC

[Figure: state machine with states idle, update, read slotset, write slotset, and RACH, and signals Slotstart, Pktend, RACHreq, RACHakn, W_ENA, and R_ENA. The datapath holds a buffered memory Slot_Set_Tbl (2x16) with addr, slot_set<31:0>, and Slot_no<5:0>.]
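The phrase "a protocol = extended FSM" means a finite set of control states plus data variables that the transitions read and write. The sketch below illustrates the idea in C using the state and signal names visible in the figure. The transitions themselves are hypothetical, since the figure's arcs are not recoverable from the transcript, and the interpretation of slot_set as one bit per slot is likewise an assumption made for the example.

```c
#include <stdint.h>

typedef enum { ST_IDLE, ST_READ_SLOTSET, ST_RACH, ST_WRITE_SLOTSET } mac_state;
typedef enum { EV_SLOTSTART, EV_PKTEND, EV_RACHREQ, EV_RACHAKN } mac_event;

/* Extended FSM: control state plus the data variables it manipulates. */
typedef struct {
    mac_state state;
    uint32_t  slot_set;  /* slot_set<31:0>; one bit per slot is assumed */
    uint8_t   slot_no;   /* Slot_no<5:0>; 6-bit slot counter */
} tdma_mac;

void mac_init(tdma_mac *m) {
    m->state = ST_IDLE;
    m->slot_set = 0;
    m->slot_no = 0;
}

/* Apply one event. The transition table is hypothetical; any
 * (state, event) pair not listed leaves the machine unchanged. */
void mac_step(tdma_mac *m, mac_event ev) {
    switch (m->state) {
    case ST_IDLE:
        if (ev == EV_SLOTSTART) {
            m->slot_no = (uint8_t)((m->slot_no + 1) & 0x3F);  /* wrap at 6 bits */
            m->state = ST_READ_SLOTSET;
        } else if (ev == EV_RACHREQ) {
            m->state = ST_RACH;
        }
        break;
    case ST_READ_SLOTSET:
        if (ev == EV_PKTEND) {
            m->state = ST_IDLE;
        }
        break;
    case ST_RACH:
        if (ev == EV_RACHAKN) {
            /* Record the granted slot in the slot set. */
            m->slot_set |= (uint32_t)1 << (m->slot_no & 0x1F);
            m->state = ST_WRITE_SLOTSET;
        }
        break;
    case ST_WRITE_SLOTSET:
        if (ev == EV_PKTEND) {
            m->state = ST_IDLE;
        }
        break;
    }
}
```

The same description can be compiled for a microprocessor, mapped to an FPGA, or synthesized into dedicated logic, which is what makes the choice of implementation fabric for protocols an open design question.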

Page 50

Intercom TDMA MAC: Implementation alternatives

● ASIC: 1 V, 0.25 µm CMOS process
● FPGA: 1.5 V, 0.25 µm CMOS low-energy FPGA
● ARM8: 1 V, 25 MHz processor; n = 13,000
● Ratio: 1 - 8 - >> 400

         ASIC         FPGA         ARM8
Power    0.26 mW      2.1 mW       114 mW
Energy   10.2 pJ/op   81.4 pJ/op   n*457 pJ/op
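The "1 - 8 - >> 400" ratio can be reproduced directly from the table. The short sketch below normalizes each fabric to the ASIC; the struct and function names are illustrative, and the ARM8 energy is evaluated as n * 457 pJ with n = 13,000 as given on the slide. Power comes out at roughly 1 : 8.1 : 438, and energy per operation at roughly 1 : 8.0 for the FPGA with the ARM8 several orders of magnitude higher.

```c
typedef struct {
    const char *name;
    double power_mw;
    double energy_pj_per_op;
} fabric;

/* Figures from the slide; index 0 (ASIC) is the baseline. */
static const fabric FABRICS[3] = {
    { "ASIC", 0.26,  10.2 },
    { "FPGA", 2.1,   81.4 },
    { "ARM8", 114.0, 13000.0 * 457.0 },  /* n * 457 pJ/op with n = 13,000 */
};

/* Power of fabric i relative to the ASIC. */
double power_ratio(int i) {
    return FABRICS[i].power_mw / FABRICS[0].power_mw;
}

/* Energy per operation of fabric i relative to the ASIC. */
double energy_ratio(int i) {
    return FABRICS[i].energy_pj_per_op / FABRICS[0].energy_pj_per_op;
}
```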

Page 51

The Software-Defined Radio

[Figure: heterogeneous chip combining a reconfigurable datapath, an FPGA, an embedded uP, a dedicated FSM, and a dedicated DSP.]

Page 52

Summary and Perspective

● Technology scaling is redefining the term “complexity”
● System-on-a-Chip fosters renaissance in processor architecture, opening the door for new models and combinations thereof: Component and Communication Based Design
● SOC driven by new set of metrics: how to simultaneously optimize flexibility, cost, energy, and performance?
● Reconfigurable architectures provide tantalizing combination of flexibility and efficiency
● Numerous solutions for addressing the data-intensive component of the software-defined radio; the next challenge is control