
Transcript of Embedded Computer Architecture - engbloms.se · »Interrupt latency ... 29 November 2002 Embedded...

Jakob Engblom, PhD
Uppsala University & Virtutech Inc.

[email protected]@[email protected]@virtutech.com

Embedded Systems Computer Architecture

29 November 2002 Embedded Computer Architecture 2

Embedded Systems

Embedded Systems

It is a snake!
No, a wall!
No, a pillar!
No, it is a tree trunk!
You’re all wrong, it is a fan!
Now what is this elephant thing?

Embedded Systems

“A computer that doesn’t look like a computer”
Interacts with world
Primitive or no user interface
Part of other products

Embedded Systems

Single purpose products
  Not general purpose like desktop PCs
  Do one thing very efficiently
Software very important:
  Gives character to product
  Used to differentiate inside a “platform”
  Can be changed late
  Processor cheaper than special HW
  Today, dominates dev cost

Processor Market

Embedded = most processors!
  200 million PC and server
  8000 million embedded
[Pie chart: “Desktop” 2%, “Embedded” 98%]

Processor Market

Processors: 50% of all semiconductor revenue
  Explains why everyone wants to do processors
32-bit dominant
  30% of total semiconductors
PC processors: 50% of CPU revenue
  15% of total semiconductors
  AMD and Intel share it
[Chart: share of units and share of money, 0% to 100%, split by processor type: 32-bit, 16-bit, 8-bit, 4-bit, DSP]

Real-Time System

Timing as important as result
Hard real-time:
  Hard deadlines
  Dead if missed deadline
  Worst-case
Soft real-time:
  Fuzzier deadlines
  Can miss some deadlines
  Average-case
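The hard/soft distinction can be sketched as two different acceptance checks over a set of response times. This is only an illustration: the function names, the sample values and the 25% tolerated miss ratio are invented here, not taken from the slides.

```python
def hard_deadline_met(response_times_ms, deadline_ms):
    """Hard real-time: the worst case must stay within the deadline."""
    return max(response_times_ms) <= deadline_ms


def soft_deadline_met(response_times_ms, deadline_ms, allowed_miss_ratio):
    """Soft real-time: a bounded fraction of deadline misses is tolerated."""
    misses = sum(1 for t in response_times_ms if t > deadline_ms)
    return misses / len(response_times_ms) <= allowed_miss_ratio


# Invented response times in milliseconds; one of the four misses a 5 ms deadline.
samples = [2.0, 2.5, 3.0, 9.0]
hard_ok = hard_deadline_met(samples, 5.0)
soft_ok = soft_deadline_met(samples, 5.0, 0.25)
print(hard_ok, soft_ok)
```

With these samples the hard check fails and the soft check passes. Note that measured samples can only refute a hard deadline; showing that it always holds needs a worst-case bound, which measurements alone do not give.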

Real-Time Systems

Embedded and Real-Time Synonymous?
  Most embedded systems are real-time
  Most real-time systems are embedded
[Venn diagram: “embedded” and “real-time” sets overlapping in “embedded real-time”]

Simple Embedded Systems

8-bit Hitachi H8/300, 32 kB ROM, 32 kB RAM
Standard microcontroller chip
Byte-code machine, sensor drivers, …
8-bit Intel 8051, standard microcontroller
Behavior, talk, IR communications

Fun App: Smart Beer Glass

8-bit, 8-pin PIC processor
Capacitive sensor for fluid level
Inductive coil for RF ID activation & power
CPU and reading coil in the table. Reports the level of fluid in the glass, alerts servers when close to empty
Contactless transmission of power and readings

No Upgrades Possible

Once a product ships…
…it often cannot be serviced
  No download ability
  No writable persistent storage
  No disks
  No loader
Software is write-once
(There are exceptions)

Consumer Electronics

Heterogeneous multiprocessor
  8-bit Atmel AVR for UI, games, …
  16-bit fixed-point TI C54 DSP for GSM coding, radio interface, …
  32-bit ARM7 in Bluetooth module
  + maybe ARM7 in IrDA interface
All in custom chips
Software is large:
  16 MB of code in control part
  Plus signal processing code

Automotive

Multiple networks
  CAN for body electronics: 30+ nodes
  CAN for engine control: few nodes
  LIN for instruments
Many processors
  Up to 100
Large diversity in processor types:
  8-bit CPUs (PIC, HC08) for door locks, lights, etc.
  16-bit CPUs (C167, HC11, HC12) for most functions
  32-bit CPUs (PPC, V850) for engine control, airbags
Total amount of code: 40-50 MB

Automotive

Form follows function
  Processing where the action is
  Architecture given by application
  Sensors and actuators distributed
Heterogeneous systems
  Many different makes of CPUs
  Standardized at the network/bus

Timing Aspects

Interrupt latency
  Important criterion for embedded
  A few clock cycles at most
  Measure of RTOS performance
Real-Time = predictability
  In-order pipelines
  SRAM instead of caches
  Lockable caches
  Several small CPUs instead of one big

Military Shipboard

Standard multiprocessor UltraSparc servers for radar, target tracking, combat control, …
Many CPUs in missiles, gun controls, engines, …

Mobile Phone Base Station

Handle signals
  Data streams to and from phones
Massively parallel system
  Thousands of DSP tasks
  Perfect parallel scalability
Custom or standard DSPs
  Up to 8 DSPs on a single chip

Embedded processing

Integration

A single chip:
  CPU core
  Integrated memory
  Integrated peripherals
  Integrated services
Goal:
  System on one chip
  No external HW
  Fit application “perfectly”
[Diagram: one chip holding CPU core, RAM (small), ROM (big), UART, A/D, timer and LCD blocks, connected to the outside world]

Processors: Wide Span

Bitwidths: 4 to 64 bits
  Most common: 8 bit (4G units)
  32-bit growing fastest
  32/64-bit outnumbers desktop
Frequency: DC to 2 GHz
Memory: from 0.5 kB to 5 MB
Power: mW (and up)
1/30 to 10 instructions per cycle

Devices on the Chip

Interface with the world
  Digital I/O
  Analog/Digital conversion
  Digital/Analog conversion
Communications
  CAN networks
  Ethernet networks
  Radio
  Serial ports (UART, USART)
  USB, FireWire, ...

Devices on the Chip

Timers
  Trigger interrupts
  Watchdogs
Graphics
  LCD drivers
  2D/3D graphics acceleration
Buses
  On-chip, between devices: AMBA, …
  Off-chip: PCI, HyperTransport, RapidIO, …

Trends

Market
  32-bit market is growing fast
  DSPs are growing fast
Technology
  Configurable processors
  Configurable logic as help
  Fusion of DSP and microcontrollers
  More complex architectures
  Higher integration on each chip
  Multiprocessors on-a-chip

Trends

Hardware to software
  Increase flexibility, lower cost
  Software on fast processor can equal HW
Software to hardware
  Better power consumption & performance
  Design custom hardware for application
Hardware-software codesign
  Delay division HW/SW to late in project
  Obtain “optimal” HW/SW division

Control vs Data

Control plane:
  Microcontrollers
  Decision-making
  “Integer applications”
  UI of a phone, packet routing, …
Data plane:
  Move or process data
  Performance is key
  Signal processing, multimedia, …
  Floating/fixed point

System-on-a-chip

Integration extreme
Thanks to modern semiconductors
Entire product on a chip
  One or more processors, accelerators, …
[Diagram: DSP, LCD driver, CPU, Bluetooth, GSM radio, code memory and data memory blocks on an on-chip bus]

Packaging

Packaging of Processing

Microprocessor
  Standard stand-alone processor chip
Microcontroller
  Processor plus devices
ASIP
  Application-Specific Integrated Processor
ASIC
  Application-Specific Integrated Circuit
FPGA
  Field-Programmable Gate Array

Microcontrollers

Classic embedded hardware
Standard parts
  Quite broad application domains
  Sold in large series
  Defined by hardware vendors
  As cheap as a single dollar
Single processor + devices
Huge number of variants
Usually intended for control plane

Example: PIC 12CE674

Memory arch:     Harvard
Program memory:  2048 x 14 (OTP/Flash)
EEPROM:          16 bytes
RAM:             128 bytes
ADC channels:    4 (8 bits)
I/O ports:       6
Timers:          One 8-bit, one WDT
Clock:           on-chip crystal, 10 MHz
Package:         8 pins (Pentium 4: 700 pins)
Cost:            <$1.00 (Pentium 4: >$200.00)

Example: AT91M42800A

ARM7TDMI 32-bit core
  Static design: 0 to 33 MHz
Memory
  8 kB SRAM on chip
  External memory interface, 8/16 bit interface
Devices
  6 timers
  2 serial ports
JTAG debug interface
About 0.5 W power
About 18 USD
144-pin package
One of 13 AT91 variants

ASIPs / ASSPs

Application-specific integrated/standard processor
Targeting a particular niche market
  More targeted than microcontroller
  Domain-specific accelerators
Usually more upscale
  32-bit processors
  Multiprocessors
  Expensive peripherals
  External memory assumed
  Higher performance, includes data-plane

Example: PowerQUICC III

Motorola
Target market
  Communications
Processing
  PowerPC e500
  666-1000 MHz
  256 kB L2 cache
Networking
  CPM module, RISC-based microcode
About 160 USD

Features
  2    Ethernet 10/100/1000 controller
  1    RapidIO controller
  1    PCI-X/PCI controller
  11   DDR Memory controller
  1    I2C controller
  1    Serial Peripheral Interface (SPI)
  2    Serial Management Controller (SMC)
  2    Multi-Channel Controller (MCC2)
  3    Fast Communications Controller (FCC)
  4    Serial Communications Controller (SCC)
Capabilities
  256  Multichannel HDLC (from MCC2)
  2    Utopia II ATM (from FCC)
  2    Ethernet 10/100/1000
  3    Ethernet, 10/100 (from FCC)
  4    Ethernet, 10 (from SCC)

Example: C167CS

Infineon
Target market
  Automotive control
Processing
  16-bit C16x core
  4-stage simple pipeline
  40 MHz operation
  16 MB memory space, including ROM, RAM, devices
144-pin package
Tolerates -40 C to +125 C
About 25 USD

Memory
  32 kB  ROM
  3 kB   Fast General Internal RAM (IRAM)
  8 kB   Extension Internal RAM (XRAM)
Devices
  2      CAN 2.0b controllers
  5      General-Purpose Timers (GPT)
  1      Watch-Dog Timer (WDT)
  1      Pulse-Width Modulator (PWM)
  24+8   Analog-Digital Converter Channels
  1      USART
  2x16   Capture/Compare Channels
  1      Synchronous Serial Comms (SSC)
External Ports
  2      CAN interfaces
  8      8-bit ports from devices
  1      16-bit ports from devices

Example: TI OMAP 5910

Texas Instruments
Target market
  Data-intense real-time
  Audio, biometrics, etc.
Processing
  Dual-core chip
  ARM925T 150 MHz
  TI C55 DSP 150 MHz
Power 230 mW
Price 32 USD
Devices
  USB 1.1
  LCD controller
  MMC/SD card interface
  Camera interface
  Keyboard interface
  RTC
  I2C
  8 serial ports
  3 UARTs
  14 GPIO pins
[Block diagram: C55x DSP core (24k I$, 64k data SRAM, 96k instr SRAM) and ARM925 CPU core (16k I$, 8k D$, MMU), with 192k shared SRAM, memory controller (75 MHz) and LCD controller; devices grouped as ARM shared, ARM private, system, DSP shared and DSP private]

ASICs

Application-specific integrated circuit
Fully custom hardware
  Custom for your application
  As small or large as necessary
Characteristics
  Expensive to develop
    10s of engineers, often 100s
  Large series necessary to pay off
    At least 100 000 units necessary on average
  Mostly for large companies
  Typically, they become SoCs

ASIC Components: “IP”

IP Blocks
  Intellectual Property
  Companies sell pieces of hardware
Examples:
  CPU cores
  Memory
  Buses
  Network interfaces
  Accelerator circuits
[Diagram: chip built from DSP, LCD driver, CPU, Bluetooth, GSM radio, code memory and data memory blocks]

CPU Cores

The biggest “IP” business
Biggest players:
  ARM (best-selling 32-bit architecture)
  MIPS (and its licensees)
Crowded field
  New companies appear monthly
  “Fabless” semiconductor companies
  Tuned for a particular application

Hard vs Soft IP

Hard IP:
  Customer buys a core as black box
  Examples: ARM & MIPS
  Gives good performance
  Hides trade secrets
Soft IP:
  Get HDL code for the component
  Examples: ARC & Tensilica
  Integrate with own or other logic
  Loses some performance

IP Core: ARM 926EJ-S

Core “macrocell”
  CPU core, caches, bus interface, MMU as a package
Instruction sets:
  Von Neumann architecture
  32-bit ARM v5TE ISA
  16-bit THUMB ISA
  Java bytecodes via Jazelle
Processing power:
  Five stage pipeline, scaling to 180-270 MHz
  8 kB icache and 8 kB dcache
Power: 0.2 to 0.9 mW/MHz (P4: >35 mW/MHz)
MMU: for Symbian, Windows CE, Linux

IP Core: MIPS 24k

Macrocell like the ARM926
  Processor, cache, memory interface, MMU, TLBs
Instruction sets:
  MIPS16e
  MIPS32
  User extensions possible, via “CorExtend”
Performance:
  8-stage scalar pipeline, up to 550 MHz
  Configurable cache, up to 64 kB L1 I$ & D$
  Dynamic branch prediction
Aimed at multiprocessor SoCs
  Cache coherency protocol standard
Almost equivalent to a 1990’s server processor

Example: Ericsson Bluetooth

ARM for protocol stack
Memory for the code
Special hardware for RF parts
USB and serial connections
Market
  Aiming at huge volumes
  Component in mobile phones etc.

Producing Your ASIC

Old way: “In-house”
  Build your own fab (everyone did!)
New way: “Silicon Foundries”
  Fabs are getting very expensive
  Specialized fab companies
  Sell manufacturing capacity
  Examples: TSMC, UMC, IBM, TI
  Customers: Nvidia, ATI, Sun, Cisco
= Rise of “fabless” companies

Full-Custom Systems

Volumes are high enough
Needs are special enough
In-house processor design
Examples:
  Ericsson APZ (now defunct)
  Cisco Toaster3 network proc (NPU)
  Ericsson FlexASIC DSP

Cisco Toaster3

8 clusters of 2 processors each
Each TMC is a VLIW machine with 74 bit instructions, 2k instructions in local memory
Two 32-bit ALUs and three control/data movement units per TMC
Total capacity: about 5 GOps, at around 160 MHz
Image from Microprocessor Report, Oct 2002

Cisco Toaster3

Massive multiprocessing
  16 cores on a chip
  4 chips in serial
Routing:
  10 Gbps @ 20 Mpackets/s
  1000 ops per packet passing through
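The Toaster3 figures quoted on the slides hang together as simple arithmetic. The sketch below redoes that arithmetic under one assumption that the slides do not state: each of the two ALUs in a TMC completes one operation per clock cycle. The constant names are invented for the illustration.

```python
# Figures taken from the slides.
CORES_PER_CHIP = 16          # 8 clusters of 2 processors
ALUS_PER_CORE = 2            # two 32-bit ALUs per TMC
CLOCK_HZ = 160_000_000       # "around 160 MHz"
CHIPS_IN_SERIES = 4
PACKETS_PER_SECOND = 20_000_000
LINE_RATE_BITS_PER_SECOND = 10_000_000_000

# Assumption: one operation per ALU per cycle.
ops_per_chip = CORES_PER_CHIP * ALUS_PER_CORE * CLOCK_HZ

# Every packet passes through all four chips.
ops_per_packet = CHIPS_IN_SERIES * ops_per_chip / PACKETS_PER_SECOND

# Average packet size implied by the line rate and packet rate.
bits_per_packet = LINE_RATE_BITS_PER_SECOND / PACKETS_PER_SECOND

print(ops_per_chip, ops_per_packet, bits_per_packet)
```

Under that assumption one chip delivers 5.12 GOps, which matches "about 5 GOps", and four chips give 1024 operations per packet, which matches "1000 ops per packet". The implied average packet is 500 bits, or 62.5 bytes.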

FPGA

Field Programmable Gate Array
Reconfigurable hardware: “soft logic”
  “Program” is circuit layout
  Can be changed after initial load
Kilos to Megs of “gates” available
Competitor to ASICs
  More expensive per unit, but no start-up cost for manufacturing
  Less flexible, slightly slower
  Perfect for low-volume products

FPGA Architecture

Computation cells
  Programmable function
    Adder, logic funcs, ...
    Memory, registers, ...
Input/Output cells
Interconnect
  Reconfigurable
  Programmable

FPGA Architecture

Computation cells
  Look-Up Table
    Arbitrary 4-input, 1-output function
  Coarse-grained
    Lots of functionality
    Several LUTs
    Plus flip-flops etc.
  Fine-grained
    Little functionality
[Diagram: config RAM feeding a LUT]
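A 4-input look-up table can be modelled as a 16-bit configuration word: the four inputs form an index, and the bit at that index is the output. The sketch below is a software model only; `make_lut4` and `lut4_eval` are names invented here and do not correspond to any vendor tool or API.

```python
def make_lut4(func):
    """Build the 16-bit configuration word for a 4-input, 1-output function."""
    config = 0
    for index in range(16):
        # Unpack the table index into the four input bits a, b, c, d.
        a, b, c, d = ((index >> i) & 1 for i in range(4))
        if func(a, b, c, d):
            config |= 1 << index
    return config


def lut4_eval(config, a, b, c, d):
    """Evaluate the LUT: the inputs select one bit of the configuration."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (config >> index) & 1


# Two example configurations: 4-input AND and 4-input parity (XOR).
and4 = make_lut4(lambda a, b, c, d: a & b & c & d)
xor4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
print(hex(and4), hex(xor4))
```

Changing the function means rewriting only the 16 configuration bits, which is what "arbitrary 4-input, 1-output function" amounts to.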

FPGA with CPU Cores

CPU on-board FPGA
  HW accelerate critical tasks in FPGA fabric
  Data pumps in FPGA
  Control in CPU
Cool new possibilities
  Reconfigure FPGA online
  Adapt to workloads

Soft CPUs in FPGAs

Processor in the FPGA fabric
  “Soft” processor
  Special design considerations
Examples
  Altera Nios
  Xilinx Microblaze
  Research projects
    Västerås ARM clone
    Leon processor also prototyped

Examples

Altera Apex 20kC: “Volume”
  30k to 1.5M gates
Xilinx Virtex II: “High-end”
  1-4 PPC405 cores (optional)
  10M gates
  Price at about $1000
Altera Stratix: “Advanced”
  10 Mbit RAM
  28 DSP elements
  100000 LE
  1300 user I/O pins
  Optimized for Nios
ATMEL FPSLIC: “Low-end”
  AVR 8-bit CPU
  50k gates

Instruction Sets

IS Architectures

New life for old architectures
  Z80, 6502, 8051, PIC, …, 68000-ColdFire
New career for failed desktops
  MIPS, PowerPC
Fresh architectures
  AVR, dsPIC, V850, SH, …
Digital signal processing
  C5xxx, BlackFin, MSA, 56000, Oak, ...

Instruction Sets

Code size important
  Variable instruction length
    Common instructions short
    Short and long branches
    RISC machines with 16-64 bit instructions
  Limited immediate operand sizes
  Two-operand rather than three-operand
Compact and powerful instructions
  Push/pop multiple
  Switch

Instruction Sets

Special-purpose instructions
  Digital Signal Processing
  Bit-manipulation
    Set bit in memory, test bit in memory
  Several memory accesses per instruction
Application-specific
  Fuzzy logic support (68HC12)
  Table interpolation (68300)
  Or even designed by customers!
Do useful things = powerful

Instruction Sets

Compressed instruction sets
  ARM/Thumb & MIPS16
  16-bit encoding of (parts of) 32-bit instruction sets
  Performs better on narrow buses
Limitations in ARM/Thumb:
  Only access to 8 registers
  No system operations
  No multiply-accumulate
  No general conditional execution

Instruction Sets: Code Size

Some data on code size (second row per program: size relative to ARM):

            Thumb      ARM      386     8088    68020    SPARC
eqntott     10608    16768    17640    19106    20542    22256
             0.63     1.00     1.05     1.14     1.23     1.33
xlisp       26388    40768    28097    29401    46746    44648
             0.65     1.00     0.69     0.72     1.15     1.10
espresso    72596   109923   125686   137194   131854   142752
             0.66     1.00     1.14     1.25     1.20     1.30

Source: Microprocessor Report, March 1995
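The relative figures in the code size data are each program's size divided by its ARM size. A short sketch that recomputes them from the absolute numbers quoted above; the dictionary layout and the function name `relative_to_arm` are choices made for this illustration.

```python
# Absolute code sizes as quoted from Microprocessor Report, March 1995.
CODE_SIZES = {
    "eqntott": {"Thumb": 10608, "ARM": 16768, "386": 17640,
                "8088": 19106, "68020": 20542, "SPARC": 22256},
    "xlisp": {"Thumb": 26388, "ARM": 40768, "386": 28097,
              "8088": 29401, "68020": 46746, "SPARC": 44648},
    "espresso": {"Thumb": 72596, "ARM": 109923, "386": 125686,
                 "8088": 137194, "68020": 131854, "SPARC": 142752},
}


def relative_to_arm(sizes):
    """Return each size divided by the ARM size, rounded to two decimals."""
    arm = sizes["ARM"]
    return {isa: round(size / arm, 2) for isa, size in sizes.items()}


ratios = {program: relative_to_arm(sizes) for program, sizes in CODE_SIZES.items()}
for program, row in ratios.items():
    print(program, row)
```

Across the three programs Thumb comes out at roughly two thirds of the ARM size.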

Instruction Sets: Code Size

ARM Thumb: fixed 16-bit size
  Saves 28% compared to 32-bit ARM
  Runs 20% slower than 32-bit ARM
ARM Thumb 2: mixed 16/32
  Saves 26% compared to 32-bit ARM
  Runs 2% slower than 32-bit ARM
  (Note that some new instructions are introduced)
Conclusion: mixed length good!
Source: Microprocessor Report, June 2003

Instruction Sets: Code Size

Compiler makes a difference

            Compiler
Program        A        B        C        D
1           4316     4929     4974     5214
2          16826    18176    26705    15968
3           1632     2594     3450     3244
4           5514    13804    22694   15000+

Source: IAR Internal Benchmarking

Instruction Sets: SIMD

Many applications see gains from SIMD/Vector computation
Add SIMD to regular ISA
  Motorola Altivec
  ARM SIMD extensions
  MIPS have it too
  x86 MMX-SSE-SSE2-3Dnow!
  SPARC VIS
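The idea behind SIMD extensions is one instruction operating on several small data items packed into a wide register. The sketch below imitates that in plain integer arithmetic: four unsigned 8-bit lanes in a 32-bit word are added lane by lane without carries leaking between lanes. It is a software emulation for illustration, not the behaviour of any specific instruction in the extensions listed; the function name is invented.

```python
LOW7_MASK = 0x7F7F7F7F   # low seven bits of every 8-bit lane
HIGH_MASK = 0x80808080   # top bit of every 8-bit lane


def packed_add_u8x4(x, y):
    """Add four unsigned 8-bit lanes packed in 32-bit words, wrapping per lane."""
    # Add the low seven bits of each lane; carries stop at the lane's top bit.
    low_sum = (x & LOW7_MASK) + (y & LOW7_MASK)
    # Fold in the top bits with XOR so no carry crosses into the next lane.
    return (low_sum ^ ((x ^ y) & HIGH_MASK)) & 0xFFFFFFFF


# Lanes 0x01, 0xFF, 0x7F, 0x10 plus lanes 0x01, 0x01, 0x01, 0x10.
result = packed_add_u8x4(0x01FF7F10, 0x01010110)
print(hex(result))
```

Here the second lane wraps from 0xFF to 0x00 without disturbing its neighbour, which is the property a hardware SIMD add provides in a single instruction.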

Instruction Sets: SIMD

Target
  Motorola PPC 7455 (G4+)
  1 GHz
EEMBC Telemark suite
Networking suite
OOTB: Out-of-the-box
OPT: Manually tuned to use Altivec
Overall/Average: 3-4 times speed up can be expected
[Bar chart: OOTB vs OPT, axis 0 to 10 with one bar labelled 35,1, for Autocorr 1, Convolution 1, Bit alloc 1, FFT 1, Viterbi 1, OSPF 1, Route 1, Packet 512]

Instruction Sets: DSP

Pure DSPs
  Not additions to regular ISAs

Very specialized for DSP work
  Known & narrow class of problems
  Optimize for particular algorithms

Categories
  VLIW vs. regular
  Fixed vs. floating point
  Stationary vs. mobile

Instruction Sets: DSP

TI C64xx
  Fixed-point, 8-way VLIW
  700-1000 MHz, "Fastest DSP"
  Stationary applications

TI C55xx
  Single pipeline, complex instructions
  Up to approx. 300 MHz
  Mobile phones

Instruction Sets: DSP

Assume very regular workloads
  Zero-overhead loop instructions
  Built to wade through large data sets

Register sets
  Accumulators (often 40 bits)
  Data registers (often 16 bits)
  Address registers (16 to 32 bits)

Addressing modes
  Index registers
  Post & preincrement
  Bit-reverse addressing
  Goal: more parallelizable work per instruction
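Bit-reverse addressing is the least self-explanatory of the modes above. The following is a minimal software sketch of the access order that such an addressing mode produces in hardware, as used when reordering FFT data. The function names are illustrative and not taken from any DSP tool chain; the buffer length is assumed to be a power of two.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Return `index` with its lowest `bits` bits in reversed order."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the lowest bit of index into result
        index >>= 1
    return result


def bit_reversed_order(n: int) -> list[int]:
    """Order in which an n-element buffer is visited (n assumed to be a power of two)."""
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]


print(bit_reversed_order(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

A DSP performs this index transformation in the address generation unit, so the reordering costs no extra instructions.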

Instruction Sets: DSP

Example instructions from C55:

"Finite impulse response filter"
  FIRSADD Xmem, Ymem, Cmem, ACx, ACy
  Operation:
    ACy = ACy + (ACx * Cmem)
    ACx = (Xmem << #16) + (Ymem << #16)

"Conditional add or sub"
  ADDSUBCC Smem, ACx, TCx, ACy
  Operation:
    If TCx = 1, then ACy = ACx + (Smem << #16)
    If TCx = 0, then ACy = ACx - (Smem << #16)

Notes:
  Cmem, Xmem, Ymem: memory accesses + address updating
  C55 DSP has three independent data buses, X, Y, and C
  Special condition register
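To make the amount of work per instruction concrete, here is a rough Python model of the two operations exactly as written on the slide. It is a sketch under simplifying assumptions: plain unbounded integers stand in for the accumulators (no 40-bit width, saturation, or rounding), the memory operands are passed in as already-fetched values, and the address updating is not modeled. The function names simply mirror the mnemonics.

```python
def firsadd(acx: int, acy: int, xmem: int, ymem: int, cmem: int) -> tuple[int, int]:
    """Model of FIRSADD: one multiply-accumulate and one add in a single instruction.

    Returns the new (ACx, ACy). ACy is computed from the old ACx.
    """
    new_acy = acy + acx * cmem               # ACy = ACy + (ACx * Cmem)
    new_acx = (xmem << 16) + (ymem << 16)    # ACx = (Xmem << #16) + (Ymem << #16)
    return new_acx, new_acy


def addsubcc(acx: int, smem: int, tcx: int) -> int:
    """Model of ADDSUBCC: add or subtract depending on the condition bit TCx."""
    shifted = smem << 16
    return acx + shifted if tcx == 1 else acx - shifted


print(firsadd(acx=2, acy=10, xmem=1, ymem=3, cmem=5))  # (262144, 20)
```

Each call stands for one instruction, which on the real hardware also performs three memory reads and the pointer updates.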

Instruction Sets: Configure

Configurable instruction sets
  Adapt to needs of application
  User can specialize the processor
  Less waste on generality
  Fast evolution of instruction sets

Traditionally:
  Chip manufacturers determine instruction sets aimed at some niche
  Slow evolution of instruction sets

Instruction Sets: Configure

Subsetting
  There is a limited and predefined set of instructions available
  Easy to compile for: restrict code gen
  Remove instructions to simplify core

Addition
  Freedom to invent instructions
  Tool chain: assembly, C compilers
  Genuine development of ISAs

Configurable Instruction Sets

Tight integration:
  Add to regular pipeline
  Additional functional units
  Adding fine-grained instructions

Loose integration:
  Coprocessor interface
  Slower communication
  Offloading of macro-scale tasks
  Method to invoke accelerator circuits

Configurability Trend

Pioneers
  Tensilica Xtensa
  ARC ARCtangent
  Configurability as key selling point

Added to general architectures
  MIPS: "CorExtend"
  PowerPC: "BookE ASU"
  Usually less tight integration

Benefit of Configurability

Target
  Xtensa III, 200 MHz

EEMBC Telemark suite, Networking suite

OOTB: Out-of-the-box
  25k gate core

OPT: Tuned code
  25k base core gates
  18k extra instr gates
  100k DSP coproc
  37k config gates

Speedups

Benchmark          OOTB   OPT
Telemark overall   1      37
Autocorr           1      9
Convolution        1      1249
Bit alloc          1      34
FFT                1      24
Viterbi GSM        1      14
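The gate counts and the overall speedup can be set against each other with a little arithmetic. The sketch below assumes that the four OPT figures are all gate counts and simply add up to the size of the tuned configuration; the slide does not state this explicitly.

```python
# Gate counts in thousands, as listed for the two configurations.
ootb_kgates = 25
opt_kgates = {
    "base core": 25,
    "extra instructions": 18,
    "DSP coprocessor": 100,
    "configuration": 37,
}

total_opt_kgates = sum(opt_kgates.values())   # assumption: the parts are additive
area_ratio = total_opt_kgates / ootb_kgates   # how much larger the tuned core is
overall_speedup = 37                          # "Telemark overall" row
speedup_per_area = overall_speedup / area_ratio

print(total_opt_kgates, area_ratio, round(speedup_per_area, 2))
```

Under that assumption the tuned core is roughly seven times larger while the overall score improves 37 times, so the extra gates pay for themselves about five times over.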

Configuration Tools

[Screenshot of a configuration tool. Callouts: instruction set choices; gate and memory size counters]

Memory Systems

Microcontroller Memory

RAM:
  Small (32 bytes and up)
  Stacks, variables, loaded code

ROM (or FLASH):
  Large (2 kB and up)
  Programs, constant data

Non-volatile memory
  Settings, writable code areas

Typical Memory Types

[Block diagram. Blocks shown: CPU core, Icache, Dcache, TCM (two), ROM / FLASH, RAM, EEPROM, special memory (like CAM), L2 cache, external FLASH, external RAM]

Caches

Used on high-end parts
  32-bit, 64-bit, DSPs

Not like desktop caches
  Smaller, usually only single level
  Often high associativity (128 ways)
    Due to locking

Lockable
  Sets or lines can be locked in cache
  Improve predictability

[Diagram blocks: Icache, Dcache, L2 cache]

Tightly-Coupled Memory

Used on high-end parts
  Holds data and/or instructions

Instead of/in addition to caches
  Programmer-controlled
  Fast & close like caches
  In memory map, or tagged like caches

Multiple banks
  Better bandwidth
  Work in one, DMA data to/from other

Special memories:
  Content-addressable memory (CAM)
  Needed for particular applications

[Diagram blocks: TCM, TCM, CAM]
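"Work in one, DMA data to/from other" describes a double-buffering (ping-pong) scheme. The sketch below simulates it sequentially in Python: the list copy stands in for a DMA transfer, and summing a block stands in for the real computation. On hardware the two steps would overlap in time. All names are illustrative.

```python
def process_stream(blocks: list[list[int]]) -> list[int]:
    """Process blocks using two banks: compute on one while the other is refilled."""
    banks = [None, None]
    results = []
    if not blocks:
        return results

    banks[0] = list(blocks[0])                   # initial "DMA" fill of bank 0
    for i in range(len(blocks)):
        work = i % 2                             # bank the CPU works in this round
        other = 1 - work                         # bank the DMA engine may touch
        if i + 1 < len(blocks):
            banks[other] = list(blocks[i + 1])   # "DMA" the next block into the idle bank
        results.append(sum(banks[work]))         # CPU computes on the active bank
    return results


print(process_stream([[1, 2], [3, 4], [5]]))  # [3, 7, 5]
```

The banks swap roles on every round, so the processor never waits for data as long as the transfer finishes before the computation does.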

On-Chip RAM

High-end parts
  Usually cached
  Faster & cheaper than off-chip memory

Low-end parts
  Only data memory available
  Special "zero-page" memory

Zero-page on 8-bit MCUs
  Small memory with single-cycle access
  Usually 8-bit index, contains 256 bytes
  Small, fast instructions access the memory
  Often usable as extension to register set

[Diagram block: RAM]

FLASH/ROM

Code storage on-chip

FLASH:
  Speed like regular RAM
  Rewritable, typically 1000 times or more

ROM:
  Must be put in silicon masks
  Longer turn-around time
  Guaranteed not to change

FLASH is replacing ROMs, fast

[Diagram blocks: ROM / FLASH, external FLASH]

EEPROM

"Electrically-Erasable Programmable Read-Only Memory"

Only writable persistent memory until FLASH appeared
  Rewritable far more times than FLASH (endurance is high but finite)

Store persistent data
  User settings
  Encryption keys
  Phone numbers etc.

Being replaced with FLASH
  Faster & easier to write
  Cheaper to manufacture, higher capacity

[Diagram block: EEPROM]

Memory Architecture

Narrow buses
  Saves silicon area, power, complexity
  Especially to off-chip memory
  (Not true for some high-performance parts)

Small registers = small pointers
  16-bit register can only hold 16 bits
  Extend addressing using tricks
    Banks (separate bank register)
    Segments (base + offset)
    Memory remapping (virtual memory)
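The first two tricks amount to simple address arithmetic. The sketch below shows how a 16-bit pointer can reach beyond 64 kB. The segment formula is the one used by 8086 real mode. The bank layout (an 8-bit bank register supplying the upper address bits) is an illustrative assumption, since actual layouts differ from chip to chip.

```python
def segmented_address(segment: int, offset: int) -> int:
    """Base + offset as in 8086 real mode: 16-bit segment shifted left 4 bits, plus 16-bit offset."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def banked_address(bank: int, offset: int) -> int:
    """Hypothetical layout: an 8-bit bank register forms the bits above a 16-bit pointer."""
    return ((bank & 0xFF) << 16) | (offset & 0xFFFF)


print(hex(segmented_address(0x1234, 0x0010)))  # 0x12350
print(hex(banked_address(2, 0x8000)))          # 0x28000
```

In both cases the pointer stored in a register stays 16 bits wide, and the extra bits live in a separate register that the compiler or programmer has to manage.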

Hierarchy of Pointers

Tiny: 8 bits (= on-chip zero-page)
Near: 16 bits
Far: 24 bits

8-bit/16-bit: design for small pointers

Visible to programmer
  Data placement
  Pointer types
  Linker & compiler

Memory Architecture

Harvard
  Two or more address spaces
    Program
    Data
  Accompanied by physical separation
  Sometimes even more divided

NULL pointer implementation?
  All addresses are valid...

Memory Architecture

Example: ATMEL AVR 8-bit MCU

[Diagram: data space (Tiny, Near, and Far pointers; registers, I/O, 256 B RAM) and code space (Near and Far pointers)]

Read constants?
  Different instructions!
  Very slow process
  Hard to compile for
  Copy to RAM?

Banked Memory

Extend addressing beyond N bits
  Like 8086/80286 segments

Concept:
  Separate memories
  Mapped to same set of addresses
  One memory at a time accessible
  Easier for code than for data

Selecting banks:
  Write value in bank-switch register
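The concept can be sketched as a small simulation: several separate memories share one address range, and a bank-switch register decides which one an access reaches. The class and method names are made up for illustration and do not correspond to any particular MCU.

```python
class BankedMemory:
    """Several memories mapped to the same addresses; only the selected bank is visible."""

    def __init__(self, num_banks: int, bank_size: int):
        self.banks = [bytearray(bank_size) for _ in range(num_banks)]
        self.bank_register = 0                 # models the bank-switch register

    def select_bank(self, bank: int) -> None:
        self.bank_register = bank              # "write value in bank-switch register"

    def write(self, address: int, value: int) -> None:
        self.banks[self.bank_register][address] = value

    def read(self, address: int) -> int:
        return self.banks[self.bank_register][address]


mem = BankedMemory(num_banks=4, bank_size=256)
mem.select_bank(0)
mem.write(0x10, 0xAA)
mem.select_bank(1)
mem.write(0x10, 0x55)          # same address, different physical memory
print(hex(mem.read(0x10)))     # 0x55
mem.select_bank(0)
print(hex(mem.read(0x10)))     # 0xaa
```

The extra `select_bank` step before every access to a different bank is the cost that makes banked data awkward for compilers, while code banks only need switching on calls and returns.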

Banked Memory

Example: Microchip PIC family

[Diagram: data memory in Bank 0, Bank 1, Bank 2, Bank 3, plus code memory. Pointer types shown: _bank0, _bank1, _bank2, _bank3, __bank, __constptr, with widths of 8 bits and 16 bits. Callouts: "Hardware pointer = efficient"; "Synthetic pointer to any data bank"]

Power Aspects

Why is Power Important?

Battery-powered applications
  Longer battery life is very desirable

Automotive applications
  Electronics consumes up to 30% of fuel!

Low-maintenance applications
  Power = heat = cooling = moving parts

Server farms
  Cooling & electricity are big costs

CMOS Power

Power = area × clock × voltage²
  Area: transistors that are switching
  Clock: speed of switching
  Voltage: to keep running

Save power by minimizing
  Clock speed
  Active area
  Feed voltage
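The proportionality can be turned into a small calculator for relative power. This is a sketch of the slide's formula only: all three factors are expressed relative to a baseline of 1.0, and effects the formula leaves out (such as leakage) are ignored.

```python
def relative_power(area: float = 1.0, clock: float = 1.0, voltage: float = 1.0) -> float:
    """Dynamic power relative to a baseline, following power = area * clock * voltage^2."""
    return area * clock * voltage ** 2


print(relative_power(clock=0.5))     # half the clock at unchanged voltage
print(relative_power(voltage=0.9))   # 10% lower feed voltage
print(relative_power(area=0.5))      # half of the circuit switched off
```

Because voltage enters squared, a modest reduction in feed voltage saves more than the same relative reduction in clock speed or active area.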

Area Reduction

Simpler chips use less power
  8-bit CPUs
  Simple RISC like ARM
  VLIW instead of superscalar

Turn off inactive units
  Pipelines that are not used
  On-chip memory and caches
  Sleep/Nap/Idle modes on CPU

Remove unnecessary features

Clock Speeds

Clock and voltage related
  Higher operating frequency requires higher voltage

Use lower clock speeds
  Reduce speed until app barely works

Use more processors
  1/2 speed = 1/4 power
  2 CPUs @ 100 MHz = 1 CPU @ 200 MHz, but requires half the power

Dynamic Voltage Scaling

Adjust CPU speed to workload
  Reduce operating voltage when clock speed is reduced
  Cubic power savings possible!
  Analyze load to determine speed
  More advanced than sleep modes

Special hardware required:
  Transmeta Crusoe was a pioneer
  Intel XScale, getting common
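The "cubic" claim follows from power = area × clock × voltage² when voltage is lowered in step with the clock. The sketch below works through that idealized case and compares it with a sleep mode. It assumes that voltage can scale in direct proportion to clock speed and that a sleeping CPU draws no power at all, neither of which holds exactly on real hardware.

```python
def dvs_power(speed: float) -> float:
    """Relative power when clock and voltage are both scaled by `speed` (idealized)."""
    return speed * speed ** 2            # clock * voltage^2


def dvs_task_energy(speed: float) -> float:
    """Relative energy for a fixed amount of work: power times the longer run time."""
    return dvs_power(speed) * (1.0 / speed)


def sleep_task_energy(busy_fraction: float) -> float:
    """Run at full speed and power, then sleep (assumed to cost nothing) for the rest."""
    return 1.0 * busy_fraction


print(dvs_power(0.5))           # 0.125: one eighth of the power at half speed
print(dvs_task_energy(0.5))     # 0.25: the work takes twice as long
print(sleep_task_energy(0.5))   # 0.5: full speed for half the time, then sleep
```

Under these assumptions, stretching the work over the whole period at half speed uses half the energy of racing through it and sleeping, which is why voltage scaling is the more advanced technique.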

Power, Voltage, Frequency

[Chart: Voltage (mV) and Power (mW); axis ticks 0 to 1800 and 400 to 1200, both in steps of 200]

Samsung Halla
  ARM 1020E core
  6-stage pipeline (!)
  0.13 µm process
  Clock: 400 MHz to 1.2 GHz

(Source: Microprocessor Report, Oct 16, 2002)

3x clock freq, 9x power!
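The measured ratio can be read as an effective scaling exponent: if power grows as clock^k, then k = log(power ratio) / log(clock ratio). The helper below is a sketch of that arithmetic using only the two ratios quoted on the slide; the function name is illustrative.

```python
import math


def scaling_exponent(clock_ratio: float, power_ratio: float) -> float:
    """Exponent k such that power_ratio = clock_ratio ** k."""
    return math.log(power_ratio) / math.log(clock_ratio)


k = scaling_exponent(clock_ratio=3, power_ratio=9)
print(round(k, 3))
```

For this chip the exponent comes out at about 2, which matches the "1/2 speed = 1/4 power" rule of thumb rather than the ideal cubic case.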

Other Factors

Manufacturing process:
  Smaller features = lower power
    (0.13 micron mobile PIII, for example)
  Tweak process for lower power

Development effort:
  Tweak the low-level chip layout
    (Classic StrongARM)
  "Assembly language programming"
  Cannot be synthesized efficiently
  = Not possible for IP blocks

System Issue

Much more than CPU
  Displays, LEDs, …
  Memory, Disks, …
  Radio interfaces, Networks, …

Turn off unused peripherals
  GSM phones: 300 hours standby vs. 60 minutes talk time
  Ericsson: reduce frequency of LED blink

Memory & Power

Large area = high power:
  Use smallest possible memory

Talking to DRAM is expensive
  Use on-chip SRAM/ROM
  Reduce external memory activity
  Use caches to keep activity internal

Use energy-efficient RAMs
  Low-power DRAM is coming!
  RAMBUS is horrible

Memory & Power

Energy for a LOAD instruction on an ARM7 development board

Code memory   Data memory   Energy (nJ)   Ratio
off-chip      off-chip      115.8         100%
off-chip      on-chip       51.6          44.6%
on-chip       off-chip      76.5          66.1%
on-chip       on-chip       16.4          14.2%

Source: Compilation Techniques for Energy-, Code-Size-, and Run-Time Efficient Embedded Software (Marwedel et al 2001)

Future Issues: Static

Dynamic Power
  Dissipated when circuits active
  Discussion so far
  Dominant down to 0.13µ

Static Power
  "Leakage current"
  Becoming significant at < 0.13µ
  Much harder to reduce

Major problem looming!

Closing Remarks

This is where the action is!

Fragmented market
  No dominant big player like PCs
  Incredibly wide span of products

Tailor-made, not mass-produced
  Everybody searches for perfect fit

High innovation in comp arch

Large number of new players

Abbreviations

DSP    Digital Signal Processor
NPU    Network Processing Unit
MCU    Microcontroller Unit
ASIC   Application-Specific Integrated Circuit
FPGA   Field-Programmable Gate Array