Advances in Designing Clockless Digital Systems
Prof. Steven M. Nowick
[email protected]
Department of Computer Science
Columbia University
New York, NY, USA
#2
Introduction
Synchronous vs. Asynchronous Systems?
Synchronous Systems: use a global clock
  entire system operates at fixed-rate
  uses “centralized control”
[Figure label: clock]
#3
Introduction (cont.)
Synchronous vs. Asynchronous Systems? (cont.)
Asynchronous Systems: no global clock
  components can operate at varying rates
  communicate locally via “handshaking”
  uses “distributed control”
[Figure label: “handshaking interfaces” (channels)]
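The slide does not name a particular handshaking protocol. As a minimal sketch, assuming a four-phase (return-to-zero) protocol on a single channel, the exchange between a sender and a receiver could look like the following. The `Channel` class and the `transfer` function are illustrative names, not part of the original material.

```python
# Minimal sketch of a four-phase (return-to-zero) handshake on one channel.
# Assumption: the slide names no protocol; four-phase is one common choice.


class Channel:
    """A req/ack wire pair plus a data bundle (all names are illustrative)."""

    def __init__(self):
        self.req = 0
        self.ack = 0
        self.data = None


def transfer(channel, value, log):
    """Move one value across the channel, recording each wire event in order."""
    channel.data = value      # sender places data on the bundle first
    channel.req = 1           # 1. sender raises req: "data is valid"
    log.append("req+")
    received = channel.data   # receiver captures the data after seeing req
    channel.ack = 1           # 2. receiver raises ack: "data taken"
    log.append("ack+")
    channel.req = 0           # 3. sender lowers req
    log.append("req-")
    channel.ack = 0           # 4. receiver lowers ack; the channel is idle again
    log.append("ack-")
    return received


log = []
ch = Channel()
items = [transfer(ch, v, log) for v in (7, 42)]
print(items)
print(log[:4])
```

No clock appears anywhere in the exchange: each side advances only when it observes the other side's event, which is what lets the two components run at different rates.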
#4
Trends and Challenges
Trends in Chip Design: next decade
  “Semiconductor Industry Association (SIA) Roadmap” (97-8)
Unprecedented Challenges:
  complexity and scale (= size of systems)
  clock speeds
  power management
  reusability & scalability
  “time-to-market”
Design becoming unmanageable using a centralized single clock (synchronous) approach….
#5
Trends and Challenges (cont.)
1. Clock Rate:
  1980: several MegaHertz
  2001: ~750 MegaHertz - 1+ GigaHertz
  2005: several GigaHertz
Design Challenge:
  “clock skew”: clock must be near-simultaneous across entire chip
#6
Trends and Challenges (cont.)
2. Chip Size and Density:
Total #Transistors per Chip: 60-80% increase/year
  ~1970: 4 thousand (Intel 4004 microprocessor)
  today: 50-200+ million
  2006 and beyond: towards 1 billion+
Design Challenges:
  system complexity, design time, clock distribution
  clock will require 10-20 cycles to reach across chip
#7
Trends and Challenges (cont.)
3. Power Consumption
Low power: ever-increasing demand
  consumer electronics: battery-powered
  high-end processors: avoid expensive fans, packaging
Design Challenge:
  clock inherently consumes power continuously
  “power-down” techniques: complex, only partly effective
#8
Trends and Challenges (cont.)
4. Time-to-Market, Design Re-Use, Scalability
Increasing pressure for faster “time-to-market”. Need:
  reusable components: “plug-and-play” design
  flexible interfacing: under varied conditions, voltage scaling
  scalable design: easy system upgrades
Design Challenge: mismatch w/ central fixed-rate clock
#9
Trends and Challenges (cont.)
5. Future Trends: “Mixed Timing” Domains
Chips themselves becoming distributed systems….
  contain many sub-regions, operating at different speeds
Design Challenge: breakdown of single centralized clock control
#10
Asynchronous Design: Potential Advantages
Several Potential Advantages:
Lower Power
  no clock ⇒ components use power only “on demand”
Robustness, Scalability
  no global timing ⇒ “mix-and-match” variable-speed components
  composable/modular design style ⇒ “object-oriented”
Higher Performance
  systems not limited to “worst-case” clock rate
#11
Asynchronous Design: Some Recent Developments
1. Philips Semiconductors:
  commercial use: 100 million async chips for consumer electronics:
    pagers, cell phones, smart cards, digital passports, automotive
  3-4x lower power, less electromagnetic interference (“EMI”)
2. Intel:
  experimental: Pentium instruction-length decoder = “RAPPID” (1990’s)
  3-4x faster than synchronous subsystem
3. Sun Labs:
  commercial use: high-speed FIFO’s in recent “Ultra’s” (memory access)
4. IBM Research:
  experimental: high-speed pipelines, filters, mixed-timing systems
Recent Startups: Fulcrum, Theseus Logic, Handshake Solutions, Silistrix
#12
Asynchronous CAD Tools: Recent Developments
DARPA’s “CLASS” Program: Clockless Initiative (2003-07)
Goals:
  - CAD tool: produce viable commercial-grade async tool flow
  - demonstration: a complex Boeing ASIC chip
Participants:
  Lead (PI): Boeing
  Industrial participants:
    Philips (via async incubated startup, “Handshake Solutions”)
    Theseus Logic, Codetronix
  Academic participants:
    Columbia, UNC, UW, Yale, OSU
Targets: cover wide “design space”, from very robust to high-speed circuits
Columbia’s role: (i) high-speed pipelines, (ii) CAD optimizations
#13
Asynchronous Design: Challenges
Critical Design Issues:
  components must communicate cleanly: ‘hazard-free’ design
  highly-concurrent designs: much harder to verify!
Lack of Automated “Computer-Aided Design” Tools:
  most commercial “CAD” tools targeted to synchronous
#14
What Are CAD Tools?
Software programs to aid digital designers = “computer-aided design” tools
  automatically synthesize and optimize digital circuits
[Figure: Input: desired circuit specification → CAD TOOL → Output: optimized circuit implementation]
#15
Asynchronous Design Challenge
Lack of Existing Asynchronous Design Tools:
  Most commercial “CAD” tools targeted to synchronous
  Synchronous CAD tools: major drivers of growth in microelectronics industry
Asynchronous “chicken-and-egg” problem:
  few CAD tools ⇔ less commercial use of async design
  especially lacking: tools for designing/optimizing large systems
#16
Overview: My Research Areas
CAD Tools for Asynchronous Controllers (FSM’s)
  “MINIMALIST” Package: for synthesis + optimization
Other Research Areas:
  CAD Tools for Designing Large-Scale Async Systems
  Mixed-Timing Interface Circuits:
    for interfacing sync/async systems
  High-Speed Asynchronous Pipelines
#17
CAD Tools for Async Controllers
MINIMALIST: developed at Columbia University [1994-]
  extensible CAD package for synthesis of asynchronous controllers
  integrates synthesis, optimization and verification tools
  used in 80+ sites/17+ countries (being taught in IIT Bombay)
  URL: http://www.cs.columbia.edu/async
Includes several optimization tools:
  State Minimization
  CHASM: optimal state encoding
  2-Level Hazard-Free Logic Minimization
  Verilog back-end
Key goal: facilitate design-space exploration
#18
Example: “PE-SEND-IFC” (HP Labs)
Inputs: req-send, treq, rd-iq, adbld-out, ack-pkt
Outputs: tack, peack, adbld
[Figure: state graph with states 0-10. Arc labels (input changes / output changes), in extraction order:
  req-send+ treq+ rd-iq+ / adbld+
  adbld-out+ / peack+
  rd-iq- / peack- adbld-
  tack+
  adbld-out- treq- rd-iq+ / adbld+
  adbld-out+ / peack+
  rd-iq- / peack- adbld- tack-
  adbld-out- treq+ ack-pkt+ / peack+ tack+
  ack-pkt- treq- / peack- tack-
  treq- / tack-
  treq+ / tack+
  ack-pkt+ / peack- tack-
  adbld-out- treq- ack-pkt+ / peack+
  req-send- / --
  adbld-out- treq+ rd-iq+ / adbld+]
From HP Labs “Mayfly” Project:
B. Coates, A. Davis, K. Stevens, “The Post Office Experience: Designing a Large Asynchronous Chip”, INTEGRATION: the VLSI Journal, vol. 15:3, pp. 341-66 (Oct. 1993)
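The arc labels in this example read as “input changes / output changes”. As a rough sketch, assuming the usual burst-mode reading (an arc fires only after every input edge in its burst has arrived, in any order), one arc can be modeled as below. The arc label is taken from the slide; the state numbers 0 and 1 attached to it, and the names `parse_burst` and `BurstModeArc`, are assumptions made for illustration, since the extracted figure does not show which states each arc connects.

```python
# Sketch of one burst-mode arc: "input burst / output burst".
# Assumption: source/destination states 0 and 1 are chosen for illustration only.


def parse_burst(text):
    """Turn 'req-send+ treq+' into {'req-send': 1, 'treq': 1} (+ rising, - falling)."""
    return {edge[:-1]: 1 if edge[-1] == "+" else 0 for edge in text.split()}


class BurstModeArc:
    def __init__(self, src, dst, label):
        inputs, outputs = label.split("/")
        self.src, self.dst = src, dst
        self.input_burst = parse_burst(inputs)
        self.output_burst = parse_burst(outputs)

    def enabled(self, state, wires):
        """True only once every edge in the input burst has arrived."""
        return state == self.src and all(
            wires[name] == level for name, level in self.input_burst.items()
        )


arc = BurstModeArc(0, 1, "req-send+ treq+ rd-iq+ / adbld+")
wires = {"req-send": 0, "treq": 0, "rd-iq": 0}
state = 0
fired_after = None
for name in ("treq", "req-send", "rd-iq"):  # input edges may arrive in any order
    wires[name] = 1
    if arc.enabled(state, wires):
        state = arc.dst
        fired_after = name
print(state, fired_after, arc.output_burst)
```

The arc stays disabled while only part of the input burst has arrived and fires on the last edge, at which point the output burst (here `adbld+`) would be generated.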
#19
EXAMPLE (cont.):
Examples: Design-Space Exploration using MINIMALIST:
  optimizing for area vs. speed
#20
CAD Tools for Large-Scale Asynchronous Systems
Input Specification: = “Control Data-flow Graph”
[Figure labels: Start; Loop; C<0; B:=2dx+dx; C:=X<a; M:=U*X1; X:=X+dx; C:=X<a; Endloop; End]
Target Architecture:
[Figure labels: Register; Register; Functional Unit (x3); control unit with Ctrlr 1, Ctrlr 2, Ctrlr 3]
Target:
  - synthesize distributed control
  - 1 controller per functional unit
[Theobald/Nowick, IEEE Design Automation Conf. (2001)]
#21
Mixed-Timing Interfaces
[Figure labels: Asynchronous Domain; Synchronous Domain 1; Synchronous Domain 2]
Goal: provide low-latency communication between “timing domains”
Challenge: avoid synchronization errors
#22
Mixed-Timing Interfaces: Solution
[Figure labels: Asynchronous Domain; Synchronous Domain 1; Synchronous Domain 2; Async-Sync FIFO; Sync-Async FIFO; Mixed-Clock FIFO’s]
Solution: insert mixed-timing FIFO’s ⇒ provide safe data transfer
… developed complete family of mixed-timing interface circuits
[Chelcea/Nowick, IEEE Design Automation Conf. (2001)]
#23
High-Speed Asynchronous Pipelines
NON-PIPELINED COMPUTATION: “datapath component” = adder, multiplier, etc.
[Figure labels: global clock; SYNCHRONOUS]
#24
High-Speed Asynchronous Pipelines
“PIPELINED COMPUTATION”: like an assembly line
[Figure labels: global clock, SYNCHRONOUS; no global clock, ASYNCHRONOUS]
#25
High-Speed Asynchronous Pipelines
Goal: extremely fast async datapath components
  speed: comparable to fastest existing synchronous designs
  additional benefits:
    dynamically adapt to variable-speed interfaces: voltage scaling!
    “elastic” processing of data in pipeline
    no clock distribution
Contributions: 3 new async pipeline styles [SINGH/NOWICK]
  MOUSETRAP: static logic
  High-Capacity/Lookahead: dynamic logic
Obtain multi-GigaHertz speeds
Used by IBM, currently incorporated into Philips tool flow
#26
MOUSETRAP: A Basic FIFO (no computation)
Stages communicate using transition-signaling:
[Figure labels: Stage N-1, Stage N, Stage N+1; Data Latch (Data in, Data out, En); Latch Controller; signals reqN, ackN-1, doneN, reqN+1, ackN]
[Singh/Nowick, IEEE Int. Conf. on Computer Design (2001)]
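The slide only names the signals, so the following is a behavioral sketch under stated assumptions rather than the circuit itself. It assumes the latch controller computes En = XNOR(doneN, ackN), that doneN also serves as reqN+1 and ackN-1, and that all wires start at 0 with every latch transparent. The class name `MousetrapFifo` and its methods are hypothetical.

```python
# Behavioral sketch of a transition-signaled FIFO in the MOUSETRAP style.
# Assumptions: En = XNOR(done_N, ack_N); done_N doubles as req_{N+1} and ack_{N-1}.


class MousetrapFifo:
    """Each stage is a transparent data latch plus a small latch controller."""

    def __init__(self, n_stages):
        self.done = [0] * n_stages     # done_N per stage
        self.data = [None] * n_stages  # contents of each data latch
        self.req_in = 0                # req driven by the left environment
        self.data_in = None
        self.ack_out = 0               # ack driven by the right environment

    def enabled(self, i):
        """Latch i is open when done_N and ack_N match (XNOR)."""
        ack = self.done[i + 1] if i + 1 < len(self.done) else self.ack_out
        return self.done[i] == ack

    def settle(self):
        """Let events ripple through the stages until no latch changes."""
        changed = True
        while changed:
            changed = False
            for i in range(len(self.done)):
                req = self.done[i - 1] if i > 0 else self.req_in
                if self.enabled(i) and self.done[i] != req:
                    self.data[i] = self.data[i - 1] if i > 0 else self.data_in
                    self.done[i] = req  # capture; latch stays shut until stage N+1 acks
                    changed = True

    def send(self, value):
        if self.req_in != self.done[0]:
            raise RuntimeError("previous item not yet acknowledged")
        self.data_in = value
        self.req_in ^= 1               # one req transition per data item
        self.settle()

    def receive(self):
        if self.done[-1] == self.ack_out:
            raise RuntimeError("FIFO is empty")
        value = self.data[-1]
        self.ack_out ^= 1              # one ack transition per data item
        self.settle()
        return value


fifo = MousetrapFifo(3)
for item in ("a", "b", "c"):
    fifo.send(item)
received = [fifo.receive() for _ in range(3)]
print(received)
```

In this model an item entering an empty FIFO ripples through all open latches to the last stage, and each latch shuts behind it until the next stage acknowledges, which is the behavior the name suggests.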
#27
“MOUSETRAP” Pipeline: w/computation
[Figure labels: Stage N-1, Stage N, Stage N+1; logic and delay in each stage; Data Latch; Latch Controller; signals reqN, ackN-1, doneN, reqN+1, ackN]
Function Blocks: use “synchronous” single-rail circuits (not hazard-free!)
“Bundled Data” Requirement:
  each “req” must arrive after data inputs valid and stable
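The bundled-data requirement amounts to a one-sided timing inequality: the delay on the “req” path must exceed the worst-case delay through the stage's logic. A minimal sketch of that check follows; the function names, the optional margin, and all delay values are hypothetical and chosen only to illustrate the comparison.

```python
# Sketch of a bundled-data timing check for one pipeline stage.
# Assumption: all delay figures below are made-up picoseconds, not measured data.


def required_req_delay(logic_path_delays_ps, margin_ps=0):
    """The req delay must exceed the slowest logic path, plus any safety margin."""
    return max(logic_path_delays_ps) + margin_ps


def bundling_holds(req_delay_ps, logic_path_delays_ps, margin_ps=0):
    """True when req arrives strictly after the data inputs are valid and stable."""
    return req_delay_ps > required_req_delay(logic_path_delays_ps, margin_ps)


paths = [310, 450, 275]              # hypothetical delays of three logic paths
print(bundling_holds(500, paths))    # req delay longer than the slowest path
print(bundling_holds(400, paths))    # req would overtake the 450 ps path
```

Because only the worst-case path has to be covered, the function blocks themselves can be ordinary single-rail logic and need not be hazard-free, as the slide notes.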
#28
#29
MOUSETRAP: A Basic FIFO
Stages communicate using transition-signaling:
  1 transition per data item!
[Figure labels: Stage N-1, Stage N, Stage N+1; Data Latch (Data in, Data out, En); Latch Controller; signals reqN, ackN-1, doneN, reqN+1, ackN; One Data Item]