Date posted: 22-Dec-2015
1
Bridging the gap between asynchronous design and designers
Peter A. Beerel, Fulcrum Microsystems, Calabasas Hills, CA, USA
Jordi Cortadella, Universitat Politècnica de Catalunya, Barcelona, Spain
Alex Kondratyev, Cadence Berkeley Labs, Berkeley, CA, USA
2
Outline
1. Basic concepts on asynchronous circuit design
Tea Break
2. Logic synthesis from concurrent specifications
3. Synchronization of complex systems
Lunch
4. Design automation for asynchronous circuits
Tea Break
5. Industrial experiences
3
Basic concepts on asynchronous circuit design
4
Outline
What is an asynchronous circuit?
Asynchronous communication
Asynchronous design styles (Micropipelines)
Asynchronous logic building blocks
Control specification and implementation
Delay models and classes of async circuits
Channel-based design
Why asynchronous circuits?
5
Synchronous circuit
[Figure: registers (R) alternating with combinational logic blocks (CL); all registers are driven by a common clock CLK]
• Implicit (global) synchronization between blocks
• Clock period > Max Delay (CL + R)
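The clock-period constraint above can be worked through numerically. This is a minimal sketch; the function name `min_clock_period` and the stage delays are hypothetical values chosen for illustration, not figures from the slides.

```python
def min_clock_period(cl_delays, register_overhead):
    """Lower bound on the clock period: slowest CL stage plus register overhead."""
    return max(cl_delays) + register_overhead

# Hypothetical per-stage CL delays in ns; the real period must exceed this bound.
bound = min_clock_period([2.0, 3.5, 1.2], register_overhead=0.5)
print(bound)
```

Every stage pays for the slowest one, which is the worst-case synchronization that the asynchronous style tries to avoid.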
6
Asynchronous circuit
[Figure: registers (R) alternating with combinational logic blocks (CL); neighbouring stages are connected by Req and Ack signals]
• Explicit (local) synchronization: Req / Ack handshakes
7
Motivation for asynchronous
• Asynchronous design is often unavoidable: asynchronous interfaces, arbiters, etc.
• Modern clocking is multi-phase and distributed, and virtually "asynchronous" (cf. GALS, next slide):
  • Mesochronous (clock travels together with data)
  • Local (possibly stretchable) clock generation
• Robust asynchronous design flows are coming (e.g. VLSI programming from Philips, Balsa from Univ. of Manchester, NCL from Theseus Logic, ...)
8
Globally Async Locally Sync (GALS)
[Figure: a clocked domain (registers R and combinational logic CL driven by a local CLK) inside an async-to-sync wrapper; the wrapper talks to the asynchronous world through the handshake pairs Req1/Ack1, Req2/Ack2, Req3/Ack3 and Req4/Ack4]
9
Key Design Differences
Synchronous logic design:
• Proceeds without taking timing correctness (hazards, signal ack-ing, etc.) into account
• Combinational logic and memory latches (registers) are built separately
• Static timing analysis of CL is sufficient to determine the Max Delay (clock period)
• Fixed set-up and hold conditions for latches
10
Key Design Differences
Asynchronous logic design:
• Must ensure hazard-freedom, signal ack-ing and local timing constraints
• Combinational logic and memory latches (registers) are often mixed in "complex gates"
• Dynamic timing analysis of logic is needed to determine relative delays between paths
• To avoid complex issues, circuits may be built as delay-insensitive and/or speed-independent (as discussed later)
11
Verification and Testing Differences
Synchronous logic verification and testing:
• Only the functional correctness aspect is verified and tested
• Testing can be done with standard ATE and at low speed (but high speed may be required for DSM)
Asynchronous logic verification and testing:
• In addition to functional correctness, the temporal aspect is crucial: e.g. causality and order, deadlock-freedom
• Testing must cover faults in complex gates (logic + memory) and must proceed at normal operation rate
• Delay fault testing may be needed
12
Synchronous communication
• Clock edges determine the time instants at which data must be sampled
• Data wires may glitch between clock edges (set-up/hold times must be satisfied)
• Data are transmitted at a fixed rate (clock frequency)
[Figure: clocked waveform for the bit sequence 1 1 0 0 1 0]
13
Dual rail
• Two wires with L (low) and H (high) per bit
  • "LL" = spacer, "LH" = "0", "HL" = "1"
• n-bit data communication requires 2n wires
• Each bit is self-timed
• Other delay-insensitive codes exist (e.g. k-of-n), as does event-based signalling (choice criteria: pin and power efficiency)
[Figure: dual-rail waveforms for the bit sequence 1 1 0 0 1 0]
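The dual-rail code on this slide can be sketched as a small encoder and decoder. The function names are illustrative, and treating "HH" as an error is an assumption consistent with it not being a code word.

```python
SPACER = "LL"
ENCODE = {0: "LH", 1: "HL"}      # code words from the slide
DECODE = {"LH": 0, "HL": 1}

def encode_word(bits):
    """Encode n bits onto n wire pairs (2n wires)."""
    return [ENCODE[b] for b in bits]

def decode_word(pairs):
    """Return the bits once every pair is valid, or None while any pair is a spacer."""
    if any(p == "HH" for p in pairs):
        raise ValueError("HH is not a dual-rail code word")
    if any(p == SPACER for p in pairs):
        return None              # word not complete yet
    return [DECODE[p] for p in pairs]

word = encode_word([1, 1, 0, 0, 1, 0])
print(word, decode_word(word))
```

Because validity is carried by the data wires themselves, the receiver needs no separate clock or request wire to know when a word has arrived.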
14
Bundled data
• Validity signal
  • Similar to an aperiodic local clock
• n-bit data communication requires n+1 wires
• Data wires may glitch when not valid
• Signaling protocols
  • level sensitive (latch)
  • transition sensitive (register): 2-phase / 4-phase
[Figure: bundled-data waveform for the bit sequence 1 1 0 0 1 0]
15
Example: memory read cycle
Transition signaling, 4-phase
[Figure: timing diagram of the Address and Data buses with their validity signals A (valid address) and D (valid data)]
16
Example: memory read cycle
Transition signaling, 2-phase
[Figure: timing diagram of the Address and Data buses with their validity signals A (valid address) and D (valid data)]
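The difference between the two protocols shows up when counting transitions per read. The sketch below assumes that A acts as the request and D as the acknowledge, in the order A, D; the timing diagrams did not survive extraction, so this ordering is an assumption.

```python
def four_phase(transfers, req="A", ack="D"):
    """Return-to-zero signaling: four transitions per transfer."""
    events = []
    for _ in range(transfers):
        events += [req + "+", ack + "+", req + "-", ack + "-"]
    return events

def two_phase(transfers, req="A", ack="D"):
    """Non-return-to-zero signaling: every transition is an event, two per transfer."""
    events, level = [], 0
    for _ in range(transfers):
        level ^= 1
        sign = "+" if level else "-"
        events += [req + sign, ack + sign]
    return events

print(four_phase(1))   # one read cycle
print(two_phase(2))    # two read cycles
```

The same four transitions carry one read in the 4-phase protocol and two reads in the 2-phase protocol.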
17
Asynchronous modules
[Figure: a module with a DATAPATH (Data IN to Data OUT) and a CONTROL block (req in, ack in, req out, ack out); the two are linked by the signals start and done]
Signaling protocol:
reqin+ start+ [computation] done+ reqout+ ackout+ ackin+
reqin- start- [reset] done- reqout- ackout- ackin-
(more concurrency is also possible)
18
Asynchronous latches: C element
[Figure: C-element symbol with inputs A, B and output Z]
A B | Z+
0 0 | 0
0 1 | Z
1 0 | Z
1 1 | 1
[Figure: static logic implementation, a transistor network between Vdd and Gnd with inputs A, B and output Z [van Berkel 91]]
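The next-state table of the C element translates directly into a behavioural model. This is a sketch at the logic level only; the class name is illustrative and no delays are modelled.

```python
class CElement:
    """Output follows the inputs when they agree and holds its value otherwise."""

    def __init__(self, z=0):
        self.z = z

    def update(self, a, b):
        if a == b:          # rows 00 and 11 of the table
            self.z = a
        return self.z       # rows 01 and 10 keep the previous Z

c = CElement()
print([c.update(a, b) for a, b in [(0, 1), (1, 1), (1, 0), (0, 0)]])
```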
19
C-element: Other implementations
[Figure: two transistor-level C elements between Vdd and Gnd with inputs A, B and output Z; labels: Weak inverter, Quasi-Static, Dynamic]
20
Dual-rail logic
[Figure: dual-rail AND gate with inputs A.t, A.f, B.t, B.f and outputs C.t, C.f]
Valid behavior for monotonic environment
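One common gate-level realization of a dual-rail AND is sketched below. The gate drawing on the slide is not recoverable, so the equations C.t = A.t AND B.t and C.f = A.f OR B.f are an assumption about what it shows.

```python
SPACER, ZERO, ONE = (0, 0), (0, 1), (1, 0)   # (t, f) rail pairs

def dual_rail_and(a, b):
    """C.t = A.t AND B.t, C.f = A.f OR B.f."""
    (at, af), (bt, bf) = a, b
    return (at & bt, af | bf)

for a in (ZERO, ONE):
    for b in (ZERO, ONE):
        print(a, b, dual_rail_and(a, b))
```

With these equations the output can become a valid "0" while one input is still a spacer. The result is only meaningful if the environment changes inputs monotonically, from spacer to valid and back, which is the condition the slide states.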
21
Completion detection
[Figure: a dual-rail logic block whose outputs feed a completion detection tree; a C element at the root produces done]
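A behavioural sketch of completion detection follows. It assumes that each bit's validity is the OR of its two rails and that the tree behaves like a single n-input C element; the class name is illustrative.

```python
class CompletionDetector:
    """done rises when every bit is valid and falls when every bit is a spacer."""

    def __init__(self):
        self.done = 0

    def update(self, word):
        valid = [t | f for t, f in word]   # per-bit validity: OR of the two rails
        if all(valid):
            self.done = 1
        elif not any(valid):
            self.done = 0
        return self.done                   # mixed words hold the previous value

cd = CompletionDetector()
print(cd.update([(1, 0), (0, 0)]), cd.update([(1, 0), (0, 1)]))
```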
22
Differential cascode voltage switch logic
[Figure: 3-input AND/NAND gate: an N-type transistor network with inputs A.t, B.t, C.t, A.f, B.f, C.f and outputs Z.t, Z.f, controlled by start and producing done]
23
Examples of dual-rail design
• Asynchronous dual-rail ripple-carry adder (A. Martin, 1991)
  • Critical delay is proportional to log N (N = number of bits)
  • 32-bit adder delay (1.6 µm MOSIS CMOS): 11 ns versus 40 ns for synchronous
  • Async cell transistor count = 34 versus synchronous = 28
• More recent success stories (modularity and automatic synthesis) of dual-rail logic from Null-Convention Logic (Theseus Logic)
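One way to read the log N figure is as an average-case property: with completion detection, a ripple-carry adder is only as slow as the longest carry chain that actually occurs for the given operands. The sketch below measures that chain length; the helper names, trial count and seed are arbitrary choices, and this is not a model of Martin's circuit.

```python
import random

def longest_carry_chain(a, b, n):
    """Longest run of bit positions through which a single carry ripples."""
    carry = run = longest = 0
    for i in range(n):
        x, y = (a >> i) & 1, (b >> i) & 1
        if x & y:                  # generate: a new carry starts here
            run = 1
        elif (x ^ y) and carry:    # propagate: the incoming carry ripples on
            run += 1
        else:                      # kill, or nothing to propagate
            run = 0
        carry = (x & y) | ((x ^ y) & carry)
        longest = max(longest, run)
    return longest

def average_chain(n, trials=2000, seed=0):
    """Mean longest carry chain over random n-bit operand pairs."""
    rng = random.Random(seed)
    total = sum(longest_carry_chain(rng.getrandbits(n), rng.getrandbits(n), n)
                for _ in range(trials))
    return total / trials

print(longest_carry_chain(2**32 - 1, 1, 32))   # worst case: carry ripples through all bits
print(average_chain(32))                       # typical case is far shorter
```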
24
Bundled-data logic blocks
[Figure: a single-rail logic block alongside a delay element that takes start and produces done]
Conventional logic + matched delay
25
Micropipelines (Sutherland 89)
[Figure: micropipeline (2-phase) control blocks: Join (C element), Merge, Toggle (in, out0, out1), Select (in, sel, outt, outf), Call (r1, r2, ra, a1, a2) and Request-Grant-Done (RGD) Arbiter (r1, r2, g1, g2, d1, d2)]
26
Micropipelines (Sutherland 89)
[Figure: pipeline of latches (L) separated by logic blocks; the control is a chain of C elements with delay elements, with Rin/Aout on the input side and Rout/Ain on the output side]
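The control chain of a micropipeline can be sketched as a row of C elements, each fed by its predecessor and by the inverted output of its successor. The model below ignores the delay elements and the latches, and the function names are illustrative; it only shows how 2-phase events advance and queue up.

```python
def c_element(a, b, z):
    """C element: follow the inputs when they agree, otherwise hold z."""
    return a if a == b else z

def settle(stages, rin, ain):
    """Re-evaluate every C element until no output changes."""
    stages = list(stages)
    changed = True
    while changed:
        changed = False
        for i in range(len(stages)):
            prev = rin if i == 0 else stages[i - 1]
            nxt = ain if i == len(stages) - 1 else stages[i + 1]
            new = c_element(prev, 1 - nxt, stages[i])
            if new != stages[i]:
                stages[i], changed = new, True
    return stages

s = settle([0, 0, 0], rin=1, ain=0)   # first event travels to the output
s = settle(s, rin=0, ain=0)           # second event queues behind it
print(s)
```

In this encoding a stage holds an event when its output differs from that of its successor, so the second event stops one stage short of the output until the first one is acknowledged on Ain.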
27
Data-path / Control
[Figure: pipeline of latches (L) separated by logic blocks, with a single CONTROL block between Rin/Aout and Rout/Ain]
28
Control specification
[Figure: signal transition graph with events A+, B+, A-, B- and waveforms for A and B; A is an input, B is an output]
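29
Control specification
[Figure: signal transition graph with events A+, B-, A-, B+ and a circuit with signals A and B]
30
Control specification
[Figure: signal transition graph with events A+, C-, A-, C+, B+, B- and a circuit with signals A, B and C]
31
Control specification
[Figure: signal transition graph with events A+, C-, A-, C+, B+, B- and a circuit with signals A, B and C]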
32
Control specification
[Figure: FIFO controller (FIFO cntrl) with ports Ri, Ao, Ro, Ai, implemented with a C element; its signal transition graph has the events Ri+, Ao+, Ri-, Ao-, Ro+, Ai+, Ro-, Ai-]
33
A simple filter: specification
y := 0;
loop
  x := READ (IN);
  WRITE (OUT, (x+y)/2);
  y := x;
end loop
[Figure: filter block with input channel IN (Rin, Ain) and output channel OUT (Rout, Aout)]
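The filter specification can be run directly as a generator. This is a behavioural sketch: READ and WRITE are replaced by iteration and `yield`, and `/` is taken as real division, which is an assumption since the slide does not say whether the division is integer.

```python
def simple_filter(samples):
    """Emit the average of each sample and its predecessor (0 before the first)."""
    y = 0
    for x in samples:        # x := READ(IN)
        yield (x + y) / 2    # WRITE(OUT, (x+y)/2)
        y = x

print(list(simple_filter([2, 4, 6])))
```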
34
A simple filter: block diagram
[Figure: latches x and y and an adder (+) between IN and OUT; a control block with the handshake signals Rin/Ain, Rout/Aout, Rx/Ax, Ry/Ay and Ra/Aa]
• x and y are level-sensitive latches (transparent when R=1)
• + is a bundled-data adder (matched delay between Ra and Aa)
• Rin indicates the validity of IN
• After Ain+ the environment is allowed to change IN
• (Rout,Aout) control a level-sensitive latch at the output
35
A simple filter: control spec.
[Figure: the filter block diagram (x, y, +, control) with the signal transition graph of the control over the events Rin+, Ain+, Rin-, Ain-, Rx+, Ax+, Rx-, Ax-, Ry+, Ay+, Ry-, Ay-, Ra+, Aa+, Ra-, Aa-, Rout+, Aout+, Rout-, Aout-]
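36
A simple filter: control impl.
[Figure: control circuit built around a C element, with signals Rin, Ain, Rx, Ax, Ry, Ay, Ra, Aa, Rout, Aout, next to the signal transition graph over the events Rin+, Ain+, Rin-, Ain-, Rx+, Ax+, Rx-, Ax-, Ry+, Ay+, Ry-, Ay-, Ra+, Aa+, Ra-, Aa-, Rout+, Aout+, Rout-, Aout-]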
37
Taking delays into account
[Figure: signal transition graph with events x+, x-, y+, y-, z+, z- and a gate-level circuit with signals x, y, z, x', z']
Delay assumptions:
• Environment: 3 time units
• Gates: 1 time unit
events: x+  x'-  y+  z+  z'-  x-  x'+  z-  z'+  y-
time:   3   4    5   6   7    9   10   12  13   14
38
Taking delays into account
[Figure: signal transition graph with events x+, x-, y+, y-, z+, z- and a gate-level circuit with signals x, y, z, x', z'; one element is marked "very slow"]
Delay assumptions: unbounded delays
events: x+  x'-  y+  z+  x-  x'+  y-
time:   3   4    5   6   9   10   11
failure!
39
Gate vs wire delay models
• Gate delay model: delays in gates, no delays in wires
• Wire delay model: delays in gates and wires
40
Delay models for async. circuits
• Bounded delays (BD): realistic for gates and wires.
  • Technology mapping is easy, verification is difficult
• Speed independent (SI): unbounded (pessimistic) delays for gates and "negligible" (optimistic) delays for wires.
  • Technology mapping is more difficult, verification is easy
• Delay insensitive (DI): unbounded (pessimistic) delays for gates and wires.
  • DI class (built out of basic gates) is almost empty
• Quasi-delay insensitive (QDI): delay insensitive except for critical wire forks (isochronic forks).
  • In practice it is the same as speed independent
[Figure: diagram relating the circuit classes BD, SI, QDI and DI]
41
Channel-Based Design
• Synchronization and communication between blocks are implemented with handshaking over asynchronous channels, by sending and receiving "data tokens"
[Figure: a synchronous system (blocks sharing a clock) next to an asynchronous system (blocks connected by asynchronous channels)]
42
Channel Design: Single Rail
Features
• One request wire
• One wire per data bit
• One acknowledgment wire
• Has timing assumptions
[Figure: 4-phase bundled-data channel between sender and receiver (Req, Ack, Data); timing diagram with handshake phases 1 to 4 and the interval in which Data is stable]
43
Channel Design: Dual Rail & 1-of-N
Dual Rail
• Two wires per data bit
• One acknowledgment wire
• Advantage: supports delay-insensitive design
1-of-N
• Generalization of dual-rail
[Figure: 4-phase 1-of-N channel between sender and receiver (Data (1-of-N), Ack); timing diagram with handshake phases 1 to 4]
DataT DataF | Logical value
0     0     | Reset
0     1     | 0
1     0     | 1
1     1     | Invalid
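The 1-of-N code generalizes the table above: exactly one of N wires is high for a valid value, and all wires are low in the reset state. The sketch below uses illustrative function names; wire i high means value i, so for N = 2 the wire order is (DataF, DataT).

```python
def encode_1_of_n(value, n):
    """Raise exactly one of n wires; wire i high means value i."""
    if not 0 <= value < n:
        raise ValueError("value out of range")
    return [1 if i == value else 0 for i in range(n)]

def decode_1_of_n(wires):
    """Return the value, or None in the reset (all-low) state."""
    high = [i for i, w in enumerate(wires) if w]
    if not high:
        return None
    if len(high) > 1:
        raise ValueError("invalid code word: more than one wire high")
    return high[0]

print(encode_1_of_n(2, 4), decode_1_of_n([0, 1]))
```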
44
Anatomy of a Channel-Based Asynchronous Design
• Architecture is typically a multi-level hierarchy of communicating blocks
[Figure: hierarchy from an ASIC (Main FSM, Register Bank, Memory, Adder/Mult., Subtract/Divider) through blocks such as Reg A, Reg B, Reg C, Adder and Multiplier, down to cells labelled BN-1, BN-2, BN-3 and FAN-1, FAN-2, FAN-3, FA0; the blocks are connected by channels and the lowest level consists of leaf cells]
• Yields a hierarchical netlist of cells, where at each level blocks communicate along channels
45
Asynchronous Cells
Definition
• Smallest element that communicates with its neighbors along asynchronous channels
Functionality
• Reads a subset of input channels
• Computes F and writes to a subset of output channels
Linear Pipelines
• Only one input and one output channel
[Figure: a cell F with input channels and output channels, and a linear-pipeline cell F with one of each]
46
Cells for Non-Linear Pipelines
• Non-Linear Pipelines
  • Joins and Forks
  • Conditional Joins: read only some of the input channels
  • Conditional Splits: write only to some of the output channels
[Figure: cells F used as Fork, Join, Conditional Split and Conditional Join]
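At the token level, the four non-linear cells can be sketched with queues standing in for channels. The handshakes are abstracted away, and the separate control channel that selects a branch is an assumption: the slide only says that some channels are read or written.

```python
from collections import deque

def fork(inp, outs):
    """Copy one input token to every output channel."""
    token = inp.popleft()
    for out in outs:
        out.append(token)

def join(ins, out, f):
    """Consume one token from every input channel and emit f of them."""
    out.append(f(*(ch.popleft() for ch in ins)))

def conditional_split(ctrl, inp, outs):
    """Write the input token only to the output selected by the control token."""
    outs[ctrl.popleft()].append(inp.popleft())

def conditional_join(ctrl, ins, out):
    """Read only the input channel selected by the control token."""
    out.append(ins[ctrl.popleft()].popleft())

a, b, c, total = deque([3]), deque(), deque(), deque()
fork(a, [b, c])
join([b, c], total, lambda x, y: x + y)
print(list(total))
```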
47
Template-Based Leaf-Cell Design
• Each pipeline style (QDI, timed, ...) has a different blueprint
• Create a library using a blueprint to implement the lowest-level communicating blocks
[Figure: blueprint for a QDI N-input M-output pipeline stage, built from a function block F, LCD and RCD blocks and a C element; instances are shown for a 2-input 1-output pipeline stage (two LCDs, one RCD) and a 1-input 2-output pipeline stage (one LCD, two RCDs)]
48
Template-Based Leaf-Cell Design
• Pros
  • Enables fine-grain 2-D pipelining, yielding high performance
  • Simplifies logic synthesis by enabling simple control circuit generation and re-use of typical datapath synthesis
  • Leaf cells can be laid out and verified to create a leaf-cell library, localizing timing assumptions
• Cons
  • Unified template may not be optimal in all cases
  • In particular, less effective for non-pipelined architectures with more complicated control
49
Motivation (designer's view)
• Modularity for system-on-chip design
  • Plug-and-play interconnectivity
• Average-case performance
  • No worst-case delay synchronization
• Many interfaces are asynchronous
  • Buses, networks, ...
50
Motivation (technology aspects)
• Low power
  • Automatic clock gating
• Electromagnetic compatibility
  • No peak currents around clock edges
• Security
  • No "electromagnetic difference" between logical "0" and "1" in dual-rail code
• Robustness
  • High immunity to technology and environment variations (temperature, power supply, ...)
51
Dissuasion
• Concurrent models for specification
  • CSP, Petri nets, ...: no more FSMs
• Difficult to design
  • Hazards, synchronization
• Complex timing analysis
  • Difficult to estimate performance
• Difficult to test
  • No way to stop the clock
52
But ... some success stories
• Philips
• AMULET microprocessors
• Sharp
• Intel (RAPPID)
• Start-up companies: Theseus Logic, Fulcrum Microsystems, Self-Timed Solutions
• Recent blurb: It's Time for Clockless Chips, by Claire Tristram (MIT Technology Review, v. 104, no. 8, October 2001: http://www.technologyreview.com/magazine/oct01/tristram.asp) ...