ASYNC07
High Rate Wave-pipelined Asynchronous On-chip Bit-serial Data Link
R. Dobkin, T. Liran, Y. Perelman, A. Kolodny, R. Ginosar
Technion – Israel Institute of Technology
Electrical Engineering Department – VLSI Lab
March 12, 2007
Presentation Outline
• Why Serial Link?
• Fast Asynchronous Serial Link
  • Transmitter, Fast LEDR Encoder
  • Receiver, Fast Toggle Circuit
  • Channel, Current Mode Async Signaling
• Performance
• Summary
Serial Link Employment Benefits
• Why Serial Link?
  • Less interconnect area
  • Less routing congestion
  • Less coupling
  • Less power (depends on range)
• The relative improvement grows with technology scaling. The example on the right refers to:
  • Single gate delay serial link
  • Fully-shielded parallel link with 8 gate delay clock cycle
  • Equal bit-rate
  • Word width N=8
[Figure: two charts with axes Technology Node [nm] (180, 130, 90, 65, 30, 15) and Link Length [mm]. One chart separates the region where the Parallel Link dissipates less power from the region where the Serial Link dissipates less power; the other separates the region where the Parallel Link requires less area from the region where the Serial Link requires less area.]
Serial Link Applications
• P2P long-range interconnect
• Long range NoC links
• Pin-limited on-chip module interfaces
  • Chips are presently pin-limited, and that limitation will migrate inside the chip
• Cross-bar
  • Simpler routing and less congestion
• Communications inside many-core CMPs
Serial Link – Top Structure
• Transition signaling instead of sampling: two-phase NRZ Level Encoded Dual Rail (LEDR) asynchronous protocol, a.k.a. data-strobe (DS)
• Acknowledge per word instead of per bit
• Wave-pipelining over channel
• Differential encoding (DS-DE, IEEE1355-95)
• Low-latency synchronizers
[Figure: block diagram of the link. Blocks: Sender, Synchronizer, Serializer & LEDR Encoder, Bit-Serial Channel (P and S), De-Serializer & LEDR Decoder, Synchronizer, Receiver; a Word Ack signal is also shown.]
Encoding – Two Phase NRZ LEDR
• Two Phase Non-Return-to-Zero Level Encoded Dual Rail
  • “delta” encoding (one transition per bit)
[Figure: waveforms of Uncoded (B), State bit (S), and Phase bit (P), annotated with the bit values 0 0 1 1 0 0 0 0 1 0.]
$$P(i) = \begin{cases} \overline{B(i)}, & i \text{ odd} \\ B(i), & i \text{ even} \end{cases} \qquad S(i) = B(i), \quad \forall i$$
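The "one transition per bit" property can be checked with a short behavioural model. This is a minimal sketch, not the authors' circuit: it assumes the common LEDR convention in which the state rail S carries the data bit and the phase rail P carries the data bit inverted on odd-indexed bits, and it assumes an idle line state of (S, P) = (0, 1) before the first bit. The function names are hypothetical.

```python
from typing import List, Tuple

Symbol = Tuple[int, int]  # (S, P) rail levels


def ledr_encode(bits: List[int]) -> List[Symbol]:
    """Return one (S, P) pair per input bit; bit indices start at 0 (even)."""
    encoded = []
    for i, b in enumerate(bits):
        s = b            # state rail carries the data bit (assumed convention)
        p = b ^ (i & 1)  # phase rail: B(i) on even i, inverted B(i) on odd i
        encoded.append((s, p))
    return encoded


def ledr_decode(symbols: List[Symbol]) -> List[int]:
    """Recover the data bits; under this convention the data is the S rail."""
    return [s for s, _ in symbols]


def transitions_per_bit(symbols: List[Symbol], idle: Symbol = (0, 1)) -> List[int]:
    """Count how many rails change level at each bit, starting from the idle state."""
    counts = []
    previous = idle
    for current in symbols:
        counts.append((previous[0] ^ current[0]) + (previous[1] ^ current[1]))
        previous = current
    return counts


if __name__ == "__main__":
    bits = [0, 0, 1, 1, 0, 0, 0, 0, 1, 0]  # bit values annotated on the slide
    symbols = ledr_encode(bits)
    print(symbols)
    print(transitions_per_bit(symbols))
```

Because exactly one rail toggles per bit, the receiver can detect each new bit from the transition itself, without sampling against a clock.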
Transmitter – Fast SR Approach
[Figure: transmitter block diagram. Labels: Parallel Load Interface (Load Enable, Uncoded Data), Shift-Register SR(B) with outputs Beven and Bodd, Transition Generator with T0 and T90 (outputs OT0 and OT90), LEDR Encoder with outputs P and S.]
• Targeted Speed: One gate delay between bits
Fast Asynchronous Shift Register
[Figure: shift-register schematic. Labels: stages XL[0], XL[1], ..., XL[W-2]; Parallel Load Interface with per-stage Data inputs; one input marked Not Connected; T; OUT.]
Wave-pipelined Control Characteristics
[Figure: two Bode plots of Gain [dB] versus Frequency [Hz] from 1.00E+09 to 1.00E+11. Bias panel: curves for DVb = 0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3. Voltage Swing panel: curves for 0.5, 0.6, 0.7, 0.8, and 0.9 of Full Swing and for Full Swing.]
• The highest speed (the single gate-delay cycle) corresponds to the pole of the Bode diagram
• This operating point results in signal degradation along the inverter chain
[Figure: Cut-off Frequency [GHz] versus Inverter Chain Length, N (0 to 30), with BIAS and SWING curves and the Single Gate Delay Rate marked.]
Splitter Architecture
• The shift-register is partitioned into M shift-registers, giving M times slower operation in each shift-register
• Signal is no longer degraded
• Single gate-delay operation is localized to output (input) stage only
[Figure: Transmitter: PARALLEL LOAD into a Shift-Register for Odd Bits and a Shift-Register for Even Bits, followed by Merge. Receiver: Split into a Shift-Register for Odd Bits and a Shift-Register for Even Bits, followed by PARALLEL READ.]
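The split and merge steps can be illustrated with a bit-level model. This is a minimal sketch under stated assumptions: bits are dealt round-robin into M lanes (M = 2 gives the even and odd shift-registers), each lane is assumed to shift once every M bit periods, and the function names are hypothetical. It models data ordering only, not the circuit timing.

```python
from typing import List


def split_word(word_bits: List[int], m: int = 2) -> List[List[int]]:
    """Deal bits round-robin into m lanes; lane k holds bits k, k+m, k+2m, ..."""
    return [word_bits[k::m] for k in range(m)]


def merge_lanes(lanes: List[List[int]]) -> List[int]:
    """Interleave the lanes back into a single serial bit stream."""
    m = len(lanes)
    total = sum(len(lane) for lane in lanes)
    return [lanes[i % m][i // m] for i in range(total)]


def lane_shift_period_ps(bit_period_ps: float, m: int) -> float:
    """Each lane shifts once per m serial bits, so its period is m times longer."""
    return bit_period_ps * m


if __name__ == "__main__":
    word = [1, 0, 0, 0, 1, 0, 1, 0]
    even_lane, odd_lane = split_word(word, m=2)
    print("even:", even_lane, "odd:", odd_lane)
    print("merged:", merge_lanes([even_lane, odd_lane]))
    print("lane period [ps]:", lane_shift_period_ps(15.0, 2))
```

Only the merge (or split) stage sees the full serial rate; each lane runs at the relaxed period, which is the point of the partitioning.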
Transmitter Splitter Architecture
[Figure: transmitter schematic. Labels: SREVEN and SRODD shift-registers with stages XLEVEN[N/2] and XLODD[N/2]; C and C90; MergeEVEN and MergeODD; BEVEN and BODD; XOR; LEDR ENCODER with outputs P and S.]
Transmitter – SPICE Simulation (65nm node)
[Figure: SPICE waveforms of C0, C90, BEVEN, BODD, C, P, and S with time markers at 30 ps and 60 ps. Annotations mark START BIT=1, the transmitted bit values, and a Dummy bit.]
Simulations done at
Receiver
Receiver Splitter Architecture
[Figure: receiver schematic. Labels: SPLIT; TOGGLE with T, A, and B; C; S, SEVEN, and SODD; SREVEN and SRODD shift-registers with stages XLEVEN[1] and XLODD[1].]
Toggle Circuit
• Straightforward implementation (fundamental asynchronous state machine) is too slow (supports only a ~1.5 gate-delay cycle)
• Novel toggle:
  • Single gate-delay operation support
  • Internal and output latches
[Figure: toggle circuit schematic with signals T, A, and B.]
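For readers unfamiliar with toggle elements, the following is a behavioural sketch of the generic function, not of the novel circuit on this slide. It assumes that T is the input, that A and B are the outputs, and that the first event after reset goes to A; all three are assumptions for illustration, and the class name is hypothetical.

```python
from typing import Tuple


class Toggle:
    """Generic toggle element: input events are steered alternately to A and B."""

    def __init__(self) -> None:
        self.a = 0                # level of output A
        self.b = 0                # level of output B
        self._next_is_a = True    # assumed reset state: first event goes to A

    def event(self) -> Tuple[int, int]:
        """Apply one transition on T and return the new (A, B) levels."""
        if self._next_is_a:
            self.a ^= 1           # transition signaling: the output flips
        else:
            self.b ^= 1
        self._next_is_a = not self._next_is_a
        return self.a, self.b


if __name__ == "__main__":
    toggle = Toggle()
    for n in range(4):
        print("event", n, "->", toggle.event())
```

In a two-way split receiver, such alternating outputs are what allow incoming bit events to be distributed between the even and odd paths.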
Channel
• Four transmission lines (DS-DE)
• High metal layers utilization
  • Metals 5-8 of 65nm process
• RLC modeled
• Careful layout
  • Small crosstalk
  • Small relative variations
LEDR Interconnect Layout
[Figure: interconnect layout of the S and P lines.]
Differential Channel Driver and Receiver
• Current mode differential low-swing signaling
• Currents in opposite directions
• Controllable current return path
[Figure: schematics of the differential Driver and Receiver connected over the P / S lines. Labels: a, b, i, o, z, SA, R.]
Channel Characteristic Impedance
$$Z_0 = \sqrt{\frac{R + j\omega L}{j\omega C}}, \qquad R = R_{DC}\,(1 + \ldots)$$
Based on data from BPTM. Drawn for constant R, L, C
• Z depends on F
• Voltage changes with F
• Fast changes cause voltage drifts
• The drifts bound the operating speed
[Figure: plot of Z versus F.]
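The frequency dependence of Z can be illustrated numerically. This is a minimal sketch using the standard lossy-line expression Z0 = sqrt((R + jωL) / (jωC)) with constant R, L, and C, as the slide states the plot is drawn. The per-unit-length values below are hypothetical placeholders chosen only for illustration; they are not taken from BPTM or from the presentation.

```python
import cmath
import math

# Hypothetical per-unit-length wire parameters (illustration only).
R_PER_M = 20e3     # ohm/m  (20 ohm/mm)
L_PER_M = 0.5e-6   # H/m    (0.5 nH/mm)
C_PER_M = 0.2e-9   # F/m    (0.2 pF/mm)


def characteristic_impedance(freq_hz: float,
                             r: float = R_PER_M,
                             l: float = L_PER_M,
                             c: float = C_PER_M) -> complex:
    """Z0 = sqrt((R + jwL) / (jwC)) for constant R, L, C (shunt conductance ignored)."""
    omega = 2.0 * math.pi * freq_hz
    return cmath.sqrt((r + 1j * omega * l) / (1j * omega * c))


if __name__ == "__main__":
    for f in (1e9, 1e10, 1e11):
        z = characteristic_impedance(f)
        print(f"{f:.0e} Hz: |Z0| = {abs(z):.1f} ohm, "
              f"phase = {math.degrees(cmath.phase(z)):.1f} deg")
```

At low frequencies the resistive term dominates and |Z0| is high; at high frequencies it approaches sqrt(L/C). A fixed driver current therefore produces a frequency-dependent line voltage, which is the drift described in the bullets.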
Channel Driver with Adaptive Control
• Compensates for Z changes
• Turned on for low frequencies
[Figure: driver schematic from IN to OUT with an Adaptive Control block and an Inertial Delay element.]
Adaptive Control – Simulation Example
• SPICE simulation setup:
  • 65nm technology, 4mm range, 67Gbps data rate
  • RLC modeled channel (using Raphael-like three-dimensional field solver)
• Adaptive control is turned on only for low frequencies
[Figure: simulated waveforms of Data, Adaptive Control, and Currents; a low-frequency interval turns Adaptive Control on.]
Channel Receiver Amplifier
[Figure: receiver amplifier schematic. Labels: IN, OUT, B, R, RB.]
Performance
[Figure: chart with labels TX-SR, RX-SR, and Channel Diff Pair.]
• SPICE simulations show correct operation at the target data cycle of 15ps (65nm technology node)
• Power for a 67Gbps, 4mm, 16-bit word link under 100% utilization:
  • Total power: 150mW
  • Channel differential pair: 18mW
  • Leakage power: 4mW (due to low VT transistors employment)
• Power reduction
  • Deeper split (larger M reduces power)
  • Circuit optimizations
  • Circuit shut down during idle states
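The quoted figures can be cross-checked with simple arithmetic. This sketch derives the bit rate from the 15ps data cycle and the energy per bit from the 150mW total at 100% utilization. The word time assumes 16 payload bits only and ignores any start or dummy bits, which is a simplification; the function names are hypothetical.

```python
DATA_CYCLE_PS = 15.0      # target data cycle from the slide
WORD_BITS = 16            # word width from the slide
TOTAL_POWER_MW = 150.0    # total link power from the slide
CHANNEL_POWER_MW = 18.0   # channel differential pair power from the slide


def bit_rate_gbps(cycle_ps: float) -> float:
    """One bit per data cycle: rate in Gbps is 1000 / cycle in ps."""
    return 1000.0 / cycle_ps


def energy_per_bit_pj(power_mw: float, rate_gbps: float) -> float:
    """mW divided by Gbps is pJ per bit (1e-3 W / 1e9 bit/s = 1e-12 J/bit)."""
    return power_mw / rate_gbps


def word_time_ps(cycle_ps: float, word_bits: int) -> float:
    """Serial transfer time of one word, payload bits only (assumption)."""
    return cycle_ps * word_bits


if __name__ == "__main__":
    rate = bit_rate_gbps(DATA_CYCLE_PS)
    print(f"bit rate: {rate:.1f} Gbps")
    print(f"total energy per bit: {energy_per_bit_pj(TOTAL_POWER_MW, rate):.2f} pJ")
    print(f"channel energy per bit: {energy_per_bit_pj(CHANNEL_POWER_MW, rate):.2f} pJ")
    print(f"word time: {word_time_ps(DATA_CYCLE_PS, WORD_BITS):.0f} ps")
```

A 15ps cycle gives about 66.7 Gbps, consistent with the 67Gbps quoted on the slide.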
In-Die Variations
• Splitter architecture
  • High-speed operation localized to input and output stages
• High-speed components design and verification
  • Monte-Carlo simulations (>5)
  • 26 PVT Corners
  • Iterative design with legging and sizing for sensitive transistors
• Asynchronous structure
  • Supports any slow down
  • Minimal time separation between successive bits must be provided!
Summary
• High speed Serial Link requires special circuits:
  • Fast serializers and de-serializers
    • Wave-pipelined control
    • Splitter architecture:
      • Long word transmission
      • Power reduction
  • On-the-fly LEDR encoding
  • Adaptive control for fast asynchronous signals handling
  • Low crosstalk interconnect layout
  • Single FO4 inverter delay data cycle support (15ps on 65nm process, 67 Gbps)
• The Serial Link is preferred over the Parallel Link thanks to:
  • Reduced interconnect and active area
  • Easier routing, less coupling
  • Reduced power for long on-chip interconnects
The End
• Thank you