Delay/Phase Regeneration Circuits
Crescenzo D’Alessandro, Andrey Mokhov, Alex Bystrov, Alex Yakovlev
Microelectronics Systems Design Group
School of EECE
Newcastle University, UK
ASYNC 2007 - C D'Alessandro et al. - 2/28
Outline
Introduction
Background on Phase-encoding
Dual-rail/multiple-rail phase encoding
Motivation for the present work
Taxonomy
Latch-based designs
MUTEX-based designs
Design types
Conclusions
Phase Encoding
Dual-Rail
Main idea: encode data on the phase relationship between two identical out-of-phase signals
Resistant to transient faults
Similarity with dual-rail dual-spacer protocol
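[Figure: timing diagram with a dual-rail dual-spacer sequence (spacers sp0, sp1; data values 0, 1, 0, 0), the ref and data signals with edges t_0 and t_1, and the two cases “t_1 before t_0” and “t_0 before t_1”]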
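As a minimal sketch of the main idea, a receiver only has to decide which of the two edges arrived first. The function name and the mapping used here (t_1 first means logic 1) are assumptions for illustration; the slide does not fix them.

```python
def decode_dual_rail(t_0: float, t_1: float) -> int:
    """Decode one phase-encoded bit from the arrival times of the two edges.

    Assumption (not stated on the slide): the edge labelled t_1 arriving
    first encodes a 1, and t_0 arriving first encodes a 0.
    """
    if t_0 == t_1:
        raise ValueError("edges coincide: no phase information to decode")
    return 1 if t_1 < t_0 else 0


# Only the order of the edges matters, not their absolute times.
print(decode_dual_rail(t_0=2.0, t_1=1.0))  # case "t_1 before t_0"
print(decode_dual_rail(t_0=1.0, t_1=2.0))  # case "t_0 before t_1"
```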
Multiple Rail
No group of wires has the same delay
All wires toggle when an item of data is sent
[Figure: waveforms for /req, /ack, /a, /b, /c and /d; edge orderings cdba, cdba, dbac; numeric labels 17, 20, 17]
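If all n wires toggle once per item and no two edges coincide, the order of the edges can take n! distinct values. The sketch below maps an integer to an edge order and back using the factorial number system. The slides do not state the mapping, so the function names and the lexicographic ordering are assumptions, although this mapping appears consistent with the figure labels (17 with cdba, 20 with dbac).

```python
from math import factorial


def encode_order(value: int, wires: str = "abcd") -> str:
    """Map an integer in [0, n!) to the order in which the wires toggle."""
    n = len(wires)
    if not 0 <= value < factorial(n):
        raise ValueError("value does not fit in n! symbols")
    remaining = list(wires)
    order = []
    for i in range(n, 0, -1):
        # Pick the next wire from the digits of the factorial number system.
        index, value = divmod(value, factorial(i - 1))
        order.append(remaining.pop(index))
    return "".join(order)


def decode_order(order: str, wires: str = "abcd") -> int:
    """Inverse of encode_order: recover the integer from the edge order."""
    remaining = list(wires)
    value = 0
    for i, wire in enumerate(order):
        index = remaining.index(wire)
        value += index * factorial(len(order) - 1 - i)
        remaining.pop(index)
    return value


print(encode_order(17), encode_order(20))
```

With four wires this gives 24 symbols per transfer, compared with 16 for four binary wires.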
Phase Corruption
Phase corruption is due to jitter (introduced by the gates), the physical wire fabric and transistor mismatches
Process-variation mismatch causes a systematic delay offset between the two lines, which could cause decoding errors
Additionally, cross-talk causes symbol-dependent phase corruption
As the wires are always “allies” in terms of cross-talk, the longer the wire, the more corrupted the phase relationship between the wires
What is then the optimal length of wire which “guarantees” that the phase relationship is maintained?
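One way to make the question concrete is a toy model in which the separation between the two edges shrinks linearly with wire length. The linear model, the function name and all numbers below are assumptions for illustration only; the slides give no formula.

```python
def max_wire_length(delta_0: float, skew_per_mm: float, min_separation: float) -> float:
    """Longest wire (mm) that keeps the edge separation above min_separation.

    Toy model (assumption): separation(L) = delta_0 - skew_per_mm * L,
    with all times in FO4 and L in mm.
    """
    if skew_per_mm <= 0:
        raise ValueError("skew_per_mm must be positive")
    return max(0.0, (delta_0 - min_separation) / skew_per_mm)


# Hypothetical numbers: 5 FO4 sent, 0.5 FO4 lost per mm, 1 FO4 needed.
print(max_wire_length(delta_0=5.0, skew_per_mm=0.5, min_separation=1.0))
```

In this reading, a regeneration circuit would be placed before the wire reaches that length.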
Phase Corruption
Example of phase corruption
No change in sequence
Change in absolute value of phase
Taxonomy
Different design styles can be identified
We focus in this presentation on digital implementations
Latch-based designs
A latch is used on each wire
Gate-level implementation
Transistor-level implementation
MUTEX-based designs
A single MUTEX is used to arbitrate between the two edges
“Early-propagating”
“Merging”
Parameters
Maximum affected input time separation δmax
Events whose time separation is > δmax retain their original separation
Circuit latency λ
Time between the first event occurring and the corresponding output being generated
Response time ζ
Time between the two events below which the time separation cannot be regenerated
Capture range κ= δmax – ζ
Using the convention sometimes used in PLLs to give a value for the range
Linearity
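The definitions above can be turned into a small calculation on a sampled response curve. The sketch below is not from the slides: the function name, the tolerance and the sample points are assumptions, and it reads ζ as the smallest sampled input separation that is still regenerated to a known plateau value.

```python
def estimate_parameters(samples, plateau, tol=0.05):
    """Estimate (delta_max, zeta, kappa) from a sampled response curve.

    samples: (input separation, output separation) pairs, in FO4.
    plateau: separation the circuit regenerates to (assumed known).
    """
    # Inputs whose separation is changed by the circuit.
    affected = [d_in for d_in, d_out in samples if abs(d_out - d_in) > tol]
    # Inputs that come out with the regenerated separation.
    regenerated = [d_in for d_in, d_out in samples if abs(d_out - plateau) <= tol]
    if not affected or not regenerated:
        raise ValueError("curve shows no regeneration at this tolerance")
    delta_max = max(affected)   # largest sampled separation still affected
    zeta = min(regenerated)     # smallest sampled separation still regenerated
    kappa = delta_max - zeta    # capture range, as defined on the slide
    return delta_max, zeta, kappa


# Made-up, idealised curve: regenerates to 4 FO4, fails below 1 FO4.
samples = [(0.0, 0.5), (1.0, 4.0), (2.0, 4.0), (3.0, 4.0),
           (4.0, 4.0), (5.0, 5.0), (6.0, 6.0)]
print(estimate_parameters(samples, plateau=4.0))
```

The resolution of the estimates is limited by the spacing of the sample points.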
Graphs
[Figure: Response of latch-based design (2). x-axis: Input Time Separation (FO4); y-axis: Output Time Separation/Latency (FO4); curves: Fall, Rise, Latency (fall), Latency (rise); annotations: δmax, λ, ζ, κ, and linearity (how flat this part is)]
Passive Solution
“Textbook” solution
Different response for rising/falling – can be matched using balanced drivers
Not very linear
Capacitor size a problem – also introduces latency
[Figure: Response of passive device. x-axis: Input Time Separation (FO4); y-axis: Output Time Separation/Latency (FO4); curves: Fall, Rise, Latency (fall), Latency (rise)]
[Figure: passive circuit with Sender, inputs i1, i2 and outputs o1, o2]
Latch-based
Gate level/1
Latches are transparent at startup
They are closed after one edge arrives at the output
They are then reopened after the pulse is finished
6 FO4 capture range, stops working around 5 FO4 input delta
Difference in rising and falling behaviour
[Figure: circuit with two latches (D, G, Q), inputs i1, i2 and outputs o1, o2]
[Figure: Response of latch-based design (1). x-axis: Input Time Separation (FO4); y-axis: Output Time Separation/Latency (FO4); curves: Fall, Rise, Latency (fall), Latency (rise)]
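The bullet points on this slide suggest a simple behavioural reading, sketched below. It is an idealisation and not the authors' circuit: gate delays are ignored, and the names tau (time for which the latches stay closed) and t_close (time the latches take to close after the first edge) are assumptions.

```python
def latch_repeater(delta_in: float, tau: float, t_close: float) -> float:
    """Output separation (FO4) of an idealised latch-based regenerator."""
    if delta_in < 0:
        raise ValueError("delta_in is a separation and must be non-negative")
    if delta_in < t_close:
        # The second edge slips through before the latches close,
        # so the separation is not regenerated.
        return delta_in
    # Otherwise the second edge is held until the latches reopen after tau,
    # unless it arrives later than that anyway.
    return max(delta_in, tau)


# Hypothetical values: latches close after 1 FO4 and reopen after 6 FO4.
for delta_in in (0.5, 3.0, 8.0):
    print(delta_in, latch_repeater(delta_in, tau=6.0, t_close=1.0))
```

In this model t_close plays the role of ζ and tau the role of δmax.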
Latch-based
Gate level/2
Similar to previous design
Two pulse generators – faster
Only blocks one output and not both
Only one output used – less difference between rising and falling edges
[Figure: circuit with two latches (D, G, Q) and two blocks labelled “Pulse generator - width=”, inputs i1, i2 and outputs o1, o2]
[Figure: Response of latch-based design (2). x-axis: Input Time Separation (FO4); y-axis: Output Time Separation/Latency (FO4); curves: Fall, Rise, Latency (fall), Latency (rise)]
Latch-based
Transistor level
Better latency and response
Capture range can be increased by increasing tau
Good linearity
[Figure: transistor-level circuit with inputs i1, i2 and outputs o1, o2]
[Figure: Response of transistor-based design (with keepers). x-axis: Input Time Separation (FO4); y-axis: Output Time Separation/Latency (FO4); curves: Fall, Rise, Latency (fall), Latency (rise)]
MUTEX-based
Higher latency (complex gates)
Good response and capture range
Poor linearity
Early-propagating
[Figure: MUTEX-based circuit with inputs i1, i2 and outputs o1, o2]
[Figure: Response of modified-ME design. x-axis: Input Time Separation (FO4); y-axis: Output Time Separation/Latency (FO4); curves: Fall, Rise, Latency (fall), Latency (rise)]
[Figure: Response of modified-ME design, detail for input separations up to 2 FO4. x-axis: Input Time Separation (FO4); y-axis: Output Time Separation/Latency (FO4); curves: Fall, Rise]
MUTEX-based
“Infinite” capture range – lower-bounded
Flat response
Very high latency – dependent on input time separation
NOR-MUTEX is slow
[Figure: circuit with two C-elements, inputs i1, i2, outputs o1, o2, internal signals g11, g12, g21, g22 and a ref signal]
[Figure: Response of "merge" design. x-axis: Input Time Separation (FO4); y-axis: Output Time Separation/Latency (FO4); curves: Fall, Rise, Latency (fall), Latency (rise)]
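The flat response and the input-dependent latency both follow if the circuit waits for the two input edges and then re-sends them with a fixed separation. The sketch below models only that behaviour. The names delta_out and t_logic are assumptions, and a tie between the inputs is given to i1 arbitrarily, whereas a real MUTEX would resolve it.

```python
def merge_repeater(t1: float, t2: float, delta_out: float, t_logic: float):
    """Return the output edge times (o1, o2) of an idealised merging repeater.

    t1, t2: arrival times of the edges on i1 and i2.
    delta_out: fixed separation with which the edges are re-sent.
    t_logic: fixed delay of the merging logic (assumption).
    """
    # Nothing is sent until both inputs have arrived.
    start = max(t1, t2) + t_logic
    if t1 <= t2:
        return start, start + delta_out   # i1 came first, so o1 goes first
    return start + delta_out, start       # i2 came first, so o2 goes first


o1, o2 = merge_repeater(t1=0.0, t2=0.5, delta_out=4.0, t_logic=2.0)
latency = min(o1, o2) - 0.0   # measured from the first input edge
print(o1, o2, latency)
```

Here the latency is the input separation plus t_logic, so it grows with the input separation while the output separation stays constant.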
STG for Repeater
STG for a repeater
Use timing assumptions:
i1- -> p1 -> g11-, g12-
g11- -> i1+
… and mirror ones
This STG can be synthesised using PETRIFY
Synthesised version in next slide…
[Figure: STG with rising and falling transitions of i1, i2, o1, o2, g11, g12, g21 and g22, elements ME1 and ME2, and places p1 to p8]
MUTEX-based
w/PETRIFY
Very good linearity and capture range
High latency, independent of the input until 0.5 FO4
Generated using PETRIFY (STG in previous slide)
[Figure: synthesised circuit with a C-element, inputs i1, i2, outputs o1, o2 and internal signals g11, g12, g21, g22]
[Figure: Response of PETRIFY-generated design. x-axis: Input Time Separation (FO4); y-axis: Output Time Separation/Latency (FO4); curves: Fall, Rise, Latency (fall), Latency (rise)]
TSE
Transition Sequence Encoder
This circuit generates a number of requests based on an input matrix
The acknowledgments can be either “proper” or a delayed version of the output signals
Can be used as a phase-encoder
[Figure: TSE block with signals go, req[1], req[2], req[3], ack[1], ack[2], ack[3] and matrix entries R[1,2], R[1,3], R[2,1], R[2,3], R[3,1], R[3,2]]
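A minimal sketch of how a matrix could determine the order of the requests is given below. The slides do not define the matrix, so the meaning used here (R[i][j] = 1 when req[i] is issued before req[j]) and the function name are assumptions; the circuit itself is not modelled.

```python
def request_order(R):
    """Return the 1-based request indices in the order they are issued.

    Assumption: R[i][j] == 1 means request i+1 precedes request j+1,
    and R describes a total order.
    """
    n = len(R)
    # Count, for each request, how many other requests must come before it.
    preceded_by = [sum(R[j][i] for j in range(n) if j != i) for i in range(n)]
    if sorted(preceded_by) != list(range(n)):
        raise ValueError("R does not describe a total order")
    return [i + 1 for i in sorted(range(n), key=lambda i: preceded_by[i])]


# Hypothetical matrix with R[2,3], R[2,1] and R[3,1] set (1-based names).
R = [
    [0, 0, 0],
    [1, 0, 1],
    [1, 0, 0],
]
print(request_order(R))
```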
MUTEX-TSE
This solution is similar to the MUTEX-based one, only using the TSE as a sender
λ < 2 FO4
Output time separation increases with the input (output δ > 8 FO4)
[Figure: circuit with a C-element, inputs i1, i2, outputs o1, o2 and TSE signals R[1,2], R[2,1] and go]
[Figure: Response of TSE design. x-axis: Input Time Separation (FO4); y-axis: Output Time Separation/Latency (FO4); curves: Fall, Rise, Latency (fall), Latency (rise)]
TSE – Transistor-level
As above, but handling both rising and falling edges
Transistor-level implementation of the TSE
Results similar to the previous case
Note the similarity with the transistor-level latch-based design
[Figure: transistor-level circuit with a C-element, inputs i1, i2, outputs o1, o2 and signals R1[1,2], R1[2,1], R2[1,2], R2[2,1]]
[Figure: Response of TSE design (transistor level). x-axis: Input Time Separation (FO4); y-axis: Output Time Separation/Latency (FO4); curves: Fall, Rise, Latency (fall), Latency (rise)]
Multiple-rail
Multiple-rail phase-encoding requires similar designs to regenerate the phase relationship
The design on the right is a simple expansion of the previous latch-based design
Very slow response
Only useful for large δ
Acceptable latency
[Figure: circuit with four latches (D, G, Q) and four pulse generators, inputs i1 to i4 and outputs o1 to o4]
Multiple-rail “merge”
Better design: use a TSE
Shown: 3-wire regeneration. Left: rising edge only; right: rising and falling edges
Better response, but λ depends on the input time separation (needs to wait for all inputs to be present)
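[Figure: two circuits. Labels in the first: inputs i1, i2, i3, outputs o1, o2, o3, sender, receiver and go. Labels in the second: R1[1,2], R1[1,3], R1[1,4], R2[1,2], R2[1,3], R2[1,4], C, o1, o2, o3, o4 and go]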
Performance comparison
Dual-rail implementations
Area in transistor count
κ and λ in FO4
Area and energy for the “Latch-based transistor level” design are given as no keeper/keeper
“Charge compensation”: area calculated estimating the size of the capacitors
Avg. for rise/fall
Design | Area | pJ/bit | ζ | κ | λ
Latch-based 1 | 58 | 0.82 | 6 (avg) | 2 | 2.5
Latch-based 2 | 68 | 0.59 | 4 | 3 | 3.5
Mod. MUTEX | 88 | 1.17 | <0.5 | 1 | 7
Automatic Synthesis | 94 | 0.9 | <0.1 | 5 | 7
MUTEX-based merging | 110 | 0.98 | <0.1 | δinput
Latch-based transistor level | 28/32 | 0.43/0.47 | <0.5 | 3 | 1
Charge-compensation | 24 | 0.22 | 2 | 4 | 4 (avg)
TSE gate-level | 74 | 0.78 | <0.1 | δinput
TSE transistor-level | 52 | 0.79 | <0.1 | δinput
Conclusions
Some phase-regeneration circuits have been presented
More work to do:
Metastability behaviour, in particular for keeper structures
Behaviour in case of faults
Characterisation with different input signal slopes
Contact details
Crescenzo S. D’Alessandro
Microelectronics Systems Design Group
School of Electrical, Electronics and Computer Engineering
Merz Court
Newcastle University, UK
Crescenzo.D’[email protected]
http://async.org.uk