SIGNAL PRECONDITIONING USING FEEDFORWARD
EQUALIZERS IN ADC-BASED DATA LINKS
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF ELECTRICAL
ENGINEERING
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
Ryan Boesch
May 2016
http://creativecommons.org/licenses/by-nc/3.0/us/
This dissertation is online at: http://purl.stanford.edu/dk653rc7126
© 2016 by Ryan Boesch. All Rights Reserved.
Re-distributed by Stanford University under license with the author.
This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Boris Murmann, Primary Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Mark Horowitz
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Madihally Narasimha
Approved for the Stanford University Committee on Graduate Studies.
Patricia J. Gumport, Vice Provost for Graduate Education
This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.
Abstract
As the data rates for high-speed wireline transceivers continue to increase,
intersymbol interference (ISI) due to channel loss is becoming more pronounced, and
multiple techniques have been suggested to address this issue. One technique that has
recently been gaining popularity is the ADC-based receiver. In ADC-based receivers,
a digital feedforward equalizer (FFE) is used in conjunction with a decision feedback
equalizer (DFE) to equalize the channel and recover the data. However, in order to
recover the data with high fidelity, a power-hungry ADC is needed to digitize the
signal. Recent work has shown that an analog receive-side FFE (RX-FFE) prior to
the ADC can reduce the required ADC resolution while achieving the same bit error
rate (BER).
In order to obtain a net improvement for the system, the RX-FFE must be
implemented with low power consumption, low noise, and small chip area. In this thesis,
an RX-FFE is demonstrated that meets these requirements and outperforms
state-of-the-art designs. The RX-FFE is constructed entirely with low-noise and
power-efficient analog-inverter transconductors and capacitors, avoiding the use of
area-intensive inductors. The delay element is implemented as a single-path
Pade-inspired delay shown to be equivalent to the first-order Pade delay in terms of
RX-FFE performance. The proof-of-concept RX-FFE is demonstrated to reduce the signal
dynamic range by 2×, resulting in a 1 bit ADC resolution relaxation. The total power
consumed is less than 26 mW with less than 0.62 mV RMS output noise for all
coefficient values and an area of only 0.003 mm² in 40 nm CMOS.
Acknowledgments
I have been fortunate to interact with so many wonderful people during the course
of my Ph.D. I want to take this opportunity to thank those who played a big role in
the completion of this work.
Firstly, I must thank my advisor Professor Boris Murmann. My success in this
program can largely be attributed to his guidance and support. He has been a better
advisor than I could have dreamed of finding when I first started on this journey.
I also thank Professor Mark Horowitz and Dr. Madihally Narasimha for being
on my reading committee. I thank Professor Amin Arbabian for being on my orals
committee and Professor Jon Fan for chairing the orals committee.
I thank the Broadcom Foundation and Stanford’s initiative for Rethinking Analog
Design for the funding that they provided. I thank the TSMC University Shuttle
Program for the integrated circuit fabrication they provided.
Thanks to Tom Kwan from Broadcom for arranging presentations and providing
early feedback on my work. Thanks to Hiroshi Takatori, John Duan, and Albert
Vareljian from Futurewei for help with chip debugging and for access to test
equipment. Thanks to Frankie Liu and Vincent Lee from Oracle for access and support
with test equipment.
I thank Ann Guerra for all the administrative help she provided throughout my
degree. She has always gone above and beyond for me and all of the students in the
Murmann group. We are lucky to have her. In addition, I thank Joe Little for his
IT support. The speed with which he replies to emails and resolves server issues is
paramount and I am greatly appreciative of the help I have received from him over
the years.
I would also like to thank everyone in the Murmann group, past and present. In
particular, thanks to Jonathon Spaulding, Doug Adams, and Martin Kraemer for the
trips abroad and for good times back home.
Last but not least, I thank my family whose love and support mean everything
to me. To my sister — you have been a role model of mine my entire life. Thanks
for setting the bar so high. To my brother — you continue to impress me each day.
I am proud of what you have accomplished and excited to see what you will achieve
next. To my mother - I am lucky to have enjoyed your unwavering support and
unconditional love. I certainly would not have made it here without you. To my wife
— I do not think I could have finished this degree without you. Meeting you is the
best thing to ever happen to me.
Finally, I dedicate this thesis to the memory of my father. All of the best parts
of who I am today can be traced back to you.
Contents
Abstract
Acknowledgments
1 Introduction
1.1 Motivation
1.2 Organization
2 Background
2.1 Channel characteristics
2.1.1 Transmission line model
2.1.2 Linear system model
2.1.3 Pulse response
2.1.4 Typical channels
2.2 Receive feedforward equalizer (RX-FFE)
2.2.1 FFE operation
2.3 Metrics of FFE performance
2.3.1 Peak-to-main ratio (PMR)
2.3.2 Eye opening equivalence
3 Analog delays for FFEs
3.1 Delay approximations
3.1.1 Ideal delay
3.1.2 Lumped delay line
3.1.3 Bessel delays
3.1.4 Pade delays
3.2 Equivalence of first-order delays
3.2.1 Theorem
3.2.2 Coefficient spread
3.2.3 Example transformation
3.3 Summary
4 Analog FFEs in high-speed links
4.1 FFE design parameters
4.2 Simulation methodology
4.3 Delay type
4.4 Delay time
4.4.1 Mathematical analysis
4.4.2 Channel characteristic dependence
4.4.3 First-order delays
4.5 Number of taps
4.6 Parasitic pole frequency
4.7 Coefficient resolution
4.8 Main cursor attenuation
4.9 Summary
5 Inverter-based FFE
5.1 Analog-inverter transconductor
5.2 Unity-gain stage
5.2.1 Gain
5.2.2 Bandwidth
5.2.3 Noise
5.2.4 Mismatch
5.2.5 Supply rejection
5.2.6 Nonlinearity
5.3 Delay
5.3.1 Single-path Pade-inspired delay
5.3.2 Comparison with two-path Pade delay
5.4 Coefficients
5.5 Summing circuit
5.6 Full FFE
5.6.1 FFE noise
5.6.2 FFE mismatch
5.7 Summary
6 FFE design
6.1 Delay
6.2 Coefficients
6.3 Summing circuit
6.4 PRBS generator
6.5 Output driver
7 Measurement results
7.1 Test setup
7.1.1 On-chip channel
7.1.2 Off-chip channel
7.2 Test debug
7.3 Measurement results
7.3.1 Pulse responses and DR improvement
7.3.2 Noise
7.3.3 Eye diagrams
7.3.4 LMS system identification method
7.4 Performance summary
8 Conclusions
8.1 Summary
8.2 Future work
A FFE coefficient optimization
A.1 Problem formulation
A.2 Brute force solution
A.3 MATLAB optimization toolbox
B Equivalence of first-order delays in FFEs
C Pade approximants
D Low-frequency nonlinearity simulation
D.1 Problem formulation
D.2 Transient simulation
D.2.1 DFT method
D.2.2 LMS method
D.3 DC simulation
E Unity-gain stage nonlinearity
E.1 Analog-inverter transconductor
E.2 Unity-gain stage
E.2.1 First-order case
E.2.2 Second-order case
E.2.3 Third-order case
E.3 Comparison with simulation
F Unity-gain stage supply rejection
F.1 Single-ended
F.2 Pseudo-differential
G Switched-capacitor tuning circuit
H Gain compression generalization
Bibliography
List of Tables
7.1 Test equipment for the measurements with the on-chip channel.
7.2 Test equipment for the measurements with the off-chip channel.
7.3 Performance summary for state-of-the-art RX-FFEs.
E.1 Transistor-level simulated Taylor coefficient values, Gjk, for the degenerated inverter transconductor load.
List of Figures
1.1 (a) Visual representation of a backplane system with transmitter, channel, and receiver (reproduced with permission from [1]) and (b) the corresponding block diagram.
1.2 Low-rate PAM2 data transmission with simple data recovery via threshold comparison.
1.3 High-rate PAM2 data transmission with errors for simple data recovery via threshold comparison.
1.4 Comparison of (a) conventional, (b) ADC-based, and (c) proposed transceiver architectures.
2.1 Insertion loss versus frequency for various channels [2].
2.2 Normalized pulse response versus time for various channels [2].
2.3 Block diagram of an n-tap RX-FFE.
2.4 Visualization of the pulse response equalization for a 5-tap RX-FFE.
2.5 Pulse response versus time at the coefficient outputs and FFE output.
2.6 Normalized pulse responses versus time at the channel output and at the FFE output.
2.7 Transmitted signal, received signal, and equalized signal for a random sequence of bits.
2.8 Normalized magnitude response versus normalized frequency of the channel, FFE, and channel+FFE.
2.9 Normalized pulse response versus time with the discrete pulse response terms labeled.
2.10 The received pulses and received signal versus time demonstrating the peak signal due to ISI.
3.1 Schematic diagram of an N-order lumped-LC approximation of a lossless transmission line.
3.2 Magnitude (top) and phase (bottom) versus frequency for LC delays of orders 1, 2, and 3.
3.3 Group delay versus frequency for LC delays of orders 1, 2, and 3.
3.4 Magnitude (top) and phase (bottom) versus frequency for Bessel delays of orders 1, 2, and 3.
3.5 Group delay versus frequency for Bessel delays of orders 1, 2, and 3.
3.6 Magnitude (top) and phase (bottom) versus frequency for Pade delays of orders 1, 2, and 3.
3.7 Group delay versus frequency for Pade delays of orders 1, 2, and 3.
3.8 The spectral norm versus the pole and zero ratio of the delay, α, for FFEs of orders 2, 3, 4, and 5.
3.9 Magnitude and phase versus frequency for first-order delays with α = 0, α = 1/3, and α = 1.
3.10 Group delay versus frequency for first-order delays with α = 0, α = 1/3, and α = 1.
3.11 FFE magnitude (top) and phase (bottom) versus frequency for α = 0, α = 1/3, and α = 1 with ideal coefficient transformations.
3.12 FFE magnitude (top) and phase (bottom) versus frequency for α = 0, α = 1/3, and α = 1 with practical coefficient transformations.
3.13 Pulse responses versus time for the channel pulse and equalized pulses after 5-tap FFEs with α = 0, α = 1/3, and α = 1 optimized with 5-bit coefficient resolution.
4.1 The 3-tap FFE block diagram with variable delay type and delay time for the MATLAB simulation of the optimal delay time.
4.2 DR improvement versus delay time for a 3-tap FFE with Bessel and Pade delay types of order 1 (solid), 2 (dashed), and 3 (dotted).
4.3 Block diagram of a 2-tap FFE with first-order Pade delays; variable delay time, τ; and optimal coefficient, c2.
4.4 DR improvement versus delay time for the 2-tap FFE in figure 4.3 for various channel pulse inputs.
4.5 DR improvement versus pole and zero ratio, α, for τ = 25 ps.
4.6 DR improvement versus pole and zero ratio, α, for τg = 25 ps.
4.7 Block diagram of an n-tap FFE with variable delay time, τ, and optimal coefficients, c1 to cn.
4.8 DR improvement versus delay time for 3-tap, 4-tap, and 5-tap FFEs with first-order Pade delays.
4.9 Block diagram of an n-tap FFE with variable parasitic pole frequency, fp, at each node.
4.10 DR improvement versus fp for 3-tap, 4-tap, and 5-tap FFEs with first-order Pade delays.
4.11 DR improvement versus coefficient resolution (plus sign bit) for 3-tap, 4-tap, and 5-tap FFEs with first-order Pade delays.
4.12 DR improvement versus main cursor amplitude for 3-tap, 4-tap, and 5-tap FFEs with first-order Pade delays.
5.1 (a) The analog-inverter transconductor and (b) the associated transistor-level schematic diagram.
5.2 Example circuits using the analog-inverter transconductor.
5.3 Schematic diagram of the unity-gain stage with parasitics.
5.4 Schematic diagram of the inverter-based first-order Pade delay.
5.5 Block diagram of the buffered inverter-based first-order Pade delay.
5.6 Schematic diagram of the buffered inverter-based first-order delay.
5.7 Schematic diagrams of (a) the single-path Pade-inspired delay of this work and (b) the two-path Pade delay [3].
5.8 Half-circuit schematic diagram of the inverter-based coefficient.
5.9 Half-circuit schematic diagram of the inverter-based summing circuit.
5.10 Half-circuit schematic diagram of the inverter-based FFE.
6.1 The single-path Pade-inspired delay schematic diagram with triode-degenerated load transconductor.
6.2 Half-circuit schematic diagram for the reduced input capacitance 5-bit coefficient.
6.3 Block diagram of the on-chip signal generator including a PRBS generator and LVDS conversion stage.
6.4 Half-circuit schematic diagram of the output driver.
7.1 Die photo of the proof-of-concept IC fabricated in the TSMC40 GP process.
7.2 Test PCB photo and (inset) chip-on-board bonding.
7.3 Test signal paths.
7.4 Measured eye diagram for a 20 Gb/s PRBS signal for the first chip revision.
7.5 Post-layout simulated eye diagram for a 20 Gb/s PRBS signal with additional supply resistance.
7.6 Measured normalized pulse response for the 0.5 m FR4 PCB trace channel and the channel+FFE.
7.7 Normalized PRBS response generated from the pulse responses in figure 7.6.
7.8 Measured and simulated integrated noise voltage versus coefficient value.
7.9 Eye diagrams for the on-chip channel measurements.
7.10 Measured normalized pulse response with and without the FFE equalization.
7.11 Measured normalized PRBS response with and without the FFE equalization.
7.12 Block diagram of the LMS algorithm system identification.
7.13 Impulse response for the bench and simulated channel response for the on-chip channel test.
7.14 Impulse response for the bench and simulated equalized response for the on-chip channel test.
7.15 Measured frequency response of the channel and channel+FFE for the on-chip channel test.
7.16 Simulated frequency response of the channel and channel+FFE for the on-chip channel test.
7.17 Signal distortion ratio versus input signal variance comparing sinusoid and PRBS signals.
D.1 DFT of io[n] with A1 = A2 = 10 mV.
E.1 Schematic diagram of the inverter transconductor for nonlinearity analysis.
E.2 Schematic diagram of the unity-gain stage for nonlinearity analysis.
F.1 Schematic diagram of the unity-gain stage for supply rejection analysis.
G.1 The single-path Pade-inspired delay schematic diagram with gate-voltage tunable triode-degenerated load transconductor.
G.2 Contour lines of constant gain and common mode and 50 point Monte Carlo simulation of converged bias voltages for the switched-capacitor circuit in figure G.4.
G.3 Monte Carlo simulation of the contour lines of constant common mode with the output common mode forced to (top) half of the supply and (bottom) the natural common mode.
G.4 (a) The schematic diagram for the switched-capacitor gain and common mode replica tuning circuit and (b) the associated clock phase diagram.
Chapter 1
Introduction
1.1 Motivation
Each day, more people in more places are accessing more data at higher rates, and
data centers need to adapt to keep up with the demand. User data is stored with
ever-increasing density in server hardware and, to compound the problem, it is being
accessed more frequently. But the number of wireline links to this information is
limited by the planar structure of the PCBs on which they reside. Therefore, meeting
this increased demand requires increasing the data rates of these already very
high-speed links. The 100 Gigabit Ethernet standard (100GbE) as defined in IEEE 802.3bj
calls for four lanes at 25 Gb/s [4] and the feasibility of 400GbE is currently being
investigated [5]. Substantial innovations will be necessary to enable this standard
and other future standards. To add to the challenge, data centers are already very
power hungry and require elaborate cooling systems. Therefore, the increased data
rate cannot come with a corresponding increase in power consumption, which requires
a disruption to the classical power and speed relationship.
Short-range communication of data at high rates is typically performed with
high-speed transceivers over backplane systems as depicted in figure 1.1. In such
systems, data is sent by a transmitter through the channel, which consists of the PCB
trace, connectors, etc., and is recovered at the receiver. The data is conventionally
encoded in pulse amplitudes, which is referred to as pulse amplitude modulation (PAM).
Encoding 1 bit per symbol requires two amplitude levels and is referred to as PAM2 or
non-return to zero (NRZ). A low-rate example of this method is shown in figure 1.2.
In this case, the received signal resembles the transmitted signal and the data can
be recovered by simply comparing to a threshold (i.e., slicing). This method is only
viable up to a few gigabits per second [6]. For higher-rate transmissions, each pulse
is substantially dispersed by the channel and intersymbol interference (ISI) occurs.
An example of this is shown in figure 1.3 for the same transmit data and channel
response as in figure 1.2 but with a 4× increased data rate. For this case, slicing
without equalization results in errors in the output data. The source of the ISI is
the channel loss at frequencies at and below the Nyquist frequency, which
becomes more problematic at higher data rates. Data recovery in the presence of ISI
is the primary challenge in advancing the data rate and many techniques have been
employed to compensate for the channel loss.
[Block diagram: Data In → TX → Channel → RX → Data Out]
Figure 1.1: (a) Visual representation of a backplane system with transmitter, channel, and receiver (reproduced with permission from [1]) and (b) the corresponding block diagram.
[Voltage-versus-time waveforms: TX …100001010111… → Channel → RX …100001010111…]
Figure 1.2: Low-rate PAM2 data transmission with simple data recovery via threshold comparison.
[Voltage-versus-time waveforms after a 4× rate increase: TX …100001010111… → Channel → RX …000000?11111…, with errors]
Figure 1.3: High-rate PAM2 data transmission with errors for simple data recovery via threshold comparison.
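The contrast between the low-rate and high-rate cases can be reproduced with a short numerical sketch. It is not taken from the thesis: the channel is assumed to be a single-pole low-pass filter (real backplane channels have more complicated loss), and the names slice_after_channel, samples_per_bit, and tau_samples are hypothetical. The bit pattern is the one shown in figures 1.2 and 1.3, and the high-rate case uses a 4× shorter bit period.

```python
import numpy as np

def slice_after_channel(bits, samples_per_bit, tau_samples):
    """Recover bits by threshold slicing after an assumed one-pole low-pass channel."""
    symbols = 2.0 * np.asarray(bits, dtype=float) - 1.0    # PAM2: 0 -> -1, 1 -> +1
    tx = np.repeat(symbols, samples_per_bit)                # rectangular NRZ pulses
    a = np.exp(-1.0 / tau_samples)                          # pole of the discrete RC model
    rx = np.empty_like(tx)
    state = 0.0
    for n, x in enumerate(tx):
        state = a * state + (1.0 - a) * x                   # y[n] = a*y[n-1] + (1-a)*x[n]
        rx[n] = state
    samples = rx[samples_per_bit - 1::samples_per_bit]      # sample at the end of each bit
    return (samples > 0.0).astype(int)                      # slice against a zero threshold

bits = [1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1]                 # pattern from figures 1.2 and 1.3
low_rate = slice_after_channel(bits, samples_per_bit=32, tau_samples=16.0)
high_rate = slice_after_channel(bits, samples_per_bit=8, tau_samples=16.0)   # 4x faster

low_rate_errors = int(np.sum(low_rate != np.array(bits)))
high_rate_errors = int(np.sum(high_rate != np.array(bits)))
print("low-rate errors:", low_rate_errors)
print("high-rate errors:", high_rate_errors)
```

At the low rate each pulse settles almost fully within one bit period, so slicing recovers every bit. At the 4× rate the isolated 1 that follows the run of four 0s never crosses the threshold, which is the kind of error figure 1.3 illustrates.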
The conventional transceiver architecture is depicted in figure 1.4(a). Because linear
time-invariant operations commute, equalization is in principle equally effective
before and after the channel. Before the channel, a pre-emphasis transmit feedforward
equalizer (TX-FFE) introduces high-frequency peaking to the transmit signal to
counteract the de-emphasis of the channel. The channel ISI is only observable at the
receiver, which complicates the adaptation by requiring a back channel from the
receiver to the transmitter [6]. Also, the equalization performance is limited by the
peak signal that can be delivered by the circuit. As a consequence, additional
equalization is required after the channel. On the receive side, the continuous-time
linear equalizer (CTLE) boosts the high-frequency signal content similar to the
TX-FFE, but because the channel has already attenuated the signal, the CTLE is not
peak-power limited. The CTLE’s limitation is in its adaptability. In most cases, the
CTLE is blind to the channel and the other equalizer blocks must adapt around it.
Also, the high-frequency peaking boosts the signal and noise indiscriminately,
resulting in a noise penalty. This is where the decision feedback equalizer (DFE)
outperforms all other equalizers. The DFE is an equalization block in which the
noiseless recovered bits are scaled and subtracted from the signal to remove the
residual post-cursor ISI without introducing additional noise. However, the DFE cannot
correct for pre-cursor ISI due to causality constraints. Therefore, the combined
equalization of the TX-FFE and CTLE must sufficiently suppress the pre-cursor ISI
while the DFE can clean up the remaining post-cursor ISI.
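The DFE operation described above (scale the already-recovered bits and subtract them from the incoming signal) can be sketched at the symbol rate as follows. This is an illustrative model, not the thesis implementation: the pulse response values [1.0, 0.7, 0.5] are invented for the example, the DFE taps are assumed to match the post-cursors exactly, noise is omitted, and the function names received_samples and dfe_slice are hypothetical.

```python
import numpy as np

def received_samples(symbols, pulse):
    """Symbol-rate channel model: r[k] = sum_j pulse[j] * s[k-j]."""
    return np.convolve(symbols, pulse)[:len(symbols)]

def dfe_slice(rx, post_cursors):
    """Slice each sample after subtracting ISI rebuilt from past decisions."""
    history = [0.0] * len(post_cursors)           # most recent decision first
    decisions = []
    for sample in rx:
        isi = sum(c * d for c, d in zip(post_cursors, history))
        decision = 1.0 if sample - isi > 0.0 else -1.0
        decisions.append(decision)
        history = [decision] + history[:-1]       # shift in the new decision
    return np.array(decisions)

bits = [1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
symbols = 2.0 * np.array(bits, dtype=float) - 1.0  # PAM2 levels -1 / +1
pulse = [1.0, 0.7, 0.5]                            # assumed: main cursor, two post-cursors

rx = received_samples(symbols, pulse)
plain = np.where(rx > 0.0, 1.0, -1.0)              # slicing without equalization
equalized = dfe_slice(rx, post_cursors=pulse[1:])  # DFE taps equal to the post-cursors

plain_errors = int(np.sum(plain != symbols))
dfe_errors = int(np.sum(equalized != symbols))
print("errors without DFE:", plain_errors)
print("errors with DFE:", dfe_errors)
```

Because the feedback uses decisions rather than the noisy analog signal, the subtraction adds no noise. The loop also shows why only post-cursor ISI can be removed: the correction for sample k depends only on decisions made before k.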
[Panel (a), Conventional Architecture: TX-FFE (peak power constrained) → Channel → CTLE (blind equalization) → DFE]
[Panel (b), 1st Generation ADC Architecture: simple TX → Channel → ADC (high power) → Digital FFE and DFE]
[Panel (c), Proposed ADC Architecture: simple TX → Channel → RX-FFE (this work) → ADC (reduced power) → DSP (reduced complexity)]
Figure 1.4: Comparison of (a) conventional, (b) ADC-based, and (c) proposed transceiver architectures.
The conventional transceiver architecture has proven effective in sustaining the
advances in data rates demanded by the evolution of the standards until only recently.
A limitation has been reached in the capacity of the channel for PAM2 signaling. For
many applications, advancing the symbol rate only adds signal content at frequencies
where the channel loss is greater than 40 dB [2], and a recent survey of wireline
transceivers found a trend for a 2× decrease in power efficiency for each additional
6 dB in channel loss at Nyquist [7]. For these reasons, scaling the symbol rate is no
longer a viable option for advancing the data rate. Instead, it is better to make use
of the SNR readily available at the lower frequencies and increase the number of bits
per symbol. Modulating with four pulse amplitude levels encodes 2 bits per symbol
and is referred to as PAM4. While this method can enable higher data rates, there is
a penalty in terms of system complexity. It is no longer possible to recover the data
with a single slicer and more complex data recovery circuits must be employed.
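As a small illustration of the PAM4 encoding described above, the sketch below maps bit pairs to four amplitude levels and recovers them with three decision thresholds, which is why a single slicer no longer suffices. The Gray-coded level assignment, the normalized levels of ±1 and ±3, and the function names pam4_encode and pam4_decode are assumptions made for the example and are not specified in the text.

```python
# Assumed Gray-coded mapping: adjacent levels differ in exactly one bit.
GRAY_LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}
LEVEL_TO_BITS = {level: pair for pair, level in GRAY_LEVELS.items()}

def pam4_encode(bits):
    """Map consecutive bit pairs to one of four amplitude levels (2 bits per symbol)."""
    if len(bits) % 2 != 0:
        raise ValueError("PAM4 needs an even number of bits")
    return [GRAY_LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

def pam4_decode(samples):
    """Recover bits using three thresholds at -2, 0, and +2."""
    bits = []
    for value in samples:
        if value < -2:
            level = -3
        elif value < 0:
            level = -1
        elif value < 2:
            level = 1
        else:
            level = 3
        bits.extend(LEVEL_TO_BITS[level])
    return bits

bits = [1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
symbols = pam4_encode(bits)
recovered = pam4_decode(symbols)
print(symbols)      # six symbols carry the twelve bits
print(recovered)
```

For the same data rate, the symbol rate is halved relative to PAM2, at the cost of smaller spacing between levels for a given peak amplitude.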
One such PAM4 receiver architecture that is gaining popularity is the ADC-based
architecture depicted in figure 1.4(b) [8, 9]. In this architecture, the received signal
is quantized by a high-speed ADC and the equalization is completed in the digital
domain with digital signal processing (DSP). The advantage of this architecture is in
the capability and portability of DSP, and the disadvantage is in the power
consumption of the high-speed ADC. The required resolution of the ADC is
determined by the degree of equalization necessary in the DSP. For example, a 2 bit ADC
can completely recover the data with no post-processing necessary for a well-behaved
channel that introduces insignificant ISI. For systems with substantial channel loss,
an ADC resolution up to 8 bits can be necessary [9]. Therefore, it is desirable to
perform some equalization in the analog front end to reduce the resolution requirements
of the ADC and the complexity of the DSP [10].
A CTLE can be used to perform some equalization prior to digitization, but
recent work has shown that an analog receive-side feedforward equalizer (RX-FFE)
prior to the ADC can further reduce the required ADC resolution while achieving
the same bit error rate (BER) [10]. The proposed transceiver architecture with the
RX-FFE is shown in figure 1.4(c). The objective of the RX-FFE is not to completely
equalize the channel and open the eye, but instead to reduce the dynamic range of the
signal resulting in a relaxation of the ADC resolution requirement. This can result in
substantial power savings for the system. For example, the 10 GS/s 6 bit ADC in [8]
consumes 143 mW of power. A reduction in the required resolution by 1 bit reduces
the ADC power by at least 2× for a saving of over 70 mW. Unfortunately, the power
consumption of previous state-of-the-art RX-FFEs exceeds this value, nullifying any
potential improvements in system performance [3, 11]. In this work, we present an
RX-FFE implemented with low power consumption, low noise, and small chip area
that outperforms these state-of-the-art designs, enabling the RX-FFE for this target
application.
1.2 Organization
The remainder of the thesis is organized as follows:
• Chapter 2 covers some background material that is prerequisite to the work in
this thesis. In §2.1.1, we discuss the characteristics of wireline channels. In
§2.2, we present the architecture and operation of the RX-FFE. In §2.3, we
introduce a metric to measure the performance of an equalizer with respect to
ADC resolution relaxation.
• Chapter 3 looks at analog delays for RX-FFEs. In §3.1, we consider the prop-
erties of various delay approximations with a focus on the Pade delay employed
in this work. In §3.2.1, we prove the equivalence of first-order delays in the
application of RX-FFEs and consider the practical limitations of the theorem.
• Chapter 4 studies the design space of RX-FFEs through MATLAB simulations.
The impact on performance is investigated for delay type, delay time, number of
taps, parasitic bandwidth, coefficient resolution, and main-cursor attenuation.
The results from this chapter are used to guide architecture choices in chapter
5 and design decisions in chapter 6.
• Chapter 5 introduces the inverter-based RX-FFE. In §5.1, the analog-inverter
transconductor is discussed. In §5.2, we cover the performance equations for the
inverter-based unity-gain stage on which the performance of the FFE depends.
In §5.3, we introduce the single-path Pade delay used in this work. Finally, we
end the chapter with an overview of the complete inverter-based RX-FFE in
§5.6.
• Chapter 6 presents some of the design challenges and solutions for the proof-of-
concept integrated circuit.
• Chapter 7 covers the test procedures and provides the measurement results for
the proof-of-concept integrated circuit. In §7.3.1, the measured ADC resolution
relaxation performance of the RX-FFE is presented. In §7.4, a performance
summary is given, comparing the proof-of-concept RX-FFE performance to
previous state-of-the-art designs.
• Finally, chapter 8 concludes the thesis and outlines future work directions.
Chapter 2
Background
2.1 Channel characteristics
In this section, we describe the channel characteristics for typical backplane systems.
We start with transmission line theory to understand the frequency dependence of
the channel loss. Then we introduce some typical channels that demonstrate these
characteristics. These channels are similar to the channel for the target application
and are used as a reference for FFE MATLAB simulations in chapter 4.
2.1.1 Transmission line model
The channel of a backplane system can be modeled as a lossy transmission line. For
a single frequency of excitation, the forward-propagating wave at the position z and
time t has the form [12]
$$ V(z, t) = V^{+} e^{-\alpha z} \cos(\omega t - \beta z) \tag{2.1} $$
where V + is the wave amplitude, β is the phase constant, ω is the angular frequency,
and α is the attenuation constant. The attenuation constant is the sum
of the conduction loss attenuation constant, αc, and the dielectric loss attenuation
constant, αd:
$$ \alpha = \alpha_c + \alpha_d = \underbrace{\tfrac{1}{2} R Z_o^{-1}}_{\alpha_c} + \underbrace{\tfrac{1}{2} G Z_o}_{\alpha_d} \tag{2.2} $$
where Zo is the characteristic impedance, R is the distributed line resistance, and G
is the distributed line conductance. In the ideal case, where there is no frequency
dependence of R or G, the loss is a constant factor $e^{-\alpha l}$ across all frequencies, where
l is the length of the transmission line. The frequency dependence comes from the
frequency dependence of the terms R and G, the source of which is covered in the
following sections.
Conduction loss
Conduction loss is dominated by the skin effect. In a good conductor, the wave
amplitude is concentrated at the surface and decreases by a factor $e^{-d/\delta}$ at depth d.
Because of the exponential nature of the decay, the field and associated currents are
concentrated in the first skin depth, δ. The expression for the skin depth is [12]
$$ \delta = \frac{1}{\sqrt{\pi f \mu \sigma}} \tag{2.3} $$
where μ is the permeability and σ is the conductivity of the conductor.
Resistance is inversely proportional to this depth resulting in the proportionality
$$ \alpha_c \propto R \propto \sqrt{f}. \tag{2.4} $$
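As a rough numerical illustration of (2.3) and (2.4), the following Python sketch evaluates the skin depth of a copper conductor at a few frequencies. The copper conductivity and the use of the free-space permeability are textbook assumptions chosen for illustration; they are not values taken from this work.

```python
import math

MU_0 = 4e-7 * math.pi   # free-space permeability in H/m (assumed for copper)
SIGMA_CU = 5.8e7        # copper conductivity in S/m (assumed textbook value)

def skin_depth(f_hz, mu=MU_0, sigma=SIGMA_CU):
    """Skin depth in meters from (2.3): delta = 1 / sqrt(pi * f * mu * sigma)."""
    return 1.0 / math.sqrt(math.pi * f_hz * mu * sigma)

if __name__ == "__main__":
    for f_ghz in (1, 4, 16):
        delta_um = skin_depth(f_ghz * 1e9) * 1e6
        print(f"f = {f_ghz:2d} GHz: skin depth = {delta_um:.2f} um")
```

Each 4× increase in frequency halves the skin depth, which doubles the resistance and is the square-root dependence stated in (2.4).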
Dielectric loss
Dielectric loss is due to dissipation of electromagnetic energy in the dielectric in the
form of heat. As opposed to the conduction loss, dielectric loss is directly proportional
to frequency [13]:
αd ∝ f. (2.5)
Therefore, at lower frequencies the conduction loss is dominant, but as frequencies
increase the dielectric loss begins to dominate. The value of the coefficient, αd, is
dependent on the dielectric in the backplane. FR4 is a low-cost glass-reinforced epoxy
that serves as the dielectric in most printed circuit boards (PCBs). MEGTRON6 is
a low-loss alternative that is gaining popularity, but it is more expensive than FR4.
A comparison of their performance is made in §2.1.4.
Total loss
The total loss of the channel is referred to as the insertion loss (IL). This loss is the
sum of the conduction loss, dielectric loss, and frequency-independent loss. Therefore,
a good model for the insertion loss over frequency is (in terms of dB)
$$ IL(f) = a_0 + a_1 \sqrt{f} + a_2 f. \tag{2.6} $$
As an example, the IEEE standard 802.3bj for 100GBASE-KR4 calls for a maximum
insertion loss of (with f in GHz) [4]
IL(f) ≤ 1.5 + 4.6√f + 1.318f. (2.7)
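The model in (2.6) and the limit in (2.7) are simple to evaluate numerically. The sketch below is a minimal illustration; the function names are hypothetical, and the 12.89 GHz evaluation point is an assumed example frequency, not a value quoted in this text.

```python
import math

def insertion_loss_db(f_ghz, a0, a1, a2):
    """Insertion-loss model of (2.6) in dB, with f in GHz."""
    return a0 + a1 * math.sqrt(f_ghz) + a2 * f_ghz

def kr4_max_loss_db(f_ghz):
    """Maximum insertion loss of (2.7) in dB, with f in GHz."""
    return insertion_loss_db(f_ghz, a0=1.5, a1=4.6, a2=1.318)

if __name__ == "__main__":
    for f in (1.0, 5.0, 12.89):
        il = kr4_max_loss_db(f)
        gain = 10 ** (-il / 20)  # corresponding linear magnitude
        print(f"f = {f:5.2f} GHz: IL <= {il:5.2f} dB, |H| >= {gain:.4f}")
```

At low frequencies the square-root term dominates the sum, while the linear term catches up as the frequency increases, consistent with the discussion of conduction and dielectric loss above.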
2.1.2 Linear system model
A simplification of the transmission line model is possible by assuming that any
non-ideal effects of the transmission line are negligible (e.g. reflections due to load
mismatch) and expressing the frequency dependent channel loss as a linear transfer
function. Let H(f) be the channel transfer function based on the IL(f) with the
magnitude response
$$ |H(f)| = 10^{-IL(f)/20} \tag{2.8} $$
and the associated impulse response
$$ h(t) = \mathcal{F}^{-1}\{H(f)\}. \tag{2.9} $$
Assuming the high-frequency dielectric loss dominates, then
$$ |H(f)| = e^{-\kappa |2\pi f|} \tag{2.10} $$
where $\kappa = \frac{\ln(10)}{2\pi} a_2$. We can deconstruct H(f) into the symmetric and antisymmetric
components, Hsym(f) and Hasym(f):
$$ H_{sym}(f) = e^{-\kappa |2\pi f|} \tag{2.11} $$
$$ H_{asym}(f) = j e^{-\kappa |2\pi f|} \left( u(f) - u(-f) \right) \tag{2.12} $$
where u(f) is the Heaviside unit step function. The inverse Fourier transforms of the
symmetric and antisymmetric components are the even and odd portions of h(t), which can be expressed as
$$ h_e(t) = \mathcal{F}^{-1}\{H_{sym}(f)\} = \frac{1}{\pi\kappa} \frac{1}{(t/\kappa)^2 + 1} \tag{2.13} $$
$$ h_o(t) = \mathcal{F}^{-1}\{H_{asym}(f)\} = \frac{1}{\pi\kappa} \frac{(t/\kappa)}{(t/\kappa)^2 + 1}. \tag{2.14} $$
For the simple case where the even and odd contributions are equal, the channel
impulse response has the form
$$ h(t) = \frac{1}{\pi\kappa} \frac{(t/\kappa) + 1}{(t/\kappa)^2 + 1}. \tag{2.15} $$
Because κ ∝ a2, increasing loss results in increased pulse dispersion. An increase
in a2 can occur due to the properties of the dielectric or due to increasing channel
length. Some example channels that exhibit these properties are considered in §2.1.4.
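The link between κ and pulse dispersion can be checked numerically. The following sketch evaluates the even part (2.13) and the combined model (2.15), and measures the full width at half maximum of the even part, which should scale directly with κ. The κ values are arbitrary illustrative choices in picoseconds.

```python
import math

def h_even(t, kappa):
    """Even part of the impulse response, (2.13)."""
    x = t / kappa
    return 1.0 / (math.pi * kappa) / (x * x + 1.0)

def h_model(t, kappa):
    """Impulse response of (2.15): equal even and odd contributions."""
    x = t / kappa
    return (x + 1.0) / (math.pi * kappa) / (x * x + 1.0)

def fwhm_even(kappa, dt=1e-3):
    """Numerically measure the full width at half maximum of h_even."""
    half = 0.5 * h_even(0.0, kappa)
    t = 0.0
    while h_even(t, kappa) > half:
        t += dt
    return 2.0 * t

if __name__ == "__main__":
    for kappa in (10.0, 20.0):  # assumed values, in ps
        print(f"kappa = {kappa:4.1f} ps: FWHM of h_e = {fwhm_even(kappa):.2f} ps, "
              f"h(0) = {h_model(0.0, kappa):.4f} 1/ps")
```

Doubling κ doubles the width of the even part and halves its peak, which is the dispersion described above.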
2.1.3 Pulse response
As described in §2.1.2, the channel can be modeled as a linear system characterized
with the impulse response h(t) such that
rx(t) = tx(t) ∗ h(t) (2.16)
where tx(t) is the signal transmitted over the channel and rx(t) is the signal received
after the channel. The pulse response of the channel, p(t), is defined as the received
signal for a transmitted unit pulse
p(t) = rect(t/T ) ∗ h(t) (2.17)
where
$$ \mathrm{rect}(t/T) = \begin{cases} 1 & |t| \le T/2 \\ 0 & |t| > T/2 \end{cases} \tag{2.18} $$
is the unit pulse of width T . We are interested in the received signal sampled at the
baud rate, which we represent as
p[n] = p(nT + τ) (2.19)
where τ is chosen such that max(p(t)) = p(τ) resulting in
p[0] = max(p(t)). (2.20)
Notice that τ is essentially the delay of the channel. The term T is also known as the
unit interval (UI) and p[0] is the main cursor. Ideally, p[n] = 0 for all n ≠ 0, and
when this is not the case, inter-symbol interference (ISI) is said to occur. The non-zero
terms for n < 0 are referred to as pre-cursors and are components from future
symbols that interfere with the present symbol. Similarly, the non-zero terms with
n > 0 are post-cursors, which are components from past symbols that interfere with
the present symbol. The pulse responses for some example channels are considered in
§2.1.4.
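The definitions in (2.17) through (2.20) can be sketched in a few lines of Python. The example below assumes the channel model of (2.15), for which the convolution with the unit pulse has a closed form obtained by integrating h(t) over one pulse width. The values T = 50 ps and κ = 20 ps, and the function names, are hypothetical choices for illustration only.

```python
import math

def pulse(t, T, kappa):
    """p(t) = rect(t/T) * h(t) for the model h(t) of (2.15), in closed form."""
    a = (t - T / 2) / kappa
    b = (t + T / 2) / kappa
    even = (math.atan(b) - math.atan(a)) / math.pi
    odd = math.log((1 + b * b) / (1 + a * a)) / (2 * math.pi)
    return even + odd

def sampled_pulse(T, kappa, n_pre=2, n_post=4, dt=0.01):
    """Return (tau, {n: p[n]}) with tau chosen so that p[0] is the peak."""
    # Coarse search for the peak of p(t) over a window of +/- 2 UI around t = 0.
    steps = int(4 * T / dt)
    grid = [-2 * T + i * dt for i in range(steps + 1)]
    tau = max(grid, key=lambda t: pulse(t, T, kappa))
    samples = {n: pulse(n * T + tau, T, kappa) for n in range(-n_pre, n_post + 1)}
    return tau, samples

if __name__ == "__main__":
    tau, p = sampled_pulse(T=50.0, kappa=20.0)  # times in ps
    print(f"tau = {tau:.2f} ps")
    for n in sorted(p):
        kind = "main cursor" if n == 0 else ("pre-cursor" if n < 0 else "post-cursor")
        print(f"p[{n:2d}] = {p[n]:+.3f}  ({kind})")
```

For this model the post-cursors form a long positive tail, while the pre-cursors are small, so most of the ISI comes from past symbols.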
2.1.4 Typical channels
The insertion loss versus frequency is plotted in figure 2.1 for channels with the
following properties [2]:
• 0.76 m PCB trace with MEGTRON6 dielectric
• 0.76 m PCB trace with FR4 dielectric
• 1.09 m PCB trace with FR4 dielectric.
Figure 2.1: Insertion loss (dB) versus frequency (GHz) for the 0.76 m MEGTRON6, 0.76 m FR4, and 1.09 m FR4 channels [2].
Figure 2.2: Normalized pulse response versus time (ps) for the 0.76 m MEGTRON6, 0.76 m FR4, and 1.09 m FR4 channels [2].
For the same dielectric, increasing the channel length increases the loss, as expected.
The channel with the MEGTRON6 dielectric has substantially less attenuation than
the FR4 channel of the same length, demonstrating the improvement in the dielectric
loss coefficient. For each of these channels, the pulse response is simulated with an
input pulse width T = 50 ps corresponding to a 20 GBd symbol rate (see footnote 1). The normalized
output is plotted in figure 2.2. Higher channel loss corresponds with increased pulse
dispersion, as expected. These channels are characteristic of the channel for the target
application and are used as a reference for FFE MATLAB simulations in chapter 4.
2.2 Receive feedforward equalizer (RX-FFE)
The analog RX-FFE is an equalization block that consists of continuous-time analog
delays, adaptable coefficients, and a summing circuit (see figure 2.3) [14]. It comple-
ments the blind equalization of the CTLE with its agility from the adaptability of
its coefficients. The adaptation of the coefficients is simplified as compared to the
TX-FFE because no back channel is required and it is compatible with well known
adaptation schemes such as the LMS algorithm [6].
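To illustrate why adaptation at the receiver is straightforward, the sketch below runs a textbook LMS update on a discrete-time, T-spaced FFE with known PAM2 training symbols. It is not the adaptation used in this work: the two-tap channel, step size, and function name are hypothetical, and the analog delays are idealized as exact 1 UI delays.

```python
import random

def lms_ffe(channel, n_taps=5, main=0, mu=0.01, n_sym=5000, seed=1):
    """Adapt T-spaced FFE taps with LMS using known PAM2 training symbols."""
    rng = random.Random(seed)
    taps = [0.0] * n_taps
    taps[main] = 1.0  # start from a pass-through equalizer
    tx = [rng.choice((-1.0, 1.0)) for _ in range(n_sym)]
    # Received samples: discrete convolution of the symbols with the channel pulse.
    rx = [sum(h * tx[n - k] for k, h in enumerate(channel) if n - k >= 0)
          for n in range(n_sym)]
    sq_err = []
    for n in range(n_taps, n_sym):
        window = [rx[n - i] for i in range(n_taps)]    # delay-line contents
        y = sum(c * x for c, x in zip(taps, window))   # FFE output
        err = tx[n - main] - y                         # error against training symbol
        taps = [c + mu * err * x for c, x in zip(taps, window)]
        sq_err.append(err * err)
    return taps, sq_err

if __name__ == "__main__":
    taps, sq_err = lms_ffe(channel=[1.0, 0.5])  # main cursor plus one post-cursor
    print("adapted taps:", [round(c, 3) for c in taps])
    print("mean squared error, first 100 symbols:", sum(sq_err[:100]) / 100)
    print("mean squared error, last 500 symbols: ", sum(sq_err[-500:]) / 500)
```

Every quantity used in the update is available at the receiver, which is why no back channel is needed.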
Footnote 1: Bd is the unit symbol for baud, which is the unit of symbol rate.
The challenge is in the implementation of the analog delays. A digital FFE placed
after the ADC is relatively simple to implement, but the penalty is in
the required complexity of the ADC. Performing some of the equalization in the
analog front-end relaxes the resolution requirements of the ADC by reducing the
signal dynamic range (see §2.3.1 for a detailed discussion). In addition, a digital
FFE boosts the high-frequency ADC noise which is an issue that is mitigated by
the RX-FFE. For these reasons, it is worthwhile to trade power and complexity in
the RX-FFE for a reduction in power and complexity of the ADC and DSP. Previous
state-of-the-art designs succeeded in demonstrating the innate agility of the RX-FFE’s
equalization capabilities, but they came up short in terms of power, noise, and area
performance necessary for practical systems [3, 11]. The primary focus of this work is
the design of an RX-FFE with low power, low noise, and low area for ADC-resolution
relaxation in ADC-based links. Throughout the remainder of this text we will refer
to the RX-FFE simply as “FFE” and make a distinction only in the other cases.
2.2.1 FFE operation
In this section, we discuss the details of FFE operation with an example. Consider
the 5-tap FFE with ideal delays, coefficients, and summing circuit that is depicted
in figure 2.4. The transmitted pulse is dispersed by the channel before arriving at
the FFE input. This signal is then delayed by the analog delays resulting in a family
of five pulses, all offset in time relative to one another. These pulses are scaled and
summed to create the equalized output pulse.
Figure 2.3: Block diagram of an n-tap RX-FFE.
Figure 2.4: Visualization of the pulse response equalization for a 5-tap RX-FFE.
Figure 2.5: Pulse response versus time (UI) at the coefficient outputs and FFE output.
Figure 2.6: Normalized pulse responses versus time (UI) at the channel output and at the FFE output.
Figure 2.7: Transmitted signal, received signal, and equalized signal versus time (UI) for a random sequence of bits.
For this example, we fixed c2 = 1 as the main tap. It is intuitive that c1 can
be sized to remove the first pre-cursor and c3 can be chosen to remove the first
post-cursor, etc. The problem is complicated by the ISI of the summed pulses, but
adaptation schemes exist that can optimize the coefficients [6]. The optimization
method used for this example is outlined in appendix A. The pulses scaled by these
optimal coefficients are shown in the schematic in figure 2.4 as well as in the plot in
figure 2.5. The first post-cursor is subtracted by the third-tap pulse scaled by the
large negative value of c3. The other coefficients are relatively small and c5 is nearly
zero, suggesting that a 4-tap FFE would be sufficient for this example channel. The
sum of the pulses (i.e. the FFE output) is also shown in figure 2.5. The main-cursor
amplitude is attenuated as compared to the input pulse, which is a consequence of
FFEs with coefficients constrained to less than unity. The improvement is most easily
observed in the normalized pulse responses in figure 2.6. As compared to the channel
pulse, the ISI has been substantially reduced in the equalized pulse response.
Figure 2.7 shows the transmit signal, received signal, and equalized signal with
normalized main-cursor amplitude for a random sequence of bits. The reduction in
signal dynamic range is readily apparent from this plot. A more detailed mathematical
analysis of this concept is covered in §2.3.1.
Figure 2.8 shows the normalized magnitude response of the channel, FFE, and
channel+FFE. The attenuation of the channel is accurately inverted by the transfer
function of the FFE up to the Nyquist frequency, explaining the reduction in ISI in
the time domain (see figure 2.6).
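The scale-and-sum operation described in this section can be sketched in discrete time by treating each delay as an ideal 1 UI delay acting on the baud-rate samples of the pulse. The pulse samples and coefficients below are hypothetical and are not the values behind figures 2.4 through 2.7.

```python
def ffe_output(pulse, coeffs):
    """Sum of delayed, scaled copies of the sampled input pulse (ideal 1-UI delays)."""
    out = [0.0] * (len(pulse) + len(coeffs) - 1)
    for i, c in enumerate(coeffs):      # tap i sees the pulse delayed by i UI
        for n, p in enumerate(pulse):
            out[n + i] += c * p
    return out

if __name__ == "__main__":
    # Hypothetical channel pulse: one pre-cursor, main cursor, two post-cursors.
    pulse = [0.1, 1.0, 0.5, 0.2]
    # Hypothetical coefficients with the second tap as the main tap.
    coeffs = [-0.1, 1.0, -0.5, 0.05]
    out = ffe_output(pulse, coeffs)
    print("equalized pulse:", [round(v, 3) for v in out])
```

In this example the main cursor drops from 1.0 to 0.9 while the pre-cursor and first post-cursor are nearly cancelled, mirroring the main-cursor attenuation and ISI reduction noted above.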
2.3 Metrics of FFE performance
2.3.1 Peak-to-main ratio (PMR)
Figure 2.8: Normalized magnitude response (dB) versus normalized frequency (1/UI) of the channel, FFE, and channel+FFE.
Pulse amplitude modulation (PAM) is a modulation scheme in which the bits are
encoded in the amplitude of the transmitted pulses. Because the channel is a linear
system, linear superposition holds and the receive signal is
$$ rx[n] = \sum_{k=-\infty}^{\infty} b_k \, p[n-k] \tag{2.21} $$
where bk is the pulse amplitude chosen from a set of symbols. For PAM2 (also referred
to as NRZ), bk ∈ {−1, 1} and for PAM4, bk ∈ {−1, −1/3, 1/3, 1}. In either case,
the maximum possible received signal occurs when the contribution from each of the
pre-cursors and post-cursors adds constructively. This is referred to as the peak signal
and is mathematically represented as
$$ \mathrm{peak} = \sum_{k=-\infty}^{\infty} |p[k]|. \tag{2.22} $$
Figure 2.9: Normalized pulse response versus time (UI) with the discrete pulse response terms p[−2] through p[2] labeled.
Figure 2.10: The received pulses and received signal versus time (UI), demonstrating the peak signal due to ISI.
For the pulse in figure 2.9, all of the ISI terms are positive and the peak signal occurs
for an infinite sequence of ones. Because the ISI terms are concentrated in a few UI
around the main cursor, the peak occurs after a short sequence of ones as depicted
in figure 2.10. This excess signal results in an increase in the dynamic range (DR)
requirement of the ADC in ADC-based links. To see this, first consider the signal-to-noise
ratio (SNR) of the system which is determined by the main cursor, p[0], and
can be written as
$$ \mathrm{SNR}_{sys} = \frac{(p[0])^2}{v_n^2} \tag{2.23} $$
where $v_n^2$ is the worst case noise that can be tolerated for a given main cursor amplitude.
In an ADC-based system, the ADC must quantize the entire DR of the signal
while attaining the SNR fixed by the main cursor amplitude. So the SNR of the ADC
needs to be
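$$ \mathrm{SNR}_{ADC} = \frac{\mathrm{peak}^2}{v_n^2} \tag{2.24} $$
and the excess SNR is the ratio of (2.24) to (2.23) resulting in
$$ \text{excess SNR} = \frac{\mathrm{SNR}_{ADC}}{\mathrm{SNR}_{sys}} = \left( \frac{\mathrm{peak}}{\text{main cursor}} \right)^2 = (\mathrm{PMR})^2 \tag{2.25} $$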
where we defined the term peak-to-main ratio (PMR) as
$$ \mathrm{PMR} = \frac{\sum_{k=-\infty}^{\infty} |p[k]|}{|p[0]|}. \tag{2.26} $$
Notice that for an ideal pulse with no ISI, PMR = 1. The metric in (2.26) is very
important because it represents the relation between the ISI of the channel and the
excess SNR required by the ADC. For example, a reduction in the PMR by 2×
is equivalent to reducing the SNR requirement of the ADC by 1 bit. Therefore, a
powerful metric for the equalization performance of a block is the ratio of the PMR
at the input to the PMR at the output. This ratio represents the signal DR reduction
due to the block and is termed the DR improvement:
$$ \text{DR improvement} = \frac{\mathrm{PMR}_{in}}{\mathrm{PMR}_{out}}. \tag{2.27} $$
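The metrics in (2.26) and (2.27) reduce to a few lines of code once the baud-rate samples of the pulse are available. The sketch below uses hypothetical input and output pulses and takes the main cursor to be the largest-magnitude sample, consistent with (2.20). The conversion to bits uses the rule stated above that a 2× reduction in PMR is worth 1 bit.

```python
import math

def pmr(pulse):
    """Peak-to-main ratio of (2.26); the main cursor is the largest-magnitude sample."""
    main = max(abs(p) for p in pulse)
    return sum(abs(p) for p in pulse) / main

def dr_improvement(pulse_in, pulse_out):
    """DR improvement of (2.27)."""
    return pmr(pulse_in) / pmr(pulse_out)

def bits_saved(pulse_in, pulse_out):
    """ADC resolution relaxation in bits: a 2x PMR reduction is worth 1 bit."""
    return math.log2(dr_improvement(pulse_in, pulse_out))

if __name__ == "__main__":
    p_in = [0.1, 1.0, 0.5, 0.2]                           # hypothetical channel pulse
    p_out = [-0.01, 0.0, 0.9, -0.015, 0.0, -0.075, 0.01]  # hypothetical equalized pulse
    print(f"PMR in  = {pmr(p_in):.3f}")
    print(f"PMR out = {pmr(p_out):.3f}")
    print(f"DR improvement = {dr_improvement(p_in, p_out):.3f} "
          f"({bits_saved(p_in, p_out):.2f} bits)")
```

Because PMR is a ratio, scaling the whole pulse leaves it unchanged, so the main-cursor attenuation of the equalizer does not by itself affect the DR improvement.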
2.3.2 Eye opening equivalence
The analysis in §2.3.1 results in the metric, PMR, in (2.26) that characterizes an
equalizer’s performance with regard to the reduction of the DR of the signal. Another
typical objective of an equalizer is to reduce the ISI in order to maximize the eye
opening. In this section, we will show that minimizing the PMR is equivalent to
maximizing the eye opening. To prove this, we will find an expression for the eye
opening in terms of the pulse response and show that maximizing this expression is
equivalent to minimizing the PMR.
The eye opening, in the absence of noise, is the difference between the minimum
signal for a positive symbol and the maximum signal for a negative symbol. The
first term is equivalent to finding the minimum of (2.21) for b0 = 1 and n = 0 which
can be expressed as
$$ \min(rx[0])\big|_{b_0=1} = \min\left( \sum_{k=-\infty}^{\infty} b_k \, p[-k] \right)\Bigg|_{b_0=1} = p[0] - \sum_{k \ne 0} |p[k]|. \tag{2.28} $$
Similarly, to maximize the signal for a negative symbol
$$ \max(rx[0])\big|_{b_0=-1} = -p[0] + \sum_{k \ne 0} |p[k]|. \tag{2.29} $$
Taking the difference between (2.28) and (2.29) we obtain an expression for the eye
opening as
$$ \text{eye opening} = \min(rx[0])\big|_{b_0=1} - \max(rx[0])\big|_{b_0=-1} = 2p[0] \left( 2 - \frac{\sum_{k=-\infty}^{\infty} |p[k]|}{p[0]} \right) = 2p[0] \left( 2 - \mathrm{PMR} \right). \tag{2.30} $$
We can observe a few things directly from the eye opening expression in (2.30):
1. It is proportional to the main cursor amplitude, p[0]. This is intuitive because,
in a linear system, scaling the signal would proportionally scale the eye opening.
2. It is general in that it will give a negative number for a closed eye, and this
number is more negative for higher ISI.
3. It is closely related to the PMR in that it is positive (i.e. the eye is open) for
PMR < 2 and negative (i.e. the eye is closed) otherwise.
4. Its magnitude is maximized when the PMR is minimized. For the ideal case,
PMR = 1 and the eye opening is 2p[0], as expected.
Therefore, although it is the primary objective of this work to minimize ADC DR
requirements (i.e. minimize PMR), this is equivalent to maximizing the eye opening.
As a result, the FFE serves a secondary benefit in that the equalization effort in DSP
will be maximally reduced for an optimal FFE design.
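The eye opening expression in (2.30) can be checked against a direct enumeration of all PAM2 neighbor patterns. The pulses below are hypothetical, and the brute-force search is only practical for a handful of ISI terms.

```python
from itertools import product

def eye_opening_formula(pulse, main):
    """Eye opening from (2.30): 2 p[0] (2 - PMR)."""
    p0 = pulse[main]
    pmr = sum(abs(p) for p in pulse) / abs(p0)
    return 2 * p0 * (2 - pmr)

def eye_opening_brute_force(pulse, main):
    """Enumerate all PAM2 neighbor patterns and measure the eye directly."""
    isi = [p for i, p in enumerate(pulse) if i != main]
    sums = [sum(b * p for b, p in zip(bits, isi))
            for bits in product((-1, 1), repeat=len(isi))]
    lowest_one = pulse[main] + min(sums)       # worst-case sample for b0 = +1
    highest_zero = -pulse[main] + max(sums)    # worst-case sample for b0 = -1
    return lowest_one - highest_zero

if __name__ == "__main__":
    for pulse in ([0.1, 1.0, 0.5, 0.2], [0.3, 1.0, 0.6, 0.3]):  # open eye, closed eye
        print(pulse,
              round(eye_opening_formula(pulse, main=1), 6),
              round(eye_opening_brute_force(pulse, main=1), 6))
```

The first pulse has PMR = 1.8 and an open eye; the second has PMR = 2.2 and a negative eye opening, matching observations 2 and 3 above.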
Chapter 3
Analog delays for FFEs
3.1 Delay approximations
For an analog FFE to be viable for high-speed link receivers, it must be implemented
with low power, low noise, and low area. The greatest obstacle to achieving these
goals is in the design of the analog delay. As compared to a TX-FFE or digital FFE,
where the delays can be implemented in the digital domain, the analog RX-FFE
requires an analog delay implementation. In this section, we consider various delays
as approximations to the ideal delay. First, we introduce the ideal delay and discuss
its properties.
3.1.1 Ideal delay
The objective of an analog delay, at the most basic level, is to implement a block that
takes a time-varying input voltage, v(t), and outputs a delayed version, v(t− τ). To
understand how this might be accomplished, it is useful to transform to the frequency
domain with the Laplace transform resulting in [15]
$$ \mathcal{L}\{v(t - \tau)\} = e^{-s\tau} V(s) = D(s) V(s) \tag{3.1} $$
where $V(s) = \mathcal{L}\{v(t)\}$ and
$$ D(s) = e^{-s\tau} \tag{3.2} $$
is the transfer function of the ideal delay. Decomposing this into magnitude and
phase we see that
$$ |D(j\omega)| = 1 \tag{3.3} $$
$$ \angle D(j\omega) = -\tau\omega \tag{3.4} $$
$$ \tau_g = -\frac{d}{d\omega} \angle D(j\omega) = \tau. \tag{3.5} $$
This transfer function can be exactly realized with a lossless transmission line, but
the area required would be excessive. For example, achieving a delay of 25 ps with a
conductor in silicon dioxide (εr = 3.9), where the propagation velocity is c/√εr, requires
a length of approximately 3.8 mm, which is too large
for a typical integrated circuit application. In addition, any practical implementation
will suffer from the high-frequency loss mechanisms due to the skin effect and dielectric
loss. These topics are discussed in more detail in §2.1 with respect to channel charac-
teristics, but the essential concept is the same here. For these reasons, we investigate
lumped-circuit approximations to the ideal delay in the following sections.
3.1.2 Lumped delay line
Taking inspiration from the lumped-LC model of an infinitesimal transmission-line
segment, one possible delay implementation is shown in figure 3.1. For equal values
of all the capacitors and all the inductors, this circuit represents an N -order lumped
approximation to a lossless transmission line. The delay per section is [16]
$$ \tau_1 = \sqrt{LC} \tag{3.6} $$
with the total delay
$$ \tau_{tot} = N\sqrt{LC}. \tag{3.7} $$
The cutoff frequency is [16]
$$ \omega_h = \frac{2}{\sqrt{LC}} \tag{3.8} $$
above which the lumped line’s characteristics are degraded substantially. Since the
goal is to obtain a specific delay time, it is illustrative to substitute (3.7) into (3.8)
resulting in
$$ \omega_h = \frac{2N}{\tau_{tot}} \tag{3.9} $$
which shows that the accuracy of the approximation increases with the order, N .
Unfortunately, increasing N requires increasing the number of inductors resulting in
a large area penalty. Furthermore, the finite Q factor of the inductors will introduce
loss into the delay.
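A short calculation makes the trade-off in (3.6) through (3.9) concrete. The sketch below sizes the per-section L and C for a target delay and reports the cutoff frequency. It assumes the usual lumped-line relation Zo = √(L/C) and a 50 Ω characteristic impedance; neither is specified in this section.

```python
import math

def lc_section_values(tau_total, n_sections, z0):
    """L and C per section for total delay tau_total, assuming Z0 = sqrt(L/C)."""
    tau_1 = tau_total / n_sections    # delay per section, from (3.6) and (3.7)
    return tau_1 * z0, tau_1 / z0     # L = tau_1 * Z0, C = tau_1 / Z0

def cutoff_frequency_hz(tau_total, n_sections):
    """Cutoff frequency from (3.9), converted from rad/s to Hz."""
    return 2 * n_sections / tau_total / (2 * math.pi)

if __name__ == "__main__":
    tau = 25e-12  # target delay in seconds
    for n in (1, 2, 3):
        ind, cap = lc_section_values(tau, n, z0=50.0)  # 50 ohm is an assumption
        f_h = cutoff_frequency_hz(tau, n)
        print(f"N = {n}: L = {ind * 1e9:.3f} nH, C = {cap * 1e15:.0f} fF per section, "
              f"cutoff = {f_h / 1e9:.1f} GHz")
```

Under these assumptions every additional section adds another inductor of roughly nanohenry scale, which is the area penalty noted above.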
Figure 3.1: Schematic diagram of an N-order lumped-LC approximation of a lossless transmission line.
Figure 3.2: Magnitude (top) and phase (bottom) versus frequency (GHz) for LC delays of orders 1, 2, and 3.
Figure 3.3: Group delay (ps) versus frequency (GHz) for LC delays of orders 1, 2, and 3.
The magnitude and phase response of the lumped-LC delay with τ = 25 ps are
shown in figure 3.2 for orders 1, 2, and 3. The bandwidth of the delay response
(i.e. the range over which the magnitude and phase response match that of the ideal
delay) increases in proportion to N as suggested by (3.9). The discrepancy in the
delay behavior is best observed in the group delay plot shown in figure 3.3. While
the agreement with the ideal group delay curve does improve with N , there are errors
below the Nyquist frequency even for N = 3 which limits the effectiveness of this
structure as a delay.
3.1.3 Bessel delays
In terms of group delay, Bessel delays are an even better delay approximation than the
lumped-LC delay. The group delay of the Bessel filter is maximally flat by design.
That is, for an N -order Bessel delay, the first N − 1 terms in the Taylor series
expansion of the group delay at ω = 0 are zero [17]. In this way, this delay type
optimally emulates the linear phase of an ideal delay. For the first three orders, the
Bessel delay transfer functions for a delay of τ are [17]
$$ D_{bs1}(s) = \frac{1}{1 + (s\tau)} \tag{3.10} $$
$$ D_{bs2}(s) = \frac{3}{3 + 3(s\tau) + (s\tau)^2} \tag{3.11} $$
$$ D_{bs3}(s) = \frac{15}{15 + 15(s\tau) + 6(s\tau)^2 + (s\tau)^3}. \tag{3.12} $$
The first-order case is simply a first-order pole. For higher orders, complex poles are
required and the response can be realized with a similar LC circuit as in figure 3.1
incurring a similar inductor-area penalty. Alternatively, active circuit realizations are
complex and result in high-power and high-noise penalties.
The magnitude and phase response of the Bessel delay with τ = 25 ps are shown
in figure 3.4 for orders 1, 2, and 3. The magnitude response begins to roll off at a low
frequency, which is a consequence of optimizing the pole locations for the maximally-
flat group delay while ignoring the magnitude response. The benefits of the Bessel
delay can be best observed in the group delay plot shown in figure 3.5. The group
delay matches the constant group delay of the ideal case up to a high frequency.
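The group delay behavior of (3.10) through (3.12) can be reproduced numerically. The sketch below evaluates each transfer function on the jω axis and estimates the normalized group delay with a central difference of the phase. The step size and the evaluation frequencies are arbitrary choices.

```python
import cmath
import math

BESSEL_DEN = {              # denominator coefficients in ascending powers of (s*tau)
    1: [1, 1],              # (3.10)
    2: [3, 3, 1],           # (3.11)
    3: [15, 15, 6, 1],      # (3.12)
}

def bessel_delay(order, w_tau):
    """Evaluate D_bsN(jw) at the normalized frequency w*tau."""
    den = BESSEL_DEN[order]
    x = 1j * w_tau
    return den[0] / sum(c * x ** k for k, c in enumerate(den))

def group_delay(order, w_tau, dw=1e-5):
    """Normalized group delay tau_g / tau, from a central difference of the phase."""
    hi = bessel_delay(order, w_tau + dw)
    lo = bessel_delay(order, w_tau - dw)
    # The phase of the ratio is the phase difference, which avoids wrapping issues.
    return -cmath.phase(hi / lo) / (2 * dw)

if __name__ == "__main__":
    for w_tau in (0.1, 0.5, 1.0, 2.0):
        row = ", ".join(
            f"N={n}: tg/tau={group_delay(n, w_tau):.3f}, "
            f"|D|={20 * math.log10(abs(bessel_delay(n, w_tau))):+.2f} dB"
            for n in (1, 2, 3))
        print(f"w*tau = {w_tau:3.1f}: {row}")
```

The output shows the group delay staying close to τ over a wider range as the order increases, while the magnitude rolls off well before the group delay degrades.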
Figure 3.4: Magnitude (top) and phase (bottom) versus frequency (GHz) for Bessel delays of orders 1, 2, and 3.
Figure 3.5: Group delay (ps) versus frequency (GHz) for Bessel delays of orders 1, 2, and 3.
It is possible to extend the bandwidth of the group delay by using an all-pole
equal-ripple delay [17]. This delay is similar in concept to a Chebyshev or elliptic
response where the ripple is bounded in the magnitude response, but in this case
the bound is on the ripple of the group delay. The improvement of the group delay
bandwidth is only marginal as compared to the Bessel delay so this delay type is not
given further consideration here.
3.1.4 Pade delays
Although Bessel delays have the maximally flat group delay for an all-pole transfer
function, the additional degrees of freedom introduced by adding zeros can result
in an improved approximation. One method to approximate the ideal delay with
poles and zeros is to use the Pade approximant. The Pade approximant gives the
best rational function approximation of a desired function in terms of matching the
highest possible number of Taylor series coefficients [18]. The Taylor expansion of the
ideal delay in (3.2) is
$$ D(s) = e^{-s\tau} = 1 - (s\tau) + \frac{1}{2}(s\tau)^2 - \frac{1}{6}(s\tau)^3 + \frac{1}{24}(s\tau)^4 - \frac{1}{120}(s\tau)^5 + O\left((s\tau)^6\right). $$
The derivation of the Pade approximant for the function ex is covered in appendix C.
Substituting x → (−sτ) into the expressions in (C.8), (C.9), and (C.10) results in
the Pade delays
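$$ D_{pd1}(s) = \frac{1 - \frac{1}{2}(s\tau)}{1 + \frac{1}{2}(s\tau)} \tag{3.13} $$
$$ D_{pd2}(s) = \frac{1 - \frac{1}{2}(s\tau) + \frac{1}{12}(s\tau)^2}{1 + \frac{1}{2}(s\tau) + \frac{1}{12}(s\tau)^2} \tag{3.14} $$
$$ D_{pd3}(s) = \frac{1 - \frac{1}{2}(s\tau) + \frac{1}{10}(s\tau)^2 - \frac{1}{120}(s\tau)^3}{1 + \frac{1}{2}(s\tau) + \frac{1}{10}(s\tau)^2 + \frac{1}{120}(s\tau)^3}. \tag{3.15} $$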
To see how these delays are approximations of the ideal delay we expand their Taylor
series by the method of polynomial long division obtaining
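$$ D_{pd1}(s) = 1 - (s\tau) + \frac{1}{2}(s\tau)^2 - \boxed{\frac{1}{4}}(s\tau)^3 + O\left((s\tau)^4\right) \tag{3.16} $$
$$ D_{pd2}(s) = 1 - (s\tau) + \frac{1}{2}(s\tau)^2 - \frac{1}{6}(s\tau)^3 + \frac{1}{24}(s\tau)^4 - \boxed{\frac{1}{144}}(s\tau)^5 + O\left((s\tau)^6\right) \tag{3.17} $$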
where the first non-matching coefficient is boxed for each case.
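The long division behind (3.16) and (3.17) is mechanical and can be reproduced with exact rational arithmetic. The sketch below expands the first- and second-order Pade delays in powers of x = sτ and reports the first coefficient that differs from the expansion of the ideal delay. The helper names are hypothetical.

```python
from fractions import Fraction as F

def series(num, den, n_terms):
    """Taylor coefficients of num(x)/den(x) by polynomial long division (den[0] != 0)."""
    coeffs = []
    for n in range(n_terms):
        acc = num[n] if n < len(num) else F(0)
        for k in range(1, min(n, len(den) - 1) + 1):
            acc -= den[k] * coeffs[n - k]
        coeffs.append(acc / den[0])
    return coeffs

# Numerator and denominator coefficients in ascending powers of x = s*tau.
PADE1 = ([F(1), F(-1, 2)], [F(1), F(1, 2)])                      # (3.13)
PADE2 = ([F(1), F(-1, 2), F(1, 12)], [F(1), F(1, 2), F(1, 12)])  # (3.14)

def exp_neg_series(n_terms):
    """Taylor coefficients of exp(-x): (-1)^n / n!."""
    out, term = [], F(1)
    for n in range(n_terms):
        out.append(term)
        term = -term / (n + 1)
    return out

def first_mismatch(pade, n_terms=8):
    """Index and value of the first Pade coefficient that differs from exp(-x)."""
    approx = series(*pade, n_terms)
    exact = exp_neg_series(n_terms)
    return next((n, a) for n, (a, e) in enumerate(zip(approx, exact)) if a != e)

if __name__ == "__main__":
    for name, pade in (("Dpd1", PADE1), ("Dpd2", PADE2)):
        n, coeff = first_mismatch(pade)
        print(f"{name}: first mismatch at (s*tau)^{n}, coefficient {coeff}")
```

An order-N Pade delay therefore matches the ideal delay through the term of order 2N, which is the property that makes it the best rational approximation in the Taylor-series sense.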
The magnitude and phase response of Pade delays with τ = 25 ps are shown in
figure 3.6 for orders 1, 2, and 3. The left-half plane poles exactly match the right-half
plane zeros, canceling in magnitude and summing in phase. Therefore the magnitude
response, in the absence of any additional parasitic poles, is ideal across all frequencies.
The phase response is also in good agreement with that of the ideal delay which can
be best seen in the group delay plot in figure 3.7. Due to the additional degrees of
freedom introduced by the zeros, the group delay matches to an even higher frequency
(for a given order) than that of the Bessel delay. In particular, the bandwidth of the
constant group delay of Dpd1(s) is approximately 2× greater than that of Dbs1(s)
and nearly equal to that of Dbs2(s). Because the first-order pole and zero of Dpd1(s) can be
realized more simply in practice than the complex pole pair in Dbs2(s), it has
been a popular choice as an analog delay in many previous designs [3, 19, 20].
Figure 3.6: Magnitude (top) and phase (bottom) versus frequency (GHz) for Pade delays of orders 1, 2, and 3.
Figure 3.7: Group delay (ps) versus frequency (GHz) for Pade delays of orders 1, 2, and 3.
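The comparison between the first-order Pade and Bessel delays follows from their group delay expressions. For Dpd1(s), the pole and the right-half-plane zero each contribute a phase of −arctan(ωτ/2), giving a normalized group delay of 1/(1 + (ωτ/2)²); for Dbs1(s), the single pole gives 1/(1 + (ωτ)²). The sketch below checks the all-pass magnitude and compares the bandwidths. The 10% drop used to define the bandwidth is an assumption made for this example.

```python
def pade1_group_delay(w_tau):
    """Normalized group delay of D_pd1 (pole and zero each contribute half)."""
    return 1.0 / (1.0 + (w_tau / 2.0) ** 2)

def bessel1_group_delay(w_tau):
    """Normalized group delay of D_bs1, a single pole at 1/tau."""
    return 1.0 / (1.0 + w_tau ** 2)

def pade1_magnitude(w_tau):
    """|D_pd1(jw)| evaluated directly from (3.13)."""
    x = 1j * w_tau
    return abs((1 - x / 2) / (1 + x / 2))

def bandwidth(group_delay, drop=0.1, step=1e-4):
    """Normalized frequency at which the group delay has fallen by `drop` from DC."""
    w = 0.0
    while group_delay(w) > 1.0 - drop:
        w += step
    return w

if __name__ == "__main__":
    bw_pade = bandwidth(pade1_group_delay)
    bw_bessel = bandwidth(bessel1_group_delay)
    print(f"Pade   bandwidth (w*tau): {bw_pade:.3f}")
    print(f"Bessel bandwidth (w*tau): {bw_bessel:.3f}")
    print(f"ratio: {bw_pade / bw_bessel:.2f}")
    print(f"|D_pd1| at w*tau = 5: {pade1_magnitude(5.0):.6f}")
```

The ratio of the two bandwidths is 2 regardless of the chosen drop, because the two expressions differ only by a factor of 2 in the frequency argument.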
3.2 Equivalence of first-order delays
In this section, we introduce a set of first-order delays that are equivalent for FFEs
in terms of realizable transfer functions. In §3.2.1, we outline the theory behind the
equivalence. In §3.2.2, we investigate the practical consequences of this theorem in
terms of coefficient spread. Finally, in §3.2.3 we present an example to demonstrate
the strengths and limitations of the theory for a practical case.
3.2.1 Theorem
Any transfer function obtainable by an FFE with first-order Pade delays can be
exactly replicated by an FFE with delays consisting of a first-order pole and zero with
arbitrary offset, α. The proof is given in appendix B; only the result is presented
here. For any feasible transfer function of an N-tap FFE with first-order Pade delays
having the form
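$$ H_{pd1}(s) = \sum_{n=0}^{N-1} c_n D_{pd1}^n(s) \tag{3.18} $$
we can create an equivalent FFE with delays
$$ D_\alpha(s) = \frac{1 - \frac{1}{2}\alpha s\tau}{1 + \frac{1}{2} s\tau} \tag{3.19} $$
such that
$$ H_\alpha(s) = \sum_{n=0}^{N-1} c_{\alpha n} D_\alpha^n(s) = H_{pd1}(s). \tag{3.20} $$
The coefficients cαn can be obtained from the coefficients cn by a matrix multiplication.
If we define the vectors
$$ \mathbf{c}_1 = \begin{bmatrix} c_1 & c_2 & \cdots & c_N \end{bmatrix}^T \tag{3.21} $$
$$ \mathbf{c}_\alpha = \begin{bmatrix} c_{\alpha 1} & c_{\alpha 2} & \cdots & c_{\alpha N} \end{bmatrix}^T, \tag{3.22} $$
then, by the definition of the matrix Mα in (B.12),
$$ \mathbf{c}_\alpha = M_\alpha \mathbf{c}_1. \tag{3.23} $$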
This result is leveraged for the delay implementation in the proof-of-concept FFE
(see §5.3.1).
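One way to see why such a matrix exists is to note from (3.13) and (3.19) that Dpd1(s) = a·Dα(s) + b with a = 2/(1 + α) and b = (α − 1)/(1 + α), so that each power of Dpd1 expands binomially in powers of Dα. The sketch below builds a transformation matrix from this observation. It is a reconstruction for illustration and may be organized differently from the definition in (B.12); the function names are hypothetical, and the test vector is the coefficient set quoted in (3.25).

```python
from math import comb

def transform_matrix(alpha, n_taps):
    """Matrix mapping Pade (alpha = 1) coefficients to D_alpha coefficients.

    Uses D_pd1 = a*D_alpha + b with a = 2/(1+alpha) and b = (alpha-1)/(1+alpha),
    so that D_pd1^n expands binomially in powers of D_alpha.
    """
    a = 2.0 / (1.0 + alpha)
    b = (alpha - 1.0) / (1.0 + alpha)
    return [[comb(n, k) * a ** k * b ** (n - k) if k <= n else 0.0
             for n in range(n_taps)]
            for k in range(n_taps)]

def transform(alpha, coeffs):
    """Apply the transformation matrix to a coefficient vector."""
    matrix = transform_matrix(alpha, len(coeffs))
    return [sum(m * c for m, c in zip(row, coeffs)) for row in matrix]

def ffe_response(alpha, coeffs, s_tau):
    """FFE transfer function of (3.20) evaluated at a complex value of s*tau."""
    d = (1 - 0.5 * alpha * s_tau) / (1 + 0.5 * s_tau)
    return sum(c * d ** n for n, c in enumerate(coeffs))

if __name__ == "__main__":
    c1 = [0.286, 1.0, -0.857, -0.857, 0.571]   # coefficients quoted in (3.25)
    for alpha in (1 / 3, 0.0):
        c_alpha = transform(alpha, c1)
        print(f"alpha = {alpha:.3f}:", [round(c, 3) for c in c_alpha])
        # Both FFEs should give the same response at any test frequency.
        print("  H_pd1 =", ffe_response(1.0, c1, 0.7j))
        print("  H_a   =", ffe_response(alpha, c_alpha, 0.7j))
```

Applied to the coefficients of (3.25), this construction gives values that agree with (3.26) and (3.27) to within the rounding of the quoted inputs.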
3.2.2 Coefficient spread
Figure 3.8: The spectral norm ‖Mα‖2 versus the pole/zero ratio of the delay, α, for FFEs of orders 2, 3, 4, and 5.
The coefficient spread of a set of coefficients is defined as the ratio of the maximum
coefficient magnitude to the minimum. It is an important metric because it sets the
requirement on the bit resolution of the coefficient’s circuit realization. Therefore,
it is necessary to understand the effect of the transformation Mα on the coefficient
spread. Unfortunately, this transformation is a complicated process with respect to
the coefficient spread and no closed-form expression exists. Instead, what we can
easily compute is the spectral norm, ‖Mα‖2, which gives us the bound
$$ \frac{\|\mathbf{c}_\alpha\|_2}{\|\mathbf{c}_1\|_2} \le \|M_\alpha\|_2. \tag{3.24} $$
Note that this bound is not equal to the coefficient spread, but it is related and serves
as a useful proxy to enable a closed-form analysis. This bound is plotted in figure
3.8 versus α for FFEs of order 2, 3, 4, and 5. For the limit α→ 0, the spectral norm
approaches the finite value ‖M0‖2. In the next section, we investigate an example to
illustrate the coefficient spread penalty incurred in a practical case and compare that
to the bound in figure 3.8.
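The bound in (3.24) can be explored numerically without a linear algebra library. The sketch below estimates the spectral norm by power iteration and compares it with the norm increase and the coefficient spread for one coefficient vector. The matrix is a hypothetical reconstruction based on the identity Dpd1 = a·Dα + b with a = 2/(1 + α) and b = (α − 1)/(1 + α), and is only assumed to match the Mα of (B.12); the coefficient vector is the one quoted in §3.2.3.

```python
from math import comb, sqrt

def m_alpha(alpha, n):
    """Hypothetical reconstruction of M_alpha from D_pd1 = a*D_alpha + b (an assumption)."""
    a, b = 2 / (1 + alpha), (alpha - 1) / (1 + alpha)
    return [[comb(j, i) * a ** i * b ** (j - i) if i <= j else 0.0
             for j in range(n)] for i in range(n)]

def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def norm2(v):
    """Euclidean norm of a vector."""
    return sqrt(sum(x * x for x in v))

def spectral_norm(m, iters=200):
    """Largest singular value of m by power iteration on m^T m."""
    mt = [list(col) for col in zip(*m)]
    v = [1.0] * len(m[0])
    for _ in range(iters):
        w = matvec(mt, matvec(m, v))
        scale = norm2(w)
        v = [x / scale for x in w]
    return norm2(matvec(m, v))

def spread(coeffs):
    """Coefficient spread: largest magnitude over smallest magnitude."""
    mags = [abs(c) for c in coeffs]
    return max(mags) / min(mags)

if __name__ == "__main__":
    c1 = [0.286, 1.0, -0.857, -0.857, 0.571]   # coefficients quoted in section 3.2.3
    for alpha in (1.0, 1 / 3, 0.0):
        m = m_alpha(alpha, len(c1))
        c_alpha = matvec(m, c1)
        print(f"alpha = {alpha:.3f}: norm ratio = {norm2(c_alpha) / norm2(c1):.3f}, "
              f"spectral norm = {spectral_norm(m):.3f}, spread = {spread(c_alpha):.1f}")
```

The norm ratio never exceeds the spectral norm, as (3.24) requires, but the spread itself can grow much faster than either quantity, which is why the bound is only a proxy.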
3.2.3 Example transformation
Consider three different 5-tap FFEs with first-order delays with pole and zero ratio
α = 0, α = 1/3, and α = 1 with τ = 25 ps for all cases. The magnitude response and
phase response of these delays are substantially different as shown in figure 3.9. In
fact, the group delay is not even equal for these delays (see figure 3.10 and footnote 1), but the
analysis in the previous section shows that these delays can be used to construct
FFEs with equivalent transfer functions.
¹ For this family of delays, τg = ½(1 + α)τ. The fact that such a wide range of group delays can achieve the same equalization demonstrates the agility of the FFE to absorb changes in its delays.

To show an example of this, consider the channel pulse for the 1.09 m FR4 trace
in figure 2.2. For the α = 1 case, the optimum coefficients to maximize the DR
improvement as defined in (2.27) are (see appendix A.3 for the optimization method)
$$c_1 = \begin{bmatrix} 0.286 & 1 & -0.857 & -0.857 & 0.571 \end{bmatrix}^T. \tag{3.25}$$
Using the transformation matrix we find the coefficients for the equivalent transfer
functions with α = 1/3 and α = 0 to be
$$c_{1/3} = M_{1/3}\, c_1 = \begin{bmatrix} -0.286 & 1.393 & 2.893 & -6.750 & 2.893 \end{bmatrix}^T \tag{3.26}$$
$$c_0 = M_0\, c_1 = \begin{bmatrix} -0.143 & -4.286 & 20.57 & -25.14 & 9.143 \end{bmatrix}^T. \tag{3.27}$$
The magnitude and phase for these FFEs are plotted in figure 3.11 and they are
identical, as expected. The norm of the α = 1 case is ‖c1‖2 = 1.696 and the increases
in the norms due to the transformations are
$$\frac{\|c_{1/3}\|_2}{\|c_1\|_2} = 4.728 \le 10.049 = \|M_{1/3}\|_2 \tag{3.28}$$
$$\frac{\|c_0\|_2}{\|c_1\|_2} = 20.06 \le 46.042 = \|M_0\|_2 \tag{3.29}$$
Figure 3.9: Magnitude (dB) and phase (deg) versus frequency (GHz) for first-order delays with α = 0, α = 1/3, and α = 1.
Figure 3.10: Group delay (ps) versus frequency (GHz) for first-order delays with α = 0, α = 1/3, and α = 1.
Figure 3.11: FFE magnitude (top) and phase (bottom) versus frequency for α = 0, α = 1/3, and α = 1 with ideal coefficient transformations.
Figure 3.12: FFE magnitude (top) and phase (bottom) versus frequency for α = 0, α = 1/3, and α = 1 with practical coefficient transformations.
which are more than 2× less than the bound in each case. The increases in the
coefficient spread for these cases are
$$\frac{\text{coefficient spread}\ (\alpha = 1/3)}{\text{coefficient spread}\ (\alpha = 1)} = 6.75 \tag{3.30}$$
$$\frac{\text{coefficient spread}\ (\alpha = 0)}{\text{coefficient spread}\ (\alpha = 1)} = 50.2 \tag{3.31}$$
where the coefficient spread of the α = 1 case is
$$\text{coefficient spread}\ (\alpha = 1) = \frac{\max(|c_1|)}{\min(|c_1|)} = 3.5. \tag{3.32}$$
This demonstrates that, while the coefficient spread is not exactly the ratio of the
norms, these two metrics are closely related.
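The numbers in this example can be reproduced numerically. The following is a minimal sketch rather than the MATLAB code used for this work. It assumes that tap k of the FFE sees the delay raised to the k-th power, and it builds the transformation matrix from the identity D1 = (2Dα − (1 − α))/(1 + α), which follows from the first-order delay definition. The function names `transform_matrix` and `spread` are illustrative.

```python
import numpy as np
from math import comb

def transform_matrix(alpha, n):
    """Map coefficients of an n-tap FFE built from Pade delays (alpha = 1)
    to the equivalent coefficients for first-order delays with ratio alpha.

    Assumption: tap k sees D**k, so D_1**k is expanded in powers of
    D_alpha using D_1 = p * D_alpha + q.
    """
    p = 2.0 / (1.0 + alpha)
    q = -(1.0 - alpha) / (1.0 + alpha)
    M = np.zeros((n, n))
    for k in range(n):
        for j in range(k + 1):
            M[j, k] = comb(k, j) * p**j * q**(k - j)
    return M

def spread(c):
    """Ratio of the maximum to the minimum coefficient magnitude."""
    mags = np.abs(c)
    return mags.max() / mags.min()

c1 = np.array([0.286, 1.0, -0.857, -0.857, 0.571])  # rounded values from (3.25)
results = {}
for alpha in (1.0 / 3.0, 0.0):
    M = transform_matrix(alpha, len(c1))
    c = M @ c1
    results[alpha] = {
        "coefficients": c,
        "norm_ratio": np.linalg.norm(c) / np.linalg.norm(c1),
        "bound": np.linalg.norm(M, 2),          # spectral norm of M_alpha
        "spread_ratio": spread(c) / spread(c1),
    }
    print(f"alpha = {alpha:.3f}", np.round(c, 3), results[alpha])
```

Because the input coefficients are rounded to three decimals, the computed values agree with (3.26) through (3.31) only to within rounding.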
The increase in the coefficient spread for both cases is significant, but it can be
reduced if a discrepancy in the transfer functions can be tolerated. For example, it is
easy to imagine for the α = 0 case that the smallest coefficient could be set to zero
with very little impact on the overall transfer function. With this minor change, the
Figure 3.13: Pulse responses versus time (ps) for the channel pulse and equalized pulses after 5-tap FFEs with α = 0, α = 1/3, and α = 1 optimized with 5-bit coefficient resolution.
new coefficient spread is just 5.87 which represents an increase of just 1.68×. This
is substantially less than the original solution with only a minor increase in mismatch
between the transfer functions.
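The arithmetic behind these two figures can be checked directly from the rounded coefficients in (3.25) and (3.27). This is a small sketch; treating a zeroed tap as excluded from the spread is an assumption about how the metric is applied.

```python
import numpy as np

c1 = np.array([0.286, 1.0, -0.857, -0.857, 0.571])     # (3.25)
c0 = np.array([-0.143, -4.286, 20.57, -25.14, 9.143])  # (3.27)

def spread(c):
    """Max-to-min magnitude ratio over the nonzero coefficients."""
    mags = np.abs(c[c != 0])
    return mags.max() / mags.min()

c0_pruned = c0.copy()
c0_pruned[np.argmin(np.abs(c0))] = 0.0   # drop the smallest tap (-0.143)

new_spread = spread(c0_pruned)           # 25.14 / 4.286
increase = new_spread / spread(c1)       # relative to the alpha = 1 case
print(round(new_spread, 2), round(increase, 2))
```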
To better understand these trade-offs, we optimize each FFE independently with
a fixed 5-bit coefficient resolution (see appendix A.2 for the optimization method).
The fixed coefficient resolution essentially puts an upper bound on coefficient spread
for all three cases. The resultant transfer functions are plotted in figure 3.12. Al-
though there is significant discrepancy in the high-frequency magnitude and phase,
there is insignificant signal power concentrated at these frequencies, so the impact on
equalization performance is limited. This is a fact that is best illustrated in the time
domain (see figure 3.13) where the reduction of the ISI and the associated PMR are
essentially equivalent for each case.
The conclusion is that the dependence of the FFE performance on α is complex
and needs to be carefully considered in the FFE design. The FFE performance
dependence on α is given additional consideration in §4.4.3.
3.3 Summary
In this chapter, we considered various analog delay approximations for FFEs. We
determined that the first-order Pade delay provides a good design choice in terms of
performance versus the implementation complexity. In addition, we showed that the
performance of all first-order delays in FFEs is equivalent in terms of the achievable
FFE transfer functions. The drawback was shown to be the increase in the coefficient
spread. For the example fifth-order FFE with α = 1/3, the equalization performance
degradation as compared to the Pade delay was demonstrated to be marginal. We
leverage this in the implementation of the single-path Pade-inspired delay introduced
in §5.3.1 and utilized in the proof-of-concept FFE design.
Chapter 4
Analog FFEs in high-speed links
4.1 FFE design parameters
The design space of generic FFEs is high dimensional and includes the following
parameters.
1. Delay type. The impact of delay type (i.e. ideal, Bessel, and Pade) on FFE
performance is investigated in §4.3. The ideal delay outperforms all other de-
lays, but has no exact lumped-circuit realization. The Pade delays outperform
all Bessel delays in the example simulations and the third-order Pade delay
approaches the performance of the ideal delay. At the optimal delay time, the
first-order Pade delay performs close to the optimum which, along with the
potential for simplicity in circuit realization, makes it a suitable design choice.
The first-order Pade delay is therefore chosen for the simulations of all of the
following FFE parameters.
2. Delay time. The impact of delay time on FFE performance is investigated
in §4.4. In §4.4.1, a mathematical analysis shows that the optimal delay time
should depend on the channel more so than the system baud rate. This is in
opposition to the commonly held view that the delay time should be related to
the UI¹. This analysis is supported by simulations in §4.4.2 which characterize
the optimal delay time for first-order Pade delays for various channels. In
§4.4.3, the FFE performance is simulated versus the pole and zero ratio, α, for
the cases of constant τ and constant group delay, τg. A reasonable design choice
is τ = 25 ps and is the chosen parameter unless otherwise stated.
3. Number of taps. The impact of number of taps, n, on FFE performance is in-
vestigated in §4.5. Because the power is directly proportional to this parameter,
it is a critical choice in the FFE design. There are diminishing returns on DR
improvement for n > 3, but sensitivity to delay time decreases for additional
taps. For the proof-of-concept design in this work, we choose a 5-tap FFE to
maximize the equalization performance, but the remainder of the simulations in
this chapter compare 3-tap, 4-tap, and 5-tap FFEs to demonstrate the impact
on parameter sensitivity of more aggressive designs.
4. Parasitic pole frequency. The impact of the parasitic pole frequency on
FFE performance is investigated in §4.6. Each node in the FFE implementation
introduces an unwanted parasitic pole, limiting the equalization performance.
Increasing the frequency of this pole improves the performance, but comes with
a cost of power (i.e. increased drive strength) or area (i.e. peaking inductors).
Therefore, this is a critical design parameter and its impact on FFE performance
must be well understood. Based on the simulations, a reasonable design target
is fp = 20 GHz with substantial performance degradation for fp < 10 GHz.
5. Coefficient resolution. The impact of coefficient resolution on FFE perfor-
mance is investigated in §4.7. Excessive coefficient resolution complicates the
¹ The optimal delay time is related to the UI when the objective is to completely equalize the channel as it is in [3, 11]. When only partial equalization is the objective, the channel characteristics dominate and dictate the optimal delay time.
design and directly increases the parasitic capacitance at the delay output node.
Insufficient resolution results in suboptimal equalization performance. There
are diminishing returns for bits ≥ 5 suggesting that 5 bits (plus sign) is a rea-
sonable design choice. An aggressive design may pursue even lower resolution
coefficients to push the power and bandwidth performance.
6. Main cursor attenuation. The impact of main cursor attenuation on FFE
performance is investigated in §4.8. Attenuation is an unavoidable consequence
of FFEs with unity-gain limited coefficients and, as a parameter, it trades with
equalization performance and is a function of all the aforementioned design pa-
rameters. In particular, it is a parameter in the coefficient optimization and a
strong function of the channel characteristics as well as the target DR improve-
ment. The trade-off between DR improvement and main cursor attenuation is
the focus of §4.8. The performance degrades rapidly when the limit on attenuation
is greater than 1/2. For this reason, the main cursor attenuation is limited
to 1/2 in all other simulations (see §4.2 for details).
The design space is too complicated to cover completely. The simulations and plots
in this chapter represent a succinct subset of the possibilities to highlight the general
trade-offs and guide the design decisions for this work.
4.2 Simulation methodology
For each simulation, the optimal coefficients, c, are determined using the methods
detailed in appendix A. To summarize the methodology (unless otherwise stated),
the coefficient magnitudes are bounded by unity, the main cursor tap is fixed at
unity, the main cursor is tap number 2, and the main cursor attenuation at the FFE
output is limited to 1/2. The pulse at the input of the FFE is defined as pi and at the
output as po. As mathematical expressions, these constraints can be stated as
$$|c_k| \le 1 \quad \text{for all } k \tag{4.1}$$
$$c_2 = 1 \tag{4.2}$$
$$\max(p_o) \ge \tfrac{1}{2}. \tag{4.3}$$
For the input pulse, pi, we choose the pulse response of the 1.09 m FR4 PCB trace
from [2] plotted in figure 2.2. Due to the long trace, there is significant dispersion for
this channel, and the PMR for this pulse is
PMRch = 6.0. (4.4)
The minimum PMR for the FFE output pulse for which the constraints are satisfied
is defined as PMReq which is found with the methods in appendix A. Using this term,
the performance metric defined in (2.27) is
$$\text{DR Improvement} = \frac{\text{PMR}_{\text{ch}}}{\text{PMR}_{\text{eq}}}. \tag{4.5}$$
Because PMReq is the minimum and PMRch is a constant, the DR improvement is
maximized under the constraints. For each point on the plots in this chapter, the
optimal FFE coefficients are found to determine the optimum DR improvement for
the given FFE parameters.
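The constraints and the performance metric can be expressed compactly in code. The sketch below only checks feasibility and evaluates (4.5); it does not implement the optimization in appendix A. The function names and the sample values (including PMReq = 2.0) are hypothetical.

```python
import numpy as np

def satisfies_constraints(c, p_o, main_tap=2, min_cursor=0.5):
    """Check (4.1)-(4.3) for coefficients c (tap 1 first) and a sampled output pulse p_o."""
    c = np.asarray(c, dtype=float)
    bounded = np.all(np.abs(c) <= 1.0)        # (4.1): |c_k| <= 1
    main_fixed = c[main_tap - 1] == 1.0       # (4.2): c_2 = 1
    cursor_ok = np.max(p_o) >= min_cursor     # (4.3): max(p_o) >= 1/2
    return bool(bounded and main_fixed and cursor_ok)

def dr_improvement(pmr_eq, pmr_ch=6.0):
    """Equation (4.5), with PMR_ch = 6.0 taken from (4.4)."""
    return pmr_ch / pmr_eq

# Hypothetical 3-tap coefficients and output pulse samples.
feasible = satisfies_constraints([-0.2, 1.0, -0.4], [0.05, 0.55, 0.10])
infeasible = satisfies_constraints([-1.2, 1.0, -0.4], [0.05, 0.55, 0.10])
print(feasible, infeasible, dr_improvement(pmr_eq=2.0))
```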
4.3 Delay type
For the 3-tap FFE in figure 4.1, the simulation methodology in §4.2 is repeated for
delay times from 10 ps to 100 ps using ideal delays and Bessel and Pade delays
of orders 1, 2, and 3. The result is plotted in figure 4.2.
Figure 4.1: The 3-tap FFE block diagram with variable delay type and delay time for the MATLAB simulation of the optimal delay time.
Figure 4.2: DR improvement versus delay time (ps) for a 3-tap FFE with Bessel and Pade delay types of order 1 (solid), 2 (dashed), and 3 (dotted).
For this simulation, the optimal delay time is approximately 30 ps, independent of
delay type. The first-order Pade delays outperform all Bessel delays and the ideal
delay puts an upper bound on the performance. For the optimal delay time of 30 ps,
the performance of the first-order Pade delays is within 5% of the ideal delays. This
fact, along with the potential for simplicity in circuit realization, makes it a good
design choice. The first-order Pade delay is therefore chosen for the simulations in
the following sections.
4.4 Delay time
4.4.1 Mathematical analysis
In §2.3.1 we defined the pulse response in (2.17) which we repeat here for convenience:
p(t) = rect(t/T ) ∗ h(t). (4.6)
From this equation, we see that the pulse response depends on both the impulse
response of the channel, h(t), and the transmitted pulse width, T . There are two
interesting limiting cases to consider: low baud rate and high baud rate. For the low
baud rate case, the transmitted pulse width, T , is long compared to the length of the
impulse response. Therefore, a reasonable approximation is that h(t)→ δ(t) and
pl(t) ≈ rect(t/T ) ∗ δ(t)
= rect(t/T ). (4.7)
This is intuitive in that, for a low baud rate, a transmitted pulse passes unimpeded
through the channel. The second limiting case is for a high baud rate where the
pulse width, T , is sufficiently narrow such that we can make the approximation that
rect(t/T )→ δ(t) and
ph(t) ≈ δ(t) ∗ h(t)
= h(t). (4.8)
A consequence is that, for a sufficiently high baud rate, the pulse response approaches
the impulse response and is independent of the baud rate. This is supported by visual
inspection of the pulse responses in figure 2.2. A broadening of the transmit pulse
width will result in modest changes to the pulse response due to the substantial
dispersion of the channel relative to the baud samples. Therefore, for high baud rate
systems, such as the 20 GBd target for this work, the performance of the FFE in
terms of the delay time is determined by the channel characteristics more so than
the baud rate. In §4.4.2, we support this analysis with MATLAB simulations that
demonstrate that the optimum delay time is a function of the channel characteristics.
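The two limiting cases can be illustrated with a short numerical experiment. The sketch below uses a hypothetical single-pole channel with time constant `tau_c`; this is an assumption for illustration only and not one of the measured PCB channels.

```python
import numpy as np

def pulse_response(T, tau_c, dt=1e-12, t_end=2e-9):
    """Sampled p(t) = rect(t/T) * h(t) for a hypothetical single-pole channel."""
    t = np.arange(int(round(t_end / dt))) * dt
    h = np.exp(-t / tau_c) / tau_c             # unit-area impulse response
    n_T = int(round(T / dt))
    rect = np.zeros_like(t)
    rect[:n_T] = 1.0                           # causal transmit pulse of width T
    p = np.convolve(rect, h)[: t.size] * dt    # discrete approximation of (4.6)
    return t, h, p, n_T

tau_c = 100e-12

# High baud rate: T << tau_c, so p(t)/T tracks the impulse response h(t).
t, h, p_high, n_hi = pulse_response(T=5e-12, tau_c=tau_c)
err_high = np.max(np.abs(p_high[n_hi:] / 5e-12 - h[n_hi:])) / h.max()

# Low baud rate: T >> tau_c, so p(t) settles to the transmitted amplitude.
t, h, p_low, n_lo = pulse_response(T=1e-9, tau_c=tau_c)
plateau = p_low[n_lo - 1]

print(f"high baud rate: max deviation from h(t) = {100 * err_high:.1f}%")
print(f"low baud rate: amplitude at end of pulse = {plateau:.3f}")
```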
Delay characteristics other than the group delay impact the performance and
must be considered. The best example of this is a consequence of the first-order delay
equivalence theorem in §3.2. Because FFEs with delays of the form
$$D_\alpha(s) = \frac{1 - \tfrac{1}{2}\alpha s\tau}{1 + \tfrac{1}{2} s\tau} \tag{4.9}$$
can achieve identical transfer functions, their achievable DR improvement is indistin-
guishable, but the group delay of Dα(s) is
$$\tau_g = \tfrac{1}{2}(1 + \alpha)\tau \tag{4.10}$$
which is an affine function in α. For α = 1 (i.e. the first-order Pade delay), τg = τ
but for all other α the τ in the delay definition does not equal the group delay. As a
result, the optimal delay time is not only dependent on the channel characteristics, but
also on the delay type. Therefore, a careful study of the FFE performance should be
made for the anticipated channel characteristics and the specific delay type chosen.
In §4.4.3, the DR improvement versus α is simulated for the cases with constant τ
and constant τg.
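The group delay expression in (4.10) can be verified numerically from the transfer function in (4.9). The following sketch estimates the low-frequency group delay from the phase at two closely spaced frequencies; the function names and the choice of test frequencies are illustrative.

```python
import numpy as np

def first_order_delay(s, alpha, tau):
    """D_alpha(s) from (4.9)."""
    return (1 - 0.5 * alpha * s * tau) / (1 + 0.5 * s * tau)

def low_freq_group_delay(alpha, tau, f=(1e6, 2e6)):
    """Estimate -d(phase)/d(omega) from two low-frequency points."""
    w = 2 * np.pi * np.asarray(f)
    phase = np.angle(first_order_delay(1j * w, alpha, tau))
    return -(phase[1] - phase[0]) / (w[1] - w[0])

tau = 25e-12
for alpha in (0.0, 1.0 / 3.0, 1.0):
    tg = low_freq_group_delay(alpha, tau)
    print(f"alpha = {alpha:.3f}: tau_g = {tg * 1e12:.2f} ps, "
          f"formula = {0.5 * (1 + alpha) * tau * 1e12:.2f} ps")
```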
4.4.2 Channel characteristic dependence
In §4.4.1, a mathematical analysis shows that the optimal delay time should depend
on the channel more so than the system baud rate. The actual value of the optimal
delay time for a given channel is difficult to determine mathematically. Instead, it is
easiest to simulate the FFE performance with realistic channel pulses.
The block diagram for the 2-tap FFE used in this simulation is shown in figure
4.3. The first tap is chosen as the main cursor so that the second coefficient, c2, is
adapted to minimize the post-cursor ISI (i.e. maximize DR improvement). The delay
is a first-order Pade delay with the delay time, τ , swept from 10 ps to 100 ps. In
Figure 4.3: Block diagram of a 2-tap FFE with first-order Pade delays; variable delay time, τ; and optimal coefficient, c2.
Figure 4.4: DR improvement versus delay time (ps) for the 2-tap FFE in figure 4.3 for various channel pulse inputs (0.76 m MEGTRON6, 0.76 m FR4, and 1.09 m FR4).
essence, this simulation is subtracting a Pade delayed pulse from the original pulse
to maximally cancel the post-cursor ISI. The simplicity helps to isolate the impact of
the channel characteristics on the optimal delay time without the complications from
the other FFE parameters. For the channels, we use those introduced in §2.1.4 with the
pulse responses in figure 2.2 [2]:
• 0.76 m PCB trace with MEGTRON6 dielectric
• 0.76 m PCB trace with FR4 dielectric
• 1.09 m PCB trace with FR4 dielectric.
The optimum DR improvement versus delay time for these channel pulses is plotted in
figure 4.4. Higher channel attenuation results in a larger optimal delay time because
the subtracted pulse is most effective when it is delayed to coincide with the additional
post-cursor ISI as compared to the other channels. In addition, the optimum DR
improvement increases with the channel attenuation because there is more ISI and,
therefore, more equalization possible. As a result, the FFE is most effective in systems
with high channel attenuation.
The conclusion from these simulations is that the optimal delay time is closely
related to the channel characteristics. It is shown in §4.5 that this can be somewhat
mitigated by adding FFE taps, but an accurate choice or adaptability of the delay
time is still a necessity for optimum FFE performance.
4.4.3 First-order delays
In this section, we investigate the impact of delay time on FFE performance with
respect to the first-order pole and zero offset, α. For the first simulation, τ = 25 ps
so that the group delay varies with α as
$$\tau_g = \tfrac{1}{2}(1 + \alpha)\, 25\ \text{ps}. \tag{4.11}$$
Using the methodology in §4.2 the optimum DR improvement is found for FFEs of
order 3, 4, and 5 with delay pole and zero offset in the range 0 ≤ α ≤ 1. The result
is plotted in figure 4.5. While the performance is relatively consistent versus α for
n = 4 and n = 5, the n = 3 case shows significant degradation for small α.
For the second simulation, τg = 25 ps, which is achieved by changing τ as a
function of α as defined in the expression
$$\tau = \frac{25\ \text{ps}}{\tfrac{1}{2}(1 + \alpha)}. \tag{4.12}$$
Using the methodology in §4.2, the optimum DR improvement is again found for FFEs of
order 3, 4, and 5 with delay pole and zero offset in the range 0 ≤ α ≤ 1, with τ
recomputed for each α. The result is plotted in
figure 4.6. The n = 3 case from this simulation outperforms the previous simulation,
but it again substantially under-performs the n = 4 and n = 5 cases. Also for this
simulation, there is a greater dependence on α for the n = 4 and n = 5 cases.
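As a worked example of the two sweeps, evaluating (4.11) and (4.12) at the three pole and zero ratios used in §3.2.3 gives the following values.

```latex
\begin{align*}
\text{constant } \tau = 25\ \text{ps}: \quad
  & \tau_g = \tfrac{1}{2}(1+\alpha)\,25\ \text{ps}
    = 12.5\ \text{ps},\ 16.7\ \text{ps},\ 25\ \text{ps}
    \quad \text{for } \alpha = 0,\ \tfrac{1}{3},\ 1 \\
\text{constant } \tau_g = 25\ \text{ps}: \quad
  & \tau = \frac{25\ \text{ps}}{\tfrac{1}{2}(1+\alpha)}
    = 50\ \text{ps},\ 37.5\ \text{ps},\ 25\ \text{ps}
    \quad \text{for } \alpha = 0,\ \tfrac{1}{3},\ 1
\end{align*}
```

The two sweeps coincide only at α = 1, where τ = τg.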
Figure 4.5: DR improvement versus pole and zero ratio, α, for τ = 25 ps.
Figure 4.6: DR improvement versus pole and zero ratio, α, for τg = 25 ps.
In both simulations, the optimum case occurs for α = 1, but the 5-tap FFE
limits the performance degradation for α < 1. Over a wide range of α, the 5-tap FFE
outperforms the 4-tap FFE with α = 1. Therefore, an FFE using delays implemented
with α < 1 can recover the performance by adding a tap. This technique is used in
the proof-of-concept design in this work where α = 1/3 delays are employed in a 5-tap
FFE.
4.5 Number of taps
The number of taps in the FFE, n, determines the number of delays, coefficients,
and inputs in the summing circuit. Therefore, n is directly proportional to the FFE
power and must be carefully considered in the design process. Figure 4.7 shows the
block diagram of an n-tap FFE with first-order Pade delays and variable delay time,
τ . To investigate the impact of n on FFE performance, the simulation methodology
in §4.2 is repeated for τ from 10 ps to 100 ps for 3-tap, 4-tap, and 5-tap FFEs. The
result is plotted in figure 4.8.
An (n + 1)-tap FFE can realize any transfer function that an n-tap FFE can
Figure 4.7: Block diagram of an n-tap FFE with variable delay time, τ, and optimal coefficients, c1 to cn.
Figure 4.8: DR improvement versus delay time (ps) for 3-tap, 4-tap, and 5-tap FFEs with first-order Pade delays.
by simply setting c_{n+1} = 0. Therefore, performance improves with the number of
taps, which is unsurprisingly the case for this plot. There is not a significant gain
in performance for n > 3, though. The benefit of increasing n is that the sensitivity
to delay time decreases. For n = 3, a very narrow range of delay times can achieve
optimum DR improvement, but the range for which the n > 3 cases can meet or exceed
that optimum is much wider. This insensitivity can be explained by considering the
n = 5 case. With c4 = c5 = 0, the FFE is reduced to a 3-tap FFE with delay time
τ , but for c2 = c4 = 0, the FFE reduces to a 3-tap FFE with delay time 2τ . By
interpolating between these cases, an effective delay time in the range of τ to 2τ can
be obtained.
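The two reductions described above can be confirmed numerically. The sketch below assumes the cascaded structure of figure 4.7, so that tap k sees k − 1 delays in series, and uses hypothetical coefficient values. Here "delay time 2τ" refers to two cascaded first-order Pade delays, whose low-frequency group delay is 2τ.

```python
import numpy as np

def pade1(s, tau):
    """First-order Pade delay, D(s) = (1 - s*tau/2) / (1 + s*tau/2)."""
    return (1 - 0.5 * s * tau) / (1 + 0.5 * s * tau)

def ffe_response(c, s, tau):
    """H(s) = sum_k c[k] * D(s)**k, with c[0] applied to the undelayed input."""
    D = pade1(s, tau)
    return sum(ck * D**k for k, ck in enumerate(c))

tau = 25e-12
s = 2j * np.pi * np.logspace(8, 11, 200)   # 0.1 GHz to 100 GHz
a, b, c = 0.3, 1.0, -0.5                   # hypothetical coefficients

# c4 = c5 = 0: identical to a 3-tap FFE with delay element D(s).
h_tau = ffe_response([a, b, c, 0.0, 0.0], s, tau)
three_tap_D = ffe_response([a, b, c], s, tau)

# c2 = c4 = 0: identical to a 3-tap FFE whose delay element is D(s)**2.
h_2tau = ffe_response([a, 0.0, b, 0.0, c], s, tau)
D2 = pade1(s, tau) ** 2
three_tap_D2 = a + b * D2 + c * D2**2

print(np.allclose(h_tau, three_tap_D), np.allclose(h_2tau, three_tap_D2))
```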
4.6 Parasitic pole frequency
Low-frequency parasitic poles can limit the equalization performance, but increasing
the pole frequency comes at a cost of power (i.e. increased drive strength) or area
(i.e. peaking inductors). Therefore, this is a critical design parameter and its impact
on FFE performance must be well understood. Figure 4.9 shows the block diagram
Figure 4.9: Block diagram of an n-tap FFE with variable parasitic pole frequency, fp, at each node.
Figure 4.10: DR improvement versus fp (GHz) for 3-tap, 4-tap, and 5-tap FFEs with first-order Pade delays.
of an n-tap FFE with first-order Pade delays with constant τ = 25 ps and variable
parasitic pole frequency, fp, at each node. The simulation methodology in §4.2 is
repeated for fp from 1 GHz to 100 GHz and for 3-tap, 4-tap, and 5-tap FFEs. The
result is plotted in figure 4.10. The FFE performance increases monotonically with fp,
as expected. For fp < 10 GHz, the performance falls off substantially. Based on these
results, a reasonable design target is fp = 20 GHz, above which there are diminishing
performance benefits.
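To give a feel for these numbers, the sketch below evaluates the attenuation of a single first-order parasitic pole at 10 GHz, which is half of the 20 GBd target baud rate. Modeling a node as an isolated pole 1/(1 + jf/fp) is a simplifying assumption; the simulated FFE has such a pole at every node, so the cumulative effect is larger.

```python
import numpy as np

def pole_gain_db(f, fp):
    """Magnitude in dB of a single-pole response 1 / (1 + j*f/fp)."""
    return -10.0 * np.log10(1.0 + (f / fp) ** 2)

f_nyq = 10e9   # half of the 20 GBd baud rate
for fp in (5e9, 10e9, 20e9, 40e9):
    print(f"fp = {fp / 1e9:4.0f} GHz: {pole_gain_db(f_nyq, fp):6.2f} dB at 10 GHz")
```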
4.7 Coefficient resolution
Excessive coefficient resolution complicates the design and directly increases the par-
asitic capacitance at the delay output. Insufficient resolution results in suboptimal
FFE performance. The block diagram for this simulation is similar to the n-tap FFE
in figure 4.7, but with constant τ = 25 ps and variable coefficient resolution in bits.
Figure 4.11: DR improvement versus coefficient resolution (plus sign bit) for 3-tap, 4-tap, and 5-tap FFEs with first-order Pade delays.
Figure 4.12: DR improvement versus main cursor amplitude for 3-tap, 4-tap, and 5-tap FFEs with first-order Pade delays.
The least-significant bit of each coefficient as a function of the bits is
$$\text{LSB} = \frac{1}{2^{\text{bits}}} \tag{4.13}$$
and the optimum coefficients are found through a brute-force sweep as outlined in
appendix A.2. The simulation methodology in §4.2 is repeated for coefficient resolu-
tions from 1 to 5 bits (plus sign bit) and for 3-tap, 4-tap, and 5-tap FFEs. The result
is plotted in figure 4.11.
There are diminishing returns for bits ≥ 3 suggesting that 3 bits (plus sign) is a
reasonable design choice for an aggressive design that pushes the power and bandwidth
performance. For the proof-of-concept design in this work, we choose a resolution of
5 bits (plus sign) to provide some design margin.
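The effect of a finite LSB can be sketched as follows. Rounding to the nearest level is an assumption made here for illustration; the simulations in this chapter instead search the quantized coefficient grid exhaustively (appendix A.2). The grid size estimate further assumes that each free coefficient can take any multiple of the LSB from −1 to +1, including zero.

```python
import numpy as np

def quantize(c, bits):
    """Round coefficients to the nearest multiple of LSB = 2**-bits, limited to [-1, 1]."""
    lsb = 2.0 ** -bits
    return np.clip(np.round(np.asarray(c) / lsb) * lsb, -1.0, 1.0)

def grid_size(bits, n_taps):
    """Number of candidates in an exhaustive sweep with the main tap fixed at unity."""
    levels = 2 * 2**bits + 1          # -1 ... +1 in steps of one LSB (assumed)
    return levels ** (n_taps - 1)

c = [0.286, 1.0, -0.857, -0.857, 0.571]   # (3.25), used only as sample values
print(quantize(c, 5))
print(grid_size(5, 5))
```

Under these assumptions, the sweep for a 5-tap FFE with 5-bit coefficients covers roughly 1.8 × 10^7 candidates, which is consistent with a brute-force search being practical at this resolution.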
4.8 Main cursor attenuation
Attenuation is an unavoidable consequence of FFEs with unity-gain limited coef-
ficients. It is a function of all the other FFE parameters and can be traded for
equalization performance. In particular, it is a parameter in the coefficient optimiza-
tion (see appendix A) and a strong function of the channel characteristics as well as
the target DR improvement. The block diagram for this simulation is similar to the
n-tap FFE in figure 4.7, but with constant τ = 25 ps. The optimal coefficients are found
using the method in appendix A.3 where the threshold in the constraint on the main
cursor amplitude (i.e. max(po) ≥ threshold) is swept. This process is repeated for
3-tap, 4-tap, and 5-tap FFEs. The result is plotted in figure 4.12.
The performance degrades rapidly for output main cursor amplitudes greater than
1/2. For this reason, the threshold is fixed at 1/2 in all other simulations in this chapter,
as outlined in §4.2. This value is a strong function of the channel with a higher atten-
uation required to equalize for channels with low ISI. Therefore, the FFE performs
best in systems with substantial ISI, which is the target of this work.
4.9 Summary
In this chapter, we used MATLAB simulations to investigate the impact of various
FFE design parameters on the equalization performance. The following conclusions
are consequences of these simulations:
• First-order delays with pole and zero offsets less than one have limited equal-
ization performance degradation as compared to Pade delays for 5-tap FFEs.
• A 5-tap FFE provides robustness to delay time and channel characteristics.
• The parasitic pole frequency at each node in the FFE should exceed 20 GHz.
• A coefficient resolution of 5 bits plus sign is sufficient to achieve the maximum
FFE performance with some design margin.
• A delay time of 25 ps is a reasonable design choice for < 1 m PCB traces in FR4
dielectric.
These results are used to guide the architecture choices in chapter 5 and the design
decisions in chapter 6.
Chapter 5
Inverter-based FFE
5.1 Analog-inverter transconductor
The analog-inverter transconductor is the fundamental building block for the inverter-
based FFE. As depicted in figure 5.1, the circuit architecture of the inverter transcon-
ductor is identical to that of a digital inverter, but the operating point is constrained
to the saturation region in the small-signal range about mid-supply. In this region,
the block behaves as a linear transconductor with
io = Gmvi (5.1)
where the total transconductance is a sum of the NMOS and PMOS small-signal
transconductance:
Gm = gm,n + gm,p. (5.2)
Due to the class AB operation, this transconductor is efficient in terms of noise, power,
and bandwidth while maintaining reasonable linearity performance.
This transconductor is a versatile building block and can be used to achieve many
functions (see figure 5.2) [21]. A simple configuration is the unity-gain stage as shown
Figure 5.1: (a) The analog-inverter transconductor and (b) the associated transistor-level schematic diagram.
Figure 5.2: Example circuits using the analog-inverter transconductor: (a) unity gain, vo = vi; (b) summing, vo = vi1 + vi2; (c) coefficient, vo = a vi; (d) delay, vo = Dpd1(s) vi.
in figure 5.2(a) where the self-biased load transconductor behaves as a small-signal
resistive load. This stage is modified to realize the FFE summing circuit (see figure
5.2(b)), coefficients (see figure 5.2(c)), and first-order Pade delays (see figure 5.2(d)).
The capacitor in the delay implements both the pole and the zero and the transfer
function in the absence of parasitics is
$$D_{pd1}(s) = \frac{1 - s\,\frac{C}{G_m}}{1 + s\,\frac{C}{G_m}} \tag{5.3}$$
where
$$\tau = \frac{2C}{G_m}. \tag{5.4}$$
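As a worked sizing example, the delay capacitor follows from rearranging (5.4). The value of Gm below is hypothetical and chosen only for illustration; τ = 25 ps is the design choice from chapter 4.

```latex
\begin{align*}
C &= \frac{\tau\, G_m}{2} = \frac{(25\ \text{ps})(10\ \text{mS})}{2} = 125\ \text{fF} \\
f_{\text{pole}} = f_{\text{zero}} &= \frac{G_m}{2\pi C} = \frac{1}{\pi \tau} \approx 12.7\ \text{GHz}
\end{align*}
```

The pole and zero frequency depends only on τ, so scaling Gm and C together changes the power and noise without moving the delay.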
In reality, the transconductor has non-zero output conductance
Go = gds,n + gds,p, (5.5)
input capacitance¹
Ci = Cgs,n + Cgs,p, (5.6)
and output capacitance
Co = Cdg,n + Cdb,n + Cdg,p + Cdb,p. (5.7)
¹ The Miller capacitance due to Cgd is dependent on the transconductor load and, therefore, it is not considered here. For the unity-gain configuration, a capacitance of 2(Cgd,n + Cgd,p) should be added to the input capacitance.
The intrinsic gain of the transconductor is defined as
$$A_i = \frac{G_m}{G_o} = \frac{g_{m,n}}{g_{ds,n}}\,\theta_i + \frac{g_{m,p}}{g_{ds,p}}\,(1 - \theta_i) \tag{5.8}$$
where we defined the interpolation parameter
$$\theta_i = \frac{g_{ds,n}}{g_{ds,n} + g_{ds,p}} \le 1. \tag{5.9}$$
The intrinsic gain of the transconductor is an interpolation between the intrinsic gains
of its transistors.
The gain stage in figure 5.2(a) is a building block common to all the blocks in the
inverter-based FFE, and its performance determines the overall performance of the
FFE. Therefore, we investigate the details of this circuit in §5.2.
5.2 Unity-gain stage
Figure 5.3 shows the schematic diagram of the unity-gain stage with the parasitics
explicitly represented. The total parasitic output conductance is
Go,tot = 2Go (5.10)
Figure 5.3: Schematic diagram of the unity-gain stage with parasitics.
and the output capacitance is
Co,tot = Ci + 2Co. (5.11)
In this section, we investigate the behavior of this circuit to aid in understanding the
performance of the overall FFE in §5.6. First, we look at the linear behavior: the
gain in §5.2.1 and bandwidth in §5.2.2. Next, we investigate the statistical behavior:
the noise in §5.2.3 and mismatch in §5.2.4. Finally, we give an overview of the
nonlinear behavior: supply rejection in §5.2.5 and nonlinearity in §5.2.6.
5.2.1 Gain
The small-signal gain is
$$A = \frac{v_o}{v_i} = \frac{G_m}{G_m + 2G_o} = \frac{1}{1 + 2A_i^{-1}}. \tag{5.12}$$
The finite Ai due to the non-zero Go reduces the gain below unity. To compensate
for this, the load transconductor is scaled by the factor
$$\beta = \frac{A_i - 1}{A_i + 1}. \tag{5.13}$$
For the TSMC40 GP process with minimum length devices, Ai ≈ 6 which corresponds
to β = 0.71. This scale-factor can be achieved through various means. One option
is to reduce the load device widths, but this reduces the matching between the input
and load as well as complicates the layout. A more robust method is to use a tunable
load transconductor controlled by a calibration circuit, which is discussed in §6.1.
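The gain and scale factor quoted above can be checked with a few lines of arithmetic. The `compensated_gain` function below rests on an assumption that is not stated explicitly in the text: that the output conductance of the load scales with β along with its transconductance. Under that assumption, (5.13) restores exactly unity gain.

```python
def unity_stage_gain(a_i):
    """Small-signal gain from (5.12) for intrinsic gain a_i = Gm / Go."""
    return 1.0 / (1.0 + 2.0 / a_i)

def load_scale(a_i):
    """Load scale factor beta from (5.13)."""
    return (a_i - 1.0) / (a_i + 1.0)

def compensated_gain(a_i, beta):
    """Gain with the load scaled by beta.

    Assumption: the load's transconductance and output conductance both
    scale with beta, so the output node sees beta*Gm + (1 + beta)*Go.
    """
    return a_i / (beta * a_i + 1.0 + beta)

a_i = 6.0
beta = load_scale(a_i)
print(unity_stage_gain(a_i), round(beta, 2), compensated_gain(a_i, beta))
```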
5.2.2 Bandwidth
Maximizing the bandwidth is crucial to maximizing FFE performance potential, as
demonstrated in §4.6. For optimal FFE performance, the bandwidth of each inter-
nal node should be greater than 20 GHz, but using inductors to resonate parasitic
capacitance results in a substantial area penalty (see [11]). The bandwidth of the
unity-gain stage is fixed by the transit frequency of the process, achieving a wide
bandwidth without the use of inductors.
The unloaded bandwidth of the gain stage is
$$\omega_c = \frac{G_m + 2G_o}{C_i + 2C_o} \approx \frac{g_{m,n} + g_{m,p}}{C_{gs,n} + C_{gs,p}} = \theta_c\,\omega_{T,n} + (1 - \theta_c)\,\omega_{T,p} \tag{5.14}$$
where we defined the interpolation parameter
$$\theta_c = \frac{C_{gs,n}}{C_{gs,n} + C_{gs,p}} \le 1 \tag{5.15}$$
and the NMOS and PMOS transit frequencies
$$\omega_{T,n} = \frac{g_{m,n}}{C_{gs,n}} \tag{5.16}$$
$$\omega_{T,p} = \frac{g_{m,p}}{C_{gs,p}}. \tag{5.17}$$
The unloaded bandwidth is an interpolation between the transit frequencies of the
transistors which are set by the gm/ID factors. These factors are fixed by the VGS − Vt,
which is itself fixed by the supply voltage. Therefore, little can be done to increase the
unloaded bandwidth. In a practical implementation, the gain stage is always loaded
by the capacitance of the following stage, CL. In the limiting case where CL
is much greater than Co,tot, the bandwidth is
$$\omega_c = \frac{G_m}{C_L} = \left(\frac{g_{m,n}}{I_{DS,n}} + \frac{g_{m,p}}{I_{SD,p}}\right) C_L^{-1}\,\frac{I_{TOT}}{2} \tag{5.18}$$
where increasing the power increases the speed. For this FFE design, the actual
bandwidth is dependent on both the self-loaded capacitance and the next-stage ca-
pacitance. The result is a trade-off between power and bandwidth at each node in
the FFE, which is discussed in more detail in §5.3.1.
5.2.3 Noise
Assuming, for simplicity, that the output conductance is zero, the spot noise power
is
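$$\frac{v_{n,o}^2}{\Delta f} = \frac{v_{n,i}^2}{\Delta f} = 2\gamma_{eff}\,4kT\,G_m^{-1} = 2\,\frac{v_n^2}{\Delta f} \tag{5.19}$$
where we defined the effective transconductor gamma factor
$$\gamma_{eff} = \theta_n\gamma_n + (1 - \theta_n)\gamma_p, \tag{5.20}$$
the interpolation parameter
$$\theta_n = \frac{g_{m,n}}{g_{m,n} + g_{m,p}} \le 1, \tag{5.21}$$
and the spot noise power of a single transconductor
$$\frac{v_n^2}{\Delta f} = \gamma_{eff}\,4kT\,G_m^{-1}. \tag{5.22}$$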
The effective gamma factor is an interpolation between the NMOS and PMOS gamma
factors. To reduce the noise it is necessary to increase the transconductance. The
$g_m/I_D$ factors are fixed by the supply voltage and the device width ratio is fixed by the
common mode. As a result, the only way to reduce the noise is to increase the power
by scaling the NMOS and PMOS devices proportionally.
The total noise accumulates through the stages of the FFE. Depending on the
coefficient values, the noise can add constructively, compounding the problem. This
is discussed in §5.6.1.
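The noise expressions (5.19) through (5.22) can be sketched numerically as follows. This is a minimal sketch under stated assumptions: the gamma factors, transconductances, and the 300 K temperature are hypothetical values, and the function names are introduced here for illustration only.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def effective_gamma(gm_n, gm_p, gamma_n, gamma_p):
    """Effective gamma factor per (5.20), with theta_n from (5.21)."""
    theta_n = gm_n / (gm_n + gm_p)
    return theta_n * gamma_n + (1.0 - theta_n) * gamma_p


def transconductor_noise_psd(gm_total, gamma_eff, temperature=300.0):
    """Spot noise power (V^2/Hz) of a single transconductor per (5.22)."""
    return gamma_eff * 4.0 * K_B * temperature / gm_total


def gain_stage_noise_density(gm_total, gamma_eff, temperature=300.0):
    """Voltage noise density (V/sqrt(Hz)) of the gain stage, from (5.19)."""
    return math.sqrt(2.0 * transconductor_noise_psd(gm_total, gamma_eff, temperature))


# Hypothetical values (assumptions): gm in S, dimensionless gamma factors.
g_eff = effective_gamma(3.5e-3, 3.0e-3, 1.0, 0.8)
vn_1x = gain_stage_noise_density(6.5e-3, g_eff)
vn_2x = gain_stage_noise_density(13.0e-3, g_eff)  # devices and power scaled by 2
```

Doubling the transconductance (and therefore the power) lowers the noise voltage density by a factor of the square root of two, which is the power and noise trade-off described above.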
5.2.4 Mismatch
The threshold voltage mismatches of the NMOS and PMOS transistors are [22]
$$\sigma_p = \frac{A_{VT,p}}{\sqrt{W_p L_p}} \tag{5.23}$$
$$\sigma_n = \frac{A_{VT,n}}{\sqrt{W_n L_n}}. \tag{5.24}$$
The total mismatch of the unity-gain stage is
$$\sigma_{tot} = \sqrt{\tfrac{1}{2}\sigma_p^2 + \tfrac{1}{2}\sigma_n^2}. \tag{5.25}$$
Mismatch between the positive and negative half-circuits limits the effectiveness of the
pseudo-differential implementation at combating the supply noise and second-order
nonlinearities. The total mismatch accumulates through the stages of the FFE. De-
pending on the coefficient values, the mismatch can add constructively, compounding
the problem. This is discussed in §5.6.2.
5.2.5 Supply rejection
In appendix F.1, we show that the single-ended supply rejection of the gain stage is
$$\frac{v_o}{v_{dd}} \approx \frac{2g_{m,p}}{g_{m,p} + g_{m,n}} \approx 1 \tag{5.26}$$
which shows that the supply noise passes unimpeded to the output. To reduce the impact
of this issue, the final FFE is implemented pseudo-differentially, which rejects the
single-ended supply noise to the extent that the circuit is balanced.
A fully-balanced pseudo-differential circuit will completely reject first-order supply
noise, but this does not mean it is completely immune to supply noise. There is a
second-order effect in which the supply noise is mixed with the input signal. The
details are discussed in appendix F.2, but the result is
$$v_{od} = v_{id} + a_{11}\,v_{id}\,v_{dd} \tag{5.27}$$
where the conversion gain, $a_{11}$, is approximately
$$a_{11} \approx -\frac{g^{(p)}_{20}}{g^{(p)}_{10}} \approx -\frac{1}{4}\left(\frac{g_{m,p}}{I_{SD,p}}\right). \tag{5.28}$$
The last expression is a square-law approximation, but gives some intuition as to the
magnitude of the conversion gain. The extracted factor is about 2× this result, but
the linear dependence on the $g_m/I_D$ factor is reasonably accurate.
Due to the lack of first-order supply rejection in this circuit and the limitations
of the pseudo-differential implementation, some additional supply rejection circuitry
such as an LDO should be considered for a robust implementation. We did not
implement this in the proof-of-concept design due to time constraints.
5.2.6 Nonlinearity
In appendix E, we show that the nonlinearity of the gain stage can be modeled as a
Taylor series and is dominated by the first few terms
$$v_o \approx a_1 v_i + a_2 v_i^2 + a_3 v_i^3 \tag{5.29}$$
where
$$a_2 \approx \frac{G_{20}}{G_{10}}(1 + \beta) \tag{5.30}$$
$$a_3 \approx \frac{G_{30}}{G_{10}}(1 - \beta) + \frac{G_{21}}{G_{10}}(1 + \beta) + a_2\frac{G_{20}}{G_{10}}(2\beta) \tag{5.31}$$
and
$$G_{jk} = g^{(n)}_{jk} - (-1)^{j+k} g^{(p)}_{jk}. \tag{5.32}$$

Figure 5.4: Schematic diagram of the inverter-based first-order Pade delay.

The terms $g^{(n)}_{jk}$ and $g^{(p)}_{jk}$ are the Taylor series coefficients for the NMOS and PMOS
transistors as defined in (E.2) and (E.1), respectively. The second-order nonlinearity
is reduced by the pseudo-differential implementation and the third-order term, a3,
dominates.
5.3 Delay
5.3.1 Single-path Pade-inspired delay
As shown in §4.3, the first-order Pade delay provides a good trade-off between FFE
performance and delay complexity. Figure 5.4 shows the schematic diagram of the
inverter-based implementation with the transfer function
$$D(s) = -D_{pd1}(s) = -\frac{1 - s\frac{C}{G_m}}{1 + s\frac{C}{G_m}} \tag{5.33}$$
where
$$\tau = \frac{2C}{G_m}. \tag{5.34}$$
This delay realization combines the power, noise, and bandwidth performance of the
inverter transconductor with the simplicity and equalization performance of the first-order
Pade delay. The limitation is revealed by analyzing a cascade of two delays. The delay
input impedance is determined by the Miller capacitor and can be expressed as
$$Z_{in} = \frac{1}{sC\,(1 + D_{pd1}(s))}. \tag{5.35}$$
As a consequence, the transfer function of the first delay is disrupted and has the
form
$$\frac{v_{o1}}{v_{i1}} = -\frac{1 - s\frac{C}{G_m}}{1 + s\frac{C}{G_m}\,(2 + D_{pd1}(s))}. \tag{5.36}$$

Figure 5.5: Block diagram of the buffered inverter-based first-order Pade delay.

One solution is to insert a buffer before each delay to isolate the input impedance as
shown in figure 5.5, but this adds additional power and the buffer capacitance loads
the previous delay output. Therefore, the impact of the buffer stage needs to be
carefully considered. Using a scaled replica of the gain stage in the delay as a buffer
with a scale factor β (see figure 5.6), the delay transfer function becomes
$$D(s) = \frac{1 - s\frac{C}{G_m}}{1 + s\frac{C}{G_m}\,(1 + 2\beta^{-1})}. \tag{5.37}$$
In the limit where β →∞, the delay behaves like a first-order Pade delay. In general,
the transfer function represents a first-order delay as defined in (3.19) with pole and
zero ratio
$$\alpha = \frac{1}{1 + 2\beta^{-1}}, \tag{5.38}$$
$\tau$ factor from the delay definition
$$\tau = 2\,(1 + 2\beta^{-1})\frac{C}{G_m}, \tag{5.39}$$
and group delay
$$\tau_g = (2 + 2\beta^{-1})\frac{C}{G_m}. \tag{5.40}$$
In §3.2.1, we show that any pole and zero ratio can be absorbed into the FFE coeffi-
cient at a cost of coefficient spread. For the case with equal power in the buffer and
delay where β = 1 (i.e. α = 1/3), the coefficient spread penalty bound for a 5-tap FFE
is (see §3.2.2)
$$\left\| M_{1/3} \right\|_2 = 10.049. \tag{5.41}$$
The practical trade-off between α and FFE performance is simulated in §4.4.3. For
the 5-tap FFE and α = 1/3, the FFE performance degradation is just 13 %, which is
an acceptable penalty to pay in trade for the simplicity of this delay implementation.
Additionally, β = 1 is a practical design choice in that the gain stage in the delay is
replicated exactly for the buffer, simplifying the design. For these reasons, we chose
β = 1 for the delays in this work as depicted in figure 5.7(a).
The finite output conductance of the inverter transconductors will limit the gain
and must be addressed. This topic is covered in ??.
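The dependence of the buffered delay on the scale factor β can be checked with a short Python sketch that evaluates (5.38) through (5.40) and compares the group delay against a numerical phase slope of (5.37) near dc. The function names are hypothetical, and the sketch uses normalized units (C = 1, G_m = 1), so all times are multiples of C/G_m rather than values from this design.

```python
import cmath


def buffered_delay_parameters(beta, c, gm):
    """Return (alpha, tau, tau_g) from (5.38), (5.39), and (5.40)."""
    spread = 1.0 + 2.0 / beta
    alpha = 1.0 / spread
    tau = 2.0 * spread * c / gm
    tau_g = (2.0 + 2.0 / beta) * c / gm
    return alpha, tau, tau_g


def delay_response(omega, beta, c, gm):
    """Evaluate D(j*omega) from (5.37)."""
    s = 1j * omega
    t0 = c / gm
    return (1.0 - s * t0) / (1.0 + s * t0 * (1.0 + 2.0 / beta))


def low_frequency_group_delay(beta, c, gm, omega=1.0e-6):
    """Approximate -d(phase)/d(omega) near dc with a one-sided difference."""
    return -cmath.phase(delay_response(omega, beta, c, gm)) / omega


# Normalized units (assumption): C = 1 and Gm = 1.
alpha, tau, tau_g = buffered_delay_parameters(beta=1.0, c=1.0, gm=1.0)
tau_g_numeric = low_frequency_group_delay(beta=1.0, c=1.0, gm=1.0)
```

For β = 1 this gives α = 1/3, τ = 6C/G_m, and τ_g = 4C/G_m, and the numerical phase slope agrees with the closed-form group delay.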
Figure 5.6: Schematic diagram of the buffered inverter-based first-order delay.

Figure 5.7: Schematic diagrams of (a) the single-path Pade-inspired delay of this work and (b) the two-path Pade delay [3].
5.3.2 Comparison with two-path Pade delay
Figure 5.7(b) shows a two-path implementation of the Pade delay which is an alter-
native to the delay introduced in this work [3, 20]. The two-path delay achieves the
Pade response by subtracting the input from a first-order pole with a gain of two
resulting in
$$D_{2p}(s) = \frac{2}{1 + s\frac{C}{G_m}} - 1 = D_{pd1}(s). \tag{5.42}$$
A consequence of this approach is that signal is created and then destroyed, wasting
power while introducing noise and nonlinearity. The total spot noise of this two-path
delay is
$$\frac{v_{n,2p}^2}{\Delta f} = 4kT\,(6\gamma_{eff})\,G_m^{-1} \tag{5.43}$$
which is 1.5× that of the single-path delay, whose total spot noise is
$$\frac{v_{n,1p}^2}{\Delta f} = 4kT\,(4\gamma_{eff})\,G_m^{-1}. \tag{5.44}$$

Figure 5.8: Half-circuit schematic diagram of the inverter-based coefficient.

When the 1.5× power of this two-path delay is additionally accounted for, this single-
path delay is 2.25× more efficient in terms of noise for the same power. In addition, for
the same power (i.e. each Gm in the two-path scaled down by 1.5×), the single-path
delay has a stronger output drive by a factor of 1.5×. These benefits in performance
come with only a modest cost in FFE performance due to the pole and zero offset,
and account for the FFE performance improvement as compared to [3] (see §7.4 for
comparison of the measured results).
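The 2.25× figure follows from combining the noise factors in (5.43) and (5.44) with the 1.5× power ratio quoted in the text. A minimal Python sketch of that arithmetic is given below; it assumes, as the noise expressions imply, that the spot noise power scales as $G_m^{-1}$ and that $G_m$ scales in proportion to power. The function name is hypothetical.

```python
def equal_power_noise_ratio(noise_factor_2p=6.0, noise_factor_1p=4.0, power_ratio=1.5):
    """Noise power of the two-path delay relative to the single-path delay
    when both are scaled to the same power."""
    # Ratio of spot noise powers at equal Gm per transconductor, (5.43)/(5.44).
    noise_ratio_equal_gm = noise_factor_2p / noise_factor_1p
    # Scaling the two-path Gm down by power_ratio raises its noise power
    # by the same factor, since noise power is proportional to 1/Gm.
    return noise_ratio_equal_gm * power_ratio


ratio = equal_power_noise_ratio()
```

With the default arguments the ratio is 1.5 × 1.5 = 2.25, matching the efficiency advantage stated above.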
5.4 Coefficients
Figure 5.8 shows the half-circuit schematic diagram of the inverter-based coefficient.
The pseudo-differential implementation is exploited to realize negative coefficients by
feeding the signal from the negative half-circuit to the positive coefficient. The input
transconductors are a set of binary-weighted minimum-sized inverters that can be
switched in or out to achieve the desired coefficient gain. Using inverter transcon-
ductors in the coefficients and summing circuit (see §6.3) results in a ratiometric
FFE design where the common mode at each stage is common to the entire FFE.
In addition, the benefits of the inverter transconductor and gain stage are imparted
on the total FFE. The impact of coefficient resolution is investigated in §4.7. The
conclusion is that 5 bit plus sign resolution is sufficient to ensure an optimal solution
is achievable with margin. We use this margin to absorb the coefficient spread
introduced by the pole and zero offset in the delay.

Figure 5.9: Half-circuit schematic diagram of the inverter-based summing circuit.

5.5 Summing circuit
The half-circuit schematic diagram of the inverter-based summing circuit is shown in
figure 5.9. The use of the inverter transconductors in the summing circuit completes
the power-efficient ratiometric FFE design. The transconductors are scaled to antici-
pate the relative magnitudes of the coefficients to maximize the dynamic range of the
coefficient resolution. The total output conductance is
$$G_{L,tot} = 6G_o + G_m \tag{5.45}$$
and the gain for the 1× transconductor is
$$\frac{v_o}{v_{i2}} = \frac{1}{1 + 6/A_i}. \tag{5.46}$$
Because the intrinsic gain of the transconductor is close to 6, the approximate ex-
pression for the output voltage is
$$v_o = \frac{1}{4}v_{i1} + \frac{1}{2}v_{i2} + v_{i3} + \frac{1}{2}v_{i4} + \frac{1}{4}v_{i5}. \tag{5.47}$$
The main tap is vi2 and the first post-cursor tap, vi3, is 2× larger. This is due to the
coefficient transformation required to absorb the pole and zero offset in the delay as
determined from the analysis in §3.2.2.
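The summing weights can be reproduced from the transconductor scale factors and the intrinsic gain with the Python sketch below. It assumes the intrinsic gain is defined as $A_i = G_m/G_o$, that the five inputs use the ½×, 1×, 2×, 1×, ½× scale factors of figure 5.9, and that a single 1× load transconductor sets the output node; the names are hypothetical.

```python
# Scale factors of the input transconductors for vi1..vi5 (from figure 5.9).
INPUT_SCALES = (0.5, 1.0, 2.0, 1.0, 0.5)


def summing_gains(intrinsic_gain, scales=INPUT_SCALES):
    """Per-input gains k*Gm / (N*Go + Gm), where N counts the input scale
    factors plus the 1x load (N = 6 for figure 5.9)."""
    total_go_units = sum(scales) + 1.0
    denominator = 1.0 + total_go_units / intrinsic_gain
    return [k / denominator for k in scales]


# The text states that the intrinsic gain is close to 6.
gains = summing_gains(intrinsic_gain=6.0)
```

With an intrinsic gain of 6 the gains are 1/4, 1/2, 1, 1/2, and 1/4, which are the weights of the approximate output expression.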
5.6 Full FFE
Figure 5.10 shows the half-circuit schematic diagram of the complete inverter-based
FFE. It is constructed entirely from the inverter transconductor gain stage. There-
fore, the total FFE performance is derived from the performance equations of the
gain stage outlined in §5.2.
5.6.1 FFE noise
The dominant noise source of the FFE is the delays. Due to the multi-path nature
inherent to the FFE architecture, the noise from the delays sees multiple paths to the
outputs and can add constructively or destructively. The worst case is for the first
delay whose FFE output noise contribution is
$$\frac{v_{n,o,d1}^2}{\Delta f} = (a_2 + a_3 + a_4 + a_5)^2\,4\,\frac{v_n^2}{\Delta f} \tag{5.48}$$
where $a_k$ is the gain of the $k$th coefficient path including the scale factor of the summing
circuit and $v_n^2/\Delta f$ is defined in (5.22). In the worst case, which occurs for maximum
coefficient values and equal signs,
$$\frac{v_{n,o,d1}^2}{\Delta f} = \frac{81}{4}\,\frac{v_n^2}{\Delta f}. \tag{5.49}$$

Figure 5.10: Half-circuit schematic diagram of the inverter-based FFE.

This is nearly 3× the total contribution of all the coefficients and the entire summing
circuit combined. For typical coefficient values, this factor is reduced, but the delays
remain the primary contributor to FFE noise and power must be invested to mitigate
this problem. This is discussed in §6.1 while covering the transistor-level design of
the delays.
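The worst-case factor of 81/4 can be reproduced with exact arithmetic. The Python sketch below assumes that at maximum coefficient values each path gain equals the corresponding summing-circuit weight from §5.5 (1/4, 1/2, 1, 1/2, 1/4), and that the first delay reaches the output through paths 2 to 5; the function name is hypothetical.

```python
from fractions import Fraction


def first_delay_noise_factor(path_gains):
    """Multiple of the single-transconductor spot noise power contributed by
    the first delay at the FFE output; path_gains lists a1..a5."""
    downstream = sum(path_gains[1:])  # a2 + a3 + a4 + a5
    return 4 * downstream ** 2


# Assumed maximum path gains with equal signs (summing-circuit weights).
max_gains = [Fraction(1, 4), Fraction(1, 2), Fraction(1), Fraction(1, 2), Fraction(1, 4)]
worst_case = first_delay_noise_factor(max_gains)

# Alternating signs let the paths partially cancel the same noise source.
mixed_gains = [Fraction(1, 4), Fraction(1, 2), Fraction(-1), Fraction(1, 2), Fraction(-1, 4)]
mixed_case = first_delay_noise_factor(mixed_gains)
```

The equal-sign case evaluates to 81/4, while the alternating-sign example evaluates to 1/4, illustrating how strongly the delay noise contribution depends on the coefficient signs.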
5.6.2 FFE mismatch
Mismatch is a statistical process and its behavior in the complete FFE is similar to
the noise described in §5.6.1. The first delay has the largest contribution to the FFE
offset, which can be represented as
$$\sigma_{o,d1} = (a_2 + a_3 + a_4 + a_5)\,\sigma_{tot} \tag{5.50}$$
where σtot is the total mismatch of the gain stage defined in (5.25). As a consequence,
the mismatch in the delays is the primary contributor to FFE offset. This is discussed
in §6.1 while covering the transistor-level design of the delays. Additionally, the
tuning for the PVT variations of the delays is performed by a switched-capacitor
circuit which is introduced in appendix G.
5.7 Summary
In this chapter, we introduced the single-path Pade-inspired delay and the inverter-
based FFE. The use of analog-inverter transconductors throughout the FFE results
in a power and noise efficient design with a high bandwidth without the use of area-
intensive inductors. The single-path delay provides a 2.25× improvement in power
efficiency as compared to the popular two-path Pade implementation [3]. This im-
provement translates into a substantial increase in power efficiency of the total FFE
which is dominated by the noise of its delay elements. The measurement results in
chapter 7 support this analysis. The design of the proof-of-concept inverter-based
FFE is covered in chapter 6.
Chapter 6
FFE design
6.1 Delay
The single-path Pade-inspired delay used in this proof-of-concept FFE design was in-
troduced in §5.3.1 where the non-zero output conductance was ignored to simplify the
discussion. This is addressed now by degenerating the load transconductors with tri-
ode devices as shown in figure 6.1. This reduces the load conductance to compensate
for the non-zero output conductance of the transconductors. The gates of the triode
devices can be tied to supply and ground with the residual gain error being absorbed
into the coefficients. A more robust solution is to tune the triode gate voltages to
adjust for gain and common mode PVT variations as discussed in appendix G.
Low-Vt minimum-length devices are used in the delay to maximize the bandwidth.
The PMOS/NMOS device width ratio, which is approximately 2× for this process, is chosen
to set the common mode at $V_{DD}/2 = 0.5$ V. The remaining degree of freedom in
the device widths determines the power dissipation and the output noise. The delays
are sized such that the total FFE output noise for the worst-case coefficients is limited
to 1 mV$_{\mathrm{RMS}}$. To achieve this, the NMOS widths are $W_n = 2 \times 1.5$ µm with the PMOS
widths $W_p = 4 \times 1.5$ µm. In post-layout simulations, the transconductance is 6.5 mS
with a current of 430 µA per input transconductor and 350 µA per degenerated load
transconductor for a total half-circuit current of 1.56 mA per delay. The half-circuit
spot noise is
$$\frac{v_{n,d}}{\sqrt{\Delta f}} = 2.5\ \frac{\mathrm{nV}}{\sqrt{\mathrm{Hz}}}. \tag{6.1}$$

Figure 6.1: The single-path Pade-inspired delay schematic diagram with triode-degenerated load transconductor.

To achieve the target delay time of 25 ps as determined in §4.4, the feedthrough
capacitor is implemented as a 13 fF MOM capacitor. The total delay results from a
combination of this capacitor and the gate-drain parasitic capacitance.
6.2 Coefficients
The coefficients are realized as a set of binary-weighted unit-sized inverters as de-
scribed in §5.4. Low-Vt minimum-length devices are used to maximize the band-
width. For the unit inverter, the NMOS devices are sized Wn = 160 nm and the
PMOS devices are sized Wp = 360 nm so that the nominal common mode is half the
supply voltage. To implement the 5 bits plus sign resolution, $2 \times (2^5 - 1) = 62$ unit
inverters are required. This results in a large capacitance that loads the delay output
and prevents the bandwidth from reaching the 20 GHz target determined in §5.2.2. To
minimize this capacitance, the least-significant bit (LSB) is implemented with series-
stacked transistors to effectively lengthen the device by a factor of two as depicted
in figure 6.2. With this modification, the new input capacitance is equivalent to 34
unit inverters which is a reduction of approximately 2×. This comes with a match-
ing penalty, but since it is limited to the LSB it does not substantially impact the
performance.

Figure 6.2: Half-circuit schematic diagram for the reduced input capacitance 5-bit coefficient.
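The inverter counts quoted above can be reproduced with the short Python sketch below. The count of 62 follows directly from the binary weighting. The accounting for 34 is an assumption made here for illustration: the upper four bits are taken to use 8, 4, 2, and 1 unit inverters, and the stacked LSB is taken to present the gate capacitance of two unit inverters. The function names are hypothetical.

```python
def binary_weighted_units(bits):
    """Unit inverters for a binary-weighted coefficient with sign: 2*(2^bits - 1)."""
    return 2 * (2 ** bits - 1)


def stacked_lsb_units(bits, lsb_cap_units=2):
    """Equivalent unit-inverter input capacitance when the LSB is built from
    series-stacked devices (assumed to load the input like lsb_cap_units units)."""
    upper_bits_units = 2 ** (bits - 1) - 1  # 8 + 4 + 2 + 1 = 15 for 5 bits
    return 2 * (upper_bits_units + lsb_cap_units)


original_units = binary_weighted_units(5)
reduced_units = stacked_lsb_units(5)
reduction = original_units / reduced_units
```

Under this accounting the counts are 62 and 34, a reduction of about 1.8×, consistent with the approximately 2× reduction stated in the text.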
In post-layout simulations, the unity-gain configuration has the transconductance
$G_m = 6.4$ mS, which is similar to the transconductors in the delay and the summing
circuit. As a result, the bandwidth of the delay driving the coefficient is nearly equal to
that of the coefficient driving the summing circuit, which suggests this is an optimal
design point. In post-layout RC-extracted simulations, the bandwidth exceeds 20 GHz
for the delay, coefficients, and summing circuit¹, achieving the design target without
resorting to peaking inductors. The load device is made from eight unit-sized inverters
to compensate for the device output conductance. Gain errors for this stage are
absorbed into the coefficient values and require no tuning. The output noise in the
unity-gain configuration is
$$\frac{v_{n,c}}{\sqrt{\Delta f}} = 3.1\ \frac{\mathrm{nV}}{\sqrt{\mathrm{Hz}}} \tag{6.2}$$
with a half-circuit current consumption of 700 µA.

¹ The bandwidth of the coefficient driving the 2× summing circuit transconductor is 15 GHz, but the simulation plotted in figure 4.9 suggests this should not be detrimental to the overall performance.
6.3 Summing circuit
The schematic for the summing circuit is shown in figure 5.9. Low-Vt minimum-
length devices are used to maximize the bandwidth. For the 1× inverter as shown
in figure 5.9, the NMOS width is Wn = 4 × 0.75 µm and the PMOS width is Wp =
4×1.5 µm, which is equivalent to the sizing of the delay. In post-layout simulations, the
unit transconductance is 6.2 mS with a current of 485 µA per transconductor. The
total current for the half-circuit summing circuit is 2.91 mA with an output noise of
$$\frac{v_{n,s}}{\sqrt{\Delta f}} = 1.9\ \frac{\mathrm{nV}}{\sqrt{\mathrm{Hz}}}. \tag{6.3}$$
To compensate for the offset of the FFE an additional transconductor drives the
output node with an off-chip bias voltage setting the input. This transconductor can
source or sink current to adjust the common-mode voltage to trim the offset and set
the FFE output common mode to the desired level to drive the next stage.
Figure 6.3: Block diagram of the on-chip signal generator including a PRBS generator and LVDS conversion stage.
6.4 PRBS generator
To aid in the testing process an on-chip signal generator was designed. It is con-
structed of a 20 Gb/s pseudo-random bit sequence (PRBS) generator and a low-
voltage differential signal (LVDS) conversion stage as shown in figure 6.3. To achieve
this high data rate, the PRBS generator is built from two $(2^7 - 1)$ 10 Gb/s PRBS
generators muxed together to form a single 20 Gb/s $(2^7 - 1)$ PRBS signal. This signal
is phase-aligned by a weakly coupled inverter chain. The signal is converted to LVDS
using a weak inverter driving a strong self-biased inverter to limit the swing. The
amplitude is configured by the value of the 3-bit load transconductor. The LVDS
conversion inverters are sized similar to the core FFE devices, so the FFE input com-
mon mode is achieved by design. In addition, the PRBS generators can be configured
in a pulse mode to aid in testing and coefficient optimization.
6.5 Output driver
The output drivers are constructed from the same unit inverter transconductor as in
the summing circuit and delays, sharing the common mode with the FFE. A 50 Ω
resistive load is formed with 100 Ω poly resistors from the output to supply and
ground (see figure 6.4). The load capacitance is dominated by the pads and oscilloscope
input. The bandwidth is therefore fixed by the resulting RC time-constant. The
transconductance is chosen to achieve a nominal gain of unity. The devices are sized
Wn = 14× 0.75 µm and Wp = 14× 1.5 µm.
Figure 6.4: Half-circuit schematic diagram of the output driver.
Chapter 7
Measurement results
7.1 Test setup
Figure 7.1 shows the die photo of the proof-of-concept integrated circuit (IC) fab-
ricated in the TSMC40 GP process with an FFE area of only 0.003 mm². The
high-speed inputs and outputs are probed through single-ended ground-signal-ground
(GSG) pads and differential ground-signal-signal-ground (GSSG) pads. The low-
frequency signals (i.e. voltage supply, voltage references, and digital I/O) are bonded
directly to the test PCB via the chip-on-board method as shown in figure 7.2. The
PCB has a low-speed digital interface to the NanoRiver Miniboard GPIO card for scan
chain read and write to set the FFE coefficient values and on-chip signal generator
amplitude. The analog reference voltages Vbn and Vbp (see appendix G) can be set ex-
ternally by on-board DACs (TI DAC128S085). Alternatively, the switched-capacitor
bias circuit can be enabled to set the bias with an externally provided 500 kHz clock
source. Unfortunately, measurement issues limited a complete physical verification
of the switched-capacitor bias circuit (see appendix G). As such, the measurement
results in this chapter are for the delay configured with fixed ground and supply
bias voltages as in figure 6.1. The NanoRiver Miniboard is connected by USB to
a computer and controlled through Python and MATLAB scripts. The oscilloscope
data is retrieved with the Keysight IVI drivers and the python-ivi library. Additional
communication with the scope and other test equipment is achieved with the VISA
protocol with the PyVISA library. There are multiple signal paths to facilitate the
testing process, which are outlined in the following sections.

Figure 7.1: Die photo of the proof-of-concept IC fabricated in the TSMC40 GP process.
7.1.1 On-chip channel
The on-chip channel signal path is shown in figure 7.3(a). The test equipment used
for these measurements is outlined in table 7.1. The bit error rate tester (BERT) is
configured as a 10 GHz clock and probed through GSG pads to the on-chip 20 Gb/s
signal generator. There are two FFEs in the signal path. The first is configured to
emulate the channel and is referred to as the channel FFE. The second is configured
to equalize for the channel by the adaptation of its coefficients and is referred to as the
equalizer FFE. The pseudo-differential FFE output is driven off chip through GSSG
pads and dc blocking capacitors to the high-speed oscilloscope. There is a second
signal path shown in figure 7.3(b) with the equalizer FFE removed to measure the
channel response. The benefit of the on-chip channel measurement setup is that
the total impact of the test system is constant between the channel and channel+FFE
measurements. Therefore, the FFE response can be accurately separated from the rest
of the test system. Another benefit is that the signal generation can be performed on
chip with only a clock input, reducing the complexity of the required test equipment.
A drawback is the limited insertion loss introduced by the channel FFE, as is detailed
in the measurement results in §7.3.3.

Figure 7.2: Test PCB photo and (inset) chip-on-board bonding.

Table 7.1: Test equipment for the measurements with the on-chip channel.
Use               Equipment
Oscilloscope      Keysight DSA-X 643A
Clock Generator   Keysight N4872A
Input Probe       Infinity 40 GHz GSG
Output Probe      Dual Infinity 40 GHz GSSG
GPIO Card         NanoRiver Miniboard

Figure 7.3: Test signal paths. (a) On-chip PRBS, channel, and equalizer. (b) On-chip PRBS and channel. (c) Off-chip PRBS and channel (0.5 m FR4 PCB traces).

To adapt the coefficients, the PRBS generator in figure 7.3(a) is configured in pulse
mode. The pulse passes through the channel FFE and through a single coefficient path
in the equalizer FFE by setting all other coefficients to zero. This pulse response is
measured on the oscilloscope and the process is repeated for all five coefficient paths
to generate a family of pulses. These pulses are processed in MATLAB through
the brute-force optimization process outlined in appendix A.2 to find the optimal
coefficient values. These values are then programmed into the equalizer FFE on the
IC to measure the equalized pulse and equalized PRBS data. A coordinate descent
optimization of the coefficients was performed on the measured output to verify that
the local optimum was found, but this did not significantly improve the performance.
The channel FFE pulse and PRBS data are measured from the signal path in figure
7.3(b) to characterize the ISI reduction and DR improvement due to the FFE. The
measurement results from this test are presented in §7.3.3 and §7.3.4.
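The adaptation flow described above (measure one pulse per coefficient path, superpose them with candidate coefficients, and refine the codes) can be sketched in Python as follows. This is not the MATLAB brute-force procedure of appendix A.2, which is not reproduced here; it is a coordinate descent over signed 5-bit codes applied to superposed pulses. The cost function is an assumption: the PMR is taken to be the worst-case peak (sum of cursor magnitudes) divided by the main cursor, since definition (2.26) is not restated in this chapter. The pulse data are synthetic placeholders, and all names are hypothetical.

```python
import numpy as np


def peak_to_main_ratio(cursors):
    """Assumed PMR: worst-case peak (sum of |cursors|) over the main cursor."""
    mags = np.abs(cursors)
    return mags.sum() / mags.max()


def equalized_cursors(tap_pulses, codes):
    """Superpose the per-tap pulse samples, one row per coefficient path."""
    return np.asarray(codes, dtype=float) @ tap_pulses


def coordinate_descent(tap_pulses, start_codes, max_code=31, max_sweeps=10):
    """Tune one signed 5-bit code at a time while the assumed PMR improves."""
    codes = list(start_codes)
    best = peak_to_main_ratio(equalized_cursors(tap_pulses, codes))
    for _ in range(max_sweeps):
        improved = False
        for k in range(len(codes)):
            for candidate in range(-max_code, max_code + 1):
                trial = codes.copy()
                trial[k] = candidate
                cursors = equalized_cursors(tap_pulses, trial)
                if not cursors.any():
                    continue  # an all-zero output has no main cursor
                cost = peak_to_main_ratio(cursors)
                if cost < best - 1e-12:
                    codes, best, improved = trial, cost, True
        if not improved:
            break
    return codes, best


# Synthetic stand-in for the measured family of pulses: each tap sees the same
# hypothetical baud-spaced channel cursors, delayed by one symbol per tap.
channel = np.array([0.1, 1.0, 0.5, 0.25, 0.1])
n_taps = 5
tap_pulses = np.zeros((n_taps, len(channel) + n_taps - 1))
for k in range(n_taps):
    tap_pulses[k, k:k + len(channel)] = channel

start_codes = [0, 31, 0, 0, 0]
start_pmr = peak_to_main_ratio(equalized_cursors(tap_pulses, start_codes))
best_codes, best_pmr = coordinate_descent(tap_pulses, start_codes)
```

In a measurement flow, `tap_pulses` would instead hold the baud-spaced samples of the five measured single-path pulse responses, and the resulting codes would be written to the equalizer FFE through the scan chain.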
7.1.2 Off-chip channel
The off-chip channel signal path is shown in figure 7.3(c). The test equipment used
for these measurements is outlined in table 7.2. A BERTScope is used as a signal
generator for pulse and PRBS signals. This signal is then passed through the channel
which is a differential 0.5 m FR4 PCB trace. The channel output is probed on the chip
and passes through a short 50 Ω terminated on-chip differential transmission line to
the FFE input. The pseudo-differential FFE output is driven off chip through GSSG
pads and dc blocking capacitors to the high-speed oscilloscope. The off-chip channel
measurements allow for more accurate channel responses with high insertion loss, but
require a high-speed signal generator and channel. Also, only the off-chip channel
losses can be characterized with the on-chip transmission line losses unknown.
To adapt the coefficients, the BERT referred to in figure 7.3(c) is configured in
pulse mode. The pulse passes through the channel and through a single coefficient
path in the FFE by setting all other coefficients to zero. This pulse response is measured
on the oscilloscope and the process is repeated for all five coefficient paths to generate
a family of pulses. These pulses are processed in MATLAB through the brute-force
optimization process outlined in appendix A.2 to find the optimal coefficient values.
These values are then programmed into the FFE on the IC to measure the equalized
pulse and equalized PRBS data. A coordinate descent optimization of the coefficients
was performed on the measured output to verify that the local optimum was found,
but this did not significantly improve the performance. The channel pulse and PRBS
data are measured by bypassing the chip and connecting the channel output directly to
the scope. Additionally, an estimate of the channel response including the effect of the
on-chip transmission line can be made by configuring the first coefficient path to unity,
bypassing all the delays. The high-frequency losses from the coefficient and output
buffer are small compared to the channel and add negligible ISI. The measurement
results from this test are presented in §7.3.1.

Table 7.2: Test equipment for the measurements with the off-chip channel.
Use               Equipment
Oscilloscope      Keysight DSA-X 93204A
PRBS Generator    Tektronix BSA286C
Input Probe       Dual Infinity 40 GHz GSSG
Output Probe      Dual Infinity 40 GHz GSSG
GPIO Card         NanoRiver Miniboard
7.2 Test debug
The first chip revision had some bugs that prevented the full FFE verification. The
primary issue was an approximately 2× decrease in bandwidth caused by an off-
chip return-current path. The PRBS generator and output buffer were on separate
supplies with separate on-chip grounds. The grounds were connected off-chip on the
PCB, which put a bondwire and the PCB parasitics in the return current path at
the interface between these blocks. Figure 7.4 shows the measured eye diagram for
a 20 Gb/s PRBS signal for the first chip revision. Figure 7.5 shows the post-layout
simulated eye diagram for the same signal with additional supply resistance and bond
wires added. The performance degradation is similar, supporting the return current
path as the source of the bandwidth decrease. For this reason, a second chip revision
was fabricated with the primary change being a shared on-chip ground. The following
measurement results in this chapter are for this second chip revision.

Figure 7.4: Measured eye diagram for a 20 Gb/s PRBS signal for the first chip revision. (Axes: time in ps, voltage in V.)

Figure 7.5: Post-layout simulated eye diagram for a 20 Gb/s PRBS signal with additional supply resistance. (Axes: time in ps, voltage in V.)
7.3 Measurement results
7.3.1 Pulse responses and DR improvement
The normalized channel and equalized pulse responses are shown in figure 7.6 for
the off-chip 0.5 m FR4 PCB trace channel measurement detailed in §7.1.2. The main
cursor attenuation is not depicted and is 3.03× for this channel and set of coefficients.
The channel pulse has significant ISI and the PMR as defined in (2.26) is
$$\mathrm{PMR}_{ch} = 3.87. \tag{7.1}$$
With the optimal coefficient values, the FFE output pulse has the PMR
$$\mathrm{PMR}_{eq} = 1.83 \tag{7.2}$$
which corresponds to a DR improvement as defined in (2.27) of
$$\text{DR improvement} = \frac{\mathrm{PMR}_{ch}}{\mathrm{PMR}_{eq}} = 2.11. \tag{7.3}$$
The reduction in signal DR is illustrated by the plot in figure 7.7 which shows the
normalized PRBS responses generated from the associated pulse responses. Although
the signals are scaled so that the main cursor amplitude is equal, the peak amplitude is
more than 2× lower for the signal equalized by the FFE. This DR improvement
corresponds to a relaxation of the ADC resolution by more than 1 bit. As a point
of reference, the 10 GS/s 6 bit ADC in [8] consumes 143 mW of power. This DR
improvement results in a reduction of the ADC power by more than 2× for a savings
of over 70 mW. For this configuration of coefficients, the FFE consumes only 23 mW,
resulting in a significant reduction in the total system power.

Figure 7.6: Measured normalized pulse response for the 0.5 m FR4 PCB trace channel and the channel+FFE. (Axes: time in ps, normalized voltage.)

Figure 7.7: Normalized PRBS response generated from the pulse responses in figure 7.6. (Axes: time in ns, normalized voltage.)
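The arithmetic behind the DR improvement and the system power estimate can be sketched in Python as follows. The conversion to bits uses a base-2 logarithm of the DR improvement. The power estimate assumes that ADC power scales in proportion to the dynamic range it must resolve (that is, 2× per bit), which is an assumption adopted here to mirror the reasoning in the text, not a measured result. The function names are hypothetical.

```python
import math


def dr_improvement(pmr_channel, pmr_equalized):
    """DR improvement as the ratio of channel PMR to equalized PMR."""
    return pmr_channel / pmr_equalized


def resolution_relaxation_bits(dr):
    """ADC resolution relaxation in bits for a given DR improvement."""
    return math.log2(dr)


dr = dr_improvement(3.87, 1.83)          # measured PMR values from the text
bits = resolution_relaxation_bits(dr)

ADC_POWER_MW = 143.0                     # reference ADC power from the text
FFE_POWER_MW = 23.0                      # FFE power from the text
# Assumption: ADC power is proportional to the dynamic range to be resolved.
adc_saving_mw = ADC_POWER_MW * (1.0 - 1.0 / dr)
net_saving_mw = adc_saving_mw - FFE_POWER_MW
```

Under this scaling assumption the ADC saving exceeds 70 mW, and a positive net saving remains after the FFE power is subtracted.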
The channel length is limited by the maximum PCB trace length on the test
channel board used during testing. Post-layout simulation results suggest that the
performance increases for even longer PCB traces with higher channel attenuation.
7.3.2 Noise
The noise is measured using the Keysight DSA-X 643A oscilloscope output data.
The baseline noise power is first measured for the oscilloscope with a 50 Ω terminated
input. The FFE is then biased with a dc input common mode voltage and the output
captured with the oscilloscope. The noise power is calculated by subtracting the
baseline from this measurement. This is repeated for coefficient values from 0 to
31 (i.e. best-case to worst-case noise) and the result is plotted in figure 7.8. The
post-layout simulated noise is plotted for comparison and it is in good agreement
with the measured data. The baseline oscilloscope noise is 0.83 mV$_{\mathrm{RMS}}$. With the
baseline oscilloscope noise power subtracted, the FFE noise is calculated to be between
0.3 mV$_{\mathrm{RMS}}$ and 0.62 mV$_{\mathrm{RMS}}$ for all coefficient values. Assuming a conservative system
bandwidth of 5 GHz, this corresponds to spot noise in the range from 4.2 nV/$\sqrt{\mathrm{Hz}}$ to
7.7 nV/$\sqrt{\mathrm{Hz}}$.

Figure 7.8: Measured and simulated integrated noise voltage versus coefficient value. (Axes: coefficient value, noise in mV RMS.)
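The baseline subtraction and the conversion from integrated noise to spot noise can be sketched in Python as follows. The sketch assumes the oscilloscope noise and the FFE noise are uncorrelated, so that they add in power, and that the noise is flat across the assumed 5 GHz bandwidth. The raw reading used here is a hypothetical value constructed to correspond to 0.3 mV RMS of FFE noise; the function names are also hypothetical.

```python
import math


def subtract_baseline_rms(total_rms, baseline_rms):
    """Remove the instrument noise floor in the power domain."""
    if total_rms < baseline_rms:
        raise ValueError("total noise is below the baseline")
    return math.sqrt(total_rms ** 2 - baseline_rms ** 2)


def spot_noise(rms_voltage, bandwidth_hz):
    """Average voltage noise density (V/sqrt(Hz)) assuming flat noise."""
    return rms_voltage / math.sqrt(bandwidth_hz)


BASELINE_RMS = 0.83e-3                        # oscilloscope baseline from the text
raw_reading = math.hypot(BASELINE_RMS, 0.3e-3)  # hypothetical raw measurement

ffe_rms = subtract_baseline_rms(raw_reading, BASELINE_RMS)
density = spot_noise(ffe_rms, 5.0e9)          # 5 GHz bandwidth from the text
```

For 0.3 mV RMS of FFE noise this yields a density of about 4.2 nV/√Hz, the lower end of the range quoted above.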
7.3.3 Eye diagrams
Although it is not the objective of the FFE in this work to completely equalize the
channel, it is nonetheless interesting to investigate the performance for such a scenario.
To do this, the measurement setup with the on-chip signal generator and channel is
used as detailed in §7.1.1. Figure 7.9 shows the measured eye diagrams for 16 Gb/s
and 20 Gb/s PRBS data after the channel FFE and after the equalizer FFE.
For both data rates, the eye is closed after the channel and the FFE is able to open
the eye, demonstrating the FFE performs up to 20 Gb/s. The measured channel and
equalized pulse response for the 20 Gb/s data rate are shown in figure 7.10. There
is a significant reduction in the first pre-cursor and post-cursor ISI terms, which is
responsible for the eye opening in figure 7.9(d). The PMR of the channel pulse is
\[ \mathrm{PMR}_{ch} = 2.30 \tag{7.4} \]
and the FFE output pulse has the PMR
\[ \mathrm{PMR}_{eq} = 1.62 \tag{7.5} \]
which corresponds to a DR improvement of
\[ \text{DR improvement} = \frac{\mathrm{PMR}_{ch}}{\mathrm{PMR}_{eq}} = 1.42. \tag{7.6} \]
Figure 7.11 shows the corresponding measured normalized PRBS response with and
without the FFE equalization, which illustrates this DR improvement. This corresponds
to an ADC resolution relaxation of roughly 0.5 bit. Because the channel has
limited ISI, there is less to equalize, so the FFE is less effective. Furthermore, the
main cursor attenuation is not shown in these normalized plots and it is nearly 7×.
As a result, the FFE is best utilized in systems with significant ISI.

[Figure 7.9: Eye diagrams for the on-chip channel measurements. Panels: (a) 16 Gb/s after channel FFE, (b) 16 Gb/s after equalizer FFE, (c) 20 Gb/s after channel FFE, (d) 20 Gb/s after equalizer FFE. Scales shown: 50 mV/div and 7.5 mV/div, 10 ps/div.]

[Figure 7.10: Measured normalized pulse response with and without the FFE equalization. Axes: time (ps) versus normalized voltage; traces: Channel, Equalized.]

[Figure 7.11: Measured normalized PRBS response with and without the FFE equalization. Axes: time (ns) versus normalized voltage; traces: Channel, Equalized.]
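The PMR ratio and its translation into ADC bits can be sketched as follows. The pulse samples `p_ch` and `p_eq` are hypothetical, not the measured data, and the sketch assumes that each factor of two in dynamic range corresponds to one bit of ADC resolution.

```matlab
% Sketch: PMR of a sampled pulse response and the implied ADC relaxation.
pmr = @(p) sum(abs(p))/max(abs(p));   % l1-norm over l-infinity norm, as in (A.9)

% Hypothetical baud-spaced pulse samples (not the measured data).
p_ch = [0.05 0.30 1.00 0.55 0.25 0.10];   % channel only
p_eq = [0.02 0.10 1.00 0.30 0.12 0.05];   % after equalization

dr_improvement = pmr(p_ch)/pmr(p_eq);     % ratio of the two PMR values
bits_relaxed   = log2(dr_improvement);    % assumed: 2x in DR equals 1 bit

% The measured ratio from (7.6) maps to bits the same way.
bits_measured = log2(2.30/1.62);
```

Because both pulses are normalized to their main cursor, this calculation says nothing about main cursor attenuation, which has to be accounted for separately.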
7.3.4 LMS system identification method
Theory
In order to characterize the frequency domain performance and linearity of the FFE,
system identification in the form of the least mean squares (LMS) algorithm is used
[23]. The block diagram in figure 7.12 shows the LMS algorithm adapted for system
identification in this application. The primary challenge is that the input sequence
is only partially known. Although the PRBS sequence is known, imbalanced duty
cycle can introduce additional nonlinearity, adding to the nonlinearity of the FFE.
Therefore, the duty cycle is an additional parameter to be optimized along with
the filter coefficients. With this accounted for, the filter coefficients converge to the
impulse response of the system and the residual signal is the portion that cannot be
captured by a linear filter and is defined as the distortion signal. The ratio of the
linear signal power to the distortion signal power is defined as the signal-to-distortion
ratio (SDR). From the impulse response, we can also plot the frequency response to
characterize the equalization performance in the frequency domain.

[Figure 7.12: Block diagram of the LMS algorithm system identification. Blocks: FFE response h[n] and the adaptive filter that estimates h[n]; signals: partially known input, measured signal, distortion.]
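The idea can be sketched on synthetic data as below. This is an illustration under stated assumptions, not the procedure used on the bench data: the system `h_true`, the cubic distortion term, and the step size `mu` are all hypothetical, the input is an ideal PRBS7 sequence, and the duty-cycle parameter discussed above is omitted.

```matlab
% Sketch: LMS system identification with a known PRBS input.
n_sym = 20000;
state = ones(1, 7);                 % PRBS7 LFSR seed (x^7 + x^6 + 1)
x = zeros(n_sym, 1);
for n = 1:n_sym
    x(n) = 2*state(7) - 1;          % map bit {0,1} to symbol {-1,+1}
    fb = mod(state(7) + state(6), 2);
    state = [fb, state(1:6)];
end

h_true = [0.1; 0.6; 0.25; 0.05];    % hypothetical system impulse response
y_lin  = filter(h_true, 1, x);
y      = y_lin + 0.01*y_lin.^3;     % assumed weak third-order distortion

n_taps = numel(h_true);
w    = zeros(n_taps, 1);            % adaptive filter coefficients
xbuf = zeros(n_taps, 1);            % most recent input samples, newest first
mu   = 0.01;                        % LMS step size
for n = 1:n_sym
    xbuf = [x(n); xbuf(1:end-1)];
    e = y(n) - w.'*xbuf;            % instantaneous error
    w = w + mu*e*xbuf;              % LMS coefficient update
end

y_hat  = filter(w, 1, x);           % linear part captured by the filter
d      = y - y_hat;                 % residual, treated as distortion
sdr_db = 10*log10(sum(y_hat.^2)/sum(d.^2));
```

After convergence `w` approximates the impulse response and `sdr_db` is the ratio of linear signal power to residual power, mirroring the SDR definition in the text.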
Linear response
Figure 7.13 shows the identified impulse response comparing bench and post-layout
simulation results for the on-chip channel signal path in figure 7.3(b). Similarly,
figure 7.14 shows the identified impulse response comparing bench and post-layout
simulation results for the equalized on-chip channel signal path in figure 7.3(a). The
equalization performance is readily apparent in that the equalized impulse response
is approaching a Dirac delta for the baud samples. The measurement results are in
good agreement with the simulated results.
Figure 7.15 shows the normalized magnitude response for the measured channel
and equalized signals. The equalization of the high-frequency loss is clearly visible
from this plot. The post-layout simulated normalized magnitude responses shown in
figure 7.16 are in excellent agreement with the measured data.
[Figure 7.13: Impulse response for the bench and simulated channel response for the on-chip channel test. Axes: time (ps) versus impulse response; traces: Bench, Simulated.]

[Figure 7.14: Impulse response for the bench and simulated equalized response for the on-chip channel test. Axes: time (ps) versus impulse response; traces: Bench, Simulated.]
Nonlinearity
Because this is a non-conventional linearity measurement, it requires careful consid-
eration. To support the measurement result, we compare it to simulation both with
conventional methods using a sinusoid input and the LMS residual method using a
PRBS input. Based on the analysis in appendix H and the statistical properties of
the input signal, the expected difference in SDR between the sinusoid and PRBS sig-
nals is 4.6 dB for a third-order dominated case and 7.7 dB for a fifth-order dominated
case. The actual difference is a sum of the contributions from all cases and is approxi-
mately 8 dB. The SDR for the sinusoid case calculated with the LMS method and the
conventional THD method match exactly, supporting this nonlinearity measurement
technique. For the measured data with the linear responses from the last section and
input signal variance 0.008 V², the SDR is 35 dB. This is within 3 dB of the simulated
SDR. For this input signal level, the output SNR is approximately 30 dB, suggesting
that the optimal performance occurs for a larger input voltage.
[Figure 7.15: Measured frequency response of the channel and channel+FFE for the on-chip channel test. Axes: frequency (GHz, logarithmic) versus normalized magnitude (dB); traces: Equalized, Channel.]

[Figure 7.16: Simulated frequency response of the channel and channel+FFE for the on-chip channel test. Axes: frequency (GHz, logarithmic) versus normalized magnitude (dB); traces: Equalized, Channel.]

[Figure 7.17: Signal-to-distortion ratio versus input signal variance comparing sinusoid and PRBS signals. Axes: input signal variance (V²) versus SDR (dB); traces: Sine Simulation, PRBS Simulation, PRBS Measured.]
7.4 Performance summary
Table 7.3 gives the performance summary of previous state-of-the-art FFE designs
and the design in this work for comparison [3, 11, 24]. This work achieved a 2×
reduction in power per tap while maintaining a competitive symbol rate. Due to
the omission of inductors in this design, the FFE area is just 0.003 mm² which is
a significant improvement compared to previous designs. The noise is reduced by
almost 3× as compared to [3] with no noise numbers reported in [11]. The primary
sources of these improvements are attributed to the efficient single-path Pade-inspired
delay architecture as described in §5.3.2 and the efficiency of the analog inverter
transconductor as described in §5.1.

Table 7.3: Performance summary for state-of-the-art RX-FFEs.

                        This Work [24]   [11]          [3]
Power (mW)              20 to 26         80            90
Taps                    5                7             7
Power/Tap (mW)          5.2              9.3           12.8
Symbol Rate (GBd)       20               40            25
Process                 40 nm CMOS       65 nm CMOS    28 nm CMOS
Supply Voltage (V)      1                1             1
Spot Noise (nV/√Hz)     4.2 to 7.7       -             11.4 to 26.6
SDR (dB)                35               -             -
Area (mm²)              0.003            0.75          0.085
Chapter 8
Conclusions
8.1 Summary
As discussed in chapter 1 and chapter 2, an RX-FFE in ADC-based links can reduce
the required ADC resolution resulting in a substantial power reduction. In order to
obtain a net improvement for the system, the RX-FFE must be implemented with
low-power consumption, low noise, and small chip area. The greatest obstacle to
achieving these goals is in the design of the analog delay. In chapter 3, analog delays
for RX-FFEs were investigated and the equivalence of first-order delays was proven
for the application of RX-FFEs.
The design space for FFEs is high dimensional and includes the delay type, delay
time, number of taps, parasitic bandwidth, coefficient resolution, and main cursor
attenuation. The effect of these parameters on the performance, in terms of signal
dynamic range reduction, was investigated with MATLAB simulations in chapter 4
to guide the architecture choices in chapter 5 and design decisions in chapter 6.
The inverter-based FFE was introduced in chapter 5 along with the single-path
first-order Pade-inspired delay. The design equations of the FFE were covered to guide
the practical design decisions in chapter 6. The design of the proof-of-concept inverter-
based FFE IC was discussed in chapter 6. A switched-capacitor tuning circuit was
introduced to tune for gain and common mode PVT variations of the delay element.
In chapter 7, the test and measurement results for the proof-of-concept FFE IC
were presented. The FFE was demonstrated to reduce the signal dynamic range by
2× for a 1 bit ADC resolution relaxation. The total power consumed was less than
26 mW with less than 0.62 mVRMS output noise for all coefficient values and an area of
only 0.003 mm2 in 40 nm CMOS. A technique to estimate the distortion with system
identification was discussed and the signal to distortion ratio was measured to be
35 dB.
8.2 Future work
There are multiple opportunities for future research on this topic. An obvious next
step would be the demonstration of this FFE in an ADC-based link receiver. This
would solidify the system level calculations by demonstrating the ADC relaxation
directly.
Another option is the demonstration of a more aggressive FFE design. The MAT-
LAB simulations in chapter 4 show that a substantial portion of the equalization
performance can be obtained with a 3-tap FFE with delay time tuning and 3-bit
coefficient resolution. This would further reduce the FFE power consumption and
noise, resulting in even greater system performance improvements if a competitive
ADC resolution relaxation can be achieved.
Additional work is necessary to debug the switched-capacitor bias circuit. In
addition to this, a possible improvement would be to tune for the delay time constant
instead of the gain. Coefficient adaptation techniques are a critical component to
enable the RX-FFE in commercial high-speed link receivers. Background calibration
techniques could alleviate the need for delay time or gain tuning by tracking the PVT
variations and absorbing the changes into the coefficients.
Appendix A
FFE coefficient optimization
A.1 Problem formulation
Consider an n-tap FFE with the input pulse response $p_i(t)$ and delays with transfer
function $D(s)$. The impulse response of a single delay is
\[ d(t) = \mathcal{L}^{-1}\{D(s)\}. \tag{A.1} \]
The pulse response before delay k is defined to be $p_k(t)$ and it follows that
\[ p_1(t) = p_i(t) \tag{A.2} \]
\[ p_k(t) = p_{k-1}(t) * d(t). \tag{A.3} \]
The pulses are sampled with period $T_s$ to obtain the discrete sequences
\[ p_k[n] = p_k(nT_s). \tag{A.4} \]
If we elect tap 2 to be the main tap (i.e. $c_2 = 1$) then the output of the FFE is
\[ p_o[n] = \sum_{k=1}^{n} c_k p_k[n] = p_2[n] + \sum_{k \neq 2} c_k p_k[n]. \tag{A.5} \]
This can be expressed in matrix notation as
\[ \mathbf{p}_o = \mathbf{P}\mathbf{c} + \mathbf{p}_2 \tag{A.6} \]
where we defined
\[ \mathbf{P} = \begin{bmatrix} \mathbf{p}_1 & \mathbf{p}_3 & \cdots & \mathbf{p}_n \end{bmatrix} \tag{A.7} \]
\[ \mathbf{c}^T = \begin{bmatrix} c_1 & c_3 & \cdots & c_n \end{bmatrix}. \tag{A.8} \]
Adapting the expression for PMR in (2.26) to vector notation
\[ \mathrm{PMR} = \frac{\|\mathbf{p}_o\|_1}{\|\mathbf{p}_o\|_\infty} = \frac{\|\mathbf{P}\mathbf{c} + \mathbf{p}_2\|_1}{\|\mathbf{P}\mathbf{c} + \mathbf{p}_2\|_\infty} \tag{A.9} \]
where $\|\cdot\|_k$ represents the $\ell_k$-norm. The expression in (A.9) is the objective function
that we want to minimize over the possible values of $\mathbf{c}$ (i.e. $|c_k| \le 1$). The objective
function contains no penalty for main cursor attenuation, so it is useful to add a
constraint to account for this. The expression in the denominator of (A.9) represents
the main cursor amplitude and we can constrain this to be greater than a threshold.
Finally, we have the optimization problem
\[
\begin{aligned}
\underset{\mathbf{c}}{\text{minimize}} \quad & \frac{\|\mathbf{P}\mathbf{c} + \mathbf{p}_2\|_1}{\|\mathbf{P}\mathbf{c} + \mathbf{p}_2\|_\infty} \\
\text{subject to} \quad & |c_k| \le 1, \quad k = 1, 3, \ldots, n \\
& \|\mathbf{P}\mathbf{c} + \mathbf{p}_2\|_\infty \ge \text{threshold}.
\end{aligned}
\tag{A.10}
\]
The numerator and denominator of the objective function are convex, but the ratio
of two convex functions is not convex [25]. In addition, the final inequality constraint
is not convex. As a result, additional techniques are necessary to find the optimal
coefficients.
A.2 Brute force solution
One method to solve this problem is to leverage the fact that the coefficients are
quantized in practical FFE realizations so there is a finite set of possible coefficient
values. In this method, the objective function in (A.9) is evaluated for each valid
set of coefficients that satisfy the constraints. The following code is an example
implementation of this method in MATLAB.
Listing A.1: Brute force PMR optimization in MATLAB
    function [c_opt] = brute_force(ps, N, threshold)
    % ps: matrix whose columns are the sampled tap pulses p_k[n]
    % N: coefficients are quantized to k/N for integer k in [-N, N]
    % threshold: minimum allowed main cursor amplitude
    if nargin < 3; threshold = 0; end;
    num_taps = size(ps, 2);
    c_init = ones(num_taps-1, 1)*(-N);
    p2 = ps(:, 2);
    P = ps(:, [1, 3:num_taps]);
    pmr_opt = inf;
    c = nan;
    % Step through every coefficient combination until the counter wraps
    while ~isequal(c, c_init)
        if isnan(c); c = c_init; end;
        po = P*c/N + p2;
        if max(po) >= threshold
            pmr = sum(abs(po))/max(po);
            if pmr < pmr_opt
                pmr_opt = pmr;
                c_opt = [c(1)/N; 1; c(2:end)/N];
            end
        end
        c = get_next_c(N, c);
    end
    end

    function next_c = get_next_c(N, c)
    % Increment c like an odometer with digits in [-N, N]
    next_c = c;
    for k = 1:length(c)
        if c(k) == N
            next_c(k) = -N;
        else
            next_c(k) = c(k) + 1;
            break;
        end
    end
    end
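The exhaustive search can also be illustrated with a compact, self-contained example for a 3-tap FFE. The pulse samples, the use of ideal one-sample delays to build the tap pulses, and the quantization `N = 8` are all hypothetical choices for illustration, and the main cursor threshold constraint of (A.10) is omitted for brevity.

```matlab
% Sketch: exhaustive PMR search for a 3-tap FFE with ideal one-sample delays.
p_in = [0.05; 0.30; 1.00; 0.55; 0.25; 0.10; 0; 0];    % hypothetical pulse samples
ps = [p_in, circshift(p_in, 1), circshift(p_in, 2)];  % tap pulses p1, p2, p3
N  = 8;                                 % coefficients quantized to k/N, |k| <= N
p2 = ps(:, 2);
P  = ps(:, [1, 3]);

pmr = @(p) sum(abs(p))/max(abs(p));     % objective from (A.9)
pmr_opt = inf;
c_opt = [0; 1; 0];
for k1 = -N:N
    for k3 = -N:N
        po = P*[k1; k3]/N + p2;         % FFE output pulse as in (A.6)
        if pmr(po) < pmr_opt
            pmr_opt = pmr(po);
            c_opt = [k1/N; 1; k3/N];
        end
    end
end
pmr_unequalized = pmr(p2);              % PMR with c1 = c3 = 0
```

The number of evaluations is (2N+1)^(n-1), which is 289 here but grows quickly with the number of taps and the coefficient resolution.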
A.3 MATLAB optimization toolbox
Another method is to use constrained nonlinear optimization techniques. MATLAB
supports these methods through the optimization toolbox and the function fmincon().
Because the objective function is not convex in this case, the solution can converge
to a local optimum and is dependent on the initial conditions. The benefit is that
the solution is found much faster than the brute force method in appendix A.2. The
following code is the implementation of this method used in this work.
Listing A.2: Nonlinear PMR optimization in MATLAB with fmincon()
    function c_opt = optim_pmr_opt(ps, threshold, x0)
    num_taps = size(ps, 2);
    if nargin < 3
        x0 = zeros(num_taps-1, 1);
    else
        x0 = x0([1, 3:num_taps]);   % drop the main tap from the initial guess
    end

    Ap = ps(:, [1, 3:num_taps]);
    bp = ps(:, 2);

    pmr = @(p) sum(abs(p))/max(p);
    create_pulse = @(x) Ap*x+bp;
    fun = @(x) pmr(create_pulse(x));

    % force_equal selects an equality constraint on the main cursor amplitude;
    % otherwise the inequality constraint from (A.10) is used
    force_equal = true;
    if force_equal
        nonlcon = @(x) deal([], threshold - max(create_pulse(x)));
    else
        nonlcon = @(x) deal(threshold - max(create_pulse(x)), []);
    end

    lb = -ones(num_taps-1, 1);
    ub = +ones(num_taps-1, 1);
    Aeq = [];
    beq = [];
    A = [];
    b = [];
    options = optimset('MaxFunEvals', 10000, ...
                       'Algorithm', 'active-set', ...
                       'Display', 'none');
    x = fmincon(fun, x0, A, b, Aeq, beq, ...
                lb, ub, nonlcon, options);
    c_opt = [x(1); 1; x(2:end)];
    end
Appendix B
Equivalence of first-order delays in
FFEs
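The objective of this appendix is to show that an N-tap FFE constructed with delays
of the form
\[ D_{\alpha_1}(s) = \frac{1 - \frac{1}{2}\alpha_1 s\tau}{1 + \frac{1}{2}s\tau} \tag{B.1} \]
having the associated transfer function
\[ H_{\alpha_1}(s) = \sum_{n=0}^{N-1} c_n D_{\alpha_1}^n(s) \tag{B.2} \]
can be transformed into an equivalent FFE with delays $D_{\alpha_2}(s)$ and $H_{\alpha_2}(s) = H_{\alpha_1}(s)$
by an appropriate linear transformation of the coefficients.

To show this, we substitute $M = N - 1$ to simplify the algebra and then expand
the expression for $H_\alpha(s)$ into a polynomial of $(\frac{1}{2}s\tau)^m$ using binomial expansion
obtaining
\[
\begin{aligned}
H_\alpha(s) &= \sum_{n=0}^{M} c_n D_\alpha^n(s) \\
&= \sum_{n=0}^{M} c_n \frac{(1 - \frac{1}{2}\alpha s\tau)^n}{(1 + \frac{1}{2}s\tau)^n} \\
&= \frac{\sum_{n=0}^{M} c_n (1 + \frac{1}{2}s\tau)^{M-n}(1 - \frac{1}{2}\alpha s\tau)^n}{(1 + \frac{1}{2}s\tau)^M} \\
&= \frac{\sum_{n=0}^{M} c_n \left(\sum_{k_1=0}^{M-n} \binom{M-n}{k_1} (\frac{1}{2}s\tau)^{k_1}\right)\left(\sum_{k_2=0}^{n} \binom{n}{k_2} (-\alpha)^{k_2} (\frac{1}{2}s\tau)^{k_2}\right)}{(1 + \frac{1}{2}s\tau)^M} \\
&= \frac{\sum_{n=0}^{M} c_n \left(\sum_{k_1=0}^{M-n} \sum_{k_2=0}^{n} \binom{M-n}{k_1}\binom{n}{k_2} (-\alpha)^{k_2} (\frac{1}{2}s\tau)^{k_1+k_2}\right)}{(1 + \frac{1}{2}s\tau)^M} \\
&= \frac{\sum_{n=0}^{M} c_n \left(\sum_{m=0}^{M} a_{mn} (\frac{1}{2}s\tau)^m\right)}{(1 + \frac{1}{2}s\tau)^M}.
\end{aligned}
\tag{B.3}
\]
The numerator of the final expression can be written more elegantly in matrix form
as
\[
\mathbf{b}^T \mathbf{A}_\alpha \mathbf{c} =
\begin{bmatrix} 1 & \frac{1}{2}s\tau & \cdots & (\frac{1}{2}s\tau)^{M-1} & (\frac{1}{2}s\tau)^M \end{bmatrix}
\begin{bmatrix}
a_{00} & a_{01} & \cdots & a_{0M} \\
a_{10} & a_{11} & \cdots & a_{1M} \\
\vdots & \vdots & \ddots & \vdots \\
a_{M0} & a_{M1} & \cdots & a_{MM}
\end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_M \end{bmatrix}
\tag{B.4}
\]
so that
\[ H_\alpha(s) = \frac{\mathbf{b}^T \mathbf{A}_\alpha \mathbf{c}}{(1 + \frac{1}{2}s\tau)^M}. \tag{B.5} \]
To identify the expression for $a_{mn}$, notice that terms containing $(\frac{1}{2}s\tau)^m$ require
$k_1 + k_2 = m$. Therefore, the double summation over $k_1$ and $k_2$ from above contributes at
most one term to $(\frac{1}{2}s\tau)^m$ for each $k_1$ and can be reduced to a single summation over
the index $k$. This requires the substitutions $k_1 \to k$ and $k_2 \to m - k$. The double
summation will only contribute a term when there exists a $k$ that simultaneously
meets the constraints $0 \le k \le M - n$ and $0 \le m - k \le n$. This is equivalent to a
single constraint of the form $\max(0, m - n) \le k \le \min(M - n, m)$. Therefore, we
conclude that
\[ a_{mn} = \sum_{k=\max(0,\,m-n)}^{\min(M-n,\,m)} \binom{M-n}{k}\binom{n}{m-k} (-\alpha)^{m-k}. \tag{B.6} \]
From this we see that the matrix $\mathbf{A}_\alpha$ depends only on the order of the FFE and $\alpha$.
Now we can write the transfer functions $H_{\alpha_1}(s)$ and $H_{\alpha_2}(s)$ as
\[ H_{\alpha_1}(s) = \frac{\mathbf{b}^T \mathbf{A}_{\alpha_1} \mathbf{c}_{\alpha_1}}{(1 + \frac{1}{2}s\tau)^M} \tag{B.7} \]
\[ H_{\alpha_2}(s) = \frac{\mathbf{b}^T \mathbf{A}_{\alpha_2} \mathbf{c}_{\alpha_2}}{(1 + \frac{1}{2}s\tau)^M}. \tag{B.8} \]
From this, we see that to have the equality $H_{\alpha_1}(s) = H_{\alpha_2}(s)$ we need
\[ \mathbf{A}_{\alpha_2} \mathbf{c}_{\alpha_2} = \mathbf{A}_{\alpha_1} \mathbf{c}_{\alpha_1} \tag{B.9} \]
\[ \rightarrow \mathbf{c}_{\alpha_2} = \mathbf{A}_{\alpha_2}^{-1} \mathbf{A}_{\alpha_1} \mathbf{c}_{\alpha_1}. \tag{B.10} \]
A common case is the transformation from ideal Pade delays into some pole and zero
offset, $\alpha$. For this case, $\alpha_1 \to 1$ and $\alpha_2 \to \alpha$ resulting in
\[ \mathbf{c}_\alpha = \mathbf{A}_\alpha^{-1} \mathbf{A}_1 \mathbf{c}_1 = \mathbf{M}_\alpha \mathbf{c}_1 \tag{B.11} \]
where we defined
\[ \mathbf{M}_\alpha = \mathbf{A}_\alpha^{-1} \mathbf{A}_1. \tag{B.12} \]
A MATLAB function to generate the matrix $\mathbf{A}_\alpha$ is listed here for reference.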
Listing B.1: MATLAB function to generate the matrix Aα.
    function A = create_A(N, alpha)
    % N: number of taps, so the matrix order is M+1 = N
    A = nan(N, N);
    for m = 0:N-1
        for n = 0:N-1
            A(m+1, n+1) = create_a_mn(m, n, N, alpha);
        end
    end
    end

    function a_mn = create_a_mn(m, n, N, alpha)
    % Evaluates (B.6) with M = N-1
    a_mn = 0;
    for k = max(0, m-n):min(N-1-n, m)
        a_mn = a_mn + (-1)^(m-k)*alpha^(m-k) ...
            *binom(N-1-n, k)*binom(n, m-k);
    end
    end

    function y = binom(n, k)
    y = factorial(n)/factorial(n-k)/factorial(k);
    end
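The coefficient mapping can be checked numerically with a short, self-contained script. The sketch below rebuilds the matrix from (B.6) with anonymous functions rather than calling the listing above; the coefficient vector `c1`, the offset `alpha2`, the time constant `tau`, and the test frequencies are hypothetical values chosen only for illustration.

```matlab
% Sketch: numerically check H_alpha1(s) = H_alpha2(s) after mapping coefficients.
M = 3;                                      % N = 4 taps
binom = @(n, k) factorial(n)/factorial(n-k)/factorial(k);
% a_mn from (B.6)
a_mn = @(m, n, alpha) sum(arrayfun(@(k) ...
    binom(M-n, k)*binom(n, m-k)*(-alpha)^(m-k), ...
    max(0, m-n):min(M-n, m)));
% Assemble the (M+1)x(M+1) matrix row by row
make_A = @(alpha) cell2mat(arrayfun(@(m) ...
    arrayfun(@(n) a_mn(m, n, alpha), 0:M), (0:M).', ...
    'UniformOutput', false));

alpha1 = 1; alpha2 = 0.5;                   % alpha2 is a hypothetical offset
c1 = [0.2; 1; -0.3; 0.1];                   % hypothetical coefficients
c2 = make_A(alpha2)\(make_A(alpha1)*c1);    % coefficient mapping (B.10)

tau = 50e-12;                               % hypothetical delay time constant
D = @(s, alpha) (1 - 0.5*alpha*s*tau)./(1 + 0.5*s*tau);   % delay (B.1)
H = @(s, c, alpha) sum(c.'.*D(s, alpha).^(0:M));          % FFE response (B.2)

s = 1j*2*pi*[1e9, 5e9, 10e9];               % arbitrary test frequencies
err = arrayfun(@(sk) abs(H(sk, c1, alpha1) - H(sk, c2, alpha2)), s);
```

If the mapping is applied correctly, `err` should be at the level of floating-point rounding at every test frequency.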
Appendix C
Pade approximants
The Pade approximant is the best rational function approximation to the Taylor
series of a function in the sense that it matches the Taylor coefficients to the highest
possible order [18]. Mathematically, this can be expressed for a function, $f(x)$, as the
rational function, $R_{m/n}(x)$, with order $m$ in the numerator and $n$ in the denominator
\[ R_{m/n}(x) = \frac{a_0 + a_1 x + \cdots + a_m x^m}{1 + b_1 x + \cdots + b_n x^n} \tag{C.1} \]
such that
\[
\begin{aligned}
f(0) &= R(0) \\
f^{(1)}(0) &= R^{(1)}(0) \\
f^{(2)}(0) &= R^{(2)}(0) \\
&\;\;\vdots \\
f^{(m+n)}(0) &= R^{(m+n)}(0)
\end{aligned}
\tag{C.2}
\]
where $f^{(n)}(0)$ represents the nth derivative of $f(x)$ at $x = 0$.

As an example, consider $f(x) = e^x$ with the associated Taylor series
\[ f(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{1}{2}x^2 + \cdots \tag{C.3} \]
and $m = n = 1$ so that
\[ R_{1/1}(x) = \frac{a_0 + a_1 x}{1 + b_1 x}. \tag{C.4} \]
The terms $a_0$, $a_1$, and $b_1$ constitute three degrees of freedom that allow us to match
the first three terms of the Taylor series (i.e. up to second order). Applying the
constraints we see that
\[ f(0) = 1 = a_0 = R(0) \tag{C.5} \]
\[ f^{(1)}(0) = 1 = a_1 - a_0 b_1 = R^{(1)}(0) \tag{C.6} \]
\[ f^{(2)}(0) = 1 = 2b_1(a_0 b_1 - a_1) = R^{(2)}(0). \tag{C.7} \]
Therefore $a_0 = 1$, $b_1 = -\frac{1}{2}$, and $a_1 = \frac{1}{2}$ and we see that
\[ R_{1/1}(x) = \frac{1 + \frac{1}{2}x}{1 - \frac{1}{2}x}. \tag{C.8} \]
Continuing in this fashion, the second-order case is
\[ R_{2/2}(x) = \frac{1 + \frac{1}{2}x + \frac{1}{12}x^2}{1 - \frac{1}{2}x + \frac{1}{12}x^2} \tag{C.9} \]
and the third-order case is
\[ R_{3/3}(x) = \frac{1 + \frac{1}{2}x + \frac{1}{10}x^2 + \frac{1}{120}x^3}{1 - \frac{1}{2}x + \frac{1}{10}x^2 - \frac{1}{120}x^3}. \tag{C.10} \]
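As a quick sanity check, the three approximants can be compared against the exponential at a point near the expansion point. The test point `x = 0.1` is an arbitrary choice for illustration.

```matlab
% Sketch: compare the Pade approximants (C.8)-(C.10) of exp(x) at one point.
R11 = @(x) (1 + x/2)./(1 - x/2);
R22 = @(x) (1 + x/2 + x.^2/12)./(1 - x/2 + x.^2/12);
R33 = @(x) (1 + x/2 + x.^2/10 + x.^3/120)./(1 - x/2 + x.^2/10 - x.^3/120);

x = 0.1;                         % arbitrary test point near the expansion point
err = abs([R11(x), R22(x), R33(x)] - exp(x));   % error for orders 1, 2, 3
```

The error should shrink rapidly with the order, since an approximant of order m/m matches the Taylor series through the term of order 2m.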
Appendix D
Low-frequency nonlinearity
simulation
D.1 Problem formulation
This appendix presents three methods for simulating the Taylor series coefficients
for two-dimensional functions. The expressions derived are for the case with voltage
inputs and current output, but these methods apply equally well to the other possible
cases. For this case, the goal is to determine the coefficients $g_{kl}$ in the Taylor series
expansion
\[ i_o(v_i, v_o) = \sum_{k,\,l} g_{kl} v_i^k v_o^l. \tag{D.1} \]
Two of the methods rely on transient simulations while the final method uses dc
sweep simulations.
D.2 Transient simulation
For the transient methods, the inputs are set to
\[ v_i(t) = A_1 \cos(2\pi f_0 t + \phi_1) \tag{D.2} \]
\[ v_o(t) = A_2 \cos(2\pi M f_0 t + \phi_2) \tag{D.3} \]
where $M$ is an integer. Sampling at $T_s = \frac{1}{N f_0}$ by substituting $t \to \frac{n}{N f_0}$ we obtain
\[ v_i[n] = A_1 \cos\left(\frac{2\pi n}{N} + \phi_1\right) \tag{D.5} \]
\[ v_o[n] = A_2 \cos\left(\frac{2\pi M n}{N} + \phi_2\right) \tag{D.6} \]
\[ i_o[n] = \sum_{k,\,l} g_{kl} v_i^k[n] v_o^l[n] \tag{D.7} \]
where $N/M$ is an integer. For both of the following methods, a practical choice is
$A_1 = A_2 = 10$ mV.
D.2.1 DFT method
The first method to extract the coefficients is in the frequency domain by taking
the DFT of the output current. This method is practical for lab measurements
because the magnitude of the DFT can be approximately measured using a spectrum
analyzer and highly-linear sinusoidal inputs can be generated with signal generators
and bandpass filters.
[Figure D.1: DFT of io[n] with A1 = A2 = 10 mV. Axes: DFT bin number versus magnitude (dB); the labeled peaks are proportional to g10·A, g20·A², g30·A³, g01·A, g02·A², g03·A³, g11·A², g21·A³, and g12·A³.]
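The DFT of the output current is
\[ I_o[m] \equiv \frac{1}{N}\mathrm{DFT}(i_o[n]) = \frac{1}{N}\sum_{n=0}^{N-1} i_o[n] e^{-j2\pi\frac{mn}{N}}. \tag{D.8} \]
The nonlinearity of the transconductor mixes $v_i$ and $v_o$ which can be observed in the
DFT plot in figure D.1. The following derivation shows that the coefficients can be
determined from their associated DFT bin.
Using the identity
\[ \cos(x) = \frac{1}{2}\left(e^{jx} + e^{-jx}\right) \tag{D.9} \]
we can see that $i_o[n]$ is the sum of complex exponentials of the form
\[ C e^{j2\pi\frac{kn}{N}} \tag{D.10} \]
where $k$ is an integer and $C$ is a complex constant. In particular, we see that
\[
\begin{aligned}
v_i^k[n] v_o^l[n] ={}& \left(\frac{1}{2}A_1 e^{j\phi_1}\right)^k \left(\frac{1}{2}A_2 e^{j\phi_2}\right)^l e^{j2\pi\frac{n}{N}(k+lM)} \\
&+ \left(\frac{1}{2}A_1 e^{-j\phi_1}\right)^k \left(\frac{1}{2}A_2 e^{-j\phi_2}\right)^l e^{-j2\pi\frac{n}{N}(k+lM)} + \cdots
\end{aligned}
\tag{D.11}
\]
Taking into consideration the identity
\[
\frac{1}{N}\sum_{n=0}^{N-1} e^{j2\pi\frac{kn}{N}} =
\begin{cases}
1 & \text{if } k = 0 \pmod{N} \\
0 & \text{otherwise}
\end{cases}
\tag{D.12}
\]
we can conclude that the term $I_o[m]$ depends only on the terms in $i_o[n]$ containing
$e^{j2\pi\frac{mn}{N}}$. Therefore,
\[ I_o[1] \approx g_{10}\frac{A_1}{2}e^{j\phi_1} \tag{D.13} \]
\[ I_o[M] \approx g_{01}\frac{A_2}{2}e^{j\phi_2} \tag{D.14} \]
\[ I_o[k + lM] \approx g_{kl}\left(\frac{A_1}{2}e^{j\phi_1}\right)^k \left(\frac{A_2}{2}e^{j\phi_2}\right)^l. \tag{D.15} \]
If the signs of $g_{10}$ and $g_{01}$ are known, then
\[ g_{10} \approx \mathrm{sgn}(g_{10})\frac{2|I_o[1]|}{A_1} \tag{D.16} \]
\[ g_{01} \approx \mathrm{sgn}(g_{01})\frac{2|I_o[M]|}{A_2}. \tag{D.17} \]
Alternatively, if $\phi_1$ and $\phi_2$ are known, then
\[ g_{10} \approx \frac{2I_o[1]}{A_1 e^{j\phi_1}} \tag{D.18} \]
\[ g_{01} \approx \frac{2I_o[M]}{A_2 e^{j\phi_2}}. \tag{D.19} \]
For either case, the expression for all other Taylor coefficients is
\[ g_{kl} \approx \frac{I_o[k + lM]}{\left(\frac{I_o[1]}{g_{10}}\right)^k \left(\frac{I_o[M]}{g_{01}}\right)^l}. \tag{D.20} \]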
The following MATLAB code determines the Taylor coefficients using this method.
Listing D.1: MATLAB function to find two-dimensional Taylor coefficients using the
transient DFT method.
    function gtrans = calc_gtrans_fft(f, x, y, threshold)
    % f: sampled output, x and y: sampled sinusoidal inputs
    if nargin < 4; threshold = 1e-12; end;
    Ax = max(abs(x));
    Ay = max(abs(y));
    % Locate the fundamental DFT bin of each input (zero-based)
    [~, kx] = max(abs(fft(x))); kx = kx - 1;
    [~, ky] = max(abs(fft(y))); ky = ky - 1;

    N = length(f);
    inds = 1:N;
    f_fft = fft(f);
    % Only the first half of the spectrum is searched
    not_corrected = [ones(1, N/2), zeros(1, N/2)];
    noncorrected_f_fft = abs(f_fft(find(not_corrected)));
    while any(noncorrected_f_fft > threshold)
        % Pick the smallest remaining bin that is above the threshold
        valid_inds = find(noncorrected_f_fft > threshold);
        [~, min_ind] = min(noncorrected_f_fft(valid_inds));
        noncorrected_inds = inds(find(not_corrected));
        temp = noncorrected_inds(valid_inds);
        min_ind_actual = temp(min_ind);
        not_corrected(min_ind_actual) = 0;
        % Decompose the bin number into powers j of x and k of y
        k = round((min_ind_actual-1)/ky);
        j = abs(round((min_ind_actual-1 - k*ky)/kx));
        xjyk_fft = fft(x.^j.*y.^k);
        gtrans(j+1, k+1) = real(xjyk_fft(min_ind_actual)' ...
            *f_fft(min_ind_actual))/abs(xjyk_fft(min_ind_actual))^2;
        noncorrected_f_fft = abs(f_fft(find(not_corrected)));
    end
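The relations (D.16), (D.17), and (D.20) can also be checked directly on a synthetic output current. In the sketch below the coefficient values, the DFT length `N = 64`, and the frequency ratio `M = 8` are hypothetical choices; the phases are set to zero and both first-order coefficients are assumed positive.

```matlab
% Sketch: recover Taylor coefficients from DFT bins of a synthetic current.
N = 64; M = 8;                       % N/M is an integer
A1 = 10e-3; A2 = 10e-3;              % amplitudes suggested in the text (V)
g10 = 5e-3; g01 = 1e-3;              % hypothetical coefficients (S)
g20 = -1.5e-3; g11 = -1e-3;          % hypothetical coefficients (S/V)
n  = (0:N-1).';
vi = A1*cos(2*pi*n/N);               % phi1 = 0
vo = A2*cos(2*pi*M*n/N);             % phi2 = 0
io = g10*vi + g01*vo + g20*vi.^2 + g11*vi.*vo;

Io  = fft(io)/N;                     % (D.8); bin m is stored at index m+1
bin = @(m) Io(m+1);

g10_est = 2*abs(bin(1))/A1;          % (D.16) with sgn(g10) = +1
g01_est = 2*abs(bin(M))/A2;          % (D.17) with sgn(g01) = +1
gkl = @(k, l) real(bin(k + l*M)/ ...
    ((bin(1)/g10_est)^k*(bin(M)/g01_est)^l));   % (D.20)
g20_est = gkl(2, 0);
g11_est = gkl(1, 1);
```

With only these four terms present, each coefficient occupies its own bin, so the estimates are limited only by rounding. Adding odd higher-order terms such as a third-order term in `vi` would place a small contribution in bin 1 and make (D.16) approximate, as the ≈ signs in the derivation indicate.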
D.2.2 LMS method
The LMS method outlined in this section provides more accurate Taylor coefficient
values as compared to the last section. The limitation is that this method is not
practical for measured data.
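First we define the vectors
\[ \mathbf{v}_i^T = \begin{bmatrix} v_i[0] & \cdots & v_i[N-1] \end{bmatrix} \tag{D.21} \]
\[ \mathbf{v}_o^T = \begin{bmatrix} v_o[0] & \cdots & v_o[N-1] \end{bmatrix} \tag{D.22} \]
\[ \mathbf{i}_o^T = \begin{bmatrix} i_o[0] & \cdots & i_o[N-1] \end{bmatrix} \tag{D.23} \]
\[ \mathbf{g}^T = \begin{bmatrix} g_{00} & \cdots & g_{03} & g_{10} & \cdots & g_{33} \end{bmatrix} \tag{D.24} \]
and the matrix
\[ \mathbf{A} = \begin{bmatrix} \mathbf{v}_i^0 \circ \mathbf{v}_o^0 & \cdots & \mathbf{v}_i^0 \circ \mathbf{v}_o^3 & \mathbf{v}_i^1 \circ \mathbf{v}_o^0 & \cdots & \mathbf{v}_i^3 \circ \mathbf{v}_o^3 \end{bmatrix} \tag{D.25} \]
where the operator $\circ$ represents the element-wise Hadamard product and the vector
powers are also taken element-wise. Then we see that
\[ \mathbf{A}\mathbf{g} \approx \mathbf{i}_o \tag{D.26} \]
and we can solve for the optimum coefficients in the LMS sense as
\[ \mathbf{g}^\star = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{i}_o. \tag{D.27} \]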
The following MATLAB code determines the Taylor coefficients using this method.
Listing D.2: MATLAB function to find two-dimensional Taylor coefficients using the
transient LMS method.
    function gtrans = calc_gtrans_lms(f, x, y, N)
    % N: highest power of x and y included in the fit
    if nargin < 4; N = 3; end;
    x = x(:); % make column vector
    y = y(:); % make column vector
    f = f(:); % make column vector
    % Build the matrix in (D.25), one column per monomial x^j*y^k
    A = [];
    for j = 0:N
        for k = 0:N
            A = [A, x.^j.*y.^k];
        end
    end
    gtrans = A\f; % least-squares solution of (D.26)
    gtrans = reshape(gtrans, [N+1, N+1])';
D.3 DC simulation
This method provides an alternative to the transient methods from the previous sec-
tion. The Taylor coefficients are calculated with the derivatives that are approximated
from the finite differences. A two-dimensional dc sweep simulation gives the terms
\[ i_o[m, n] = i_o(m\Delta v, n\Delta v) \tag{D.28} \]
where $m$ and $n$ are integers and $\Delta v$ is the step size. With these points, the Taylor
coefficients can be calculated as
\[ g_{MN} = \frac{1}{M!\,N!\,(2\Delta v)^{M+N}} \sum_{m=0}^{M} \sum_{n=0}^{N} (-1)^{m+n} \binom{M}{m}\binom{N}{n}\, i_o[M - 2m,\, N - 2n]. \tag{D.29} \]
Smaller ∆v values result in errors due to small differences in output current value.
Large ∆v can exceed the radius of convergence of the Taylor series causing errors.
Practical values for ∆v are in the range between 1 mV and 10 mV.
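Equation (D.29) can be exercised on a known polynomial standing in for the dc sweep data. The coefficient values and the step size below are hypothetical, and the function `f` is only a stand-in for simulator output.

```matlab
% Sketch: apply (D.29) to a known polynomial in place of a dc sweep.
g10 = 5e-3; g01 = 1e-3; g20 = -1.5e-3; g11 = -1e-3; g30 = -14e-3;  % hypothetical
f  = @(vi, vo) g10*vi + g01*vo + g20*vi.^2 + g11*vi.*vo + g30*vi.^3;
dv = 5e-3;                              % step size within the suggested range (V)
io = @(m, n) f(m*dv, n*dv);             % sampled sweep points as in (D.28)

binom = @(n, k) factorial(n)/factorial(n-k)/factorial(k);
g_est = zeros(4, 4);                    % g_est(M+1, N+1) estimates g_MN
for Mo = 0:3
    for No = 0:3
        acc = 0;
        for m = 0:Mo
            for n = 0:No
                acc = acc + (-1)^(m+n)*binom(Mo, m)*binom(No, n) ...
                    *io(Mo - 2*m, No - 2*n);
            end
        end
        g_est(Mo+1, No+1) = acc/(factorial(Mo)*factorial(No)*(2*dv)^(Mo+No));
    end
end
```

For this cubic test function the second-order, mixed, and third-order estimates are exact up to rounding, while the first-order estimate `g_est(2,1)` picks up an error of `g30*dv^2` from the third-order term, which illustrates why an overly large step size degrades the result.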
The following MATLAB code determines the Taylor coefficients using this method.
Listing D.3: MATLAB function to find two-dimensional Taylor coefficients using the
dc discrete differences.
    function gdc = calc_gdc(f, x, y, N)
    % f: two-dimensional dc sweep data, x and y: sweep vectors
    if nargin < 4; N = 3; end;
    dx = diff(x(1:2));
    dy = diff(y(1:2));
    center = (size(f) + 1)/2;   % index of the expansion point
    gdc = nan(N+1, N+1);
    for j = 1:N+1
        for k = 1:N+1
            gdc(j, k) = deriv2(f, dx, dy, center, j-1, k-1) ...
                /factorial(j-1)/factorial(k-1);
        end
    end
    end

    function d = deriv2(z, dx, dy, xy, ordx, ordy)
    % Mixed partial derivative of order (ordx, ordy) by central differences
    if ordy == 0
        cvy = [1];
        indy = [0];
    else
        cvy = [1, -1];
        for k = 1:ordy-1
            cvy = conv(cvy, [1, -1]);
        end
        if mod(ordy, 2) == 0
            indy = [ordy/2:-1:-ordy/2];
        else
            indy = [ordy:-2:-ordy];
        end
    end
    d = 0; m = 1;
    for k = xy(2) + indy
        if ordx == 0
            d = d + cvy(m)*z(k, xy(1));
        else
            d = d + cvy(m)*deriv(z(k, :), dx, xy(1), ordx);
        end
        m = m + 1;
    end
    d = d/(dy*(2^mod(ordy, 2)))^ordy;
    end

    function d = deriv(y, dx, n, order)
    % One-dimensional central difference of the given order at index n
    if size(y, 1) > size(y, 2)
        y = transpose(y);
    end
    cv = [1, -1];
    for k = 1:order-1
        cv = conv(cv, [1, -1]);
    end
    if mod(order, 2) == 0
        d = sum(y(n + [order/2:-1:-order/2]).*cv)/dx^order;
    else
        d = sum(y(n + [order:-2:-order]).*cv)/(2*dx)^order;
    end
    end
Appendix E
Unity-gain stage nonlinearity
E.1 Analog-inverter transconductor
The current through a MOSFET is a two-dimensional function of the gate-source
voltage and gate-drain voltage. We define the Taylor series of these functions for the
PMOS and the NMOS as
\[ i_{sd,p} = \sum_{j,k} g_{jk}^{(p)} v_{sg,p}^j v_{sd,p}^k \tag{E.1} \]
\[ i_{ds,n} = \sum_{j,k} g_{jk}^{(n)} v_{gs,n}^j v_{ds,n}^k. \tag{E.2} \]
Kirchhoff's current law for the analog-inverter transconductor in figure E.1 gives the
equation
\[
\begin{aligned}
i_o &= i_{ds,n} - i_{sd,p} \\
&= \sum_{j,k} \left(g_{jk}^{(n)} - (-1)^{j+k} g_{jk}^{(p)}\right) v_i^j v_o^k \\
&= \sum_{j,k} G_{jk} v_i^j v_o^k
\end{aligned}
\tag{E.3}
\]
where we defined
\[ G_{jk} = g_{jk}^{(n)} - (-1)^{j+k} g_{jk}^{(p)}. \tag{E.4} \]

[Figure E.1: Schematic diagram of the inverter transconductor for nonlinearity analysis. Labeled quantities: vi, vo, io, isd,p, ids,n.]

[Figure E.2: Schematic diagram of the unity-gain stage for nonlinearity analysis. Labeled quantities: vi, vo, io1, io2.]
E.2 Unity-gain stage
The current of the input transconductor in figure E.2 is a two-dimensional function of
the input and output voltages with the Taylor series
\[ i_{o1} = \sum_{j,k} G_{jk} v_i^j v_o^k \tag{E.5} \]
where the coefficients $G_{jk}$ are defined in (E.4). The self-biased load in figure E.2 is a
width-scaled replica of the input transconductor. That is, it has the same quiescent
gate-source voltage and gate-drain voltage, but the transistor widths are scaled by $\beta$.
Therefore, the nonlinear I-V characteristics of the load are simply scaled by $\beta$ and
we can write
\[ i_{o2} = \beta \sum_{j,k} G_{jk} v_o^j v_o^k. \tag{E.6} \]
The output voltage will be a function of the input voltage with the Taylor series
\[ v_o = \sum_{n} a_n v_i^n. \tag{E.7} \]
Kirchhoff's current law for the output node gives the equation
\[ i_{o1} + i_{o2} = \sum_{j,k} G_{jk} v_o^k \left(v_i^j + \beta v_o^j\right) = 0. \tag{E.8} \]
E.2.1 First-order case
Solving for $a_1$ can be achieved through traditional methods. Collecting the first-order
terms of (E.8) gives $G_{10}v_i + \left(\beta(G_{10} + G_{01}) + G_{01}\right)v_o = 0$, resulting in
\[ a_1 = -\frac{G_{10}}{\beta(G_{10} + G_{01}) + G_{01}}. \tag{E.9} \]
For a unity-gain (inverting) stage with $a_1 = -1$ we need
\[ \beta(G_{10} + G_{01}) + G_{01} = G_{10} \tag{E.10} \]
and after solving for $\beta$ we find
\[ \beta = \frac{G_{10} - G_{01}}{G_{10} + G_{01}} = \frac{A_i - 1}{A_i + 1} \tag{E.11} \]
where $A_i = \frac{G_{10}}{G_{01}}$ is the intrinsic gain of the inverter transconductor. For $A_i \gg 1$, the
scale factor approaches unity.
E.2.2 Second-order case
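Equating the terms from (E.8) containing $v_i^2$ gives the expression
\[ -a_2\left(\beta G_{10} + (1 + \beta)G_{01}\right) = G_{20}(1 + \beta) - G_{11}(1 - \beta) + G_{02}(1 + \beta). \tag{E.12} \]
By substituting the equality in (E.10) and rearranging, we find the second-order
coefficient to be
\[ a_2 = -\frac{G_{20}}{G_{10}}(1 + \beta) + \frac{G_{11}}{G_{10}}(1 - \beta) - \frac{G_{02}}{G_{10}}(1 + \beta). \tag{E.13} \]
E.2.3 Third-order case
Equating the terms from (E.8) containing $v_i^3$ gives the expression
\[
\begin{aligned}
-a_3\left(\beta G_{10} + (1 + \beta)G_{01}\right) ={}& G_{30}(1 - \beta) - G_{21}(1 + \beta) + G_{12}(1 - \beta) - G_{03}(1 + \beta) \\
&- a_2\left(G_{20}(2\beta) + G_{11}(2\beta - 1) + G_{02}(2\beta + 2)\right).
\end{aligned}
\tag{E.14}
\]
By substituting the equality in (E.10) and rearranging, we find the third-order
coefficient to be
\[
\begin{aligned}
a_3 ={}& -\frac{G_{30}}{G_{10}}(1 - \beta) + \frac{G_{21}}{G_{10}}(1 + \beta) - \frac{G_{12}}{G_{10}}(1 - \beta) + \frac{G_{03}}{G_{10}}(1 + \beta) \\
&+ a_2\left(\frac{G_{20}}{G_{10}}(2\beta) + \frac{G_{11}}{G_{10}}(2\beta - 1) + \frac{G_{02}}{G_{10}}(2\beta + 2)\right).
\end{aligned}
\tag{E.15}
\]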
E.3 Comparison with simulation
Using the method described in appendix D.2, the Taylor coefficients, $G_{jk}$, and $a_n$
are simulated for the unity-gain stage designed as described in §6.1. The Taylor
coefficients, $G_{jk}$, for the load transconductor are listed in table E.1. The intrinsic
gain of the transconductor is $A_i = 5.84$ which corresponds to $\beta = 0.71$. From these
values we calculate the second-order coefficient to be
\[
a_2 = \underbrace{-\frac{G_{20}}{G_{10}}(1 + \beta)}_{0.53\ \mathrm{V}^{-1}}
+ \underbrace{\frac{G_{11}}{G_{10}}(1 - \beta)}_{-0.06\ \mathrm{V}^{-1}}
\underbrace{{}- \frac{G_{02}}{G_{10}}(1 + \beta)}_{0.06\ \mathrm{V}^{-1}}
= 0.53\ \mathrm{V}^{-1}
\tag{E.16}
\]
which is in reasonable agreement with the directly simulated case where
\[ a_{2,\mathrm{sim}} = 0.64\ \mathrm{V}^{-1}. \tag{E.17} \]
The discrepancy is due to the fact that the scale factor, $\beta$, is implemented by
generating the load with triode devices, which distorts the relationship between the input
and load transconductor Taylor coefficients. This can also be accounted for, but
the analysis is complex and of little added value. This simulation suggests that a
reasonable approximation for the second-order coefficient is
\[ a_2 \approx -\frac{G_{20}}{G_{10}}(1 + \beta). \tag{E.18} \]
Similarly, for the third-order coefficient
\[
\begin{aligned}
a_3 ={}& \overbrace{-\frac{G_{30}}{G_{10}}(1 - \beta)}^{0.84\ \mathrm{V}^{-2}}
+ \overbrace{\frac{G_{21}}{G_{10}}(1 + \beta)}^{-1.32\ \mathrm{V}^{-2}}
\overbrace{{}- \frac{G_{12}}{G_{10}}(1 - \beta)}^{0.15\ \mathrm{V}^{-2}}
+ \overbrace{\frac{G_{03}}{G_{10}}(1 + \beta)}^{0.06\ \mathrm{V}^{-2}} \\
&+ \underbrace{a_2\frac{G_{20}}{G_{10}}(2\beta)}_{-0.24\ \mathrm{V}^{-2}}
+ \underbrace{a_2\frac{G_{11}}{G_{10}}(2\beta - 1)}_{-0.04\ \mathrm{V}^{-2}}
+ \underbrace{a_2\frac{G_{02}}{G_{10}}(2\beta + 2)}_{-0.07\ \mathrm{V}^{-2}} \\
={}& -0.62\ \mathrm{V}^{-2}
\end{aligned}
\tag{E.19}
\]
as compared to the directly simulated case
\[ a_{3,\mathrm{sim}} = -0.68\ \mathrm{V}^{-2}. \tag{E.20} \]
A reasonable approximation is
\[ a_3 \approx -\frac{G_{30}}{G_{10}}(1 - \beta) + \frac{G_{21}}{G_{10}}(1 + \beta) + a_2\frac{G_{20}}{G_{10}}(2\beta). \tag{E.21} \]
It is important to note that, while the term $a_2$ can be reduced with a pseudo-differential
implementation, its contribution to (E.21) cannot be removed.

Table E.1: Transistor-level simulated Taylor coefficient values, Gjk, for the degenerated inverter transconductor load.

Gjk      k = 0           k = 1           k = 2           k = 3
j = 0    0               0.83 mS         −0.18 mS/V      0.17 mS/V²
j = 1    4.85 mS         −0.94 mS/V      −2.43 mS/V²     -
j = 2    −1.52 mS/V      −3.74 mS/V²     -               -
j = 3    −13.9 mS/V²     -               -               -
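The arithmetic behind (E.16) and (E.19) can be reproduced from the table values with a few lines of MATLAB. This is a sketch for checking the numbers only; because the table entries are rounded, the results come out close to, but not identical with, the values quoted above.

```matlab
% Sketch: evaluate (E.11), (E.13) and (E.15) with the values in table E.1.
% Units: G10, G01 in mS; Gjk with j+k = 2 in mS/V; j+k = 3 in mS/V^2.
G10 = 4.85;  G01 = 0.83;
G20 = -1.52; G11 = -0.94; G02 = -0.18;
G30 = -13.9; G21 = -3.74; G12 = -2.43; G03 = 0.17;

Ai   = G10/G01;                 % intrinsic gain
beta = (Ai - 1)/(Ai + 1);       % scale factor from (E.11)

% Second-order coefficient (E.13), in 1/V
a2 = -G20/G10*(1 + beta) + G11/G10*(1 - beta) - G02/G10*(1 + beta);

% Third-order coefficient (E.15), in 1/V^2
a3 = -G30/G10*(1 - beta) + G21/G10*(1 + beta) ...
     - G12/G10*(1 - beta) + G03/G10*(1 + beta) ...
     + a2*(G20/G10*(2*beta) + G11/G10*(2*beta - 1) + G02/G10*(2*beta + 2));

fprintf('Ai = %.2f, beta = %.2f, a2 = %.2f 1/V, a3 = %.2f 1/V^2\n', ...
        Ai, beta, a2, a3);
```

Evaluating the individual terms separately also makes it easy to see which ones dominate, which is the basis for the approximations in (E.18) and (E.21).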
Appendix F
Unity-gain stage supply rejection
F.1 Single-ended
For the single-ended case, the traditional small-signal analysis is sufficient. For the
unity-gain stage in figure F.1, the supply rejection is
\[
\frac{v_o}{v_{dd}} = \frac{2g_{m,p} + 2g_{ds,p}}{g_{m,p} + g_{m,n} + 2g_{ds,p} + 2g_{ds,n}}
\approx \frac{2g_{m,p}}{g_{m,p} + g_{m,n}} \approx 1. \tag{F.1}
\]
Similarly, the ground rejection is
\[
\frac{v_o}{v_{ss}} = \frac{2g_{m,n} + 2g_{ds,n}}{g_{m,p} + g_{m,n} + 2g_{ds,p} + 2g_{ds,n}}
\approx \frac{2g_{m,n}}{g_{m,p} + g_{m,n}} \approx 1. \tag{F.2}
\]
In other words, supply noise passes directly to the output.
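
To make the approximation concrete, the sketch below evaluates (F.1) and (F.2) for a set of hypothetical small-signal parameters. The numerical values are assumptions chosen only for illustration and are not taken from the design.

```python
def supply_gains(gm_p, gm_n, gds_p, gds_n):
    """Return (vo/vdd, vo/vss) as given by (F.1) and (F.2)."""
    den = gm_p + gm_n + 2 * gds_p + 2 * gds_n
    return (2 * gm_p + 2 * gds_p) / den, (2 * gm_n + 2 * gds_n) / den


# Hypothetical values in siemens: matched devices with gds = gm / 10.
gm_p = gm_n = 5e-3
gds_p = gds_n = 0.5e-3

h_dd, h_ss = supply_gains(gm_p, gm_n, gds_p, gds_n)
print(f"vo/vdd = {h_dd:.3f}, vo/vss = {h_ss:.3f}")

# With the output conductances neglected, both gains reduce to
# 2*gm / (gm_p + gm_n), which is unity for matched devices.
h_dd_ideal, h_ss_ideal = supply_gains(gm_p, gm_n, 0.0, 0.0)
```

For these assumed values both gains are 11/12, close to the unity gain predicted by the simplified expressions.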
Figure F.1: Schematic diagram of the unity-gain stage for supply rejection analysis. (Labeled nodes: vdd, vss, vo, and Vic.)
F.2 Pseudo-differential
A pseudo-differential implementation rejects the single-ended supply noise to the
extent that the circuit is balanced. But even a fully-balanced pseudo-differential
circuit is not completely immune to supply noise. To see this, assume that the
transistors have the Taylor series
\[ i_{sd,p} = \sum_{j,k} g^{(p)}_{jk}\, v_{sg,p}^{j}\, v_{sd,p}^{k} \tag{F.3} \]
\[ i_{ds,n} = \sum_{j,k} g^{(n)}_{jk}\, v_{gs,n}^{j}\, v_{ds,n}^{k}. \tag{F.4} \]
The output of the unity-gain stage is a function of both the input voltage, vi, and the
supply voltage, vdd, which can be represented with the two-dimensional Taylor series
\[ v_o = \sum_{m,n} a_{mn}\, v_i^{m}\, v_{dd}^{n}. \tag{F.5} \]
The term a10 is the linear gain from the input to the output and a01 is the linear gain
from the supply to the output. For a pseudo-differential implementation, the output
voltage is
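\[
\begin{aligned}
v_{od} &= \sum_{m,n} a_{mn}\left(\frac{v_{id}}{2}\right)^{m} v_{dd}^{n}
        - \sum_{m,n} a_{mn}\left(-\frac{v_{id}}{2}\right)^{m} v_{dd}^{n} \\
       &= \sum_{m\ \mathrm{odd},\, n} 2a_{mn}\left(\frac{v_{id}}{2}\right)^{m} v_{dd}^{n}.
\end{aligned} \tag{F.6}
\]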
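
The cancellation in (F.6) can be checked numerically. The sketch below is a minimal illustration that draws an arbitrary (hypothetical) coefficient matrix a_mn, forms the difference of two half-circuits driven by +vid/2 and −vid/2, and compares it with the odd-m sum. The coefficient values and signal levels are assumptions with no connection to the actual circuit.

```python
import random


def single_ended(a, v_i, v_dd):
    """Evaluate vo = sum over m, n of a[m][n] * v_i**m * v_dd**n, as in (F.5)."""
    return sum(a[m][n] * v_i ** m * v_dd ** n
               for m in range(len(a)) for n in range(len(a[m])))


def pseudo_differential(a, v_id, v_dd):
    """Difference of two identical half-circuits driven by +v_id/2 and -v_id/2."""
    return single_ended(a, v_id / 2, v_dd) - single_ended(a, -v_id / 2, v_dd)


def odd_terms_only(a, v_id, v_dd):
    """Second line of (F.6): only odd powers of the input survive."""
    return sum(2 * a[m][n] * (v_id / 2) ** m * v_dd ** n
               for m in range(1, len(a), 2) for n in range(len(a[m])))


random.seed(0)
# Hypothetical Taylor coefficients a[m][n] for m, n = 0..3.
a = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(4)]

v_id, v_dd = 0.1, 0.05
lhs = pseudo_differential(a, v_id, v_dd)
rhs = odd_terms_only(a, v_id, v_dd)

# With no differential input, supply noise alone produces no differential output.
supply_only = pseudo_differential(a, 0.0, v_dd)
```

Here `lhs` and `rhs` agree to floating-point precision and `supply_only` is zero, while terms such as a[1][1] * v_id * v_dd remain in the differential output.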
The first-order supply gain term, a01, is canceled and the first-order input gain term,
a10, remains, as expected. The dominant supply-dependent term that is not canceled by
the pseudo-differential implementation is a11 vid vdd. This product can be thought of as
the input signal mixed with the supply noise with a conversion gain a11. The analysis
to find the expression for a11 is involved, but a reasonably accurate approximation is
\[ a_{11} = -\frac{g^{(p)}_{20}}{g^{(p)}_{10}}. \tag{F.7} \]
For accuracy, this expression should be extracted from transistor-level simulations,
but some intuition can be drawn from a square-law approximation. If we assume
\[ I_{SD,p} = \frac{1}{2}\kappa_p\,(V_{SG,p} - V_{t,p})^2 \tag{F.8} \]
then the non-zero terms are
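\[ g^{(p)}_{10} = \frac{\partial I_{SD,p}}{\partial V_{SG,p}} = \kappa_p\,(V_{SG,p} - V_{t,p}) \tag{F.9} \]
\[ g^{(p)}_{20} = \frac{1}{2}\frac{\partial^2 I_{SD,p}}{\partial V_{SG,p}^2} = \frac{1}{2}\kappa_p \tag{F.10} \]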
and substituting into the expression for a11 we see that
\[
a_{11} = -\frac{1}{2}\,\frac{1}{(V_{SG,p} - V_{t,p})} = -\frac{1}{4}\left(\frac{g_{m,p}}{I_{SD,p}}\right). \tag{F.11}
\]
Therefore, for practical values of gm,p/ISD,p, we can expect the magnitude of the
conversion gain to be greater than 1 V^-1 but less than 10 V^-1.
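
The square-law result can be reproduced with a few lines of Python. The sketch below is only an illustration: the values of κp, Vt,p, and the bias point are hypothetical, and the Taylor coefficients are obtained by finite differences rather than from transistor-level simulation.

```python
def i_sd(v_sg, kappa=1e-3, v_t=0.4):
    """Square-law PMOS current of (F.8); kappa (A/V^2) and v_t (V) are assumed."""
    return 0.5 * kappa * (v_sg - v_t) ** 2


def taylor_g10_g20(v_sg, h=1e-3):
    """First two Taylor coefficients in v_sg, estimated by central differences."""
    g10 = (i_sd(v_sg + h) - i_sd(v_sg - h)) / (2 * h)
    g20 = 0.5 * (i_sd(v_sg + h) - 2 * i_sd(v_sg) + i_sd(v_sg - h)) / h ** 2
    return g10, g20


v_sg = 0.6                            # hypothetical bias: 200 mV of overdrive
g10, g20 = taylor_g10_g20(v_sg)
a11 = -g20 / g10                      # approximation (F.7)
gm_over_id = g10 / i_sd(v_sg)         # gm/ID at the same bias point

print(f"a11 = {a11:.2f} 1/V, -(1/4) gm/ID = {-0.25 * gm_over_id:.2f} 1/V")
```

With the assumed 200 mV overdrive, gm/ID is 10 V^-1 and both expressions in (F.11) give a conversion gain of about −2.5 V^-1.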
Appendix G
Switched-capacitor tuning circuit
As discussed in §5.6.2, the delay has the largest contribution to the total FFE offset.
To mitigate this problem the triode gate voltages, Vbp and Vbn, can be biased to adjust
for the gain and common mode of the delays as depicted in the schematic diagram in
figure G.1. Figure G.2 shows the contour lines of unity gain and half-supply common
mode in the space of Vbp and Vbn. The intersection of these lines is the optimal bias
point where both objectives are simultaneously achieved: the gain is unity and the
common mode is half of the supply. It is possible to tune for PVT variations by
finding this optimal bias point for a replica gain stage and mirroring the bias voltages
to the delays in the core FFE.
Figure G.1: The single-path Padé-inspired delay schematic diagram with gate-voltage-tunable triode-degenerated load transconductor.

Figure G.2: Contour lines of constant gain and constant common mode, plotted against Vbn (V) and Vbp (V), and 50-point Monte Carlo simulation of converged bias voltages for the switched-capacitor circuit in figure G.4.

One conceivable tuning method is to create a feedback loop for the replica gain
stage that sets Vbp so that the common mode is VDD/2. For this case, sweeping
Vbn and plotting Vbp produces the contour line of constant common mode as in figure
G.2. The issue with this simple solution is revealed when mismatch is introduced into
the gain stage. Figure G.3 (top) shows a simulation of the contour lines of constant
common mode for 100 Monte Carlo points. The device mismatch causes these lines
to vary significantly. This effect is due to the feedback loop adjusting Vbp to also
compensate for the mismatch of the delays in setting the common mode. The gain
from the triode devices to the output common mode is small, so large variations in
Vbp are necessary to compensate for small mismatches. The goal of the replica-based
tuning circuit is to adjust for global PVT variations and, as such, it is desirable to
tune the common mode without compensating for the mismatch. To accomplish this,
the common mode with the mismatch of the non-degenerated gain stage is measured
by shorting out the triode devices. This common mode is defined to be the natural
common mode, denoted VCMN. The lines of constant common mode are plotted
in figure G.3 (bottom) for a Monte Carlo simulation with the feedback loop modified
to force the common mode to this natural common mode. The spread is significantly
reduced, as expected. This calibration technique requires multiple phases, which can
be achieved with the switched-capacitor circuit shown in figure G.4(a).

Figure G.3: Monte Carlo simulation of the contour lines of constant common mode, plotted against Vbn (V) and Vbp (V), with the output common mode forced to (top) half of the supply and (bottom) the natural common mode.
The calibration of gain and common mode is completed in four phases, which are
depicted in figure G.4(b). Phases 1 and 2 make a small step in Vbn towards unity gain.
Phases 3 and 4 make a small step in Vbp towards the natural common mode.
During phase 1, the amplifier is auto-zeroed while −vi is sampled on one capacitor
and A·vi on the other, where A is the gain of the stage.¹ During phase 2, the voltage
is integrated onto Cf1, updating Vbn with the voltage
\[ \Delta V_{bn} = v_i\,(A - 1)\,\frac{C_{s1}}{C_{f1}}. \tag{G.1} \]
When A > 1 the bias voltage Vbn is increased and for A < 1 it is decreased. The gain
is monotonically decreasing for increasing Vbn; therefore, this acts as a restorative
force to set the gain to unity. The step size is proportional to the error in a fashion
similar to the LMS algorithm.
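
The behavior of the update in (G.1) can be illustrated with a short behavioral sketch. The linear gain model, the sampled voltage vi, the capacitor ratio, and the starting point below are all hypothetical. The only property carried over from the text is that the gain decreases monotonically with Vbn.

```python
def stage_gain(v_bn, v_bn_opt=0.70, slope=2.0):
    """Hypothetical gain model: unity at v_bn_opt and decreasing with v_bn."""
    return 1.0 - slope * (v_bn - v_bn_opt)


def tune_vbn(v_bn, v_i=0.1, cs_over_cf=0.5, n_cycles=125):
    """Apply the update of (G.1) once per calibration cycle."""
    history = [v_bn]
    for _ in range(n_cycles):
        gain = stage_gain(v_bn)
        v_bn += v_i * (gain - 1.0) * cs_over_cf   # step proportional to the error
        history.append(v_bn)
    return v_bn, history


v_bn_final, history = tune_vbn(v_bn=0.50)
print(f"Vbn = {v_bn_final:.4f} V, gain = {stage_gain(v_bn_final):.6f}")
```

In this model the gain error shrinks by a constant factor every cycle, so the bias voltage approaches the unity-gain point geometrically, much like a fixed-step LMS update.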
During phase 3, the amplifier is auto-zeroed while VCM−VCMN is sampled on Cs2.
During phase 4, the voltage is integrated onto Cf2 updating Vbp with the voltage
\[ \Delta V_{bp} = (V_{CM} - V_{CMN})\,\frac{C_{s2}}{C_{f2}}. \tag{G.2} \]
The natural common mode is a constant and VCM is monotonically decreasing for
increasing Vbp; therefore, this acts as a restorative force to set the common mode
equal to the natural common mode. Similar to the update step for Vbn, the step size
is proportional to the error.

¹ The reference voltage VCM − vi is generated by a self-biased inverter transconductor with equal NMOS and PMOS widths, resulting in a voltage that is some value vi less than VCM.

Figure G.4: (a) The schematic diagram for the switched-capacitor gain and common-mode replica tuning circuit and (b) the associated clock phase diagram (phases 1 to 4). Labels in the schematic: full-size replica unity-gain stage; VCM − vi, VCM, and VCMN; Vbn and Vbp, each routed to the FFE delays; capacitors Cs1, Cs2, Cf1, Cf2, Cp1, Cp2, Cbp, Cbn, and CL; switch phases 1, 2, 4, d2, and d4.
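
The interaction of the two updates (G.1) and (G.2) can be sketched with a simple behavioral model. Everything numerical below is an assumption made for illustration: the linearized replica model, its cross-coupling terms, the capacitor ratios, and the starting bias voltages. The model only preserves the two monotonic relationships stated in the text (gain decreasing with Vbn, common mode decreasing with Vbp).

```python
V_CMN = 0.45  # hypothetical natural common mode (V)


def replica(v_bp, v_bn):
    """Hypothetical linearized replica stage: returns (gain, common mode)."""
    gain = 1.0 - 2.0 * (v_bn - 0.70) + 1.0 * (v_bp - 0.30)
    v_cm = V_CMN - 0.5 * (v_bp - 0.30) - 0.3 * (v_bn - 0.70)
    return gain, v_cm


def calibrate(v_bp, v_bn, v_i=0.1, c1=0.5, c2=0.2, n_cycles=300):
    """Alternate the gain and common-mode updates, one pair per four-phase cycle."""
    for _ in range(n_cycles):
        gain, _ = replica(v_bp, v_bn)
        v_bn += v_i * (gain - 1.0) * c1      # phases 1 and 2, update (G.1)
        _, v_cm = replica(v_bp, v_bn)
        v_bp += (v_cm - V_CMN) * c2          # phases 3 and 4, update (G.2)
    return v_bp, v_bn


v_bp, v_bn = calibrate(v_bp=0.10, v_bn=0.90)
gain, v_cm = replica(v_bp, v_bn)
print(f"Vbp = {v_bp:.3f} V, Vbn = {v_bn:.3f} V, gain = {gain:.4f}, VCM = {v_cm:.4f} V")
```

In this model the two loops settle at the single bias point where the gain is unity and the common mode equals VCMN, which corresponds to the intersection of the two contour lines in figure G.2.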
The amplifier is implemented as a traditional two-stage topology for high gain to minimize
second-order offset effects not canceled by the auto-zeroing. This switched-capacitor
circuit is sensitive to leakage current, so high-voltage I/O devices are used in the
amplifier with a 1.8 V supply. The current is 300 µA per amplifier for a total power
consumption of 1.1 mW. An input clock with frequency 500 kHz is used to generate
the clock phases in figure G.4(b), with each phase lasting 2 µs. The output
converges within 1 ms in post-layout simulations. The plot in figure G.2 shows the
converged bias voltages for post-layout Monte Carlo simulations for 50 points.
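
The quoted power and timing figures can be tied together with a short calculation. The sketch below assumes two amplifiers (one per integrator in figure G.4(a)), that each phase occupies one clock period, and that one update of each bias voltage occurs per four-phase cycle. These assumptions are inferred from the numbers in the text rather than stated in it.

```python
f_clk = 500e3                  # input clock frequency (Hz)
t_phase = 1.0 / f_clk          # assumed: each phase lasts one clock period (2 us)
n_phases = 4
t_cycle = n_phases * t_phase   # one full calibration cycle (8 us)

t_converge = 1e-3              # convergence time from post-layout simulation (s)
n_updates = round(t_converge / t_cycle)

i_amp = 300e-6                 # current per amplifier (A)
v_supply = 1.8                 # I/O supply (V)
n_amps = 2                     # assumed: one amplifier per integrator
p_total = n_amps * i_amp * v_supply

print(f"cycle time = {t_cycle * 1e6:.0f} us, updates within 1 ms = {n_updates}")
print(f"total power = {p_total * 1e3:.2f} mW")
```

Under these assumptions the loop converges in about 125 update cycles, and two amplifiers account for roughly 1.1 mW.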
Unfortunately, there was an issue during testing that prevented a complete physi-
cal verification in the proof-of-concept design. The measured outputs Vbp and Vbn on
the oscilloscope showed large spikes occurring at the clock frequency. This issue was
not reproducible with post-layout simulations. A possible source of this problem is
the undesired overlap of adjacent clock phases. The internal clocks are not accessible
off chip so this could not be verified.
Appendix H
Gain compression generalization
For a nonlinear system defined by the expression
\[ y = a_1 x + a_2 x^2 + a_3 x^3 + \cdots \tag{H.1} \]
the terms a_n x^n can contribute signal directly proportional to x that is indistinguishable
from the linear contribution of the term a_1 x. This effect is referred to as gain
compression because the contribution is typically opposite in sign from the linear
term. Here we define the linear contribution of x^n as the gain error α_n. For the
simple case with x = cos(ωt), we can use the trigonometric identity
\[ \cos^3(\omega t) = \frac{1}{4}\bigl(3\cos(\omega t) + \cos(3\omega t)\bigr) \tag{H.2} \]
to see that
\[ \alpha_3 = \frac{3}{4}. \tag{H.3} \]
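
The identity (H.2) and the resulting gain error can be confirmed numerically by correlating cos^3 against the fundamental and the third harmonic over one period. The sketch below is a minimal check with an arbitrarily chosen sample count. The helper name `harmonic_amplitude` is illustrative only.

```python
import math

N = 4096  # samples spanning exactly one period (arbitrary choice)
theta = [2 * math.pi * k / N for k in range(N)]
y = [math.cos(t) ** 3 for t in theta]


def harmonic_amplitude(samples, h):
    """Amplitude of the cos(h * theta) component of one period of samples."""
    return 2.0 / len(samples) * sum(
        s * math.cos(h * t) for s, t in zip(samples, theta))


alpha3 = harmonic_amplitude(y, 1)          # fundamental component of cos^3
third_harmonic = harmonic_amplitude(y, 3)  # third-harmonic component

print(f"fundamental = {alpha3:.4f}, third harmonic = {third_harmonic:.4f}")
```

The fundamental amplitude matches 3/4 and the third harmonic matches 1/4, in agreement with (H.2) and (H.3).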
For a general input x, a closed-form expression is not always known. In this case, the
statistical properties of the signal can be used for the analysis. To find the general
expression for αn we minimize the distortion power expression
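\[ \mathrm{var}(x^n - \alpha_n x) = \mathrm{var}(x^n) - 2\alpha_n E[x^{n+1}] + \alpha_n^2 E[x^2] \tag{H.4} \]
by differentiating and setting equal to zero, obtaining
\[ \frac{\partial}{\partial \alpha_n}\mathrm{var}(x^n - \alpha_n x) = -2E[x^{n+1}] + 2\alpha_n E[x^2] = 0 \tag{H.5} \]
\[ \rightarrow\quad \alpha_n = \frac{E[x^{n+1}]}{E[x^2]} \tag{H.6} \]
with the associated distortion power
\[
\begin{aligned}
d_n &= \mathrm{var}(x^n - \alpha_n x) \\
    &= \mathrm{var}(x^n) - \alpha_n^2 E[x^2].
\end{aligned} \tag{H.7}
\]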
For the zero-mean unit-variance sinusoid xs = √2 cos(ωt), the distortion is
\[ d_{3s} = 4\cdot\frac{10}{16} - 4\cdot\frac{9}{16} = \frac{1}{4}. \tag{H.8} \]
This expression is useful in characterizing the linearity of a system. The ratio of the
distortion power for the target application input signal to that of the sinusoidal case
can be used to predict the total distortion from conventional sinusoidal measurement
techniques. For example, a zero-mean unit-variance normally distributed signal, xg,
has the third-order distortion power
\[ d_{3g} = 15 - 3^2 \cdot 1 = 6. \tag{H.9} \]
In other words, the output distortion power for a normally distributed input will
be 24× that of the case for a sinusoidal input of equal power. This factor is a
strong function of the signal characteristics. For example, the zero-mean unit-variance
square-wave signal xsq = sgn(cos(ωt)) has zero third-order distortion power. This can
be seen from the definition of the distortion power where
\[ d_{3sq} = 1 - 1^2 \cdot 1 = 0. \tag{H.10} \]
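
The three cases above can be reproduced from the signal moments alone. The sketch below applies (H.6) and (H.7) with n = 3 using exact rational arithmetic. The moment values are the standard even moments E[x^2], E[x^4], and E[x^6] of each zero-mean, unit-variance signal, and E[x^3] is zero by symmetry. The function names are illustrative.

```python
from fractions import Fraction as F


def gain_error(m2, m4):
    """alpha_3 = E[x^4] / E[x^2], from (H.6) with n = 3."""
    return m4 / m2


def distortion_power(m2, m3, m4, m6):
    """d_3 = var(x^3) - alpha_3^2 * E[x^2], from (H.7) with n = 3."""
    alpha3 = gain_error(m2, m4)
    var_x3 = m6 - m3 ** 2
    return var_x3 - alpha3 ** 2 * m2


# (E[x^2], E[x^4], E[x^6]) for zero-mean, unit-variance signals.
moments = {
    "sinusoid": (F(1), F(3, 2), F(5, 2)),   # x = sqrt(2) cos(wt)
    "gaussian": (F(1), F(3), F(15)),        # standard normal
    "square":   (F(1), F(1), F(1)),         # x = sgn(cos(wt))
}

d3 = {name: distortion_power(m2, F(0), m4, m6)
      for name, (m2, m4, m6) in moments.items()}
ratio = d3["gaussian"] / d3["sinusoid"]

for name, value in d3.items():
    print(f"{name:9s} d3 = {value}")
print(f"gaussian / sinusoid = {ratio}")
```

The results reproduce (H.8), (H.9), and (H.10), along with the factor of 24 between the normally distributed and sinusoidal cases.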
Bibliography
[1] M. El-Chammas, “Background calibration of timing skew in time-interleaved
A/D converters,” Ph.D. dissertation, Stanford University, August 2010.
[2] M. Shanbhag. (2011, Apr.) 100 Gb/s simulated backplane channels. [Online].
Available: http://ieee802.org/3/100GCU/public/channel.html
[3] E. Mammei, F. Loi, F. Radice, A. Dati, M. Bruccoleri, M. Bassi, and A. Maz-
zanti, “A power-scalable 7-tap FIR equalizer with tunable active delay line for
10-to-25Gb/s multi-mode fiber EDC in 28nm LP-CMOS,” in ISSCC Dig. Tech.
Papers, Feb. 2014, pp. 142–143.
[4] “IEEE approved draft standard for ethernet amendment 2: Physical layer spec-
ifications and management parameters for 100 Gb/s operation over backplanes
and copper cables,” IEEE P802.3bj, Apr. 2014.
[5] J. D’Ambrosia, P. Mooney, and M. Nowell. (2013, May) 400 Gb/s ethernet:
Why now? [Online]. Available: http://goo.gl/n574PC
[6] J. F. Bulzacchelli, “Equalization for electrical links,” IEEE Solid-State Circuits
Mag., vol. 7, no. 4, pp. 23–31, 2015.
135
136 BIBLIOGRAPHY
[7] K. Smith, A. Wang, and L. Fujino, “Through the looking glass II - part 1 of 2:
Trend tracking for ISSCC 2013,” IEEE Solid-State Circuits Mag., vol. 5, no. 1,
pp. 71–89, 2013.
[8] B. Zhang, A. Nazemi, A. Garg, N. Kocaman, M. R. Ahmadi, M. Khanpour,
H. Zhang, J. Cao, and A. Momtaz, “A 195mW / 55mW dual-path receiver AFE
for multistandard 8.5-to-11.5 Gb/s serial links in 40nm CMOS,” in ISSCC Dig.
Tech. Papers, Feb. 2013, pp. 34–35.
[9] D. Cui, H. Zhang, N. Huang, A. Nazemi, B. Catli, H. G. Rhew, B. Zhang,
A. Momtaz, and J. Cao, “A 320mW 32Gb/s 8b ADC-based PAM-4 analog front-
end with programmable gain control and analog peaking in 28nm CMOS,” in
ISSCC Dig. Tech. Papers, Jan. 2016, pp. 58–59.
[10] E.-H. Chen, R. Yousry, and C.-K. K. Yang, “Power optimized ADC-based serial
link receiver,” IEEE J. Solid-State Circuits, vol. 47, no. 4, pp. 938–951, Apr.
2012.
[11] A. Momtaz and M. Green, “An 80 mW 40 Gb/s 7-tap T/2-spaced feed-forward
equalizer in 65 nm CMOS,” IEEE J. Solid-State Circuits, vol. 45, no. 3, pp.
629–639, Mar. 2010.
[12] M. N. Sadiku, Elements of Electromagnetics. New York, NY: Oxford University
Press, 2001, vol. 428.
[13] J. Baker-Jarvis, M. D. Janezic, B. Riddle, C. L. Holloway, N. Paulter, and
J. Blendell, “Dielectric and conductor-loss characterization and measurements
on electronic packaging materials,” NIST, Boulder, CO, Tech. Rep. 1520, July
2001.
BIBLIOGRAPHY 137
[14] N. Wiener and Y. Lee, “Electrical network system,” U.S. Patent 2 124 599, July
26, 1938.
[15] A. V. Oppenheim, A. S. Willsky, and S. H. Nawab, Signals and Systems. Pear-
son, 2014.
[16] T. H. Lee, The Design of CMOS Radio-Frequency Integrated Circuits. Cam-
bridge University Press, 2003.
[17] R. Schaumann and M. E. V. Valkenburg, Design of Analog Filters. Oxford
University Press, USA, 2001.
[18] H. Pade, Sur la representation approchee d’une fonction par des fractions ra-
tionnelles. Gauthier-Villars et fils, 1892, no. 740.
[19] K. Bult and H. Wallinga, “A CMOS analog continuous-time delay line with
adaptive delay-time control,” IEEE J. Solid-State Circuits, vol. 23, no. 3, pp.
759–766, June 1988.
[20] S. K. Garakoui, E. A. M. Klumperink, B. Nauta, and F. F. E. V. Vliet, “A 1-
to-2.5GHz phased-array IC based on gm-RC all-pass time-delay cells,” in ISSCC
Dig. Tech. Papers, Feb. 2012, pp. 80–82.
[21] R. L. Geiger and E. Sanchez-Sinencio, “Active filter design using operational
transconductance amplifiers: a tutorial,” IEEE Circuits and Devices Mag., vol. 1,
no. 2, pp. 20–32, 1985.
[22] M. J. M. Pelgrom, H. P. Tuinhout, and M. Vertregt, “Transistor matching in
analog CMOS applications,” in IEDM Dig. Tech. Papers, Dec. 1998, pp. 915–
918.
[23] B. Widrow and S. Streans, Adaptive Signal Processing. Prentice-Hall, Inc., 1985.
138 BIBLIOGRAPHY
[24] R. Boesch, K. Zheng, and B. Murmann, “A 0.003 mm2 5.2 mW/tap 20 GBd
inductor-less 5-tap analog RX-FFE,” in VLSI Circuits Dig. Tech. Papers, to be
published.
[25] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University
Press, 2004.
[26] V. Balasubramanian. (2009, May) EyeMax 3m 28 AWG cable as-
sembly vs IEEE P802-3ba spec. Draft 2.0. [Online]. Available:
http://ieee802.org/3/ba/public/channel.html
[27] B. Nauta, “A CMOS transconductance-C filter technique for very high frequen-
cies,” IEEE J. Solid-State Circuits, vol. 27, no. 2, pp. 142–153, Feb. 1992.
[28] P. Patel. (2009, Sept.) 1 meter backplane channel. [Online]. Available:
http://ieee802.org/3/100GCU/public/channel.html
[29] V. Stojanovic, M. Horowitz, J. Zerbe, K. Yang, and W. Ellersick. (2003, May)
Stanford EE371 lecture notes: High-speed links (lecture 16). [Online]. Available:
http://goo.gl/kcvPe9