
SIGNAL PRECONDITIONING USING FEEDFORWARD

EQUALIZERS IN ADC-BASED DATA LINKS

A DISSERTATION

SUBMITTED TO THE DEPARTMENT OF ELECTRICAL

ENGINEERING

AND THE COMMITTEE ON GRADUATE STUDIES

OF STANFORD UNIVERSITY

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

Ryan Boesch

May 2016

http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/dk653rc7126

© 2016 by Ryan Boesch. All Rights Reserved.

Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.





I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Boris Murmann, Primary Adviser


Mark Horowitz


Madihally Narasimha

Approved for the Stanford University Committee on Graduate Studies.

Patricia J. Gumport, Vice Provost for Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.


Abstract

As the data rates for high-speed wireline transceivers continue to increase, intersymbol interference (ISI) due to channel loss is becoming more pronounced, and multiple techniques have been suggested to address this issue. One technique that has recently been gaining popularity is the ADC-based receiver. In ADC-based receivers, a digital feedforward equalizer (FFE) is used in conjunction with a decision feedback equalizer (DFE) to equalize the channel and recover the data. However, to recover the data with high fidelity, a power-hungry ADC is needed to digitize the signal. Recent work has shown that an analog receive-side FFE (RX-FFE) prior to the ADC can reduce the required ADC resolution while achieving the same bit error rate (BER).

To obtain a net improvement for the system, the RX-FFE must be implemented with low power consumption, low noise, and small chip area. In this thesis, an RX-FFE is demonstrated that meets these requirements and outperforms state-of-the-art designs. The RX-FFE is constructed entirely with low-noise and power-efficient analog-inverter transconductors and capacitors, avoiding the use of area-intensive inductors. The delay element is implemented as a single-path Pade-inspired delay, which is shown to be equivalent to the first-order Pade delay in terms of RX-FFE performance. The proof-of-concept RX-FFE is demonstrated to reduce the signal dynamic range by 2×, resulting in a 1-bit relaxation of the ADC resolution. The total power consumed is less than 26 mW, with less than 0.62 mVrms output noise for all coefficient values, and the area is only 0.003 mm² in 40 nm CMOS.


Acknowledgments

I have been fortunate to interact with so many wonderful people during the course of my Ph.D. I want to take this opportunity to thank those who played a big role in the completion of this work.

Firstly, I must thank my advisor, Professor Boris Murmann. My success in this program can largely be attributed to his guidance and support. He has been a better advisor than I could have dreamed of finding when I first started on this journey.

I also thank Professor Mark Horowitz and Dr. Madihally Narasimha for being on my reading committee. I thank Professor Amin Arbabian for being on my orals committee and Professor Jon Fan for chairing the orals committee.

I thank the Broadcom Foundation and Stanford’s initiative for Rethinking Analog Design for the funding that they provided. I thank the TSMC University Shuttle Program for the integrated circuit fabrication they provided.

Thanks to Tom Kwan from Broadcom for arranging presentations and providing early feedback on my work. Thanks to Hiroshi Takatori, John Duan, and Albert Vareljian from Futurewei for help with chip debugging and for access to test equipment. Thanks to Frankie Liu and Vincent Lee from Oracle for access to and support with test equipment.

I thank Ann Guerra for all the administrative help she provided throughout my degree. She has always gone above and beyond for me and all of the students in the Murmann group. We are lucky to have her. In addition, I thank Joe Little for his IT support. The speed with which he replies to emails and resolves server issues is paramount, and I am greatly appreciative of the help I have received from him over the years.

I would also like to thank everyone in the Murmann group, past and present. In particular, thanks to Jonathon Spaulding, Doug Adams, and Martin Kraemer for the trips abroad and for good times back home.

Last but not least, I thank my family, whose love and support mean everything to me. To my sister: you have been a role model of mine my entire life. Thanks for setting the bar so high. To my brother: you continue to impress me each day. I am proud of what you have accomplished and excited to see what you will achieve next. To my mother: I am lucky to have enjoyed your unwavering support and unconditional love. I certainly would not have made it here without you. To my wife: I do not think I could have finished this degree without you. Meeting you is the best thing that has ever happened to me.

Finally, I dedicate this thesis to the memory of my father. All of the best parts of who I am today can be traced back to you.


Contents

Abstract
Acknowledgments
1 Introduction
1.1 Motivation
1.2 Organization
2 Background
2.1 Channel characteristics
2.1.1 Transmission line model
2.1.2 Linear system model
2.1.3 Pulse response
2.1.4 Typical channels
2.2 Receive feedforward equalizer (RX-FFE)
2.2.1 FFE operation
2.3 Metrics of FFE performance
2.3.1 Peak-to-main ratio (PMR)
2.3.2 Eye opening equivalence
3 Analog delays for FFEs
3.1 Delay approximations
3.1.1 Ideal delay
3.1.2 Lumped delay line
3.1.3 Bessel delays
3.1.4 Pade delays
3.2 Equivalence of first-order delays
3.2.1 Theorem
3.2.2 Coefficient spread
3.2.3 Example transformation
3.3 Summary
4 Analog FFEs in high-speed links
4.1 FFE design parameters
4.2 Simulation methodology
4.3 Delay type
4.4 Delay time
4.4.1 Mathematical analysis
4.4.2 Channel characteristic dependence
4.4.3 First-order delays
4.5 Number of taps
4.6 Parasitic pole frequency
4.7 Coefficient resolution
4.8 Main cursor attenuation
4.9 Summary
5 Inverter-based FFE
5.1 Analog-inverter transconductor
5.2 Unity-gain stage
5.2.1 Gain
5.2.2 Bandwidth
5.2.3 Noise
5.2.4 Mismatch
5.2.5 Supply rejection
5.2.6 Nonlinearity
5.3 Delay
5.3.1 Single-path Pade-inspired delay
5.3.2 Comparison with two-path Pade delay
5.4 Coefficients
5.5 Summing circuit
5.6 Full FFE
5.6.1 FFE noise
5.6.2 FFE mismatch
5.7 Summary
6 FFE design
6.1 Delay
6.2 Coefficients
6.3 Summing circuit
6.4 PRBS generator
6.5 Output driver
7 Measurement results
7.1 Test setup
7.1.1 On-chip channel
7.1.2 Off-chip channel
7.2 Test debug
7.3 Measurement results
7.3.1 Pulse responses and DR improvement
7.3.2 Noise
7.3.3 Eye diagrams
7.3.4 LMS system identification method
7.4 Performance summary
8 Conclusions
8.1 Summary
8.2 Future work
A FFE coefficient optimization
A.1 Problem formulation
A.2 Brute force solution
A.3 MATLAB optimization toolbox
B Equivalence of first-order delays in FFEs
C Pade approximants
D Low-frequency nonlinearity simulation
D.1 Problem formulation
D.2 Transient simulation
D.2.1 DFT method
D.2.2 LMS method
D.3 DC simulation
E Unity-gain stage nonlinearity
E.1 Analog-inverter transconductor
E.2 Unity-gain stage
E.2.1 First-order case
E.2.2 Second-order case
E.2.3 Third-order case
E.3 Comparison with simulation
F Unity-gain stage supply rejection
F.1 Single-ended
F.2 Pseudo-differential
G Switched-capacitor tuning circuit
H Gain compression generalization
Bibliography

List of Tables

7.1 Test equipment for the measurements with the on-chip channel.
7.2 Test equipment for the measurements with the off-chip channel.
7.3 Performance summary for state-of-the-art RX-FFEs.
E.1 Transistor-level simulated Taylor coefficient values, $G_{jk}$, for the degenerated inverter transconductor load.

List of Figures

1.1 (a) Visual representation of a backplane system with transmitter, channel, and receiver (reproduced with permission from [1]) and (b) the corresponding block diagram.
1.2 Low-rate PAM2 data transmission with simple data recovery via threshold comparison.
1.3 High-rate PAM2 data transmission with errors for simple data recovery via threshold comparison.
1.4 Comparison of (a) conventional, (b) ADC-based, and (c) proposed transceiver architectures.
2.1 Insertion loss versus frequency for various channels [2].
2.2 Normalized pulse response versus time for various channels [2].
2.3 Block diagram of an n-tap RX-FFE.
2.4 Visualization of the pulse response equalization for a 5-tap RX-FFE.
2.5 Pulse response versus time at the coefficient outputs and FFE output.
2.6 Normalized pulse responses versus time at the channel output and at the FFE output.
2.7 Transmitted signal, received signal, and equalized signal for a random sequence of bits.
2.8 Normalized magnitude response versus normalized frequency of the channel, FFE, and channel+FFE.
2.9 Normalized pulse response versus time with the discrete pulse response terms labeled.
2.10 The received pulses and received signal versus time demonstrating the peak signal due to ISI.
3.1 Schematic diagram of an $N$th-order lumped-LC approximation of a lossless transmission line.
3.2 Magnitude (top) and phase (bottom) versus frequency for LC delays of orders 1, 2, and 3.
3.3 Group delay versus frequency for LC delays of orders 1, 2, and 3.
3.4 Magnitude (top) and phase (bottom) versus frequency for Bessel delays of orders 1, 2, and 3.
3.5 Group delay versus frequency for Bessel delays of orders 1, 2, and 3.
3.6 Magnitude (top) and phase (bottom) versus frequency for Pade delays of orders 1, 2, and 3.
3.7 Group delay versus frequency for Pade delays of orders 1, 2, and 3.
3.8 The spectral norm versus the pole and zero ratio of the delay, α, for FFEs of orders 2, 3, 4, and 5.
3.9 Magnitude and phase versus frequency for first-order delays with α = 0, α = 1/3, and α = 1.
3.10 Group delay versus frequency for first-order delays with α = 0, α = 1/3, and α = 1.
3.11 FFE magnitude (top) and phase (bottom) versus frequency for α = 0, α = 1/3, and α = 1 with ideal coefficient transformations.

3.12 FFE magnitude (top) and phase (bottom) versus frequency for α = 0, α = 1/3, and α = 1 with practical coefficient transformations.
3.13 Pulse responses versus time for the channel pulse and equalized pulses after 5-tap FFEs with α = 0, α = 1/3, and α = 1 optimized with 5-bit coefficient resolution.
4.1 The 3-tap FFE block diagram with variable delay type and delay time for the MATLAB simulation of the optimal delay time.
4.2 DR improvement versus delay time for a 3-tap FFE with Bessel and Pade delay types of order 1 (solid), 2 (dashed), and 3 (dotted).
4.3 Block diagram of a 2-tap FFE with first-order Pade delays; variable delay time, τ; and optimal coefficient, $c_2$.
4.4 DR improvement versus delay time for the 2-tap FFE in figure 4.3 for various channel pulse inputs.
4.5 DR improvement versus pole and zero ratio, α, for τ = 25 ps.
4.6 DR improvement versus pole and zero ratio, α, for $\tau_g = 25$ ps.
4.7 Block diagram of an n-tap FFE with variable delay time, τ, and optimal coefficients, $c_1$ to $c_n$.
4.8 DR improvement versus delay time for 3-tap, 4-tap, and 5-tap FFEs with first-order Pade delays.
4.9 Block diagram of an n-tap FFE with variable parasitic pole frequency, $f_p$, at each node.
4.10 DR improvement versus $f_p$ for 3-tap, 4-tap, and 5-tap FFEs with first-order Pade delays.
4.11 DR improvement versus coefficient resolution (plus sign bit) for 3-tap, 4-tap, and 5-tap FFEs with first-order Pade delays.
4.12 DR improvement versus main cursor amplitude for 3-tap, 4-tap, and 5-tap FFEs with first-order Pade delays.

5.1 (a) The analog-inverter transconductor and (b) the associated transistor-level schematic diagram.
5.2 Example circuits using the analog-inverter transconductor.
5.3 Schematic diagram of the unity-gain stage with parasitics.
5.4 Schematic diagram of the inverter-based first-order Pade delay.
5.5 Block diagram of the buffered inverter-based first-order Pade delay.
5.6 Schematic diagram of the buffered inverter-based first-order delay.
5.7 Schematic diagrams of (a) the single-path Pade-inspired delay of this work and (b) the two-path Pade delay [3].
5.8 Half-circuit schematic diagram of the inverter-based coefficient.
5.9 Half-circuit schematic diagram of the inverter-based summing circuit.
5.10 Half-circuit schematic diagram of the inverter-based FFE.
6.1 The single-path Pade-inspired delay schematic diagram with triode-degenerated load transconductor.
6.2 Half-circuit schematic diagram for the reduced input capacitance 5-bit coefficient.
6.3 Block diagram of the on-chip signal generator including a PRBS generator and LVDS conversion stage.
6.4 Half-circuit schematic diagram of the output driver.
7.1 Die photo of the proof-of-concept IC fabricated in the TSMC40 GP process.
7.2 Test PCB photo and (inset) chip-on-board bonding.
7.3 Test signal paths.
7.4 Measured eye diagram for a 20 Gb/s PRBS signal for the first chip revision.
7.5 Post-layout simulated eye diagram for a 20 Gb/s PRBS signal with additional supply resistance.
7.6 Measured normalized pulse response for the 0.5 m FR4 PCB trace channel and the channel+FFE.
7.7 Normalized PRBS response generated from the pulse responses in figure 7.6.
7.8 Measured and simulated integrated noise voltage versus coefficient value.
7.9 Eye diagrams for the on-chip channel measurements.
7.10 Measured normalized pulse response with and without the FFE equalization.
7.11 Measured normalized PRBS response with and without the FFE equalization.
7.12 Block diagram of the LMS algorithm system identification.
7.13 Impulse response for the bench and simulated channel response for the on-chip channel test.
7.14 Impulse response for the bench and simulated equalized response for the on-chip channel test.
7.15 Measured frequency response of the channel and channel+FFE for the
7.16 Simulated frequency response of the channel and channel+FFE for the
7.17 Signal distortion ratio versus input signal variance comparing sinusoid and PRBS signals.
D.1 DFT of $i_o[n]$ with $A_1 = A_2 = 10$ mV.

E.1 Schematic diagram of the inverter transconductor for nonlinearity analysis.
E.2 Schematic diagram of the unity-gain stage for nonlinearity analysis.
F.1 Schematic diagram of the unity-gain stage for supply rejection analysis.
G.1 The single-path Pade-inspired delay schematic diagram with gate-voltage tunable triode-degenerated load transconductor.
G.2 Contour lines of constant gain and common mode and 50-point Monte Carlo simulation of converged bias voltages for the switched-capacitor circuit in figure G.4.
G.3 Monte Carlo simulation of the contour lines of constant common mode with the output common mode forced to (top) half of the supply and (bottom) the natural common mode.
G.4 (a) The schematic diagram for the switched-capacitor gain and common-mode replica tuning circuit and (b) the associated clock phase diagram.

Chapter 1

Introduction

1.1 Motivation

Each day, more people in more places are accessing more data at higher rates, and data centers need to adapt to keep up with the demand. User data is stored with ever-increasing density in server hardware and, to compound the problem, it is being accessed more frequently. However, the number of wireline links to this information is limited by the planar structure of the PCBs on which they reside. Therefore, meeting this increased demand requires increasing the data rates of these already very high-speed links. The 100 Gigabit Ethernet standard (100GbE) as defined in IEEE 802.3bj calls for four lanes at 25 Gb/s [4], and the feasibility of 400GbE is currently being investigated [5]. Substantial innovations will be necessary to enable this standard and other future standards. To add to the challenge, data centers are already very power hungry and require elaborate cooling systems. Therefore, the increased data rate cannot come with a corresponding increase in power consumption, which requires breaking the classical relationship in which power grows with speed.

Short-range communication of data at high rates is typically performed with high-speed transceivers over backplane systems, as depicted in figure 1.1. In such systems, data is sent by a transmitter through the channel, which consists of the PCB trace, connectors, etc., and is recovered at the receiver. The data is conventionally encoded in pulse amplitudes, which is referred to as pulse amplitude modulation (PAM). Encoding 1 bit per symbol requires two amplitude levels and is referred to as PAM2 or non-return to zero (NRZ). A low-rate example of this method is shown in figure 1.2. In this case, the received signal resembles the transmitted signal, and the data can be recovered by simply comparing to a threshold (i.e., slicing). This method is only viable up to a few gigabits per second [6]. For higher-rate transmissions, each pulse is substantially dispersed by the channel and intersymbol interference (ISI) occurs. An example of this is shown in figure 1.3 for the same transmit data and channel response as in figure 1.2 but with a 4× increased data rate. In this case, slicing without equalization results in errors in the output data. The source of the ISI is the channel loss at and below the Nyquist frequency, which becomes more problematic at higher data rates. Data recovery in the presence of ISI is the primary challenge in advancing the data rate, and many techniques have been employed to compensate for the channel loss.

Figure 1.1: (a) Visual representation of a backplane system with transmitter, channel, and receiver (reproduced with permission from [1]) and (b) the corresponding block diagram.

Figure 1.2: Low-rate PAM2 data transmission with simple data recovery via threshold comparison.

Figure 1.3: High-rate PAM2 data transmission with errors for simple data recovery via threshold comparison.
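The slicing failure described above can be reproduced numerically. The following Python sketch is illustrative only and is not the simulation behind figures 1.2 and 1.3: it assumes one sample per unit interval (UI), reuses the transmit bits shown in those figures, and models the channel with two hypothetical discrete pulse responses whose taps both sum to 1, one confined to a single UI (low rate) and one spread over four UIs (high rate). The helper name `nrz_slice` is hypothetical.

```python
import numpy as np


def nrz_slice(bits, pulse, main_idx):
    """Send PAM2 (NRZ) bits through a discrete pulse response and slice at 0.

    pulse[main_idx] is the main cursor; taps before it are pre-cursor ISI and
    taps after it are post-cursor ISI (one sample per unit interval).
    Returns the sliced bits and the sampled channel output.
    """
    symbols = 2 * np.asarray(bits) - 1              # map {0, 1} -> {-1, +1}
    rx = np.convolve(symbols, pulse)                # channel output, one sample/UI
    samples = rx[main_idx:main_idx + len(bits)]     # sample each symbol at its main cursor
    return (samples > 0).astype(int), samples


bits = [1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

# Hypothetical pulse responses (not measured data); both have unit DC gain.
low_rate = [0.0, 1.0, 0.0, 0.0]     # pulse confined to one UI: no ISI
high_rate = [0.1, 0.4, 0.3, 0.2]    # [pre, main, post1, post2]: pulse spread over 4 UIs

for name, pulse in [("low rate", low_rate), ("high rate", high_rate)]:
    decided, _ = nrz_slice(bits, pulse, main_idx=1)
    errors = int(np.sum(decided != np.asarray(bits)))
    print(f"{name:9s} {''.join(map(str, decided))}  errors: {errors}")
```

With the high-rate pulse, the isolated 1 in …00001… is sliced as 0: its main cursor contributes +0.4, while the pre- and post-cursor ISI from the neighboring 0s contributes −0.6, so the sample lands at −0.2.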

The conventional transceiver architecture is depicted in figure 1.4(a). Linear, time-invariant operations commute, so equalization is, in principle, equally effective before and after the channel. Before the channel, a pre-emphasis transmit feedforward equalizer (TX-FFE) introduces high-frequency peaking to the transmit signal to counteract the de-emphasis of the channel. The channel ISI is only observable at the receiver, which complicates adaptation by requiring a back channel from the receiver to the transmitter [6]. Also, the equalization performance is limited by the peak signal that can be delivered by the circuit. As a consequence, additional equalization is required after the channel. On the receive side, the continuous-time linear equalizer (CTLE) boosts the high-frequency signal content similarly to the TX-FFE, but because the channel has already attenuated the signal, the CTLE is not peak-power limited. The CTLE’s limitation is in its adaptability. In most cases, the CTLE is blind to the channel, and the other equalizer blocks must adapt around it. Also, the high-frequency peaking boosts the signal and noise indiscriminately, resulting in a noise penalty. This is where the decision feedback equalizer (DFE) outperforms all other equalizers. The DFE is an equalization block in which the noiseless recovered bits are scaled and subtracted from the signal to remove the residual post-cursor ISI without introducing additional noise. However, the DFE cannot correct for pre-cursor ISI due to causality constraints. Therefore, the combined equalization of the TX-FFE and CTLE must sufficiently suppress the pre-cursor ISI, while the DFE cleans up the remaining post-cursor ISI.
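The DFE's ability to cancel post-cursor but not pre-cursor ISI can be illustrated with a short Python sketch. It assumes a hypothetical pulse response [0.1, 0.4, 0.3, 0.2] (one pre-cursor, the main cursor, two post-cursors) sampled once per UI, an ideal DFE whose tap weights exactly equal the post-cursors, and no noise. The function name `dfe_slice` is illustrative, not taken from the thesis.

```python
import numpy as np


def dfe_slice(samples, post_taps):
    """Ideal decision feedback equalizer for +/-1 symbols.

    For each sample, the post-cursor ISI is rebuilt from past decisions
    (scaled by post_taps) and subtracted before slicing. Future symbols are
    not yet decided, so pre-cursor ISI cannot be removed this way.
    """
    decisions, equalized = [], []
    for n, y in enumerate(samples):
        isi = sum(tap * decisions[n - 1 - k]
                  for k, tap in enumerate(post_taps) if n - 1 - k >= 0)
        z = y - isi                          # subtract reconstructed post-cursor ISI
        equalized.append(z)
        decisions.append(1 if z > 0 else -1)
    return np.array(decisions), np.array(equalized)


bits = np.array([1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1])
symbols = 2 * bits - 1
pulse = [0.1, 0.4, 0.3, 0.2]                               # hypothetical [pre, main, post1, post2]
samples = np.convolve(symbols, pulse)[1:1 + len(bits)]     # sample at the main cursor

decisions, equalized = dfe_slice(samples, post_taps=pulse[2:])
print("errors:", int(np.sum(decisions != symbols)))
print("smallest |equalized sample|:", round(float(np.min(np.abs(equalized))), 3))
```

All symbols are recovered, but every equalized sample still carries the ±0.1 pre-cursor term from the next symbol, so the worst-case margin is 0.4 − 0.1 = 0.3 rather than the full main cursor; removing that residual is the role of the feedforward equalizers.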

[Figure 1.4 panels: (a) Conventional Architecture: TX-FFE, channel, CTLE, DFE; (b) 1st Generation ADC Architecture: simple TX, channel, ADC, digital FFE, DFE; (c) Proposed ADC Architecture (this work): simple TX, channel, RX-FFE, ADC, DSP.]

Figure 1.4: Comparison of (a) conventional, (b) ADC-based, and (c) proposed transceiver architectures.

The conventional transceiver architecture had proven effective in sustaining the advances in data rates demanded by the evolving standards until only recently. A limitation has now been reached in the capacity of the channel for PAM2 signaling. For many applications, advancing the symbol rate only adds signal content at frequencies where the channel loss is greater than 40 dB [2], and a recent survey of wireline transceivers found a trend of a 2× decrease in power efficiency for each additional 6 dB of channel loss at Nyquist [7]. For these reasons, scaling the symbol rate is no longer a viable option for advancing the data rate. Instead, it is better to make use of the SNR readily available at lower frequencies and increase the number of bits per symbol. Modulating with four pulse amplitude levels encodes 2 bits per symbol and is referred to as PAM4. While this method can enable higher data rates, it comes at a penalty in system complexity: it is no longer possible to recover the data with a single slicer, and more complex data recovery circuits must be employed.
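The trade-off between symbol rate and levels per symbol can be quantified with simple arithmetic. The Python sketch below assumes an ideal PAM-M signal with a fixed peak-to-peak swing and computes the symbol rate, the Nyquist frequency, and the eye-height penalty relative to PAM2 (M levels stack M − 1 eyes in the same swing). The 50 Gb/s rate and the function name `pam_tradeoff` are arbitrary illustrative choices, not values from the thesis.

```python
import math


def pam_tradeoff(bit_rate_gbps, levels):
    """Return (symbol rate in GBd, Nyquist frequency in GHz, eye penalty in dB)
    for ideal PAM-`levels` signaling at a given bit rate and fixed swing."""
    bits_per_symbol = math.log2(levels)
    baud = bit_rate_gbps / bits_per_symbol
    nyquist_ghz = baud / 2
    # levels - 1 eyes share the swing, so each eye is (levels - 1)x smaller than PAM2's.
    eye_penalty_db = 20 * math.log10(levels - 1)
    return baud, nyquist_ghz, eye_penalty_db


for m in (2, 4):
    baud, f_nyq, penalty = pam_tradeoff(50, m)
    print(f"PAM{m}: {baud:.1f} GBd, Nyquist {f_nyq:.2f} GHz, eye penalty {penalty:.2f} dB")
```

PAM4 halves the Nyquist frequency (12.5 GHz instead of 25 GHz at 50 Gb/s) at the cost of about 9.5 dB of eye height; as a rough rule of thumb, it pays off when the channel loss avoided at Nyquist exceeds that penalty.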

One such PAM4 receiver architecture that is gaining popularity is the ADC-based architecture depicted in figure 1.4(b) [8, 9]. In this architecture, the received signal is quantized by a high-speed ADC, and the equalization is completed in the digital domain with digital signal processing (DSP). The advantage of this architecture lies in the computational power and portability of digital processing, and the disadvantage lies in the power consumption of the high-speed ADC. The required resolution of the ADC is determined by the degree of equalization necessary in the DSP. For example, a 2-bit ADC can completely recover the data, with no post-processing necessary, for a well-behaved channel that introduces insignificant ISI. For systems with substantial channel loss, an ADC resolution of up to 8 bits can be necessary [9]. Therefore, it is desirable to perform some equalization in the analog front end to reduce the resolution requirements of the ADC and the complexity of the DSP [10].
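To see why a 2-bit ADC suffices when ISI is negligible, note that an ideal uniform 2-bit quantizer whose full scale matches the PAM4 levels places its three decision thresholds exactly midway between adjacent levels, so its output code is the symbol itself. The Python sketch assumes normalized PAM4 levels of ±1 and ±1/3 and an ideal quantizer spanning ±4/3; `adc_codes` is a hypothetical helper, not a model of any ADC in the cited work. The last line adds a hypothetical ISI term to show how the same quantizer begins to make errors.

```python
import numpy as np


def adc_codes(x, bits, full_scale):
    """Ideal uniform quantizer over [-full_scale, +full_scale] -> codes 0 .. 2**bits - 1."""
    n_codes = 2 ** bits
    step = 2 * full_scale / n_codes
    codes = np.floor((np.asarray(x, dtype=float) + full_scale) / step)
    return np.clip(codes, 0, n_codes - 1).astype(int)


pam4_levels = np.array([-1.0, -1 / 3, 1 / 3, 1.0])   # symbol indices 0, 1, 2, 3
# Step = 2/3, so the thresholds fall at -2/3, 0, +2/3: midway between levels.
print(adc_codes(pam4_levels, bits=2, full_scale=4 / 3))

# Hypothetical ISI: a +0.4 contribution from a previous symbol pushes level 1 into code 2.
print(adc_codes(-1 / 3 + 0.4, bits=2, full_scale=4 / 3))
```

Once ISI of this size is present, the symbol can no longer be read directly from the code, and finer quantization is needed so that the DSP can equalize the signal before deciding.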

A CTLE can be used to perform some equalization prior to digitization, but recent work has shown that an analog receive-side feedforward equalizer (RX-FFE) prior to the ADC can further reduce the required ADC resolution while achieving the same bit error rate (BER) [10]. The proposed transceiver architecture with the RX-FFE is shown in figure 1.4(c). The objective of the RX-FFE is not to completely equalize the channel and open the eye, but instead to reduce the dynamic range of the signal, resulting in a relaxation of the ADC resolution requirement. This can result in substantial power savings for the system. For example, the 10 GS/s 6-bit ADC in [8] consumes 143 mW. Because ADC power grows by at least 2× for each additional bit of resolution, reducing the required resolution by 1 bit reduces the ADC power by at least 2×, for a saving of over 70 mW. Unfortunately, the power consumption of previous state-of-the-art RX-FFEs exceeds this value, nullifying any potential improvement in system performance [3, 11]. In this work, we present an RX-FFE implemented with low power consumption, low noise, and small chip area that outperforms these state-of-the-art designs, enabling the RX-FFE for this target application.

1.2 Organization

The remainder of the thesis is organized as follows:

• Chapter 2 covers some background material that is prerequisite to the work in this thesis. In §2.1, we discuss the characteristics of wireline channels. In §2.2, we present the architecture and operation of the RX-FFE. In §2.3, we introduce a metric to measure the performance of an equalizer with respect to ADC resolution relaxation.

• Chapter 3 looks at analog delays for RX-FFEs. In §3.1, we consider the prop-

erties of various delay approximations with a focus on the Pade delay employed

in this work. In §3.2.1, we prove the equivalence of first-order delays in the

application of RX-FFEs and consider the practical limitations of the theorem.

• Chapter 4 studies the design space of RX-FFEs through MATLAB simulations.

The impact on performance is investigated for delay type, delay time, number of

taps, parasitic bandwidth, coefficient resolution, and main-cursor attenuation.


The results from this chapter are used to guide architecture choices in chapter

5 and design decisions in chapter 6.

• Chapter 5 introduces the inverter-based RX-FFE. In §5.1, the analog-inverter

transconductor is discussed. In §5.2, we cover the performance equations for the

inverter-based unity-gain stage on which the performance of the FFE depends.

In §5.3, we introduce the single-path Pade delay used in this work. Finally, we

end the chapter with an overview of the complete inverter-based RX-FFE in

§5.6.

• Chapter 6 presents some of the design challenges and solutions for the proof-of-

concept integrated circuit.

• Chapter 7 covers the test procedures and provides the measurement results for

the proof-of-concept integrated circuit. In §7.3.1, the measured ADC resolution

relaxation performance of the RX-FFE is presented. In §7.4, a performance

summary is given, comparing the proof-of-concept RX-FFE performance to

previous state-of-the-art designs.

• Finally, chapter 8 concludes the thesis and outlines future work directions.

Chapter 2

Background

2.1 Channel characteristics

In this section, we describe the channel characteristics for typical backplane systems.

We start with transmission line theory to understand the frequency dependence of

the channel loss. Then we introduce some typical channels that demonstrate these

characteristics. These channels are similar to the channel for the target application

and are used as a reference for FFE MATLAB simulations in chapter 4.

2.1.1 Transmission line model

The channel of a backplane system can be modeled as a lossy transmission line. For a single frequency of excitation, the forward-propagating wave at position $z$ and time $t$ has the form [12]

$$V(z,t) = V^{+} e^{-\alpha z} \cos(\omega t - \beta z) \tag{2.1}$$

where $V^{+}$ is the wave amplitude, $\beta$ is the phase constant, $\omega$ is the angular frequency, and $\alpha$ is the attenuation constant. The attenuation constant is the sum of the conduction-loss attenuation constant, $\alpha_c$, and the dielectric-loss attenuation constant, $\alpha_d$:

$$\alpha = \alpha_c + \alpha_d = \underbrace{\frac{1}{2} R Z_o^{-1}}_{\alpha_c} + \underbrace{\frac{1}{2} G Z_o}_{\alpha_d} \tag{2.2}$$

where $Z_o$ is the characteristic impedance, $R$ is the distributed line resistance, and $G$ is the distributed line conductance. In the ideal case, where neither $R$ nor $G$ depends on frequency, the loss is a constant factor $e^{-\alpha l}$ at all frequencies, where $l$ is the length of the transmission line. The frequency dependence of the loss comes from the frequency dependence of $R$ and $G$, whose sources are covered in the following sections.

Conduction loss

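Conduction loss is dominated by the skin effect. In a good conductor, the wave amplitude is concentrated at the surface and decays by a factor $e^{-d/\delta}$ at depth $d$, where $\delta$ is the skin depth. Because of this exponential decay, the field and the associated currents are concentrated within the first skin depth. The skin depth is [12]

$$\delta = \frac{1}{\sqrt{\pi f \mu \sigma}} \tag{2.3}$$

where $\mu$ is the permeability and $\sigma$ is the conductivity of the conductor. Because the current flows in a layer of thickness roughly $\delta$, the resistance is inversely proportional to $\delta$, resulting in the proportionality

$$\alpha_c \propto R \propto \sqrt{f}. \tag{2.4}$$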
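As a rough numeric illustration of (2.3) and (2.4), the sketch below evaluates the skin depth of a copper conductor. The conductivity (5.8 × 10⁷ S/m, typical of annealed copper), the assumption of a non-magnetic conductor (μ = μ₀), and the chosen frequencies are illustrative assumptions, not parameters of the channels in this work.

```python
import math

MU_0 = 4e-7 * math.pi   # permeability of free space (H/m); assumes a non-magnetic conductor
SIGMA_CU = 5.8e7        # assumed conductivity of annealed copper (S/m)

def skin_depth(f_hz, sigma=SIGMA_CU, mu=MU_0):
    """Skin depth (2.3): delta = 1 / sqrt(pi * f * mu * sigma), in meters."""
    return 1.0 / math.sqrt(math.pi * f_hz * mu * sigma)

for f_ghz in (1.0, 10.0, 40.0):
    print(f"{f_ghz:5.1f} GHz: delta = {skin_depth(f_ghz * 1e9) * 1e6:.3f} um")

# R is inversely proportional to delta, so this ratio is also R(40 GHz) / R(10 GHz).
print(f"delta(10 GHz) / delta(40 GHz) = {skin_depth(10e9) / skin_depth(40e9):.3f}")
```

At 10 GHz the skin depth is about 0.66 µm. Quadrupling the frequency halves δ, so the resistance, and with it $\alpha_c$, doubles, which is the $\sqrt{f}$ dependence in (2.4).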


Dielectric loss

Dielectric loss is due to dissipation of electromagnetic energy in the dielectric in the

form of heat. As opposed to the conduction loss, dielectric loss is directly proportional

to frequency [13]:

αd ∝ f. (2.5)

Therefore, at lower frequencies the conduction loss is dominant, but as frequencies

increase the dielectric loss begins to dominate. The value of the coefficient, αd, is

dependent on the dielectric in the backplane. FR4 is a low-cost glass-reinforced epoxy

that serves as the dielectric in most printed circuit boards (PCBs). MEGTRON6 is

a low-loss alternative that is gaining popularity, but it is more expensive than FR4.

A comparison of their performance is made in §2.1.4.

Total loss

The total loss of the channel is referred to as the insertion loss (IL). This loss is the sum of the conduction loss, the dielectric loss, and a frequency-independent loss. Therefore, a good model for the insertion loss (in dB) versus frequency is

$$\mathrm{IL}(f) = a_0 + a_1 \sqrt{f} + a_2 f. \tag{2.6}$$

As an example, the IEEE 802.3bj standard for 100GBASE-KR4 specifies a maximum insertion loss of (with $f$ in GHz) [4]

$$\mathrm{IL}(f) \le 1.5 + 4.6\sqrt{f} + 1.318 f. \tag{2.7}$$
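To get a feel for (2.7), the sketch below evaluates the limit at a few frequencies, including the Nyquist frequency of a 25.78125 Gb/s NRZ lane (the 100GBASE-KR4 lane rate, an outside fact not stated in this section), and splits the loss into its $\sqrt{f}$ and $f$ terms. The frequency choices are arbitrary.

```python
import math

def il_kr4_limit_db(f_ghz):
    """Maximum insertion loss (2.7) in dB, with f in GHz."""
    return 1.5 + 4.6 * math.sqrt(f_ghz) + 1.318 * f_ghz

LANE_RATE_GBPS = 25.78125        # assumed NRZ lane rate (1 bit per symbol)
f_nyq = LANE_RATE_GBPS / 2.0     # Nyquist frequency in GHz

for f in (1.0, 5.0, f_nyq):
    il = il_kr4_limit_db(f)
    amplitude = 10 ** (-il / 20.0)   # dB loss -> linear amplitude ratio
    print(f"f = {f:6.3f} GHz: IL <= {il:5.2f} dB (amplitude ratio {amplitude:.4f})")

print(f"at Nyquist: sqrt(f) term {4.6 * math.sqrt(f_nyq):.2f} dB, "
      f"f term {1.318 * f_nyq:.2f} dB")
```

At the Nyquist frequency the limit is about 35 dB, with the $\sqrt{f}$ and $f$ terms contributing roughly equally (about 16.5 dB and 17.0 dB).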


2.1.2 Linear system model

A simplification of the transmission line model is possible by assuming that any non-ideal effects of the transmission line (e.g., reflections due to load mismatch) are negligible and expressing the frequency-dependent channel loss as a linear transfer function. Let $H(f)$ be the channel transfer function based on $\mathrm{IL}(f)$, with the magnitude response

$$|H(f)| = 10^{-\mathrm{IL}(f)/20} \tag{2.8}$$

and the associated impulse response

$$h(t) = \mathcal{F}^{-1}\{H(f)\}. \tag{2.9}$$

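Assuming the high-frequency dielectric loss dominates (i.e., $\mathrm{IL}(f) \approx a_2 f$), then

$$|H(f)| = e^{-\kappa |2\pi f|} \tag{2.10}$$

where $\kappa = \frac{\ln(10)}{40\pi} a_2$, obtained by equating (2.8) and (2.10). We can decompose $H(f)$ into symmetric and antisymmetric components, $H_{sym}(f)$ and $H_{asym}(f)$:

$$H_{sym}(f) = e^{-\kappa |2\pi f|} \tag{2.11}$$

$$H_{asym}(f) = -j e^{-\kappa |2\pi f|} \left( u(f) - u(-f) \right) \tag{2.12}$$

where $u(f)$ is the Heaviside unit step function. With the convention $\mathcal{F}^{-1}\{X\}(t) = \int X(f) e^{j2\pi f t} df$, the inverse Fourier transform of the symmetric component is the even portion of $h(t)$, and that of the antisymmetric component is the odd portion:

$$h_e(t) = \mathcal{F}^{-1}\{H_{sym}(f)\} = \frac{1}{\pi\kappa} \frac{1}{(t/\kappa)^2 + 1} \tag{2.13}$$

$$h_o(t) = \mathcal{F}^{-1}\{H_{asym}(f)\} = \frac{1}{\pi\kappa} \frac{t/\kappa}{(t/\kappa)^2 + 1}. \tag{2.14}$$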


For the simple case where the even and odd contributions are equal, the channel impulse response has the form

$$h(t) = \frac{1}{\pi\kappa} \frac{(t/\kappa) + 1}{(t/\kappa)^2 + 1}. \tag{2.15}$$

Because κ ∝ a2, increasing loss results in increased pulse dispersion. An increase

in a2 can occur due to the properties of the dielectric or due to increasing channel

length. Some example channels that exhibit these properties are considered in §2.1.4.
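The mapping from $a_2$ to $\kappa$ and the shape of (2.15) can be checked numerically. In the sketch below, κ = ln(10)a₂/(40π) follows from equating (2.8), with IL(f) = a₂f, to (2.10); f is in GHz, so κ is in ns. Setting the derivative of (x + 1)/(x² + 1) to zero puts the peak of h(t) at t = (√2 − 1)κ, so the peak moves later and its height falls as 1/κ when the loss grows. The a₂ values are hypothetical, apart from 1.318 dB/GHz borrowed from (2.7).

```python
import math

def kappa_from_a2(a2_db_per_ghz):
    """kappa (ns) such that exp(-2*pi*kappa*f) == 10**(-a2*f/20), with f in GHz."""
    return math.log(10.0) * a2_db_per_ghz / (40.0 * math.pi)

def h_model(t_ns, kappa_ns):
    """Impulse response (2.15) with equal even and odd parts, in 1/ns."""
    x = t_ns / kappa_ns
    return (x + 1.0) / ((x * x + 1.0) * math.pi * kappa_ns)

for a2 in (0.5, 1.318, 2.0):               # hypothetical dielectric-loss slopes (dB/GHz)
    k = kappa_from_a2(a2)
    # Check the dB-to-exponential mapping at f = 10 GHz.
    assert abs(math.exp(-2 * math.pi * k * 10.0) - 10 ** (-a2 * 10.0 / 20.0)) < 1e-12
    t_peak = (math.sqrt(2.0) - 1.0) * k    # where d/dx[(x + 1)/(x^2 + 1)] = 0
    print(f"a2 = {a2:5.3f} dB/GHz: kappa = {k * 1e3:5.2f} ps, "
          f"h(t) peaks at {t_peak * 1e3:5.2f} ps with {h_model(t_peak, k):5.2f} /ns")
```

With a₂ = 1.318 dB/GHz, κ is about 24 ps and h(t) peaks about 10 ps after t = 0.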

2.1.3 Pulse response

As described in §2.1.2, the channel can be modeled as a linear system characterized by the impulse response $h(t)$ such that

$$rx(t) = tx(t) * h(t) \tag{2.16}$$

where $tx(t)$ is the signal transmitted over the channel and $rx(t)$ is the signal received after the channel. The pulse response of the channel, $p(t)$, is defined as the received signal for a transmitted unit pulse:

$$p(t) = \mathrm{rect}(t/T) * h(t) \tag{2.17}$$

where

$$\mathrm{rect}(t/T) = \begin{cases} 1 & |t| \le T/2 \\ 0 & |t| > T/2 \end{cases} \tag{2.18}$$

is the unit pulse of width $T$. We are interested in the received signal sampled at the baud rate, which we represent as

$$p[n] = p(nT + \tau) \tag{2.19}$$

where $\tau$ is chosen such that $\max_t p(t) = p(\tau)$, resulting in

$$p[0] = \max_t p(t). \tag{2.20}$$

Notice that $\tau$ is essentially the delay of the channel. The term $T$ is also known as the unit interval (UI), and $p[0]$ is the main cursor. Ideally, $p[n] = 0$ for all $n \ne 0$; when this is not the case, inter-symbol interference (ISI) is said to occur. The nonzero terms for $n < 0$ are referred to as pre-cursors and are components from future symbols that interfere with the present symbol. Similarly, the nonzero terms for $n > 0$ are post-cursors, which are components from past symbols that interfere with the present symbol. The pulse responses for some example channels are considered in §2.1.4.
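Under the simplified model (2.15), the pulse response (2.17) has a closed form, because the antiderivative of h(t) is [½ ln(1 + x²) + arctan x]/π with x = t/κ. The sketch below uses it to find τ by a grid search, sample p[n] at the baud rate per (2.19), and label the main cursor, pre-cursors, and post-cursors. The values T = 50 ps and κ = 24 ps are illustrative assumptions, not fitted to the channels in §2.1.4.

```python
import math

def h_integral(x):
    """Antiderivative of (2.15) with respect to t, written in x = t / kappa."""
    return (0.5 * math.log1p(x * x) + math.atan(x)) / math.pi

def pulse(t, T, kappa):
    """p(t) from (2.17): the integral of h over [t - T/2, t + T/2]."""
    return h_integral((t + T / 2) / kappa) - h_integral((t - T / 2) / kappa)

T, KAPPA = 50e-12, 24e-12   # assumed: 20 GBd unit interval and loss parameter kappa

# tau maximizes p(t) (2.20); a 0.01 ps grid over +/-100 ps is fine for this smooth pulse.
grid = [i * 0.01e-12 for i in range(-10000, 10001)]
tau = max(grid, key=lambda t: pulse(t, T, KAPPA))

cursors = {n: pulse(n * T + tau, T, KAPPA) for n in range(-3, 8)}   # p[n] = p(nT + tau)
print(f"tau = {tau * 1e12:.2f} ps")
for n, v in cursors.items():
    kind = "main" if n == 0 else ("pre-cursor" if n < 0 else "post-cursor")
    print(f"p[{n:+d}] = {v:+.4f}  {kind}")
```

With these assumed values, the main cursor sits roughly 18 ps after the pulse center, the first pre-cursor is tiny and slightly negative, and the post-cursors decay slowly, since this idealized model has a 1/t tail.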

2.1.4 Typical channels

The insertion loss versus frequency is plotted in figure 2.1 for channels with the

following properties [2]:

• 0.76 m PCB trace with MEGTRON6 dielectric

• 0.76 m PCB trace with FR4 dielectric

• 1.09 m PCB trace with FR4 dielectric.

For the same dielectric, increasing the channel length increases the loss, as expected. The channel with the MEGTRON6 dielectric has substantially less attenuation than the FR4 channel of the same length, demonstrating the improvement in the dielectric loss coefficient. For each of these channels, the pulse response is simulated with an input pulse width T = 50 ps, corresponding to a 20 GBd symbol rate¹. The normalized output is plotted in figure 2.2. Higher channel loss corresponds to increased pulse dispersion, as expected. These channels are characteristic of the channel for the target application and are used as a reference for FFE MATLAB simulations in chapter 4.

[Figure 2.1: Insertion loss versus frequency for various channels [2]. Axes: frequency (GHz), insertion loss (dB); traces: 0.76 m MEGTRON6, 0.76 m FR4, 1.09 m FR4.]

[Figure 2.2: Normalized pulse response versus time for various channels [2]. Axes: time (ps), normalized pulse response; traces: 0.76 m MEGTRON6, 0.76 m FR4, 1.09 m FR4.]

2.2 Receive feedforward equalizer (RX-FFE)

The analog RX-FFE is an equalization block that consists of continuous-time analog delays, adaptable coefficients, and a summing circuit (see figure 2.3) [14]. It complements the blind equalization of the CTLE with the agility provided by its adaptable coefficients. Coefficient adaptation is simpler than for a TX-FFE because no back channel is required, and the RX-FFE is compatible with well-known adaptation schemes such as the LMS algorithm [6].
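To illustrate why coefficient adaptation is straightforward at the receiver, the sketch below adapts a 5-tap FFE with a textbook LMS update, c ← c + μ·e·x, where e is the error between the equalizer output and a known training symbol. It is a discrete-time model with ideal 1-UI delays and noiseless PAM2 data; the sampled pulse, step size μ, decision delay, and iteration count are hypothetical choices, and the loop is not a description of any particular published implementation.

```python
import random

def convolve(a, b):
    """Discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def pmr(p):
    """Peak-to-main ratio, taking the largest-magnitude sample as the main cursor."""
    return sum(abs(v) for v in p) / max(abs(v) for v in p)

def lms_ffe(pulse, n_taps=5, delay=2, mu=0.01, n_iter=20000, seed=1):
    """Adapt FFE taps with LMS on noiseless PAM2 training symbols."""
    rng = random.Random(seed)
    c = [0.0] * n_taps
    sym = [0.0] * len(pulse)      # recent transmitted symbols, newest first
    rx = [0.0] * n_taps           # FFE delay line (ideal 1-UI delays), newest first
    for _ in range(n_iter):
        sym = [rng.choice((-1.0, 1.0))] + sym[:-1]
        rx = [sum(p * b for p, b in zip(pulse, sym))] + rx[:-1]   # channel output
        y = sum(ci * xi for ci, xi in zip(c, rx))                 # FFE output
        e = sym[delay] - y                 # error against the known training symbol
        c = [ci + mu * e * xi for ci, xi in zip(c, rx)]           # LMS update
    return c

P = [0.1, 1.0, 0.45, 0.2, 0.08]   # hypothetical sampled pulse: p[-1], p[0], ..., p[3]
taps = lms_ffe(P)
eq = convolve(taps, P)            # equalized pulse response
print("taps:", [round(t, 3) for t in taps])
print(f"PMR before {pmr(P):.3f}, after {pmr(eq):.3f}")
```

The update uses only the received samples and the training symbols available at the receiver, which is the sense in which no back channel is needed; once the eye is open, decisions can stand in for training symbols.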

The challenge is in the implementation of the analog delays. A digital FFE placed after the ADC is relatively simple to implement, but the penalty is in the required complexity of the ADC. Performing some of the equalization in the analog front end relaxes the resolution requirements of the ADC by reducing the signal dynamic range (see §2.3.1 for a detailed discussion). In addition, a digital FFE boosts the high-frequency ADC noise, an issue that the RX-FFE mitigates because it equalizes before quantization. For these reasons, it is worthwhile to trade power and complexity in the RX-FFE for a reduction in power and complexity of the ADC and DSP. Previous state-of-the-art designs succeeded in demonstrating the innate agility of the RX-FFE's equalization capabilities, but they fell short of the power, noise, and area performance necessary for practical systems [3, 11]. The primary focus of this work is the design of an RX-FFE with low power, low noise, and low area for ADC-resolution relaxation in ADC-based links. Throughout the remainder of this text, we refer to the RX-FFE simply as "FFE" and name other FFE types explicitly.

¹Bd is the unit symbol for baud, which is the unit of symbol rate.

2.2.1 FFE operation

In this section, we discuss the details of FFE operation with an example. Consider the 5-tap FFE with ideal delays, coefficients, and summing circuit depicted in figure 2.4. The transmitted pulse is dispersed by the channel before arriving at the FFE input. This signal is then delayed by the analog delays, resulting in a family of five pulses, all offset in time relative to one another. These pulses are scaled and summed to create the equalized output pulse.

[Figure 2.3: Block diagram of an n-tap RX-FFE: the input vi passes through a chain of n − 1 delays D(s); the n tap signals are scaled by coefficients c1 … cn and summed to form the output vo.]

[Figure 2.4: Visualization of the pulse response equalization for a 5-tap RX-FFE: the channel output pulse pi feeds four delays D(s); the delayed pulses, scaled by c1 … c5 to give p1 … p5, are summed to form the output pulse po.]

[Figure 2.5: Pulse response versus time (UI) at the coefficient outputs (p1 … p5, summed pulses) and at the FFE output (po, equalized pulse).]
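The scale-delay-sum operation can be sketched in discrete time by treating the delays as ideal 1-UI delays, so the equalized pulse is the convolution of the tap weights with the sampled channel pulse. As a simple stand-in for the optimization of appendix A, this hypothetical example fixes the main tap to 1, as the example in this section does, and solves for the two side taps that force the first pre-cursor and first post-cursor of the output to zero (zero-forcing). The pulse samples are made up for illustration.

```python
def convolve(a, b):
    """Discrete convolution: with ideal 1-UI delays, FFE output pulse = taps * pulse."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

P = [0.1, 1.0, 0.45, 0.2, 0.08]   # hypothetical samples p[-1], p[0], p[1], p[2], p[3]
MAIN = 1                          # index of the main cursor p[0] in P

# Taps [c1, c2, c3] with the main tap c2 fixed to 1; the output main cursor is y[MAIN + 1].
# Zero-forcing conditions on y = convolve(taps, P):
#   first pre-cursor:  y[MAIN]     = c1*P[MAIN]   + c2*P[MAIN-1]               = 0
#   first post-cursor: y[MAIN + 2] = c1*P[MAIN+2] + c2*P[MAIN+1] + c3*P[MAIN] = 0
c2 = 1.0
c1 = -c2 * P[MAIN - 1] / P[MAIN]
c3 = -(c1 * P[MAIN + 2] + c2 * P[MAIN + 1]) / P[MAIN]
taps = [c1, c2, c3]

y = convolve(taps, P)
main_out = y[MAIN + 1]
pmr_in = sum(abs(v) for v in P) / abs(P[MAIN])
pmr_out = sum(abs(v) for v in y) / abs(main_out)
print("taps:", [round(t, 3) for t in taps])
print("equalized pulse:", [round(v, 4) for v in y])
print(f"main cursor {P[MAIN]:.3f} -> {main_out:.3f}, PMR {pmr_in:.3f} -> {pmr_out:.3f}")
```

The taps come out as −0.1, 1, and −0.43. The main cursor drops from 1 to 0.912 even though the main tap is 1, mirroring the main-cursor attenuation described for the example in this section, while the PMR falls from 1.83 to about 1.057.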

For this example, we fixed c2 = 1 as the main tap. Intuitively, c1 can be sized to remove the first pre-cursor, c3 can be chosen to remove the first post-cursor, and so on. The problem is complicated by the ISI of the summed pulses, but adaptation schemes exist that can optimize the coefficients [6]. The optimization method used for this example is outlined in appendix A. The pulses scaled by these optimal coefficients are shown in the schematic in figure 2.4 as well as in the plot in figure 2.5. The first post-cursor is canceled by the third-tap pulse, which is scaled by the large negative value of c3. The other coefficients are relatively small and c5 is nearly zero, suggesting that a 4-tap FFE would be sufficient for this example channel. The sum of the pulses (i.e., the FFE output) is also shown in figure 2.5. The main-cursor amplitude is attenuated compared to the input pulse, which is a consequence of FFEs with coefficient magnitudes constrained to less than unity. The improvement is most easily observed in the normalized pulse responses in figure 2.6. Compared to the channel pulse, the ISI has been substantially reduced in the equalized pulse response.

Figure 2.7 shows the transmit signal, received signal, and equalized signal with normalized main-cursor amplitude for a random sequence of bits. The reduction in signal dynamic range is readily apparent from this plot. A more detailed mathematical analysis of this concept is covered in §2.3.1.

Figure 2.8 shows the normalized magnitude response of the channel, FFE, and channel+FFE. The attenuation of the channel is accurately inverted by the transfer function of the FFE up to the Nyquist frequency, explaining the reduction in ISI in the time domain (see figure 2.6).

[Figure 2.6: Normalized pulse responses versus time (UI) at the channel output and at the FFE output.]

[Figure 2.7: Transmitted signal, received signal, and equalized signal (normalized amplitude) versus time (UI) for a random sequence of bits.]

2.3 Metrics of FFE performance

2.3.1 Peak-to-main ratio (PMR)

Pulse amplitude modulation (PAM) is a modulation scheme in which the bits are encoded in the amplitude of the transmitted pulses. Because the channel is a linear system, linear superposition holds and the received signal is

$$rx[n] = \sum_{k=-\infty}^{\infty} b_k \, p[n-k] \tag{2.21}$$

where $b_k$ is the pulse amplitude chosen from a set of symbols. For PAM2 (also referred to as NRZ), $b_k \in \{-1, 1\}$, and for PAM4, $b_k \in \{-1, -1/3, 1/3, 1\}$. In either case, the maximum possible received signal occurs when the contributions from all of the pre-cursors and post-cursors add constructively. This is referred to as the peak signal and is mathematically represented as

$$\text{peak} = \sum_{k=-\infty}^{\infty} |p[k]|. \tag{2.22}$$

[Figure 2.8: Normalized magnitude response (dB) versus normalized frequency ((UI)⁻¹) of the channel, the FFE, and channel+FFE.]

For the pulse in figure 2.9, all of the ISI terms are positive and the peak signal occurs for an infinite sequence of ones. Because the ISI terms are concentrated in a few UI around the main cursor, the received signal effectively reaches the peak after a short sequence of ones, as depicted in figure 2.10. This excess signal results in an increase in the dynamic range (DR) requirement of the ADC in ADC-based links.

[Figure 2.9: Normalized pulse response versus time (UI) with the discrete pulse response terms p[−2], p[−1], p[0], p[1], and p[2] labeled.]

[Figure 2.10: The received pulses and received signal versus time (UI), demonstrating the peak signal due to ISI; annotations mark the main cursor p[0] and the peak r[0] = Σk p[k].]

To see this, first consider the signal-to-noise ratio (SNR) of the system, which is determined by the main cursor, $p[0]$, and can be written as

$$\mathrm{SNR}_{sys} = \frac{(p[0])^2}{v_n^2} \tag{2.23}$$

where $v_n^2$ is the worst-case noise power that can be tolerated for a given main-cursor amplitude. In an ADC-based system, the ADC must quantize the entire DR of the signal while attaining the SNR fixed by the main-cursor amplitude. So the SNR of the ADC needs to be

$$\mathrm{SNR}_{ADC} = \frac{\text{peak}^2}{v_n^2} \tag{2.24}$$

and the excess SNR is the ratio of (2.24) to (2.23), resulting in

$$\text{excess SNR} = \frac{\mathrm{SNR}_{ADC}}{\mathrm{SNR}_{sys}} = \left( \frac{\text{peak}}{\text{main cursor}} \right)^2 = (\mathrm{PMR})^2 \tag{2.25}$$

where we define the peak-to-main ratio (PMR) as

$$\mathrm{PMR} = \frac{\sum_{k=-\infty}^{\infty} |p[k]|}{|p[0]|}. \tag{2.26}$$

Notice that for an ideal pulse with no ISI, PMR = 1. The metric in (2.26) is important because it relates the ISI of the channel to the excess SNR required of the ADC. For example, reducing the PMR by 2× reduces the required ADC SNR by 4× (about 6 dB), which is equivalent to relaxing the ADC resolution by 1 bit. Therefore, a powerful metric for the equalization performance of a block is the ratio of the PMR at its input to the PMR at its output. This ratio represents the signal DR reduction due to the block and is termed the DR improvement:

$$\text{DR improvement} = \frac{\mathrm{PMR}_{in}}{\mathrm{PMR}_{out}}. \tag{2.27}$$
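The PMR bookkeeping in (2.22) to (2.27) is easy to reproduce. The sketch below confirms by brute force over symbol patterns that the worst-case |rx[0]| from (2.21) equals the sum in (2.22) (for PAM2 and PAM4 alike), then converts PMR to excess ADC SNR in dB and bits and computes the DR improvement. Both sampled pulses, including the "equalized" one, are made-up values.

```python
import itertools
import math

def pmr(p, main):
    """Peak-to-main ratio (2.26)."""
    return sum(abs(v) for v in p) / abs(p[main])

def brute_force_peak(p, levels=(-1.0, 1.0)):
    """Largest |rx[0]| from (2.21) over every symbol pattern that overlaps the pulse."""
    return max(abs(sum(b * v for b, v in zip(bits, p)))
               for bits in itertools.product(levels, repeat=len(p)))

# Hypothetical sampled pulses before and after an equalizer (main cursors at indices 1 and 2).
p_in = [0.1, 1.0, 0.45, 0.2, 0.08]
p_out = [-0.01, 0.0, 0.912, 0.0, -0.0015, -0.006, -0.0344]

peak = brute_force_peak(p_in)
print(f"brute-force peak {peak:.4f} vs sum |p[k]| {sum(abs(v) for v in p_in):.4f}")

pmr_in, pmr_out = pmr(p_in, 1), pmr(p_out, 2)
for name, r in (("input", pmr_in), ("output", pmr_out)):
    # Excess ADC SNR (2.25) is PMR^2, i.e. 20*log10(PMR) dB or log2(PMR) bits.
    print(f"{name}: PMR {r:.3f}, excess SNR {20 * math.log10(r):.2f} dB = {math.log2(r):.2f} bits")
print(f"DR improvement (2.27): {pmr_in / pmr_out:.3f}")
```

For these pulses, the PMR drops from 1.83 (about 0.87 bit of excess ADC resolution) to about 1.06, a DR improvement of about 1.73.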

2.3.2 Eye opening equivalence

The analysis in §2.3.1 results in the metric, PMR, in (2.26) that characterizes an equalizer's performance with regard to the reduction of the DR of the signal. Another

typical objective of an equalizer is to reduce the ISI in order to maximize the eye

opening. In this section, we will show that minimizing the PMR is equivalent to

maximizing the eye opening. To prove this, we will find an expression for the eye

opening in terms of the pulse response and show that maximizing this expression is

equivalent to minimizing the PMR.

The eye opening, in the absence of noise, is the difference between the minimum signal for a positive symbol and the maximum signal for a negative symbol. The first term is equivalent to finding the minimum of (2.21) for $b_0 = 1$ and $n = 0$, which can be expressed as

$$\min\left(rx[0]\right)\big|_{b_0=1} = \min\left( \sum_{k=-\infty}^{\infty} b_k \, p[-k] \right)\bigg|_{b_0=1} = p[0] - \sum_{k \ne 0} |p[k]|. \tag{2.28}$$

Similarly, the maximum signal for a negative symbol is

$$\max\left(rx[0]\right)\big|_{b_0=-1} = -p[0] + \sum_{k \ne 0} |p[k]|. \tag{2.29}$$

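Taking the difference between (2.28) and (2.29), we obtain an expression for the eye opening:

$$\text{eye opening} = \min\left(rx[0]\right)\big|_{b_0=1} - \max\left(rx[0]\right)\big|_{b_0=-1} = 2p[0]\left( 2 - \frac{\sum_{k=-\infty}^{\infty} |p[k]|}{p[0]} \right) = 2p[0]\,(2 - \mathrm{PMR}). \tag{2.30}$$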

We can observe a few things directly from the eye opening expression in (2.30):

1. It is proportional to the main cursor amplitude, p[0]. This is intuitive because,

in a linear system, scaling the signal would proportionally scale the eye opening.

2. It is general in that it will give a negative number for a closed eye, and this

number is more negative for higher ISI.

3. It is closely related to the PMR in that it is positive (i.e. the eye is open) for

PMR < 2 and negative (i.e. the eye is closed) otherwise.

4. Its value is maximized when the PMR is minimized. For the ideal case, PMR = 1 and the eye opening is 2p[0], as expected.
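Expression (2.30) can be sanity-checked by enumerating every PAM2 pattern of the interfering symbols for a short pulse and comparing the resulting worst-case eye with the formula. Both pulses below are hypothetical; the second is chosen so that PMR > 2.

```python
import itertools

def eye_opening_brute_force(p, main):
    """Noiseless PAM2 eye: worst '+1' sample minus worst '-1' sample, by enumeration."""
    others = [v for k, v in enumerate(p) if k != main]
    worst_hi, worst_lo = float("inf"), float("-inf")
    for isi_bits in itertools.product((-1.0, 1.0), repeat=len(others)):
        isi = sum(b * v for b, v in zip(isi_bits, others))
        worst_hi = min(worst_hi, p[main] + isi)     # b0 = +1
        worst_lo = max(worst_lo, -p[main] + isi)    # b0 = -1
    return worst_hi - worst_lo

def eye_opening_formula(p, main):
    """(2.30): 2 p[0] (2 - PMR), assuming p[0] > 0."""
    pmr = sum(abs(v) for v in p) / p[main]
    return 2 * p[main] * (2 - pmr)

for p, main in (([0.1, 1.0, 0.45, 0.2, 0.08], 1),      # hypothetical pulse: eye open
                ([0.3, 1.0, 0.6, 0.3, 0.1], 1)):       # hypothetical pulse: eye closed
    bf, fm = eye_opening_brute_force(p, main), eye_opening_formula(p, main)
    print(f"pulse {p}: brute force {bf:+.4f}, formula {fm:+.4f}")
```

The first pulse (PMR = 1.83) gives an open eye of 0.34, and the second (PMR = 2.3) gives −0.6, a closed eye, in line with item 3 above.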


Therefore, although it is the primary objective of this work to minimize ADC DR

requirements (i.e. minimize PMR), this is equivalent to maximizing the eye opening.

As a result, the FFE provides a secondary benefit: for an optimal FFE design, the equalization effort required in DSP is maximally reduced.

Chapter 3

Analog delays for FFEs

3.1 Delay approximations

For an analog FFE to be viable for high-speed link receivers, it must be implemented

with low power, low noise, and low area. The greatest obstacle to achieving these

goals is in the design of the analog delay. As compared to a TX-FFE or digital FFE,

where the delays can be implemented in the digital domain, the analog RX-FFE

requires an analog delay implementation. In this section, we consider various delays

as approximations to the ideal delay. First, we introduce the ideal delay and discuss

its properties.

3.1.1 Ideal delay

The objective of an analog delay, at the most basic level, is to implement a block that takes a time-varying input voltage, $v(t)$, and outputs a delayed version, $v(t-\tau)$. To understand how this might be accomplished, it is useful to transform to the frequency domain with the Laplace transform, resulting in [15]

$$\mathcal{L}\{v(t-\tau)\} = e^{-s\tau} V(s) = D(s) V(s) \tag{3.1}$$

where $V(s) = \mathcal{L}\{v(t)\}$ and

$$D(s) = e^{-s\tau} \tag{3.2}$$

is the transfer function of the ideal delay. Decomposing this into magnitude and phase, we see that

$$|D(j\omega)| = 1 \tag{3.3}$$

$$\angle D(j\omega) = -\tau\omega \tag{3.4}$$

$$\tau_g = -\frac{d}{d\omega} \angle D(j\omega) = \tau. \tag{3.5}$$

This transfer function can be exactly realized with a lossless transmission line, but the area required would be excessive. For example, achieving a delay of 25 ps with a conductor in silicon dioxide ($\varepsilon_r = 3.9$) requires a length of about 3.8 mm ($l = c\tau/\sqrt{\varepsilon_r}$), which is too large for a typical integrated circuit application. In addition, any practical implementation will suffer from the high-frequency loss mechanisms due to skin depth and dielectric loss. These topics are discussed in more detail in §2.1 with respect to channel characteristics, but the essential concept is the same here. For these reasons, we investigate lumped-circuit approximations to the ideal delay in the following sections.
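The line length needed for a given delay follows from the propagation velocity $c/\sqrt{\varepsilon_r}$ of a TEM wave in a non-magnetic dielectric. A quick check, assuming the conductor is fully embedded in SiO₂ so that the effective permittivity equals 3.9:

```python
import math

C0 = 299_792_458.0          # speed of light in vacuum (m/s)

def line_length_for_delay(tau_s, eps_r):
    """Length of a lossless TEM line with delay tau: l = tau * c / sqrt(eps_r)."""
    return tau_s * C0 / math.sqrt(eps_r)

l = line_length_for_delay(25e-12, 3.9)
print(f"25 ps in SiO2 (eps_r = 3.9): {l * 1e3:.2f} mm")
```

It returns about 3.8 mm, far more than a compact on-chip delay can afford.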

3.1.2 Lumped delay line

Taking inspiration from the lumped-LC model of an infinitesimal transmission-line segment, one possible delay implementation is shown in figure 3.1. For equal values of all the capacitors and all the inductors, this circuit represents an N-th order lumped approximation to a lossless transmission line. The delay per section is [16]

$$\tau_1 = \sqrt{LC} \tag{3.6}$$

with the total delay

$$\tau_{tot} = N\sqrt{LC}. \tag{3.7}$$

The cutoff frequency is [16]

$$\omega_h = \frac{2}{\sqrt{LC}} \tag{3.8}$$

above which the lumped line's characteristics are degraded substantially. Since the goal is to obtain a specific delay time, it is illustrative to substitute (3.7) into (3.8), resulting in

$$\omega_h = \frac{2N}{\tau_{tot}} \tag{3.9}$$

which shows that the accuracy of the approximation increases with the order, N. Unfortunately, increasing N requires increasing the number of inductors, resulting in a large area penalty. Furthermore, the finite Q factor of the inductors will introduce loss into the delay.
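Equations (3.6) to (3.9) fix the per-section values once an impedance is chosen. The sketch below assumes a 50 Ω line (an assumption, since no impedance is specified here), uses $Z_0 = \sqrt{L/C}$ together with (3.6) to size each section for a 25 ps total delay, and reports the cutoff $f_h = \omega_h/2\pi$ for N = 1, 2, 3.

```python
import math

def lc_delay_line(tau_tot, n_sections, z0=50.0):
    """Per-section L and C of an N-section LC ladder and its cutoff frequency in Hz.

    Uses sqrt(L*C) = tau_tot / N from (3.6)-(3.7) and an assumed Z0 = sqrt(L/C).
    """
    tau_1 = tau_tot / n_sections
    L = z0 * tau_1                 # solves sqrt(L*C) = tau_1 and sqrt(L/C) = Z0
    C = tau_1 / z0
    w_h = 2.0 / math.sqrt(L * C)   # (3.8); equals 2N / tau_tot as in (3.9)
    return L, C, w_h / (2 * math.pi)

for n in (1, 2, 3):
    L, C, f_h = lc_delay_line(25e-12, n)
    print(f"N = {n}: L = {L * 1e9:.3f} nH, C = {C * 1e15:.1f} fF, f_h = {f_h / 1e9:.2f} GHz")
```

For 25 ps this gives $f_h$ ≈ 12.7, 25.5, and 38.2 GHz for N = 1, 2, and 3, growing linearly with N as (3.9) predicts, while the number of inductors grows with it.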

The magnitude and phase responses of the lumped-LC delay with τ = 25 ps are shown in figure 3.2 for orders 1, 2, and 3. The bandwidth of the delay response (i.e., the range over which the magnitude and phase responses match those of the ideal delay) increases in proportion to N, as suggested by (3.9). The discrepancy in the delay behavior is best observed in the group delay plot shown in figure 3.3. While the agreement with the ideal group delay curve does improve with N, there are errors below the Nyquist frequency even for N = 3, which limits the effectiveness of this structure as a delay.

[Figure 3.1: Schematic diagram of an N-order lumped-LC approximation of a lossless transmission line (inductors L1 … LN, capacitors C1 … CN, input vi, output vo).]

[Figure 3.2: Magnitude (top, dB) and phase (bottom, degrees) versus frequency (GHz) for LC delays of orders N = 1, 2, and 3, compared with the ideal delay.]

[Figure 3.3: Group delay (ps) versus frequency (GHz) for LC delays of orders 1, 2, and 3.]

3.1.3 Bessel delays

In terms of group delay, Bessel delays are an even better delay approximation than the lumped-LC delay. The group delay of the Bessel filter is maximally flat by design. That is, for an N-order Bessel delay, the first N − 1 terms after the constant term in the Taylor series expansion of the group delay at ω = 0 (the coefficients of $\omega^2, \omega^4, \ldots, \omega^{2(N-1)}$) are zero [17]. In this way, this delay type optimally emulates the linear phase of an ideal delay. For the first three orders, the Bessel delay transfer functions for a delay of τ are [17]

$$D_{bs1}(s) = \frac{1}{1 + (s\tau)} \tag{3.10}$$

$$D_{bs2}(s) = \frac{3}{3 + 3(s\tau) + (s\tau)^2} \tag{3.11}$$

$$D_{bs3}(s) = \frac{15}{15 + 15(s\tau) + 6(s\tau)^2 + (s\tau)^3}. \tag{3.12}$$

The first-order case is simply a first-order pole. For higher orders, complex poles are

required and the response can be realized with a similar LC circuit as in figure 3.1

incurring a similar inductor-area penalty. Alternatively, active circuit realizations are

complex and result in high-power and high-noise penalties.

The magnitude and phase response of the Bessel delay with τ = 25 ps are shown

in figure 3.4 for orders 1, 2, and 3. The magnitude response begins to roll off at a low

frequency, which is a consequence of optimizing the pole locations for the maximally-

flat group delay while ignoring the magnitude response. The benefits of the Bessel

delay can be best observed in the group delay plot shown in figure 3.5. The group

delay matches the constant group delay of the ideal case up to a high frequency.
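The maximally flat property can be seen by evaluating the normalized group delay $\tau_g/\tau$ of (3.10) to (3.12) numerically. The sketch below works in the normalized variable u = ωτ and differentiates the phase with a central difference; the evaluation points are arbitrary choices.

```python
import cmath

BESSEL = {1: [1, 1], 2: [3, 3, 1], 3: [15, 15, 6, 1]}   # denominators of (3.10)-(3.12), ascending in x

def bessel(order):
    """Bessel delay D_bs(order) as a function of x = s*tau (unity DC gain)."""
    coeffs = BESSEL[order]
    return lambda x: coeffs[0] / sum(c * x ** k for k, c in enumerate(coeffs))

def group_delay(D, u, h=1e-6):
    """Normalized group delay tau_g/tau at u = omega*tau, by a central difference of the phase.
    Taking the phase of a ratio of nearby samples avoids 2*pi wrapping."""
    return -cmath.phase(D(1j * (u + h)) / D(1j * (u - h))) / (2 * h)

for order in (1, 2, 3):
    row = ", ".join(f"{group_delay(bessel(order), u):.4f}" for u in (0.25, 0.5, 1.0, 2.0))
    print(f"order {order}: tau_g/tau at omega*tau = 0.25, 0.5, 1, 2 -> {row}")
```

At ωτ = 1 the group delay is 0.5τ, 0.923τ (exactly 12/13), and 0.996τ (exactly 276/277) for orders 1, 2, and 3; these closed forms follow from differentiating the phase of (3.10) to (3.12) directly.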

[Figure 3.4: Magnitude (top, dB) and phase (bottom, degrees) versus frequency (GHz) for Bessel delays of orders 1, 2, and 3.]

[Figure 3.5: Group delay (ps) versus frequency (GHz) for Bessel delays of orders 1, 2, and 3.]


It is possible to extend the bandwidth of the group delay by using an all-pole

equal-ripple delay [17]. This delay is similar in concept to a Chebyshev or elliptic

response where the ripple is bounded in the magnitude response, but in this case

the bound is on the ripple of the group delay. The improvement of the group delay

bandwidth is only marginal as compared to the Bessel delay so this delay type is not

given further consideration here.

3.1.4 Pade delays

Although Bessel delays have the maximally flat group delay for an all-pole transfer

function, the additional degrees of freedom introduced by adding zeros can result

in an improved approximation. One method to approximate the ideal delay with

poles and zeros is to use the Pade approximant. The Pade approximant gives the

best rational function approximation of a desired function in terms of matching the

highest possible number of Taylor series coefficients [18]. The Taylor expansion of the ideal delay in (3.2) is

$$D(s) = e^{-s\tau} = 1 - (s\tau) + \frac{1}{2}(s\tau)^2 - \frac{1}{6}(s\tau)^3 + \frac{1}{24}(s\tau)^4 - \frac{1}{120}(s\tau)^5 + O\left((s\tau)^6\right).$$

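The derivation of the Pade approximant for the function $e^x$ is covered in appendix C. Substituting $x \to (-s\tau)$ into the expressions in (C.8), (C.9), and (C.10) results in the Pade delays

$$D_{pd1}(s) = \frac{1 - \frac{1}{2}(s\tau)}{1 + \frac{1}{2}(s\tau)} \tag{3.13}$$

$$D_{pd2}(s) = \frac{1 - \frac{1}{2}(s\tau) + \frac{1}{12}(s\tau)^2}{1 + \frac{1}{2}(s\tau) + \frac{1}{12}(s\tau)^2} \tag{3.14}$$

$$D_{pd3}(s) = \frac{1 - \frac{1}{2}(s\tau) + \frac{1}{10}(s\tau)^2 - \frac{1}{120}(s\tau)^3}{1 + \frac{1}{2}(s\tau) + \frac{1}{10}(s\tau)^2 + \frac{1}{120}(s\tau)^3}. \tag{3.15}$$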


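To see how these delays approximate the ideal delay, we expand their Taylor series by polynomial long division, obtaining

$$D_{pd1}(s) = 1 - (s\tau) + \frac{1}{2}(s\tau)^2 \boxed{- \frac{1}{4}(s\tau)^3} + O\left((s\tau)^4\right) \tag{3.16}$$

$$D_{pd2}(s) = 1 - (s\tau) + \frac{1}{2}(s\tau)^2 - \frac{1}{6}(s\tau)^3 + \frac{1}{24}(s\tau)^4 \boxed{- \frac{1}{144}(s\tau)^5} + O\left((s\tau)^6\right) \tag{3.17}$$

where the first non-matching coefficient is boxed in each case (compare with $-\frac{1}{6}$ and $-\frac{1}{120}$ in the expansion of $e^{-s\tau}$).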
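The long division behind (3.16) and (3.17) can be checked with exact rational arithmetic. The sketch below expands each Pade delay as a power series in x = sτ and reports the first coefficient that departs from the expansion of $e^{-x}$; for the third-order delay it shows agreement through $x^6$, as expected for a [3/3] Pade approximant.

```python
from fractions import Fraction as F

def series_divide(num, den, n_terms):
    """First n_terms power-series coefficients of num(x)/den(x) (long division, den[0] != 0)."""
    num = num + [F(0)] * (n_terms - len(num))
    q = []
    for k in range(n_terms):
        acc = num[k] - sum(q[i] * den[k - i] for i in range(max(0, k - len(den) + 1), k))
        q.append(acc / den[0])
    return q

def exp_neg_series(n_terms):
    """Taylor coefficients of exp(-x): (-1)^k / k!."""
    coeffs, fact = [], 1
    for k in range(n_terms):
        fact = fact * k if k > 0 else 1
        coeffs.append(F((-1) ** k, fact))
    return coeffs

# Numerators of (3.13)-(3.15) in ascending powers of x = s*tau; each denominator is num(-x).
PADE = {1: [F(1), F(-1, 2)],
        2: [F(1), F(-1, 2), F(1, 12)],
        3: [F(1), F(-1, 2), F(1, 10), F(-1, 120)]}

N_TERMS = 8
ideal = exp_neg_series(N_TERMS)
for order, num in PADE.items():
    den = [c * (-1) ** k for k, c in enumerate(num)]
    approx = series_divide(num, den, N_TERMS)
    k = next(i for i in range(N_TERMS) if approx[i] != ideal[i])
    print(f"order {order}: matches e^(-x) through x^{k - 1}; "
          f"x^{k} coefficient is {approx[k]} instead of {ideal[k]}")
```

The first two lines of output reproduce the boxed coefficients in (3.16) and (3.17): −1/4 instead of −1/6, and −1/144 instead of −1/120.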

The magnitude and phase responses of Pade delays with τ = 25 ps are shown in figure 3.6 for orders 1, 2, and 3. The left-half-plane poles mirror the right-half-plane zeros, so they cancel in magnitude and add in phase. Therefore, the magnitude response, in the absence of any additional parasitic poles, is ideal at all frequencies. The phase response is also in good agreement with that of the ideal delay, which can be best seen in the group delay plot in figure 3.7. Due to the additional degrees of freedom introduced by the zeros, the group delay matches to an even higher frequency (for a given order) than that of the Bessel delay. In particular, the bandwidth of the constant group delay of $D_{pd1}(s)$ is approximately 2× greater than that of $D_{bs1}(s)$ and nearly equal to that of $D_{bs2}(s)$. Because the first-order pole and zero of $D_{pd1}(s)$ can be realized more simply in practice than the complex pole pair in $D_{bs2}(s)$, it has been a popular choice as an analog delay in many previous designs [3, 19, 20].

[Figure 3.6: Magnitude (top, dB) and phase (bottom, degrees) versus frequency (GHz) for Pade delays of orders 1, 2, and 3.]

[Figure 3.7: Group delay (ps) versus frequency (GHz) for Pade delays of orders 1, 2, and 3.]
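To put numbers on the bandwidth comparison, the sketch below finds where the group delay of each delay falls to half of its low-frequency value τ. This half-delay criterion is our assumption, since the comparison above does not specify one. Because the group delay of $D_{pd1}$ is $\tau/(1 + (\omega\tau/2)^2)$ while that of $D_{bs1}$ is $\tau/(1 + (\omega\tau)^2)$, the factor of 2 between them holds for any threshold; the comparison with $D_{bs2}$ does depend on the threshold.

```python
import cmath
import math

def group_delay(D, u, h=1e-6):
    """Normalized group delay tau_g/tau at u = omega*tau, by a central difference of the phase."""
    return -cmath.phase(D(1j * (u + h)) / D(1j * (u - h))) / (2 * h)

DELAYS = {                                    # transfer functions in x = s*tau
    "Bessel-1": lambda x: 1 / (1 + x),                     # (3.10)
    "Bessel-2": lambda x: 3 / (3 + 3 * x + x ** 2),        # (3.11)
    "Pade-1":   lambda x: (1 - x / 2) / (1 + x / 2),       # (3.13)
}

def half_delay_bandwidth(D, lo=0.0, hi=100.0, iters=60):
    """Bisection for the u where tau_g drops to tau/2 (assumes tau_g decreases with u)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if group_delay(D, mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

TAU = 25e-12                                  # delay used in figures 3.4-3.7
bw = {name: half_delay_bandwidth(D) for name, D in DELAYS.items()}
for name, u in bw.items():
    print(f"{name}: tau_g = tau/2 at omega*tau = {u:.3f} ({u / (2 * math.pi * TAU) / 1e9:.2f} GHz)")
print(f"Pade-1 / Bessel-1 bandwidth ratio: {bw['Pade-1'] / bw['Bessel-1']:.3f}")
```

It gives ωτ = 1.000, 2.203, and 2.000 for Bessel-1, Bessel-2, and Pade-1 (about 6.4, 14.0, and 12.7 GHz for τ = 25 ps).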

3.2 Equivalence of first-order delays

In this section, we introduce a set of first-order delays that are equivalent for FFEs

in terms of realizable transfer functions. In §3.2.1, we outline the theory behind the

equivalence. In §3.2.2, we investigate the practical consequences of this theorem in

terms of coefficient spread. Finally, in §3.2.3 we present an example to demonstrate

the strengths and limitations of the theory for a practical case.

3.2.1 Theorem

Any transfer function obtainable by an FFE with first-order Pade delays can be exactly replicated by an FFE with delays consisting of a first-order pole and zero with an arbitrary pole-zero ratio, α. This result is proven in appendix B and summarized here. For any feasible transfer function of an N-tap FFE with first-order Pade delays having the form

$$H_{pd1}(s) = \sum_{n=1}^{N} c_n D_{pd1}^{\,n-1}(s) \tag{3.18}$$

we can create an equivalent FFE with delays

$$D_\alpha(s) = \frac{1 - \frac{1}{2}\alpha s\tau}{1 + \frac{1}{2}s\tau} \tag{3.19}$$

such that

$$H_\alpha(s) = \sum_{n=1}^{N} c_{\alpha n} D_\alpha^{\,n-1}(s) = H_{pd1}(s). \tag{3.20}$$

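The coefficients $c_{\alpha n}$ can be obtained from the coefficients $c_n$ by a matrix multiplication. If we define the vectors

$$\mathbf{c}_1 = \begin{bmatrix} c_1 & c_2 & \cdots & c_N \end{bmatrix}^T \tag{3.21}$$

$$\mathbf{c}_\alpha = \begin{bmatrix} c_{\alpha 1} & c_{\alpha 2} & \cdots & c_{\alpha N} \end{bmatrix}^T, \tag{3.22}$$

then, by the definition of the matrix $\mathbf{M}_\alpha$ in (B.12),

$$\mathbf{c}_\alpha = \mathbf{M}_\alpha \mathbf{c}_1. \tag{3.23}$$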

This result is leveraged for the delay implementation in the proof-of-concept FFE

(see §5.3.1).

3.2.2 Coefficient spread

The coefficient spread of a set of coefficients is defined as the ratio of the maximum coefficient magnitude to the minimum. It is an important metric because it sets the requirement on the bit resolution of the coefficients' circuit realization. Therefore, it is necessary to understand the effect of the transformation $\mathbf{M}_\alpha$ on the coefficient spread. Unfortunately, this effect is complicated and no closed-form expression exists. Instead, what we can easily compute is the spectral norm, $\|\mathbf{M}_\alpha\|_2$, which gives us the bound

$$\frac{\|\mathbf{c}_\alpha\|_2}{\|\mathbf{c}_1\|_2} \le \|\mathbf{M}_\alpha\|_2. \tag{3.24}$$

Note that this bound is not equal to the coefficient spread, but it is related and serves as a useful proxy to enable a closed-form analysis. This bound is plotted in figure 3.8 versus α for FFEs of order 2, 3, 4, and 5. In the limit α → 0, the spectral norm approaches the finite value $\|\mathbf{M}_0\|_2$. In the next section, we investigate an example to illustrate the coefficient spread penalty incurred in a practical case and compare it to the bound in figure 3.8.

[Figure 3.8: The spectral norm ‖Mα‖₂ versus the pole-zero ratio of the delay, α (log scale, 10⁻³ to 10¹), for FFEs of orders N = 2, 3, 4, and 5.]

3.2.3 Example transformation

Consider three different 5-tap FFEs with first-order delays with pole-zero ratios α = 0, α = 1/3, and α = 1, with τ = 25 ps in all cases. The magnitude and phase responses of these delays are substantially different, as shown in figure 3.9. In fact, their group delays are not even equal (see figure 3.10)¹, but the analysis in the previous section shows that these delays can be used to construct FFEs with equivalent transfer functions.

To show an example of this, consider the channel pulse for the 1.09 m FR4 trace

in figure 2.2. For the α = 1 case, the optimum coefficients to maximize the DR

1For this family of delays, τg = 12 (1 + α)τ . The fact that such a wide range of group delays can

achieve the same equalization demonstrates the agility of the FFE to absorb changes in its delays.


improvement as defined in (2.27) are (see appendix A.3 for the optimization method)

c1 =[0.286 1 −0.857 −0.857 0.571

]T. (3.25)

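Using the transformation matrix, we find the coefficients for the equivalent transfer functions with α = 1/3 and α = 0 to be

$$\mathbf{c}_{1/3} = M_{1/3}\mathbf{c}_1 = \begin{bmatrix} -0.286 & 1.393 & 2.893 & -6.750 & 2.893 \end{bmatrix}^T \tag{3.26}$$

$$\mathbf{c}_0 = M_0\mathbf{c}_1 = \begin{bmatrix} -0.143 & -4.286 & 20.57 & -25.14 & 9.143 \end{bmatrix}^T. \tag{3.27}$$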

Figure 3.9: Magnitude (dB) and phase (deg) versus frequency (GHz) for first-order delays with α = 0, α = 1/3, and α = 1.

Figure 3.10: Group delay (ps) versus frequency (GHz) for first-order delays with α = 0, α = 1/3, and α = 1.

Figure 3.11: FFE magnitude (top) and phase (bottom) versus frequency for α = 0, α = 1/3, and α = 1 with ideal coefficient transformations.

Figure 3.12: FFE magnitude (top) and phase (bottom) versus frequency for α = 0, α = 1/3, and α = 1 with practical coefficient transformations.

The magnitude and phase responses of these FFEs are plotted in figure 3.11 and, as expected, they are identical. The norm of the α = 1 case is $\|\mathbf{c}_1\|_2 = 1.696$, and the increases in the norms due to the transformations are

$$\frac{\|\mathbf{c}_{1/3}\|_2}{\|\mathbf{c}_1\|_2} = 4.728 \le 10.049 = \|M_{1/3}\|_2 \tag{3.28}$$

$$\frac{\|\mathbf{c}_0\|_2}{\|\mathbf{c}_1\|_2} = 20.06 \le 46.042 = \|M_0\|_2, \tag{3.29}$$

which are more than a factor of 2 below the bound in each case. The increases in the coefficient spread for these cases are

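$$\frac{\text{coefficient spread}(\alpha = 1/3)}{\text{coefficient spread}(\alpha = 1)} = 6.75 \tag{3.30}$$

$$\frac{\text{coefficient spread}(\alpha = 0)}{\text{coefficient spread}(\alpha = 1)} = 50.2 \tag{3.31}$$

where the coefficient spread of the α = 1 case is

$$\text{coefficient spread}(\alpha = 1) = \frac{\max(|\mathbf{c}_1|)}{\min(|\mathbf{c}_1|)} = 3.5. \tag{3.32}$$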

This demonstrates that, while the coefficient spread is not exactly the ratio of the

norms, these two metrics are closely related.
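As a quick cross-check of (3.28) to (3.32), the sketch below computes the 2-norm ratios and coefficient spreads directly. It assumes that the printed coefficients of (3.25) to (3.27) are roundings of exact fractions (sevenths for c1 and c0, twenty-eighths for c1/3); the helper names are hypothetical. With these fractions the results agree with the quoted values to within the rounding of the printed coefficients (for example, 20.05 rather than 20.06, and 50.29 rather than 50.2).

```python
import math
from fractions import Fraction

# Assumption: exact fractions that round to the printed values of (3.25)-(3.27).
c1 = [Fraction(n, 7) for n in (2, 7, -6, -6, 4)]
c_third = [Fraction(n, 28) for n in (-8, 39, 81, -189, 81)]
c0 = [Fraction(n, 7) for n in (-1, -30, 144, -176, 64)]


def norm2(c):
    """Euclidean (2-)norm of a coefficient vector."""
    return math.sqrt(sum(float(x) ** 2 for x in c))


def spread(c):
    """Coefficient spread: largest magnitude over smallest nonzero magnitude."""
    mags = [abs(x) for x in c if x != 0]
    return float(max(mags) / min(mags))


print(round(norm2(c1), 3))                       # 1.696
print(round(norm2(c_third) / norm2(c1), 3))      # 4.728, cf. (3.28)
print(round(norm2(c0) / norm2(c1), 2))           # 20.05, cf. 20.06 in (3.29)
print(spread(c1), spread(c_third) / spread(c1))  # 3.5 6.75
print(round(spread(c0) / spread(c1), 2))         # 50.29, cf. 50.2 in (3.31)
```

Skipping zero-valued taps in `spread` is an assumption of this sketch: a tap set to zero does not need to be realized, so it should not set the minimum.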

The increase in the coefficient spread is significant in both cases, but it can be reduced if a discrepancy in the transfer functions can be tolerated. For example, it is easy to imagine that, for the α = 0 case, the smallest coefficient could be set to zero with very little impact on the overall transfer function. With this minor change, the new coefficient spread is just 5.87, an increase of only 1.68× relative to the α = 1 case. This is substantially less than for the original solution, at the cost of only a minor increase in the mismatch between the transfer functions.

Figure 3.13: Pulse response versus time (ps) for the channel pulse and the equalized pulses after 5-tap FFEs with α = 0, α = 1/3, and α = 1 optimized with 5-bit coefficient resolution.
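The pruning step described above can be sketched in a few lines. The sketch assumes, as before, that the printed coefficients of (3.25) and (3.27) are roundings of exact sevenths; `zero_smallest` is a hypothetical helper, and the spread ignores the zeroed tap because it no longer has to be realized.

```python
from fractions import Fraction

# Assumption: exact sevenths that round to the printed c0 of (3.27) and c1 of (3.25).
c0 = [Fraction(n, 7) for n in (-1, -30, 144, -176, 64)]
c1 = [Fraction(n, 7) for n in (2, 7, -6, -6, 4)]


def spread(c):
    """Largest magnitude over smallest nonzero magnitude."""
    mags = [abs(x) for x in c if x != 0]
    return max(mags) / min(mags)


def zero_smallest(c):
    """Return a copy of c with its smallest-magnitude coefficient set to zero."""
    k = min(range(len(c)), key=lambda i: abs(c[i]))
    return [Fraction(0) if i == k else x for i, x in enumerate(c)]


c0_pruned = zero_smallest(c0)
print(round(float(spread(c0_pruned)), 2))               # 5.87
print(round(float(spread(c0_pruned) / spread(c1)), 2))  # 1.68
```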

To better understand these trade-offs, we optimize each FFE independently with a fixed 5-bit coefficient resolution (see appendix A.2 for the optimization method). The fixed coefficient resolution essentially places an upper bound on the coefficient spread for all three cases. The resultant transfer functions are plotted in figure 3.12. Although there is a significant discrepancy in the high-frequency magnitude and phase, little signal power is concentrated at these frequencies, so the impact on equalization performance is limited. This is best illustrated in the time domain (see figure 3.13), where the reduction of the ISI and the associated PMR are essentially equivalent in each case.

The conclusion is that the dependence of the FFE performance on α is complex

and needs to be carefully considered in the FFE design. The FFE performance

dependence on α is given additional consideration in §4.4.3.


3.3 Summary

In this chapter, we considered various analog delay approximations for FFEs. We

determined that the first-order Pade delay provides a good design choice in terms of

performance versus the implementation complexity. In addition, we showed that the

performance of all first-order delays in FFEs is equivalent in terms of the achievable

FFE transfer functions. The drawback was shown to be the increase in the coefficient

spread. For the example fifth-order FFE with α = 1/3, the equalization performance

degradation as compared to the Pade delay was demonstrated to be marginal. We

leverage this in the implementation of the single-path Pade-inspired delay introduced

in §5.3.1 and utilized in the proof-of-concept FFE design.

Chapter 4

Analog FFEs in high-speed links

4.1 FFE design parameters

The design space of generic FFEs is high dimensional and includes the following

parameters.

1. Delay type. The impact of delay type (i.e., ideal, Bessel, and Pade) on FFE performance is investigated in §4.3. The ideal delay outperforms all other delays, but has no exact lumped-circuit realization. The Pade delays outperform all Bessel delays in the example simulations, and the third-order Pade delay approaches the performance of the ideal delay. At the optimal delay time, the first-order Pade delay performs close to the optimum which, along with the potential for simplicity in circuit realization, makes it a suitable design choice. The first-order Pade delay is therefore chosen for the simulations of all of the following FFE parameters.

2. Delay time. The impact of delay time on FFE performance is investigated in §4.4. In §4.4.1, a mathematical analysis shows that the optimal delay time should depend on the channel more than on the system baud rate. This is in opposition to the commonly held view that the delay time should be related to the UI¹. This analysis is supported by simulations in §4.4.2, which characterize the optimal delay time of first-order Pade delays for various channels. In §4.4.3, the FFE performance is simulated versus the pole and zero ratio, α, for the cases of constant τ and constant group delay, τg. A reasonable design choice is τ = 25 ps, which is used unless otherwise stated.

3. Number of taps. The impact of the number of taps, n, on FFE performance is investigated in §4.5. Because the power is directly proportional to this parameter, it is a critical choice in the FFE design. There are diminishing returns on DR improvement for n > 3, but the sensitivity to delay time decreases with additional taps. For the proof-of-concept design in this work, we choose a 5-tap FFE to maximize the equalization performance, but the remainder of the simulations in this chapter compare 3-tap, 4-tap, and 5-tap FFEs to demonstrate the impact of more aggressive designs on parameter sensitivity.

4. Parasitic pole frequency. The impact of the parasitic pole frequency on

FFE performance is investigated in §4.6. Each node in the FFE implementation

introduces an unwanted parasitic pole, limiting the equalization performance.

Increasing the frequency of this pole improves the performance, but comes with

a cost of power (i.e. increased drive strength) or area (i.e. peaking inductors).

Therefore, this is a critical design parameter and its impact on FFE performance

must be well understood. Based on the simulations, a reasonable design target

is fp = 20 GHz with substantial performance degradation for fp < 10 GHz.

5. Coefficient resolution. The impact of coefficient resolution on FFE performance is investigated in §4.7. Excessive coefficient resolution complicates the design and directly increases the parasitic capacitance at the delay output node. Insufficient resolution results in suboptimal equalization performance. There are diminishing returns for bits ≥ 5, suggesting that 5 bits (plus sign) is a reasonable design choice. An aggressive design may pursue even lower resolution coefficients to push the power and bandwidth performance.

¹The optimal delay time is related to the UI when the objective is to completely equalize the channel, as it is in [3, 11]. When only partial equalization is the objective, the channel characteristics dominate and dictate the optimal delay time.

6. Main cursor attenuation. The impact of main cursor attenuation on FFE performance is investigated in §4.8. Attenuation is an unavoidable consequence of FFEs with unity-gain limited coefficients and, as a parameter, it trades with equalization performance and is a function of all the aforementioned design parameters. In particular, it is a parameter in the coefficient optimization and a strong function of the channel characteristics as well as the target DR improvement. The trade-off between DR improvement and main cursor attenuation is the focus of §4.8. The performance degrades rapidly when the main cursor amplitude is constrained to be greater than 1/2. For this reason, the main cursor attenuation is limited to 1/2 (i.e., max(po) ≥ 1/2) in all other simulations (see §4.2 for details).

The design space is too complicated to cover completely. The simulations and plots

in this chapter represent a succinct subset of the possibilities to highlight the general

trade-offs and guide the design decisions for this work.

4.2 Simulation methodology

For each simulation, the optimal coefficients, c, are determined using the methods detailed in appendix A. To summarize the methodology (unless otherwise stated), the coefficient magnitudes are bounded by unity, the main cursor tap is fixed at unity, the main cursor is tap number 2, and the main cursor attenuation at the FFE output is limited to 1/2. The pulse at the input of the FFE is defined as pi and at the output as po. As mathematical expressions, these constraints can be stated as

$$|c_k| \le 1 \quad \text{for all } k \tag{4.1}$$

$$c_2 = 1 \tag{4.2}$$

$$\max(p_o) \ge \frac{1}{2}. \tag{4.3}$$
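A minimal sketch of checking constraints (4.1) to (4.3) for a candidate solution is shown below. It assumes the output pulse is available as a list of samples, so the maximum is taken over samples rather than over continuous time; the function name and both example pulses are hypothetical.

```python
def satisfies_constraints(c, p_out, main_tap=2, min_peak=0.5, tol=1e-12):
    """Check (4.1)-(4.3) for coefficients c and a sampled output pulse p_out.

    main_tap is 1-indexed to match the text (the main cursor is tap 2).
    """
    bounded = all(abs(ck) <= 1 + tol for ck in c)   # (4.1)
    unity_main = abs(c[main_tap - 1] - 1) <= tol     # (4.2)
    enough_peak = max(p_out) >= min_peak             # (4.3)
    return bounded and unity_main and enough_peak


coeffs = [0.286, 1.0, -0.857, -0.857, 0.571]  # the 5-tap example coefficients of (3.25)
p_out_ok = [0.05, 0.62, 0.10, -0.02]          # hypothetical sampled output pulse
p_out_weak = [0.05, 0.41, 0.10, -0.02]        # hypothetical pulse attenuated below 1/2
print(satisfies_constraints(coeffs, p_out_ok))    # True
print(satisfies_constraints(coeffs, p_out_weak))  # False
```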

For the input pulse, pi, we choose the pulse response of the 1.09 m FR4 PCB trace from [2] plotted in figure 2.2. Due to the long trace, there is significant dispersion for this channel, and the PMR for this pulse is

$$\mathrm{PMR}_{ch} = 6.0. \tag{4.4}$$

The minimum PMR of the FFE output pulse for which the constraints are satisfied is defined as PMReq, which is found with the methods in appendix A. Using this term, the performance metric defined in (2.27) is

$$\text{DR Improvement} = \frac{\mathrm{PMR}_{ch}}{\mathrm{PMR}_{eq}}. \tag{4.5}$$

Because PMReq is the minimum and PMRch is a constant, the DR improvement is

maximized under the constraints. For each point on the plots in this chapter, the

optimal FFE coefficients are found to determine the optimum DR improvement for

the given FFE parameters.

4.3 Delay type

For the 3-tap FFE in figure 4.1, the simulation methodology in §4.2 is repeated for delay times from 10 ps to 100 ps for the ideal delay and for Bessel and Pade delays of orders 1, 2, and 3. The result is plotted in figure 4.2.


Figure 4.1: The 3-tap FFE block diagram (input pi, output po, coefficients c1, 1, and c3) with variable delay type and delay time, τ, for the MATLAB simulation of the optimal delay time.

Figure 4.2: DR improvement versus delay time (ps) for a 3-tap FFE with Bessel and Pade delay types of order 1 (solid), 2 (dashed), and 3 (dotted), compared with the ideal delay.

For this simulation, the optimal delay time is approximately 30 ps, independent of delay type. The first-order Pade delay outperforms all Bessel delays, and the ideal delay places an upper bound on the performance. At the optimal delay time of 30 ps, the performance of the first-order Pade delay is within 5% of that of the ideal delay. This fact, along with the potential for simplicity in circuit realization, makes it a good design choice. The first-order Pade delay is therefore chosen for the simulations in the following sections.

4.4 Delay time

4.4.1 Mathematical analysis

In §2.3.1 we defined the pulse response in (2.17), which we repeat here for convenience:

$$p(t) = \mathrm{rect}(t/T) * h(t). \tag{4.6}$$


From this equation, we see that the pulse response depends on both the impulse response of the channel, h(t), and the transmitted pulse width, T. There are two interesting limiting cases to consider: low baud rate and high baud rate. For the low baud rate case, the transmitted pulse width, T, is long compared to the length of the impulse response. Therefore, a reasonable approximation is that h(t) → δ(t) and

$$p_l(t) \approx \mathrm{rect}(t/T) * \delta(t) = \mathrm{rect}(t/T). \tag{4.7}$$

This is intuitive in that, for a low baud rate, a transmitted pulse passes unimpeded through the channel. The second limiting case is for a high baud rate, where the pulse width, T, is sufficiently narrow that we can make the approximation rect(t/T) → δ(t) (up to an amplitude scale factor of T, which does not change the pulse shape) and

$$p_h(t) \approx \delta(t) * h(t) = h(t). \tag{4.8}$$

A consequence is that, for a sufficiently high baud rate, the pulse response approaches

the impulse response and is independent of the baud rate. This is supported by visual

inspection of the pulse responses in figure 2.2. A broadening of the transmit pulse

width will result in modest changes to the pulse response due to the substantial

dispersion of the channel relative to the baud samples. Therefore, for high baud rate

systems, such as the 20 GBd target for this work, the performance of the FFE in

terms of the delay time is determined by the channel characteristics more so than

the baud rate. In §4.4.2, we support this analysis with MATLAB simulations that

demonstrate that the optimum delay time is a function of the channel characteristics.
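The two limits of (4.7) and (4.8) can be checked numerically with a toy channel. The sketch below assumes a hypothetical first-order impulse response h(t) = e^(−t/τc)/τc with unit area and τc = 10 ps, which is far simpler than the measured PCB channels of figure 2.2, and a causal rect spanning [0, T), which only shifts the pulse in time. For this h(t), the convolution (4.6) has a closed form. As T → 0 the pulse approaches T·h(t), so h(t) is recovered up to the amplitude factor T.

```python
import math

TAU_C = 10e-12  # hypothetical channel time constant (10 ps), illustration only


def h(t):
    """Toy first-order channel impulse response with unit area."""
    return math.exp(-t / TAU_C) / TAU_C if t >= 0 else 0.0


def pulse(t, T):
    """Closed-form p(t) = rect * h for a unit-height rect on [0, T)."""
    if t < 0:
        return 0.0
    if t < T:
        return 1.0 - math.exp(-t / TAU_C)
    # Difference of decaying exponentials avoids overflow when T >> TAU_C.
    return math.exp(-(t - T) / TAU_C) - math.exp(-t / TAU_C)


# Low baud rate (T = 100 * TAU_C): the mid-pulse value is essentially 1, i.e. p ~ rect(t/T).
T_low = 100 * TAU_C
print(pulse(T_low / 2, T_low))

# High baud rate (T = 0.01 * TAU_C): after the pulse ends, p(t)/T tracks h(t).
T_high = 0.01 * TAU_C
t = 2 * TAU_C
print(pulse(t, T_high) / (T_high * h(t)))  # close to 1
```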

Delay characteristics other than the group delay impact the performance and must be considered. The best example of this is a consequence of the first-order delay equivalence theorem in §3.2. Because FFEs with delays of the form

$$D_\alpha(s) = \frac{1 - \frac{1}{2}\alpha s\tau}{1 + \frac{1}{2}s\tau} \tag{4.9}$$

can achieve identical transfer functions, their achievable DR improvement is indistinguishable (neglecting limits on coefficient range and resolution), but the low-frequency group delay of Dα(s) is

$$\tau_g = \frac{1}{2}(1 + \alpha)\tau, \tag{4.10}$$

which is an affine function of α. For α = 1 (i.e., the first-order Pade delay), τg = τ, but for all other α the τ in the delay definition does not equal the group delay. As a result, the optimal delay time depends not only on the channel characteristics but also on the delay type. Therefore, a careful study of the FFE performance should be made for the anticipated channel characteristics and the specific delay type chosen. In §4.4.3, the DR improvement versus α is simulated for the cases of constant τ and constant τg.
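The group delay in (4.10) can be verified numerically. The sketch below evaluates Dα(jω) from (4.9), takes a central difference of its phase, and compares against a closed form obtained by differentiating the phase −atan(½αωτ) − atan(½ωτ) of the zero and the pole. The function names are hypothetical and τ = 25 ps is taken from the text. Near DC the result reduces to ½(1 + α)τ; at higher frequencies the group delay rolls off, as in figure 3.10.

```python
import cmath
import math


def d_alpha(s, alpha, tau):
    """First-order delay of (4.9): (1 - alpha*s*tau/2) / (1 + s*tau/2)."""
    return (1 - 0.5 * alpha * s * tau) / (1 + 0.5 * s * tau)


def group_delay_numeric(alpha, tau, f, df=1e4):
    """Group delay -d(phase)/d(omega) by a central difference around frequency f (Hz)."""
    w_lo, w_hi = 2 * math.pi * (f - df), 2 * math.pi * (f + df)
    ph_lo = cmath.phase(d_alpha(1j * w_lo, alpha, tau))
    ph_hi = cmath.phase(d_alpha(1j * w_hi, alpha, tau))
    return -(ph_hi - ph_lo) / (w_hi - w_lo)


def group_delay_exact(alpha, tau, f):
    """Closed form: the right-half-plane zero and the pole each add a positive term."""
    w = 2 * math.pi * f
    return ((0.5 * alpha * tau) / (1 + (0.5 * alpha * w * tau) ** 2)
            + (0.5 * tau) / (1 + (0.5 * w * tau) ** 2))


TAU = 25e-12
for a in (0.0, 1 / 3, 1.0):
    near_dc = group_delay_numeric(a, TAU, 1e6) * 1e12   # in ps
    at_10g = group_delay_exact(a, TAU, 10e9) * 1e12     # in ps
    print(f"alpha={a:.3f}: {near_dc:.2f} ps near DC, {at_10g:.2f} ps at 10 GHz")
```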

4.4.2 Channel characteristic dependence

In §4.4.1, a mathematical analysis shows that the optimal delay time should depend

on the channel more so than the system baud rate. The actual value of the optimal

delay time for a given channel is difficult to determine mathematically. Instead, it is

easiest to simulate the FFE performance with realistic channel pulses.

The block diagram for the 2-tap FFE used in this simulation is shown in figure

4.3. The first tap is chosen as the main cursor so that the second coefficient, c2, is

adapted to minimize the post-cursor ISI (i.e. maximize DR improvement). The delay

is a first-order Pade delay with the delay time, τ, swept from 10 ps to 100 ps.

Figure 4.3: Block diagram of a 2-tap FFE with first-order Pade delays, Dpd1(s); variable delay time, τ; main-cursor coefficient 1; and optimal coefficient, c2.

Figure 4.4: DR improvement versus delay time (ps) for the 2-tap FFE in figure 4.3 for various channel pulse inputs (0.76 m MEGTRON6, 0.76 m FR4, and 1.09 m FR4).

In essence, this simulation subtracts a Pade-delayed pulse from the original pulse to maximally cancel the post-cursor ISI. The simplicity helps to isolate the impact of the channel characteristics on the optimal delay time without complications from the other FFE parameters. For the channels, we use those introduced in §2.1.4, with the pulse responses in figure 2.2 [2]:

• 0.76 m PCB trace with MEGTRON6 dielectric

• 0.76 m PCB trace with FR4 dielectric

• 1.09 m PCB trace with FR4 dielectric.

The optimum DR improvement versus delay time for these channel pulses is plotted in figure 4.4. Higher channel attenuation results in a larger optimal delay time: the post-cursor ISI of a more attenuated channel extends further in time, so the subtracted pulse is most effective when it is delayed to coincide with this additional post-cursor ISI. In addition, the optimum DR improvement increases with the channel attenuation because there is more ISI and, therefore, more equalization is possible. As a result, the FFE is most effective in systems with high channel attenuation.


The conclusion from these simulations is that the optimal delay time is closely

related to the channel characteristics. It is shown in §4.5 that this can be somewhat

mitigated by adding FFE taps, but an accurate choice or adaptability of the delay

time is still a necessity for optimum FFE performance.

4.4.3 First-order delays

In this section, we investigate the impact of delay time on FFE performance with respect to the first-order pole and zero offset, α. For the first simulation, τ = 25 ps, so that the group delay varies with α as

$$\tau_g = \frac{1}{2}(1 + \alpha) \cdot 25\ \text{ps}. \tag{4.11}$$

Using the methodology in §4.2 the optimum DR improvement is found for FFEs of

order 3, 4, and 5 with delay pole and zero offset in the range 0 ≤ α ≤ 1. The result

is plotted in figure 4.5. While the performance is relatively consistent versus α for

n = 4 and n = 5, the n = 3 case shows significant degradation for small α.

For the second simulation, τg = 25 ps, which is achieved by changing τ as a function of α according to

$$\tau = \frac{25\ \text{ps}}{\frac{1}{2}(1 + \alpha)}. \tag{4.12}$$

Using the methodology in §4.2, the optimum DR improvement is found for FFEs of order 3, 4, and 5 with delay pole and zero offsets in the range 0 ≤ α ≤ 1, and the result is plotted in figure 4.6. The n = 3 case from this simulation outperforms that of the previous simulation, but it again substantially underperforms the n = 4 and n = 5 cases. In this simulation, there is also a greater dependence on α for the n = 4 and n = 5 cases.
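To make (4.12) concrete, the short sketch below computes the τ needed to hold τg = 25 ps for the three delays used in chapter 3; `tau_for_group_delay` is a hypothetical helper that simply inverts (4.10). Holding the group delay constant requires τ to double as α goes from 1 to 0.

```python
def tau_for_group_delay(tau_g, alpha):
    """Invert (4.10), tau_g = (1 + alpha) * tau / 2, as in (4.12)."""
    return tau_g / (0.5 * (1 + alpha))


TAU_G = 25e-12
for a in (0.0, 1 / 3, 1.0):
    print(f"alpha={a:.3f}: tau = {tau_for_group_delay(TAU_G, a) * 1e12:.1f} ps")
# alpha=0.000: tau = 50.0 ps
# alpha=0.333: tau = 37.5 ps
# alpha=1.000: tau = 25.0 ps
```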


Figure 4.5: DR improvement versus pole and zero ratio, α, for τ = 25 ps (n = 3, 4, and 5).

Figure 4.6: DR improvement versus pole and zero ratio, α, for τg = 25 ps (n = 3, 4, and 5).

In both simulations, the optimum occurs at α = 1, but the 5-tap FFE limits the performance degradation for α < 1. Over a wide range of α, the 5-tap FFE outperforms the 4-tap FFE with α = 1. Therefore, an FFE using delays implemented with α < 1 can recover the performance by adding a tap. This technique is used in the proof-of-concept design in this work, where delays with α = 1/3 are employed in a 5-tap FFE.

4.5 Number of taps

The number of taps in the FFE, n, determines the number of delays, coefficients,

and inputs in the summing circuit. Therefore, n is directly proportional to the FFE

power and must be carefully considered in the design process. Figure 4.7 shows the

block diagram of an n-tap FFE with first-order Pade delays and variable delay time, τ. To investigate the impact of n on FFE performance, the simulation methodology

in §4.2 is repeated for τ from 10 ps to 100 ps for 3-tap, 4-tap, and 5-tap FFEs. The

result is plotted in figure 4.8.

Figure 4.7: Block diagram of an n-tap FFE with first-order Pade delays, Dpd1(s); variable delay time, τ; and optimal coefficients, c1 to cn.

Figure 4.8: DR improvement versus delay time (ps) for 3-tap, 4-tap, and 5-tap FFEs with first-order Pade delays.

An (n + 1)-tap FFE can realize any transfer function that an n-tap FFE can by simply setting $c_{n+1} = 0$. Therefore, performance can only improve with the number of taps, which is unsurprisingly the case in figure 4.8. There is not a significant gain in performance for n > 3, though. The benefit of increasing n is that the sensitivity to delay time decreases. For n = 3, only a very narrow range of delay times achieves the optimum DR improvement, but the range over which the n > 3 cases meet or exceed that optimum is much wider. This insensitivity can be explained by considering the n = 5 case. With c4 = c5 = 0, the FFE reduces to a 3-tap FFE with delay time τ, but with c2 = c4 = 0, it reduces to a 3-tap FFE with delay time 2τ. By interpolating between these cases, an effective delay time in the range of τ to 2τ can be obtained.

4.6 Parasitic pole frequency

Low-frequency parasitic poles can limit the equalization performance, but increasing the pole frequency comes at a cost of power (i.e., increased drive strength) or area (i.e., peaking inductors). Therefore, this is a critical design parameter and its impact on FFE performance must be well understood. Figure 4.9 shows the block diagram of an n-tap FFE with first-order Pade delays with constant τ = 25 ps and a variable parasitic pole frequency, fp, at each node. The simulation methodology in §4.2 is repeated for fp from 1 GHz to 100 GHz for 3-tap, 4-tap, and 5-tap FFEs. The result is plotted in figure 4.10. The FFE performance increases monotonically with fp, as expected. For fp < 10 GHz, the performance falls off substantially. Based on these results, a reasonable design target is fp = 20 GHz, above which there are diminishing performance benefits.

Figure 4.9: Block diagram of an n-tap FFE with first-order Pade delays, Dpd1(s), with τ = 25 ps and a variable parasitic pole frequency, fp, at each node.

Figure 4.10: DR improvement versus bandwidth, fp (GHz), for 3-tap, 4-tap, and 5-tap FFEs with first-order Pade delays.
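To give a feel for these numbers, the sketch below assumes that each parasitic pole is a single real pole, P(s) = 1/(1 + s/(2πfp)), and evaluates it at 10 GHz, the Nyquist frequency of the 20 GBd target. Both the single-pole model and the choice of evaluation frequency are assumptions for illustration, as is the three-node cascade in the printout; the simulations behind figure 4.10 model the complete FFE.

```python
import math


def pole_response(f, fp):
    """Gain and phase (degrees) of an assumed single parasitic pole 1/(1 + j f/fp)."""
    h = 1 / complex(1, f / fp)
    return abs(h), math.degrees(math.atan2(h.imag, h.real))


F_NYQ = 10e9  # Nyquist frequency for the 20 GBd target
for fp in (5e9, 10e9, 20e9, 40e9):
    gain, phase = pole_response(F_NYQ, fp)
    # A signal that traverses k such nodes sees gain**k and k * phase.
    print(f"fp = {fp / 1e9:4.0f} GHz: |P| = {gain:.3f}, phase = {phase:6.1f} deg, "
          f"3 nodes: {gain ** 3:.3f}")
```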

4.7 Coefficient resolution

Excessive coefficient resolution complicates the design and directly increases the parasitic capacitance at the delay output. Insufficient resolution results in suboptimal FFE performance. The block diagram for this simulation is similar to the n-tap FFE in figure 4.7, but with constant τ = 25 ps and variable coefficient resolution in bits.


Figure 4.11: DR improvement versus coefficient resolution (bits, plus sign bit) for 3-tap, 4-tap, and 5-tap FFEs with first-order Pade delays.

Figure 4.12: DR improvement versus main cursor amplitude for 3-tap, 4-tap, and 5-tap FFEs with first-order Pade delays.

The least-significant bit of each coefficient as a function of the resolution in bits is

$$\text{LSB} = \frac{1}{2^{\text{bits}}} \tag{4.13}$$

and the optimum coefficients are found through a brute-force sweep as outlined in appendix A.2. The simulation methodology in §4.2 is repeated for coefficient resolutions from 1 to 5 bits (plus sign bit) for 3-tap, 4-tap, and 5-tap FFEs. The result is plotted in figure 4.11.

There are diminishing returns for bits ≥ 3 suggesting that 3 bits (plus sign) is a

reasonable design choice for an aggressive design that pushes the power and bandwidth

performance. For the proof-of-concept design in this work, we choose a resolution of

5 bits (plus sign) to provide some design margin.
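A minimal sketch of the coefficient grid implied by (4.13) follows. It assumes sign-magnitude coefficients limited to |c| ≤ 1 as in (4.1), with full scale included as a level, and uses plain rounding to the nearest level. This rounding is only an illustration; the optimum quantized coefficients in this chapter come from the brute-force sweep of appendix A.2. The grid also bounds the coefficient spread at 2^bits, since the smallest nonzero magnitude is one LSB.

```python
def quantize(coeffs, bits):
    """Round each coefficient to the nearest multiple of LSB = 2**-bits, clipped to [-1, 1]."""
    lsb = 2.0 ** -bits
    return [max(-1.0, min(1.0, round(c / lsb) * lsb)) for c in coeffs]


def max_spread(bits):
    """Largest possible spread on the grid: full scale (1) over one LSB."""
    return 2 ** bits


c = [0.286, 1.0, -0.857, -0.857, 0.571]  # the 5-tap example coefficients of (3.25)
print(quantize(c, 3))  # [0.25, 1.0, -0.875, -0.875, 0.625]
print(max_spread(5))   # 32
```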


4.8 Main cursor attenuation

Attenuation is an unavoidable consequence of FFEs with unity-gain limited coefficients. It is a function of all the other FFE parameters and can be traded for equalization performance. In particular, it is a parameter in the coefficient optimization (see appendix A) and a strong function of the channel characteristics as well as the target DR improvement. The block diagram for this simulation is similar to the n-tap FFE in figure 4.7, but with constant τ = 25 ps. The optimal coefficients are found using the method in appendix A.3, where the threshold in the constraint on the main cursor amplitude (i.e., max(po) ≥ threshold) is swept. This process is repeated for 3-tap, 4-tap, and 5-tap FFEs. The result is plotted in figure 4.12.

The performance degrades rapidly for output main cursor amplitudes greater than 1/2. For this reason, the threshold is fixed at 1/2 in all other simulations in this chapter, as outlined in §4.2. This value is a strong function of the channel, with a higher attenuation required to equalize channels with low ISI. Therefore, the FFE performs best in systems with substantial ISI, which is the target of this work.

4.9 Summary

In this chapter, we used MATLAB simulations to investigate the impact of various

FFE design parameters on the equalization performance. The following conclusions

are consequences of these simulations:

• First-order delays with pole and zero offsets less than one have limited equalization performance degradation as compared to Pade delays for 5-tap FFEs.

• A 5-tap FFE provides robustness to delay time and channel characteristics.

• The parasitic pole frequency at each node in the FFE should exceed 20 GHz.


• A coefficient resolution of 5 bits plus sign is sufficient to achieve the maximum

FFE performance with some design margin.

• A delay time of 25 ps is a reasonable design choice for < 1 m PCB traces in FR4

dielectric.

These results are used to guide the architecture choices in chapter 5 and the design

decisions in chapter 6.

Chapter 5

Inverter-based FFE

5.1 Analog-inverter transconductor

The analog-inverter transconductor is the fundamental building block of the inverter-based FFE. As depicted in figure 5.1, the circuit architecture of the inverter transconductor is identical to that of a digital inverter, but the operating point is constrained to the saturation region in the small-signal range about mid-supply. In this region, the block behaves as a linear transconductor with

$$i_o = G_m v_i \tag{5.1}$$

where the total transconductance is the sum of the NMOS and PMOS small-signal transconductances:

$$G_m = g_{m,n} + g_{m,p}. \tag{5.2}$$

Due to the class AB operation, this transconductor is efficient in terms of noise, power,

and bandwidth while maintaining reasonable linearity performance.

This transconductor is a versatile building block and can be used to realize many functions (see figure 5.2) [21]. A simple configuration is the unity-gain stage shown in figure 5.2(a), where the self-biased load transconductor behaves as a small-signal resistive load. This stage is modified to realize the FFE summing circuit (see figure 5.2(b)), coefficients (see figure 5.2(c)), and first-order Pade delays (see figure 5.2(d)).

Figure 5.1: (a) The analog-inverter transconductor, io = Gm vi, and (b) the associated transistor-level schematic diagram, with Gm = gm,n + gm,p.

Figure 5.2: Example circuits using the analog-inverter transconductor: (a) unity gain, vo = vi; (b) summing, vo = vi1 + vi2; (c) coefficient, vo = a vi, using a transconductor scaled to aGm; and (d) delay, vo = Dpd1(s) vi, using a capacitor C.

The capacitor in the delay implements both the pole and the zero, and the transfer function in the absence of parasitics is

$$D_{pd1}(s) = \frac{1 - s\frac{C}{G_m}}{1 + s\frac{C}{G_m}} \tag{5.3}$$

where

$$\tau = \frac{2C}{G_m}. \tag{5.4}$$
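A short sizing sketch based on (5.3) and (5.4): for the 25 ps delay time chosen in chapter 4 and a hypothetical transconductance of 10 mS (not a value from this design), it computes the required capacitor and confirms the all-pass property, i.e. unity magnitude at any frequency. The helper names are hypothetical.

```python
import cmath
import math


def delay_capacitance(tau, gm):
    """Capacitance giving delay time tau for transconductance gm, from (5.4)."""
    return tau * gm / 2


def d_pd1(s, c, gm):
    """First-order Pade delay of (5.3), without parasitics."""
    x = s * c / gm
    return (1 - x) / (1 + x)


GM = 10e-3    # hypothetical Gm = 10 mS
TAU = 25e-12  # the 25 ps delay time chosen in chapter 4
C = delay_capacitance(TAU, GM)
print(f"C = {C * 1e15:.0f} fF")  # C = 125 fF

# All-pass check at 10 GHz: magnitude is 1, only the phase changes.
h = d_pd1(1j * 2 * math.pi * 10e9, C, GM)
print(f"|D| = {abs(h):.6f}, phase = {math.degrees(cmath.phase(h)):.1f} deg")
```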

In reality, the transconductor has non-zero output conductance

$$G_o = g_{ds,n} + g_{ds,p}, \tag{5.5}$$

input capacitance¹

$$C_i = C_{gs,n} + C_{gs,p}, \tag{5.6}$$

and output capacitance

$$C_o = C_{dg,n} + C_{db,n} + C_{dg,p} + C_{db,p}. \tag{5.7}$$

¹The Miller capacitance due to Cgd depends on the transconductor load and is therefore not considered here. For the unity-gain configuration, a capacitance of 2(Cgd,n + Cgd,p) should be added to the input capacitance.

The intrinsic gain of the transconductor is defined as

$$A_i = \frac{G_m}{G_o} = \frac{g_{m,n}}{g_{ds,n}}\,\theta_i + \frac{g_{m,p}}{g_{ds,p}}\,(1 - \theta_i) \tag{5.8}$$

where we defined the interpolation parameter

$$\theta_i = \frac{g_{ds,n}}{g_{ds,n} + g_{ds,p}} \le 1. \tag{5.9}$$

The intrinsic gain of the transconductor is an interpolation between the intrinsic gains

of its transistors.
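A quick numerical check of the interpolation in (5.8) and (5.9), using hypothetical small-signal values chosen only for illustration (the NMOS and PMOS intrinsic gains are 6 and 5 here). The direct ratio Gm/Go and the interpolated form agree, and the result lies between the two device gains.

```python
def intrinsic_gain(gm_n, gds_n, gm_p, gds_p):
    """Return (Gm/Go, interpolated form of (5.8)) for an inverter transconductor."""
    direct = (gm_n + gm_p) / (gds_n + gds_p)
    theta_i = gds_n / (gds_n + gds_p)  # (5.9)
    interp = (gm_n / gds_n) * theta_i + (gm_p / gds_p) * (1 - theta_i)
    return direct, interp


# Hypothetical small-signal values in siemens, for illustration only.
direct, interp = intrinsic_gain(gm_n=6e-3, gds_n=1.0e-3, gm_p=4e-3, gds_p=0.8e-3)
print(round(direct, 3), round(interp, 3))  # 5.556 5.556
```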

The gain stage in figure 5.2(a) is a building block common to all the blocks in the

inverter-based FFE, and its performance determines the overall performance of the

FFE. Therefore, we investigate the details of this circuit in §5.2.

5.2 Unity-gain stage

Figure 5.3 shows the schematic diagram of the unity-gain stage with the parasitics explicitly represented. The total parasitic output conductance is

$$G_{o,tot} = 2G_o \tag{5.10}$$

and the output capacitance is

$$C_{o,tot} = C_i + 2C_o. \tag{5.11}$$

Figure 5.3: Schematic diagram of the unity-gain stage with parasitics (input and load transconductors Gm, input capacitance Ci, and output parasitics Go,tot and Co,tot).

In this section, we investigate the behavior of this circuit to aid in understanding the performance of the overall FFE in §5.6. First, we look at the linear behavior: the gain in §5.2.1 and the bandwidth in §5.2.2. Next, we investigate the statistical behavior: the noise in §5.2.3 and the mismatch in §5.2.4. Finally, we give an overview of the nonlinear behavior: supply rejection in §5.2.5 and nonlinearity in §5.2.6.

5.2.1 Gain

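The small-signal gain is

$$A = \frac{v_o}{v_i} = \frac{G_m}{G_m + 2G_o} = \frac{1}{1 + 2A_i^{-1}}. \tag{5.12}$$

The finite Ai due to the non-zero Go reduces the gain below unity. To compensate for this, the load transconductor is scaled by a factor β. Because scaling the load scales both its transconductance and its output conductance, the total output conductance becomes βGm + (1 + β)Go, and setting the resulting gain, Gm/(βGm + (1 + β)Go), equal to unity gives

$$\beta = \frac{A_i - 1}{A_i + 1}. \tag{5.13}$$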


For the TSMC40 GP process with minimum length devices, Ai ≈ 6 which corresponds

to β = 0.71. This scale-factor can be achieved through various means. One option

is to reduce the load device widths, but this reduces the matching between the input

and load as well as complicates the layout. A more robust method is to use a tunable

load transconductor controlled by a calibration circuit, which is discussed in §6.1.
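The compensation in (5.13) can be checked numerically. The sketch below assumes, as in the derivation of (5.13), that scaling the load transconductor by β scales both its Gm and its Go, and uses the quoted intrinsic gain Ai ≈ 6; the function names are hypothetical. Without scaling, the gain is 0.75, matching (5.12); with β = 5/7 ≈ 0.71, it returns to unity.

```python
def stage_gain(a_i, beta=1.0):
    """Unity-gain-stage gain Gm / (beta*Gm + (1 + beta)*Go), written with Ai = Gm/Go.

    Assumes scaling the load transconductor by beta scales both its Gm and its Go.
    """
    return a_i / (beta * a_i + 1 + beta)


def load_scale(a_i):
    """Load scale factor of (5.13)."""
    return (a_i - 1) / (a_i + 1)


A_I = 6.0  # approximate intrinsic gain quoted for minimum-length TSMC40 GP devices
beta = load_scale(A_I)
print(round(stage_gain(A_I), 3))        # 0.75, i.e. (5.12) with beta = 1
print(round(beta, 3))                   # 0.714
print(round(stage_gain(A_I, beta), 6))  # 1.0
```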

5.2.2 Bandwidth

Maximizing the bandwidth is crucial to maximizing the FFE performance potential, as demonstrated in §4.6. For optimal FFE performance, the bandwidth of each internal node should be greater than 20 GHz, but using inductors to resonate the parasitic capacitance results in a substantial area penalty (see [11]). The bandwidth of the unity-gain stage is instead set by the transit frequency of the process, so a wide bandwidth is achieved without the use of inductors.

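The unloaded bandwidth of the gain stage is

$$\omega_c = \frac{G_m + 2G_o}{C_i + 2C_o} \approx \frac{g_{m,n} + g_{m,p}}{C_{gs,n} + C_{gs,p}} = \theta_c\,\omega_{T,n} + (1 - \theta_c)\,\omega_{T,p} \tag{5.14}$$

where we defined the interpolation parameter

$$\theta_c = \frac{C_{gs,n}}{C_{gs,n} + C_{gs,p}} \le 1 \tag{5.15}$$

and the NMOS and PMOS transit frequencies

$$\omega_{T,n} = \frac{g_{m,n}}{C_{gs,n}} \tag{5.16}$$

$$\omega_{T,p} = \frac{g_{m,p}}{C_{gs,p}}. \tag{5.17}$$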

The unloaded bandwidth is an interpolation between the transit frequencies of the transistors, which are set by their gm/ID factors. These factors are fixed by VGS − Vt, which is itself fixed by the supply voltage. Therefore, little can be done to increase the unloaded bandwidth. In a practical implementation, the gain stage is always loaded by the capacitance of the following stage. In the limiting case where this load capacitance, CL, is much greater than Co,tot, the bandwidth is

ωc =Gm

CL

=

(gm,n

IDS,n

+gm,p

ISD,p

)C−1

L

ITOT

2(5.18)

where increasing the power increases the speed. For this FFE design, the actual bandwidth depends on both the self-loading capacitance and the next-stage capacitance. The result is a trade-off between power and bandwidth at each node in the FFE, which is discussed in more detail in §5.3.1.
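The two bandwidth limits can be evaluated directly. The sketch below confirms that the direct ratio in (5.14) equals the $\theta_c$-weighted interpolation of the transit frequencies. It then shows the linear dependence of the heavily loaded bandwidth (5.18) on current. All device values are hypothetical and were chosen only to exercise the formulas. The equal split $I_{DS,n} = I_{SD,p} = I_{TOT}/2$ is an assumption, made because each of the two inverter transconductors draws half the stage current.

```python
import math


def unloaded_bandwidth(gm_n, gm_p, cgs_n, cgs_p):
    """Return the unloaded bandwidth (rad/s) computed two ways, eqs. (5.14)-(5.17)."""
    direct = (gm_n + gm_p) / (cgs_n + cgs_p)
    theta_c = cgs_n / (cgs_n + cgs_p)            # eq. (5.15)
    wt_n, wt_p = gm_n / cgs_n, gm_p / cgs_p      # eqs. (5.16)-(5.17)
    interpolated = theta_c * wt_n + (1 - theta_c) * wt_p
    return direct, interpolated


def loaded_bandwidth(gm_id_n, gm_id_p, i_tot, c_load):
    """Heavily loaded bandwidth (rad/s), eq. (5.18).

    Assumes each of the two inverter transconductors draws I_TOT/2,
    so that I_DS,n = I_SD,p = I_TOT/2.
    """
    return (gm_id_n + gm_id_p) * (i_tot / 2) / c_load


# Hypothetical values, not taken from the design.
direct, interp = unloaded_bandwidth(gm_n=3e-3, gm_p=3e-3, cgs_n=4e-15, cgs_p=8e-15)
print(f"unloaded f_c: {direct / (2 * math.pi) / 1e9:.1f} GHz "
      f"(interpolation gives {interp / (2 * math.pi) / 1e9:.1f} GHz)")

for i_tot in (0.5e-3, 1e-3, 2e-3):
    w_c = loaded_bandwidth(gm_id_n=8.0, gm_id_p=8.0, i_tot=i_tot, c_load=50e-15)
    print(f"I_TOT = {i_tot * 1e3:.1f} mA -> loaded f_c = {w_c / (2 * math.pi) / 1e9:.1f} GHz")
```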

5.2.3 Noise

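Assuming, for simplicity, that the output conductance is zero, the spot noise power is

$$\frac{v_{n,o}^2}{\Delta f} = \frac{v_{n,i}^2}{\Delta f} = 2\gamma_{eff}\,4kT\,G_m^{-1} = 2\,\frac{v_n^2}{\Delta f} \qquad (5.19)$$

where we defined the effective transconductor gamma factor

$$\gamma_{eff} = \theta_n\gamma_n + (1 - \theta_n)\gamma_p, \qquad (5.20)$$

the interpolation parameter

$$\theta_n = \frac{g_{m,n}}{g_{m,n} + g_{m,p}} \le 1, \qquad (5.21)$$

and the spot noise power of a single transconductor

$$\frac{v_n^2}{\Delta f} = \gamma_{eff}\,4kT\,G_m^{-1}. \qquad (5.22)$$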

The effective gamma factor is an interpolation between the NMOS and PMOS gamma factors. Reducing the noise requires increasing the transconductance. However, the $g_m/I_D$ factors are fixed by the supply voltage, and the device width ratio is fixed by the common mode. As a result, the only way to reduce the noise is to increase the power by scaling the NMOS and PMOS devices proportionally.

The total noise accumulates through the stages of the FFE. Depending on the

coefficient values, the noise can add constructively, compounding the problem. This

is discussed in §5.6.1.
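The scaling argument above can be made concrete with (5.19) to (5.22). In the sketch, the gamma values are placeholders. The 6.5 mS starting transconductance is borrowed from the post-layout value quoted in §6.1 purely for illustration. Scaling both devices together (and thus the power) multiplies $G_m$ and divides the noise power spectral density by the same factor.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant (J/K)


def gamma_eff(gm_n, gm_p, gamma_n, gamma_p):
    """Effective gamma factor, eqs. (5.20)-(5.21)."""
    theta_n = gm_n / (gm_n + gm_p)
    return theta_n * gamma_n + (1 - theta_n) * gamma_p


def transconductor_noise_psd(g_eff, gm, temp_k=300.0):
    """Input-referred spot noise of one transconductor in V^2/Hz, eq. (5.22)."""
    return g_eff * 4 * K_B * temp_k / gm


def stage_noise_psd(g_eff, gm, temp_k=300.0):
    """Output spot noise of the unity-gain stage in V^2/Hz, eq. (5.19)."""
    return 2 * transconductor_noise_psd(g_eff, gm, temp_k)


# Hypothetical gamma values; gm_n = gm_p mimics a balanced inverter.
g = gamma_eff(gm_n=3.25e-3, gm_p=3.25e-3, gamma_n=1.0, gamma_p=0.8)
for scale in (1, 2, 4):  # scale NMOS and PMOS widths (and current) together
    gm = scale * 6.5e-3
    vn = math.sqrt(stage_noise_psd(g, gm))
    print(f"{scale}x devices: Gm = {gm * 1e3:.1f} mS, v_n,o = {vn * 1e9:.2f} nV/sqrt(Hz)")
```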

5.2.4 Mismatch

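The threshold-voltage mismatch of the PMOS and NMOS transistors is [22]

$$\sigma_p = \frac{A_{VT,p}}{\sqrt{W_pL_p}} \qquad (5.23)$$

$$\sigma_n = \frac{A_{VT,n}}{\sqrt{W_nL_n}}. \qquad (5.24)$$

The total mismatch of the unity-gain stage is

$$\sigma_{tot} = \sqrt{\tfrac{1}{2}\sigma_p^2 + \tfrac{1}{2}\sigma_n^2}. \qquad (5.25)$$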

Mismatch between the positive and negative half-circuits limits the effectiveness of the

pseudo-differential implementation at combating the supply noise and second-order

nonlinearities. The total mismatch accumulates through the stages of the FFE. De-

pending on the coefficient values, the mismatch can add constructively, compounding

the problem. This is discussed in §5.6.2.
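The Pelgrom relations (5.23) to (5.25) imply that quadrupling the device area halves the mismatch. The sketch below illustrates this trade-off. The $A_{VT}$ values and the 40 nm length are placeholders, not process data. The widths follow the delay sizing of §6.1 for illustration only.

```python
import math


def sigma_vt(a_vt, w, l):
    """Pelgrom threshold-voltage mismatch A_VT / sqrt(W*L), eqs. (5.23)-(5.24).

    With A_VT in mV*um and W, L in um, the result is in mV.
    """
    return a_vt / math.sqrt(w * l)


def sigma_stage(sigma_p, sigma_n):
    """Total mismatch of the unity-gain stage, eq. (5.25)."""
    return math.sqrt(0.5 * sigma_p**2 + 0.5 * sigma_n**2)


A_VT_N, A_VT_P = 2.5, 2.5  # mV*um, hypothetical placeholders
L_UM = 0.04                # um, assumed minimum length
for area_scale in (1, 4):
    s_n = sigma_vt(A_VT_N, area_scale * 3.0, L_UM)
    s_p = sigma_vt(A_VT_P, area_scale * 6.0, L_UM)
    print(f"{area_scale}x area: sigma_tot = {sigma_stage(s_p, s_n):.2f} mV")
```

Because noise (§5.2.3) and mismatch both improve only through larger devices, the sizing in chapter 6 addresses them together.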

5.2.5 Supply rejection

In appendix F.1, we show that the single-ended supply rejection of the gain stage is

$$\frac{v_o}{v_{dd}} \approx \frac{2g_{m,p}}{g_{m,p} + g_{m,n}} \approx 1 \qquad (5.26)$$

which shows that the supply noise passes unimpeded to the output. To reduce the impact of this issue, the final FFE is implemented pseudo-differentially, which rejects the single-ended supply noise to the extent that the circuit is balanced.

A fully balanced pseudo-differential circuit completely rejects first-order supply noise, but this does not make it immune to supply noise. A second-order effect mixes the supply noise with the input signal. The details are discussed in appendix F.2, but the result is

$$v_{od} = v_{id} + a_{11}\,v_{id}\,v_{dd} \qquad (5.27)$$

where the conversion gain, $a_{11}$, is approximately

$$a_{11} \approx -\frac{g^{(p)}_{20}}{g^{(p)}_{10}} \approx -\frac{1}{4}\left(\frac{g_{m,p}}{I_{SD,p}}\right). \qquad (5.28)$$

The last expression is a square-law approximation, but it gives some intuition about the magnitude of the conversion gain. The extracted conversion gain is about 2× this result, but the linear dependence on the $g_m/I_D$ factor is reasonably accurate.
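To give a feel for the size of (5.27) and (5.28), the sketch below treats a slow supply deviation as a modulation of the differential gain. The $g_m/I_D$ value and the 10 mV deviation are hypothetical. The second case simply doubles $a_{11}$ to reflect the remark that the extracted value is about 2× the square-law estimate.

```python
def a11_square_law(gm_over_id_p):
    """Square-law estimate of the supply-mixing gain a11 in 1/V, eq. (5.28)."""
    return -0.25 * gm_over_id_p


def differential_gain(a11, v_dd_deviation):
    """Differential gain v_od/v_id implied by eq. (5.27) for a supply deviation in V."""
    return 1.0 + a11 * v_dd_deviation


gm_over_id = 10.0  # 1/V, hypothetical operating point
a11 = a11_square_law(gm_over_id)
for label, a in (("square law", a11), ("2x square law", 2 * a11)):
    g = differential_gain(a, 0.010)  # hypothetical 10 mV supply deviation
    print(f"{label}: a11 = {a:.2f} /V, differential gain = {g:.4f}")
```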

Due to the lack of first-order supply rejection in this circuit and the limitations

of the pseudo-differential implementation, some additional supply rejection circuitry

such as an LDO should be considered for a robust implementation. We did not

implement this in the proof-of-concept design due to time constraints.

5.2.6 Nonlinearity

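In appendix E, we show that the nonlinearity of the gain stage can be modeled as a Taylor series that is dominated by its first few terms

$$v_o \approx a_1v_i + a_2v_i^2 + a_3v_i^3 \qquad (5.29)$$

where

$$a_2 \approx \frac{G_{20}}{G_{10}}(1 + \beta) \qquad (5.30)$$

$$a_3 \approx \frac{G_{30}}{G_{10}}(1 - \beta) + \frac{G_{21}}{G_{10}}(1 + \beta) + a_2\frac{G_{20}}{G_{10}}(2\beta) \qquad (5.31)$$

and

$$G_{jk} = g^{(n)}_{jk} - (-1)^{j+k}g^{(p)}_{jk}. \qquad (5.32)$$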


Figure 5.4: Schematic diagram of the inverter-based first-order Pade delay.

The terms $g^{(n)}_{jk}$ and $g^{(p)}_{jk}$ are the Taylor series coefficients of the NMOS and PMOS transistors, as defined in (E.2) and (E.1), respectively. The pseudo-differential implementation suppresses the second-order nonlinearity, so the third-order term, $a_3$, dominates.
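The bookkeeping in (5.30) to (5.32) is easy to mis-sign by hand, so a small helper may be useful. This is a minimal sketch. The coefficient values are placeholders keyed by $(j, k)$, and the meaning of the indices follows (E.1) and (E.2), which are not reproduced here. Two sanity checks follow from (5.32). First, $G_{10}$ reduces to $g^{(n)}_{10} + g^{(p)}_{10}$, the total transconductance. Second, matched second-order coefficients make $G_{20}$, and therefore $a_2$, vanish.

```python
def g_combined(j, k, g_n, g_p):
    """G_jk = g_n[(j, k)] - (-1)^(j+k) * g_p[(j, k)], eq. (5.32)."""
    return g_n[(j, k)] - (-1) ** (j + k) * g_p[(j, k)]


def distortion_coefficients(g_n, g_p, beta):
    """Return (a2, a3) from eqs. (5.30)-(5.31)."""
    g10 = g_combined(1, 0, g_n, g_p)
    g20 = g_combined(2, 0, g_n, g_p)
    g30 = g_combined(3, 0, g_n, g_p)
    g21 = g_combined(2, 1, g_n, g_p)
    a2 = g20 / g10 * (1 + beta)
    a3 = (g30 / g10 * (1 - beta)
          + g21 / g10 * (1 + beta)
          + a2 * g20 / g10 * (2 * beta))
    return a2, a3


# Hypothetical Taylor coefficients; real values come from the models of appendix E.
g_n = {(1, 0): 3.2e-3, (2, 0): 4.0e-3, (3, 0): 1.0e-3, (2, 1): 2.0e-3}
g_p = {(1, 0): 3.2e-3, (2, 0): 3.0e-3, (3, 0): 0.8e-3, (2, 1): 1.5e-3}
beta = 5 / 7  # load scale factor for A_i = 6, eq. (5.13)
print(distortion_coefficients(g_n, g_p, beta))
```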

5.3 Delay

5.3.1 Single-path Pade-inspired delay

As shown in §4.3, the first-order Pade delay provides a good trade-off between FFE performance and delay complexity. Figure 5.4 shows the schematic diagram of the inverter-based implementation, which has the transfer function

$$D(s) = -D_{pd1}(s) = -\frac{1 - s\frac{C}{G_m}}{1 + s\frac{C}{G_m}} \qquad (5.33)$$

where

$$\tau = \frac{2C}{G_m}. \qquad (5.34)$$

This delay realization combines the power, noise, and bandwidth performance of the inverter transconductor with the simplicity and equalization performance of the first-order Pade delay. Its limitation is revealed by analyzing a cascade of two delays. The delay input impedance is determined by the Miller capacitor and can be expressed as

$$Z_{in} = \frac{1}{sC\,\left(1 + D_{pd1}(s)\right)}. \qquad (5.35)$$

Figure 5.5: Block diagram of the buffered inverter-based first-order Pade delay.

As a consequence, the transfer function of the first delay is disrupted and takes the form

$$\frac{v_{o1}}{v_{i1}} = -\frac{1 - s\frac{C}{G_m}}{1 + s\frac{C}{G_m}\left(2 + D_{pd1}(s)\right)}. \qquad (5.36)$$

One solution is to insert a buffer before each delay to isolate its input impedance, as shown in figure 5.5. However, this consumes additional power, and the buffer capacitance loads the output of the previous delay, so the impact of the buffer stage must be considered carefully. Using a replica of the gain stage in the delay, scaled by a factor β, as the buffer (see figure 5.6), the delay transfer function becomes

$$D(s) = \frac{1 - s\frac{C}{G_m}}{1 + s\frac{C}{G_m}\left(1 + 2\beta^{-1}\right)}. \qquad (5.37)$$

In the limit β → ∞, the delay behaves like a first-order Pade delay. In general, the transfer function represents a first-order delay as defined in (3.19) with pole and zero ratio

$$\alpha = \frac{1}{1 + 2\beta^{-1}}, \qquad (5.38)$$

τ factor from the delay definition

$$\tau = 2\left(1 + 2\beta^{-1}\right)\frac{C}{G_m}, \qquad (5.39)$$

and group delay

$$\tau_g = \left(2 + 2\beta^{-1}\right)\frac{C}{G_m}. \qquad (5.40)$$

In §3.2.1, we show that any pole and zero ratio can be absorbed into the FFE coefficients at the cost of coefficient spread. Consider equal power in the buffer and the delay, where β = 1 (i.e., α = 1/3). For this case, the coefficient spread penalty bound for a 5-tap FFE is (see §3.2.2)

$$\left\|M_{1/3}\right\|_2 = 10.049. \qquad (5.41)$$

The practical trade-off between α and FFE performance is simulated in §4.4.3. For the 5-tap FFE and α = 1/3, the FFE performance degradation is just 13 %, which is an acceptable price for the simplicity of this delay implementation. Additionally, β = 1 is a practical design choice because the gain stage in the delay is replicated exactly for the buffer, which simplifies the design. For these reasons, we chose β = 1 for the delays in this work, as depicted in figure 5.7(a).
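The properties claimed for the buffered delay in (5.37) to (5.40) can be checked numerically. The sketch below evaluates the transfer function and compares a finite-difference group delay with (5.40). It also shows that the high-frequency gain approaches α and that the β → ∞ case is all-pass, like the Pade delay. C and $G_m$ are hypothetical values, not the chapter 6 design values.

```python
import numpy as np


def buffered_delay(s, c, gm, beta):
    """Buffered single-path delay transfer function, eq. (5.37)."""
    t0 = c / gm
    return (1 - s * t0) / (1 + s * t0 * (1 + 2 / beta))


def group_delay(c, gm, beta, f=1e6):
    """Numerical group delay -d(phase)/d(omega) at frequency f (central difference)."""
    w = 2 * np.pi * f
    dw = 1e-3 * w
    phase = np.angle(buffered_delay(1j * np.array([w - dw, w + dw]), c, gm, beta))
    return -(phase[1] - phase[0]) / (2 * dw)


C, GM = 20e-15, 5e-3  # hypothetical capacitor and transconductance (C/Gm = 4 ps)
for beta in (1.0, 2.0, 1e9):
    alpha = 1 / (1 + 2 / beta)                             # eq. (5.38)
    tau_g_formula = (2 + 2 / beta) * C / GM                # eq. (5.40)
    hf_gain = abs(buffered_delay(1j * 1e16, C, GM, beta))  # approaches alpha
    print(f"beta={beta:g}: alpha={alpha:.3f}, |D(j inf)|~{hf_gain:.3f}, "
          f"tau_g={group_delay(C, GM, beta) * 1e12:.2f} ps "
          f"(formula {tau_g_formula * 1e12:.2f} ps)")
```

For β = 1, the magnitude falls from unity at dc to about 1/3 at high frequency. This is the pole and zero offset that the coefficients must absorb.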

The finite output conductance of the inverter transconductors limits the gain and must be addressed. This topic is covered in §6.1.

Figure 5.6: Schematic diagram of the buffered inverter-based first-order delay (buffer transconductors scaled by β, delay transconductors at unit scale).


Figure 5.7: Schematic diagrams of (a) the single-path Pade-inspired delay of this work and (b) the two-path Pade delay [3].

5.3.2 Comparison with two-path Pade delay

Figure 5.7(b) shows a two-path implementation of the Pade delay, which is an alternative to the delay introduced in this work [3, 20]. The two-path delay achieves the Pade response by subtracting the input from a first-order pole with a gain of two, resulting in

$$D_{2p}(s) = \frac{2}{1 + s\frac{C}{G_m}} - 1 = D_{pd1}(s). \qquad (5.42)$$

A consequence of this approach is that signal is created and then destroyed, which wastes power while introducing noise and nonlinearity. The total spot noise of this two-path delay is

$$\frac{v_{n,2p}^2}{\Delta f} = 4kT\,(6\gamma_{eff})\,G_m^{-1}, \qquad (5.43)$$

which is 1.5× that of the single-path delay, whose total spot noise is

$$\frac{v_{n,1p}^2}{\Delta f} = 4kT\,(4\gamma_{eff})\,G_m^{-1}. \qquad (5.44)$$

Figure 5.8: Half-circuit schematic diagram of the inverter-based coefficient (5-bit transconductor banks driven by $v_{ip}$ and by $v_{im}$ from the other half circuit).

When the 1.5× power of the two-path delay is also accounted for, the single-path delay is 2.25× more efficient in terms of noise for the same power. In addition, at the same power (i.e., with each $G_m$ in the two-path delay scaled down by 1.5×), the single-path delay has a 1.5× stronger output drive. These performance benefits come at only a modest cost in FFE performance due to the pole and zero offset. They also account for the FFE performance improvement compared to [3] (see §7.4 for a comparison of the measured results).
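The 2.25× figure combines two 1.5× factors. The short calculation below makes this explicit, using only the scaling stated in the text: the noise power spectral density varies as $1/G_m$, and power is proportional to $G_m$ at a fixed bias point.

```python
# Spot noise in units of 4kT*gamma_eff/Gm, from eqs. (5.43) and (5.44).
NOISE_TWO_PATH = 6.0
NOISE_SINGLE_PATH = 4.0
POWER_RATIO = 1.5  # two-path delay power relative to the single-path delay


def two_path_noise_at_equal_power(power_ratio=POWER_RATIO):
    """Two-path noise after scaling every Gm down to match the single-path power.

    Noise PSD scales as 1/Gm, so dividing each Gm by power_ratio multiplies
    the noise PSD by the same factor.
    """
    return NOISE_TWO_PATH * power_ratio


noise_advantage = two_path_noise_at_equal_power() / NOISE_SINGLE_PATH
drive_advantage = POWER_RATIO  # single-path Gm versus the scaled-down two-path Gm
print(f"noise advantage at equal power: {noise_advantage:.2f}x")  # 2.25x
print(f"output drive advantage: {drive_advantage:.2f}x")          # 1.50x
```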

5.4 Coefficients

Figure 5.8 shows the half-circuit schematic diagram of the inverter-based coefficient. The pseudo-differential implementation is exploited to realize negative coefficients by feeding the signal from the negative half-circuit to the positive coefficient. The input transconductors are a set of binary-weighted minimum-sized inverters that can be switched in or out to achieve the desired coefficient gain. Using inverter transconductors in the coefficients and the summing circuit (see §6.3) results in a ratiometric FFE design in which every stage shares the same common mode. In addition, the FFE as a whole inherits the benefits of the inverter transconductor and gain stage. The impact of coefficient resolution is investigated in §4.7. The conclusion is that 5 bit plus sign resolution is sufficient to achieve an optimal solution with margin. We use this margin to absorb the coefficient spread introduced by the pole and zero offset in the delay.

Figure 5.9: Half-circuit schematic diagram of the inverter-based summing circuit (input transconductors for $v_{i1}$ to $v_{i5}$ weighted ½×, 1×, 2×, 1×, ½×, with a 1× load).

5.5 Summing circuit

The half-circuit schematic diagram of the inverter-based summing circuit is shown in figure 5.9. The use of inverter transconductors in the summing circuit completes the power-efficient ratiometric FFE design. The transconductors are scaled in anticipation of the relative magnitudes of the coefficients to maximize the dynamic range of the coefficient resolution. The total output conductance is

$$G_{L,tot} = 6G_o + G_m \qquad (5.45)$$

and the gain for the 1× transconductor is

$$\frac{v_o}{v_{i2}} = \frac{1}{1 + 6A_i^{-1}}. \qquad (5.46)$$

Because the intrinsic gain of the transconductor is close to 6, the gain in (5.46) is approximately 1/2, and the output voltage is approximately

$$v_o = \frac{1}{4}v_{i1} + \frac{1}{2}v_{i2} + v_{i3} + \frac{1}{2}v_{i4} + \frac{1}{4}v_{i5}. \qquad (5.47)$$

The main tap is vi2 and the first post-cursor tap, vi3, is 2× larger. This is due to the

coefficient transformation required to absorb the pole and zero offset in the delay as

determined from the analysis in §3.2.2.
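The weights in (5.47) follow from (5.45) and (5.46), as the sketch below shows. It uses exact fractions. It models a transconductor of weight $w$ with transconductance $wG_m$ and output conductance $wG_o$. The input weights are read from figure 5.9 and are consistent with (5.47).

```python
from fractions import Fraction

# Input transconductor weights for v_i1..v_i5 and the 1x load, in units of the
# 1x transconductor (as read from figure 5.9 and consistent with eq. (5.47)).
INPUT_WEIGHTS = [Fraction(1, 2), Fraction(1), Fraction(2), Fraction(1), Fraction(1, 2)]
LOAD_WEIGHT = Fraction(1)


def tap_gains(a_i):
    """Gain from each summing input to v_o, following eqs. (5.45)-(5.46).

    A transconductor of weight w is modeled with transconductance w*Gm and
    output conductance w*Go = w*Gm/A_i; conductances are normalized to Gm.
    """
    total_weight = sum(INPUT_WEIGHTS) + LOAD_WEIGHT  # 6 for these weights
    g_l_tot = LOAD_WEIGHT + total_weight / a_i       # (Gm + 6 Go) / Gm
    return [w / g_l_tot for w in INPUT_WEIGHTS]


print([str(g) for g in tap_gains(Fraction(6))])  # ['1/4', '1/2', '1', '1/2', '1/4']
```

For larger $A_i$ the common denominator approaches unity, and the taps approach the raw ½×/1×/2×/1×/½× weights.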

5.6 Full FFE

Figure 5.10 shows the half-circuit schematic diagram of the complete inverter-based FFE. It is constructed entirely from the inverter-transconductor gain stage. Therefore, the total FFE performance follows from the performance equations of the gain stage outlined in §5.2.


5.6.1 FFE noise

The delays are the dominant noise source of the FFE. Due to the multi-path nature of the FFE architecture, the noise from each delay sees multiple paths to the output and can add constructively or destructively. The worst case is the first delay, whose contribution to the FFE output noise is

$$\frac{v_{n,o,d1}^2}{\Delta f} = \left(a_2 + a_3 + a_4 + a_5\right)^2 \cdot 4\,\frac{v_n^2}{\Delta f} \qquad (5.48)$$

where $a_k$ is the gain of the $k$-th coefficient path, including the scale factor of the summing circuit, and $v_n^2/\Delta f$ is defined in (5.22). In the worst case, which occurs for maximum coefficient values with equal signs,

$$\frac{v_{n,o,d1}^2}{\Delta f} = \frac{81}{4}\,\frac{v_n^2}{\Delta f}. \qquad (5.49)$$

Figure 5.10: Half-circuit schematic diagram of the inverter-based FFE (four delay cells, five 5+1 bit coefficients, and the summing circuit).

This is nearly 3× the total contribution of all the coefficients and the entire summing

circuit combined. For typical coefficient values, this factor is reduced, but the delays

remain the primary contributor to FFE noise and power must be invested to mitigate

this problem. This is discussed in §6.1 while covering the transistor-level design of

the delays.

5.6.2 FFE mismatch

Mismatch is a statistical process, and its behavior in the complete FFE is similar to that of the noise described in §5.6.1. The first delay makes the largest contribution to the FFE offset, which can be represented as

$$\sigma_{o,d1} = \left(a_2 + a_3 + a_4 + a_5\right)\sigma_{tot} \qquad (5.50)$$

where $\sigma_{tot}$ is the total mismatch of the gain stage defined in (5.25). As a consequence, the mismatch in the delays is the primary contributor to the FFE offset. This is discussed in §6.1, which covers the transistor-level design of the delays. Additionally, tuning for the PVT variations of the delays is performed by a switched-capacitor circuit, which is introduced in appendix G.
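The worst-case factors in (5.49) and (5.50) can be reproduced with exact arithmetic. The sketch assumes that a full-scale coefficient has unity gain, so that the path gains $a_k$ equal the summing-circuit scale factors of (5.47). Under that assumption, the sketch recovers the 81/4 of (5.49), which lends the assumption some support. The factor of 4 is the delay's own noise from (5.44).

```python
from fractions import Fraction

# Path gains a_k at full-scale coefficients with equal signs, taken as the
# summing-circuit scale factors of eq. (5.47) (assumption: unity coefficient gain).
A_MAX = {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1), 4: Fraction(1, 2), 5: Fraction(1, 4)}


def first_delay_paths(a):
    """Sum of the path gains seen by the first delay's noise and offset."""
    return a[2] + a[3] + a[4] + a[5]


def first_delay_noise_factor(a):
    """Multiplier on v_n^2/df at the FFE output, eq. (5.48)."""
    return 4 * first_delay_paths(a) ** 2


def first_delay_offset_factor(a):
    """Multiplier on sigma_tot at the FFE output, eq. (5.50)."""
    return first_delay_paths(a)


print(first_delay_noise_factor(A_MAX), first_delay_offset_factor(A_MAX))  # 81/4 9/4
```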

5.7 Summary

In this chapter, we introduced the single-path Pade-inspired delay and the inverter-based FFE. The use of analog-inverter transconductors throughout the FFE results in a power- and noise-efficient design with a high bandwidth, achieved without area-intensive inductors. The single-path delay provides a 2.25× improvement in power

efficiency as compared to the popular two-path Pade implementation [3]. This im-

provement translates into a substantial increase in power efficiency of the total FFE

which is dominated by the noise of its delay elements. The measurement results in

chapter 7 support this analysis. The design of the proof-of-concept inverter-based

FFE is covered in chapter 6.

Chapter 6

FFE design

6.1 Delay

The single-path Pade-inspired delay used in this proof-of-concept FFE design was in-

troduced in §5.3.1 where the non-zero output conductance was ignored to simplify the

discussion. This is addressed now by degenerating the load transconductors with tri-

ode devices as shown in figure 6.1. This reduces the load conductance to compensate

for the non-zero output conductance of the transconductors. The gates of the triode

devices can be tied to supply and ground with the residual gain error being absorbed

into the coefficients. A more robust solution is to tune the triode gate voltages to

adjust for gain and common mode PVT variations as discussed in appendix G.

Low-$V_t$ minimum-length devices are used in the delay to maximize the bandwidth. The PMOS/NMOS width ratio, approximately 2 for this process, is chosen to set the common mode at $V_{DD}/2$ = 0.5 V. The remaining degree of freedom in the device widths determines the power dissipation and the output noise. The delays are sized such that the total FFE output noise for the worst-case coefficients is limited to 1 mV RMS. To achieve this, the NMOS widths are $W_n$ = 2 × 1.5 µm and the PMOS widths are $W_p$ = 4 × 1.5 µm. In post-layout simulations, the transconductance is 6.5 mS. The current is 430 µA per input transconductor and 350 µA per degenerated load transconductor, for a total half-circuit current of 1.56 mA per delay. The half-circuit spot noise is

$$\frac{v_{n,d}}{\sqrt{\Delta f}} = 2.5\ \mathrm{nV}/\sqrt{\mathrm{Hz}}. \qquad (6.1)$$

Figure 6.1: The single-path Pade-inspired delay schematic diagram with triode-degenerated load transconductors.

To achieve the target delay time of 25 ps as determined in §4.4, the feedthrough

capacitor is implemented as a 13 fF MOM capacitor. The total delay results from a

combination of this capacitor and the gate-drain parasitic capacitance.

6.2 Coefficients

The coefficients are realized as a set of binary-weighted unit-sized inverters, as described in §5.4. Low-$V_t$ minimum-length devices are used to maximize the bandwidth. For the unit inverter, the NMOS devices are sized $W_n$ = 160 nm and the PMOS devices are sized $W_p$ = 360 nm so that the nominal common mode is half the supply voltage. Implementing the 5 bits plus sign resolution requires $2 \times (2^5 - 1) = 62$ unit inverters. These inverters present a large capacitance that loads the delay output and prevents the bandwidth from reaching the 20 GHz target determined in §5.2.2. To minimize this capacitance, the least-significant bit (LSB) is implemented with series-stacked transistors, which effectively lengthen the device by a factor of two, as depicted in figure 6.2. With this modification, the input capacitance is equivalent to 34 unit inverters, a reduction of approximately 2×. This comes with a matching penalty, but since it is limited to the LSB, it does not substantially impact the performance.

Figure 6.2: Half-circuit schematic diagram of the reduced-input-capacitance 5-bit coefficient (binary-weighted sections switched by bits b0 to b4).

In post-layout simulations, the unity-gain configuration has the transconductance $G_m$ = 6.4 mS, which is similar to that of the transconductors in the delay and the summing circuit. As a result, the bandwidth of the delay driving the coefficient is nearly equal to that of the coefficient driving the summing circuit, which suggests this is an optimal design point. In post-layout RC-extracted simulations, the bandwidth exceeds 20 GHz for the delay, coefficients, and summing circuit¹, achieving the design target without resorting to peaking inductors. The load device is made from eight unit-sized inverters to compensate for the device output conductance. Gain errors for this stage are absorbed into the coefficient values and require no tuning. The output noise in the unity-gain configuration is

$$\frac{v_{n,c}}{\sqrt{\Delta f}} = 3.1\ \mathrm{nV}/\sqrt{\mathrm{Hz}} \qquad (6.2)$$

with a half-circuit current consumption of 700 µA.

¹ The bandwidth of the coefficient driving the 2× summing-circuit transconductor is 15 GHz, but the simulation plotted in figure 4.9 suggests this should not be detrimental to the overall performance.

6.3 Summing circuit

The schematic of the summing circuit is shown in figure 5.9. Low-$V_t$ minimum-length devices are used to maximize the bandwidth. For the 1× inverter in figure 5.9, the NMOS width is $W_n$ = 4 × 0.75 µm and the PMOS width is $W_p$ = 4 × 1.5 µm, which is equivalent to the sizing of the delay. In post-layout simulations, the unit transconductance is 6.2 mS with a current of 485 µA per transconductor. The total current for the half-circuit summing circuit is 2.91 mA, with an output noise of

$$\frac{v_{n,s}}{\sqrt{\Delta f}} = 1.9\ \mathrm{nV}/\sqrt{\mathrm{Hz}}. \qquad (6.3)$$

To compensate for the offset of the FFE, an additional transconductor drives the output node, with its input set by an off-chip bias voltage. This transconductor can source or sink current to shift the output common-mode voltage. In this way, it trims the offset and sets the FFE output common mode to the level required by the next stage.


Figure 6.3: Block diagram of the on-chip signal generator, including a PRBS generator (two PRBS7 generators clocked at 10 GHz, a phase aligner, and a MUX) and a 3-bit LVDS conversion stage driving the FFE.

6.4 PRBS generator

To aid in the testing process, an on-chip signal generator was designed. It consists of a 20 Gb/s pseudo-random bit sequence (PRBS) generator and a low-voltage differential signaling (LVDS) conversion stage, as shown in figure 6.3. To achieve this high data rate, the PRBS generator is built from two 10 Gb/s $(2^7 - 1)$ PRBS generators, multiplexed together to form a single 20 Gb/s $(2^7 - 1)$ PRBS signal. This signal is phase-aligned by a weakly coupled inverter chain. The signal is converted to LVDS using a weak inverter driving a strong self-biased inverter to limit the swing. The amplitude is configured by the value of the 3-bit load transconductor. The LVDS conversion inverters are sized similarly to the core FFE devices, so the FFE input common mode is achieved by design. In addition, the PRBS generators can be configured in a pulse mode to aid in testing and coefficient optimization.
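Two half-rate PRBS7 generators can form a full-rate PRBS7 stream because of a property of maximal-length sequences. Taking every second bit of such a sequence yields a time-shifted copy of the same sequence. Each 10 Gb/s lane is therefore an ordinary PRBS7 generator started at an appropriate phase. The sketch below checks this in software, assuming the common $x^7 + x^6 + 1$ polynomial. The dissertation does not state the polynomial or how the generator phases are set, and the function names are illustrative.

```python
def prbs7(n_bits, state=0x7F):
    """Return n_bits of PRBS7 from a 7-bit Fibonacci LFSR (x^7 + x^6 + 1)."""
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def rotation_of(seq, ref):
    """Return k such that seq equals ref rotated left by k, or None."""
    for k in range(len(ref)):
        if seq == ref[k:] + ref[:k]:
            return k
    return None


PERIOD = 127
ref = prbs7(PERIOD)
full_rate = ref * 2                             # two periods of the 20 Gb/s target
even, odd = full_rate[0::2], full_rate[1::2]    # what each 10 Gb/s lane must emit

# Each half-rate lane is itself a rotated PRBS7, so two identical PRBS7
# generators started at the right phases can produce it.
k_even, k_odd = rotation_of(even, ref), rotation_of(odd, ref)
lane_a = ref[k_even:] + ref[:k_even]
lane_b = ref[k_odd:] + ref[:k_odd]
muxed = [bit for pair in zip(lane_a, lane_b) for bit in pair]
print(k_even, k_odd, muxed == full_rate)
```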

6.5 Output driver

The output drivers are constructed from the same unit inverter transconductor as the summing circuit and delays, so they share the common mode with the FFE. A 50 Ω resistive load is formed with 100 Ω poly resistors from the output to supply and to ground (see figure 6.4). The load capacitance is dominated by the pads and the oscilloscope input, so the bandwidth is fixed by the resulting RC time constant. The transconductance is chosen to achieve a nominal gain of unity. The devices are sized $W_n$ = 14 × 0.75 µm and $W_p$ = 14 × 1.5 µm.

Figure 6.4: Half-circuit schematic diagram of the output driver (transconductor $G_m$ loaded by 100 Ω resistors to supply and ground).

Chapter 7

Measurement results

7.1 Test setup

Figure 7.1 shows the die photo of the proof-of-concept integrated circuit (IC) fabricated in the TSMC40 GP process, with an FFE area of only 0.003 mm². The

high-speed inputs and outputs are probed through single-ended ground-signal-ground

(GSG) pads and differential ground-signal-signal-ground (GSSG) pads. The low-

frequency signals (i.e. voltage supply, voltage references, and digital I/O) are bonded

directly to the test PCB via the chip-on-board method as shown in figure 7.2. The

PCB has a low-speed digital interface to the NanoRiver Miniboard GPIO card for scan

chain read and write to set the FFE coefficient values and on-chip signal generator

amplitude. The analog reference voltages Vbn and Vbp (see appendix G) can be set ex-

ternally by on-board DACs (TI DAC128S085). Alternatively, the switched-capacitor

bias circuit can be enabled to set the bias with an externally provided 500 kHz clock

source. Unfortunately, measurement issues prevented a complete physical verification of the switched-capacitor bias circuit (see appendix G). As such, the measurement results in this chapter are for the delay configured with fixed ground and supply bias voltages, as in figure 6.1. The NanoRiver Miniboard is connected by USB to a computer and controlled through Python and MATLAB scripts. The oscilloscope data is retrieved with the Keysight IVI drivers and the python-ivi library. Additional communication with the scope and other test equipment uses the VISA protocol through the PyVISA library. Multiple signal paths facilitate the testing process, and they are outlined in the following sections.

Figure 7.1: Die photo of the proof-of-concept IC fabricated in the TSMC40 GP process (labeled blocks: PRBS generators, channel FFE (FFE CH), equalizer FFE (FFE EQ), on-chip t-line, and GSG/GSSG probe pads; FFE area 0.003 mm²).

7.1.1 On-chip channel

The on-chip channel signal path is shown in figure 7.3(a). The test equipment used for these measurements is listed in table 7.1. The bit error rate tester (BERT) is configured as a 10 GHz clock source and probed through GSG pads to the on-chip 20 Gb/s signal generator. There are two FFEs in the signal path. The first is configured to emulate the channel and is referred to as the channel FFE. The second is configured to equalize the channel through adaptation of its coefficients and is referred to as the equalizer FFE. The pseudo-differential FFE output is driven off chip through GSSG pads and dc blocking capacitors to the high-speed oscilloscope. A second signal path, shown in figure 7.3(b), omits the equalizer FFE to measure the channel response. The benefit of the on-chip channel measurement setup is that the total impact of the test system is the same for the channel and channel+FFE measurements. Therefore, the FFE response can be accurately separated from the rest of the test system. Another benefit is that the signal generation can be performed on chip with only a clock input, reducing the complexity of the required test equipment. A drawback is the limited insertion loss introduced by the channel FFE, as detailed in the measurement results in §7.3.3.

Figure 7.2: Test PCB photo and (inset) chip-on-board bonding.

Table 7.1: Test equipment for the measurements with the on-chip channel.

Use                Equipment
Oscilloscope       Keysight DSA-X 643A
Clock generator    Keysight N4872A
Input probe        Infinity 40 GHz GSG
Output probe       Dual Infinity 40 GHz GSSG
GPIO card          NanoRiver Miniboard

Figure 7.3: Test signal paths: (a) on-chip PRBS, channel, and equalizer; (b) on-chip PRBS and channel; (c) off-chip PRBS and channel (0.5 m FR4 PCB traces).

To adapt the coefficients, the PRBS generator in figure 7.3(a) is configured in pulse mode. The pulse passes through the channel FFE and through a single coefficient path

in the equalizer FFE by setting all other coefficients to zero. This pulse response is

measured on the oscilloscope and the process is repeated for all five coefficient paths

to generate a family of pulses. These pulses are processed in MATLAB through

the brute-force optimization process outlined in appendix A.2 to find the optimal

coefficient values. These values are then programmed into the equalizer FFE on the

IC to measure the equalized pulse and equalized PRBS data. A coordinate descent

optimization of the coefficients was performed on the measured output to verify that

the local optimum was found, but this did not significantly improve the performance.
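The coordinate-descent check can be sketched as follows. This is not the brute-force procedure of appendix A.2, which is not reproduced here. It is a greedy per-tap search over integer codes. It assumes that the FFE output is the linear combination of the measured single-path pulses, each normalized per unit code. It evaluates that model rather than a fresh oscilloscope capture. It also uses an assumed PMR definition: the sum of absolute cursor values divided by the main (largest) cursor. The channel pulse in the example is made up.

```python
import numpy as np


def pmr(pulse):
    """Peak-to-main ratio of a baud-rate sampled pulse.

    Assumption: PMR = sum(|cursors|) / |main cursor|, with the main cursor taken
    as the largest sample; the dissertation's definition is eq. (2.26).
    """
    main = np.max(np.abs(pulse))
    return np.inf if main == 0 else np.sum(np.abs(pulse)) / main


def coordinate_descent(paths, codes, max_code=31, max_sweeps=10):
    """Greedy per-tap search over integer codes in [-max_code, max_code].

    paths: array (n_taps, n_cursors), one pulse per coefficient path, each
    normalized per unit code, so the FFE output is modeled as codes @ paths.
    """
    codes = np.array(codes, dtype=int)
    best = pmr(codes @ paths)
    for _ in range(max_sweeps):
        improved = False
        for tap in range(len(codes)):
            for trial in range(-max_code, max_code + 1):
                candidate = codes.copy()
                candidate[tap] = trial
                value = pmr(candidate @ paths)
                if value < best - 1e-12:
                    best, codes, improved = value, candidate, True
        if not improved:
            break
    return codes, best


# Synthetic example: ideal one-UI delays applied to a made-up channel pulse.
channel = np.array([0.10, 0.50, 0.25, 0.12, 0.06, 0.03])
paths = np.zeros((5, len(channel) + 4))
for k in range(5):
    paths[k, k:k + len(channel)] = channel
start = [0, 31, 0, 0, 0]  # main tap only
codes, best = coordinate_descent(paths, start)
print(codes, round(pmr(np.array(start) @ paths), 3), round(best, 3))
```

A greedy search like this only guarantees a local optimum. This is consistent with its use here as a check on a solution found by a broader search.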

The channel FFE pulse and PRBS data are measured from the signal path in figure

7.3(b) to characterize the ISI reduction and DR improvement due to the FFE. The

measurement results from this test are presented in §7.3.3 and §7.3.4.


7.1.2 Off-chip channel

The off-chip channel signal path is shown in figure 7.3(c). The test equipment used

for these measurements is outlined in table 7.2. A BERTScope is used as a signal

generator for pulse and PRBS signals. This signal is then passed through the channel

which is a differential 0.5 m FR4 PCB trace. The channel output is probed on the chip

and passes through a short 50 Ω terminated on-chip differential transmission line to

the FFE input. The pseudo-differential FFE output is driven off chip through GSSG

pads and dc blocking capacitors to the high-speed oscilloscope. The off-chip channel

measurements allow for more accurate channel responses with high insertion loss, but require a high-speed signal generator and channel. Also, only the off-chip channel losses can be characterized; the on-chip transmission-line losses remain unknown.

To adapt the coefficients, the BERT in figure 7.3(c) is configured in pulse mode. The pulse passes through the channel and through a single coefficient path in the FFE by setting all other coefficients to zero. This pulse response is measured on the oscilloscope. The process is repeated for all five coefficient paths to generate a family of pulses. These pulses are processed in MATLAB through the brute-force optimization process outlined in appendix A.2 to find the optimal coefficient values. These values are then programmed into the FFE on the IC to measure the equalized pulse and equalized PRBS data. A coordinate descent optimization of the coefficients was performed on the measured output to verify that the local optimum was found, but this did not significantly improve the performance. The channel pulse and PRBS data are measured by bypassing the chip and connecting the channel output directly to the scope. Additionally, the channel response including the effect of the on-chip transmission line can be estimated by setting the first coefficient path to unity, which bypasses all the delays. The high-frequency loss of the coefficient and output buffer is small compared to that of the channel and adds negligible ISI. The measurement results from this test are presented in §7.3.1.

Table 7.2: Test equipment for the measurements with the off-chip channel.

Use                Equipment
Oscilloscope       Keysight DSA-X 93204A
PRBS generator     Tektronix BSA286C
Input probe        Dual Infinity 40 GHz GSSG
Output probe       Dual Infinity 40 GHz GSSG
GPIO card          NanoRiver Miniboard

7.2 Test debug

The first chip revision had some bugs that prevented full FFE verification. The primary issue was an approximately 2× decrease in bandwidth caused by an off-chip return-current path. The PRBS generator and output buffer were on separate supplies with separate on-chip grounds. The grounds were connected off chip on the PCB, which placed a bondwire and the PCB parasitics in the return-current path at the interface between these blocks. Figure 7.4 shows the measured eye diagram for a 20 Gb/s PRBS signal for the first chip revision. Figure 7.5 shows the post-layout simulated eye diagram for the same signal with additional supply resistance and bondwires added. The simulated degradation is similar to the measured one, supporting the return-current path as the source of the bandwidth decrease. For this reason, a second chip revision was fabricated, the primary change being a shared on-chip ground. The remaining measurement results in this chapter are for this second chip revision.

Figure 7.4: Measured eye diagram for a 20 Gb/s PRBS signal for the first chip revision (voltage in V versus time in ps).

Figure 7.5: Post-layout simulated eye diagram for a 20 Gb/s PRBS signal with additional supply resistance (voltage in V versus time in ps).

7.3 Measurement results

7.3.1 Pulse responses and DR improvement

The normalized channel and equalized pulse responses are shown in figure 7.6 for the off-chip 0.5 m FR4 PCB trace channel measurement detailed in §7.1.2. The main cursor attenuation, which is not depicted, is 3.03× for this channel and set of coefficients. The channel pulse has significant ISI, and its PMR as defined in (2.26) is

$$\mathrm{PMR}_{ch} = 3.87. \qquad (7.1)$$

With the optimal coefficient values, the FFE output pulse has the PMR

$$\mathrm{PMR}_{eq} = 1.83, \qquad (7.2)$$

which corresponds to a DR improvement as defined in (2.27) of

$$\text{DR improvement} = \frac{\mathrm{PMR}_{ch}}{\mathrm{PMR}_{eq}} = 2.11. \qquad (7.3)$$

The reduction in signal DR is illustrated by the plot in figure 7.7, which shows the normalized PRBS responses generated from the associated pulse responses. Although the signals are scaled so that their main cursor amplitudes are equal, the peak amplitude is more than 2× lower for the signal equalized by the FFE. This DR improvement corresponds to a relaxation of the ADC resolution by more than 1 bit. As a point of reference, the 10 GS/s 6 bit ADC in [8] consumes 143 mW. This DR improvement reduces the ADC power by more than 2×, for a savings of over 70 mW. For this configuration of coefficients, the FFE consumes only 23 mW, resulting in a significant reduction in the total system power.

Figure 7.6: Measured normalized pulse response for the 0.5 m FR4 PCB trace channel and the channel+FFE (normalized voltage versus time in ps).

Figure 7.7: Normalized PRBS response generated from the pulse responses in figure 7.6 (normalized voltage versus time in ns).
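The resolution and power figures above follow from a short calculation. The sketch below assumes, as the text implies, that ADC power halves for each bit of resolution removed. This is a first-order rule of thumb, not a measured property of the ADC in [8].

```python
import math


def adc_bits_relaxed(dr_improvement):
    """ADC resolution (bits) that can be removed for a given DR improvement."""
    return math.log2(dr_improvement)


def adc_power_after(p_adc_mw, dr_improvement):
    """ADC power after relaxation, assuming power halves per bit removed."""
    return p_adc_mw / 2 ** adc_bits_relaxed(dr_improvement)


P_ADC_MW = 143.0  # 10 GS/s 6 bit ADC in [8]
P_FFE_MW = 23.0   # FFE power for this coefficient configuration
dr = 2.11         # eq. (7.3)
saved = P_ADC_MW - adc_power_after(P_ADC_MW, dr)
print(f"bits relaxed: {adc_bits_relaxed(dr):.2f}")
print(f"ADC power saved: {saved:.1f} mW, net of FFE power: {saved - P_FFE_MW:.1f} mW")
```

The same function gives about 0.51 bit for the DR improvement of 1.42 measured with the on-chip channel in §7.3.3.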

The channel length is limited by the maximum PCB trace length on the test

channel board used during testing. Post-layout simulation results suggest that the

performance increases for even longer PCB traces with higher channel attenuation.

7.3.2 Noise

The noise is measured using the Keysight DSA-X 643A oscilloscope output data.

The baseline noise power is first measured for the oscilloscope with a 50 Ω terminated

input. The FFE is then biased with a dc input common mode voltage and the output

captured with the oscilloscope. The noise power is calculated by subtracting the baseline from this measurement. This is repeated for coefficient values from 0 to 31 (i.e., best-case to worst-case noise), and the result is plotted in figure 7.8. The post-layout simulated noise is plotted for comparison and is in good agreement with the measured data. The baseline oscilloscope noise is 0.83 mVRMS. With the baseline oscilloscope noise power subtracted, the FFE noise is calculated to be between 0.3 mVRMS and 0.62 mVRMS for all coefficient values. Assuming a conservative system bandwidth of 5 GHz, this corresponds to spot noise in the range from 4.2 nV/√Hz to 7.7 nV/√Hz.

Figure 7.8: Measured and simulated integrated noise voltage versus coefficient value. (Plot: noise (mVRMS) versus coefficient value, 0 to 31; traces: Simulation, Measurement.)
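The paragraph uses two conversions, sketched here in Python:

- baseline removal in quadrature, because the powers of uncorrelated noise sources add;
- conversion of integrated RMS noise to an average spot noise density over an assumed brick-wall bandwidth.

The function names are illustrative. Under the stated 5 GHz assumption, 0.3 mVRMS maps to about 4.2 nV/√Hz. The same conversion maps 0.62 mVRMS to about 8.8 nV/√Hz.

```python
import math

def spot_noise_nv_per_rthz(v_rms_mv, bandwidth_hz):
    """Average spot noise density in nV/sqrt(Hz), assuming the integrated
    noise is spread uniformly over a brick-wall bandwidth."""
    return (v_rms_mv * 1e-3) / math.sqrt(bandwidth_hz) * 1e9

def subtract_baseline(total_rms_mv, baseline_rms_mv):
    """Remove uncorrelated instrument noise in power (quadrature), not amplitude."""
    return math.sqrt(total_rms_mv ** 2 - baseline_rms_mv ** 2)

for v in (0.3, 0.62):
    print(f"{v} mVRMS -> {spot_noise_nv_per_rthz(v, 5e9):.1f} nV/sqrt(Hz)")
```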

7.3.3 Eye diagrams

Although it is not the objective of the FFE in this work to completely equalize the

channel, it is nonetheless interesting to investigate the performance for such a scenario.

To do this, the measurement setup with the on-chip signal generator and channel is

used as detailed in §7.1.1. Figure 7.9 shows the measured eye diagrams for 16 Gb/s

and 20 Gb/s PRBS data after the channel FFE and after the equalizer FFE.

For both data rates, the eye is closed after the channel, and the FFE is able to open the eye, demonstrating that the FFE operates at data rates up to 20 Gb/s. The measured channel and equalized pulse responses for the 20 Gb/s data rate are shown in figure 7.10. There is a significant reduction in the first pre-cursor and post-cursor ISI terms, which is responsible for the eye opening in figure 7.9(d). The PMR of the channel pulse is

$$\mathrm{PMR}_{\mathrm{ch}} = 2.30 \tag{7.4}$$

and the FFE output pulse has the PMR

$$\mathrm{PMR}_{\mathrm{eq}} = 1.62 \tag{7.5}$$

which corresponds to a DR improvement of

$$\text{DR improvement} = \frac{\mathrm{PMR}_{\mathrm{ch}}}{\mathrm{PMR}_{\mathrm{eq}}} = 1.42. \tag{7.6}$$

Figure 7.9: Eye diagrams for the on-chip channel measurements. Panels: (a) 16 Gb/s after channel FFE; (b) 16 Gb/s after equalizer FFE; (c) 20 Gb/s after channel FFE; (d) 20 Gb/s after equalizer FFE.

Figure 7.11 shows the corresponding measured normalized PRBS response with and without FFE equalization, which illustrates this DR improvement. This corresponds to an ADC resolution relaxation of only about 0.5 bit. Because the channel has limited ISI, there is less to equalize, so the FFE is less effective. Furthermore, the main cursor attenuation, which is not visible in these normalized plots, is nearly 7×. As a result, the FFE is best utilized in systems with significant ISI.

Figure 7.10: Measured normalized pulse response with and without the FFE equalization. (Plot: normalized voltage versus time (ps); traces: Channel, Equalized.)

Figure 7.11: Measured normalized PRBS response with and without the FFE equalization. (Plot: normalized voltage versus time (ns); traces: Channel, Equalized.)
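The PMR and the resulting ADC resolution relaxation can be computed with a short Python sketch. The sketch relies on two conventions:

- The PMR takes the vector form given in (A.9), $\|p\|_1 / \|p\|_\infty$.
- The bit relaxation is $\log_2$ of the DR improvement, consistent with a 2× DR improvement corresponding to 1 bit.

The function names are illustrative. For the measured values in (7.4) and (7.5), the DR improvement is 1.42, or roughly half a bit.

```python
import math

def pmr(pulse):
    """PMR of a baud-sampled pulse: ||p||_1 / ||p||_inf, as in (A.9)."""
    return sum(abs(v) for v in pulse) / max(abs(v) for v in pulse)

def dr_improvement(pmr_channel, pmr_equalized):
    """DR improvement (7.6) and the equivalent ADC resolution relaxation in bits."""
    ratio = pmr_channel / pmr_equalized
    return ratio, math.log2(ratio)

ratio, bits = dr_improvement(2.30, 1.62)
print(f"DR improvement {ratio:.2f}, ADC relaxation {bits:.2f} bit")
```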

7.3.4 LMS system identification method

Theory

In order to characterize the frequency domain performance and linearity of the FFE,

system identification in the form of the least mean squares (LMS) algorithm is used

[23]. The block diagram in figure 7.12 shows the LMS algorithm adapted for system

identification in this application. The primary challenge is that the input sequence is only partially known. Although the PRBS sequence is known, an imbalanced duty cycle can introduce additional nonlinearity, adding to the nonlinearity of the FFE. Therefore, the duty cycle is an additional parameter to be optimized along with the filter coefficients. With this accounted for, the filter coefficients converge to the impulse response of the system. The residual signal is the portion that cannot be captured by a linear filter and is defined as the distortion signal. The ratio of the linear signal power to the distortion signal power is defined as the signal-to-distortion ratio (SDR). From the impulse response, we can also plot the frequency response to characterize the equalization performance in the frequency domain.

Figure 7.12: Block diagram of the LMS algorithm system identification. (Diagram labels: partially known input, FFE response h[n], adaptive filter, measured signal, distortion.)
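The identification loop can be sketched in Python as a plain LMS adaptive FIR filter. This is a minimal sketch under stated assumptions, not the measurement code used in this work:

- The duty-cycle parameter is omitted, so the input is treated as fully known.
- The PRBS7 generator and the 4-tap impulse response with a weak cubic term are hypothetical stand-ins for the FFE.
- The helper names (`prbs7`, `fir`, `lms_identify`, `sdr_db`) are illustrative.

The SDR follows the definition above: power of the linear fit over power of the residual.

```python
import math

def prbs7(n, seed=0x7F):
    """n symbols of PRBS7 (x^7 + x^6 + 1), mapped to +/-1."""
    state = seed & 0x7F
    out = []
    for _ in range(n):
        bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | bit) & 0x7F
        out.append(1.0 if bit else -1.0)
    return out

def fir(x, h):
    """Causal FIR filter with zero initial state: y[n] = sum_k h[k] x[n-k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def lms_identify(x, d, num_taps, mu):
    """LMS adaptation of w so that fir(x, w) tracks the measured signal d."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps                  # buf[k] holds x[n-k]
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        e = dn - sum(wk * bk for wk, bk in zip(w, buf))   # residual
        w = [wk + mu * e * bk for wk, bk in zip(w, buf)]
    return w

def sdr_db(x, d, w):
    """SDR: power of the linear fit over power of the residual (distortion)."""
    lin = fir(x, w)
    p_lin = sum(v * v for v in lin)
    p_res = sum((dv - lv) ** 2 for dv, lv in zip(d, lin))
    return 10 * math.log10(p_lin / p_res)

# Hypothetical example: a 4-tap linear response plus weak third-order distortion
h_true = [0.1, 1.0, -0.3, 0.05]
x = prbs7(5000)
lin = fir(x, h_true)
d = [v + 0.01 * v ** 3 for v in lin]
w = lms_identify(x, d, num_taps=4, mu=0.01)
print([round(v, 3) for v in w], round(sdr_db(x, d, w), 1))
```

The cubic term produces components that a 4-tap linear filter cannot reproduce, so the residual stays nonzero and the SDR is finite. With the distortion term removed, `w` converges to `h_true`.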

Linear response

Figure 7.13 shows the identified impulse response comparing bench and post-layout

simulation results for the on-chip channel signal path in figure 7.3(b). Similarly,

figure 7.14 shows the identified impulse response comparing bench and post-layout

simulation results for the equalized on-chip channel signal path in figure 7.3(a). The

equalization performance is readily apparent in that the equalized impulse response approaches a unit impulse at the baud-spaced samples. The measurement results are in good agreement with the simulated results.

Figure 7.15 shows the normalized magnitude response for the measured channel

and equalized signals. The equalization of the high-frequency loss is clearly visible

from this plot. The post-layout simulated normalized magnitude responses shown in

figure 7.16 are in excellent agreement with the measured data.


Figure 7.13: Impulse response for the bench and simulated channel response for the on-chip channel test. (Plot: impulse response (×10⁻³) versus time (ps); traces: Bench, Simulated.)

Figure 7.14: Impulse response for the bench and simulated equalized response for the on-chip channel test. (Plot: impulse response (×10⁻³) versus time (ps); traces: Bench, Simulated.)

Nonlinearity

Because this is a non-conventional linearity measurement, it requires careful consideration. To support the measurement result, we compare it to simulation, both with conventional methods using a sinusoid input and with the LMS residual method using a PRBS input. The analysis in appendix H and the statistical properties of the input signal predict the expected difference in SDR between the sinusoid and PRBS signals:

- 4.6 dB for a third-order dominated case;
- 7.7 dB for a fifth-order dominated case.

The actual difference is a sum of the contributions from all cases and is approximately 8 dB. For the sinusoid case, the SDR calculated with the LMS method matches the result from the conventional THD method exactly, supporting this nonlinearity measurement technique. For the measured data with the linear responses from the last section and an input signal variance of 0.008 V², the SDR is 35 dB. This is within 3 dB of the simulated SDR. For this input signal level, the output SNR is approximately 30 dB. Because the SNR is below the SDR, noise rather than distortion limits performance at this level, suggesting that the optimal performance occurs for a larger input voltage.


Figure 7.15: Measured frequency response of the channel and channel+FFE for the on-chip channel test. (Plot: normalized magnitude (dB) versus frequency (GHz), log scale; traces: Equalized, Channel.)

Figure 7.16: Simulated frequency response of the channel and channel+FFE for the on-chip channel test. (Plot: normalized magnitude (dB) versus frequency (GHz), log scale; traces: Equalized, Channel.)

Figure 7.17: Signal-to-distortion ratio versus input signal variance comparing sinusoid and PRBS signals. (Plot: SDR (dB) versus input signal variance (V²), log scale; traces: Sine Simulation, PRBS Simulation, PRBS Measured.)

7.4 Performance summary

Table 7.3 gives the performance summary of previous state-of-the-art FFE designs and the design in this work for comparison [3, 11, 24]. This work achieved a 2× reduction in power per tap while maintaining a competitive symbol rate. Because this design omits inductors, the FFE area is just 0.003 mm², a significant improvement compared to previous designs. The noise is reduced by almost 3× compared to [3]; no noise numbers are reported in [11]. These improvements are primarily attributed to two features:

- the efficient single-path Pade-inspired delay architecture described in §5.3.2;
- the efficiency of the analog-inverter transconductor described in §5.1.

Table 7.3: Performance summary for state-of-the-art RX-FFEs.

Columns: This Work, [24], [11], [3]
Power (mW): 20 to 26, 80, 90
Taps: 5, 7, 7
Power/Tap (mW): 5.2, 9.3, 12.8
Symbol Rate (GBd): 20, 40, 25
Process: 40 nm CMOS, 65 nm CMOS, 28 nm CMOS
Supply Voltage (V): 1, 1, 1
Spot Noise (nV/√Hz): 4.2 to 7.7, —, 11.4 to 26.6
SDR (dB): 35, —, —
Area (mm²): 0.003, 0.75, 0.085

Chapter 8

Conclusions

8.1 Summary

As discussed in chapter 1 and chapter 2, an RX-FFE in ADC-based links can reduce

the required ADC resolution resulting in a substantial power reduction. In order to

obtain a net improvement for the system, the RX-FFE must be implemented with

low-power consumption, low noise, and small chip area. The greatest obstacle to

achieving these goals is in the design of the analog delay. In chapter 3, analog delays

for RX-FFEs were investigated and the equivalence of first-order delays was proven

for the application of RX-FFEs.

The design space for FFEs is high dimensional and includes the delay type, delay

time, number of taps, parasitic bandwidth, coefficient resolution, and main cursor

attenuation. The effect of these parameters on performance, in terms of signal dynamic range reduction, was investigated with MATLAB simulations in chapter 4 to guide the architecture choices in chapter 5 and the design decisions in chapter 6.

The inverter-based FFE was introduced in chapter 5 along with the single-path

first-order Pade-inspired delay. The design equations of the FFE were covered to guide


the practical design decisions in chapter 6. The design of the proof-of-concept inverter-

based FFE IC was discussed in chapter 6. A switched-capacitor tuning circuit was

introduced to tune for gain and common mode PVT variations of the delay element.

In chapter 7, the test and measurement results for the proof-of-concept FFE IC

were presented. The FFE was demonstrated to reduce the signal dynamic range by 2×, corresponding to a 1-bit ADC resolution relaxation. The total power consumed was less than 26 mW, with less than 0.62 mVRMS output noise for all coefficient values and an area of only 0.003 mm² in 40 nm CMOS. A technique to estimate the distortion with system identification was discussed, and the signal-to-distortion ratio was measured to be 35 dB.

8.2 Future work

There are multiple opportunities for future research on this topic. An obvious next

step would be the demonstration of this FFE in an ADC-based link receiver. This

would solidify the system level calculations by demonstrating the ADC relaxation

directly.

Another option is the demonstration of a more aggressive FFE design. The MATLAB simulations in chapter 4 show that a substantial portion of the equalization performance can be obtained with a 3-tap FFE with delay time tuning and 3-bit coefficient resolution. This would further reduce the FFE power consumption and noise, resulting in even greater system performance improvements if a competitive ADC resolution relaxation can be achieved.

Additional work is necessary to debug the switched-capacitor bias circuit. In

addition to this, a possible improvement would be to tune for the delay time constant

instead of the gain. Coefficient adaptation techniques are a critical component to

enable the RX-FFE in commercial high-speed link receivers. Background calibration


techniques could alleviate the need for delay time or gain tuning by tracking the PVT

variations and absorbing the changes into the coefficients.

Appendix A

FFE coefficient optimization

A.1 Problem formulation

Consider an $n$-tap FFE with the input pulse response $p_i(t)$ and delays with transfer function $D(s)$. The impulse response of a single delay is

$$d(t) = \mathcal{L}^{-1}\{D(s)\}. \tag{A.1}$$

The pulse response before delay $k$ is defined to be $p_k(t)$, and it follows that

$$p_1(t) = p_i(t) \tag{A.2}$$

$$p_k(t) = p_{k-1}(t) * d(t). \tag{A.3}$$

The pulses are sampled with period $T_s$ to obtain the discrete sequences

$$p_k[n] = p_k(nT_s). \tag{A.4}$$


If we elect tap 2 to be the main tap (i.e., $c_2 = 1$), then the output of the FFE is

$$p_o[n] = \sum_{k=1}^{n} c_k p_k[n] = p_2[n] + \sum_{k \neq 2} c_k p_k[n]. \tag{A.5}$$

This can be expressed in matrix notation as

$$\mathbf{p}_o = \mathbf{P}\mathbf{c} + \mathbf{p}_2 \tag{A.6}$$

where we defined

$$\mathbf{P} = \begin{bmatrix} \mathbf{p}_1 & \mathbf{p}_3 & \cdots & \mathbf{p}_n \end{bmatrix} \tag{A.7}$$

$$\mathbf{c}^T = \begin{bmatrix} c_1 & c_3 & \cdots & c_n \end{bmatrix}. \tag{A.8}$$

Adapting the expression for PMR in (2.26) to vector notation gives

$$\mathrm{PMR} = \frac{\|\mathbf{p}_o\|_1}{\|\mathbf{p}_o\|_\infty} = \frac{\|\mathbf{P}\mathbf{c} + \mathbf{p}_2\|_1}{\|\mathbf{P}\mathbf{c} + \mathbf{p}_2\|_\infty} \tag{A.9}$$

where $\|\cdot\|_k$ represents the $l_k$-norm. The expression in (A.9) is the objective function that we want to minimize over the possible values of $\mathbf{c}$ (i.e., $|c_k| \le 1$). The objective function contains no penalty for main cursor attenuation, so it is useful to add a constraint to account for this. The denominator of (A.9) represents the main cursor amplitude, and we can constrain it to be greater than a threshold.


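Finally, we have the optimization problem

$$\begin{aligned} \underset{\mathbf{c}}{\text{minimize}} \quad & \frac{\|\mathbf{P}\mathbf{c} + \mathbf{p}_2\|_1}{\|\mathbf{P}\mathbf{c} + \mathbf{p}_2\|_\infty} \\ \text{subject to} \quad & |c_k| \le 1, \quad k = 1, 3, \ldots, n \\ & \|\mathbf{P}\mathbf{c} + \mathbf{p}_2\|_\infty \ge \text{threshold}. \end{aligned} \tag{A.10}$$

The numerator and denominator of the objective function are convex, but the ratio of two convex functions is not convex in general [25]. In addition, the final inequality constraint is not convex, because it bounds a convex function from below. As a result, additional techniques are necessary to find the optimal coefficients.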

A.2 Brute force solution

One method to solve this problem is to leverage the fact that the coefficients are quantized in practical FFE realizations, so there is a finite set of possible coefficient values. In this method, the objective function in (A.9) is evaluated for each set of coefficients that satisfies the constraints. The following code is an example implementation of this method in MATLAB.

Listing A.1: Brute force PMR optimization in MATLAB

```matlab
function c_opt = brute_force(ps, N, threshold)
% Exhaustive PMR minimization over quantized FFE coefficients.
%   ps        - columns are the sampled pulse responses p_k[n]; tap 2 is the main tap
%   N         - non-main coefficients take the values c/N for integer c in [-N, N]
%   threshold - minimum main cursor amplitude (default 0)
if nargin < 3; threshold = 0; end
num_taps = size(ps, 2);
c_init = ones(num_taps-1, 1)*(-N);
p2 = ps(:, 2);
P = ps(:, [1, 3:num_taps]);

pmr_opt = inf;
c_opt = [];                      % stays empty if no candidate meets the threshold
c = c_init;
while true
    po = P*c/N + p2;
    if max(po) >= threshold
        pmr = sum(abs(po))/max(po);
        if pmr < pmr_opt
            pmr_opt = pmr;
            c_opt = [c(1)/N; 1; c(2:end)/N];
        end
    end
    c = get_next_c(N, c);
    if isequal(c, c_init); break; end   % counter wrapped: all candidates visited
end
end

function next_c = get_next_c(N, c)
% Advance c like an odometer whose digits run from -N to N.
next_c = c;
for k = 1:length(c)
    if c(k) == N
        next_c(k) = -N;
    else
        next_c(k) = c(k) + 1;
        break;
    end
end
end
```
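For prototyping outside MATLAB, the same exhaustive search can be sketched in Python. The sketch is illustrative; its names and the toy pulse are hypothetical. It works in three steps:

1. Enumerate every quantized coefficient vector with `itertools.product`.
2. Discard candidates that violate the main cursor constraint of (A.10).
3. Keep the candidate with the lowest PMR.

Like Listing A.1, it uses $\max(\mathbf{p}_o)$ as the main cursor amplitude.

```python
import itertools

def pmr(po):
    """PMR of (A.9); max(po) stands in for ||po||_inf, as in Listing A.1."""
    return sum(abs(v) for v in po) / max(po)

def brute_force(ps, levels, threshold=0.0):
    """Exhaustive search of (A.10) over quantized coefficients c_k = i/levels.

    ps: list of sampled tap pulse responses [p1, p2, ..., pn]; tap 2 is the
    main tap with c2 = 1. Returns (coefficients, PMR) of the best candidate.
    """
    p2, others = ps[1], [ps[0]] + ps[2:]
    grid = [i / levels for i in range(-levels, levels + 1)]  # |c_k| <= 1
    best_c, best_pmr = None, float("inf")
    for c in itertools.product(grid, repeat=len(others)):
        po = [p2[n] + sum(ck * p[n] for ck, p in zip(c, others))
              for n in range(len(p2))]
        peak = max(po)                        # main cursor amplitude
        if peak > 0 and peak >= threshold:    # main cursor constraint
            val = pmr(po)
            if val < best_pmr:
                best_c, best_pmr = [c[0], 1.0, *c[1:]], val
    return best_c, best_pmr

# Hypothetical 3-tap example: ideal one-sample delays, one post-cursor of 0.5
pulse = [1.0, 0.5, 0.0, 0.0, 0.0]
ps = [pulse[-k:] + pulse[:-k] if k else pulse[:] for k in range(3)]
c, val = brute_force(ps, levels=4)
print(c, val)
```

For this toy pulse, the post-cursor tap cancels the 0.5 post-cursor. Raising the threshold forces the pre-cursor tap to boost the main cursor instead.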


A.3 MATLAB optimization toolbox

Another method is to use constrained nonlinear optimization techniques. MATLAB

supports these methods through the optimization toolbox and the function fmincon().

Because the objective function is not convex in this case, the solution can converge

to a local optimum and is dependent on the initial conditions. The benefit is that

the solution is found much faster than the brute force method in appendix A.2. The

following code is the implementation of this method used in this work.

Listing A.2: Nonlinear PMR optimization in MATLAB with fmincon()

```matlab
function c_opt = optim_pmr_opt(ps, threshold, x0)
% Minimize the PMR (A.9) with fmincon. Tap 2 is the main tap (c2 = 1);
% the remaining coefficients are bounded to [-1, 1].
num_taps = size(ps, 2);
if nargin < 3
    x0 = zeros(num_taps-1, 1);
else
    x0 = x0([1, 3:num_taps]);  % drop the fixed main-tap coefficient
end
Ap = ps(:, [1, 3:num_taps]);
bp = ps(:, 2);
pmr = @(p) sum(abs(p))/max(p);
create_pulse = @(x) Ap*x + bp;
fun = @(x) pmr(create_pulse(x));
% force_equal pins the main cursor at the threshold (equality constraint);
% otherwise the main cursor is only required to be >= threshold, as in (A.10).
force_equal = true;
if force_equal
    nonlcon = @(x) deal([], threshold - max(create_pulse(x)));
else
    nonlcon = @(x) deal(threshold - max(create_pulse(x)), []);
end
lb = -ones(num_taps-1, 1);
ub = +ones(num_taps-1, 1);
A = []; b = [];        % no linear inequality constraints
Aeq = []; beq = [];    % no linear equality constraints
options = optimset('MaxFunEvals', 10000, ...
                   'Algorithm', 'active-set', ...
                   'Display', 'none');
x = fmincon(fun, x0, A, b, Aeq, beq, lb, ub, nonlcon, options);
c_opt = [x(1); 1; x(2:end)];
end
```

Appendix B

Equivalence of first-order delays in FFEs

The objective of this appendix is to show that an $N$-tap FFE constructed with delays of the form

$$D_{\alpha_1}(s) = \frac{1 - \frac{1}{2}\alpha_1 s\tau}{1 + \frac{1}{2}s\tau} \tag{B.1}$$

having the associated transfer function

$$H_{\alpha_1}(s) = \sum_{n=0}^{N-1} c_n D_{\alpha_1}^n(s) \tag{B.2}$$

can be transformed into an equivalent FFE with delays $D_{\alpha_2}(s)$ and $H_{\alpha_2}(s) = H_{\alpha_1}(s)$ by an appropriate linear transformation of the coefficients.

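To show this, we substitute $M = N - 1$ to simplify the algebra. We then expand the expression for $H_\alpha(s)$ into a polynomial in $(\frac{1}{2}s\tau)^m$ using binomial expansion, obtaining

$$\begin{aligned}
H_\alpha(s) &= \sum_{n=0}^{M} c_n D_\alpha^n(s) \\
&= \sum_{n=0}^{M} c_n \frac{(1 - \frac{1}{2}\alpha s\tau)^n}{(1 + \frac{1}{2}s\tau)^n} \\
&= \frac{\sum_{n=0}^{M} c_n (1 + \frac{1}{2}s\tau)^{M-n} (1 - \frac{1}{2}\alpha s\tau)^n}{(1 + \frac{1}{2}s\tau)^M} \\
&= \frac{\sum_{n=0}^{M} c_n \left(\sum_{k_1=0}^{M-n} \binom{M-n}{k_1} (\frac{1}{2}s\tau)^{k_1}\right) \left(\sum_{k_2=0}^{n} \binom{n}{k_2} (-\alpha)^{k_2} (\frac{1}{2}s\tau)^{k_2}\right)}{(1 + \frac{1}{2}s\tau)^M} \\
&= \frac{\sum_{n=0}^{M} c_n \sum_{k_1=0}^{M-n} \sum_{k_2=0}^{n} \binom{M-n}{k_1}\binom{n}{k_2} (-\alpha)^{k_2} (\frac{1}{2}s\tau)^{k_1+k_2}}{(1 + \frac{1}{2}s\tau)^M} \\
&= \frac{\sum_{n=0}^{M} c_n \sum_{m=0}^{M} a_{mn} (\frac{1}{2}s\tau)^m}{(1 + \frac{1}{2}s\tau)^M}.
\end{aligned} \tag{B.3}$$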

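The numerator of the final expression can be written more compactly in matrix form as

$$\mathbf{b}^T \mathbf{A}_\alpha \mathbf{c} = \begin{bmatrix} 1 & \frac{1}{2}s\tau & \cdots & (\frac{1}{2}s\tau)^{M-1} & (\frac{1}{2}s\tau)^M \end{bmatrix} \begin{bmatrix} a_{00} & a_{01} & \cdots & a_{0M} \\ a_{10} & a_{11} & \cdots & a_{1M} \\ \vdots & \vdots & \ddots & \vdots \\ a_{M0} & a_{M1} & \cdots & a_{MM} \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_M \end{bmatrix} \tag{B.4}$$

so that

$$H_\alpha(s) = \frac{\mathbf{b}^T \mathbf{A}_\alpha \mathbf{c}}{(1 + \frac{1}{2}s\tau)^M}. \tag{B.5}$$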

To identify the expression for $a_{mn}$, notice that terms containing $(\frac{1}{2}s\tau)^m$ require $k_1 + k_2 = m$. Therefore, the double summation over $k_1$ and $k_2$ from above contributes at most one term to $(\frac{1}{2}s\tau)^m$ for each $k_1$ and can be reduced to a single summation over the index $k$. This requires the substitutions $k_1 \to k$ and $k_2 \to m - k$. The double summation will only contribute a term when there exists a $k$ that simultaneously meets the constraints $0 \le k \le M - n$ and $0 \le m - k \le n$. This is equivalent to a single constraint of the form $\max(0, m - n) \le k \le \min(M - n, m)$. Therefore, we conclude that

$$a_{mn} = \sum_{k=\max(0,\,m-n)}^{\min(M-n,\,m)} \binom{M-n}{k} \binom{n}{m-k} (-\alpha)^{m-k}. \tag{B.6}$$

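From this we see that the matrix $\mathbf{A}_\alpha$ depends only on the order of the FFE and $\alpha$. Now we can write the transfer functions $H_{\alpha_1}(s)$ and $H_{\alpha_2}(s)$ as

$$H_{\alpha_1}(s) = \frac{\mathbf{b}^T \mathbf{A}_{\alpha_1} \mathbf{c}_{\alpha_1}}{(1 + \frac{1}{2}s\tau)^M} \tag{B.7}$$

$$H_{\alpha_2}(s) = \frac{\mathbf{b}^T \mathbf{A}_{\alpha_2} \mathbf{c}_{\alpha_2}}{(1 + \frac{1}{2}s\tau)^M}. \tag{B.8}$$

Because the entries of $\mathbf{b}$ are linearly independent functions of $s$, the equality $H_{\alpha_1}(s) = H_{\alpha_2}(s)$ for all $s$ requires

$$\mathbf{A}_{\alpha_2}\mathbf{c}_{\alpha_2} = \mathbf{A}_{\alpha_1}\mathbf{c}_{\alpha_1} \tag{B.9}$$

$$\rightarrow \mathbf{c}_{\alpha_2} = \mathbf{A}_{\alpha_2}^{-1}\mathbf{A}_{\alpha_1}\mathbf{c}_{\alpha_1}. \tag{B.10}$$

A common case is the transformation from ideal Pade delays into delays with some pole and zero offset, $\alpha$. For this case, $\alpha_1 \to 1$ and $\alpha_2 \to \alpha$, resulting in

$$\mathbf{c}_\alpha = \mathbf{A}_\alpha^{-1}\mathbf{A}_1\mathbf{c}_1 = \mathbf{M}_\alpha \mathbf{c}_1 \tag{B.11}$$

where we defined

$$\mathbf{M}_\alpha = \mathbf{A}_\alpha^{-1}\mathbf{A}_1. \tag{B.12}$$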

A MATLAB function to generate the matrix Aα is listed here for reference.

Listing B.1: MATLAB function to generate the matrix Aα.

```matlab
function A = create_A(N, alpha)
% Build the N-by-N matrix A_alpha for an N-tap FFE (M = N - 1), with
% entries a_mn from (B.6); row m+1 holds the coefficients of (s*tau/2)^m.
A = nan(N, N);
for m = 0:N-1
    for n = 0:N-1
        A(m+1, n+1) = create_a_mn(m, n, N, alpha);
    end
end
end

function a_mn = create_a_mn(m, n, N, alpha)
% Single-sum form of (B.6), with M = N - 1.
a_mn = 0;
for k = max(0, m-n):min(N-1-n, m)
    a_mn = a_mn + (-alpha)^(m-k) * nchoosek(N-1-n, k) * nchoosek(n, m-k);
end
end
```
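A Python counterpart of Listing B.1 can also confirm (B.10) numerically. The function names and the test coefficients are hypothetical. The sketch works in three steps:

1. Build $\mathbf{A}_\alpha$ from (B.6) using exact `Fraction` arithmetic.
2. Solve (B.9) for the new coefficients.
3. Check that $H_{\alpha_1}$ and $H_{\alpha_2}$ agree exactly at a few values of $s\tau$.

```python
from fractions import Fraction
from math import comb

def create_A(M, alpha):
    """(M+1)x(M+1) matrix A_alpha with entries a_mn from (B.6)."""
    A = [[Fraction(0)] * (M + 1) for _ in range(M + 1)]
    for m in range(M + 1):
        for n in range(M + 1):
            for k in range(max(0, m - n), min(M - n, m) + 1):
                A[m][n] += comb(M - n, k) * comb(n, m - k) * (-alpha) ** (m - k)
    return A

def solve(A, b):
    """Solve A x = b exactly with Gauss-Jordan elimination over Fractions."""
    n = len(A)
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def transform_coeffs(c1, alpha1, alpha2):
    """Coefficients for delays D_alpha2 that reproduce H_alpha1, per (B.10)."""
    M = len(c1) - 1
    A1, A2 = create_A(M, alpha1), create_A(M, alpha2)
    rhs = [sum(A1[m][n] * c1[n] for n in range(M + 1)) for m in range(M + 1)]
    return solve(A2, rhs)

def H(c, alpha, x):
    """H_alpha of (B.2) evaluated at x = s*tau, using the delay (B.1)."""
    D = (1 - alpha * x / 2) / (1 + x / 2)
    return sum(cn * D ** n for n, cn in enumerate(c))

# Hypothetical 3-tap FFE designed with ideal Pade delays (alpha = 1)
c1 = [Fraction(1), Fraction(-1, 4), Fraction(1, 8)]
c_half = transform_coeffs(c1, Fraction(1), Fraction(1, 2))
print(c_half)
```

Because every step is exact, the two transfer functions agree exactly, not just to within floating-point error.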

Appendix C

Pade approximants

The Pade approximant is the best rational function approximation to the Taylor

series of a function in the sense that it matches the Taylor coefficients to the highest

possible order [18]. Mathematically, this can be expressed for a function, $f(x)$, as the rational function, $R_{m/n}(x)$, with order $m$ in the numerator and $n$ in the denominator

$$R_{m/n}(x) = \frac{a_0 + a_1 x + \cdots + a_m x^m}{1 + b_1 x + \cdots + b_n x^n} \tag{C.1}$$

such that

$$\begin{aligned} f(0) &= R(0) \\ f^{(1)}(0) &= R^{(1)}(0) \\ f^{(2)}(0) &= R^{(2)}(0) \\ &\;\;\vdots \\ f^{(m+n)}(0) &= R^{(m+n)}(0) \end{aligned} \tag{C.2}$$

where $f^{(n)}(0)$ represents the $n$th derivative of $f(x)$ at $x = 0$.


As an example, consider $f(x) = e^x$ with the associated Taylor series

$$f(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{1}{2}x^2 + \cdots \tag{C.3}$$

and $m = n = 1$ so that

$$R_{1/1}(x) = \frac{a_0 + a_1 x}{1 + b_1 x}. \tag{C.4}$$

The terms $a_0$, $a_1$, and $b_1$ constitute three degrees of freedom that allow us to match the first three Taylor coefficients (i.e., through second order). Applying the constraints we see that

$$f(0) = 1 = a_0 = R(0) \tag{C.5}$$

$$f^{(1)}(0) = 1 = a_1 - a_0 b_1 = R^{(1)}(0) \tag{C.6}$$

$$f^{(2)}(0) = 1 = 2b_1(a_0 b_1 - a_1) = R^{(2)}(0). \tag{C.7}$$

Therefore $a_0 = 1$, $b_1 = -\frac{1}{2}$, and $a_1 = \frac{1}{2}$, and we see that

$$R_{1/1}(x) = \frac{1 + \frac{1}{2}x}{1 - \frac{1}{2}x}. \tag{C.8}$$

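Continuing in this fashion, the second-order case is

$$R_{2/2}(x) = \frac{1 + \frac{1}{2}x + \frac{1}{12}x^2}{1 - \frac{1}{2}x + \frac{1}{12}x^2} \tag{C.9}$$

and the third-order case is

$$R_{3/3}(x) = \frac{1 + \frac{1}{2}x + \frac{1}{10}x^2 + \frac{1}{120}x^3}{1 - \frac{1}{2}x + \frac{1}{10}x^2 - \frac{1}{120}x^3}. \tag{C.10}$$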
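The coefficients in (C.8) to (C.10) can be cross-checked with a short Python sketch. It solves the standard Padé linear system in exact rational arithmetic using the standard-library `fractions` module. The function names are illustrative. The approach is the textbook construction, not anything specific to this work:

1. Solve for the denominator from the orders $m+1$ through $m+n$.
2. Read off the numerator from the orders $0$ through $m$.

```python
from fractions import Fraction
from math import factorial

def solve(A, b):
    """Solve A x = b exactly with Gauss-Jordan elimination over Fractions."""
    n = len(A)
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def pade(taylor, m, n):
    """[m/n] Pade approximant from Taylor coefficients taylor[0..m+n].

    Returns numerator a[0..m] and denominator b[0..n] (b[0] = 1) of (C.1).
    """
    c = lambda k: taylor[k] if k >= 0 else Fraction(0)
    # Matching orders m+1..m+n gives sum_{j=0..n} b_j c_{k-j} = 0 with b_0 = 1
    rows = [[c(k - j) for j in range(1, n + 1)] for k in range(m + 1, m + n + 1)]
    b = [Fraction(1)] + solve(rows, [-c(k) for k in range(m + 1, m + n + 1)])
    # Matching orders 0..m then gives the numerator directly
    a = [sum(b[j] * c(i - j) for j in range(min(i, n) + 1)) for i in range(m + 1)]
    return a, b

exp_taylor = [Fraction(1, factorial(k)) for k in range(7)]
for order in (1, 2, 3):
    print(order, pade(exp_taylor, order, order))
```

Evaluating $R_{1/1}$ at $x = -s\tau$ recovers the ideal first-order Pade delay, i.e., (B.1) with $\alpha = 1$.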

Appendix D

Low-frequency nonlinearity simulation

D.1 Problem formulation

This appendix presents three methods for simulating the Taylor series coefficients for two-dimensional functions. The expressions derived are for the case with voltage inputs and current output, but these methods apply equally well to the other possible cases. For this case, the goal is to determine the coefficients $g_{kl}$ in the Taylor series expansion

$$i_o(v_i, v_o) = \sum_{k,l} g_{kl} v_i^k v_o^l. \tag{D.1}$$

Two of the methods rely on transient simulations while the final method uses dc

sweep simulations.


D.2 Transient simulation

For the transient methods, the inputs are set to

$$v_i(t) = A_1 \cos(2\pi f_0 t + \phi_1) \tag{D.2}$$

$$v_o(t) = A_2 \cos(2\pi M f_0 t + \phi_2) \tag{D.3}$$

where $M$ is an integer. Sampling at $T_s = \frac{1}{N f_0}$ by substituting $t \to \frac{n}{f_0 N}$, we obtain

$$v_i[n] = A_1 \cos\left(\frac{2\pi n}{N} + \phi_1\right) \tag{D.5}$$

$$v_o[n] = A_2 \cos\left(\frac{2\pi M n}{N} + \phi_2\right) \tag{D.6}$$

$$i_o[n] = \sum_{k,l} g_{kl} v_i^k[n] v_o^l[n] \tag{D.7}$$

where $N \gg M$ is an integer. For both of the following methods, a practical choice is $A_1 = A_2 = 10$ mV.

D.2.1 DFT method

The first method to extract the coefficients is in the frequency domain by taking

the DFT of the output current. This method is practical for lab measurements

because the magnitude of the DFT can be approximately measured using a spectrum

analyzer and highly-linear sinusoidal inputs can be generated with signal generators

and bandpass filters.

Figure D.1: DFT of $i_o[n]$ with $A_1 = A_2 = 10$ mV. (Plot: magnitude (dB) versus DFT bin number. With $A = A_1 = A_2$, the labeled bins are proportional to $g_{10}A$, $g_{20}A^2$, $g_{30}A^3$, $g_{01}A$, $g_{02}A^2$, $g_{03}A^3$, $g_{11}A^2$, $g_{21}A^3$, and $g_{12}A^3$.)

The DFT of the output current is

$$I_o[m] \equiv \frac{1}{N}\,\mathrm{DFT}(i_o[n]) = \frac{1}{N}\sum_{n=0}^{N-1} i_o[n]\, e^{-j\frac{2\pi m n}{N}}. \tag{D.8}$$

The nonlinearity of the transconductor mixes vi and vo which can be observed in the

DFT plot in figure D.1. The following derivation shows that the coefficients can be

determined from their associated DFT bin.

Using the identity

$$\cos(x) = \frac{1}{2}\left(e^{jx} + e^{-jx}\right) \tag{D.9}$$

we can see that $i_o[n]$ is the sum of complex exponentials of the form

$$C e^{j\frac{2\pi k n}{N}} \tag{D.10}$$

where $k$ is an integer and $C$ is a complex constant. In particular, we see that


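$$v_i^k[n]\, v_o^l[n] = \left(\tfrac{1}{2}A_1 e^{j\phi_1}\right)^k \left(\tfrac{1}{2}A_2 e^{j\phi_2}\right)^l e^{j2\pi\frac{n}{N}(k+lM)} + \left(\tfrac{1}{2}A_1 e^{-j\phi_1}\right)^k \left(\tfrac{1}{2}A_2 e^{-j\phi_2}\right)^l e^{-j2\pi\frac{n}{N}(k+lM)} + \cdots \tag{D.11}$$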

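Taking into consideration the identity

$$\frac{1}{N}\sum_{n=0}^{N-1} e^{j\frac{2\pi k n}{N}} = \begin{cases} 1 & \text{if } k \equiv 0 \pmod{N} \\ 0 & \text{otherwise} \end{cases} \tag{D.12}$$

we can conclude that the term $I_o[m]$ depends only on the terms in $i_o[n]$ containing $e^{j\frac{2\pi m n}{N}}$. Therefore,

$$I_o[1] \approx g_{10}\frac{A_1}{2}e^{j\phi_1} \tag{D.13}$$

$$I_o[M] \approx g_{01}\frac{A_2}{2}e^{j\phi_2} \tag{D.14}$$

$$I_o[k + lM] \approx g_{kl}\left(\frac{A_1}{2}e^{j\phi_1}\right)^k\left(\frac{A_2}{2}e^{j\phi_2}\right)^l. \tag{D.15}$$

These relations are approximate because higher-order terms, such as $g_{30}v_i^3$, also contribute to some of these bins. Their contributions are small for small $A_1$ and $A_2$.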

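If the signs of $g_{10}$ and $g_{01}$ are known, then

$$g_{10} \approx \mathrm{sgn}(g_{10})\frac{2|I_o[1]|}{A_1} \tag{D.16}$$

$$g_{01} \approx \mathrm{sgn}(g_{01})\frac{2|I_o[M]|}{A_2}. \tag{D.17}$$

Alternatively, if $\phi_1$ and $\phi_2$ are known, then

$$g_{10} \approx \frac{2 I_o[1]}{A_1 e^{j\phi_1}} \tag{D.18}$$

$$g_{01} \approx \frac{2 I_o[M]}{A_2 e^{j\phi_2}}. \tag{D.19}$$

For either case, the expression for all other Taylor coefficients is

$$g_{kl} \approx \frac{I_o[k + lM]}{\left(\frac{I_o[1]}{g_{10}}\right)^k \left(\frac{I_o[M]}{g_{01}}\right)^l}. \tag{D.20}$$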

The following MATLAB code determines the Taylor coefficients using this method.

Listing D.1: MATLAB function to find two-dimensional Taylor coefficients using the transient DFT method.

```matlab
function gtrans = calc_g_trans_fft(f, x, y, threshold)
% Estimate the 2-D Taylor coefficients g_jk of f(x, y) from the DFT bins of
% f, where x and y are sinusoids at distinct DFT bins (see (D.13)-(D.20)).
% Assumes an even record length N.
if nargin < 4; threshold = 1e-12; end
[~, kx] = max(abs(fft(x))); kx = kx - 1;   % DFT bin of x (0-based)
[~, ky] = max(abs(fft(y))); ky = ky - 1;   % DFT bin of y (0-based)
N = length(f);
inds = 1:N;
f_fft = fft(f);
not_corrected = [ones(1, N/2), zeros(1, N/2)];  % positive-frequency bins only
noncorrected_f_fft = abs(f_fft(find(not_corrected)));
while any(noncorrected_f_fft > threshold)
    % Take the remaining bin with the smallest magnitude above the threshold
    valid_inds = find(noncorrected_f_fft > threshold);
    [~, min_ind] = min(noncorrected_f_fft(valid_inds));
    noncorrected_inds = inds(find(not_corrected));
    temp = noncorrected_inds(valid_inds);
    min_ind_actual = temp(min_ind);
    not_corrected(min_ind_actual) = 0;
    % Map the bin index back to the mixing orders j (in x) and k (in y)
    k = round((min_ind_actual-1)/ky);
    j = abs(round((min_ind_actual-1 - k*ky)/kx));
    % Real least-squares scale of the x.^j .* y.^k bin that matches f's bin
    xjyk_fft = fft(x.^j .* y.^k);
    gtrans(j+1, k+1) = real(xjyk_fft(min_ind_actual)' ...
        * f_fft(min_ind_actual))/abs(xjyk_fft(min_ind_actual))^2;
    noncorrected_f_fft = abs(f_fft(find(not_corrected)));
end
end
```
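The recipe in (D.18) to (D.20) can be exercised on synthetic data with a short Python sketch. The function names are illustrative. The sketch assumes:

- known phases, $\phi_1 = \phi_2 = 0$;
- a hypothetical cubic $i_o(v_i, v_o)$ whose coefficients borrow the magnitudes of table E.1;
- $N = 128$ and $M = 10$, which keep every bin $k + lM$ up to third order distinct and below $N/2$.

```python
import cmath
import math

def dft_bin(x, m):
    """I[m] = (1/N) sum_n x[n] exp(-j 2 pi m n / N), as in (D.8)."""
    N = len(x)
    return sum(xn * cmath.exp(-2j * math.pi * m * n / N)
               for n, xn in enumerate(x)) / N

def taylor_coeffs_dft(io, A1, A2, phi1, phi2, M, max_order=3):
    """Estimate g_kl from DFT bins of io[n] via (D.18)-(D.20), phases known."""
    I1, IM = dft_bin(io, 1), dft_bin(io, M)
    g10 = (2 * I1 / (A1 * cmath.exp(1j * phi1))).real    # (D.18)
    g01 = (2 * IM / (A2 * cmath.exp(1j * phi2))).real    # (D.19)
    g = {(1, 0): g10, (0, 1): g01}
    for k in range(max_order + 1):
        for l in range(max_order + 1 - k):
            if k + l >= 2:                               # (D.20)
                scale = (I1 / g10) ** k * (IM / g01) ** l
                g[(k, l)] = (dft_bin(io, k + l * M) / scale).real
    return g

# Hypothetical test polynomial (S, S/V, S/V^2), magnitudes borrowed from table E.1
g_true = {(1, 0): 4.85e-3, (0, 1): 0.83e-3, (2, 0): -1.52e-3,
          (1, 1): -0.94e-3, (0, 2): -0.18e-3, (3, 0): -13.9e-3,
          (2, 1): -3.74e-3, (1, 2): -2.43e-3, (0, 3): 0.17e-3}
N, M, A = 128, 10, 10e-3
vi = [A * math.cos(2 * math.pi * n / N) for n in range(N)]
vo = [A * math.cos(2 * math.pi * M * n / N) for n in range(N)]
io = [sum(g * a ** k * b ** l for (k, l), g in g_true.items())
      for a, b in zip(vi, vo)]
g_est = taylor_coeffs_dft(io, A, A, 0.0, 0.0, M)
for key in sorted(g_true):
    print(key, g_true[key], g_est[key])
```

Here the coefficients with $k + l \ge 2$ are recovered essentially exactly. $g_{10}$ and $g_{01}$ carry a small bias because third-order terms also fall in bins 1 and $M$, which is why (D.13) and (D.14) are approximations.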

D.2.2 LMS method

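The LMS method outlined in this section provides more accurate Taylor coefficient values than the DFT method of the last section. The limitation is that this method is not practical for measured data.

First we define the vectors

$$\mathbf{v}_i^T = \begin{bmatrix} v_i[0] & \cdots & v_i[N-1] \end{bmatrix} \tag{D.21}$$

$$\mathbf{v}_o^T = \begin{bmatrix} v_o[0] & \cdots & v_o[N-1] \end{bmatrix} \tag{D.22}$$

$$\mathbf{i}_o^T = \begin{bmatrix} i_o[0] & \cdots & i_o[N-1] \end{bmatrix} \tag{D.23}$$

$$\mathbf{g}^T = \begin{bmatrix} g_{00} & \cdots & g_{03} & g_{10} & \cdots & g_{33} \end{bmatrix} \tag{D.24}$$

and the matrix

$$\mathbf{A} = \begin{bmatrix} \mathbf{v}_i^0 \circ \mathbf{v}_o^0 & \cdots & \mathbf{v}_i^0 \circ \mathbf{v}_o^3 & \mathbf{v}_i^1 \circ \mathbf{v}_o^0 & \cdots & \mathbf{v}_i^3 \circ \mathbf{v}_o^3 \end{bmatrix} \tag{D.25}$$

where the operator $\circ$ represents the element-wise Hadamard product, and the powers are also taken element-wise. Then we see that

$$\mathbf{A}\mathbf{g} \approx \mathbf{i}_o \tag{D.26}$$

and we can solve for the optimum coefficients in the LMS sense as

$$\mathbf{g}^\star = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{i}_o. \tag{D.27}$$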



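Listing D.2: MATLAB function to find two-dimensional Taylor coefficients using the transient LMS method.

```matlab
function gtrans = calc_g_trans_lms(f, x, y, N)
% Least-squares fit of the 2-D Taylor coefficients, (D.25)-(D.27).
% gtrans(j+1, k+1) holds g_jk for j, k = 0..N.
if nargin < 4; N = 3; end
x = x(:); % make column vector
y = y(:); % make column vector
f = f(:); % make column vector
A = [];
for j = 0:N
    for k = 0:N
        A = [A, x.^j .* y.^k];   % column for g_jk (element-wise powers and product)
    end
end
gtrans = A\f;                    % least-squares solution of A*g = f
gtrans = reshape(gtrans, [N+1, N+1])';
end
```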


D.3 DC simulation

This method provides an alternative to the transient methods from the previous section. The Taylor coefficients are calculated from derivatives that are approximated with finite differences. A two-dimensional dc sweep simulation gives the terms

$$i_o[m, n] = i_o(m\Delta v, n\Delta v) \tag{D.28}$$

where $m$ and $n$ are integers and $\Delta v$ is the step size. With these points, the Taylor coefficients can be calculated as

$$g_{MN} = \frac{1}{M!\,N!\,(2\Delta v)^{M+N}} \sum_{m=0}^{M}\sum_{n=0}^{N} (-1)^{m+n}\binom{M}{m}\binom{N}{n}\, i_o[M - 2m, N - 2n]. \tag{D.29}$$

Small $\Delta v$ values result in errors because the finite differences are taken between nearly equal output currents. Large $\Delta v$ can exceed the radius of convergence of the Taylor series, causing errors. Practical values for $\Delta v$ are in the range between 1 mV and 10 mV.
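A Python sketch of (D.29) follows. A hypothetical cubic stands in for the dc sweep simulator, with coefficient magnitudes borrowed from table E.1. The step $\Delta v = 5$ mV lies in the recommended range. Names are illustrative. For a cubic test function, the only truncation errors are $O(\Delta v^2)$ terms in $g_{10}$ and $g_{01}$; the other coefficients come out exactly.

```python
from math import comb, factorial

def taylor_coeff_dc(io, M, N, dv):
    """g_MN from (D.29): central finite differences of a 2-D dc sweep.

    io(m, n) returns the output current at v_i = m*dv and v_o = n*dv.
    """
    total = 0.0
    for m in range(M + 1):
        for n in range(N + 1):
            total += ((-1) ** (m + n) * comb(M, m) * comb(N, n)
                      * io(M - 2 * m, N - 2 * n))
    return total / (factorial(M) * factorial(N) * (2 * dv) ** (M + N))

# Hypothetical stand-in for the dc sweep: a cubic with table E.1 magnitudes (SI units)
g_true = {(1, 0): 4.85e-3, (0, 1): 0.83e-3, (2, 0): -1.52e-3,
          (1, 1): -0.94e-3, (0, 2): -0.18e-3, (3, 0): -13.9e-3,
          (2, 1): -3.74e-3, (1, 2): -2.43e-3, (0, 3): 0.17e-3}
dv = 5e-3  # 5 mV step, inside the suggested 1 mV to 10 mV range

def sweep(m, n):
    vi, vo = m * dv, n * dv
    return sum(g * vi ** k * vo ** l for (k, l), g in g_true.items())

g_est = {key: taylor_coeff_dc(sweep, key[0], key[1], dv) for key in g_true}
for key in sorted(g_true):
    print(key, g_true[key], g_est[key])
```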



Listing D.3: MATLAB function to find two-dimensional Taylor coefficients using dc discrete differences.

```matlab
function gdc = calc_g_dc(f, x, y, N)
% Taylor coefficients from a 2-D dc sweep f sampled on the grids x and y.
% Assumes an odd-sized grid centered on the operating point.
if nargin < 4; N = 3; end
dx = diff(x(1:2));
dy = diff(y(1:2));
center = (size(f) + 1)/2;
gdc = nan(N+1, N+1);
for j = 1:N+1
    for k = 1:N+1
        gdc(j, k) = deriv2(f, dx, dy, center, j-1, k-1) ...
            /factorial(j-1)/factorial(k-1);
    end
end
end

function d = deriv2(z, dx, dy, xy, ordx, ordy)
% Mixed partial derivative of order (ordx, ordy) by central differences.
if ordy == 0
    cvy = 1;
    indy = 0;
else
    cvy = [1, -1];
    for k = 1:ordy-1
        cvy = conv(cvy, [1, -1]);   % binomial difference weights
    end
    if mod(ordy, 2) == 0
        indy = ordy/2:-1:-ordy/2;
    else
        indy = ordy:-2:-ordy;
    end
end
d = 0; m = 1;
for k = xy(2) + indy
    if ordx == 0
        d = d + cvy(m)*z(k, xy(1));
    else
        d = d + cvy(m)*deriv(z(k, :), dx, xy(1), ordx);
    end
    m = m + 1;
end
d = d/(dy*(2^mod(ordy, 2)))^ordy;
end

function d = deriv(y, dx, n, order)
% One-dimensional central-difference derivative of y at index n.
if size(y, 1) > size(y, 2)
    y = transpose(y);
end
cv = [1, -1];
for k = 1:order-1
    cv = conv(cv, [1, -1]);
end
if mod(order, 2) == 0
    d = sum(y(n + (order/2:-1:-order/2)).*cv)/dx^order;
else
    d = sum(y(n + (order:-2:-order)).*cv)/(2*dx)^order;
end
end
```

Appendix E

Unity-gain stage nonlinearity

E.1 Analog-inverter transconductor

The current through a MOSFET is a two-dimensional function of the gate-source

voltage and gate-drain voltage. We define the Taylor series of these functions for the

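PMOS and the NMOS as

$$i_{sd,p} = \sum_{j,k} g^{(p)}_{jk} v_{sg,p}^j v_{sd,p}^k \tag{E.1}$$

$$i_{ds,n} = \sum_{j,k} g^{(n)}_{jk} v_{gs,n}^j v_{ds,n}^k. \tag{E.2}$$

Kirchhoff's current law for the analog-inverter transconductor in figure E.1 gives the equation below. The small-signal voltages are $v_{gs,n} = v_i$, $v_{ds,n} = v_o$, $v_{sg,p} = -v_i$, and $v_{sd,p} = -v_o$, which gives the factor $(-1)^{j+k}$:

$$i_o = i_{ds,n} - i_{sd,p} = \sum_{j,k}\left(g^{(n)}_{jk} - (-1)^{j+k} g^{(p)}_{jk}\right) v_i^j v_o^k = \sum_{j,k} G_{jk} v_i^j v_o^k \tag{E.3}$$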

Figure E.1: Schematic diagram of the inverter transconductor for nonlinearity analysis. (Labels: $v_i$, $v_o$, $i_o$, $i_{sd,p}$, $i_{ds,n}$.)

Figure E.2: Schematic diagram of the unity-gain stage for nonlinearity analysis. (Labels: $v_i$, $v_o$, $i_{o1}$, $i_{o2}$.)

where we defined

$$G_{jk} = g^{(n)}_{jk} - (-1)^{j+k} g^{(p)}_{jk}. \tag{E.4}$$

E.2 Unity-gain stage

The current of the input transconductor in figure E.2 is a two-dimensional function of the input and output voltages with the Taylor series

$$i_{o1} = \sum_{j,k} G_{jk} v_i^j v_o^k \tag{E.5}$$

where the coefficients $G_{jk}$ are defined in (E.4). The self-biased load in figure E.2 is a width-scaled replica of the input transconductor. That is, it has the same quiescent gate-source voltage and gate-drain voltage, but the transistor widths are scaled by $\beta$. Therefore, the nonlinear I–V characteristics of the load are simply scaled by $\beta$. Because the load's input is tied to the output node, we can write

$$i_{o2} = \beta \sum_{j,k} G_{jk} v_o^j v_o^k. \tag{E.6}$$

The output voltage will be a function of the input voltage with the Taylor series

$$v_o = \sum_n a_n v_i^n. \tag{E.7}$$

Kirchhoff's current law for the output node gives the equation

$$i_{o1} + i_{o2} = \sum_{j,k} G_{jk} v_o^k \left(v_i^j + \beta v_o^j\right) = 0. \tag{E.8}$$

E.2.1 First-order case

Collecting the first-order terms of (E.8), $G_{10}v_i + \left(\beta G_{10} + (1+\beta)G_{01}\right)v_o = 0$, gives

$$a_1 = -\frac{G_{10}}{\beta(G_{10} + G_{01}) + G_{01}}. \tag{E.9}$$

For $a_1 = -1$ we need

$$\beta(G_{10} + G_{01}) + G_{01} = G_{10} \tag{E.10}$$

and after solving for $\beta$ we find

$$\beta = \frac{G_{10} - G_{01}}{G_{10} + G_{01}} = \frac{A_i - 1}{A_i + 1} \tag{E.11}$$

where $A_i = G_{10}/G_{01}$ is the intrinsic gain of the inverter transconductor. For $A_i \gg 1$, the scale factor approaches unity.

E.2.2 Second-order case

Equating the terms from (E.8) containing $v_i^2$ gives the expression

$$-a_2\left(\beta G_{10} + (1 + \beta)G_{01}\right) = G_{20}(1 + \beta) - G_{11}(1 - \beta) + G_{02}(1 + \beta). \tag{E.12}$$

By substituting the equality in (E.10) and rearranging, we find the second-order coefficient to be

$$a_2 = -\frac{G_{20}}{G_{10}}(1 + \beta) + \frac{G_{11}}{G_{10}}(1 - \beta) - \frac{G_{02}}{G_{10}}(1 + \beta). \tag{E.13}$$

E.2.3 Third-order case

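Equating the terms from (E.8) containing $v_i^3$ gives the expression

$$\begin{aligned} -a_3\left(\beta G_{10} + (1 + \beta)G_{01}\right) ={}& G_{30}(1 - \beta) - G_{21}(1 + \beta) + G_{12}(1 - \beta) - G_{03}(1 + \beta) \\ & - a_2\left(G_{20}(2\beta) + G_{11}(2\beta - 1) + G_{02}(2\beta + 2)\right). \end{aligned} \tag{E.14}$$

By substituting the equality in (E.10) and rearranging, we find the third-order coefficient to be

$$\begin{aligned} a_3 ={}& -\frac{G_{30}}{G_{10}}(1 - \beta) + \frac{G_{21}}{G_{10}}(1 + \beta) - \frac{G_{12}}{G_{10}}(1 - \beta) + \frac{G_{03}}{G_{10}}(1 + \beta) \\ & + a_2\left(\frac{G_{20}}{G_{10}}(2\beta) + \frac{G_{11}}{G_{10}}(2\beta - 1) + \frac{G_{02}}{G_{10}}(2\beta + 2)\right). \end{aligned} \tag{E.15}$$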

E.3 Comparison with simulation

Using the method described in appendix D.2, the Taylor coefficients, $G_{jk}$ and $a_n$, are simulated for the unity-gain stage designed as described in §6.1. The Taylor coefficients, $G_{jk}$, for the load transconductor are listed in table E.1. The intrinsic gain of the transconductor is $A_i = 5.84$, which corresponds to $\beta = 0.71$.

Table E.1: Transistor-level simulated Taylor coefficient values, $G_{jk}$, for the degenerated inverter transconductor load.

Gjk      k = 0           k = 1           k = 2            k = 3
j = 0    0               0.83 mS         −0.18 mS/V       0.17 mS/V²
j = 1    4.85 mS         −0.94 mS/V      −2.43 mS/V²      —
j = 2    −1.52 mS/V      −3.74 mS/V²     —                —
j = 3    −13.9 mS/V²     —               —                —

From these values we calculate the second-order coefficient to be

a2 = − G20

G10

(1 + β)︸︷︷︸0.53 V−1

+G11

G10

(1− β)︸︷︷︸−0.06 V−1

− G02

G10

(1 + β)︸︷︷︸0.06 V−1

= 0.53 V−1 (E.16)

which is in reasonable agreement with the directly simulated case where

a2sim = 0.64 V−1. (E.17)

The discrepancy arises because the scale factor, $\beta$, is implemented by generating the load with triode devices, which distorts the relationship between the input and load transconductor Taylor coefficients. This effect can also be accounted for, but the analysis is complex and of little added value. This simulation suggests that a reasonable approximation for the second-order coefficient is

$$a_2 \approx -\frac{G_{20}}{G_{10}}(1+\beta). \tag{E.18}$$
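As a quick numerical check (a sketch added here, not part of the original analysis), the three signed terms of (E.13) can be evaluated directly from table E.1 with β = 0.71. The names `G`, `a2_terms`, and `A2_SIM` are illustrative. With these rounded inputs the terms come out to roughly 0.54, −0.06, and 0.06 V⁻¹, for a total of about 0.54 V⁻¹; the small differences from (E.16) are presumably due to rounding of β and of the tabulated coefficients. The one-term approximation (E.18) lands within about 0.01 V⁻¹ of the full expression.

```python
# Sketch: evaluate (E.13) and its one-term approximation (E.18) from table E.1.
# G[(j, k)] holds G_jk in mS or mS/V; dividing by G_10 (mS) gives V^-1 here.
BETA = 0.71          # scale factor quoted in the text
A2_SIM = 0.64        # directly simulated a2 from (E.17), V^-1

G = {
    (1, 0): 4.85,    # mS
    (2, 0): -1.52,   # mS/V
    (1, 1): -0.94,   # mS/V
    (0, 2): -0.18,   # mS/V
}


def a2_terms(G, beta):
    """Return the three signed terms of (E.13), each in V^-1."""
    g10 = G[(1, 0)]
    return [
        -G[(2, 0)] / g10 * (1 + beta),
        G[(1, 1)] / g10 * (1 - beta),
        -G[(0, 2)] / g10 * (1 + beta),
    ]


terms = a2_terms(G, BETA)
a2_full = sum(terms)      # full expression (E.13)
a2_approx = terms[0]      # approximation (E.18): keep only the G_20 term

print("terms (V^-1):", [round(t, 3) for t in terms])
print(f"E.13: {a2_full:.3f}  E.18: {a2_approx:.3f}  sim: {A2_SIM:.2f}")
```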


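Similarly, for the third-order coefficient,

$$\begin{aligned}
a_3 ={}& \overbrace{-\frac{G_{30}}{G_{10}}(1-\beta)}^{0.84\ \mathrm{V^{-2}}} + \overbrace{\frac{G_{21}}{G_{10}}(1+\beta)}^{-1.32\ \mathrm{V^{-2}}} \overbrace{-\frac{G_{12}}{G_{10}}(1-\beta)}^{0.15\ \mathrm{V^{-2}}} + \overbrace{\frac{G_{03}}{G_{10}}(1+\beta)}^{0.06\ \mathrm{V^{-2}}} \\
&+ \underbrace{a_2\frac{G_{20}}{G_{10}}(2\beta)}_{-0.24\ \mathrm{V^{-2}}} + \underbrace{a_2\frac{G_{11}}{G_{10}}(2\beta-1)}_{-0.04\ \mathrm{V^{-2}}} + \underbrace{a_2\frac{G_{02}}{G_{10}}(2\beta+2)}_{-0.07\ \mathrm{V^{-2}}} = -0.62\ \mathrm{V^{-2}},
\end{aligned} \tag{E.19}$$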

as compared to the directly simulated value

$$a_{3,\mathrm{sim}} = -0.68\ \mathrm{V^{-2}}. \tag{E.20}$$

A reasonable approximation is

$$a_3 \approx -\frac{G_{30}}{G_{10}}(1-\beta) + \frac{G_{21}}{G_{10}}(1+\beta) + a_2\frac{G_{20}}{G_{10}}(2\beta). \tag{E.21}$$

It is important to note that, while the second-order term $a_2$ can be suppressed at the output with a pseudo-differential implementation, its contribution to $a_3$ in (E.21) cannot be removed.
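The same check can be sketched for the third-order coefficient. This evaluates the seven signed terms of (E.15) from table E.1, using β = 0.71 and the value a₂ = 0.53 V⁻¹ that appears to be used in (E.19); the names `a3_terms`, `A2`, and `A3_SIM` are illustrative. The full sum comes out near −0.63 V⁻¹, close to the −0.62 V⁻² obtained from the rounded terms, while the three-term approximation (E.21) gives about −0.72 V⁻², compared with the simulated −0.68 V⁻².

```python
# Sketch: evaluate (E.15) term by term and the approximation (E.21).
BETA = 0.71
A2 = 0.53            # V^-1, the value of a2 used in (E.19)
A3_SIM = -0.68       # V^-2, directly simulated value (E.20)

# Table E.1 entries in mS, mS/V, or mS/V^2, keyed by (j, k).
G = {
    (1, 0): 4.85, (2, 0): -1.52, (1, 1): -0.94, (0, 2): -0.18,
    (3, 0): -13.9, (2, 1): -3.74, (1, 2): -2.43, (0, 3): 0.17,
}


def a3_terms(G, beta, a2):
    """Return the seven signed terms of (E.15), each in V^-2."""
    g = {jk: value / G[(1, 0)] for jk, value in G.items()}  # G_jk / G_10
    return {
        "G30": -g[(3, 0)] * (1 - beta),
        "G21": g[(2, 1)] * (1 + beta),
        "G12": -g[(1, 2)] * (1 - beta),
        "G03": g[(0, 3)] * (1 + beta),
        "a2*G20": a2 * g[(2, 0)] * (2 * beta),
        "a2*G11": a2 * g[(1, 1)] * (2 * beta - 1),
        "a2*G02": a2 * g[(0, 2)] * (2 * beta + 2),
    }


terms = a3_terms(G, BETA, A2)
a3_full = sum(terms.values())                               # (E.15)
a3_approx = terms["G30"] + terms["G21"] + terms["a2*G20"]   # (E.21)

for name, value in terms.items():
    print(f"{name:>7}: {value:+.3f} V^-2")
print(f"E.15: {a3_full:.3f}  E.21: {a3_approx:.3f}  sim: {A3_SIM:.2f}")
```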

Appendix F

Unity-gain stage supply rejection

F.1 Single-ended

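For the single-ended case, the traditional small-signal analysis is sufficient. For the unity-gain stage in figure F.1, the supply rejection is

$$\frac{v_o}{v_{dd}} = \frac{2g_{m,p} + 2g_{ds,p}}{g_{m,p} + g_{m,n} + 2g_{ds,p} + 2g_{ds,n}} \approx \frac{2g_{m,p}}{g_{m,p} + g_{m,n}} \approx 1, \tag{F.1}$$

where the approximations assume $g_{ds} \ll g_m$ and $g_{m,p} \approx g_{m,n}$.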

Similarly, the ground rejection is

$$\frac{v_o}{v_{ss}} = \frac{2g_{m,n} + 2g_{ds,n}}{g_{m,p} + g_{m,n} + 2g_{ds,p} + 2g_{ds,n}} \approx \frac{2g_{m,n}}{g_{m,p} + g_{m,n}} \approx 1. \tag{F.2}$$

In other words, supply noise passes directly to the output.

Figure F.1: Schematic diagram of the unity-gain stage for supply rejection analysis. (Labels: vdd, vss, Vic, vo.)
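A small sketch evaluating the exact expressions (F.1) and (F.2) shows how the "≈ 1" result depends on the balance between the PMOS and NMOS transconductances. The conductance values are hypothetical and are not taken from the design.

```python
def supply_gains(gm_p, gm_n, gds_p, gds_n):
    """Exact small-signal expressions (F.1) and (F.2): (vo/vdd, vo/vss)."""
    den = gm_p + gm_n + 2 * gds_p + 2 * gds_n
    return (2 * gm_p + 2 * gds_p) / den, (2 * gm_n + 2 * gds_n) / den


# Hypothetical small-signal parameters in mS (illustrative only).
balanced = supply_gains(gm_p=5.0, gm_n=5.0, gds_p=0.5, gds_n=0.5)
skewed = supply_gains(gm_p=4.0, gm_n=6.0, gds_p=0.5, gds_n=0.5)
print("balanced:", balanced)   # both terms close to unity
print("skewed:  ", skewed)     # supply and ground gains split apart
```

With balanced devices, both gains are 11/12; with a 4 mS/6 mS split they become 0.75 and about 1.08. In either case, supply noise reaches the output with roughly unity gain, which motivates the pseudo-differential analysis that follows.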

F.2 Pseudo-differential

A pseudo-differential implementation rejects the single-ended supply noise to the extent that the circuit is balanced. However, even a fully balanced pseudo-differential circuit is not completely immune to supply noise. To see this, assume that the transistors have the Taylor series

$$i_{sd,p} = \sum_{j,k} g^{(p)}_{jk}\, v_{sg,p}^{j}\, v_{sd,p}^{k} \tag{F.3}$$

$$i_{ds,n} = \sum_{j,k} g^{(n)}_{jk}\, v_{gs,n}^{j}\, v_{ds,n}^{k}. \tag{F.4}$$

The output of the unity-gain stage is a function of both the input voltage, vi, and the

supply voltage, vdd, which can be represented with the two-dimensional Taylor series

$$v_o = \sum_{m,n} a_{mn}\, v_i^{m}\, v_{dd}^{n}. \tag{F.5}$$

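The term $a_{10}$ is the linear gain from the input to the output and $a_{01}$ is the linear gain from the supply to the output. For a pseudo-differential implementation, in which the two half-circuits are driven by $\pm v_{id}/2$ and share the same supply, the output voltage is

$$v_{od} = \sum_{m,n} a_{mn}\left(\frac{v_{id}}{2}\right)^{m} v_{dd}^{n} - \sum_{m,n} a_{mn}\left(-\frac{v_{id}}{2}\right)^{m} v_{dd}^{n} = \sum_{m\ \mathrm{odd},\, n} 2a_{mn}\left(\frac{v_{id}}{2}\right)^{m} v_{dd}^{n}. \tag{F.6}$$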

The first-order supply gain term, $a_{01}$, is canceled, while the first-order input gain term, $a_{10}$, remains, as expected. The dominant supply-dependent term that is not canceled by the pseudo-differential implementation is $a_{11}v_{id}v_{dd}$. This product can be thought of as the input signal mixed with the supply noise with a conversion gain $a_{11}$. The analysis to find the expression for $a_{11}$ is messy, but a reasonably accurate approximation is

$$a_{11} \approx -\frac{g^{(p)}_{20}}{g^{(p)}_{10}}. \tag{F.7}$$
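The cancellation in (F.6) can be checked numerically with a sketch that uses randomly drawn, hypothetical coefficients $a_{mn}$ (truncated to $m \le 3$, $n \le 2$) in place of a real device model. Driving two copies of (F.5) with $\pm v_{id}/2$ from a shared supply removes every even power of $v_{id}$, including all pure-supply terms, and a mixed central difference recovers $a_{11}$ as the coefficient of $v_{id}v_{dd}$. All function names here are illustrative.

```python
import numpy as np

# Hypothetical coefficients a_mn of (F.5), truncated to m <= 3 and n <= 2.
rng = np.random.default_rng(0)
a = rng.normal(size=(4, 3))
M = np.arange(a.shape[0])[:, None]   # powers of the input, m
N = np.arange(a.shape[1])[None, :]   # powers of the supply, n


def vo(vi, vdd):
    """Single-ended output (F.5): sum over m, n of a_mn vi^m vdd^n."""
    return float(np.sum(a * vi**M * vdd**N))


def vod_direct(vid, vdd):
    """Left side of (F.6): halves driven by +vid/2 and -vid/2, shared supply."""
    return vo(vid / 2, vdd) - vo(-vid / 2, vdd)


def vod_odd_only(vid, vdd):
    """Right side of (F.6): only odd powers of vid survive."""
    return sum(2 * a[m, n] * (vid / 2) ** m * vdd**n
               for m in range(1, a.shape[0], 2) for n in range(a.shape[1]))


def bilinear_coeff(h=1e-3):
    """Mixed central difference: coefficient of vid * vdd in vod."""
    return (vod_direct(h, h) - vod_direct(h, -h)
            - vod_direct(-h, h) + vod_direct(-h, -h)) / (4 * h * h)


print(vod_direct(0.1, 0.05), vod_odd_only(0.1, 0.05))
print(vod_direct(0.0, 0.3))          # a_01 (and every pure-supply term) cancels
print(bilinear_coeff(), a[1, 1])     # conversion gain of vid * vdd is a_11
```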

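For accuracy, $a_{11}$ should be extracted from transistor-level simulations, but some intuition can be drawn from a square-law approximation. If we assume

$$I_{SD,p} = \frac{1}{2}\kappa_p\left(V_{SG,p} - V_{t,p}\right)^2, \tag{F.8}$$

then the nonzero coefficients needed in (F.7) are

$$g^{(p)}_{10} = \frac{\partial I_{SD,p}}{\partial V_{SG,p}} = \kappa_p\left(V_{SG,p} - V_{t,p}\right) \tag{F.9}$$

$$g^{(p)}_{20} = \frac{1}{2}\frac{\partial^2 I_{SD,p}}{\partial V_{SG,p}^2} = \frac{1}{2}\kappa_p \tag{F.10}$$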

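and substituting into the expression for $a_{11}$, we see that

$$a_{11} \approx -\frac{1}{2}\,\frac{1}{V_{SG,p} - V_{t,p}} = -\frac{1}{4}\left(\frac{g_{m,p}}{I_{SD,p}}\right). \tag{F.11}$$

Since practical values of $g_{m,p}/I_{SD,p}$ lie roughly between 4 V⁻¹ and 30 V⁻¹, we can expect the magnitude of the conversion gain to be greater than unity but less than ten (in units of V⁻¹).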
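To make (F.7) through (F.11) concrete, the following sketch extracts $g^{(p)}_{10}$ and $g^{(p)}_{20}$ from the square-law model (F.8) by central finite differences and compares $-g^{(p)}_{20}/g^{(p)}_{10}$ with $-(g_{m,p}/I_{SD,p})/4$. The values of `KAPPA_P`, `VT_P`, and the overdrive voltages are hypothetical. For overdrives of 0.1, 0.2, and 0.4 V, the estimate gives $a_{11}$ of about −5, −2.5, and −1.25 V⁻¹, which falls within the range suggested above.

```python
# Square-law sketch of (F.7)-(F.11). KAPPA_P and VT_P are hypothetical values.
KAPPA_P = 2e-3   # A/V^2
VT_P = 0.35      # V


def isd(vsg):
    """Square-law PMOS drain current (F.8); valid only for vsg > VT_P."""
    return 0.5 * KAPPA_P * (vsg - VT_P) ** 2


def g10_g20(vsg, h=1e-4):
    """Taylor coefficients in V_SG via central finite differences."""
    g10 = (isd(vsg + h) - isd(vsg - h)) / (2 * h)
    g20 = 0.5 * (isd(vsg + h) - 2 * isd(vsg) + isd(vsg - h)) / h**2
    return g10, g20


def a11_estimate(vsg):
    g10, g20 = g10_g20(vsg)
    return -g20 / g10            # approximation (F.7)


def gm_over_id(vsg):
    g10, _ = g10_g20(vsg)
    return g10 / isd(vsg)        # g_m / I_SD at this bias point


for vov in (0.1, 0.2, 0.4):
    vsg = VT_P + vov
    print(f"Vov = {vov:.1f} V: gm/I = {gm_over_id(vsg):5.1f} V^-1, "
          f"a11 = {a11_estimate(vsg):+.2f} V^-1, "
          f"-(gm/I)/4 = {-gm_over_id(vsg) / 4:+.2f} V^-1")
```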

Appendix G

Switched-capacitor tuning circuit

As discussed in §5.6.2, the delay makes the largest contribution to the total FFE offset. To mitigate this problem, the triode gate voltages, Vbp and Vbn, can be biased to adjust the gain and common mode of the delays, as depicted in the schematic diagram in figure G.1. Figure G.2 shows the contour lines of unity gain and half-supply common mode in the space of Vbp and Vbn. The intersection of these lines is the optimal bias point, where both objectives are achieved simultaneously: the gain is unity and the common mode is half of the supply. It is possible to tune for PVT variations by finding this optimal bias point for a replica gain stage and mirroring the bias voltages to the delays in the core FFE.

One conceivable tuning method is to create a feedback loop for the replica gain

stage that sets Vbp so that the common mode is VDD/2. For this case, sweeping

Vbn and plotting Vbp produces the contour line of constant common mode as in figure

G.2. The issue with this simple solution is revealed when mismatch is introduced into

the gain stage. Figure G.3 (top) shows a simulation of the contour lines of constant

common mode for 100 Monte Carlo points. The device mismatch causes these lines

to vary significantly. This effect is due to the feedback loop adjusting Vbp to also compensate for the mismatch of the delays in setting the common mode. The gain from the triode devices to the output common mode is small, so large variations in Vbp are necessary to compensate for even small mismatches. The goal of the replica-based tuning circuit is to adjust for global PVT variations and, as such, it is desirable to tune the common mode without compensating for the mismatch. To accomplish this, the common mode of the non-degenerated gain stage, including its mismatch, is measured by shorting out the triode devices. This common mode is defined as the natural common mode, VCMN. The lines of constant common mode are plotted in figure G.3 (bottom) for a Monte Carlo simulation with the feedback loop modified to force the common mode to this natural common mode. As expected, the spread is significantly reduced. This calibration technique requires multiple phases, which can be implemented with the switched-capacitor circuit shown in figure G.4(a).

Figure G.1: The single-path Pade-inspired delay schematic diagram with gate-voltage-tunable triode-degenerated load transconductor. (Labels: Gm stages, C, Vbp, Vbn, vi, vo.)

Figure G.2: Contour lines of constant gain and constant common mode, and a 50-point Monte Carlo simulation of converged bias voltages for the switched-capacitor circuit in figure G.4. (Axes: Vbp (V) versus Vbn (V); legend: constant gain, constant CM, Monte Carlo.)

Figure G.3: Monte Carlo simulation of the contour lines of constant common mode with the output common mode forced to (top) half of the supply and (bottom) the natural common mode. (Both panels: Vbp (V) versus Vbn (V).)

The calibration of gain and common mode is completed in four phases, which are depicted in figure G.4(b). Phases 1 and 2 make a small step in Vbn toward unity gain. Phases 3 and 4 make a small step in Vbp toward the natural common mode.

During phase 1, the amplifier is auto-zeroed while −vi is sampled on one capacitor and Avi on the other, where A is the gain of the stage.¹ During phase 2, the sampled voltage is integrated onto Cf1, updating Vbn by

$$\Delta V_{bn} = v_i(A-1)\frac{C_{s1}}{C_{f1}}. \tag{G.1}$$

When A > 1, the bias voltage Vbn is increased, and when A < 1, it is decreased. Because the gain decreases monotonically with increasing Vbn, this acts as a restoring force that sets the gain to unity. The step size is proportional to the error, in a fashion similar to the LMS algorithm.

During phase 3, the amplifier is auto-zeroed while VCM − VCMN is sampled on Cs2. During phase 4, the sampled voltage is integrated onto Cf2, updating Vbp by

$$\Delta V_{bp} = (V_{CM} - V_{CMN})\frac{C_{s2}}{C_{f2}}. \tag{G.2}$$

The natural common mode is a constant and VCM decreases monotonically with increasing Vbp; therefore, this acts as a restoring force that sets the common mode equal to the natural common mode. Similar to the update step for Vbn, the step size is proportional to the error.

¹ The reference voltage VCM − vi is generated by a self-biased inverter transconductor with equal NMOS and PMOS widths, resulting in a voltage that is some amount vi below VCM.

Figure G.4: (a) The schematic diagram for the switched-capacitor gain and common-mode replica tuning circuit and (b) the associated clock phase diagram. (Panel (a) labels: full-size replica unity-gain stage; two Gm amplifiers; capacitors Cs1, Cf1, Cs2, Cf2, Cp1, Cp2, Cbn, Cbp, and CL; inputs VCM − vi, VCM, and VCMN; outputs Vbn and Vbp to the FFE delays. Panel (b): clock phases 1 to 4, with additional phases labeled d2 and d4.)

The amplifier is implemented as a traditional two-stage topology for high gain, which minimizes second-order offset effects not canceled by the auto-zeroing. This switched-capacitor circuit is sensitive to leakage current, so high-voltage I/O devices are used in the amplifier with a 1.8 V supply. The current is 300 µA per amplifier, for a total power consumption of 1.1 mW. A 500 kHz input clock is used to generate the clock phases in figure G.4(b), with each phase lasting 2 µs. The output converges within 1 ms in post-layout simulations. The plot in figure G.2 shows the converged bias voltages for 50 post-layout Monte Carlo simulations.
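The four-phase update can be illustrated with a behavioral sketch that iterates (G.1) and (G.2) once per cycle. The replica model is entirely hypothetical: the linear functions `gain` and `common_mode`, the optimum bias values, vi = 0.1 V, and the capacitor ratios are assumptions. They encode only the monotonic trends stated in the text (gain falls with Vbn, VCM falls with Vbp) and are not extracted from the design. The timing constants follow the text (500 kHz clock, 2 µs per phase, so one four-phase cycle takes 8 µs and 1 ms allows 125 updates). The power line assumes two amplifiers at 300 µA each from the 1.8 V supply, which is consistent with the quoted 1.1 mW.

```python
# Behavioral sketch of the four-phase replica calibration, (G.1) and (G.2).
# The replica model below is HYPOTHETICAL and linearized; it only encodes the
# monotonic trends stated in the text (gain falls with Vbn, VCM falls with Vbp).
VBN_OPT, VBP_OPT = 0.75, 0.30   # V, illustrative optimum bias point
VCMN = 0.45                      # V, illustrative natural common mode


def gain(vbn, vbp):
    """Hypothetical replica gain A(Vbn, Vbp)."""
    return 1.0 - 0.8 * (vbn - VBN_OPT) + 0.1 * (vbp - VBP_OPT)


def common_mode(vbn, vbp):
    """Hypothetical replica output common mode VCM(Vbn, Vbp)."""
    return VCMN - 0.3 * (vbp - VBP_OPT) + 0.05 * (vbn - VBN_OPT)


# Timing and power figures quoted in the text.
PHASE_S = 1 / 500e3                 # 2 us per phase from the 500 kHz clock
CYCLE_S = 4 * PHASE_S               # phases 1-4 form one update cycle
N_CYCLES = round(1e-3 / CYCLE_S)    # update cycles available within 1 ms
POWER_W = 2 * 300e-6 * 1.8          # assumes two amplifiers at 300 uA each


def calibrate(vbn, vbp, vi=0.1, cs1_over_cf1=0.5, cs2_over_cf2=0.15, n=N_CYCLES):
    """Run n four-phase cycles; vi and the capacitor ratios are assumptions."""
    for _ in range(n):
        # Phases 1-2, (G.1): step in Vbn proportional to the gain error.
        vbn += vi * (gain(vbn, vbp) - 1) * cs1_over_cf1
        # Phases 3-4, (G.2): step in Vbp proportional to the CM error.
        vbp += (common_mode(vbn, vbp) - VCMN) * cs2_over_cf2
    return vbn, vbp


vbn, vbp = calibrate(vbn=0.6, vbp=0.1)
print(f"{N_CYCLES} cycles in 1 ms, amplifier power {POWER_W * 1e3:.2f} mW")
print(f"Vbn = {vbn:.3f} V, Vbp = {vbp:.3f} V")
print(f"gain = {gain(vbn, vbp):.4f}, VCM - VCMN = {common_mode(vbn, vbp) - VCMN:+.4f} V")
```

Because each correction is proportional to the measured error, the loop behaves like a sign-correct, LMS-style gradient step. The capacitor ratios set the trade-off between convergence speed and the residual ripple on Vbn and Vbp.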

Unfortunately, an issue during testing prevented a complete physical verification of the proof-of-concept design. The measured outputs Vbp and Vbn showed large spikes on the oscilloscope occurring at the clock frequency. This issue could not be reproduced in post-layout simulations. A possible source of the problem is undesired overlap of adjacent clock phases. The internal clocks are not accessible off chip, so this could not be verified.

Appendix H

Gain compression generalization

For a nonlinear system defined by the expression

$$y = a_1x + a_2x^2 + a_3x^3 + \cdots \tag{H.1}$$

the higher-order terms $a_nx^n$ can contribute a signal component directly proportional to $x$ that is indistinguishable from the linear term $a_1x$. This effect is referred to as gain compression because the contribution is typically opposite in sign to the linear term. Here we define the linear contribution of $x^n$ as the gain error $\alpha_n$. For the simple case with $x = \cos(\omega t)$, we can use the trigonometric identity

$$\cos^3(\omega t) = \frac{1}{4}\left(3\cos(\omega t) + \cos(3\omega t)\right) \tag{H.2}$$

to see that

$$\alpha_3 = \frac{3}{4}. \tag{H.3}$$

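For a general input $x$, a closed-form expression is not always known. In this case, the statistical properties of the signal can be used for the analysis. To find the general expression for $\alpha_n$, we minimize the distortion power. Assuming $x$ is zero-mean, so that the covariance of $x^n$ and $x$ reduces to $E[x^{n+1}]$, the distortion power is

$$\operatorname{var}(x^n - \alpha_n x) = \operatorname{var}(x^n) - 2\alpha_n E[x^{n+1}] + \alpha_n^2 E[x^2]. \tag{H.4}$$

Differentiating with respect to $\alpha_n$ and setting the result equal to zero gives

$$\frac{\partial}{\partial \alpha_n}\operatorname{var}(x^n - \alpha_n x) = -2E[x^{n+1}] + 2\alpha_n E[x^2] = 0, \tag{H.5}$$

which yields

$$\alpha_n = \frac{E[x^{n+1}]}{E[x^2]}, \tag{H.6}$$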

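with the associated distortion power

$$d_n = \operatorname{var}(x^n - \alpha_n x) = \operatorname{var}(x^n) - \alpha_n^2 E[x^2]. \tag{H.7}$$

For the zero-mean unit-variance sinusoid $x_s = \sqrt{2}\cos(\omega t)$, for which $E[x_s^2] = 1$, $E[x_s^4] = 3/2$, and $E[x_s^6] = 5/2$, the third-order distortion is

$$d_{3s} = 4\cdot\frac{10}{16} - 4\cdot\frac{9}{16} = \frac{1}{4}. \tag{H.8}$$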

This expression is useful in characterizing the linearity of a system. The ratio of the distortion power for the target application's input signal to that of the sinusoidal case can be used to predict the total distortion from conventional sinusoidal measurement techniques. For example, a zero-mean unit-variance normally distributed signal, $x_g$, has $E[x_g^4] = 3$ and $E[x_g^6] = 15$, giving the third-order distortion power

$$d_{3g} = 15 - 3^2\cdot 1 = 6. \tag{H.9}$$

In other words, the output distortion power for a normally distributed input will be 24× (that is, 6 divided by 1/4) that of a sinusoidal input of equal power. This factor is a strong function of the signal characteristics. For example, the zero-mean unit-variance square-wave signal $x_{sq} = \operatorname{sgn}(\cos(\omega t))$ has zero third-order distortion power. This can be seen from the definition of the distortion power, since $x_{sq}^4 = x_{sq}^6 = 1$, which gives

$$d_{3sq} = 1 - 1^2\cdot 1 = 0. \tag{H.10}$$
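Equations (H.6) and (H.7) depend only on the signal moments, so the results above can be reproduced exactly with rational arithmetic. The following sketch uses the standard closed-form moments of a sinusoid, a Gaussian, and a ±1 square wave. All three are zero-mean, so odd moments vanish, as (H.4) assumes. The function names are illustrative.

```python
from fractions import Fraction
from math import comb, prod


def cos_moment(p):
    """E[cos^p(wt)] over a full period (unit-amplitude sinusoid)."""
    return Fraction(0) if p % 2 else Fraction(comb(p, p // 2), 2**p)


def sine_moment(p):
    """E[x^p] for the unit-variance sinusoid x = sqrt(2) cos(wt)."""
    return Fraction(0) if p % 2 else Fraction(comb(p, p // 2), 2 ** (p // 2))


def gauss_moment(p):
    """E[x^p] for a zero-mean unit-variance Gaussian: (p - 1)!! for even p."""
    return Fraction(0) if p % 2 else Fraction(prod(range(p - 1, 0, -2)))


def square_moment(p):
    """E[x^p] for x = sgn(cos(wt)), which is +1 or -1 equally often."""
    return Fraction(0) if p % 2 else Fraction(1)


def gain_error(moment, n):
    """alpha_n from (H.6)."""
    return moment(n + 1) / moment(2)


def distortion(moment, n):
    """d_n from (H.7): var(x^n) - alpha_n^2 E[x^2]."""
    var_xn = moment(2 * n) - moment(n) ** 2
    return var_xn - gain_error(moment, n) ** 2 * moment(2)


print("alpha_3, unit cosine:", gain_error(cos_moment, 3))      # (H.3)
for name, m in [("sine", sine_moment), ("gauss", gauss_moment),
                ("square", square_moment)]:
    print(f"d_3 {name}: {distortion(m, 3)}")                   # (H.8)-(H.10)
print("gauss/sine ratio:", distortion(gauss_moment, 3) / distortion(sine_moment, 3))
```

Using `Fraction` keeps every result exact, so the 24× ratio between the Gaussian and sinusoidal cases appears as an integer rather than a rounded float.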

Bibliography

[1] M. El-Chammas, “Background calibration of timing skew in time-interleaved

A/D converters,” Ph.D. dissertation, Stanford University, August 2010.

[2] M. Shanbhag. (2011, Apr.) 100 Gb/s simulated backplane channels. [Online].

Available: http://ieee802.org/3/100GCU/public/channel.html

[3] E. Mammei, F. Loi, F. Radice, A. Dati, M. Bruccoleri, M. Bassi, and A. Mazzanti, “A power-scalable 7-tap FIR equalizer with tunable active delay line for 10-to-25Gb/s multi-mode fiber EDC in 28nm LP-CMOS,” in ISSCC Dig. Tech. Papers, Feb. 2014, pp. 142–143.

[4] “IEEE approved draft standard for ethernet amendment 2: Physical layer specifications and management parameters for 100 Gb/s operation over backplanes and copper cables,” IEEE P802.3bj, Apr. 2014.

[5] J. D’Ambrosia, P. Mooney, and M. Nowell. (2013, May) 400 Gb/s ethernet:

Why now? [Online]. Available: http://goo.gl/n574PC

[6] J. F. Bulzacchelli, “Equalization for electrical links,” IEEE Solid-State Circuits

Mag., vol. 7, no. 4, pp. 23–31, 2015.


[7] K. Smith, A. Wang, and L. Fujino, “Through the looking glass II - part 1 of 2:

Trend tracking for ISSCC 2013,” IEEE Solid-State Circuits Mag., vol. 5, no. 1,

pp. 71–89, 2013.

[8] B. Zhang, A. Nazemi, A. Garg, N. Kocaman, M. R. Ahmadi, M. Khanpour,

H. Zhang, J. Cao, and A. Momtaz, “A 195mW / 55mW dual-path receiver AFE

for multistandard 8.5-to-11.5 Gb/s serial links in 40nm CMOS,” in ISSCC Dig.

Tech. Papers, Feb. 2013, pp. 34–35.

[9] D. Cui, H. Zhang, N. Huang, A. Nazemi, B. Catli, H. G. Rhew, B. Zhang, A. Momtaz, and J. Cao, “A 320mW 32Gb/s 8b ADC-based PAM-4 analog front-end with programmable gain control and analog peaking in 28nm CMOS,” in ISSCC Dig. Tech. Papers, Jan. 2016, pp. 58–59.

[10] E.-H. Chen, R. Yousry, and C.-K. K. Yang, “Power optimized ADC-based serial

link receiver,” IEEE J. Solid-State Circuits, vol. 47, no. 4, pp. 938–951, Apr.

2012.

[11] A. Momtaz and M. Green, “An 80 mW 40 Gb/s 7-tap T/2-spaced feed-forward

equalizer in 65 nm CMOS,” IEEE J. Solid-State Circuits, vol. 45, no. 3, pp.

629–639, Mar. 2010.

[12] M. N. Sadiku, Elements of Electromagnetics. New York, NY: Oxford University

Press, 2001, vol. 428.

[13] J. Baker-Jarvis, M. D. Janezic, B. Riddle, C. L. Holloway, N. Paulter, and

J. Blendell, “Dielectric and conductor-loss characterization and measurements

on electronic packaging materials,” NIST, Boulder, CO, Tech. Rep. 1520, July

2001.


[14] N. Wiener and Y. Lee, “Electrical network system,” U.S. Patent 2 124 599, July

26, 1938.

[15] A. V. Oppenheim, A. S. Willsky, and S. H. Nawab, Signals and Systems. Pearson, 2014.

[16] T. H. Lee, The Design of CMOS Radio-Frequency Integrated Circuits. Cambridge University Press, 2003.

[17] R. Schaumann and M. E. V. Valkenburg, Design of Analog Filters. Oxford

University Press, USA, 2001.

[18] H. Pade, Sur la representation approchee d’une fonction par des fractions rationnelles. Gauthier-Villars et fils, 1892, no. 740.

[19] K. Bult and H. Wallinga, “A CMOS analog continuous-time delay line with

adaptive delay-time control,” IEEE J. Solid-State Circuits, vol. 23, no. 3, pp.

759–766, June 1988.

[20] S. K. Garakoui, E. A. M. Klumperink, B. Nauta, and F. F. E. V. Vliet, “A 1-to-2.5GHz phased-array IC based on gm-RC all-pass time-delay cells,” in ISSCC Dig. Tech. Papers, Feb. 2012, pp. 80–82.

[21] R. L. Geiger and E. Sanchez-Sinencio, “Active filter design using operational

transconductance amplifiers: a tutorial,” IEEE Circuits and Devices Mag., vol. 1,

no. 2, pp. 20–32, 1985.

[22] M. J. M. Pelgrom, H. P. Tuinhout, and M. Vertregt, “Transistor matching in analog CMOS applications,” in IEDM Dig. Tech. Papers, Dec. 1998, pp. 915–918.

[23] B. Widrow and S. Stearns, Adaptive Signal Processing. Prentice-Hall, Inc., 1985.

[24] R. Boesch, K. Zheng, and B. Murmann, “A 0.003 mm² 5.2 mW/tap 20 GBd inductor-less 5-tap analog RX-FFE,” in VLSI Circuits Dig. Tech. Papers, to be published.

[25] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University

Press, 2004.

[26] V. Balasubramanian. (2009, May) EyeMax 3m 28 AWG cable assembly vs IEEE P802-3ba spec. Draft 2.0. [Online]. Available: http://ieee802.org/3/ba/public/channel.html

[27] B. Nauta, “A CMOS transconductance-C filter technique for very high frequencies,” IEEE J. Solid-State Circuits, vol. 27, no. 2, pp. 142–153, Feb. 1992.

[28] P. Patel. (2009, Sept.) 1 meter backplane channel. [Online]. Available:

http://ieee802.org/3/100GCU/public/channel.html

[29] V. Stojanovic, M. Horowitz, J. Zerbe, K. Yang, and W. Ellersick. (2003, May)

Stanford EE371 lecture notes: High-speed links (lecture 16). [Online]. Available:

http://goo.gl/kcvPe9