
A Blind Baud-Rate CDR and Zero-Forcing Adaptive DFE for an ADC-Based Receiver

by

Clifford Ting

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science

Graduate Department of Electrical and Computer Engineering, University of Toronto

Copyright © 2013 by Clifford Ting

A Blind Baud-Rate CDR and Zero-Forcing Adaptive DFE for an ADC-Based Receiver

Clifford Ting

Master of Applied Science, 2013

Graduate Department of Electrical and Computer Engineering

University of Toronto

Abstract

This thesis describes two design ideas in the area of ADC-based receivers.

The first contribution of this thesis is a 10Gb/s blind baud-rate CDR. The blind baud-rate operation, which is made possible by using a 2UI integrate-and-dump filter, creates intentional ISI in adjacent bit periods. The blind samples are interpolated to recover center-of-the-eye samples for a speculative Mueller-Muller PD and a 2-tap DFE. The 65nm CMOS test chip has a measured high-frequency jitter tolerance of 0.19UIpp at ±300ppm of frequency offset.

The second contribution of this thesis is a digital zero-forcing adaptive DFE. The DFE coefficients are calculated by correlating data samples with the recovered bits. Simulations show that the adaptive taps converge to the ISI values of the data signal's pulse response. The CDR and adaptive 2-tap DFE have a high-frequency jitter tolerance of 0.28UIpp when simulated at 10Gb/s with an 8-inch FR4 channel.


Acknowledgements

The work described in this thesis would not have been possible without the help and

support of many people.

I would like to thank my supervisor, Professor Ali Sheikholeslami, for his support

and guidance during my M.A.Sc. studies and for being a great teacher. His optimism

encouraged me to continue measuring and eventually publish the blind baud-rate test

chip, even though it did not work initially.

I would also like to thank the thesis committee members, Professor Tony Chan Carusone, Professor Antonio Liscidini, and Professor Andreas Moshovos, for reviewing the thesis and for their valuable feedback.

My gratitude goes to Fujitsu Laboratories Ltd. for sending me to their office in

Kawasaki, Japan to tape out the test chip. I am grateful to everyone who made my

visit an enjoyable one – in particular, thank you to Masaya Kibune, Hirotaka Tamura,

Takuji Yamamoto, Kouichi Kanda, Takayuki Hamada, Junji Ogawa, Hirotaka Yamazaki,

Yasumoto Tomita, and Iwao Sugiyama. I would like to thank Tamura-san for sharing

his ideas with me and for always encouraging research discussions. A special thank you

goes to Kibune-san who stayed late to keep me company during the tapeout, made sure

I had everything I needed during my stay in Japan, and took time on weekends to give

me a tour of Kyoto and Tokyo, even though he was very busy with his own work.

Thank you to all the graduate students in BA5000 and BA5158 for making my time in graduate school a wonderful experience. In particular, the contributions in this thesis could not have been achieved without the help of Josh Liang and Sadegh Jalali during the design and measurement of the test chips. I would also like to thank the previous and current students of Professor Sheikholeslami's research group for their friendship and valuable discussions: Shayan Shahramian, Safeen Huda, Behrooz Abiri, Sadegh Jalali, Ravi Shivnaraine, Aynaz Vatankhahghadim, Josh Liang, Neno Kovacevic, and Farhad Ramezankhani.

Most of all, I would like to thank my parents for their unconditional love and support.

Thank you for always being there for me.


Contents

1 Introduction
1.1 Motivation
1.2 Thesis Objectives
1.3 Thesis Outline
2 Background
2.1 Channel effects
2.2 Receiver
2.3 Equalization
2.3.1 Linear Equalization
2.3.2 Decision-Feedback Equalization (DFE)
2.4 Equalizer Adaptation
2.4.1 Zero-Forcing (ZF) Method
2.4.2 Minimum Mean Square Error (MMSE) Method
2.4.3 Maximum Eye Opening Method
2.5 Clock and Data Recovery (CDR)
2.5.1 Phase-Tracking CDR with Clock Feedback
2.5.2 Blind Feed-forward CDR
2.5.3 Blind CDR with Feedback
2.6 Summary
3 A Blind Baud-Rate CDR
3.1 Blind 1x Data Recovery Concepts
3.2 Proposed 1x Blind Receiver Architecture
3.3 Receiver Implementation
3.3.1 Integrate-and-Dump Filter
3.3.2 Clock Generator
3.3.3 Data Interpolator
3.3.4 Mueller-Muller Phase Detector
3.3.5 Decision-Feedback Equalizer
3.3.6 Loop Filter
3.4 Simulation and Measurement Results
3.5 Summary
4 A Zero-Forcing Adaptive DFE for an ADC-Based CDR
4.1 Proposed DFE Adaptation
4.2 Proposed Blind ADC-Based Receiver Architecture
4.3 Proposed Digital CDR with Adaptive 2-tap DFE
4.3.1 Data Interpolator
4.3.2 Low-Pass Filter for DFE Adaptation
4.4 Simulation Results
4.5 Conclusion
5 Conclusion
5.1 Thesis Contributions
5.2 Future Work
5.2.1 Implementation of a Fully Feed-Forward Blind Baud-Rate CDR
5.2.2 Evaluation of Phase-Dependent DFE for Data Interpolators
5.2.3 Adaptive Optimization of Offset Coefficient in MMPD
5.2.4 Calibration of I&D and ADC Front End
References

vii

List of Tables

4.1 Comparison of Adapted Coefficients (c1 and c2) vs. Pulse Response (h1 and h2)

List of Figures

2.1 The basic components of a communication system
2.2 An example of a channel frequency response and the effect on an isolated data pulse
2.3 Intersymbol interference (ISI) when transmitting a '1111' sequence
2.4 Comparison of (a) binary and (b) ADC-based receivers
2.5 (a) Linear and (b) non-linear receiver equalizers
2.6 Frequency response of combined channel and linear equalizer
2.7 Source-degenerated continuous-time linear equalizer
2.8 A 3-tap DFE example
2.9 A speculative 1-tap DFE
2.10 An example of channel+FFE pulse response (h(t)) and Nyquist response (g(t)). ISI is the difference between the two responses (r=g-h).
2.11 An example of a receiver with a channel (with 2 pre-cursor and 4 post-cursor taps of ISI) and a 2-tap FFE
2.12 A partial model of a discrete-time receiver with channel and FFE
2.13 A geometric representation of optimal zero-forcing FFE coefficients
2.14 A model of a discrete-time receiver, including a ZF adaptation loop
2.15 An example of minimizing average error by using the steepest-descent algorithm
2.16 A model of a discrete-time receiver with a DFE and LMS adaptation loop
2.17 A system that adapts equalizer taps based on eye opening
2.18 A recovered clock sampling equalized data
2.19 CDR classification
2.20 System diagram of phase-tracking CDR with clock in feedback loop
2.21 Example of a jitter tolerance chart
2.22 (a) PD inputs and output and (b) linear model
2.23 Alexander PD implementation
2.24 Alexander PD examples with early and late CKRX
2.25 Transfer function of Alexander PD with no jitter on data or CKRX
2.26 Hogge PD implementation
2.27 Hogge PD output with (a) early, (b) on-time, and (c) late CKRX
2.28 Transfer function of Hogge PD
2.29 Example of (a) pulse response and (b) MM function [21]
2.30 System diagram of an 8x oversampled blind feed-forward (burst-mode) CDR [22,27]
2.31 The edge detection and data selection process from Figure 2.30
2.32 A blind 2x ADC-based CDR [32]
2.33 A blind 1.45x ADC-based CDR [33]
2.34 System diagram of blind CDR with feedback [10]
2.35 Analog data interpolator (DI) estimates center and edge samples from blind samples [10]
3.1 Worst case for 2x, 1.45x and 1x sampling on open eye diagram
3.2 Comparison of theoretical worst-case jitter tolerance given the pulse responses of an ideal channel, 1UI I&D, and 2UI I&D. Blind baud-rate samples can shift across a 1UI range due to frequency offset.
3.3 System block diagram of interleaved analog front end (1UI I&D and ADC) and digital CDR
3.4 Comparison of (a) fully analog 2UI I&D and (b) analog and digital 2UI I&D
3.5 Handling (a) negative frequency offset: data (TX) is slower than blind receiver clock (CKRX) (b) positive frequency offset: data (TX) is faster than blind receiver clock (CKRX)
3.6 Implementation of integrate-and-dump (I&D) circuit [28]
3.7 I&D operating phases synchronized with clock pulses
3.8 Implementation of clock pulse generator with adjustable delay for deskew
3.9 (a) Effect of clock phase skew on the I&D integration period (b) Equal I&D integration periods after correcting clock skew
3.10 Adjustable clock delay block
3.11 Piecewise linear interpolation of desired sample from blind samples
3.12 (a) Pulse response of an ideal channel followed by 2UI I&D (b) Proposed MM function
3.13 Design and implementation of the speculative Mueller-Muller phase detector (MMPD)
3.14 (a) A speculative 2-tap DFE and (b) the first stage of the parallel speculative DFE that recovers 8 bits per cycle
3.15 The second stage of parallel speculative DFE that recovers 16 bits per cycle
3.16 Loop filter with configurable proportional and integral gains
3.17 Simulated loop filter convergence with 1000ppm of frequency offset for PRBS-7. Signals correspond to nodes on the block diagram of Fig. 3.16
3.18 Frequency response of channel models in simulation
3.19 Simulated eye diagrams using Channel A + 2UI I&D
3.20 Simulated eye diagrams using Channel B
3.21 Simulated jitter tolerance results at 10Gb/s with a BER of 10⁻⁶
3.22 Chip photo
3.23 Measurement setup
3.24 Average ADC output given DC input (a) before and (b) after skew correction
3.25 Measured channel frequency response
3.26 Measured eye diagrams (a) after the channel and (b) after the ADC
3.27 Simulated and measured jitter tolerance results with 10Gb/s PRBS-7 input data and BER of 10⁻⁶ and 10⁻¹², respectively
4.1 ISI can be calculated by correlating sampled data (A_k, A_{k−1}, etc.) with recovered bits (x_k, x_{k−1}, etc.)
4.2 Zero-forcing controller for n-tap DFE adaptation
4.3 System diagram of proposed receiver with 3-bit ADC-based CDR and adaptive DFE
4.4 Data interpolator calculates sample at desired location from closest blind samples. (a) Negative or (b) positive frequency offsets result in occasional skipped or extra interpolated samples
4.5 Proposed digital CDR with adaptive DFE
4.6 Piecewise linear interpolation of desired sample from 2x blind samples
4.7 Frequency responses of 1x and 2x data interpolators. Both interpolators operate on a 10Gbps data signal with a Nyquist frequency of 5GHz.
4.8 Low-pass filter for DFE coefficients
4.9 Hysteresis block implemented in low-pass filter
4.10 Frequency responses of channel models used in simulation
4.11 Combined channel and interpolator pulse responses showing ISI tap values (h−1, h0, h1, h2, h3) when CDR has locked
4.12 Simulated DFE adaptation with Channel C at 10Gbps. DFE converges to same steady-state values when given different initial coefficients (i.e. 0 and 30)
4.13 Simulated DFE adaptation with Channel D at 10Gbps. DFE converges to same steady-state values when given different initial coefficients (i.e. 0 and 30)
4.14 Simplified diagram of CDR model used for eye diagram simulations
4.15 Simulated eye diagrams with 5Gbps data and Channel C. Eye diagrams correspond to signals in Figure 4.14
4.16 Simulated eye diagrams with 10Gbps data and Channel C. Eye diagrams correspond to signals in Figure 4.14
4.17 Simulated eye diagrams with 10Gbps data and Channel D. Eye diagrams correspond to signals in Figure 4.14
4.18 Simulated jitter tolerance of proposed receiver

List of Acronyms

ADC Analog-to-Digital Converter

BER Bit-Error Rate

CDR Clock and Data Recovery

CP Charge Pump

CTLE Continuous-Time Linear Equalizer

DFE Decision Feedback Equalizer

DI Data Interpolator

DJ Deterministic Jitter

FFE Feed-Forward Equalizer

FIR Finite Impulse Response

FR4 A glass-reinforced epoxy laminate used as printed circuit board material

Gb/s Gigabits per second

Gbps Gigabits per second

ISI Intersymbol Interference

LMS Least Mean Square


MMPD Mueller-Muller Phase Detector

MMSE Minimum Mean Square Error

NRZ Non-Return-to-Zero

PCB Printed Circuit Board

PD Phase Detector

PI Phase Interpolator

PRBS Pseudo-Random Binary Sequence

PVT Process, Voltage and Temperature

RJ Random Jitter

UI Unit Interval

USB Universal Serial Bus

VCO Voltage-Controlled Oscillator

ZF Zero-Forcing


1 Introduction

The rapid improvements in processor speeds and other digital computation have enabled the development of applications such as the Internet and video conferencing. These applications have, in turn, created a growing demand for high-speed data communication. One particular trend is the centralization of processing and storage resources in cloud computing centers [4,9]. While cloud computing reduces the complexity and power consumption of client devices (e.g. laptops, tablets, cell phones, etc.), it comes at the cost of requiring high-bandwidth communication between the client device and the computing center [4].

This thesis focuses on the part of the communication system that transfers digital data from one chip to another via wireline channels. In recent years, the wireline channels used in chip-to-chip communications have not improved at the same rate that silicon technologies have advanced. In addition, the desire to minimize production costs has limited the number of I/Os and channels available to each chip. Hence, we are forced to send ever-increasing amounts of data per channel in the presence of channel imperfections. Accordingly, new circuit innovations are required to achieve higher data rates.

1.1. Motivation

The main channel imperfection that limits data rates is the channel's limited bandwidth. As data rates increase, the data signal experiences frequency-dependent attenuation due to the dielectric and conductive losses in the channel [12, 15]. Analog circuits can be used to equalize the signal and recover the digital data [13, 20, 30]. Compared to their digital counterparts, analog circuits consume less power, but are more vulnerable to process, voltage, and temperature (PVT) variations. As technology scales to smaller device sizes and lower supply voltages, analog circuits benefit less from the technology advances and are, in fact, at a disadvantage because they require voltage headroom. Digital circuits, on the other hand, port easily between technologies and improve with each successive technology node.

The role of a clock-and-data recovery (CDR) block is to recover the transmitted data on the receiver side by sampling the data signal with either a phase-tracking clock [3,7,11,37] or a blind clock [1,32,36]. Sampling with a blind clock removes the feedback loop between the analog and digital circuits, which shortens development time because the analog and digital blocks can be designed independently.

Digital equalizers and CDRs require a high-speed analog-to-digital converter (ADC), and the ADC consumes significant power. One of the goals of this thesis is to increase the data rate without increasing ADC power. We can accomplish this by reducing the ADC's sampling rate while maintaining the bit rate. The first part of this thesis proposes a CDR that can recover data from blind, baud-rate samples. This contrasts with previous work, which sampled at 2x [32,36] or 1.45x [33] the baud rate.

The second part of this thesis proposes an adaptive DFE for the blind, baud-rate

CDR. The proposal simplifies the DFE controller compared to a previous LMS adaptive

DFE for blind CDRs [1].

1.2. Thesis Objectives

This thesis presents the design and implementation of a blind baud-rate CDR for an

ADC-based receiver, and the architecture of a zero-forcing adaptive DFE for a blind

CDR. The main objectives of the thesis are as follows:

• Provide a background on different types of adaptive equalizers and clock-and-data recovery systems,

• Investigate and propose a CDR to recover data from blind, baud-rate ADC samples,

• Present the implementation, simulation, and measurement results to show the proposed CDR's functionality,

• Investigate and propose an adaptive DFE for a blind CDR,

• Present the implementation and simulation results to show the DFE controller's functionality; measurement is left as future work.

1.3. Thesis Outline

The remaining chapters of this thesis are organized as follows:

• Chapter 2 provides a background on different types of adaptive equalizers and

clock-and-data recovery systems,

• Chapter 3 describes the concept of blind, baud-rate data recovery, proposes a CDR

architecture, and presents simulation and measurement results,

• Chapter 4 proposes a novel DFE controller for the CDR developed in Chapter 3,

and presents simulation results,

• Chapter 5 concludes the thesis and provides the future directions for this work.

2 Background

The purpose of this chapter is to present the basic concepts needed to understand the contributions of this thesis and to review existing architectures of some of the blocks used in high-speed systems for communicating digital data. The communication system shown in Figure 2.1 consists of three main components: a transmitter, a channel, and a receiver. The channel is the physical medium (e.g. wireline, wireless, or optical) that connects the transmitter to the receiver. In this thesis, we will focus on systems designed for electrical wireline channels. Examples of wireline channels include the traces on an FR4 PCB and the copper wires in Ethernet and USB cables.

[Figure 2.1 diagram: Digital Data → Transmitter (TX) → Channel → Receiver (RX) → Recovered Data]

Figure 2.1: The basic components of a communication system

The transmitter converts the digital data into a signal that can be transmitted across the channel (e.g. electrical pulses with NRZ coding). The receiver samples the signal at the other end and recovers the digital data. Bit errors occur when the recovered data does not match the original data at the transmitter. The goal of a high-speed communication system is to minimize the bit error rate (BER). For wireline applications, the target BER is usually below 10⁻¹².
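To put a 10⁻¹² target in perspective, the short Python sketch below converts a BER into an average time between errors and an expected error count. The 10Gb/s rate is chosen only because it matches the data rate of the receivers in this thesis; the function names are illustrative, not part of any design described here.

```python
def mean_time_between_errors(ber, bit_rate_bps):
    """Average time (seconds) between bit errors at a given BER and bit rate."""
    return 1.0 / (ber * bit_rate_bps)


def expected_errors(ber, bit_rate_bps, duration_s):
    """Expected number of bit errors over a window of duration_s seconds."""
    return ber * bit_rate_bps * duration_s


# Illustrative 10 Gb/s link at the usual wireline target of 1e-12
mtbe = mean_time_between_errors(1e-12, 10e9)
print(f"one error every {mtbe:.0f} s on average")
print(f"{expected_errors(1e-12, 10e9, 3600):.0f} errors expected per hour")
```

Even at this very low BER, a 10Gb/s link still makes an error roughly every 100 seconds, which is why the target is stated as a probability rather than as error-free operation.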

At the time of this writing, the term "high-speed" refers to data rates on the order of gigabits per second (Gbps). Wireline channels have not improved as quickly as data rates have increased. Hence, the transmitter and receiver must compensate for the non-idealities of the channel (e.g. bandwidth limitations) in order to reduce the BER below the target while minimizing their power consumption.

In this chapter, we will focus on the blocks and different architectures of the receiver and omit the details of the transmitter, because this thesis contributes in the area of receiver architecture. The chapter is organized as follows. Section 2.1 describes channel non-idealities and their effect on the transmitted data signals. Section 2.2 compares binary and ADC-based receivers. Sections 2.3 and 2.4 describe how equalizers compensate for channel bandwidth limitations and how to adapt an equalizer to match a particular channel. Section 2.5 discusses different types of clock-and-data recovery blocks.

2.1. Channel effects

The top of Figure 2.2 shows an example of the frequency response of the channel from the transmitter to the receiver (also known as the S21 parameter). The frequency fb is the baud rate of the data (e.g. if the data rate is 10Gbps with NRZ coding, then fb is 10GHz). We are mostly interested in the channel frequency response up to fb/2, since the data pattern with the highest transition density ("01010101...") has a fundamental frequency of fb/2. In an FR4 channel, the skin effect of the copper trace and the dielectric loss of the surrounding PCB substrate attenuate the channel response at high frequencies. As we increase the data rate, the channel further attenuates the signal at fb/2.
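The claim that the "0101..." pattern concentrates its energy at fb/2 can be checked numerically. The Python sketch below builds an oversampled NRZ waveform for that pattern and locates its strongest spectral line with NumPy's real FFT; the oversampling factor of 8 samples per bit and the helper name nrz_waveform are assumptions for illustration only.

```python
import numpy as np


def nrz_waveform(bits, samples_per_bit):
    """Map bits {0, 1} to NRZ levels {-1, +1} and hold each level for one UI."""
    levels = 2 * np.asarray(bits) - 1
    return np.repeat(levels, samples_per_bit)


fb = 10e9                  # baud rate for 10 Gb/s NRZ
samples_per_bit = 8        # simulation oversampling (an assumption)
fs = fb * samples_per_bit  # simulation sample rate

bits = [0, 1] * 64         # "0101..." pattern: highest transition density
x = nrz_waveform(bits, samples_per_bit)
spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), d=1 / fs)
peak = freqs[np.argmax(spectrum)]
print(f"strongest spectral line at {peak / 1e9:.1f} GHz (fb/2 = {fb / 2e9:.1f} GHz)")
```

The pattern repeats every two bits, so its fundamental sits at fb/2 and the remaining energy lies in odd harmonics above it; the channel's loss at fb/2 therefore sets how much the fastest-toggling data is attenuated.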

The bottom of Figure 2.2 shows the transmitted and received data pulses. Assuming NRZ coding, the transmitted pulse has a duration of Tb = 1/fb. The figure shows a digital "1" being sent; if a digital "0" were sent, the pulse amplitude would be negative. The transmitted pulse is a Nyquist pulse: if we sample it at the baud rate, fb, we obtain only one non-zero sample, h_0. However, the channel's frequency-dependent attenuation spreads the pulse energy into adjacent Tb bins (h_{−1}, h_1, h_2, etc.). If we transmit a sequence of bits, as depicted in Figure 2.3, the received signal becomes a superposition of pulses.

[Figure 2.2 diagram: TX → Channel → RX. Top: channel frequency response versus frequency, with fb marked. Bottom: transmitted pulse of width Tb = 1/fb and received pulse response with samples h−1, h0, h1, h2 spaced Tb apart, delayed by Tdelay + Δt.]

Figure 2.2: An example of a channel frequency response and the effect on an isolated data pulse

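[Figure 2.3 diagram: superposition of received pulses from consecutive bits, with contributions B_{k−2}h_2, B_{k−1}h_1, B_k h_0, and B_{k+1}h_{−1} at the sampling instant of bit k]

Figure 2.3: Intersymbol interference (ISI) when transmitting a '1111' sequence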

Let x_k represent a sample of the received signal. Our goal is to recover the transmitted bit, B_k, from the sample, x_k. However, due to the spreading of the pulse energy, x_k also includes components from previous bits (B_{k−1}, B_{k−2}) and future bits (B_{k+1}). This interference is known as intersymbol interference (ISI). The example in Figure 2.3 includes three ISI components in addition to B_k h_0, as expressed in Equation 2.1. If the ISI components are left uncompensated, they can corrupt the recovery of B_k and cause bit errors.

$$x_k = B_{k+1}h_{-1} + B_k h_0 + B_{k-1}h_1 + B_{k-2}h_2 \qquad (2.1)$$

In general, the sample, xk, can be expressed as the following:

$$x_k = \underbrace{B_k h_0}_{\text{Main cursor}} + \underbrace{\sum_{i<0} B_{k-i} h_i}_{\text{Pre-cursor ISI}} + \underbrace{\sum_{i>0} B_{k-i} h_i}_{\text{Post-cursor ISI}} \qquad (2.2)$$
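Equation 2.2 is a discrete convolution of the bit sequence with the baud-rate samples of the pulse response. The Python sketch below evaluates it term by term; the tap values (one pre-cursor and two post-cursors, as in Figure 2.3) are hypothetical, and bits outside the given sequence are assumed to be absent (zero).

```python
import numpy as np


def received_samples(bits, h):
    """Evaluate x_k = sum_i B_{k-i} h_i (Equation 2.2) for every k.

    bits: sequence of +/-1 symbols B_k.
    h:    dict mapping cursor index i to pulse-response sample h_i
          (i < 0: pre-cursor, i = 0: main cursor, i > 0: post-cursor).
    Bits outside the given sequence are treated as 0 (nothing transmitted).
    """
    n = len(bits)
    x = np.zeros(n)
    for k in range(n):
        for i, h_i in h.items():
            j = k - i  # index of the bit that contributes through tap i
            if 0 <= j < n:
                x[k] += bits[j] * h_i
    return x


# Hypothetical pulse response with one pre-cursor and two post-cursor taps
h = {-1: 0.1, 0: 0.6, 1: 0.25, 2: 0.05}
print(received_samples([+1, +1, +1, +1], h))  # the '1111' case of Figure 2.3
print(received_samples([+1, -1, +1, -1], h))  # alternating data
```

For the '1111' sequence, the third sample sees every tap, so it equals h_{−1} + h_0 + h_1 + h_2 as in Equation 2.1; with alternating data, the same ISI terms partly cancel the main cursor instead of adding to it.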

The ISI caused by future bits is known as pre-cursor ISI, and the ISI caused by previous bits is known as post-cursor ISI. In order to successfully recover the transmitted bit, B_k, the main cursor must be the dominant cursor. If the main cursor dominates, as expressed by Equation 2.3, the data eye diagram will be open even for the worst-case data pattern.

$$|h_0| > \sum_{i \neq 0} |h_i| \qquad (2.3)$$
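Assuming h_0 > 0 and B_k ∈ {−1, +1}, the smallest possible sample for B_k = +1 occurs when every neighbouring bit pushes against the main cursor, giving h_0 − Σ|h_i|; Equation 2.3 is exactly the condition that this worst case stays positive. The Python sketch below checks Equation 2.3 directly and confirms it by brute force over every neighbouring bit pattern. Both tap sets are hypothetical.

```python
import itertools


def satisfies_eq_2_3(h):
    """Equation 2.3: |h_0| exceeds the sum of all ISI magnitudes."""
    return abs(h[0]) > sum(abs(v) for i, v in h.items() if i != 0)


def worst_case_sample(h):
    """Smallest x_k when B_k = +1, searched over every +/-1 pattern of the other bits."""
    isi_taps = [i for i in h if i != 0]
    worst = float("inf")
    for pattern in itertools.product((-1, +1), repeat=len(isi_taps)):
        x = h[0] + sum(b * h[i] for b, i in zip(pattern, isi_taps))
        worst = min(worst, x)
    return worst


open_h = {-1: 0.1, 0: 0.6, 1: 0.25, 2: 0.05}   # hypothetical taps, ISI sum 0.4
closed_h = {-1: 0.2, 0: 0.5, 1: 0.3, 2: 0.1}   # hypothetical taps, ISI sum 0.6
for name, taps in (("open", open_h), ("closed", closed_h)):
    print(name, satisfies_eq_2_3(taps), round(worst_case_sample(taps), 3))
```

A negative worst-case sample means that some data pattern drives a transmitted "1" below the decision threshold, i.e. the eye is closed before equalization.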

In addition to ISI, the channel also introduces a propagation delay, Tdelay, shown in

Figure 2.2, between the transmitted and received pulses. Tdelay is a constant delay when

transmitting a given pulse response over a given channel. However, since we usually

design the transmitter and receiver to work with a range of channels, Tdelay is not known

at the time of design.

In practice, the time when the signal arrives at the receiver usually deviates from Tdelay. This timing deviation is defined as jitter (shown as Δt in Figure 2.2) and can be modeled as a random process. In general, jitter can be split into two components [18]: deterministic jitter (DJ) and random jitter (RJ). DJ is bounded, while RJ is unbounded and has a Gaussian distribution. Channel imperfections (e.g. bandwidth limitations, reflections, crosstalk, and electromagnetic interference) account for part of the DJ. RJ is mostly caused by noise from the circuits in the transmitter and receiver (e.g. thermal, shot, and flicker noise) [18].
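The difference between bounded DJ and unbounded RJ shows up clearly in a toy simulation. The Python sketch below is a model under stated assumptions, not a characterization of any real link: DJ is represented by a sinusoid (periodic jitter, one kind of DJ) and RJ by zero-mean Gaussian samples, with hypothetical amplitude, period, sigma, and random seed.

```python
import numpy as np

rng = np.random.default_rng(seed=1)


def jitter_samples(n, dj_amp_ui=0.05, rj_sigma_ui=0.01, dj_period_ui=37):
    """Hypothetical per-bit timing deviations (in UI): sinusoidal DJ + Gaussian RJ."""
    k = np.arange(n)
    dj = dj_amp_ui * np.sin(2 * np.pi * k / dj_period_ui)  # bounded by +/- dj_amp_ui
    rj = rng.normal(0.0, rj_sigma_ui, size=n)              # Gaussian, unbounded tails
    return dj, rj


for n in (10**2, 10**4, 10**6):
    dj, rj = jitter_samples(n)
    print(f"n={n:>7}: DJ pk-pk {np.ptp(dj):.3f} UI, RJ pk-pk {np.ptp(rj):.3f} UI")
```

As more bits are observed, the DJ peak-to-peak value saturates at twice its amplitude, while the RJ peak-to-peak value keeps growing; this is why RJ is quoted relative to a BER rather than as a hard bound.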

In order to compensate for the channel's non-idealities, a typical receiver contains two main blocks: an equalizer and a clock-and-data recovery (CDR) block. Equalizers are commonly used to reduce pre-cursor and post-cursor ISI in order to fulfill Equation 2.3. The CDR block compensates for the propagation delay and jitter so that the data can be recovered with a BER below the target.

2.2. Receiver

Figure 2.4a shows a conventional binary receiver with a phase-tracking clock in a feedback loop. The analog equalizer reduces ISI in order to open the eye. The eye opening allows the comparator to sample the received data where the error probability is at its minimum and regenerate the data bit. The comparator is followed by a CDR that detects the phase of the equalized data signal and aligns the rising edge of the clock (CKREC) with the center of the data eye. This kind of receiver is binary because the sampling comparator only captures the sign of the incoming data signal. Since equalization requires both sign and magnitude, all of the equalization must be done in the analog domain before the comparator.

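[Figure 2.4 diagram: (a) Data → Analog Equalizer → comparator → CDR, sampled with the recovered clock CKREC; (b) Data → Analog Equalizer → ADC → Digital Equalizer → CDR, sampled with CKREC]

Figure 2.4: Comparison of (a) binary and (b) ADC-based receivers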


Alternatively, we can transform the binary receiver into an ADC-based receiver by

replacing the comparator with an ADC as illustrated in Figure 2.4b. By incorporating an

ADC in the receiver, we capture both sign and magnitude of the signal after the analog

equalizer. Now that we have obtained magnitude information, it becomes possible to

perform additional equalization in the digital domain after the ADC. The digital equalizer

and CDR architectures (discussed in Sections 2.3 and 2.5) can be implemented entirely

with HDL code. The main disadvantages of ADC-based receivers are the high power

consumption and large area of the ADC. However, there are several benefits of a digital

equalizer and CDR implementation:

• Digital blocks are immune to PVT variations, provided that timing constraints are met across all corners.

• HDL code is easily ported across technology nodes using automatic synthesis, place,

and route software, whereas analog blocks must be manually designed for each

process technology.

• The digital blocks scale with more advanced technology nodes (which often benefit

digital blocks more than analog ones).

• The digital equalizer can be easily combined with a digital controller. Furthermore,

the adaptive controller may benefit by gaining access to both sign and magnitude

information. In analog equalizers, the adaptation controller often only has access

to sign information.
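The difference between the two receivers in Figure 2.4 comes down to what each front end keeps from a sample. The Python sketch below contrasts a comparator (sign only) with an ideal uniform ADC (sign and quantized magnitude). The 3-bit resolution matches the 3-bit ADC-based receiver listed for Chapter 4, but the mid-rise quantizer, the full-scale value, and the sample values are assumptions for illustration.

```python
import numpy as np


def comparator(x):
    """Binary receiver (Figure 2.4a): keeps only the sign of each sample."""
    return np.where(np.asarray(x) >= 0, 1, -1)


def adc(x, n_bits=3, full_scale=1.0):
    """Hypothetical ideal mid-rise ADC (Figure 2.4b): sign and quantized magnitude.

    Returns the reconstruction level of each code; inputs beyond
    +/- full_scale are clipped to the outermost codes.
    """
    n_codes = 2 ** n_bits
    lsb = 2 * full_scale / n_codes
    code = np.clip(np.floor(np.asarray(x) / lsb), -n_codes // 2, n_codes // 2 - 1)
    return (code + 0.5) * lsb


samples = np.array([0.7, 0.95, 1.0, 0.9, -0.3])
print(comparator(samples))
print(adc(samples))
```

With the ADC output, a digital equalizer can subtract an ISI estimate (e.g. h_1 B_{k−1}) from each sample to within one LSB; the comparator's ±1 decisions carry no magnitude, so that arithmetic must instead happen in the analog domain.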

2.3. Equalization

Equalization can be implemented at the transmitter (pre-equalizer), the receiver (post-equalizer), or both. Post-equalization is easier to perform for two reasons. First, the equalizer may change the signal swing, which, if the equalizer is implemented in the transmitter, alters the amplitude of the output signal. In many cases, the transmitter is designed for a set of communication standards that impose constraints on the signal swing in the channel; the standards do not impose constraints on the signals internal to the receiver. Second, adaptive equalization is more easily implemented in the receiver than in the transmitter. The adaptive controller requires information about the channel response, which can be estimated at the receiver. If the adaptive controller were implemented at the transmitter, it would require feedback to be sent from the receiver through an auxiliary channel. For these reasons, some systems include a small amount of fixed equalization at the transmitter and adaptively equalize most of the frequency-dependent attenuation at the receiver [36]. In this thesis, we will focus on post-equalizers.

There are two broad categories of equalizers: linear and non-linear. A receiver may

include one type or both types of equalizers. We will discuss them in more detail in

Sections 2.3.1 and 2.3.2.

[Figure 2.5 diagram: data signal (xK) → (a) linear equalizer C(z) → partially equalized signal (xK') → (b) non-linear equalizer (internal signals yK and wK) → recovered data (AK)]

Figure 2.5: (a) Linear and (b) non-linear receiver equalizers

2.3.1. Linear Equalization

The main purpose of equalization is to improve BER by decreasing the ISI caused by the

high-frequency attenuation of the channel. If we can cascade the channel and equalizer

such that the overall response is flat up to fb/2, then most of the ISI will be eliminated.

A linear equalizer achieves the final flat response by either emphasizing (i.e. boosting)

the high frequency content or by de-emphasizing (i.e. attenuating) the low frequency

content in the data signal. Figure 2.6 shows the latter.


Figure 2.6: Frequency response of combined channel and linear equalizer (panels: channel response, linear EQ response, and channel + linear EQ response, each shown up to fb/2)

A linear equalizer can be implemented with a continuous-time or discrete-time architecture. Figure 2.7 depicts an example of a commonly used continuous-time linear equalizer (CTLE), a source-degenerated differential pair. Usually, RS or CS is made programmable so that the zero at fZ can be adjusted to match the channel response. CTLEs can only be implemented with analog circuits.

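Figure 2.7: Source-degenerated continuous-time linear equalizer (differential pair with transconductance gm, inputs Vin+ and Vin−, degeneration RS and CS, loads RL and CL, output VOUT). The response has a zero at fZ, poles at fP1 and fP2, and high-frequency gain AV:

$$A_V \approx g_m R_L, \qquad f_Z \approx \frac{1}{2\pi R_S C_S}, \qquad f_{P1} \approx \frac{1 + g_m R_S/2}{2\pi R_S C_S}, \qquad f_{P2} \approx \frac{1}{2\pi R_L C_L}$$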
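To make the Figure 2.7 approximations concrete, the sketch below evaluates the magnitude response of a source-degenerated CTLE modeled with one zero and two poles, H(f) = A_DC (1 + jf/fZ) / ((1 + jf/fP1)(1 + jf/fP2)), where A_DC = gmRL / (1 + gmRS/2). The single-zero, two-pole model, the DC-gain expression, the function name `ctle_gain`, and all component values are assumptions chosen for illustration, not a design from this work. With the poles well separated, the gain between fP1 and fP2 approaches gmRL while the DC gain is reduced by the degeneration, which is the intended high-frequency boost.

```python
import math

def ctle_gain(f, gm, rs, cs, rl, cl):
    """|H(j*2*pi*f)| of a source-degenerated CTLE (one zero, two poles model)."""
    fz = 1 / (2 * math.pi * rs * cs)                    # zero set by RS, CS
    fp1 = (1 + gm * rs / 2) / (2 * math.pi * rs * cs)   # degeneration pole
    fp2 = 1 / (2 * math.pi * rl * cl)                   # output (load) pole
    a_dc = gm * rl / (1 + gm * rs / 2)                  # DC gain reduced by degeneration
    h = a_dc * (1 + 1j * f / fz) / ((1 + 1j * f / fp1) * (1 + 1j * f / fp2))
    return abs(h)

# Hypothetical values chosen so that fz = 1 GHz, fp1 = 6 GHz, fp2 = 50 GHz
gm, rs, rl = 20e-3, 500.0, 250.0
cs = 1 / (2 * math.pi * rs * 1e9)
cl = 1 / (2 * math.pi * rl * 50e9)

dc = ctle_gain(1.0, gm, rs, cs, rl, cl)                       # about 5/6
peak = max(ctle_gain(k * 1e8, gm, rs, cs, rl, cl) for k in range(1, 500))
print(dc, peak, gm * rl)                                      # peak stays below gm*RL
```

Making `rs` or `cs` a parameter of a sweep is one way to mimic the programmable zero described above.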

A discrete-time linear equalizer (also known as a feed-forward equalizer, or FFE) can be implemented either in the analog domain or in the digital domain (if the receiver is ADC-based). It usually consists of an infinite impulse response (IIR) or finite impulse response (FIR) filter that boosts the high-frequency content of the data signal.

The main disadvantage of a linear equalizer is that it boosts not only the high-

frequency content of the signal, but also high-frequency noise. Sources of noise include


thermal noise from the transmitter and receiver, crosstalk, and ADC quantization noise

(in the case of a digital FFE). The latter can add a significant amount of noise to the

signal; hence, we can either increase the ADC resolution to reduce the quantization noise

or reduce the FFE gain (and rely on the decision-feedback equalizer described in the next

section).
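The two paragraphs above can be illustrated with a small digital FIR FFE. The sketch below assumes a hypothetical baud-spaced channel pulse [1.0, 0.5, 0.25] and a hypothetical 2-tap FFE [1.0, −0.5]; the helper names are illustrative. It shows that the FFE cancels the first post-cursors, that its gain at fb/2 is three times its DC gain (the high-frequency boost), and that white noise passing through it is amplified by the sum of the squared taps.

```python
import math

def fir_filter(x, taps):
    """Apply FIR taps to sequence x; taps[0] multiplies the current sample."""
    return [sum(t * x[n - i] for i, t in enumerate(taps) if n - i >= 0)
            for n in range(len(x))]

def fir_gain(taps, f_over_fb):
    """Magnitude of the FIR response at frequency f, given as a fraction of fb."""
    w = 2 * math.pi * f_over_fb
    re = sum(t * math.cos(w * i) for i, t in enumerate(taps))
    im = -sum(t * math.sin(w * i) for i, t in enumerate(taps))
    return math.hypot(re, im)

channel_pulse = [1.0, 0.5, 0.25]     # hypothetical main cursor + 2 post-cursors
ffe = [1.0, -0.5]                    # hypothetical 2-tap FFE

equalized = fir_filter(channel_pulse + [0.0], ffe)
print(equalized)                                  # [1.0, 0.0, 0.0, -0.125]
print(fir_gain(ffe, 0.0), fir_gain(ffe, 0.5))     # about 0.5 at DC, 1.5 at fb/2
print(sum(t * t for t in ffe))                    # white-noise power gain: 1.25
```

The residual −0.125 tap shows that a short FFE trades a large post-cursor for a smaller, later one rather than removing ISI completely.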

2.3.2. Decision-Feedback Equalization (DFE)

A decision-feedback equalizer (DFE) is a non-linear equalizer that removes post-cursor

ISI from the channel using the recovered data and a filter (shown as C(z) in Figure 2.5)

whose pulse response matches the post-cursor ISI of the partially equalized signal. The

DFE response is given by:

$$w_k = \sum_{i>0} A_{k-i}\, c_i \qquad (2.4)$$

We assume a very low BER such that we can approximate the original data, Bk,

with the recovered data, Ak. The optimal equalization occurs when the DFE coefficients

match the ISI taps of the partially equalized signal (i.e. ci = h′i):

$$y_k = x'_k - w_k = \left(B_k h'_0 + \sum_{i<0} B_{k-i} h'_i + \sum_{i>0} B_{k-i} h'_i\right) - \sum_{i>0} A_{k-i}\, c_i$$

$$y_k = B_k h'_0 + \sum_{i<0} B_{k-i} h'_i \qquad (2.5)$$

One disadvantage is that the DFE cannot remove pre-cursor ISI, since the DFE feedback path would need future recovered data to estimate the pre-cursor ISI. However, linear equalizers can reduce pre-cursor ISI and are often used in conjunction with a DFE. Another disadvantage of the DFE is error propagation: when an incorrect decision is made, the wrong data is fed back through C(z) and may cause incorrect decisions on future data.


The DFE’s main advantage is the absence of high-frequency noise amplification. Unlike a linear equalizer, which amplifies both signal and noise, the DFE slicer regenerates the digital data without noise, and this noiseless signal is fed back for equalization.
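Equations 2.4 and 2.5 can be checked with a short behavioral simulation. The sketch below assumes random NRZ data, a hypothetical sampled pulse response with one pre-cursor and two post-cursors whose total ISI closes the eye, and no noise; the function name `simulate_dfe` is illustrative. With the DFE taps set to the post-cursor values (ci = h′i), only the main cursor and the pre-cursor remain, as in Equation 2.5, so the slicer makes no errors.

```python
import random

def simulate_dfe(h, n_pre, c, n_bits=2000, seed=1):
    """Count slicer errors for random NRZ data through a sampled pulse response.

    h     -- pulse samples; h[n_pre] is the main cursor h'_0, h[:n_pre] are
             pre-cursors and h[n_pre + 1:] are post-cursors
    c     -- DFE taps [c_1, ..., c_N]; an empty list means no DFE
    """
    rng = random.Random(seed)
    bits = [rng.choice((-1, 1)) for _ in range(n_bits)]
    decisions = []
    errors = 0
    for k in range(n_bits):
        # x'_k = sum_i B_{k-i} h'_i, where negative i are pre-cursor (future) bits
        x = 0.0
        for idx, h_i in enumerate(h):
            i = idx - n_pre
            if 0 <= k - i < n_bits:
                x += bits[k - i] * h_i
        # w_k = sum_{i>0} A_{k-i} c_i (Eq. 2.4), using past decisions only
        w = sum(c_i * decisions[k - i] for i, c_i in enumerate(c, start=1) if k - i >= 0)
        a = 1 if x - w >= 0 else -1      # slicer on y_k = x'_k - w_k
        decisions.append(a)
        errors += a != bits[k]
    return errors

h = [0.1, 1.0, 0.7, 0.4]                          # hypothetical pulse response
print(simulate_dfe(h, n_pre=1, c=[]))             # eye closed: many errors
print(simulate_dfe(h, n_pre=1, c=[0.7, 0.4]))     # c_i = h'_i: no errors
```

Because the decisions are fed back, a single wrong decision in a noisier setup would corrupt the next two subtractions, which is the error-propagation effect described above.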

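Figure 2.8: A 3-tap DFE example (three flip-flops hold the recovered data AK−1, AK−2, and AK−3, which are weighted by c1, c2, and c3 and subtracted from the partially equalized signal; DFE FIR = C(z) = c1z⁻¹ + c2z⁻² + c3z⁻³; the loop through the first tap is the critical timing path)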

The filter, C(z), can be implemented as either an FIR or IIR filter. Figure 2.8 illustrates a DFE example with a 3-tap FIR filter. The coefficients c1, c2, and c3 should be adjustable in order to accommodate different channels. Section 2.4 describes some adaptive controllers that can set appropriate DFE coefficients for a given channel.

The feedback loop indicated in Figure 2.8 poses a challenge in meeting timing constraints during the design of the DFE: the decision Ak−1 must be recovered, weighted, and subtracted from the next sample within one bit period. If the DFE is implemented in the analog domain, the high capacitance at the adder node slows down the propagation of the feedback signal, and the DFE fails if Ak−1 cannot be recovered and sent back to the adder in time for the next bit. A digital DFE in an ADC-based receiver would face similar issues, since digital adders are slower than analog ones. One solution to the timing problem is to employ speculation on the recovered bit. In this thesis, we assume NRZ coding where Bk is either +1 or −1. Figure 2.9 shows an example of a 1-tap speculative DFE. It subtracts both c1 and −c1 from the received signal and later selects the correct result using a mux. The speculation removes the gain and the adder from the feedback loop; only the mux and register remain in the critical path. The cost of speculation is the area and power consumed by the extra adder. Moreover, speculative DFEs do not scale easily because the number of adders increases exponentially with the number of speculated DFE taps (2^N candidate results for N taps).
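The speculation described above (often called loop unrolling) can be expressed behaviorally as follows. This is a minimal sketch assuming NRZ decisions of ±1, an arbitrary initial value for Ak−1, and hypothetical function names. Both candidate decisions are computed without knowing Ak−1, so only the selection remains inside the loop; the usage lines confirm that the result matches a direct 1-tap DFE.

```python
import random

def direct_dfe_1tap(samples, c1, a_init=1):
    """Reference 1-tap DFE: subtract c1*A_{k-1} inside the loop, then slice."""
    out, a_prev = [], a_init
    for x in samples:
        a_prev = 1 if x - c1 * a_prev >= 0 else -1
        out.append(a_prev)
    return out

def speculative_dfe_1tap(samples, c1, a_init=1):
    """Speculative 1-tap DFE: slice both candidates first, then mux on A_{k-1}."""
    if_prev_pos = [1 if x - c1 >= 0 else -1 for x in samples]   # assumes A_{k-1} = +1
    if_prev_neg = [1 if x + c1 >= 0 else -1 for x in samples]   # assumes A_{k-1} = -1
    out, a_prev = [], a_init
    for p, n in zip(if_prev_pos, if_prev_neg):
        a_prev = p if a_prev == 1 else n   # the mux: the only operation in the loop
        out.append(a_prev)
    return out

rng = random.Random(0)
samples = [rng.uniform(-1.5, 1.5) for _ in range(1000)]   # hypothetical samples
print(speculative_dfe_1tap(samples, c1=0.4) == direct_dfe_1tap(samples, c1=0.4))  # True
```

Extending the same idea to N taps requires one candidate list per combination of the N previous decisions, which is where the 2^N growth comes from.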

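Figure 2.9: A speculative 1-tap DFE (the partially equalized signal is offset by +c1 and −c1 in parallel, and a mux controlled by AK−1 selects the result; the critical path is faster because it contains only the mux and register)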

2.4. Equalizer Adaptation

When designing a receiver, we usually intend the receiver to work with a range of channels. In addition, the ISI in the received signal may vary with process and temperature. Hence, we would not know the exact amount of equalization required at the time of design. As described in Sections 2.3.1 and 2.3.2, both linear and decision-feedback equalizers usually include configurable coefficients which can be adjusted by a controller to obtain an appropriate amount of equalization during receiver operation. This section describes three different adaptation methods: zero-forcing (ZF), minimum mean square error (MMSE), and maximum eye-opening.

2.4.1. Zero-Forcing (ZF) Method

A zero-forcing (ZF) equalizer attempts to force all ISI components to zero. If the equalizer does not have enough taps (i.e. degrees of freedom) to force all ISI to zero, then the optimal tap values should minimize the sum of the squared ISI components. This section presents a ZF controller for a linear equalizer; we will discuss and compare ZF and LMS controllers for DFEs at the end of Section 2.4.2.

The ZF analysis and examples in this section are taken from [5] and [13] and reproduced here for convenience. First, we will find the optimal equalizer coefficients in terms of zero-forcing criteria. Second, we will describe a feedback loop that converges to the optimal coefficients.

Figure 2.10 shows an example of a combined channel and equalizer pulse response, h(t), and the desired Nyquist response, g(t). The sampled versions of the responses can be represented with vectors, h and g, respectively, and we assume that p1 and p2 are constants such that pre-cursor and post-cursor taps outside the range of k − p1 to k + p2 are zero. In Figure 2.10’s example pulse response, p1 and p2 are 2 and 4, respectively. We define the ISI vector, r, to be the difference between the desired and actual responses, g and h, respectively. In this and the next section, note that the transmitted and recovered data are represented with vectors, bk and ak, instead of Bk and Ak, to distinguish the data vectors from other matrix quantities.

Figure 2.10: An example of channel+FFE pulse response (h(t)) and Nyquist response (g(t)). ISI is the difference between the two responses (r = g − h). The figure defines:

$$r_k = g_k - h_k = g(kT_b - T_{delay}) - h(kT_b)$$

$$g = [\,g_{k-p_1} \cdots g_{k-1}\;\; g_k\;\; g_{k+1} \cdots g_{k+p_2}\,]^T$$

$$h = [\,h_{k-p_1} \cdots h_{k-1}\;\; h_k\;\; h_{k+1} \cdots h_{k+p_2}\,]^T$$

$$r = [\,r_{k-p_1} \cdots r_{k-1}\;\; r_k\;\; r_{k+1} \cdots r_{k+p_2}\,]^T$$

$$r = g - h \qquad (2.6)$$


The goal of a ZF equalizer is to minimize the energy of the ISI, ‖r‖²:

$$\|r\|^2 = \|g - h\|^2 = (g - h)^T (g - h) \qquad (2.7)$$

Figure 2.11 provides a simple example of a system with a channel, FFE, and receiver.

In this case, the FFE is a 2-tap finite impulse response (FIR) filter. In Figure 2.11, f(t)

represents the channel pulse response. However, if the system includes both a CTLE and

FFE, then f(t) would be the convolution of the channel and CTLE responses. The vector,

c, represents the FFE coefficients. bk, yk, and ak represent the source data, equalized

signal, and recovered data, respectively. Figure 2.12 models the system with matrix and

vector quantities.

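Figure 2.11: An example of a receiver with a channel (with 2 pre-cursor and 4 post-cursor taps of ISI) and a 2-tap FFE (the channel output x1(t) and its copy x2(t), delayed by Tb, are weighted by c1 and c2 and summed into y(t), which is sliced to give ak). The figure defines:

$$b_k = [\,b_{k+2}\;\; b_{k+1} \cdots b_{k-4}\;\; b_{k-5}\,]^T, \qquad c = [\,c_1\;\; c_2\,]^T, \qquad x_k = [\,x_1(kT)\;\; x_2(kT)\,]^T$$

$$F = \begin{bmatrix} f(-2) & f(-1) & \cdots & f(4) & 0 \\ 0 & f(-2) & \cdots & f(3) & f(4) \end{bmatrix}$$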

Figure 2.12: A partial model of a discrete-time receiver with channel and FFE, where $x_k = F b_k$ and $y_k = x_k^T c = b_k^T F^T c$ is sliced to give ak

Given Figure 2.12, we see that the pulse response is h = Fᵀc. Therefore, we substitute h into Equation 2.7:

$$\|r\|^2 = (g - F^T c)^T (g - F^T c) = g^T g - 2 g^T F^T c + c^T F F^T c \qquad (2.8)$$

In order to find the optimal c that minimizes ISI (i.e. cOPT), we take the derivative of Equation 2.8 and set it equal to zero:

$$\frac{\partial}{\partial c}\left(\|r\|^2\right) = 2 c^T F F^T - 2 g^T F^T = 0 \qquad (2.9)$$

$$c_{OPT} = (F F^T)^{-1} F g \qquad (2.10)$$

As an example, let us assume that the FFE has two taps (i.e. c is a 2×1 vector). As illustrated in Figure 2.13, the set of all responses h = Fᵀc that the FFE can produce forms a 2D plane spanned by the two columns of Fᵀ. If g lies on the plane, then there exist values for the two taps that can compensate the ISI completely. However, if g is not on the plane, then we can find c = cOPT such that the length of r is minimized. This occurs when r is orthogonal to the plane, i.e. Fr = 0, which is the condition expressed by Equation 2.9.

Figure 2.13: A geometric representation of optimal zero-forcing FFE coefficients (the plane h = Fᵀc, the desired response g, the optimal response hOPT = FᵀcOPT, and the residual ISI r)
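Equation 2.10 and the orthogonality condition can be verified numerically for the 2-tap FFE of Figure 2.11. The sketch below assumes a hypothetical channel pulse f(−2)…f(4) and a desired response g with a single unit entry at the main cursor; the names `zf_ffe_taps` and `isi_energy` are illustrative. It builds F as in Figure 2.11, solves the 2×2 normal equations, and checks that Fr = 0 and that perturbing cOPT increases ‖r‖².

```python
def zf_ffe_taps(f, main_index):
    """Closed-form ZF taps for a 2-tap FFE (Eq. 2.10): c_OPT = (F F^T)^-1 F g.

    f          -- channel pulse samples [f(-p1), ..., f(p2)]
    main_index -- position of the single nonzero (unit) entry of g
    """
    n = len(f) + 1
    F = [f + [0.0], [0.0] + f]                      # 2 x n, as in Figure 2.11
    g = [0.0] * n
    g[main_index] = 1.0
    A = [[sum(F[i][m] * F[j][m] for m in range(n)) for j in range(2)] for i in range(2)]
    v = [sum(F[i][m] * g[m] for m in range(n)) for i in range(2)]     # F g
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    c = [(v[0] * A[1][1] - A[0][1] * v[1]) / det,                      # Cramer's rule
         (A[0][0] * v[1] - A[1][0] * v[0]) / det]
    h = [F[0][m] * c[0] + F[1][m] * c[1] for m in range(n)]           # h = F^T c
    r = [g[m] - h[m] for m in range(n)]                                # Eq. 2.6
    return c, r, F

def isi_energy(f, main_index, c):
    """||r||^2 (Eq. 2.7) for arbitrary taps c."""
    F = [f + [0.0], [0.0] + f]
    return sum(((1.0 if m == main_index else 0.0) - F[0][m] * c[0] - F[1][m] * c[1]) ** 2
               for m in range(len(f) + 1))

f = [0.05, 0.1, 0.6, 0.3, 0.15, 0.08, 0.03]       # hypothetical f(-2) ... f(4)
c_opt, r, F = zf_ffe_taps(f, main_index=2)
print(c_opt)
print([sum(F[i][m] * r[m] for m in range(len(r))) for i in range(2)])   # F r, close to 0
```

Because F has only two rows, the solve stays a 2×2 problem regardless of how many ISI taps the channel has.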

Figure 2.14 shows the model from Figure 2.12 with a ZF feedback loop. The vector nk represents white noise generated by the receiver’s circuits. The error ek is the difference between the desired signal, which we generate using the desired pulse response, g, and the recovered data, ak, and the received sample, yk. We define vk = nkᵀc to be noise shaped by the FFE coefficients. We also assume a low BER such that ak ≈ bk. In order to show that the feedback loop converges correctly, we find the error, ek, in terms of bk, r, and vk.

Figure 2.14: A model of a discrete-time receiver, including a ZF adaptation loop (noise nk is added to Fbk to form xk; yk is sliced to ak; a shift register of ak and the desired response g form the desired signal; the controller integrates M ak ek to update c)

$$\begin{aligned} e_k &= a_k^T g - y_k \\ &= b_k^T g - (F b_k + n_k)^T c \\ &= b_k^T (g - h) - n_k^T c \\ &= b_k^T r - v_k \end{aligned} \qquad (2.11)$$

The ZF adaptation correlates the error, ek, with the recovered bits, ak. Equation 2.12 takes the average of this correlation and, using ak ≈ bk together with the statistical assumptions stated below, recovers the ISI vector, r:

$$\begin{aligned} E[a_k e_k] &= E[a_k (b_k^T r - v_k)] \\ &= E[b_k b_k^T]\, r - E[b_k v_k] \\ &= r \end{aligned} \qquad (2.12)$$

The integrator in the feedback loop forces the average of the weighted quantity M ak ek to zero. The matrix, M, is a parameter that sets the gain of the feedback loop and maps ISI taps to the FFE coefficients. To find M, we assume that bk is a sequence of independent, equiprobable ±1 bits such that E[bk bkᵀ] = I, where I is the identity matrix. We also assume that the data is uncorrelated with the noise (i.e. E[bk vk] = 0).

$$M\,E[a_k e_k] = M r = 0 \qquad (2.13)$$

$$M (g - F^T c) = 0 \qquad (2.14)$$

$$c = (M F^T)^{-1} M g \qquad (2.15)$$

By comparing Equations 2.10 and 2.15, we see that the feedback converges to the optimal tap values if M = uF (where u is a nonzero scalar that determines the loop gain) or, more generally, M = UF (where U is an invertible matrix), since then (MFᵀ)⁻¹Mg = (FFᵀ)⁻¹U⁻¹UFg = (FFᵀ)⁻¹Fg. This result implies that an optimal M should be selected based on the channel and equalizer responses. It appears that, by using ZF adaptation, we have changed the problem of choosing c into one of choosing M. However, it turns out that M is a less sensitive parameter than c. In practice, M is chosen based on the worst-case channel that the system is designed for; for other channels, the adaptation loop will not converge exactly to the optimum, but will be close enough [5].
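The ZF loop of Figure 2.14 can be simulated to confirm that it settles at the closed-form optimum. The sketch below assumes M = uF, error-free decisions (ak = bk, as in the derivation of Equation 2.12), white Gaussian receiver noise, and a hypothetical channel pulse; the function names, step size, and noise level are illustrative choices. Each bit, the integrator adds u·F·ak·ek to c, and the taps are averaged over the final iterations to suppress the remaining gradient noise.

```python
import random

def zf_closed_form(f, main_index):
    """Reference c_OPT = (F F^T)^-1 F g (Eq. 2.10) for a 2-tap FFE."""
    n = len(f) + 1
    F = [f + [0.0], [0.0] + f]
    A = [[sum(F[i][m] * F[j][m] for m in range(n)) for j in range(2)] for i in range(2)]
    v = [F[0][main_index], F[1][main_index]]          # F g, with g a unit vector
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(v[0] * A[1][1] - A[0][1] * v[1]) / det,
            (A[0][0] * v[1] - A[1][0] * v[0]) / det]

def zf_adapt(f, main_index, u=0.002, sigma=0.02, n_iter=40000, n_avg=10000, seed=7):
    """ZF loop of Figure 2.14 with M = u*F, assuming a_k = b_k."""
    rng = random.Random(seed)
    n = len(f) + 1
    F = [f + [0.0], [0.0] + f]
    g = [0.0] * n
    g[main_index] = 1.0
    b = [rng.choice((-1, 1)) for _ in range(n)]    # b_k = [b_{k+2}, ..., b_{k-5}]
    c = [1.0, 0.0]                                 # initial FFE taps
    c_avg = [0.0, 0.0]
    for k in range(n_iter):
        b = [rng.choice((-1, 1))] + b[:-1]         # newest bit enters at index 0
        fb = [sum(F[i][m] * b[m] for m in range(n)) for i in range(2)]   # F b_k = F a_k
        x = [fb[i] + rng.gauss(0.0, sigma) for i in range(2)]           # x_k = F b_k + n_k
        y = x[0] * c[0] + x[1] * c[1]                                   # y_k = x_k^T c
        e = sum(b[m] * g[m] for m in range(n)) - y                      # e_k = a_k^T g - y_k
        c = [c[i] + u * fb[i] * e for i in range(2)]                    # integrate M a_k e_k
        if k >= n_iter - n_avg:
            c_avg = [c_avg[i] + c[i] / n_avg for i in range(2)]
    return c_avg

f = [0.05, 0.1, 0.6, 0.3, 0.15, 0.08, 0.03]     # hypothetical f(-2) ... f(4)
print(zf_adapt(f, 2), zf_closed_form(f, 2))      # the two tap vectors nearly agree
```

Replacing `fb` in the update with a fixed matrix built from a different (for example, worst-case) channel is one way to explore the reduced sensitivity to M mentioned above.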

2.4.2. Minimum Mean Square Error (MMSE) Method

The minimum mean square error method seeks to minimize the average power of the error, E[ek²], between the equalized signal, yk, and the desired signal. The error may include the effects of both ISI and random noise. This is in contrast with the zero-forcing method, where the adaptation algorithm minimizes ‖r‖², which only includes ISI. An implementation of an MMSE controller for a DFE is described in [1].

Figure 2.15 illustrates how we can find the MMSE using the steepest-descent algorithm. We assume that E[ek²] is well-behaved with respect to the equalizer tap values, ck = [c1k c2k … cik … cNk], so that following the negative gradient from any ck leads to the minimum E[ek²]. For each cik, we start with an initial value and increment or decrement it in the direction of decreasing average error power, as expressed by Equation 2.16, where u is a small step size.

Figure 2.15: An example of minimizing average error by using the steepest-descent algorithm (E[e²] versus ci; ci is incremented or decremented in the direction of decreasing E[e²] until the minimum E[e²] is reached)

$$c_{i(k+1)} = c_{ik} - u\,\frac{\partial E[e_k^2]}{\partial c_{ik}} \qquad (2.16)$$

In a receiver system, it is usually not practical to measure E[ek²]; therefore, we approximate Equation 2.16 by replacing the expected value with the instantaneous value. With this approximation, the steepest-descent algorithm is known as the least mean squares (LMS) algorithm:

$$c_{i(k+1)} = c_{ik} - u\,\frac{\partial (e_k^2)}{\partial c_{ik}} = c_{ik} - 2u\, e_k\, \frac{\partial e_k}{\partial c_{ik}} \qquad (2.17)$$

Equation 2.17 can be applied to any equalizer. Figure 2.16 shows an LMS feedback loop implemented for a DFE. In order to apply the steepest-descent algorithm, it is necessary to relate the error, ek, to the DFE coefficients, ck, as in Equation 2.18.


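Figure 2.16: A model of a discrete-time receiver with a DFE and LMS adaptation loop (the DFE output wk, computed from {ak−1, ak−2, ak−3, …} and ck, is subtracted from xk to give yk; the error ek between yk and the desired signal formed from g and a shift register of ak is scaled by 2u in the controller that updates ck)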

$$e_k = y_k - \sum_{j=1}^{M} g_{jk}\, a_{k-j}$$

$$e_k = \left(x_k - \sum_{i=1}^{N} c_{ik}\, a_{k-i}\right) - \sum_{j=1}^{M} g_{jk}\, a_{k-j} \qquad (2.18)$$

From Equation 2.18, we can find the derivative of ek with respect to cik:

$$\frac{\partial e_k}{\partial c_{ik}} = -a_{k-i} \qquad (2.19)$$

We can substitute Equation 2.19 into Equation 2.17:

$$c_{i(k+1)} = c_{ik} + 2u\, e_k\, a_{k-i} \qquad (2.20)$$

We can implement Equation 2.20 as the controller in Figure 2.16. It is possible to further simplify the controller by replacing ek, ak−i, or both with only their signs (i.e. sgn(ek) and sgn(ak−i)). These simplified LMS controllers are known as sign-error, sign-data, and sign-sign, respectively. With NRZ data, ak−i is already ±1, so the sign-data variant is identical to the full LMS update and sign-sign is identical to sign-error.
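Equation 2.20 and its sign-error simplification can be exercised on a DFE with a few lines of code. The sketch below assumes random NRZ data, a hypothetical partially equalized signal with main cursor 1.0 and post-cursors 0.5 and 0.25, Gaussian noise, and a desired signal taken as h0·ak (an assumption; Equation 2.18 allows a more general target). The function name and parameters are illustrative. In both modes the averaged taps settle at the post-cursor ISI values.

```python
import random

def lms_dfe(h_post, h0=1.0, u=0.001, sigma=0.05, n_iter=30000, n_avg=10000,
            sign_error=False, seed=3):
    """Adapt DFE taps with Eq. 2.20 (or its sign-error variant).

    h_post -- post-cursor ISI [h_1, ..., h_N] of the partially equalized signal
    h0     -- main cursor, also used here as the desired level (an assumption)
    """
    rng = random.Random(seed)
    n_taps = len(h_post)
    past_b = [rng.choice((-1, 1)) for _ in range(n_taps)]   # b_{k-1}, ..., b_{k-N}
    past_a = list(past_b)                                   # a_{k-1}, ..., a_{k-N}
    c = [0.0] * n_taps
    c_avg = [0.0] * n_taps
    for k in range(n_iter):
        b = rng.choice((-1, 1))
        x = h0 * b + sum(h * bj for h, bj in zip(h_post, past_b)) + rng.gauss(0.0, sigma)
        y = x - sum(ci * ai for ci, ai in zip(c, past_a))   # DFE output y_k
        a = 1 if y >= 0 else -1                              # slicer decision a_k
        e = y - h0 * a                                       # error vs desired level
        step = (1.0 if e >= 0 else -1.0) if sign_error else e
        c = [ci + 2 * u * step * ai for ci, ai in zip(c, past_a)]   # Eq. 2.20
        past_b = [b] + past_b[:-1]
        past_a = [a] + past_a[:-1]
        if k >= n_iter - n_avg:
            c_avg = [s + ci / n_avg for s, ci in zip(c_avg, c)]
    return c_avg

print(lms_dfe([0.5, 0.25]))                    # taps approach [0.5, 0.25]
print(lms_dfe([0.5, 0.25], sign_error=True))   # sign-error converges to the same taps
```

The sign-error version takes fixed-size steps, so its taps dither slightly around the solution; averaging hides this dither in the printed result.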

It is also interesting to compare the ZF and LMS controllers in Figures 2.14 and 2.16. If we replaced the FFE in Figure 2.14 with a DFE and substituted M = 2uI (where I is the identity matrix), then the ZF controller would be identical to the LMS controller for a DFE. This is expected because a DFE does not amplify noise; therefore, minimizing ISI (‖r‖²) and signal error (E[ek²]) at the DFE output should lead to the same solution [12].

2.4.3. Maximum Eye Opening Method

The maximum eye-opening method is another commonly-used algorithm [8, 16, 31] for

adjusting equalizer taps. Figure 2.17 shows a system that uses an eye monitor to measure

eye height or width and feeds the information back to the equalizer through a controller.

It should be noted that optimizing an equalizer to maximize eye height may not lead

to an optimal eye width and vice versa. The eye monitors described in [31] and [16]

measure eye height by comparing the outputs of a main sampler and auxiliary sampler

with a shifted threshold. If the outputs are the same, then the threshold of the auxiliary

sampler is within the eye. Thus, the eye monitor estimates eye height by increasing the

threshold of the auxiliary sampler until the outputs differ.
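The threshold sweep just described can be sketched as follows. This minimal version assumes noiseless center samples, a main sampler with threshold 0, and declares a threshold inside the eye only if the two samplers agree on every sample; a practical monitor would typically accumulate mismatch counts over many bits instead. The function names and the sample set (main cursor 1.0 with hypothetical post-cursor ISI of 0.3 and 0.2) are illustrative.

```python
def mismatches(samples, aux_threshold):
    """Count samples where the main sampler (threshold 0) and the auxiliary
    sampler (threshold aux_threshold) make different decisions."""
    return sum((s >= 0) != (s >= aux_threshold) for s in samples)

def upper_eye_height(samples, step=0.01, max_threshold=2.0):
    """Raise the auxiliary threshold in steps until the samplers disagree;
    return the last threshold at which they still agreed."""
    n = 0
    while (n + 1) * step <= max_threshold and mismatches(samples, (n + 1) * step) == 0:
        n += 1
    return n * step

# Hypothetical center samples: main cursor 1.0, post-cursor ISI 0.3 and 0.2
samples = [b0 + 0.3 * b1 + 0.2 * b2
           for b0 in (-1, 1) for b1 in (-1, 1) for b2 in (-1, 1)]
print(upper_eye_height(samples))   # about 0.5, within one threshold step
```

The estimate is limited by the threshold step, which mirrors the resolution of the auxiliary sampler's threshold DAC in hardware.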

Figure 2.17: A system that adapts equalizer taps based on eye opening (the signal from the channel passes through the equalizer to the CDR, which outputs the recovered clock and data; an eye monitor measures the eye opening and a controller sets the EQ coefficients)

The adaptive controllers in [31] and [16] iterate across all possible combinations of equalizer tap values. For each combination, they determine the eye opening by plotting a histogram and, at the end, choose the tap settings that produce the maximum eye opening. Compared to the ZF and LMS methods, this adaptation method is slower and cannot run continuously during data recovery because it has to try all of the equalizer settings. However, the method is more flexible since it can be applied to a variety of equalizer structures and does not depend on having correctly recovered data.

2.5. Clock and Data Recovery (CDR)

In many wireline communication systems, the clock signal is not transmitted with the data signal in order to reduce the number of wires and, therefore, the cost of the channel. In addition, the receiver usually has a plesiochronous clock source (i.e. similar in frequency, but not matched in phase or frequency) with respect to the transmitter data. Hence, the job of the clock and data recovery (CDR) block is to extract the transmitted clock and binary data from the data signal in the presence of jitter and frequency offset. One type of CDR generates a phase-tracking clock whose falling and rising edges align, respectively, with the zero-crossings and centers of the data signal (shown in Figure 2.18). Then, the CDR samples the data signal with the clock’s rising edge and outputs the recovered data and phase-tracking clock to downstream digital blocks.

Figure 2.18: A recovered clock, CKRX, sampling the eye diagram of the equalized data signal

Another type of CDR blindly samples the data signal with the plesiochronous clock and post-processes the samples to extract the data bits and phase information. As depicted in Figure 2.19, we can classify CDRs into two broad categories, where one operates with a phase-tracking clock and the other with a blind clock. We can further classify CDRs as having a feedback or feed-forward architecture. In Sections 2.5.1 to 2.5.3, we will discuss three types of CDRs; burst-mode CDRs [2] are omitted because they are less relevant to the proposed CDR. Chapter 3 proposes an ADC-based implementation of a blind-sampling CDR with feedback.

Figure 2.19: CDR classification. Phase-tracking clock: feedback (conventional) or feed-forward (burst-mode). Blind clock: feedback (data interpolator) or feed-forward (oversampling).

2.5.1. Phase-Tracking CDR with Clock Feedback

Figure 2.20 shows a conventional phase-tracking CDR with clock feedback. The phase detector (PD) compares the equalized data signal to the recovered clock, CKRX, to estimate the phase difference between them. The PD output is an error signal that is ideally proportional to the phase difference. The charge pump (CP) is a transconductor that converts the error signal to a current. The loop filter is a proportional-integral controller in which the resistor, R1, produces a voltage proportional to the current and the capacitor, C1, integrates the current. The second capacitor, C2, smooths the current pulses from the CP, and its value is much smaller than that of C1. The voltage from the loop filter adjusts the frequency (and, indirectly, the phase) of the voltage-controlled oscillator (VCO) that generates CKRX. In steady state, the feedback loop forces the phase of CKRX to match that of the incoming data signal.
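The role of each loop-filter element can be seen in a per-UI discrete-time linear model of this loop. The sketch below assumes a linear PD with gain kpd, an ideal proportional path (the role of R1) plus an integral path (the role of C1), and a VCO that integrates its control into phase; C2 is ignored, and the gains, the 300 ppm offset, and the function name are hypothetical. With a frequency offset, the proportional path alone leaves a constant phase error of offset/(kpd·kp), whereas adding the integral path drives the steady-state phase error to zero.

```python
def cdr_phase_error(freq_offset_ui, n_ui=5000, kpd=1.0, kp=0.05, ki=0.002):
    """Linear, per-UI model of the CDR loop; returns the phase error (in UI)
    seen by the PD at each UI when the data has a constant frequency offset."""
    phi_in = 0.0     # phase of the incoming data
    phi_ck = 0.0     # phase of CKRX
    integ = 0.0      # integral path (role of C1)
    errors = []
    for _ in range(n_ui):
        phi_in += freq_offset_ui            # frequency offset appears as a phase ramp
        err = kpd * (phi_in - phi_ck)       # linear PD model
        errors.append(err)
        integ += ki * err                   # integral path accumulates the error
        freq = kp * err + integ             # proportional (R1) + integral (C1) paths
        phi_ck += freq                      # VCO integrates frequency into phase
    return errors

ppm300 = 300e-6                                   # 300 ppm, in UI per UI
print(cdr_phase_error(ppm300)[-1])                # PI loop: essentially zero
print(cdr_phase_error(ppm300, ki=0.0)[-1])        # proportional only: 0.006 UI
```

For a PI-based CDR, the extra integrator mentioned in the next paragraph would take the place of the `phi_ck += freq` line, since a PI sets phase directly.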

Although Figure 2.20 shows a CDR with a VCO block, it is also possible to generate CKRX with a phase interpolator (PI). While a VCO’s frequency is proportional to its input voltage, a PI’s output phase is directly proportional to its input signal. Hence, a PI-based CDR usually has an extra integrator in the loop filter to replace the integrator from the VCO. PI-based CDRs can be used in multi-transceiver systems to reduce the number of VCOs (e.g. to avoid coupling between VCOs). On the other hand, PIs are challenging to implement in terms of linearity (i.e. the output phase is not exactly proportional to the input signal) and noise (i.e. the PI has a lower output amplitude than a VCO) [10].

Figure 2.20: System diagram of phase-tracking CDR with clock in feedback loop (PD & CP, a loop filter (LF) with R1, C1, and C2, and a VCO generating CKRX, which clocks the flip-flop that samples the equalized data signal to produce the recovered data AK; PD: phase detector, CP: charge pump, VCO: voltage-controlled oscillator)

We can characterize a CDR’s performance by measuring its jitter tolerance, jitter transfer, and jitter generation. Jitter tolerance measures, as a function of jitter frequency, the maximum amplitude of sinusoidal jitter between the data signal and CKRX for which the CDR can still recover data at a required BER. Jitter transfer is the amount of jitter the CDR transfers from the data signal to CKRX. Jitter generation is the amount of jitter in CKRX caused by the CDR’s internal blocks (e.g. VCO). The most important measurement is jitter tolerance because it directly relates input jitter to BER. A simplified example is shown in Figure 2.21.

The jitter tolerance curve is separated into two parts by the CDR’s bandwidth. When the frequency of the input jitter is low, the CDR can shift CKRX to track the center of the data eye even if it deviates from the ideal location by more than 0.5UI. However, when the jitter frequency is higher than the CDR bandwidth, the feedback cannot track the data eye. At most, the data eye can move by 0.5UI (i.e. 1UIPP) before a bit error occurs. In practice, the high-frequency jitter tolerance is usually lower than 1UIPP because the CDR has to recover data in the presence of other jitter components besides sinusoidal jitter (e.g. data-dependent jitter, random jitter, etc.).

Figure 2.21: Example of a jitter tolerance chart (jitter tolerance in UIPP versus jitter frequency in Hz, with the 1UIPP level and the CDR bandwidth marked)

The PD is an important component of the CDR because it provides the error signal used to guide the feedback loop (shown in Figure 2.22). In the following subsections, we will discuss three types of PDs: Alexander, Hogge, and Mueller-Muller.

Figure 2.22: (a) PD inputs (ΦIN, ΦCK) and output (PDOUT) and (b) linear model (a gain KPD applied to the phase error ΦERR = ΦIN − ΦCK)

Alexander (Bang-Bang) Phase Detector

As depicted in Figures 2.23 and 2.24, the Alexander PD, also known as a bang-bang

PD, samples both the edges and centers of the data signal. When a transition occurs,

the PD compares the edge sample to the adjacent center samples to determine if the

clock is early or late with respect to the data signal. In order to capture both center and

edge samples, the Alexander PD must oversample at 2x the baud rate. Alexander PDs

are widely used because they are easily implemented with digital logic, but, as shown in


Figure 2.25, they are highly non-linear when jitter is absent from clock and data. When

jitter exists, the PD can be linearized [17], but its gain is jitter-dependent. This is also

undesirable since we usually cannot predict the jitter in advance.

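Figure 2.23: Alexander PD implementation (flip-flops clocked by CKRX capture the center samples D1 and D2 and the edge sample E from DIN; the PD logic maps {D1, E, D2} to Early and Late as follows)

{D1, E, D2}    Early   Late
110 or 001     1       0
100 or 011     0       1
000 or 111     0       0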
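The Alexander PD truth table can be written as a small function. This sketch assumes binary (0/1) samples and models an ideal 0-to-1 transition at t = 0 that is sampled at the edge time and 0.5UI before and after it; the helper names `alexander_pd` and `sample_transition` are hypothetical. An early CKRX takes the edge sample before the transition, so E still equals the old bit D1; a late CKRX takes it after the transition, so E already equals the new bit D2.

```python
def alexander_pd(d1, e, d2):
    """Return (early, late) for center samples d1, d2 and the edge sample e."""
    if d1 == d2:
        return 0, 0      # 000 or 111: no transition, no phase information
    if e == d1:
        return 1, 0      # 110 or 001: edge sample still shows the old bit, CKRX early
    return 0, 1          # 100 or 011: edge sample shows the new bit, CKRX late

def sample_transition(edge_time_ui):
    """Sample a hypothetical ideal 0-to-1 transition at t = 0.

    The edge sample is taken at edge_time_ui and the center samples 0.5 UI
    before and after it, so edge_time_ui < 0 models an early CKRX.
    """
    def level(t):
        return 1 if t >= 0 else 0
    return level(edge_time_ui - 0.5), level(edge_time_ui), level(edge_time_ui + 0.5)

print(alexander_pd(*sample_transition(-0.1)))   # (1, 0): early
print(alexander_pd(*sample_transition(0.1)))    # (0, 1): late
```

The output depends only on the sign of the phase error, not its size, which is the bang-bang non-linearity shown in Figure 2.25.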

Figure 2.24: Alexander PD examples with (a) early and (b) late CKRX, showing the samples D1, E, and D2 taken from DIN

Figure 2.25: Transfer function of Alexander PD with no jitter on data or CKRX (PDOUT = Avg(Late − Early) versus ΦERR = ΦIN − ΦCK, over −UI/2 to UI/2)


Hogge Phase Detector

The Hogge PD is depicted in Figure 2.26. In contrast to the Alexander PD, its output

is linear and its gain is independent of jitter. As shown in Figure 2.27, the signal, B, is

a pulse with a constant width of 0.5UI. The other signal, A, measures the time from the

data transition to the rising edge of CKRX . When the rising edge samples the center of

the data eye (Figure 2.27b), the data transition occurs 0.5UI from the rising edge, the

pulses on A and B are equal, and the average PD output is zero. Otherwise, PDOUT is

positive or negative when CKRX is late or early, respectively.

Figure 2.26: Hogge PD implementation (buffer BUF1 and flip-flops FF1 and FF2 clocked by CKRX produce the signals A and B from DIN, which are combined into PDOUT)

Figure 2.27: Hogge PD output with (a) early (A < B, Avg(PDOUT) < 0), (b) on-time (Avg(PDOUT) = 0), and (c) late (A > B, Avg(PDOUT) > 0) CKRX; PDOUT takes the values +1 and −1


Figure 2.28 shows the transfer function of an ideal Hogge PD with no offset. However,

the Hogge PD is more difficult to implement accurately compared to the Alexander PD.

In particular, the delay of BUF1 should match the clock-to-Q delay of FF1. A delay

mismatch adds a phase offset to the A signal and, in turn, causes PD offset [6].

Figure 2.28: Transfer function of Hogge PD (PDOUT = Avg(Late − Early) versus ΦERR = ΦIN − ΦCK, over −UI/2 to UI/2)

Mueller-Muller Phase Detector

One way to reduce power consumption is to reduce the sampling rate. Both the Alexander and Hogge PDs require a 2x oversampling rate. In contrast, Mueller-Muller PDs (MMPDs) allow the CDR to operate with baud-rate (1x) sampling [14, 21, 26], since the PD calculates the phase error from center samples only. The center samples contain mostly amplitude information about the data signal, whereas the edge samples, which the MMPD ignores, contain mostly phase information. However, if the pulse response of the data signal has ISI, then the MMPD can infer the phase information from the center samples and the slope of the pulse response. Therefore, an MMPD requires ISI in order to function; it will fail if given a data signal with a Nyquist pulse response (which has infinite slope on its edges).

Each MMPD is defined by an MM function, F, which should be chosen based on the pulse response of the channel. The MM function is also the transfer characteristic of the MMPD. When the MMPD is placed in a CDR feedback loop, the feedback forces the MM function to zero.

Figure 2.29 shows an example that Mueller and Muller presented in their 1976 paper [21]. The MM function demonstrated in [21] was F = h−1 − h1 (i.e. the difference between the pre-cursor, h−1, and the post-cursor, h1). Given the example pulse response shape, when the samples h−1 and h1 shift to the left, h1 becomes greater than h−1 and F is negative. Conversely, if the samples shift to the right, F becomes positive. When the CDR locks, the feedback forces F to zero so that h−1 and h1 are equal and the main cursor, h0, is near the optimal sampling position close to the peak of the pulse response.

Figure 2.29: Example of (a) pulse response (samples h−1, h0, and h1 taken at multiples of T) and (b) MM function F = h−1 − h1 versus sampling phase (UI) [21]

Mueller and Muller also showed that we can estimate the points on the pulse response (e.g. h−1, h0, h1, etc.) by correlating baud-rate samples of the data signal with the recovered data. The results are listed in Equations 2.21 to 2.24. The derivation is omitted because the analysis is very similar to that of Equation 2.12. We note that Equations 2.21 to 2.24 assume random, independent data with zero DC bias (E[Ak] = 0); therefore, the MMPD requires these conditions on the input signal in order to function correctly.

$$E[x_k A_{k-1}] = h_1 \qquad (2.21)$$

$$E[x_k A_k] = E[x_{k-1} A_{k-1}] = h_0 \qquad (2.22)$$

$$E[x_{k-1} A_k] = h_{-1} \qquad (2.23)$$

$$h_{-1} - h_1 = E[x_{k-1} A_k - x_k A_{k-1}] \qquad (2.24)$$

According to Equation 2.24, we can implement the MMPD described in Figure 2.29 using the expression xk−1Ak − xkAk−1. The loop filter that follows the MMPD estimates the expected value by averaging the MMPD output.
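Equation 2.24 can be checked by simulation. The sketch below assumes a hypothetical smooth, symmetric pulse response (a Gaussian shape peaking at t = 0), random equiprobable ±1 data, no noise, and error-free decisions (Ak = Bk); `pulse` and `mm_estimate` are illustrative names. It compares the time average of xk−1Ak − xkAk−1 with the true h−1 − h1: the estimate is positive when all samples are taken late, negative when they are taken early, and near zero at the pulse peak.

```python
import math
import random

def pulse(t_ui, tau=0.8):
    """Hypothetical smooth, symmetric pulse response with its peak at t = 0."""
    return math.exp(-(t_ui / tau) ** 2)

def mm_estimate(phase_ui, n_bits=20000, seed=5):
    """Average of x_{k-1}A_k - x_k A_{k-1} (Eq. 2.24) with A_k = B_k.
    phase_ui > 0 means every sample is taken later than the pulse peak."""
    rng = random.Random(seed)
    bits = [rng.choice((-1, 1)) for _ in range(n_bits)]
    cursors = range(-3, 4)
    h = {j: pulse(j + phase_ui) for j in cursors}          # h_j at this sampling phase
    x = [sum(h[j] * bits[k - j] for j in cursors if 0 <= k - j < n_bits)
         for k in range(n_bits)]                           # x_k = sum_j h_j B_{k-j}
    est = sum(x[k - 1] * bits[k] - x[k] * bits[k - 1]
              for k in range(1, n_bits)) / (n_bits - 1)
    return est, h[-1] - h[1]                               # (estimate, true F)

for phase in (-0.2, 0.0, 0.2):
    print(phase, mm_estimate(phase))
```

Making `tau` larger spreads the pulse and flattens F around zero phase, which illustrates the dependence of the MMPD gain on the pulse shape noted next.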

From Figure 2.29, we can also observe a disadvantage of the MMPD – namely, its

transfer function is dependent on the shape of the channel pulse response. A sharp pulse

response will lead to a high PD gain, whereas a spread-out pulse response (resulting from

increased ISI) will reduce the PD gain.

2.5.2. Blind Feed-forward CDR

An example of a blind feed-forward CDR is described in [22,27], as shown in Figure 2.30.

The proposed design samples a 10.3Gbps data signal at 82.5GS/s (8x oversampling).

The edge detector locates the rising and falling data transitions by comparing adjacent

samples. As depicted in Figure 2.31, the data selector chooses the sample farthest away

from the edge (i.e. closest to the center of the UI).
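The edge-detection and data-selection idea can be sketched on 1-bit oversampled data. This is a simplified illustration under stated assumptions: noiseless samples, no frequency offset, and a single histogram of transition positions over the whole block, whereas a real design such as [22,27] would track edges continuously. The function names and the sampling offset are hypothetical. The selected phase is half a UI (four samples at 8x) away from the most frequent edge position.

```python
import random

def select_data_phase(samples, osr=8):
    """Histogram where adjacent samples differ (mod osr) and return the phase
    half a UI away from the most frequent edge position."""
    counts = [0] * osr
    for n in range(1, len(samples)):
        if samples[n] != samples[n - 1]:
            counts[n % osr] += 1          # a transition occurred just before sample n
    edge = max(range(osr), key=lambda p: counts[p])
    return (edge + osr // 2) % osr

def recover_bits(samples, osr=8):
    """Keep one sample per UI at the selected phase."""
    return samples[select_data_phase(samples, osr)::osr]

# Hypothetical 8x oversampled NRZ bits with an unknown offset of 3 samples
rng = random.Random(11)
bits = [rng.choice((0, 1)) for _ in range(200)]
offset = 3
samples = [bits[(n + offset) // 8] for n in range(len(bits) * 8 - offset)]
print(select_data_phase(samples), recover_bits(samples) == bits)   # 1 True
```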

Figure 2.30: System diagram of an 8x oversampled blind feed-forward (burst-mode) CDR [22,27] (a PLL driven by CKREF feeds an 8-phase clock generator; the samplers digitize DIN and the edge detection and data selection logic outputs the recovered data)

An advantage of the feed-forward architecture is that the CDR blocks can be implemented and simulated independently. In fact, the data selection logic in Figure 2.30 was implemented on a separate FPGA while the analog front-end blocks were implemented on a test chip. However, the 8x oversampling ratio required a large number of samplers and a complicated clock distribution network, which resulted in the test chip’s high power consumption of 5.8W. The oversampling ratio also limits the data rate. The analog front end’s power consumption and increasing data rates motivate us to reduce the oversampling ratio.

Figure 2.31: The edge detection and data selection process from Figure 2.30 (detected edges and the selected UI center)

Figure 2.32: A blind 2x ADC-based CDR [32] (a 5Gb/s input is sampled by a 5-bit ADC with a 5GHz blind clock; the digital CDR contains an FFE, a PD, a low-pass filter, and a data decision block that outputs DOUT). The PD interpolates linearly between two 2x blind samples, a and b, taken 0.5UI apart, to find the zero-crossing:

$$\Phi_X = \frac{0.5\,a}{a - b}\ \text{UI}$$

Figure 2.32 shows an ADC-based implementation of a 2x blind feed-forward CDR [32], [36].

A 5Gb/s input is sampled by a 5-bit ADC and is passed to a feed-forward equalizer (FFE)

in the digital CDR. After the FFE, the blind samples are processed by the phase detector

(PD). If two adjacent blind samples are opposite in sign, a zero-crossing is detected which


corresponds to the edge sample in a phase-tracking system. This zero-crossing, denoted

by variable φX , is approximated by the linear interpolation shown in Figure 2.32. The

instantaneous value of φX is low-pass filtered into φAV G by the digital filter. The data

decision block adds 0.5UI to φAV G to find the center of the eye and compares it to φX to

recover the data. This system uses 2x sampling where the blind samples are 0.5UI apart.

However, if the oversampling ratio can be decreased, then the data rate can be increased without increasing the frequency of the blind clock.
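The PD's linear interpolation can be written in a few lines. The sketch below assumes that the zero-crossing is measured from the earlier sample, a, and that the sample spacing is a parameter (0.5UI for 2x sampling, 11/16UI for the 16-samples-per-11UI scheme discussed next); the smooth tanh transition used for comparison is hypothetical. On a perfectly linear transition the estimate is exact, while on a curved transition the error grows when the samples are farther apart, which is the accuracy issue raised for [33].

```python
import math

def zero_crossing_phase(a, b, spacing_ui=0.5):
    """Linear-interpolation estimate of the zero-crossing between adjacent
    blind samples a and b, in UI measured from sample a (None if no crossing)."""
    if (a >= 0) == (b >= 0):
        return None                       # same sign: no zero-crossing here
    return spacing_ui * a / (a - b)       # Phi_X = 0.5 a / (a - b) for 2x sampling

def edge(t, t0=0.2):
    """Hypothetical smooth data transition crossing zero at t0 (UI)."""
    return math.tanh(3 * (t - t0))

print(zero_crossing_phase(-0.3, 0.2))                       # linear edge: 0.3 exactly
err_2x = abs(zero_crossing_phase(edge(0.0), edge(0.5)) - 0.2)
err_wide = abs(zero_crossing_phase(edge(0.0), edge(11 / 16), 11 / 16) - 0.2)
print(err_2x, err_wide)                                     # wider spacing, larger error
```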

Figure 2.33: A blind 1.45x ADC-based CDR [33] (a 6.875Gb/s input is sampled by a 5-bit ADC with a 5GHz blind clock; a data compactor feeds the PD, filter, and data decision blocks of the digital CDR; fractional sampling takes 16 samples, S1 to S16, per 11 UI)

A subsequent work [33], illustrated in Figure 2.33, reduces the oversampling ratio to

1.45x; the receiver takes 16 samples for every 11UI to achieve 6.875Gb/s. Its architecture

is similar to the one presented in [36], but now the samples are farther apart than 0.5UI

and the linear interpolation used in the PD to estimate zero-crossings is less accurate. To

solve this problem, the PD filters out some of the less accurate results based on sample

amplitude. With this architecture, 1.45x seems to provide a good compromise where

the oversampling ratio can be reduced without much loss in jitter tolerance. In order to

eliminate oversampling altogether, Chapter 3 proposes a different CDR architecture.


2.5.3. Blind CDR with Feedback

Due to the linearity and noise drawbacks of PI-based CDRs, [10] proposed a 2x oversampling, 32Gbps design based on a data interpolator (DI) instead of a PI. The DI samples the data signal blindly and generates the center and edge samples by interpolating between the blind samples as shown in Figure 2.34. The DI is implemented in the analog domain by storing the samples on capacitor arrays and interpolating through charge sharing.

Figure 2.34: System diagram of blind CDR with feedback [10] (a sampler and switched-capacitor array form the data interpolator, whose outputs drive the PD and loop filter (LF); the loop filter output ΦAVG is fed back to the data interpolator, which produces the recovered data from DIN)

Figure 2.35: Analog data interpolator (DI) estimates center and edge samples from blind samples [10] (blind samples and interpolated samples at the data edge and data center positions)

A disadvantage of a DI-based CDR is that the DI introduces interpolation error when estimating the desired samples. In particular, the analog interpolator is a first-order interpolator (see Figure 2.35). A digital DI can reduce the error by using a more sophisticated interpolation algorithm. Chapter 3 proposes an ADC-based implementation of a blind CDR with a digital DI.
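To make the point about first-order interpolation concrete, the minimal sketch below reconstructs a sinusoid sampled once per UI at 32 phases between two samples and compares linear interpolation (as in the analog DI) with a 4-point cubic Lagrange interpolator. The test frequency of 0.2 cycles/UI and the choice of Lagrange interpolation are illustrative assumptions; they are not the algorithm of [10] or of Chapter 3.

```python
import math

def linear_interp(b, c, mu):
    """First-order estimate between blind samples b (t = 0) and c (t = 1)."""
    return b * (1.0 - mu) + c * mu

def cubic_interp(a, b, c, d, mu):
    """Third-order Lagrange estimate through samples at t = -1, 0, 1, 2."""
    wa = -mu * (mu - 1.0) * (mu - 2.0) / 6.0
    wb = (mu + 1.0) * (mu - 1.0) * (mu - 2.0) / 2.0
    wc = -(mu + 1.0) * mu * (mu - 2.0) / 2.0
    wd = (mu + 1.0) * mu * (mu - 1.0) / 6.0
    return wa * a + wb * b + wc * c + wd * d

def worst_case_errors(freq, n_phase=32, n_offset=64):
    """Worst-case |error| over sampling phases mu and signal offsets for a
    hypothetical sinusoid of `freq` cycles per UI, sampled once per UI."""
    worst_lin = worst_cub = 0.0
    for k in range(n_offset):
        theta = 2.0 * math.pi * k / n_offset

        def s(t):
            return math.sin(2.0 * math.pi * freq * t + theta)

        a, b, c, d = s(-1), s(0), s(1), s(2)   # four blind samples, 1UI apart
        for p in range(n_phase):
            mu = p / n_phase
            ideal = s(mu)
            worst_lin = max(worst_lin, abs(linear_interp(b, c, mu) - ideal))
            worst_cub = max(worst_cub, abs(cubic_interp(a, b, c, d, mu) - ideal))
    return worst_lin, worst_cub

lin_err, cub_err = worst_case_errors(freq=0.2)
print(f"linear: {lin_err:.3f}, cubic: {cub_err:.3f}")
```

The cost of the higher-order estimate is two extra samples and a few multiplies per output, which is cheap in a digital DI but awkward to realize with charge sharing.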


2.6. Summary

This chapter discussed fundamental concepts about channels and receivers and reviewed some previous work on adaptive equalizers and CDR blocks. This thesis builds upon this background by exploring a blind baud-rate CDR architecture in Chapter 3 and a zero-forcing adaptive DFE in Chapter 4.

3 A Blind Baud-Rate CDR

This chapter proposes a CDR that can recover data from blind baud-rate samples. Section 3.1 discusses some concepts and challenges arising from blind baud-rate data recovery. Sections 3.2 and 3.3 present the receiver, the CDR, and each of their components. Section 3.4 shows the simulated and measured results.

3.1. Blind 1x Data Recovery Concepts

The PDs in the 2x [32,36] and 1.45x [33] blind CDRs (Figures 2.32 and 2.33, respectively) interpolate between the blind samples in order to detect the phase of the zero crossings; they require a finite slope in order to calculate phase. The interpolation cannot accurately estimate phase when given a low-loss channel because the data transitions become too abrupt. Unlike phase-tracking CDRs, blind ADC-based CDRs therefore perform poorly with low-loss channels. Since a blind ADC-based CDR should work with a range of channels, we focus most of the analysis on low-loss channels. Section 3.4 shows how the proposed CDR can be modified for a high-loss channel.

Figure 3.1 compares eye diagrams with different sampling rates given a low-loss channel. The worst-case sampling position occurs when adjacent samples are equally far from the center of the eye. For 2x blind sampling, the worst case is where adjacent samples are both 0.25UI from the edge, which leads to a high-frequency jitter tolerance of 0.5UIPP. When the oversampling ratio is decreased to 1.45x, jitter tolerance decreases to 0.31UIPP.


At 1x, the samples may occur on the edges. If jitter shifts samples away from each other, then the CDR will not capture the bit at all, which results in zero jitter tolerance. The following paragraphs use the channel's pulse response to elaborate on this issue and to arrive at the proposed solution.

Figure 3.1: Worst-case for 2x, 1.45x and 1x sampling on open eye diagram (high-frequency jitter tolerance (HF JT): 0.5UIPP at 2x, 0.31UIPP at 1.45x, 0UIPP at 1x)
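The three worst-case numbers in Figure 3.1 follow from a simple geometric argument, which we state here as our own reading of the figure. On a fully open 1UI eye, adjacent blind samples are spacing = (UI per group)/(samples per group) apart; in the worst case they straddle the eye center symmetrically, so the nearer sample sits spacing/2 from the center and the margin to either edge is 0.5 − spacing/2, giving a peak-to-peak tolerance of 1 − spacing. The sketch below assumes an ideal open eye and ignores noise.

```python
def worst_case_hf_jt(samples, ui):
    """Worst-case high-frequency jitter tolerance (UIpp) on an ideal open eye
    when `samples` blind samples are taken every `ui` unit intervals."""
    spacing = ui / samples              # UI between adjacent blind samples
    margin = 0.5 - spacing / 2.0        # nearer sample to the eye edge (UI)
    return max(0.0, 2.0 * margin)       # peak-to-peak tolerance

for label, samples, ui in (("2x", 2, 1), ("1.45x", 16, 11), ("1x", 1, 1)):
    print(f"{label}: {worst_case_hf_jt(samples, ui):.4f} UIpp")
```

The 1.45x case uses the 16-samples-per-11UI ratio of [33]; its exact value, 0.3125UIpp, rounds to the 0.31UIPP quoted above.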

Figure 3.2 shows the pulse response of an ideal channel. The best sampling position occurs when the main cursor is at the center of the ideal pulse response. In a clocked phase-tracking system, the sampling would remain at this position. However, with 1x blind sampling, any frequency offset between the data and receiver clock will cause the sampling phase to shift continuously across a 1UI window. When the sampling occurs near the UI boundary, any high-frequency jitter may shift the sampling outside the 1UI phase range, resulting in the loss of data bits (i.e. zero jitter tolerance).

In order to increase the jitter tolerance at baud-rate sampling, the pulse response is extended beyond 1UI by introducing a controlled amount of ISI in the data using a rectangular filter, which is implemented via an integrate-and-dump (I&D) circuit [28] in the receiver front end. A rectangular filter is suitable in this case since its response has a finite length of ISI and requires fewer equalization taps than the exponentially decaying response of an RC filter. A 1UI rectangular filter, convolved with the ideal channel, spreads the pulse response to 2UI. If a perfect decision feedback equalizer (DFE) cancels all post-cursor ISI, then the eye is open over a 1.5UI range of sampling phases (this would have been 2UI if pre-cursor ISI could also be cancelled). If the blind samples shift beyond the 1UI window, there is still a remaining jitter margin of 0.5UIPP. A 2UI rectangular filter increases this margin to 1UIPP and results in a symmetric eye opening with respect to the blind sampling window. For these reasons, a 2UI I&D circuit was chosen for the proposed design.

Figure 3.2: Comparison of theoretical worst-case jitter tolerance given the pulse responses of an ideal channel, 1UI I&D, and 2UI I&D. Blind baud-rate samples can shift across a 1UI range due to frequency offset. (Panels plot each pulse response with blind baud-rate samples versus time, 0 to 3T, and sampling phase, −1 to 1UI, marking the pre-cursor h-1, the main cursor h0, the vertical eye opening with an ideal DFE (h0 − h-1), and the 1UI blind range; faded arrows and dots show possible sampling phases due to frequency offset. Worst-case jitter tolerance at the boundary: 0UIpp (no margin) for the ideal channel, 0.5UIpp with 1UI I&D, and 1UIpp with 2UI I&D.)
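The 0, 0.5 and 1UIPP margins can be checked numerically. The sketch below assumes an ideal channel whose pulse is a 1UI rectangle, models the I&D as an ideal integrator over the most recent `width` UI, and assumes an ideal DFE that removes every post-cursor, so the eye is open wherever the main cursor exceeds the sum of the pre-cursor magnitudes. It scans the main-cursor position and reports the span over which the eye is open; subtracting the 1UI blind range gives the jitter margin.

```python
STEP = 1e-3  # scan resolution in UI

def pulse(t, width):
    """Pulse response (time in UI) of an ideal channel, i.e. a 1UI rectangle,
    after an integrate-and-dump of `width` UI; width=0 means no I&D."""
    if width == 0:
        return 1.0 if 0.0 <= t < 1.0 else 0.0
    # I&D output: integral of the 1UI rectangle over the window [t - width, t]
    return max(0.0, min(t, 1.0) - max(t - width, 0.0))

def open_eye_range(width, n_pre=4):
    """Span (UI) of main-cursor positions t with h0 - sum(|pre-cursors|) > 0,
    assuming an ideal DFE cancels all post-cursor ISI."""
    count = 0
    for i in range(-1000, 5000):          # scan t from -1UI to 5UI
        t = i * STEP
        h0 = pulse(t, width)
        pre = sum(abs(pulse(t - k, width)) for k in range(1, n_pre + 1))
        if h0 - pre > 1e-9:
            count += 1
    return count * STEP

for w in (0, 1, 2):
    span = open_eye_range(w)
    print(f"{w}UI I&D: eye open over {span:.2f}UI, margin {span - 1.0:.2f}UIpp")
```

With a real channel the rectangle would be replaced by the measured pulse response, and the pre-cursor sum would no longer be zero over the first UI.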


3.2. Proposed 1x Blind Receiver Architecture

Figure 3.3 shows the system diagram of the receiver, including an analog front end and a digital CDR. The analog front end consists of four interleaved I&D and ADC blocks, each operating at 2.5GS/s. Figure 3.4 shows two possible implementations of a 2UI I&D. The first, illustrated in Figure 3.4a, is a fully analog 2UI I&D. We have chosen the second implementation (Figure 3.4b), where the 2UI I&D consists of two components: one analog and one digital. The I&D circuit integrates the input over 1UI and the ADC converts each result into a 5-bit digital value. An adder in the digital CDR combines adjacent 5-bit 1UI I&D samples to synthesize 6-bit 2UI I&D samples. Since the ADC resolution is limited to 5 bits, obtaining the 2UI I&D samples directly in the analog domain and feeding them to the ADC would have lost this additional bit of resolution.
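The digital half of the 2UI I&D is a 1 + z^-1 adder on the 5-bit codes. A minimal sketch follows; the re-centering offset of 31 (mapping the 0 to 62 sum onto a signed −31 to +31 range) is our assumption based on the "convert to signed integer" step in the block diagram, not a documented detail of the RTL.

```python
def synthesize_2ui(codes_1ui):
    """Combine consecutive 5-bit 1UI I&D ADC codes (0..31) into signed 2UI samples.

    Each output is the sum of the current and previous 1UI codes (0..62,
    i.e. 63 levels, which needs 6 bits), re-centered to -31..+31
    (the offset of 31 is an assumption)."""
    out = []
    prev = None
    for code in codes_1ui:
        if not 0 <= code <= 31:
            raise ValueError(f"not a 5-bit code: {code}")
        if prev is not None:
            out.append(code + prev - 31)   # 1 + z^-1, then re-center
        prev = code
    return out

print(synthesize_2ui([31, 31, 0, 0, 16]))  # [31, 0, -31, -15]
```

Because the sum spans 63 distinct levels, the digital adder recovers the sixth bit that a 5-bit ADC sampling an analog 2UI integrator could not represent.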

Simulations showed that the system needed an ADC with a minimum ENOB of 4 bits; this work uses a previously designed 5-bit ADC with a known ENOB of 4.2 bits [32]. The proposed design does not include ADC calibration; the addition of digital calibration for gain, offset, and timing mismatches [19, 25, 35] would further improve the receiver performance.

The samples in the digital CDR are processed by the data interpolator, which estimates the samples at the center of the eye using the recovered phase, φAVG. The digital data interpolator allows the use of a more sophisticated interpolation algorithm than an analog interpolator. A Mueller-Muller PD and loop filter form a feedback loop with the data interpolator. Loop latency is critical in this design since the digital CDR operates on a 625MHz divided clock; each cycle in the loop adds 1.6ns (16UI at 10Gb/s) of delay. The proposed implementation has a loop latency of 7 cycles. A 2-tap DFE recovers the binary data, Ak, from the interpolated samples, xk.

Figure 3.3: System block diagram of interleaved analog front end (1 UI I&D and ADC) and digital CDR (labels: 10Gb/s data input; four interleaved 1UI I&D and 5-bit ADC slices; 5GHz blind clock CKRX with ÷2 and ÷4 clock generation at 2.5GHz and 625MHz; 4:1 demux delivering 16x5b samples to the digital CDR; conversion to signed integer and addition of 1UI I&D samples (z^-1, −31) to form 2UI samples; data interpolator, MM PD, and loop filter producing the average interpolation phase ΦAVG; DFE producing 17x1b resolved bits AK; xK: interpolated samples)

Figure 3.4: Comparison of (a) fully analog 2UI I&D and (b) analog and digital 2UI I&D (in (b), an analog 1UI I&D and ADC clocked by CKRX are followed by a digital adder (z^-1) that produces the 2UI I&D samples)

The data interpolator compensates for frequency offset. As shown in Figure 3.5a, we define negative frequency offset to mean that the transmitter clock is slower than the blind receiver clock. When this occurs, an interpolated sample is skipped each time the phase completes a 1UI rotation. Similarly, Figure 3.5b shows a positive frequency offset, where the transmitter clock is faster than the receiver clock. A positive frequency offset results in cases where no blind sample exists between two desired samples; the interpolator resolves these cases by interpolating twice between the closest two blind samples when the decreasing φAVG rolls over from 0UI to 1UI. The range of frequency offset supported by the loop filter is low enough that we can assume the extra interpolated sample is very close to the blind sample at 1UI. Hence, the implemented interpolator directly uses the blind sample as the extra interpolated sample.

Figure 3.5: Handling (a) negative frequency offset: data (TX) is slower than blind receiver clock (CKRX) (b) positive frequency offset: data (TX) is faster than blind receiver clock (CKRX) (in (a), ΦAVG rolls over from 1UI to 0UI and one interpolation is skipped; in (b), ΦAVG rolls over from 0UI to 1UI and the interpolation is done twice)

The data path in the digital CDR is sized for 17 parallel samples. Most of the time, only 16 paths are active. If there is frequency offset and φAVG rolls over, then the number of active paths is temporarily reduced to 15 or increased to 17 for one cycle.
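The 15/16/17 behavior can be reproduced with a simple timing model: count how many eye-center (desired) sample times fall within each group of 16 blind samples, i.e. one 625MHz cycle at 10GS/s. This is a behavioral sketch under the assumption of a constant frequency offset and an ideal phase estimate; it is not the RTL, which detects the rollover of φAVG instead.

```python
def active_paths(ppm, n_blocks, block=16, phase0=0.5):
    """Number of desired samples whose interpolation interval falls in each
    block of `block` blind samples (one digital clock cycle).

    ppm > 0: TX faster than the blind RX clock, so desired samples are spaced
    slightly less than one RX sampling period apart."""
    spacing = 1.0 / (1.0 + ppm * 1e-6)   # desired-sample spacing in RX periods
    counts = [0] * n_blocks
    n = 0
    while True:
        t = phase0 + n * spacing          # desired sample time in RX periods
        blk = int(t // block)
        if blk >= n_blocks:
            break
        counts[blk] += 1
        n += 1
    return counts

for ppm in (0, 1000, -1000):
    c = active_paths(ppm, n_blocks=100)
    print(ppm, sorted(set(c)), "blocks with 17:", c.count(17), "with 15:", c.count(15))
```

At 1000ppm a block with one extra (or one missing) sample occurs roughly once every 1000UI, which is why the extra datapath is needed only occasionally.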


Figure 3.6: Implementation of integrate-and-dump (I&D) circuit [28] (labels: differential input Vin+/Vin−, reset switches, clock pulses SC0 to SC3 and SC0x to SC3x, outputs V0 to V3, load capacitances CL)

3.3. Receiver Implementation

3.3.1. Integrate-and-Dump Filter

The output from the channel drives the input of the I&D filter. The I&D circuit in Figure 3.6 introduces controlled ISI into the ADC input and also operates as a frequency-scalable anti-aliasing filter [28]. The circuit consists of a single source-degenerated transconductance stage that converts the input voltage to current and integrates the signal on the input capacitance of the four interleaved ADCs, labelled as CL in Figure 3.6. Each interleaved I&D block operates in 3 phases: integrate, hold (during which the ADC samples the value), and reset. The clock pulses (SC0, SC1, SC2, and SC3) reset the outputs (V0, V1, V2, and V3) and redirect the current to each of the interleaved ADCs. Each clock pulse is 1UI wide.
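An ideal I&D over a window of W seconds averages its input, so its normalized magnitude response is |sin(πfW)/(πfW)|, with nulls at multiples of 1/W. Since W is set by the clock pulses, the nulls move with the data rate, which is one way to read "frequency-scalable". The sketch assumes an ideal integrator (no finite output resistance or switch parasitics) and normalizes to unity DC gain.

```python
import math

def id_gain_db(f_hz, window_s):
    """Normalized magnitude (dB) of an ideal integrate-and-dump over `window_s`."""
    x = math.pi * f_hz * window_s
    mag = 1.0 if x == 0 else abs(math.sin(x) / x)
    return 20.0 * math.log10(mag) if mag > 0 else float("-inf")

UI = 100e-12                     # 10Gb/s
nyquist = 1.0 / (2.0 * UI)       # 5GHz
print(f"1UI I&D at Nyquist: {id_gain_db(nyquist, UI):.2f} dB")
print(f"2UI I&D at Nyquist: {id_gain_db(nyquist, 2 * UI):.1f} dB (null)")
```

The 1UI window attenuates the Nyquist frequency by about 3.9dB, while an ideal 2UI window places a null there; this is consistent with the 2UI I&D deliberately adding ISI that the DFE must later remove.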


Figure 3.7: I&D operating phases synchronized with clock pulses (SC0 to SC3, each 1UI wide and repeating every 4UI; operating phases: (1) integrate, (2) hold, (3) reset)

3.3.2. Clock Generator

Figure 3.8: Implementation of clock pulse generator with adjustable delay for deskew (5GHz CKRX, CML toggle flip-flop (÷2), CML-to-CMOS converters with adjustable delay for deskew, CMOS duty-cycle correction, and clock pulse generator producing SC0 to SC3)

Figure 3.8 shows the clock generator, which drives the ADC and I&D. A CML toggle flip-flop divides a 5GHz input clock into 4 phases, each at 2.5GHz. The outputs are then converted into single-ended CMOS signals and buffered. The clock pulse generator [28] uses logic gates to generate 1UI-wide pulses from the 4 clock phases.
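The text does not specify which gates the pulse generator uses. One common way to obtain four non-overlapping 1UI pulses from four 50%-duty 2.5GHz clocks spaced 1UI apart is to AND each phase with the complement of the next one. The sketch below checks that this assumed logic yields the SC0 to SC3 pattern of Figure 3.7, using quarter-UI time steps.

```python
def clock(phase, q):
    """2.5GHz clock `phase` (0..3) at time q (quarter-UI steps): period 4UI
    (16 steps), 50% duty cycle, delayed by `phase` UI (4 steps)."""
    return ((q - 4 * phase) % 16) < 8

def pulse(i, q):
    """Assumed pulse logic: SCi = CK_i AND NOT CK_(i+1)."""
    return clock(i, q) and not clock((i + 1) % 4, q)

STEPS = 64  # 16UI of simulated time
for i in range(4):
    wave = "".join("#" if pulse(i, q) else "." for q in range(16))
    print(f"SC{i}: {wave}")

# Exactly one pulse is active at every step, so each slice integrates for 1UI per 4UI.
assert all(sum(pulse(i, q) for i in range(4)) == 1 for q in range(STEPS))
```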

Figure 3.9: (a) Effect of clock phase skew on the I&D integration period (b) Equal I&D integration periods after correcting clock skew (clock pulses SC0 to SC3; the skew is corrected by adjusting the clock delays)

Figure 3.9a shows an example of the clock pulses when skew exists between the 4 phases. First, because the pulses control the I&D operation, any skew changes the integration periods, which causes gain mismatch between the 4 interleaved I&D blocks. Second, when high-speed signals are sampled, the clock skew effectively appears as high-frequency periodic or duty-cycle distortion (DCD) jitter. Both the gain mismatch and the high-frequency jitter degrade the receiver's jitter tolerance. This sensitivity to clock skew is a disadvantage of using the I&D block.
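A quick model of the gain mismatch: assume (this is an assumption about the pulse generator, not a documented detail) that slice i integrates from an edge of clock phase i to the corresponding edge of phase i+1. A timing skew s_i on phase i then changes slice i's window to 1UI + s_(i+1) − s_i, and for a DC input the I&D output, and therefore the slice gain, scales with that window.

```python
def slice_gains(skews_ps, ui_ps=100.0):
    """Relative DC gain of each interleaved I&D slice given the skew (ps) of
    each clock phase. Assumes slice i integrates from the edge of phase i to
    the edge of phase i+1 (wrapping around to phase 0)."""
    n = len(skews_ps)
    gains = []
    for i in range(n):
        window = ui_ps + skews_ps[(i + 1) % n] - skews_ps[i]
        gains.append(window / ui_ps)
    return gains

# Hypothetical example: phase 3 arrives 10ps late at 10Gb/s (UI = 100ps).
print(slice_gains([0.0, 0.0, 0.0, 10.0]))
```

In this model the four windows always sum to 4UI, so skew redistributes gain between neighboring slices rather than changing the average gain.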

As shown in Figure 3.9b, the clock skew can be compensated by adjusting the clock phases through deskew circuits. In this design, the skews are manually adjusted by observing the ADC outputs (e.g. Figure 3.24). Figure 3.10 shows the deskew circuitry, implemented in each of the CML-to-CMOS converters as a 4-bit phase interpolator. The differential clock signal connects to the In+ and In- inputs, and a 20ps-delayed clock connects to In_del+ and In_del-. Interpolating between them achieves ±10ps of deskew range on each of the 4 clock phases driving the I&D.


Figure 3.10: Adjustable clock delay block (inputs In+/In− and In_del+/In_del−, each feeding binary-weighted 1x, 2x, 4x, 8x segments controlled by Del[3:0] and their complements; bias Vbias; output Out)
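As a rough model of the deskew block, assume the 4-bit code Del[3:0] steers a binary-weighted fraction code/15 of the current to the In_del pair, and that the output delay moves linearly between the undelayed and the 20ps-delayed clock. Referenced to mid-scale, this gives the ±10ps range quoted above in steps of about 1.33ps. Real phase interpolators are not perfectly linear, so this is an idealization; the helper names are hypothetical.

```python
DELAY_PS = 20.0   # delay of the In_del clock relative to In
MAX_CODE = 15     # 4-bit Del[3:0]

def deskew_ps(code):
    """Idealized deskew (ps), relative to mid-scale, for a 4-bit PI code."""
    if not 0 <= code <= MAX_CODE:
        raise ValueError("code must be 0..15")
    weight = code / MAX_CODE                 # fraction of current on the delayed pair
    return weight * DELAY_PS - DELAY_PS / 2

def code_for_correction(skew_ps):
    """Code whose deskew best cancels a measured skew (ps), clamped to range."""
    ideal = (-skew_ps + DELAY_PS / 2) * MAX_CODE / DELAY_PS
    return min(MAX_CODE, max(0, round(ideal)))

print(deskew_ps(0), deskew_ps(15))           # full range: -10.0 10.0
code = code_for_correction(3.0)              # phase measured 3ps late
print(code, round(deskew_ps(code), 2))
```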

3.3.3. Data Interpolator

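Figure 3.11 expresses the desired sample in terms of the blind samples as

$$x_{desired} \approx b\,(1-\phi_{AVG}) + c\,\phi_{AVG} + 0.5\big[(b-a) + (c-d)\big]\,Y(\phi_{AVG})$$

$$Y(\phi_{AVG}) = \begin{cases} 0.5\,\phi_{AVG}, & 0 \le \phi_{AVG} < 0.5\,\mathrm{UI} \\ 0.5\,(1-\phi_{AVG}), & 0.5 \le \phi_{AVG} \le 1\,\mathrm{UI} \end{cases}$$

Figure 3.11: Piecewise linear interpolation of desired sample from blind samples (four consecutive blind samples a, b, c, d spaced 1UI apart; the desired sample lies ΦAVG after b)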

Given the ADC's blind samples and the CDR's recovered phase, φAVG, the data interpolator estimates the value of the data at the center of the eye (i.e. the desired sample). Figure 3.11 shows 4 consecutive blind samples, a, b, c and d, that are separated by 1UI. The desired sample is φAVG away from sample b. For simplicity, the expression in Figure 3.11 assumes that φAVG is a floating-point value between 0 and 1UI. In the implementation, φAVG is represented by a 5-bit value.

The desired sample is first estimated by linearly interpolating between samples b and c. This estimate has a large error because samples b and c are separated by 1UI. To improve accuracy, extrapolation is performed using the slopes (b − a)/1UI and (c − d)/1UI: the piecewise-linear shape Y(φAVG) in Figure 3.11 is scaled by the average of the two slopes and superimposed on the linear interpolation. Hence, the accuracy of the estimate is improved by using four instead of two blind samples.
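The estimate of Figure 3.11 maps directly onto a small function. The sketch below takes a 5-bit phase code (φAVG = code/32 UI), as in the implementation, but computes in floating point; the fixed-point word lengths of the actual RTL are not described here. The cosine test signal is hypothetical.

```python
import math

def y_shape(phi):
    """Piecewise-linear tent Y(phi) of Figure 3.11 (phi in UI, 0..1)."""
    return 0.5 * phi if phi < 0.5 else 0.5 * (1.0 - phi)

def interpolate(a, b, c, d, phase_code):
    """Estimate the sample phi = phase_code/32 UI after b from four blind samples
    a, b, c, d spaced 1UI apart: linear interpolation between b and c plus the
    tent Y(phi) scaled by the average of the end slopes (b - a) and (c - d)."""
    phi = phase_code / 32.0
    linear = b * (1.0 - phi) + c * phi
    return linear + 0.5 * ((b - a) + (c - d)) * y_shape(phi)

def s(t):
    """Hypothetical band-limited test signal: 0.25 cycles/UI, peak at t = 0.5UI."""
    return math.cos(0.5 * math.pi * (t - 0.5))

a, b, c, d = s(-1), s(0), s(1), s(2)
est = interpolate(a, b, c, d, 16)        # phi = 0.5UI, the worst case for linear
lin = 0.5 * (b + c)
print(f"true {s(0.5):.3f}, linear {lin:.3f}, with correction {est:.3f}")
```

For this signal the correction reduces the error at mid-phase from about 0.29 to about 0.06 of the peak amplitude; for signals with sharper curvature the slope-based correction can overshoot, so the gain it provides depends on the channel.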

3.3.4. Mueller-Muller Phase Detector

Figure 3.12: (a) Pulse response of an ideal channel followed by 2UI I&D (b) Proposed MM function (panel (a): pulse response versus time, 0 to 3T, with samples h-1, h0, h1, h2; panel (b): MM function F = h0 − h1 versus sampling phase, −2 to 2UI, with levels ±B)

In the proposed design, the 2UI I&D provides a wider pulse response, so the conventional MM function in Figure 2.29 would not provide the optimal sampling phase. If the receiver includes a DFE to cancel post-cursor ISI, the maximum vertical eye opening occurs when the main cursor, h0, is at time T in Figure 3.12, because h0 is the maximum value of the pulse response and h−1 is zero. Sampling where the pre-cursor is zero allows us to fully benefit from the DFE and eliminates the need for an FFE. This sampling position occurs when the post-cursor ISI, h1, is equal to the main cursor, h0. To identify this desired phase location, we choose the MM function to be F = h0 − h1 [14] and force it to zero through the feedback loop. Since the actual sampling phase is blind, the desired phase is imposed on the interpolation phase, φAVG.
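For the ideal trapezoidal pulse of Figure 3.12a, the zero of F = h0 − h1 can be located numerically. The sketch below models the ideal channel plus 2UI I&D as the overlap of a 1UI rectangle with a 2UI window (time in UI) and bisects on F; it confirms that F crosses zero where the main cursor sits at T and the pre-cursor vanishes.

```python
def pulse_2ui(t):
    """Ideal 1UI pulse after a 2UI I&D (time in UI): a trapezoid on [0, 3]."""
    return max(0.0, min(t, 1.0) - max(t - 2.0, 0.0))

def mm_function(t):
    """F = h0 - h1 when the main cursor is sampled at time t."""
    return pulse_2ui(t) - pulse_2ui(t + 1.0)

lo, hi = 0.5, 1.5          # F(lo) < 0 < F(hi)
for _ in range(60):        # bisection on the zero crossing of F
    mid = 0.5 * (lo + hi)
    if mm_function(mid) < 0.0:
        lo = mid
    else:
        hi = mid
lock = 0.5 * (lo + hi)
print(f"F = 0 at t = {lock:.3f}UI; h-1 = {pulse_2ui(lock - 1.0):.3f}, "
      f"h0 = {pulse_2ui(lock):.3f}")
```

F is negative for earlier sampling and positive for later sampling, so its sign gives the direction in which the loop should move φAVG.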

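The pulse-response samples are related to the data samples and recovered bits by

$$h_{-1} = h(-T+t) = E[x_{k-1}A_k]$$

$$h_0 = h(t) = E[x_kA_k] = E[x_{k-1}A_{k-1}]$$

$$h_1 = h(T+t) = E[x_kA_{k-1}]$$

$$h_2 = h(2T+t) = E[x_kA_{k-2}]$$

so that the Mueller-Muller function and PD are

$$F = h_0 - h_1 = E[x_{k-1}A_{k-1} - x_kA_{k-1}] = E[(x_{k-1} - x_k)A_{k-1}]$$

$$\mathrm{MMPD}_{out} = (x_{k-1} - x_k)A_{k-1}$$

In the implementation, the addition and sign operation are done speculatively for A_{k-1} = +1 and −1 while the DFE resolves A_{k-1}, and a mux selects the result.

Figure 3.13: Design and implementation of the speculative Mueller-Muller phase detector (MMPD)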

Chapter 2 showed that the pulse response can be estimated using the samples, xk, and the recovered data, Ak [21]. From Equations 2.22 and 2.21, h0 and h1 can be estimated by the expected values E[xkAk] and E[xkAk−1], respectively. Substituting these expected values into the MM function transforms it into the MMPD. The loop filter, which is the next block, performs the expected-value operation by averaging the MMPD output.

Note that the expressions for the pulse response are not unique. For example, according to Equation 2.22, h0 is also equal to E[xk−1Ak−1]. In the implementation illustrated in Figure 3.13, we therefore choose h0 = E[xk−1Ak−1] so that Ak−1 can be factored out of the expressions for h0 and h1. The DFE has some latency before it recovers Ak−1; factoring out Ak−1 allows the subtraction to be performed before Ak−1 becomes available. Since Ak−1 takes on only two values, +1 and −1, it only affects the sign of the MMPD output. In the PD implementation, subtraction is performed first and speculation is used for the sign of Ak−1. The DFE's recovered data and the PD output are ready at the same time, thereby reducing latency in the CDR feedback loop and improving loop stability.
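A Monte Carlo sketch shows that averaging the speculative PD output recovers h0 − h1. It assumes random ±1 data, a hypothetical pulse response, and correct decisions (Ak equals the transmitted bit); the difference x(k−1) − x(k) is formed first, and the later-arriving decision only selects its sign.

```python
import random

def mmpd_speculative(x_prev, x_curr):
    """Form the difference before A(k-1) is known and precompute both signed
    outputs; the DFE decision later selects one of them."""
    diff = x_prev - x_curr
    return {+1: diff, -1: -diff}

def estimate_mm_function(h, n_bits=20000, seed=1):
    """Average MMPD output over random data. `h` maps cursor index j to the
    pulse sample h_j (e.g. {-1: h_-1, 0: h0, 1: h1}); decisions are assumed correct."""
    rng = random.Random(seed)
    bits = [rng.choice((-1, 1)) for _ in range(n_bits)]
    # x_k = sum_j h_j * A_(k-j)
    xs = [sum(hj * bits[k - j] for j, hj in h.items() if 0 <= k - j < n_bits)
          for k in range(n_bits)]
    total = 0.0
    for k in range(1, n_bits):
        candidates = mmpd_speculative(xs[k - 1], xs[k])
        total += candidates[bits[k - 1]]      # mux selected by A(k-1)
    return total / (n_bits - 1)

h = {-1: 0.05, 0: 1.0, 1: 0.6, 2: 0.1}       # hypothetical pulse-response samples
print(f"average MMPD output: {estimate_mm_function(h):.3f} (h0 - h1 = 0.4)")
```

The pre-cursor and the second post-cursor average out because they multiply bits that are uncorrelated with A(k−1).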

3.3.5. Decision-Feedback Equalizer

Figure 3.14: (a) A speculative 2-tap DFE and (b) the first stage of the parallel speculative DFE that recovers 8 bits per cycle (in (a), DFE levels set by C1 and C2 are subtracted from xK for all four values 00, 01, 10, 11 of AK-2AK-1, and a mux selected by AK-2AK-1 outputs AK; in (b), eight DFE Sum blocks process xK to xK+7 and resolve AK to AK+7)

The DFE compensates for post-cursor ISI from the channel and the I&D filter. As can be seen from the pulse response in Figure 3.12, recovering data from an ideal channel and 2UI I&D filter would require one DFE tap to equalize the post-cursor h1, while a more attenuative channel may require more taps. Three pipeline stages, operating at 625MHz, resolve 16 bits in parallel (actually 15 to 17 bits, to handle frequency offset as discussed in Section 3.2). DFE adaptation was not included in this design.

To recover 16 bits per clock cycle, 16 parallel DFE sum blocks are required. Speculation is used extensively to reduce latency in the CDR feedback loop. In each DFE summation block shown in Figure 3.14a, the 2 DFE taps, C1 and C2, are manually set, and speculation is performed by subtracting the 4 possible levels from the interpolated sample, xk. When the previous two bits, Ak−1 and Ak−2, have been recovered, the mux selects the correct Ak.
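The sketch below contrasts a direct 2-tap DFE with the speculative form: every sample is sliced against all four possible DFE levels up front, with no dependence on earlier decisions, so only the mux selection remains serial. The tap values, start-up bits, and random samples are hypothetical; the assertion checks that both forms make identical decisions.

```python
import random

def sign(v):
    """Slicer: +1 for v >= 0, otherwise -1."""
    return 1 if v >= 0 else -1

def dfe_direct(xs, c1, c2, a1=1, a2=1):
    """Reference 2-tap DFE; a1, a2 are the assumed bits A(k-1), A(k-2) before the block."""
    out = []
    for x in xs:
        a = sign(x - (c1 * a1 + c2 * a2))   # subtract post-cursor ISI, then slice
        out.append(a)
        a1, a2 = a, a1
    return out

def dfe_speculative(xs, c1, c2, a1=1, a2=1):
    """Speculative 2-tap DFE: slice each sample against all four DFE levels
    in parallel, then select one candidate per sample with (A(k-1), A(k-2))."""
    levels = {(p1, p2): c1 * p1 + c2 * p2 for p1 in (-1, 1) for p2 in (-1, 1)}
    candidates = [{key: sign(x - lvl) for key, lvl in levels.items()} for x in xs]
    out = []
    for cand in candidates:                 # serial mux chain (critical path)
        a = cand[(a1, a2)]
        out.append(a)
        a1, a2 = a, a1
    return out

rng = random.Random(0)
xs = [rng.uniform(-2.0, 2.0) for _ in range(1000)]   # hypothetical samples x_k
assert dfe_speculative(xs, 0.5, 0.2) == dfe_direct(xs, 0.5, 0.2)
```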

This speculation removes the adder from the critical path. However, the muxes remain on the critical path: to resolve all 16 bits, data must propagate through 16 muxes in series, but at 625MHz the data can propagate through only 8 muxes per cycle. Figure 3.14b shows 8 DFE summation blocks that resolve 8 bits in one clock cycle. For this reason, another stage of speculation was added.

The next stage speculates on the Ak−1 and Ak−2 inputs to the DFE Sum x8 blocks. As shown in Figure 3.15, Ak−1 and Ak−2 drive the first 4 parallel DFE Sum x8 blocks in a speculative structure, which resolves bits Ak to Ak+7. The last two bits of this first stage, Ak+6 and Ak+7, then drive a second set of 4 DFE Sum x8 blocks, which resolve bits Ak+8 to Ak+15. In the end, the complete DFE has a latency of 3 cycles.

Figure 3.15: The second stage of parallel speculative DFE that recovers 16 bits per cycle (four DFE Sum x8 blocks, one for each value 00, 01, 10, 11 of AK-2AK-1, process xK to xK+7; AK+7AK+6 selects among a second set of four blocks that process xK+8 to xK+15 and output AK+8 to AK+15)
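The two-level speculation can be sketched behaviorally: each 8-sample half is resolved for all four possible start states (A(k−1), A(k−2)) in parallel, the actual bits select the first half, and the last two bits of that half select the second. Pipeline timing is not modeled, and the taps and samples are hypothetical; the check compares against a plain serial DFE over ten 16-sample blocks.

```python
import random

def sign(v):
    return 1 if v >= 0 else -1

def resolve_serial(xs, c1, c2, a1, a2):
    """Plain 2-tap DFE over a run of samples, starting from bits (A(k-1), A(k-2))."""
    out = []
    for x in xs:
        a = sign(x - (c1 * a1 + c2 * a2))
        out.append(a)
        a1, a2 = a, a1
    return out

STATES = [(p1, p2) for p1 in (-1, 1) for p2 in (-1, 1)]

def resolve_16(xs, c1, c2, a1, a2):
    """Resolve 16 samples with two levels of speculation on the start state."""
    first, second = xs[:8], xs[8:16]
    spec_first = {s: resolve_serial(first, c1, c2, *s) for s in STATES}
    spec_second = {s: resolve_serial(second, c1, c2, *s) for s in STATES}
    bits_first = spec_first[(a1, a2)]                           # select with A(k-1), A(k-2)
    bits_second = spec_second[(bits_first[7], bits_first[6])]   # select with A(k+7), A(k+6)
    return bits_first + bits_second

rng = random.Random(0)
xs = [rng.uniform(-2.0, 2.0) for _ in range(160)]   # hypothetical samples
a1 = a2 = 1
bits = []
for blk in range(0, len(xs), 16):
    out = resolve_16(xs[blk:blk + 16], 0.5, 0.2, a1, a2)
    bits += out
    a1, a2 = out[-1], out[-2]       # carry the last two bits into the next cycle
assert bits == resolve_serial(xs, 0.5, 0.2, 1, 1)
```

Speculating on the start state quadruples the DFE Sum x8 hardware but keeps each serial mux chain to 8 stages, matching the per-cycle limit described above.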


Figure 3.16: Loop filter with configurable proportional and integral gains (labels: PD input 16x11b; summation, KSUM=16; proportional gain KP={0.25, 0.5, 0.75, 1}; integral gain KI={0, 0.25, 0.5, 0.75, 1}; saturating counter; cyclic counter, KCYC=1/2048; ÷256; up/down signal; phase counter, KPC=1/32; 5-bit output ΦAVG)

3.3.6. Loop Filter

The loop filter is a conventional proportional-integral controller, as shown in Figure 3.16. The parallel PD outputs are summed together and the result is scaled by configurable proportional and integral gains. The saturating counter is sized to handle up to ±1900ppm of frequency offset. At the output, the 5-bit phase counter produces the recovered CDR phase as discrete φAVG values ranging from 0 to 31, which are fed back to the data interpolator block, closing the CDR feedback loop.
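The behavior of a proportional-integral loop with a saturating integrator can be illustrated with a linearized model running at the 625MHz digital clock. This is a generic sketch, not the chip's loop filter: the PD is assumed linear in phase error, and the gains kp and ki are illustrative values rather than the KP/KI settings in Figure 3.16. The integrator limit is set from the ±1900ppm sizing: 1900ppm × 16UI per cycle ≈ 0.03UI of phase drift per cycle.

```python
import math

def simulate_loop(ppm, cycles=20000, kp=0.1, ki=0.005, paths=16, sat_ppm=1900):
    """Linearized PI CDR loop (illustrative gains). Each digital cycle the data
    phase advances by ppm*1e-6*paths UI; the PD output equals the phase error."""
    drift = ppm * 1e-6 * paths          # UI of drift per 625MHz cycle
    i_max = sat_ppm * 1e-6 * paths      # saturating-counter limit (UI per cycle)
    data_phase = phase = integ = err = 0.0
    for _ in range(cycles):
        data_phase += drift
        err = data_phase - phase
        integ = max(-i_max, min(i_max, integ + ki * err))   # integral path
        phase += kp * err + integ                           # proportional + integral
    code = math.floor(phase * 32) % 32                      # 5-bit phi_AVG, wraps 31 -> 0
    return err, integ, drift, code

err, integ, drift, code = simulate_loop(1000)
print(f"1000ppm: residual error {err:.2e}UI, integrator {integ:.4f} vs drift {drift:.4f}")
```

Within the supported range, the integral path absorbs the frequency offset and the residual phase error goes to zero while φAVG keeps wrapping; beyond it, the integrator clamps and the proportional path alone must carry the offset, leaving a standing phase error.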

3.4. Simulation and Measurement Results

This section first uses simulation to show that the feedback loop converges correctly, how the system can be modified for a more attenuative channel, and the resulting jitter tolerance. Next, the measured eye diagrams and measured jitter tolerance of the proposed CDR are presented.

Figure 3.17 illustrates the loop dynamics by showing the transient signals in the loop filter. When the system in Figure 3.3 starts up, the MMPD relies on correctly recovered data to estimate phase while, at the same time, the DFE requires a correct phase to recover the data. To verify that the feedback loop does not enter a deadlock, we applied an input with 1000ppm of frequency offset so as to start the loop with both phase and data errors. The proportional gain and saturating counter outputs are, respectively, the outputs of the proportional and integral paths in the loop filter. Cycle slipping causes the saturating counter to decrease temporarily, but it settles to a value corresponding to 1000ppm within 4µs. The up/down signal increments or decrements φAVG. In steady state, φAVG increases from 0 to 31 and wraps around in order to track the frequency offset. After 3µs, φAVG is close enough to the center of the eye to recover the data correctly (i.e. no more bit errors).

Figure 3.17: Simulated loop filter convergence with 1000ppm of frequency offset for PRBS-7. Signals correspond to nodes on the block diagram of Figure 3.16 (traces versus time, 0 to 4µs: proportional gain output, saturating counter output, up/down signal, phase output ΦAVG, and error count)

The simulation in Figure 3.17 demonstrates the digital CDR locking to the received signal from Channel A + 2UI I&D with 1000ppm of frequency offset, despite cycle slipping.

Figure 3.18: Frequency response of channel models in simulation (curves: Channel A, Channel A + 2UI I&D, and Channel B)

Figure 3.19: Simulated eye diagrams using Channel A + 2UI I&D (simulated chain: 10GHz PRBS-7 generator with TX RJ = 0.17UIpp, Channel A, 1UI I&D and ADC clocked by CKRX with RX RJ = 0.23UIpp, 1 + z^-1, data interpolator producing xK, and a 2-tap DFE producing AK, with the MM PD and loop filter closing the loop through ΦAVG; eye diagrams span 0 to 1UI)

As discussed in Section 3.1, the receiver relies on ISI to spread the pulse response beyond 1UI. We demonstrate through simulation that the 1x blind CDR works in 2 cases. In the first case, the channel attenuation is low (i.e. the channel alone does not produce enough ISI), so the system relies on the 2UI I&D to produce the ISI. This situation is demonstrated in Figure 3.18, which shows the combined frequency response of a low-attenuation Channel A followed by its associated 2UI I&D filter. In the second case, the channel is attenuative by itself (i.e. it produces enough ISI), so the 2UI I&D is no longer needed to produce extra ISI. This situation is demonstrated by Channel B in Figure 3.18. If the CDR will be used in applications with a wide variety of channels, then, ideally, the front-end filter should be adaptive such that it increases the amount of post-cursor ISI when the channel has less high-frequency loss. However, an adaptive filter is beyond the scope of this work. The test chip, which is described later, demonstrates only the first case (i.e. a low-attenuation channel with 2UI I&D).

Figure 3.20: Simulated eye diagrams using Channel B (simulated chain: 10GHz PRBS-7 generator with TX RJ = 0.17UIpp, Channel B, 5-bit ADC clocked by CKRX with RX RJ = 0.23UIpp, data interpolator producing xK, and a 20-tap DFE producing AK, with the MM PD and loop filter closing the loop through ΦAVG; eye diagrams span 0 to 1UI)

Figures 3.19 and 3.20 show the eye diagrams from simulations done in Simulink using event-driven models [34]. The data source is 10Gb/s and has 0.17UIPP of random jitter. Similarly, the blind receiver clock is simulated with 0.23UIPP of random jitter. The two leftmost eye diagrams in Figure 3.19 show the data eye after Channel A and the I&D. The 5-bit ADC quantizes the samples into discrete values from 0 to 31. The eyes are still open because the analog 1UI I&D does not add much attenuation. The 1 + z−1 filter adds further ISI and closes the eye. In order to obtain the eye diagrams in the digital CDR, we break the feedback loop and set φAVG to 0.5UI. This forces the desired sample halfway between the blind samples, where the data interpolator produces its worst-case interpolation error. The open eye after the DFE adder shows that the data can be successfully recovered.

Figure 3.20 demonstrates that the system can recover the data with Channel B without the I&D filter, but it requires a 20-tap DFE. This large number of taps is necessary because Channel B introduces a long tail of ISI, whereas Channel A with the 2UI I&D produces far less ISI.

Figure 3.21: Simulated jitter tolerance results at 10Gb/s with a BER of 10⁻⁶ (jitter tolerance, 0.1 to 10UIpp, versus jitter frequency, 100kHz to 1GHz; curves: Channel A + 2UI I&D, labeled 1.5" FR4 + 2UI I&D, and Channel B, labeled 16" FR4 with no I&D)

Figure 3.21 compares the simulated jitter tolerance for the two channels. The simulation assumes a bit error rate (BER) of 10⁻⁶. The high-frequency jitter tolerance of the system in Figure 3.20 (Channel B) is slightly below that of the system in Figure 3.19 (Channel A + 2UI I&D). The former also has a lower CDR bandwidth than the latter, which is caused by a lower PD gain: compared to Channel A, Channel B further spreads out the pulse response, which reduces the PD gain (i.e. the slope of the MM function).

| Parameter | Value |
|---|---|
| Process | 65nm CMOS |
| Data rate | 10Gb/s |
| Supply | 1.2V |
| ADC + demux power | 109mW |
| CDR power | 112mW |
| Clock generator power | 83mW |
| I&D power | 1.7mW |
| Total power | 306mW |

Figure 3.22: Chip photo (labeled blocks: digital CDR, 420x645μm²; 5-bit ADC, 60x490μm²; 4:16 demux, 400x490μm²; I&D, 85x145μm²; clock generator, 150x260μm²)

The proposed receiver was implemented in Fujitsu's 65nm CMOS process. Figure 3.22 is a photo of the test chip. The I&D, clock generator, and ADC are custom-designed analog blocks. The digital CDR was designed in Verilog RTL and implemented with standard cells.

Figure 3.23 shows a simplified diagram of the measurement setup. The data source is a PRBS-7 generator. A logic analyzer captures and stores digital waveforms from the test chip (i.e. design-under-test or DUT). For jitter tolerance measurements, sinusoidal jitter was applied to the transmitter clock.

Figure 3.24 shows the average ADC output when the I&D is given a DC input. On one test chip, we observed that one of the interleaved front-end blocks had a lower gain than the other blocks as we varied the DC input. As discussed in Section 3.3.2, this gain error is mostly caused by systematic clock skew. If left uncompensated, the skew will reduce the CDR's jitter tolerance. Hence, the delays in the clock generator were manually adjusted. Figure 3.24b shows that, after skew correction, the gain at the output of ADC 3 matches the gain of the other interleaved blocks more closely.

Figure 3.23: Measurement setup (PRBS-7 generator driven by a 10GHz clock with sinusoidal jitter, test channel, DUT containing the I&D, ADC, and CDR clocked by a 5GHz CKRX, and a logic analyzer performing the PRBS-7 comparison)

Figure 3.24: Average ADC output given DC input (a) before and (b) after skew correction (average ADC output code, 0 to 40, for ADC 0 to ADC 3 versus DC input voltage, −400 to 400mVpp differential)

The measurements were performed with a 48" SMA cable as the channel; its frequency response is plotted in Figure 3.25. Figure 3.26a shows the data eye at the output of the channel. Figure 3.26b shows the eye diagrams taken from the outputs of the interleaved ADCs; the signal has been partially attenuated by the analog 1UI I&D. There is some mismatch between the 4 interleaved analog front ends, but the digital CDR is able to tolerate it, as demonstrated by the jitter tolerance measurement.

Figure 3.25: Measured channel frequency response

Figure 3.26: Measured eye diagrams (a) after the channel and (b) after the ADC (panel (a): channel eye diagram, annotated 916mVPP and 93.1ps; panel (b): channel + 1UI I&D + ADC eye diagram, 1UI I&D digital output code, 0 to 30, versus sampling phase, 0 to 1UI)

Figure 3.27: Simulated and measured jitter tolerance results with 10Gb/s PRBS-7 input data and BER of 10⁻⁶ and 10⁻¹², respectively (jitter tolerance, 0.01 to 10UIpp, versus jitter frequency, 100kHz to 100MHz; curves: simulation (BER = 10⁻⁶), measurements at −300ppm (TX slower than RX), 0ppm, 300ppm, and 1000ppm (TX faster than RX), and the XLAUI mask)

The jitter tolerance was measured after skew correction with a maximum BER of 10⁻¹² at 10Gb/s. Figure 3.27 shows the results for −300, 0, 300, and 1000ppm of frequency offset. A negative frequency offset means that the transmitter is slower than the blind receiver clock (i.e. sampling slightly above the baud rate). A positive frequency offset means that the transmitter is faster than the blind receiver clock; this case is worse for jitter tolerance since we are actually sampling slightly below the baud rate. During measurement, we were able to push the frequency offset to 1000ppm with a slight degradation in jitter tolerance.

In addition, the CDR model was simulated with the measured channel frequency response (Figure 3.25) and 300ppm of frequency offset. Due to simulation time constraints, the simulation assumes a maximum BER of 10⁻⁶; for this reason, the simulated jitter tolerance is higher than the measured results. The jitter tolerance mask for the XL Attachment Unit Interface (XLAUI), the 40Gb/s Ethernet attachment unit interface, is also shown in Figure 3.27. Although the proposed design did not specifically target Ethernet applications, the mask is provided as a reference.
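Part of the gap between the simulated (BER 10⁻⁶) and measured (BER 10⁻¹²) curves can be understood with the standard Gaussian Q-scale approximation: random jitter with rms σ closes the eye by roughly 2Q·σ at a target BER, where Q solves 0.5·erfc(Q/√2) = BER. The sketch below computes Q by bisection. It is a textbook approximation (it ignores transition density and deterministic jitter), not the method used in the thesis simulations.

```python
import math

def q_for_ber(ber):
    """Gaussian Q such that 0.5*erfc(Q/sqrt(2)) = ber, found by bisection."""
    lo, hi = 0.0, 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(mid / math.sqrt(2.0)) > ber:
            lo = mid            # tail probability still too large: increase Q
        else:
            hi = mid
    return 0.5 * (lo + hi)

for ber in (1e-6, 1e-12):
    q = q_for_ber(ber)
    print(f"BER {ber:.0e}: Q = {q:.2f}, RJ eye closure ~ {2 * q:.1f} x sigma")
```

Moving from 10⁻⁶ to 10⁻¹² raises the random-jitter closure from about 9.5σ to about 14.1σ, leaving less room for the applied sinusoidal jitter.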

3.5. Summary

This chapter presents a 1x blind ADC-based CDR. The proposed architecture recovers data by extending the channel pulse response so that the pulse amplitude is greater than zero no matter where the blind samples occur within a 1UI window. The receiver adds controlled ISI to the pulse response through the use of an I&D block in the receiver front end. The baud-rate design allows the CDR to operate at 10Gb/s given a 10GS/s sampling rate.

The proposed design was fabricated in a 65nm CMOS process. The test chip successfully recovers 10Gb/s data with a BER below 10⁻¹². Jitter tolerance measurements show that the CDR implementation can recover data with below-baud-rate sampling; the CDR operates with ±300ppm of frequency offset and a high-frequency jitter tolerance of 0.19UIPP.

4 A Zero-Forcing Adaptive DFE for an ADC-Based CDR

This chapter proposes a novel zero-forcing adaptive controller for a DFE in a digital ADC-based CDR. Section 4.1 introduces the concepts behind the proposed adaptive controller. Sections 4.2 and 4.3 describe the architecture and implementation details of the receiver, respectively. Section 4.4 presents simulation results from Simulink models. At the time of writing, the Simulink models and Verilog implementation have been completed; measurement results are left as future work.

4.1. Proposed DFE Adaptation

Sections 2.5.1 and 3.3.4 showed how samples of a pulse response can be calculated by correlating samples of random data with recovered bits. The example pulse response from Figure 2.29 is reproduced in Figure 4.1 for convenience. The MMPD described in Section 3.3.4 uses this information to estimate the phase error by subtracting two pulse response samples (h−1 − h1). The MMPD output is processed by a loop filter and fed back to the data interpolator to form the phase-tracking loop. This chapter shows that a similar feedback loop can be used to adapt the DFE coefficients.
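The correlation idea behind Figure 4.1 can be checked with a short simulation. The sketch below assumes a hypothetical four-tap pulse response, random ±1 data, additive Gaussian noise, and error-free recovered bits; it estimates each hn as the average of xk·Ak−n and also forms the MMPD quantity h−1 − h1. The function name and parameters are illustrative, not taken from the thesis implementation.

```python
import random

def estimate_pulse_taps(pulse, n_bits=100_000, noise_rms=0.05, seed=1):
    """Estimate pulse-response samples h_n = E[x_k * A_(k-n)] by correlation.

    pulse: dict {tap index n: h_n} describing a hypothetical baud-rate pulse response.
    Assumes every recovered bit A_k equals the transmitted bit (no decision errors).
    """
    rng = random.Random(seed)
    bits = [rng.choice((-1, 1)) for _ in range(n_bits)]
    # Baud-rate samples: x_k = sum_n h_n * A_(k-n) + Gaussian noise
    x = []
    for k in range(n_bits):
        isi = sum(h * bits[k - n] for n, h in pulse.items() if 0 <= k - n < n_bits)
        x.append(isi + rng.gauss(0.0, noise_rms))
    estimates = {}
    for n in pulse:
        ks = range(max(0, n), min(n_bits, n_bits + n))  # keep A_(k-n) in range
        estimates[n] = sum(x[k] * bits[k - n] for k in ks) / len(ks)
    return estimates

pulse = {-1: 0.10, 0: 1.00, 1: 0.40, 2: 0.15}   # hypothetical pulse samples
taps = estimate_pulse_taps(pulse)
mm = taps[-1] - taps[1]                          # MMPD-style quantity h_-1 - h_1
print({n: round(v, 3) for n, v in taps.items()}, round(mm, 3))
```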

[Figure 4.1: pulse response with samples h−1 = E[xk−1Ak], h0 = E[xkAk], h1 = E[xkAk−1], h2 = E[xkAk−2]]

Figure 4.1: ISI can be calculated by correlating sampled data (xk, xk−1, etc.) with recovered bits (Ak, Ak−1, etc.)

Figure 4.2 illustrates a controller that adapts n DFE coefficients. The data sample, xk, is correlated with the recovered bits, Ak−1 to Ak−n, to estimate the pulse response samples. The low-pass filters provide average values of the pulse samples, which are used as the DFE coefficients, c1 to cn. The n-tap DFE subtracts post-cursor ISI from the current sample, xk, and the decision block slices the DFE output to recover the binary data, Ak. The bandwidth of the LPF is the main design parameter. It should be low enough to filter out transient noise from the correlation terms and, at the same time, high enough to allow the LPF to settle to its steady-state values in a reasonable time during receiver start-up.

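[Figure 4.2 diagram: xk enters an n-tap DFE whose coefficients are c1 to cn; a shift register holds Ak−1...Ak−n; the products xkAk−1, xkAk−2, ..., xkAk−n each pass through an LPF to produce c1, c2, ..., cn]

Figure 4.2: Zero-forcing controller for n-tap DFE adaptation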
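A behavioural sketch of the controller in Figure 4.2 follows. It assumes a hypothetical pulse response as the channel model and replaces the LPF with a first-order exponential average (step `mu`); the thesis's LPF is the counter-based design of Section 4.3.2. Starting from two different initial values, both coefficients should settle near h1 and h2.

```python
import random

def adapt_zf_dfe(pulse, n_taps=2, mu=2e-4, c_init=0.0, n_bits=50_000,
                 noise_rms=0.05, seed=2):
    """Zero-forcing DFE adaptation in the style of Figure 4.2.

    Each coefficient c_i is a low-pass-filtered version of x_k * A_(k-i), where x_k
    is the unequalized sample and A_(k-i) a past decision. The LPF is modelled as a
    first-order exponential average (an assumption, not the thesis's LPF).
    """
    rng = random.Random(seed)
    tx = [rng.choice((-1, 1)) for _ in range(n_bits)]
    c = [c_init] * n_taps
    past = [1] * n_taps                  # past[i] holds A_(k-1-i); arbitrary reset state
    for k in range(n_bits):
        isi = sum(h * tx[k - n] for n, h in pulse.items() if 0 <= k - n < n_bits)
        x = isi + rng.gauss(0.0, noise_rms)
        # Correlate the unequalized sample with past decisions, then low-pass filter.
        for i in range(n_taps):
            c[i] += mu * (x * past[i] - c[i])
        # n-tap DFE: subtract the estimated post-cursor ISI, then slice.
        y = x - sum(c[i] * past[i] for i in range(n_taps))
        a_k = 1 if y >= 0 else -1
        past = [a_k] + past[:-1]
    return c

pulse = {-1: 0.10, 0: 1.00, 1: 0.40, 2: 0.15}   # hypothetical pulse samples
results = {start: adapt_zf_dfe(pulse, c_init=start) for start in (0.0, 0.3)}
for start, coeffs in results.items():
    print(start, [round(v, 3) for v in coeffs])
```

Because the coefficients are taken directly from the correlations rather than from an error signal, no reference signal is needed, which mirrors the argument in the next paragraph.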

This zero-forcing adaptive DFE architecture has two main advantages: scalability and ease of design. The blocks in Figure 4.2 scale easily as n increases. The controller is also simpler than the ZF implementations in [30] and [13] because it does not generate an error signal by subtracting the signals before and after the decision block. Unlike the LMS adaptation in [1], the proposed architecture does not require a reference (i.e. desired) signal, and the feedback loop does not require a configurable gain parameter.


4.2. Proposed Blind ADC-Based Receiver Architecture

Figure 4.3 shows the system diagram of the proposed blind receiver. The main components are a 20GS/s, 3-bit ADC and a digital CDR with adaptive DFE. The ADC oversamples the 10Gbps data signal by 2x. Compared to the 1x receiver from Chapter 3, the oversampling relaxes the anti-aliasing requirement on the analog front end, increases the accuracy of the data interpolator in the digital CDR, and removes the need to extend the pulse response through additional ISI. Hence, the oversampling allows us to remove the 2UI I&D block from the receiver. Removing the I&D block simplifies the clock distribution, reduces the power consumed by the clock divider and pulse generator, and eliminates the gain errors caused by skew between interleaved clocks.

[Figure 4.3 diagram: 10Gbps data → channel → 20GS/s 3-bit ADC clocked by the blind clock CKRX → digital blocks (baud-rate CDR with adaptive DFE) → recovered data]

Figure 4.3: System diagram of proposed receiver with 3-bit ADC-based CDR and adaptive DFE

Although the front-end sampling rate is doubled, the overall ADC area and power consumption are reduced by decreasing the number of bits from 5 to 3. If we assume a simple flash ADC architecture, a 5-bit ADC sampling at baud rate would require 31 comparisons per UI. In contrast, a 3-bit ADC sampling at 2x would need only 14 comparisons per UI.
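The comparator count follows from the flash-ADC assumption stated above: an N-bit flash ADC resolves each sample with 2^N − 1 comparators, and 2x sampling doubles the number of samples per UI.

```python
def flash_comparisons_per_ui(bits: int, oversampling: int) -> int:
    """Comparisons per UI for a flash ADC: (2**bits - 1) comparators x samples per UI."""
    return (2 ** bits - 1) * oversampling

baud_rate_5b = flash_comparisons_per_ui(bits=5, oversampling=1)
twice_3b = flash_comparisons_per_ui(bits=3, oversampling=2)
print(baud_rate_5b, twice_3b)  # 31 14
```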

The baud-rate digital CDR itself is mostly the same as the one proposed in Chapter 3 (Figure 3.3). Hence, this chapter focuses only on DFE adaptation and the few CDR blocks that were modified. The following paragraph explains how the 2x ADC is interfaced with the 1x CDR.

The data interpolator at the input of the CDR creates baud-rate samples from the 2x samples. As shown in Figure 4.4, each pair of blind samples (0.5UI apart) is used to calculate a desired sample between them. The ΦAVG quantity tracks the center of the data eye relative to the blind samples; edge samples are not computed. Compared to a 2x digital CDR (e.g. [24]), the baud-rate architecture reduces CDR power consumption because no multipliers and adders are needed to interpolate and equalize edge samples.

[Figure 4.4 diagram: 2x blind samples and desired sampling locations (1UI apart, offset by ΦAVG) for the ranges 0UI ≤ ΦAVG ≤ 0.5UI and 0.5UI ≤ ΦAVG ≤ 1.0UI; panel (a) marks a skipped interpolation, panel (b) an extra interpolation]

Figure 4.4: Data interpolator calculates sample at desired location from closest blind samples. (a) Negative or (b) positive frequency offsets result in occasional skipped or extra interpolated samples

A negative or positive frequency offset causes the data interpolator to skip an interpolation or insert an extra one, respectively, in a similar way to the one described in Section 3.2.
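The skip/extra behaviour of Figure 4.4 can be illustrated by tracking where each ideal center-of-eye instant falls relative to the receiver's UI windows. This sketch ignores jitter and the phase-tracking loop and uses a hypothetical sign convention (positive ppm means the data is faster than the blind clock). With that convention, negative offsets leave some windows without a desired sample (skipped), positive offsets put two desired samples in some windows (extra), and either event occurs about once every 1/|offset| UIs.

```python
import math

def count_skips_and_extras(offset_ppm: float, n_windows: int, phase0_ui: float = 0.5):
    """Count receiver UI windows holding 0 (skipped) or 2 (extra) desired samples.

    Time is in receiver UIs; each window corresponds to one pair of 2x blind samples.
    Sign convention (an assumption): offset_ppm > 0 means the data rate is higher
    than the blind receiver clock. Jitter and the phase-tracking loop are ignored.
    """
    data_ui = 1.0 / (1.0 + offset_ppm * 1e-6)   # one data UI in receiver-UI units
    per_window = [0] * n_windows
    k = 0
    while True:
        t = phase0_ui + k * data_ui              # center-of-eye time of bit k
        window = math.floor(t)                   # Phi_AVG corresponds to t - window
        if window >= n_windows:
            break
        per_window[window] += 1
        k += 1
    skips = per_window.count(0)
    extras = sum(1 for count in per_window if count >= 2)
    return skips, extras

for ppm in (-1000, 1000):
    print(ppm, count_skips_and_extras(ppm, n_windows=5000))
```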

4.3. Proposed Digital CDR with Adaptive 2-tap DFE

Figure 4.5 shows the digital CDR and adaptive DFE. The 3-bit ADC data is demultiplexed to 32 parallel samples at 625MHz (20GS/s ÷ 32). The CDR converts the samples to signed integers before they enter the data interpolator. The phase-tracking loop is the same as the one described in Chapter 3, with two main differences: a different MMPD and a configurable phase offset coefficient, P.

[Figure 4.5 block diagram: 3-bit ADC (KADC=8) → convert to signed integer → data interpolator (32×3b in, KINT=16) → xk (5b) → 2-tap DFE → decision block → Ak. The MMPD forms (xk−2 − xk)Ak−1, which passes through a sum block (KSUM=16), a ÷4 divider (KDIV=0.25), and the phase offset adjustment P into the digital loop filter that produces ΦAVG. The MMPD also forms xkAk−1, xkAk−2, and xk−2Ak−m−2, each low-pass filtered (4×8b in, 8b out) to give c1, c2, and cm.]

Figure 4.5: Proposed digital CDR with adaptive DFE

Figure 4.5 also identifies the gains of the ADC, data interpolator, divider, and sum blocks as KADC, KINT, KDIV, and KSUM. The ADC has a gain of 8 because it has a resolution of 3 bits. The sum block adds together 16 parallel MMPD outputs and, therefore, has a gain of 16. The interpolator gain is discussed in Section 4.3.1. Accordingly, the MM function is:

$$F = h_{-1} - h_{1} + \frac{P}{K_{DIV}\,K_{SUM}\,K_{ADC}\,K_{INT}} \qquad (4.1)$$

When the CDR has locked to its steady state (F = 0), we have the relation:

$$h_{1} = h_{-1} + \frac{P}{K_{DIV}\,K_{SUM}\,K_{ADC}\,K_{INT}} \qquad (4.2)$$

The phase offset coefficient effectively shifts the CDR's locking phase slightly to the left, i.e. earlier (assuming a positive P), which in turn reduces the pre-cursor ISI and increases the post-cursor ISI. This takes advantage of the DFE's ability to cancel the latter but not the former. In this work, P is set manually through test registers; in future work, it may be possible to optimize P automatically for maximum eye opening.

The output of the phase coefficient adder is processed by the loop filter (see Section 3.3.6) and fed back to the data interpolator. The interpolator implementation is discussed in more detail in Section 4.3.1.

From Figure 4.5, the MMPD block also provides three correlation terms. The first two are used to estimate the first and second DFE taps (c1 and c2); they are low-pass filtered, and the resulting 8-bit coefficients are fed back to the DFE. The third correlation term provides cm as an ISI monitor for off-chip measurement and optimization. The integer m can be configured from −2 to 13, so that 16 ISI taps can be observed.

The MMPD-based architecture in Figure 4.5 has the advantage of decoupling the phase-tracking and DFE adaptation loops. In a phase-tracking CDR based on an Alexander or Hogge PD, the PD detects the data edges after decision feedback equalization [23]. Hence, the DFE affects the CDR's output phase, while the output phase in turn affects the DFE coefficients. To prevent this interaction from causing instability, the DFE adaptation loop is usually implemented with much lower bandwidth than the phase-tracking loop. However, a low DFE loop bandwidth increases the CDR's start-up time. The MMPD-based architecture removes the interaction because the MMPD locks to the unequalized eye: the DFE does not affect the phase-tracking loop. Hence, the bandwidth of the DFE loop in an MMPD-based architecture can be higher than in an Alexander-based or Hogge-based architecture.

4.3.1. Data Interpolator

The data interpolator architecture in Figure 4.6 has been modified for 2x blind samples; otherwise, it is similar to the one presented in Chapter 3 (Figure 3.11). Note that the worst case for interpolating between 2x blind samples occurs when ΦAVG is 0.25UI (i.e. the desired sample is halfway between two 2x blind samples). In contrast, the worst case for interpolating between 1x blind samples occurs when ΦAVG is 0.5UI (i.e. the desired sample is halfway between two 1x blind samples).

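Interpolation expression of Figure 4.6, where a, b, c, and d are consecutive 2x blind samples (0.5UI apart) and the desired sample lies between b and c:

$$\text{Desired sample} \approx b\,(1 - 2\Phi'_{AVG}) + c\,(2\Phi'_{AVG}) + 0.5\big((b - a) + (c - d)\big)\,Y(\Phi'_{AVG})$$

$$Y(\Phi'_{AVG}) = \begin{cases} \Phi'_{AVG}, & 0 \le \Phi'_{AVG} < 0.25\,\text{UI} \\ 0.5\,(1 - 2\Phi'_{AVG}), & 0.25 \le \Phi'_{AVG} \le 0.5\,\text{UI} \end{cases} \qquad \Phi'_{AVG} = \operatorname{mod}(\Phi_{AVG},\, 0.5\,\text{UI})$$

Figure 4.6: Piecewise linear interpolation of desired sample from 2x blind samples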

In the Verilog implementation, ΦAVG is represented by a 5-bit number. The most significant bit of ΦAVG selects the pair of blind samples adjacent to the desired sample (i.e. b and c). As shown in Figure 4.4, one pair is selected when 0UI ≤ ΦAVG < 0.5UI and the other when 0.5UI ≤ ΦAVG < 1.0UI. The remaining 4 bits are substituted as Φ′AVG in the interpolation expression in Figure 4.6. For clarity, Figure 4.6 shows Φ′AVG in units of UI, but Φ′AVG is actually implemented as an integer from 0 to 15 (i.e. in steps of 1/32UI). Because the interpolation weight 2Φ′AVG then equals this integer divided by 16, the implemented interpolator has a gain of 16 (i.e. KINT=16).
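The expression in Figure 4.6 can be written as a short floating-point function. This sketch assumes that a, b, c, and d are consecutive 2x blind samples with b and c straddling the desired sample, and that the correction term is added to the linear term. The sample-array indexing is hypothetical, and the hardware's integer scaling (KINT = 16) is omitted.

```python
def interpolate_2x(samples, ui_index: int, phi_code: int) -> float:
    """Interpolated baud-rate sample from 2x blind samples (expression of Figure 4.6).

    samples:  2x blind samples; samples[2*i] is at i UI and samples[2*i + 1] at i + 0.5 UI
              (a hypothetical indexing convention).
    ui_index: receiver UI window i (needs one sample before and two after the pair).
    phi_code: 5-bit code for Phi_AVG, 0..31, i.e. steps of 1/32 UI.
    """
    if not 0 <= phi_code < 32:
        raise ValueError("phi_code must be a 5-bit value")
    # The MSB of the code selects the pair (b, c) adjacent to the desired sample.
    ib = 2 * ui_index + (phi_code >> 4)
    if ib < 1 or ib + 2 >= len(samples):
        raise IndexError("need samples a, b, c, d around the selected pair")
    a, b, c, d = samples[ib - 1], samples[ib], samples[ib + 1], samples[ib + 2]
    # The remaining 4 bits give Phi'_AVG in [0, 0.5) UI.
    phi = (phi_code & 0xF) / 32.0
    # Y(Phi') rises to 0.25 at Phi' = 0.25 UI and falls back toward 0 at 0.5 UI.
    y = phi if phi < 0.25 else 0.5 * (1.0 - 2.0 * phi)
    linear = b * (1.0 - 2.0 * phi) + c * (2.0 * phi)
    correction = 0.5 * ((b - a) + (c - d)) * y
    return linear + correction

ramp = [float(i) for i in range(12)]      # linear ramp: value 2*t at time t (UI)
peak = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
print(interpolate_2x(ramp, ui_index=2, phi_code=8))   # 4.5, the ramp value at 2.25 UI
print(interpolate_2x(peak, ui_index=1, phi_code=8))   # 1.25
```

For a straight line the correction term cancels, so the function reduces to linear interpolation; near a peak it pushes the estimate above the two neighbouring samples.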

One disadvantage of the proposed data interpolators (in this section and Section 3.3.3) is that they have a phase-dependent frequency response, as shown in Figure 4.7. An ideal data interpolator has a flat magnitude response; it should only shift the phase of the data signal. The proposed 2x interpolator has a flat magnitude only when the desired sample coincides with a blind sample (e.g. ΦAVG = 0UI); in fact, its frequency response has a null at 10GHz when ΦAVG is 0.25UI. In the time domain, the interpolator changes the shape of the pulse response when ΦAVG ≠ 0UI. To compensate for this, the DFE can use phase-dependent coefficients [1, 24]. The DFE architectures described in [1] and [24] store 8 coefficients for a 1-tap DFE. The disadvantage is the complexity and area required to store and adapt multiple coefficients for each DFE tap. This work neglects the pulse-shaping behaviour of the data interpolator because the magnitude responses of the 2x interpolator are approximately flat up to the Nyquist frequency of 5GHz. Hence, only one coefficient is implemented per tap. As shown in Section 4.4, the DFE adaptation converges to a coefficient that is approximately the average tap value over all ΦAVG.
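The phase-dependent response can be reproduced numerically by feeding a complex tone through the Figure 4.6 expression. This sketch assumes ideal (unquantized) arithmetic and the correction term added to the linear term; it does not model the 1x interpolator of Chapter 3. For a 10Gbps signal it gives about 0dB, −0.43dB, and +0.51dB at 5GHz for Φ′AVG = 0, 0.125, and 0.25UI, and a null at 10GHz for Φ′AVG = 0.25UI, which is consistent with the description of Figure 4.7.

```python
import cmath
import math

def interp_response_db(f_hz: float, phi_ui: float, ui_s: float = 100e-12) -> float:
    """|H(f)| in dB of the Figure 4.6 interpolator for a complex tone at f_hz.

    phi_ui is Phi'_AVG in [0, 0.5] UI; 2x blind samples are spaced ui_s / 2 apart.
    Ideal (unquantized) arithmetic is assumed.
    """
    ts = ui_s / 2.0
    z = cmath.exp(2j * math.pi * f_hz * ts)     # phase advance of one blind sample
    a, b, c, d = 1.0 / z, 1.0, z, z * z         # tone sampled at -Ts, 0, Ts, 2Ts
    y = phi_ui if phi_ui < 0.25 else 0.5 * (1.0 - 2.0 * phi_ui)
    out = b * (1.0 - 2.0 * phi_ui) + c * (2.0 * phi_ui) + 0.5 * ((b - a) + (c - d)) * y
    mag = abs(out)
    return 20.0 * math.log10(mag) if mag > 0.0 else float("-inf")

for phi in (0.0, 0.125, 0.25):
    print(phi, round(interp_response_db(5e9, phi), 2), round(interp_response_db(10e9, phi), 1))
```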

[Figure 4.7 plot: interpolator frequency response (dB, 0 to −40) vs. frequency (1GHz to 10GHz) for 1x, ΦAVG=0.5UI; 2x, ΦAVG=0.25UI; 2x, ΦAVG=0.125UI; and 2x, ΦAVG=0UI, with the Nyquist frequency (5GHz) marked]

Figure 4.7: Frequency responses of 1x and 2x data interpolators. Both interpolators operate on a 10Gbps data signal with a Nyquist frequency of 5GHz.

Figure 4.7 also shows a further advantage of 2x over 1x blind sampling. The frequency response of the interpolator operating on 1x samples has a null at the Nyquist frequency when ΦAVG = 0.5UI. The system in Chapter 3 worked because the 2UI I&D already has a null at 5GHz (see Figures 3.18 and 3.25); thus, the I&D mostly masked the phase-dependent response of the interpolator. The CDR would fail if the 2UI I&D were removed, because the 1x interpolator would change the pulse response significantly. In that case, it would be necessary to implement phase-dependent DFE coefficients. Therefore, the decision to use 2x oversampling has allowed us to save power by removing the I&D and by using a simple DFE architecture.

4.3.2. Low-Pass Filter for DFE Adaptation

The low-pass filter (LPF) illustrated in Figure 4.8 approximates the expected value of the correlation terms from the MMPD. The LPF consists of a single integrator in an internal feedback loop. A summer adds together a bus of 4 correlation terms at the LPF input. If faster DFE convergence were needed, up to 16 correlation terms could be summed, since the CDR processes 16 samples in parallel per cycle; however, a larger adder would consume more power.

[Figure 4.8 block diagram: 4×8b correlation terms (xkAk−1 or xkAk−2) → summer → configurable counter (13b, 14b, or 15b) → overflow detector (2b up/down signal) → hysteresis block (reduces output toggling) → 8b integrating counter → c1 or c2, with the output fed back through a ×4 gain block]

Figure 4.8: Low-pass filter for DFE coefficients

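[Figure 4.9 diagram: the 2b up/down input (values {−1, 0, 1}) is compared with '00' (= 0); a D flip-flop stores the last non-zero value, and an output mux selects between the input and "no change"]

Figure 4.9: Hysteresis block implemented in low-pass filter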

The configurable counter and overflow detector act as an adjustable divider that produces an up/down signal with one of three values: 1, −1, or 0 (i.e. up, down, or no change). The hysteresis block reduces toggling at the LPF output. As shown in Figure 4.9, the register in the hysteresis block filters out the "no change" signals and stores only an up or down signal. If the hysteresis block receives a signal that is opposite to the stored value, the mux at the output of the hysteresis block forces the signal to "no change." The filtered up/down signal at the output of the hysteresis block in Figure 4.8 is integrated by a counter at the output of the LPF. The ×4 gain in the feedback path compensates for the summer adding together 4 terms at the LPF input, so the LPF output settles to the average of a single correlation term.
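The LPF of Figures 4.8 and 4.9 can be approximated behaviourally as below. Several details are assumptions rather than the thesis design: the configurable counter is modelled as an accumulator that resets to zero when it reaches ±threshold, the ×4 feedback is subtracted from the 4-term input sum, and the hysteresis register starts at 0 ("no change") and stores every non-zero step. With these assumptions, the coefficient settles near the mean of a single correlation term.

```python
import random

def lpf_with_hysteresis(term_stream, threshold=64, c_init=0, c_min=-128, c_max=127):
    """Counter-based LPF sketch for one DFE coefficient (after Figures 4.8 and 4.9).

    term_stream: iterable of 4-tuples of integer correlation terms, one tuple per cycle.
    threshold:   hypothetical overflow level of the configurable counter.
    Returns the coefficient value after each cycle.
    """
    acc = 0          # configurable counter: accumulates (sum of 4 terms) - 4 * c
    stored = 0       # hysteresis register: last non-zero up/down value (0 after reset)
    c = c_init       # integrating counter, i.e. the DFE coefficient
    history = []
    for terms in term_stream:
        acc += sum(terms) - 4 * c          # x4 feedback matches the 4-term input sum
        # Overflow detector: emit up (+1) or down (-1) and restart the count (assumed).
        if acc >= threshold:
            step, acc = 1, 0
        elif acc <= -threshold:
            step, acc = -1, 0
        else:
            step = 0
        # Hysteresis: a step opposite to the stored direction becomes "no change";
        # every non-zero step is stored, "no change" is not.
        out = 0
        if step != 0:
            out = 0 if stored == -step else step
            stored = step
        c = min(c_max, max(c_min, c + out))   # 8-bit signed integrating counter
        history.append(c)
    return history

rng = random.Random(3)
# Hypothetical input: four noisy correlation terms per cycle, each with mean 20.
stream = [tuple(20 + rng.randint(-8, 8) for _ in range(4)) for _ in range(3000)]
coeffs = lpf_with_hysteresis(stream)
print(coeffs[:8], coeffs[-8:])
```

A larger `threshold` plays the role of a longer configurable counter: it lowers the LPF bandwidth, trading slower start-up for less coefficient dithering.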

4.4. Simulation Results

This section presents the frequency and pulse responses of the channel models, the DFE adaptation curves, and the simulated eye diagrams and jitter tolerance.

Figure 4.10 shows the frequency responses of the channel models used in simulation. Channels C and D represent 1.5" and 8" traces on an FR4 board, respectively. The CDR and DFE are demonstrated for three cases: Channel C at 5Gbps, Channel C at 10Gbps, and Channel D at 10Gbps. The attenuations at the Nyquist frequency are 5dB, 10dB, and 13dB, respectively.

[Figure 4.10 plot: channel frequency response (dB, 0 to −80) vs. frequency (100MHz to 10GHz) for Channel C and Channel D]

Figure 4.10: Frequency responses of channel models used in simulation


Figure 4.11 depicts the pulse responses of the channel models cascaded with the data interpolator. The pulse responses are shown for two values of ΦAVG, 0UI and 0.25UI, and are normalized so that the amplitude of the eye diagram is 1. In simulation, the offset coefficient, P, is set to 77 because it shifts h0 near the peaks of the pulse responses; hence, the CDR locks at the position described by Equation 4.3. Figure 4.11 shows the pulse response samples at the CDR lock position.

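$$h_{1} = h_{-1} + \frac{P}{K_{DIV}\,K_{SUM}\,K_{ADC}\,K_{INT}} = h_{-1} + \frac{77}{0.25 \times 16 \times 8 \times 16} \approx h_{-1} + 0.15 \qquad (4.3)$$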

[Figure 4.11 plots: pulse responses vs. time (0 to 1.0ns) with the locked samples h−1, h0, h1, h2 marked; recoverable panel titles include Channel C, 5Gbps, ΦAVG=0.0UI; Channel D, 10Gbps, ΦAVG=0.0UI; and Channel D, 10Gbps, ΦAVG=0.25UI]

Figure 4.11: Combined channel and interpolator pulse responses showing ISI tap values (h−1, h0, h1, h2, h3) when CDR has locked


Figures 4.12 and 4.13 show the transient output of the adaptation controller for Channels C and D, respectively, at 10Gbps. Each figure demonstrates that c1 and c2 converge during CDR start-up even when initialized to different values (e.g. 0 or 30). The coefficients settle in approximately 13µs.

[Figure 4.12 plot: adapted c1 ≈ 22, adapted c2 ≈ 8]

Figure 4.12: Simulated DFE adaptation with Channel C at 10Gbps. DFE converges to same steady-state values when given different initial coefficients (i.e. 0 and 30)

[Figure 4.13 plot: adapted c1 ≈ 24, adapted c2 ≈ 10]

Figure 4.13: Simulated DFE adaptation with Channel D at 10Gbps. DFE converges to same steady-state values when given different initial coefficients (i.e. 0 and 30)


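Table 4.1 compares the adapted values, c1 and c2, to the pulse response samples, h1 and h2. The adapted coefficients are divided by KADC·KINT = 128 so that they are on the same normalized scale as the pulse response.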

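Table 4.1: Comparison of Adapted Coefficients (c1 and c2) vs. Pulse Response (h1 and h2)

| Case              | c1 | c1/(KADC·KINT) | h1, ΦAVG=0UI | h1, ΦAVG=0.25UI | c2 | c2/(KADC·KINT) | h2, ΦAVG=0UI | h2, ΦAVG=0.25UI |
|-------------------|----|----------------|--------------|-----------------|----|----------------|--------------|-----------------|
| Channel C, 5Gbps  | 16 | 0.125          | 0.151        | 0.102           | 3  | 0.023          | 0.045        | 0.039           |
| Channel C, 10Gbps | 22 | 0.172          | 0.175        | 0.170           | 8  | 0.063          | 0.062        | 0.059           |
| Channel D, 10Gbps | 24 | 0.188          | 0.197        | 0.193           | 10 | 0.078          | 0.081        | 0.079           |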
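The normalization in Table 4.1 and the offset in Equation 4.3 reduce to simple arithmetic with the gains from Figure 4.5 (KADC = 8, KINT = 16, KSUM = 16, KDIV = 0.25). The sketch below divides each adapted c1 by KADC·KINT = 128, compares it with the mean of the two h1 samples in the table, and evaluates P/(KDIV·KSUM·KADC·KINT) for P = 77, which gives approximately 0.15.

```python
# Gains identified in Figure 4.5
K_ADC, K_INT, K_SUM, K_DIV = 8, 16, 16, 0.25

def normalize(coeff: int) -> float:
    """Convert an adapted DFE coefficient to the normalized pulse-response scale."""
    return coeff / (K_ADC * K_INT)

# (c1, h1 at PhiAVG = 0UI, h1 at PhiAVG = 0.25UI) from Table 4.1
table_c1 = {
    "Channel C, 5Gbps": (16, 0.151, 0.102),
    "Channel C, 10Gbps": (22, 0.175, 0.170),
    "Channel D, 10Gbps": (24, 0.197, 0.193),
}
for case, (c1, h1_0, h1_25) in table_c1.items():
    # Compare the normalized coefficient with the average of the two h1 samples.
    print(f"{case}: c1/128 = {normalize(c1):.3f}, mean h1 = {(h1_0 + h1_25) / 2:.3f}")

# Phase-offset term in Equation 4.3 for P = 77
P = 77
offset = P / (K_DIV * K_SUM * K_ADC * K_INT)
print(f"P / (K_DIV K_SUM K_ADC K_INT) = {offset:.3f}")
```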

Figure 4.14 depicts the CDR model used to simulate the eye diagrams in Figures 4.15, 4.16, and 4.17. The c1 and c2 coefficients are set to the values in Table 4.1, and ΦAVG is forced to either 0UI (no interpolation) or 0.25UI (worst-case interpolation).

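[Figure 4.14 diagram: data signal (TX RJ = 0.17UIpp) → 3-bit ADC clocked by CKRX (RX RJ = 0.23UIpp) → data interpolator (ΦAVG) → 2-tap DFE (c1, c2) → decision block → Ak; the MMPD output passes through a ÷4 divider and is summed with the offset 77 before the digital LF, which sets ΦAVG. Eye diagrams are taken at the data signal, ADC output, interpolator output (xk), and DFE output.]

Figure 4.14: Simplified diagram of CDR model used for eye diagram simulations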

Figure 4.18 shows the simulated jitter tolerance of the receiver with a PRBS-31 data source and a bit error rate (BER) of 10⁻⁶. The ADC is modeled as an ideal 3-bit ADC. The data source and blind receiver clocks are simulated with 0.17UIpp and 0.23UIpp of random jitter, respectively.


[Figure 4.15 eye diagrams vs. phase (0 to 1.0UI): data signal (±1.0), ADC output (codes 0 to 7), and interpolator and DFE outputs (±100) for ΦAVG = 0UI and ΦAVG = 0.25UI]

Figure 4.15: Simulated eye diagrams with 5Gbps data and Channel C. Eye diagrams correspond to signals in Figure 4.14


[Figure 4.16 eye diagrams vs. phase (0 to 1.0UI), same panel layout and axis ranges as Figure 4.15]

Figure 4.16: Simulated eye diagrams with 10Gbps data and Channel C. Eye diagrams correspond to signals in Figure 4.14


[Figure 4.17 eye diagrams vs. phase (0 to 1.0UI), same panel layout and axis ranges as Figure 4.15]

Figure 4.17: Simulated eye diagrams with 10Gbps data and Channel D. Eye diagrams correspond to signals in Figure 4.14


[Figure 4.18 plot: jitter tolerance (UIpp, log scale 0.1 to 10) vs. jitter frequency (100kHz to 1GHz) for Channel C at 5Gbps, Channel C at 10Gbps, and Channel D at 10Gbps]

Figure 4.18: Simulated jitter tolerance of proposed receiver


4.5. Conclusion

This chapter presented a novel zero-forcing adaptive controller for the DFE coefficients in the 10Gbps digital CDR presented in Chapter 3. To reduce power consumption, the 2UI I&D was removed from the receiver, oversampling was increased from 1x to 2x, and the ADC resolution was decreased from 5 bits to 3 bits. Simulations show that the adaptive DFE converges within 13µs to the ISI taps on the pulse response of the data signal. The simulated high-frequency jitter tolerance is about 0.28UIpp with an 8" FR4 channel.

5 Conclusion

5.1. Thesis Contributions

This thesis provided background on, and a comparison of, the different types of equalizers, adaptive equalizer controllers, and clock-and-data recovery blocks.

A novel 1x blind ADC-based CDR was developed. The proposed receiver recovers data by extending the channel pulse response so that the pulse amplitude is greater than zero, no matter where the blind samples occur within a 1UI window. An I&D block in the receiver front end extends the pulse response by adding controlled ISI. The baud-rate design allows the CDR to operate at 10Gb/s given a 10GS/s sampling rate.

The proposed design was fabricated in a 65nm CMOS process. The test chip successfully recovers 10Gb/s data with BER below 10⁻¹². Jitter tolerance measurements show that the CDR implementation can recover data with sampling below the baud rate: the CDR operates with ±300ppm of frequency offset and a high-frequency jitter tolerance of 0.19UIpp.

Next, a zero-forcing adaptive DFE controller was developed for the digital baud-rate CDR. To reduce receiver power consumption, the 2UI I&D was removed from the receiver, oversampling was increased from 1x to 2x, and the ADC resolution was decreased from 5 bits to 3 bits. Simulations show that the adaptive DFE converges within 13µs to the ISI taps on the pulse response of the data signal. The simulated high-frequency jitter tolerance is about 0.28UIpp with an 8" FR4 channel. A test chip was taped out in August 2013.

The contributions include:

• Proposal of a blind baud-rate ADC-based CDR,

• Implementation of the CDR (I&D design borrowed from a previous tapeout by Tina Tahmoureszadeh, modified and implemented for the proposed design by Joshua Liang),

• A paper presented at ISSCC 2013 [29],

• A paper accepted for publication in JSSC to appear in the Dec. 2013 issue,

• Implementation of the adaptive DFE (ADC design done by Sadegh Jalali).

I would also like to acknowledge the help of Joshua Liang with the measurement of the blind baud-rate CDR.

5.2. Future Work

One aspect of the future work (i.e. DFE adaptation) has been discussed in detail in Chapter 4. Four other possible improvements to this work are described in the following sections.

5.2.1. Implementation of a Fully Feed-Forward Blind Baud-Rate CDR

As noted in Section 3.2, one of the disadvantages of the proposed CDR is its 7-cycle (112UI) feedback loop latency, which limits the CDR bandwidth. A feed-forward architecture is unconditionally stable [32, 36]. A future enhancement would implement a feed-forward version of the blind baud-rate CDR. This would require research on an appropriate baud-rate PD that can operate without feedback.


5.2.2. Evaluation of Phase-Dependent DFE for Data Interpolators

Section 4.3.1 discussed the possibility of implementing phase-dependent DFE coefficients to compensate for the data interpolator's phase-dependent response. One future task would be to evaluate the performance benefits against the area and power cost of the extra coefficients.

5.2.3. Adaptive Optimization of Offset Coefficient in MMPD

Section 4.3 described a new coefficient, P, which is summed with the MMPD output to shift the sampling phase, ΦAVG. In this work, P is manually assigned a value such that the main tap, h0, is sampled near the peak of the pulse response. One future enhancement is to add an adaptive controller to optimize P for a range of channels.

5.2.4. Calibration of I&D and ADC Front End

The interleaved analog front-end blocks described in Chapter 3 did not include any adaptive calibration; only manual calibration for clock skew was implemented. The jitter tolerance can likely be improved by adding adaptive calibration for gain, offset, and timing mismatch in the interleaved I&D and ADC blocks.

References

[1] B. Abiri, A. Sheikholeslami, H. Tamura, and M. Kibune. An Adaptation Engine for a 2x Blind ADC-Based CDR in 65 nm CMOS. Solid-State Circuits, IEEE Journal of, 46(12):3140–3149, Dec. 2011.

[2] B. Abiri, R. Shivnaraine, A. Sheikholeslami, H. Tamura, and M. Kibune. A 1-to-6Gb/s phase-interpolator-based burst-mode CDR in 65nm CMOS. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2011 IEEE International, pages 154–156, Feb. 2011.

[3] O.E. Agazzi, M.R. Hueda, D.E. Crivelli, H.S. Carrer, A. Nazemi, G. Luna, F. Ramos, R. Lopez, C. Grace, B. Kobeissy, C. Abidin, M. Kazemi, M. Kargar, C. Marquez, S. Ramprasad, F. Bollo, V. Posse, S. Wang, G. Asmanis, G. Eaton, N. Swenson, T. Lindsay, and P. Voois. A 90 nm CMOS DSP MLSD Transceiver With Integrated AFE for Electronic Dispersion Compensation of Multimode Optical Fibers at 10 Gb/s. Solid-State Circuits, IEEE Journal of, 43(12):2939–2957, 2008.

[4] Marco V. Barbera, Sokol Kosta, Alessandro Mei, and Julinda Stefa. To offload or not to offload? The bandwidth and energy costs of mobile cloud computing. In INFOCOM, 2013 Proceedings IEEE, pages 1285–1293, 2013.

[5] Jan Bergmans. Digital baseband transmission and recording. Kluwer Academic Publishers, Boston, 1996.


[6] Jun Cao, Sui Huang, and M.M. Green. Non-idealities in linear CDR phase detectors. In Circuit Theory and Design (ECCTD), 2011 20th European Conference on, pages 158–161, 2011.

[7] Jun Cao, Bo Zhang, U. Singh, Delong Cui, A. Vasani, A. Garg, Wei Zhang, N. Kocaman, Deyi Pi, B. Raghavan, Hui Pan, I. Fujimori, and A. Momtaz. A 500 mW ADC-Based CMOS AFE With Digital Calibration for 10 Gb/s Serial Links Over KR-Backplane and Multimode Fiber. Solid-State Circuits, IEEE Journal of, 45(6):1172–1185, 2010.

[8] E-Hung Chen, Jihong Ren, B. Leibowitz, Hae-Chang Lee, Qi Lin, Kyung Oh, F. Lambrecht, V. Stojanovic, J. Zerbe, and C.-K.K. Yang. Near-Optimal Equalizer and Timing Adaptation for I/O Links Using a BER-Based Metric. Solid-State Circuits, IEEE Journal of, 43(9):2144–2156, 2008.

[9] S. Dey. Cloud Mobile Media: Opportunities, challenges, and directions. In Computing, Networking and Communications (ICNC), 2012 International Conference on, pages 929–933, 2012.

[10] Y. Doi, T. Shibasaki, T. Danjo, W. Chaivipas, T. Hashida, H. Miyaoka, M. Hoshino, Y. Koyanagi, T. Yamamoto, S. Tsukamoto, and H. Tamura. 32Gb/s data-interpolator receiver with 2-tap DFE in 28nm CMOS. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013 IEEE International, pages 36–37, 2013.

[11] M. Harwood, N. Warke, R. Simpson, T. Leslie, A. Amerasekera, S. Batty, D. Colman, E. Carr, V. Gopinathan, S. Hubbins, P. Hunt, A. Joy, P. Khandelwal, B. Killips, T. Krause, S. Lytollis, A. Pickering, M. Saxton, D. Sebastio, G. Swanson, A. Szczepanek, T. Ward, J. Williams, R. Williams, and T. Willwerth. A 12.5Gb/s SerDes in 65nm CMOS Using a Baud-Rate ADC with Digital Receiver Equalization and Clock Recovery. In Solid-State Circuits Conference, 2007. ISSCC 2007. Digest of Technical Papers. IEEE International, pages 436–591, Feb. 2007.

[12] Yasuo Hidaka. 10-20Gb/s+ Equalizer Design for Electrical Channel with 40dB+ Loss. In ATAC Technical Forum F-3, 10-40 Gb/s I/O Design for Data Communications, International Solid State Circuits Conference, Feb. 2012.

[13] H. Higashi, S. Masaki, M. Kibune, S. Matsubara, T. Chiba, Y. Doi, H. Yamaguchi, H. Takauchi, H. Ishida, K. Gotoh, and Hirotaka Tamura. A 5-6.4-Gb/s 12-channel transceiver with pre-emphasis and equalization. Solid-State Circuits, IEEE Journal of, 40(4):978–985, 2005.

[14] A.K. Joy, H. Mair, Hae-Chang Lee, A. Feldman, C. Portmann, N. Bulman, E.C. Crespo, P. Hearne, P. Huang, B. Kerr, P. Khandelwal, F. Kuhlmann, S. Lytollis, J. Machado, C. Morrison, S. Morrison, S. Rabii, D. Rajapaksha, V. Ravinuthula, and G. Surace. Analog-DFE-based 16Gb/s SerDes in 40nm CMOS that operates across 34dB loss channels at Nyquist with a baud rate CDR and 1.2Vpp voltage-mode driver. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2011 IEEE International, pages 350–351, Feb. 2011.

[15] Andy Joy. (What is so Hard About) SerDes Design Challenges for 20Gb/s+ Data Rates over Electrical Backplanes? In ATAC Technical Forum F-3, 10-40 Gb/s I/O Design for Data Communications, International Solid State Circuits Conference, Feb. 2012.

[16] Wang-Soo Kim, Chang-Kyung Seong, and Woo-Young Choi. A 5.4Gb/s adaptive equalizer using asynchronous-sampling histograms. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2011 IEEE International, pages 358–359, 2011.


[17] Jri Lee, K.S. Kundert, and B. Razavi. Analysis and modeling of bang-bang clock and data recovery circuits. Solid-State Circuits, IEEE Journal of, 39(9):1571–1580, Sept. 2004.

[18] Mike Li. Jitter, Noise, and Signal Integrity at High-Speed. Prentice Hall, Upper Saddle River, NJ, 2008.

[19] S.M. Louwsma, A. J M Van Tuijl, M. Vertregt, and B. Nauta. A 1.35 GS/s, 10 b, 175 mW Time-Interleaved AD Converter in 0.13 µm CMOS. Solid-State Circuits, IEEE Journal of, 43(4):778–786, 2008.

[20] A. Momtaz and M.M. Green. An 80 mW 40 Gb/s 7-Tap T/2-Spaced Feed-Forward Equalizer in 65 nm CMOS. Solid-State Circuits, IEEE Journal of, 45(3):629–639, 2010.

[21] K. Mueller and M. Muller. Timing Recovery in Digital Synchronous Data Receivers. Communications, IEEE Transactions on, 24(5):516–531, May 1976.

[22] J. Nakagawa, M. Nogami, N. Suzuki, M. Noda, S. Yoshima, and H. Tagami. 10.3-Gb/s Burst-Mode 3R Receiver Incorporating Full AGC Optical Receiver and 82.5-GS/s Over-Sampling CDR for 10G-EPON Systems. Photonics Technology Letters, IEEE, 22(7):471–473, 2010.

[23] Massimo Pozzoni, Simone Erba, Paolo Viola, Matteo Pisati, Emanuele Depaoli, Davide Sanzogni, Riccardo Brama, Daniele Baldi, Matteo Repossi, and Francesco Svelto. DFE Receiver With a SSC Tolerant CDR for Serial Backplane Communication. Solid-State Circuits, IEEE Journal of, 44(4):1306–1315, 2009.

[24] S. Sarvari, T. Tahmoureszadeh, A. Sheikholeslami, Hirotaka Tamura, and M. Kibune. A 5Gb/s speculative DFE for 2x blind ADC-based receivers in 65-nm CMOS. In VLSI Circuits (VLSIC), 2010 IEEE Symposium on, pages 69–70, 2010.


[25] P. Schvan, J. Bach, C. Fait, P. Flemke, R. Gibbins, Y. Greshishchev, N. Ben-Hamida, D. Pollex, J. Sitch, Shing-Chi Wang, and J. Wolczanski. A 24GS/s 6b ADC in 90nm CMOS. In Solid-State Circuits Conference, 2008. ISSCC 2008. Digest of Technical Papers. IEEE International, pages 544–634, 2008.

[26] F. Spagna, Lidong Chen, M. Deshpande, Yongping Fan, D. Gambetta, S. Gowder, S. Iyer, R. Kumar, P. Kwok, R. Krishnamurthy, Chien chun Lin, R. Mohanavelu, R. Nicholson, J. Ou, M. Pasquarella, K. Prasad, H. Rustam, L. Tong, A. Tran, J. Wu, and Xuguang Zhang. A 78mW 11.8Gb/s serial link transceiver with adaptive RX equalization and baud-rate CDR in 32nm CMOS. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2010 IEEE International, pages 366–367, Feb. 2010.

[27] N. Suzuki, K. Nakura, S. Kozaki, H. Tagami, M. Nogami, and J. Nakagawa. 82.5 Gsample/s (10.3125 GHz X 8 phase clocks) burst-mode CDR for 10G-EPON systems. Electronics Letters, 45(24):1261–1263, 2009.

[28] T. Tahmoureszadeh, S. Sarvari, A. Sheikholeslami, Hirotaka Tamura, Y. Tomita, and M. Kibune. A combined anti-aliasing filter and 2-tap FFE in 65-nm CMOS for 2x blind 2-10 Gb/s ADC-based receivers. In Custom Integrated Circuits Conference (CICC), 2010 IEEE, pages 1–4, 2010.

[29] Clifford Ting, Joshua Liang, Ali Sheikholeslami, Masaya Kibune, and Hirotaka Tamura. A blind baud-rate ADC-based CDR. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013 IEEE International, pages 122–123, 2013.

[30] Y. Tomita, M. Kibune, J. Ogawa, W.W. Walker, H. Tamura, and T. Kuroda. A 10-Gb/s receiver with series equalizer and on-chip ISI monitor in 0.11-µm CMOS. Solid-State Circuits, IEEE Journal of, 40(4):986–993, April 2005.


[31] Y. Tomita, H. Yamaguchi, S. Kawahara, T. Higuchi, T. Yamamoto, H. Ishida, K. Gotoh, and H. Tamura. A 0.12mm² 5Gbps receiver with a level shifting equalizer and a cumulative-histogram-based adaptation engine. In VLSI Circuits (VLSIC), 2011 Symposium on, pages 86–87, June 2011.

[32] O. Tyshchenko, A. Sheikholeslami, H. Tamura, M. Kibune, H. Yamaguchi, and J. Ogawa. A 5-Gb/s ADC-Based Feed-Forward CDR in 65 nm CMOS. Solid-State Circuits, IEEE Journal of, 45(6):1091–1098, June 2010.

[33] O. Tyshchenko, A. Sheikholeslami, H. Tamura, Y. Tomita, H. Yamaguchi, M. Kibune, and T. Yamamoto. A fractional-sampling-rate ADC-based CDR with feed-forward architecture in 65nm CMOS. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2010 IEEE International, pages 166–167, Feb. 2010.

[34] M. van Ierssel, H. Yamaguchi, A. Sheikholeslami, Hirotaka Tamura, and W.W. Walker. Event-Driven Modeling of CDR Jitter Induced by Power-Supply Noise, Finite Decision-Circuit Bandwidth, and Channel ISI. Circuits and Systems I: Regular Papers, IEEE Transactions on, 55(5):1306–1315, 2008.

[35] S. Verma, A. Kasapi, Li min Lee, D. Liu, D. Loizos, Song-Hee Paik, A. Varzaghani, S. Zogopoulos, and S. Sidiropoulos. A 10.3GS/s 6b flash ADC for 10G Ethernet applications. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013 IEEE International, pages 462–463, 2013.

[36] H. Yamaguchi, H. Tamura, Y. Doi, Y. Tomita, T. Hamada, M. Kibune, S. Ohmoto, K. Tateishi, O. Tyshchenko, A. Sheikholeslami, T. Higuchi, J. Ogawa, T. Saito, H. Ishida, and K. Gotoh. A 5Gb/s transceiver with an ADC-based feed-forward CDR and CMA adaptive equalizer in 65nm CMOS. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2010 IEEE International, pages 168–169, Feb. 2010.


[37] Bo Zhang, Ali Nazemi, Adesh Garg, Namik Kocaman, Mahmoud Reza Ahmadi, Mehdi Khanpour, Heng Zhang, Jun Cao, and Afshin Momtaz. A 195mW / 55mW dual-path receiver AFE for multistandard 8.5-to-11.5 Gb/s serial links in 40nm CMOS. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013 IEEE International, pages 34–35, 2013.
