CLOCK AND DATA RECOVERY FOR
HIGH-SPEED ADC-BASED RECEIVERS
by
Oleksiy Tyshchenko
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Graduate Department of Electrical and Computer Engineering
University of Toronto
© Copyright by Oleksiy Tyshchenko 2011
CLOCK AND DATA RECOVERY FOR
HIGH-SPEED ADC-BASED RECEIVERS
Oleksiy Tyshchenko
Doctor of Philosophy, 2011
Graduate Department of Electrical and Computer Engineering
University of Toronto
ABSTRACT
THIS THESIS EXPLORES clock and data recovery (CDR) for high-speed
blind-sampling ADC-based receivers. This exploration results in two new CDR architectures
that reduce the receiver complexity and save ADC power and area compared
to previous work. The two proposed CDR architectures constitute the primary contributions
of this thesis.
The first proposed architecture, a 2x feed-forward CDR architecture, eliminates the
interpolating feedback loop, used in the previously reported CDRs, in order to reduce the
CDR circuit complexity. Instead of the feedback loop, the proposed architecture uses a feed-
forward topology to recover the phase and data directly from the blind digital samples of the
received signal. The 2x feed-forward CDR architecture was implemented and characterized
in a 5 Gb/s receiver test-chip in 65 nm CMOS. The test-chip measurements confirm that the
CDR successfully recovers the data with bit error rate (BER) ≤ 10^-12 in the presence of
jitter.
The second proposed architecture, a fractional-sampling-rate (FSR) CDR architecture,
reduces the receiver sampling rate from the typical integer rate of 2x the baud rate to a
fractional rate between 2x and 1x in order to reduce the ADC power and area. This archi-
tecture employs the feed-forward topology of the first contribution of this thesis to recover
the phase and data from the fractionally-spaced digital samples of the signal. To verify the
proposed FSR CDR architecture, a 1.45x receiver test-chip was implemented and charac-
terized in 65 nm CMOS. This test-chip recovers 6.875 Gb/s data from the ADC samples
taken at 10 GS/s. The measurements confirm a successful data recovery in the presence of
jitter with BER ≤ 10^-12. With sampling at 1.45x, the FSR CDR architecture reduces the
ADC power and area by 27.3 % compared to the 2x feed-forward CDR architecture, while
the overall receiver power and area are reduced by 12.5 %.
Acknowledgments
GRADUATE STUDIES is a lot like a long journey. Sometimes it seems interesting
and exciting, while sometimes it seems hard and endless. Along the way of this
journey I met a lot of people and saw a lot of places. These people and places helped
me and inspired me to complete my journey even at times when the journey seemed to be
never-ending. Now, coming close to the end of my graduate studies, I would like to thank
the people who helped me and reflect upon the places that inspired me throughout my
graduate school years.
First of all, I thank my supervisor, Prof. Ali Sheikholeslami, for his guidance through
the course of my Ph.D. work. His enthusiasm and insights have been a great source of
encouragement for me. I also thank Prof. Sheikholeslami for helping me to realize that
graduate school in engineering is more than a technical education; rather, it is a great
learning experience of solving problems and achieving goals.
I thank Dr. Hirotaka Tamura of Fujitsu Laboratories Limited (FLL), Kawasaki, Japan,
for his helpful comments, suggestions and constructive criticism at all stages of my Ph.D.
projects: from the project definitions, through architecture development and circuit imple-
mentation, all the way to test-chip measurements and publishing the results. Tamura-sensei,
you were very much like a co-supervisor for me during my Ph.D. studies, and I thank you
for all your help.
I am thankful to the members of my Ph.D. oral examination committee: Prof. David
Johns, Prof. Tony Chan Carusone, Prof. Sorin Voinigescu, Prof. Wai Tung Ng, Prof. Wei Yu;
and my thesis appraiser Prof. Michael Green for their criticism of this work and valuable
feedback.
I thank the former and current graduate students of Ali-group: Kostas Pagiamtzis,
Marcus van Ierssel, David Halupka, Jeff Chow, Scott McLeod, Pradip Thachil, Tina
Tahmoureszadeh, Safeen Huda, Shayan Shahramian, Behrooz Abiri, and Siamak Sarvari, who
helped me turn my graduate school years into an interesting, enjoyable and diverse part of
my life. It was a great pleasure meeting them, working with them, and getting to know
them. I greatly appreciate the support of Kostas Pagiamtzis and Marcus van Ierssel, who
completed their graduate studies before I did, and who helped me believe that this journey
will eventually come to an end. Special thanks go to David Halupka with whom over the
past several years I shared the cubicle, my good and bad news, my excitement and
frustrations. Naturally, he shared the same with me, and I had to listen to all that. David,
thank you for your patience in putting up with me all these years.
During my graduate studies I spent most of the time in Toronto, Canada. However, I
was lucky enough to see other places as well. The places are strongly associated with the
people who helped me see, explore and enjoy these places. I would like to thank these
people next.
I thank William Walker, Nikola Nedovic, Nestoras Tzartzanis, Francis Rotella and Mag-
nus Wiklund of Fujitsu Laboratories of America (FLA), Sunnyvale, CA, for welcoming me
to their team as an intern for half-a-year. It was my pleasure to learn from and to work with
the FLA team. I also thank the FLA team for allowing me to experience a professional,
good and friendly work environment.
During this internship, I had a great chance to explore the Bay Area in California, and
I thank people who helped me turn this time into an experience to remember. I thank
Jeff Chow for allowing me to “take over” his life during his leave from California, which
conveniently coincided with my internship. For several months, I stayed at Jeff’s apartment,
I drove Jeff’s car, and I used Jeff’s cell phone, which made my settling in San Jose, CA,
very smooth. I thank Kostas Pagiamtzis and Irene Goldthorpe for accompanying me on a
large number of trips in California. I also thank Irene and Kostas for helping me to realize
that losing a bet can be just as pleasant as winning it.
I thank Laura Fujino and Prof. K.C. Smith for inviting me to attend the International
Solid-State Circuits Conference (ISSCC) as a student volunteer six consecutive times dur-
ing my graduate studies. The ISSCC attendance helped me to remain aware of the most
recent research work in the area of electronics performed all over the world both in industry
and in academia. Being part of the volunteer team helped me to get to know my fellow
graduate students better, and to realize what a good team is all about. I further thank Laura
and Prof. Smith for sharing their life wisdom with me during the rare uneventful breaks at
ISSCC.
With all the intense schedules of the graduate studies, I am grateful to my friends who
helped me discover beautiful places and experience memorable adventures during the short
vacations away from the school matters. I thank Valeri Kirischian and Irina Ivanova for
showing me the beauty of the Province of Ontario through numerous hiking, camping and
canoeing adventures. I am particularly thankful to Valeri and Irina for helping me experi-
ence the wilderness of Lake Temagami, Ontario, with its rapidly changing weather, stren-
uous canoeing and portaging, beaver dams across tiny rivers, camp fires, starry skies, and
sometimes even polar lights. I thank Roman Ochoukov for accompanying me while explor-
ing the cities of the East Coast: Toronto, Boston, New York, Montreal, to name some of
them. I also thank Roman for his moral support during my graduate studies. Whenever I
thought that the graduate life was hard at the University of Toronto, it was enough to chat
with Roman to remind myself that life is even harder at MIT. I thank Kostas Pagiamtzis,
Irene Goldthorpe, Scott McLeod and Kevin Banovic for joining me for a skydiving adventure,
my most extreme experience so far. The long journey of the graduate studies differs
on so many levels from a one-minute-long free fall. Yet there is one thing in common
between these two experiences: it is less stressful to reflect upon them both in retrospect.
I thank my parents for their unconditional support through the years of my studies.
Last, but not least, I thank my wife, Katya Tyshchenko, for being by my side despite all the
challenges of her own graduate studies. I further thank Katya for being with me through
most of my experiences of the graduate years: from course projects to outdoor adventures.
Approaching the end of my studies, I realize that it is not the end-goal itself that matters;
rather, it is the way towards the goal that is important. Meeting the people who helped me,
and discovering the places that inspired me became an invaluable experience for me. After
all, the journey of studies is simply a part of a larger journey of life.
Contents
List of Tables
List of Figures
List of Abbreviations
Chapter 1 Introduction
  1.1 Motivation
  1.2 Design Challenges and Approaches
  1.3 Thesis Contributions
  1.4 Thesis Outline
Chapter 2 Fundamentals of Clock and Data Recovery in High-Speed Receivers
  2.1 Building Blocks of a High-Speed Receiver
    2.1.1 Channel Properties
    2.1.2 Equalization
    2.1.3 Clock and Data Recovery
    2.1.4 Signal Energy Considerations
  2.2 CDR Architectures for Binary-Sampling Receivers
    2.2.1 Phase-Tracking CDR Architecture
    2.2.2 Oversampling CDR Architecture
  2.3 CDR Architectures for ADC-Based Receivers
    2.3.1 Mueller-Muller CDR Architecture
    2.3.2 Interpolating Feedback CDR Architecture
  2.4 Summary
Chapter 3 An ADC-Based Feed-Forward CDR Architecture
  3.1 Feed-Forward CDR Architecture
  3.2 Phase-Detection Scheme
  3.3 Phase-Recovery Filter
  3.4 Data-Decision Scheme
  3.5 Data Retiming Scheme
  3.6 Simulation and Measurement Results
  3.7 Summary
Chapter 4 A Fractional-Sampling-Rate CDR Architecture
  4.1 Fractional-Sampling-Rate CDR Architecture
  4.2 Phase-Detection Schemes
    4.2.1 Eye-Based Phase Detector
    4.2.2 Transition-Based Phase Detector
  4.3 Phase-Recovery Filter
  4.4 Data-Decision Scheme
  4.5 Data Compaction Schemes
    4.5.1 Shift-Register Data Compactor
    4.5.2 Selector-Array Data Compactor
  4.6 Data Retiming Scheme
  4.7 Simulation and Measurement Results
  4.8 Summary
Chapter 5 Conclusions
  5.1 Thesis Contributions
  5.2 Future Directions
References
List of Tables
2.1 Recently published high-speed receivers.
3.1 Jitter tolerance simulation conditions (in Figure 3.14).
3.2 Test-chip parameters.
3.3 Jitter tolerance measurement and simulation conditions (in Figure 3.17).
4.1 Sampling phases for the sampling rate of 16/11 ≈ 1.45x.
4.2 Conditional selector truth table.
4.3 Test-chip parameters.
List of Figures
1.1 ITRS projection for chip-to-chip interconnect data rates.
2.1 Simplified diagram of an interconnect.
2.2 Functional block-diagram of a high-speed receiver.
2.3 Channel response in time and frequency domains.
2.4 Equalization with a filter in the frequency domain.
2.5 Feed-forward equalization (FFE).
2.6 Decision feedback equalization (DFE).
2.7 Clocking schemes.
2.8 Classification of high-speed receivers with corresponding CDR examples.
2.9 Binary sampling.
2.10 Simplified block-diagram of a phase-tracking CDR.
2.11 Phase detection in the phase-tracking CDR.
2.12 Phase-tracking feedback loop.
2.13 Jitter transfer and tolerance of the phase-tracking CDR.
2.14 Simplified block-diagram of an oversampling CDR.
2.15 Phase detection in the 3x oversampling CDR.
2.16 Jitter tolerance of the 3x oversampling CDR.
2.17 Sampling with an ADC.
2.18 Simplified block-diagram of a Mueller-Muller CDR.
2.19 Mueller-Muller timing recovery from an impulse response.
2.20 Mueller-Muller timing recovery from continuous data.
2.21 Jitter tolerance of the Mueller-Muller CDR.
2.22 Simplified block-diagram of an interpolating feedback CDR.
2.23 Blind and interpolated samples in the interpolating feedback CDR.
2.24 Linear interpolation.
2.25 Jitter tolerance of the interpolating feedback CDR.
2.26 Simplified block-diagram of a joint-adaptation-based CDR [43].
3.1 Proposed feed-forward CDR architecture (simplified block-diagram).
3.2 Receiver with the proposed feed-forward CDR architecture.
3.3 Proposed linear phase estimation scheme.
3.4 Linear estimation of instantaneous phase, φ_X.
3.5 Flowchart of 2-bit accurate division for calculating φ_X.
3.6 Phase recovery filter.
3.7 Discrete-time integrator with programmable gain.
3.8 Jitter tolerance dependence on the LPF order (simulated, BER ≤ 5·10^-6).
3.9 Data decision scheme.
3.10 Data decision with isolated pulses.
3.11 Data decision in the interpolating feedback and feed-forward CDRs.
3.12 Data retiming schemes.
3.13 Simplified FIFO diagram.
3.14 Simulated jitter tolerance (BER ≤ 5·10^-6).
3.15 Simplified design flow of the proposed feed-forward CDR.
3.16 Test-chip die photograph.
3.17 Measured jitter tolerance.
4.1 Sampling rates in feed-forward CDR architectures.
4.2 Receiver with the proposed fractional-sampling-rate CDR architecture.
4.3 Eye diagram accumulation with fractional sampling rate.
4.4 Phase detection from the eye diagram.
4.5 Simplified block-diagram of the transition-based phase detector.
4.6 Selection of transitions leading to low-error phase detection.
4.7 Average-slope-recovery filter.
4.8 Reduction of phase-detection error using average transition slope.
4.9 Linear estimation of instantaneous zero-crossing phase, φ_ZC.
4.10 Selector converting phase values from sampling intervals to unit intervals.
4.11 Phase recovery filter.
4.12 Phase subtracter.
4.13 Detecting number of samples per UI (jitter-free case).
4.14 Data decision in the presence of jitter.
4.15 Shift-register data compactor.
4.16 Shift-register data compactor.
4.17 Simplified FIFO diagram.
4.18 Simulated jitter tolerance (BER ≤ 5·10^-6).
4.19 Test-chip die photograph.
4.20 Measured eye diagram at the demux output.
4.21 Measured jitter tolerance.
List of Abbreviations
ADC Analog-to-digital converter
BER Bit-error rate
CDR Clock and data recovery
CTLE Continuous-time linear equalizer
DAC Digital-to-analog converter
DeMUX De-multiplexor
DFE Decision-feedback equalizer
EQ Equalizer
FFE Feed-forward equalizer
FIFO First-in first-out buffer
FIR Finite impulse response
FSR Fractional sampling rate
HDMI High-definition multimedia interface
IC Integrated circuit
ISI Inter-symbol interference
ITRS International technology roadmap for semiconductors
MLSD Maximum likelihood sequence detection
PCIe Peripheral component interconnect express
PD Phase detector
PI Phase interpolator
RX Receiver
SATA Serial advanced-technology attachment
SI Sampling interval
SNR Signal to noise ratio
TR Timing recovery
TX Transmitter
UI Unit interval
USB Universal serial bus
VCO Voltage-controlled oscillator
Chapter 1
Introduction
HIGH-SPEED SIGNALING SYSTEMS satisfy the growing demand for higher data
rates owing to the ongoing improvements in the integrated circuit (IC) technologies.
The transmission channels, however, have seen little or no improvement over time [1],
which leads to more severe impact of the channel on the data signal with the increasing
data rates. As a consequence, the receivers in the high-speed signaling systems must com-
pensate for the channel response in order to recover the data from the received signal [2,3].
Sampling the signal with an analog-to-digital converter (ADC), instead of a binary sampler,
allows the receivers to compensate for the channel response in the digital domain, which
in turn makes it possible to compensate for more severe channel responses [4, 5]. However, high
ADC power consumption along with high complexity of the clock and data recovery (CDR)
function restrict the use of the ADC-based receivers only to high-performance, rather than
low-cost, interconnects [4–7]. This thesis focuses on the design of low-complexity CDR
architectures that reduce the power consumption of the ADC-based receivers, making them
suitable for the low-cost high-speed interconnects.
1.1 Motivation
The ongoing evolution of the IC technologies enables the processing of increasing volumes
of information in low-cost personal computing and entertainment systems. This trend fuels
the demand for low-cost gigabit-rate interconnects for these data-processing systems. The
International Technology Roadmap for Semiconductors (ITRS) reflects this growing demand
by projecting an exponential increase of the data rates in the high-speed chip-to-chip
interconnects over the next ten years, as illustrated in Figure 1.1 [8].
Figure 1.1: ITRS projection for chip-to-chip interconnect data rates. (Horizontal axis: year
of production, 2008 to 2022; vertical axis: data rate in Gb/s, with ticks at 4, 10 and 100.)
The ITRS projections stimulate the development of numerous standards for high-speed
signaling, including the multi-gigabit-rate standards for low-cost computing and entertain-
ment systems. Among the examples of these standards are the commercially-successful
High-Definition Multimedia Interface (HDMI) [9], Peripheral Component Interconnect Ex-
press (PCIe) [10], Serial Advanced-Technology Attachment (SATA) [11] and Universal
Serial Bus (USB) [12]. These standards are commonly used in mobile battery-powered
systems, which restricts the power budget of the high-speed interconnects. Furthermore,
to maintain a low system cost, these standards impose minimum restrictions on the trans-
mission channel between the transmitter and receiver, which leads to the use of low-quality
channels. With the data rates increasing over time, the channel impairments pose one of
the primary challenges in the design of the high-speed transceivers.
The transceivers compensate the signal for the channel distortion at the transmitter
and receiver sides [2, 13], which increases the transceivers’ complexity, area and power
consumption. The challenges of designing low-complexity low-power receivers suitable
for low-cost high-speed interconnects motivate this thesis.
1.2 Design Challenges and Approaches
In low-cost high-speed interconnects, the transmission channel typically consists of a prin-
ted circuit board (PCB) trace, such as in the PCIe standard [10], or a pair of wires of variable
length with little or no shielding, such as in the HDMI, SATA, and USB standards [9,11,12].
These channels modify the transmitted signal by the channel response, and hence make the
task of data recovery from the signal non-trivial at high data rates. Among the numerous as-
pects of the channel response, the inter-symbol interference (ISI) and the timing uncertainty
constitute the primary challenges in the receiver design [14].
The ISI stems from a limited channel bandwidth (BW) compared to the data transfer
rate. Typical channels attenuate the high-frequency content of the signal more than the low-
frequency content. This frequency-dependent attenuation appears as pulse smearing in the
time domain, which causes the adjacent data symbols to superimpose on and interfere with
each other, giving rise to the term inter-symbol interference. Severe ISI causes data errors
at the receiver. To avoid these data errors, the receivers compensate the signal for high-
frequency attenuation using linear equalization or decision-feedback equalization (DFE) [2,
3]. In conventional receivers with binary sampling, this equalization is performed in the
analog domain prior to sampling. The increasing data rates lead to increasing amounts of
ISI, which in turn increases the circuit complexity of the equalizers. The high complexity
of the analog equalizer comes at the cost of high loading at the receiver input, limiting the
data rates at which this receiver can be used.
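The pulse smearing described above can be illustrated with a minimal sketch. It assumes a symbol-spaced channel model with one precursor tap, a main cursor and two postcursor taps; the tap values, the test pattern and the function names are hypothetical and are not taken from this thesis. The sketch sends an isolated '1' through a mild and a severe channel and counts the symbols that a zero-threshold slicer decides incorrectly.

```python
# Hypothetical symbol-spaced channel taps (not from the thesis): one precursor,
# the main cursor, and two postcursors.
MILD_TAPS = [0.1, 0.6, 0.3, 0.1]
SEVERE_TAPS = [0.1, 0.4, 0.4, 0.2]
MAIN = 1  # index of the main cursor within the tap list

# An isolated '1' (+1) surrounded by '0' symbols (-1).
ISOLATED_ONE = [-1, -1, -1, +1, -1, -1, -1]


def received_samples(symbols, taps, main):
    """Return one received sample per symbol: the symbols convolved with the taps."""
    samples = []
    for n in range(len(symbols)):
        acc = 0.0
        for k, tap in enumerate(taps):
            m = n + main - k  # symbol weighted by tap k at sample n
            if 0 <= m < len(symbols):
                acc += tap * symbols[m]
        samples.append(acc)
    return samples


def count_decision_errors(taps, symbols=ISOLATED_ONE, main=MAIN):
    """Slice each sample at zero and count symbols that are decided incorrectly."""
    decisions = [1 if y > 0 else -1 for y in received_samples(symbols, taps, main)]
    return sum(d != s for d, s in zip(decisions, symbols))


for name, taps in (("mild", MILD_TAPS), ("severe", SEVERE_TAPS)):
    print(name, "ISI:", count_decision_errors(taps), "decision error(s)")
```

With the mild taps, the isolated '1' is still detected, although its sample shrinks from the main-cursor value of 0.6 to about 0.1. With the severe taps, the neighbouring symbols outweigh the main cursor, the sample becomes negative, and the '1' is read as a '0'.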
Sampling the received signal with an analog-to-digital converter (ADC), instead of a
binary sampler, allows the signal to be equalized in the digital domain after sampling, thus
avoiding excessive loading at the receiver input. This ADC-based receiver topology is
well-suited for channels with high ISI. However, due to high circuit complexity and power
consumption, the ADC-based receivers find their use either in high-performance optical
interconnects [4, 5, 7] or in hard-drive read channels [15], where the severe ISI justifies
the high area and power of the ADC-based receivers. To a large degree, this high circuit
complexity comes from the clock and data recovery (CDR) system of the receiver, which
allows the receiver to recover the data in the presence of the timing uncertainties.
The timing uncertainty, or jitter, stems from the deviation of the data pulse boundaries
from their nominal time due to random and data-dependent effects [16]. In high-speed
interconnects, in order to recover the data, the receiver extracts the timing of the data symbols
from the received signal itself, rather than from a reference clock. This function is com-
monly referred to as clock and data recovery (CDR). The error-free data recovery requires
the receiver to tolerate the jitter. Typically, the CDR relies on a feedback loop to track
the average time of the data pulse boundaries, which allows it to compensate for the timing
uncertainties [14]. The high CDR complexity and power consumption prohibit the low-
cost interconnects from using the existing ADC-based receivers. This thesis proposes two
new CDR architectures that reduce the complexity and power of the ADC-based receivers,
making them an attractive option for low-cost high-speed interconnects.
1.3 Thesis Contributions
This thesis investigates techniques of reducing the complexity and power consumption of
high-speed ADC-based receivers through exploring the CDR architectures in the receivers.
This exploration results in two new CDR architectures: a 2x feed-forward CDR architec-
ture and a fractional-sampling-rate (FSR) CDR architecture, which are the two key contri-
butions of this thesis.
The first contribution is a 2x feed-forward CDR architecture [17,18]. This architecture
recovers the data from the digital samples of the signal, taken at 2x the baud rate, in a
feed-forward path, eliminating the phase-tracking feedback loop from the CDR. This elimination
of the feedback loop reduces the receiver complexity, which makes the architecture suitable
for low-cost high-speed signaling applications. Test-chip measurement results demonstrate
that the proposed CDR architecture successfully recovers data in a 5 Gb/s receiver.
The feed-forward architecture of the first contribution enables exploring non-conventional
sampling rates to reduce the ADC power consumption, which is one of the main challenges
in the design of ADC-based receivers. The non-conventional sampling rates lead to the
next contribution of this thesis.
The second contribution is a fractional-sampling-rate (FSR) CDR architecture [19].
This architecture recovers the data from the samples taken at a fractional rate between
2x and 1x the baud rate, reducing the ADC power and area compared to the 2x architecture.
Measurements of a test-chip receiver with the FSR CDR confirm a successful data recovery
at 6.875 Gb/s from the samples taken at 10 GS/s, which corresponds to sampling at 1.45x.
This sampling rate reduces the ADC power and area by 27.3 % compared to sampling at
2x.
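The figures quoted above can be cross-checked with a short calculation. The sketch below assumes that ADC power and area scale linearly with the sampling rate; that scaling is an assumption of this example, not a statement from the thesis, although it does reproduce the quoted 27.3 %. The function names are hypothetical.

```python
from fractions import Fraction


def oversampling_ratio(sample_rate_gsps, data_rate_gbps):
    """Number of samples taken per unit interval (UI)."""
    return sample_rate_gsps / data_rate_gbps


def adc_saving_vs_2x(ratio):
    """Relative ADC saving versus 2x sampling, ASSUMING cost is linear in sampling rate."""
    return 1 - ratio / 2


# 10 GS/s sampling of 6.875 Gb/s data, kept as exact fractions.
ratio = oversampling_ratio(Fraction(10), Fraction("6.875"))
saving = adc_saving_vs_2x(ratio)

print(ratio, f"= {float(ratio):.2f}x")   # 16/11 = 1.45x
print(f"{float(saving):.1%}")            # 27.3%
```

Keeping the rates as exact fractions shows that the 1.45x figure is the ratio 16/11, so that under the linear-scaling assumption the saving relative to 2x sampling is 3/11.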
1.4 Thesis Outline
The remainder of this thesis consists of four chapters. Chapter 2 provides a background
for this dissertation through an overview of a sample high-speed signaling system and its
key components. Chapters 3 and 4 present the main contributions of this work. Chapter 3
presents the 2x feed-forward CDR architecture that reduces the CDR complexity compared
to a conventional architecture. This architecture enables the second contribution of this
work. Chapter 4 presents the FSR CDR architecture that reduces the ADC power consump-
tion compared to the 2x architecture. Finally, Chapter 5 concludes this thesis and discusses
potential future research directions in the area of ADC-based receivers.
Chapter 2
Fundamentals of Clock and Data Recovery in
High-Speed Receivers
THIS CHAPTER reviews the concept of clock and data recovery (CDR) in high-speed
signaling systems, providing a background for the contributions of this thesis. The
chapter begins with a system-level look at a high-speed interconnect, which reveals that
the channel properties necessitate the two essential building blocks in a receiver: an equal-
izer and a CDR system. Then, the chapter focuses on the CDR systems for two types of
receivers: binary-sampling and ADC-based. For the binary-sampling receivers, the
phase-tracking and oversampling CDR architectures illustrate the techniques of clock and data
recovery from the binary samples of the received signal. The chapter contrasts these
binary-sampling receivers with ADC-based receivers that use digital samples of the received signal.
For the ADC-based receivers, the Mueller-Muller and interpolating feedback CDR
architectures exemplify the clock and data recovery techniques. The chapter concludes with a
brief summary.
2.1 Building Blocks of a High-Speed Receiver
To process information, electronic systems require exchanging digital data with other sys-
tems. A computer accessing a peripheral data storage is an example of such data exchange.
Figure 2.1 illustrates a simplified diagram of an interconnect that transfers the digital data
from a source to a consumer: from the computer to the data storage in our example. The
interconnect consists of a transmitter, a channel and a receiver [20]. The transmitter (TX)
converts the digital data into a form suitable for the channel, TX_OUT, and then launches the
data into the channel. The channel is a physical medium that connects the transmitter to the
receiver. This thesis assumes electrical wireline channels, such as a pair of conductors or a
trace on a printed circuit board (PCB), rather than wireless or optical channels. The channel
modifies TX_OUT with an unwanted channel response and delivers the modified signal to
the receiver. The receiver (RX) then compensates the received signal, RX_IN, for the channel
response and recovers the digital data for the consumer.
Figure 2.1: Simplified diagram of an interconnect. (Blocks: digital data from source,
transmitter (TX) with output TX_OUT, channel, receiver (RX) with input RX_IN, digital data
to consumer.)
Figure 2.2: Functional block-diagram of a high-speed receiver. (Blocks: input RX_IN,
equalizer (EQ) with output RX_EQ, clock and data recovery (CDR), digital data to consumer.)
Severe impact of the channel compromises the successful data recovery from RXIN ,
which leads to bit-errors in the digital data at the consumer side. The goal of the inter-
connect is to transfer the data with a sufficiently low bit-error rate (BER), typically with
BER < 10^-12, while the channel response modifies the transmitted signal [21].
The interconnects, in which the channel, rather than the source or consumer, restricts
the maximum data transmission rate, are commonly referred to as high-speed signaling
systems, or high-speed interconnects. Over the electrical wireline channels, the data rates
are typically restricted to several gigabits per second. This definition of the high-speed
interconnects stems from the fact that the channel properties define the essential functional
blocks required at the transmitter and receiver sides. At the receiver side, which is the focus
of this thesis, the channel necessitates two blocks: an equalizer (EQ) and a CDR system, as
shown in Figure 2.2.
The remainder of this section first takes a closer look at the channel properties in Section 2.1.1,
next it reviews the equalization techniques in Section 2.1.2, and finally it introduces
the CDR systems in Section 2.1.3 and touches upon the sample energy considerations
in Section 2.1.4.
Figure 2.3: Channel response in time and frequency domains. (Each panel shows the channel gain in dB versus frequency, with fB/2 marked, and the TXOUT and RXIN pulses versus time.)
(a) Ideal channel: pulse shape preserved, constant channel delay, TCH.
(b) Non-ideal channel: pulse shape modified (precursor and postcursor appear), channel delay has uncertainty, or jitter, of ±∆t.
2.1.1 Channel Properties
Figure 2.3 contrasts an ideal with a non-ideal channel in time and frequency domains. In
the time domain, the ideal channel transfers a data pulse from the TX side to RX side after
a constant delay, TCH, without changing the pulse shape, as illustrated in Figure 2.3(a).
In a binary signaling scheme, this pulse represents a data symbol corresponding to ‘1’,
while a pulse of opposite polarity represents a ‘0’ symbol. To transfer data at 5 Gb/s, or
equivalently at baud rate, fB, of 5 GHz, the transmitter launches the data pulses into the
channel sequentially with the pulse width, or unit interval (UI), of 200 ps. In a channel
with the delay TCH exceeding 1 UI, multiple data symbols are distributed along the channel
length. Since the ideal channel preserves the symbol shape, the symbol reaches the receiver
without interfering with the adjacent symbols. To recover the digital data, the receiver
samples this symbol TCH after the pulse is launched into the channel. In the frequency
domain, the ideal channel has a flat response: it passes all frequency components of the
data pulse equally well.
The non-ideal practical channels have a frequency-dependent attenuation similar to that
of a low-pass filter, and uncertainties in the channel delay, as shown in Figure 2.3(b). Un-
like the low-pass filters that are described in terms of their bandwidth, the channels are
commonly described in terms of their attenuation at fB/2, which is the fundamental fre-
quency when a repeating ‘1010...’ sequence is transmitted through the channel. In the time
domain, this attenuation of the high-frequency content causes the data pulses to change
their shape as they pass through the channel. Figure 2.3(b) symbolically illustrates such
pulse-shape alteration at the RX side: the sharp features of the pulse, corresponding to the
high-frequency content, become smooth and the pulse gets smeared in time, exceeding 1 UI
in duration [21]. With this alteration, the received data symbol, or cursor, is a 1-UI-long
portion of the pulse centered near the maximum amplitude of the modified pulse, while the
remaining parts of the pulse are the pre- and post-cursors of the data symbol. The pre- and
post-cursors superimpose on the surrounding data symbols in the channel, causing inter-
symbol interference (ISI). High channel attenuation at fB/2 leads to severe ISI that causes
data decision errors at the receiver, degrading the BER. To maintain a sufficiently low BER,
the receiver compensates the signal for the ISI using an equalizer [2, 3], which is the first
block of a high-speed receiver. Section 2.1.2 reviews linear and non-linear equalization
techniques.
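The cursor, pre-cursor and post-cursor terms described above can be illustrated with a short numerical sketch. The pulse-response values below are made up for illustration (they are not channel data from this thesis), and the helper names `split_cursor` and `worst_case_eye` are hypothetical. The sketch samples a smeared pulse once per UI, treats the largest sample as the cursor, and compares the cursor with the worst-case sum of the ISI magnitudes.

```python
# Illustrative only: UI-spaced samples of a smeared single-bit pulse response.
# The numbers are assumptions, not measured channel data.
pulse = [0.05, 0.60, 0.25, 0.10]  # one precursor, the cursor, two postcursors


def split_cursor(pulse):
    """Return (precursors, cursor, postcursors) around the peak sample."""
    k = max(range(len(pulse)), key=lambda i: abs(pulse[i]))
    return pulse[:k], pulse[k], pulse[k + 1:]


def worst_case_eye(pulse):
    """Cursor amplitude minus the worst-case ISI (sum of |ISI samples|).

    A positive result means a single-bit slicer still sees the correct
    sign for every data pattern; a negative result means that some data
    patterns cause decision errors unless the ISI is equalized.
    """
    pre, cursor, post = split_cursor(pulse)
    isi = sum(abs(x) for x in pre) + sum(abs(x) for x in post)
    return abs(cursor) - isi


pre, cursor, post = split_cursor(pulse)
eye = worst_case_eye(pulse)
print(f"cursor={cursor}, worst-case eye opening={eye:.2f}")
```

With these assumed values the eye remains open by a small margin; increasing the post-cursor samples, as a channel with higher attenuation at fB/2 would, drives the margin negative.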
In addition to ISI, the channel delay, TCH, varies from channel to channel by ±∆t, which
may exceed 1 UI. Furthermore, temperature and process variations increase ∆t. These
deviations of the channel delay from the nominal value contribute to the timing uncertainties,
or jitter, at the receiver. The jitter is further exacerbated by random and data-dependent
processes in all parts of the interconnect: the transmitter, channel and receiver. As a consequence,
at the receiver side, it is unknown a priori at what time the data symbols and the
boundaries between them arrive. Knowing the timing of the data symbols is essential for
the receiver to recover data with low BER. In order to compensate for jitter, the receiver
relies on a clock and data recovery (CDR) system [14], which is the second block of a
high-speed receiver. Section 2.1.3 overviews the concept of clock and data recovery, and
Sections 2.2–2.3 discuss the CDR schemes for binary-sampling and ADC-based receivers.
Figure 2.4: Equalization with a filter in the frequency domain. (Gain in dB versus frequency, with fB/2 marked, for the channel, the equalizer, and the channel + equalizer cascade from TXOUT to RXEQ.)
2.1.2 Equalization
The goal of equalization is to compensate the received signal for the channel-induced ISI,
thus preventing bit-errors at the receiver [21,22]. The equalizers are divided into linear and
non-linear. The linear equalizers typically boost the high-frequency content of the signal
using linear operations. In contrast, the non-linear equalizers rely on non-linear operations
to estimate the ISI in order to subtract it from the signal. This section reviews first the linear
equalization techniques and then the non-linear equalization.
The linear equalization can be used at the receiver side [23–25], transmitter side [26–28]
or both sides simultaneously [29–31]. Figure 2.4 demonstrates the effect of linear
equalization in the frequency domain through an example of a receiver-side equalizer. The
equalizer is a filter with a gain peaking around fB/2, such that a cascade of the channel
with the equalizer has a flat response up to fB/2. This cascade prevents the attenuation of
the high-frequency component in the equalized signal, RXEQ, and thus reduces the channel-
induced ISI.
Figure 2.5: Feed-forward equalization (FFE). (Block diagram with delay TDLY and tap weight α between RXIN and RXEQ; waveforms RXIN(t), α·RXIN(t − TDLY) and RXEQ(t) versus time.)
The linear equalizer can be implemented either as a continuous-time or a discrete-time
filter. The continuous-time linear equalizer (CTLE) is typically an amplifier with some gain
peaking near fB/2 [32]. In contrast, the discrete-time equalizer is a finite-impulse-response
(FIR) filter. Figure 2.5 illustrates a sample FIR equalizer, where the equalizer estimates the
ISI by delaying the received signal by TDLY and scaling the delayed signal by
a tap weight, α. The equalizer then subtracts this estimated ISI from RXIN to obtain the
equalized signal, RXEQ. Since the ISI estimate is fed forward, this equalizer topology is
called a feed-forward equalizer (FFE) [21, 33].
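The FFE operation described above, RXEQ(t) = RXIN(t) − α·RXIN(t − TDLY), can be sketched in discrete time. This is a minimal illustration rather than the implementation used in this work: it assumes UI-spaced samples, a delay of one sample, and a made-up pulse response whose single post-cursor equals half of the cursor. The function name `ffe_two_tap` is hypothetical.

```python
def ffe_two_tap(rx_in, alpha, delay=1):
    """RXEQ[n] = RXIN[n] - alpha * RXIN[n - delay], with zero initial history."""
    rx_eq = []
    for n, x in enumerate(rx_in):
        delayed = rx_in[n - delay] if n >= delay else 0.0
        rx_eq.append(x - alpha * delayed)
    return rx_eq


# Assumed UI-spaced pulse response: cursor 1.0 followed by a postcursor of 0.5.
pulse = [0.0, 1.0, 0.5, 0.0]
equalized = ffe_two_tap(pulse, alpha=0.5)
print(equalized)
```

In this example the first post-cursor is cancelled, but a smaller residual of opposite sign appears one UI later because the delayed post-cursor is also scaled and subtracted. Since the estimate is formed from the received signal itself, any noise in RXIN enters the estimate as well.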
The amount of gain peaking in CTLE and the ISI tap weights in FFE can be either
constant or adaptive. If the channel characteristic is known at the time of the equalizer
design, constant equalizer settings are sufficient. However, a variable channel response
necessitates adaptive equalization where the equalizer settings are adjusted to the channel
properties.
The transmitter-side equalization is commonly referred to as pre-emphasis. The pre-
emphasis is similar in principle to the receiver-side equalization with the only difference
that the equalizer precedes the channel. Pre-emphasis boosts the high-frequency content of
the signal with a CTLE or FFE before the signal is launched into the channel. Since the
channel response is impossible to estimate at the transmitter side, the pre-emphasis either
uses constant equalizer settings, or it requires a return channel such that the receiver can
feed the adaptation information to the transmitter equalizer.
Both the continuous-time and discrete-time linear equalizers suffer from noise amplifi-
cation [21]. While boosting the high-frequency content of the signal, the CTLE also boosts
up the high-frequency component of the noise in the received signal. Similarly, the ISI
estimate in the FFE contains the noise components. As a consequence, the linear equalizers
exacerbate the noise in the equalized signal. To prevent this noise amplification, the
receivers rely on non-linear equalizers.
Figure 2.6: Decision feedback equalization (DFE). (Block diagram: decision bits DTi pass through two delay elements, TDLY = 1 UI each, to produce DTi-1 and DTi-2, which are scaled by α1 and α2 and fed back to RXIN to form RXEQ.)
Figure 2.6 shows through an example the concept of decision feedback equalization
(DFE), which is a non-linear equalization technique. Similar to FFE, the DFE subtracts an
estimate of ISI from the received signal, RXIN, to get the equalized signal, RXEQ. However,
in contrast with FFE, the DFE estimates the ISI by feeding back the data decision bits DTi,
which are obtained through a non-linear slicing operation [21, 22]. This non-linear slicing
prevents the noise from affecting the estimated ISI, and therefore the DFE does not amplify
the noise while cancelling the ISI. The example shown in Figure 2.6 illustrates a 2-tap DFE.
First, the decision bits, DTi, pass through a chain of 1-UI-long delay elements, TDLY, to
generate a set of previous decision bits, DTi−1 and DTi−2. Then, the previous decision bits
are scaled by their corresponding tap weights, α1 and α2, to estimate the post-cursor ISI
contribution due to these previous bits. Finally, the DFE feeds back the ISI estimate and
subtracts it from RXIN to get RXEQ. Since the DFE relies on the previous decision bits to
estimate the ISI contribution, the DFE can cancel post-cursor ISI only. The number of DFE
taps depends on the severity of ISI in the channel. In most cases, the DFE tap weights are
adaptable to the channel response.
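The three steps of the 2-tap DFE can be sketched as follows. This is an illustrative model under stated assumptions: decisions are encoded as +1 and −1, the tap weights exactly match a made-up channel with post-cursors of 0.7 and 0.4, the signal is noiseless, and the helper names `channel` and `dfe_two_tap` are hypothetical.

```python
def channel(bits, a1, a2):
    """Assumed channel: unit cursor plus two postcursor ISI terms."""
    out = []
    for n, b in enumerate(bits):
        b1 = bits[n - 1] if n >= 1 else 0
        b2 = bits[n - 2] if n >= 2 else 0
        out.append(b + a1 * b1 + a2 * b2)
    return out


def dfe_two_tap(rx_in, alpha1, alpha2):
    """Slice each sample after subtracting ISI estimated from two prior decisions.

    Decisions are +1 / -1; the decision history starts at 0 (no ISI estimate).
    Returns (decisions, equalized_samples).
    """
    d1 = d2 = 0  # previous decisions DT[i-1] and DT[i-2]
    decisions, equalized = [], []
    for x in rx_in:
        rx_eq = x - alpha1 * d1 - alpha2 * d2   # subtract the fed-back ISI estimate
        d = 1 if rx_eq >= 0 else -1             # non-linear slicing operation
        equalized.append(rx_eq)
        decisions.append(d)
        d1, d2 = d, d1                          # shift the 1-UI delay chain
    return decisions, equalized


bits = [1, 1, 1, -1, 1, -1, -1, 1]
rx = channel(bits, 0.7, 0.4)
plain_slicer = [1 if x >= 0 else -1 for x in rx]
dfe_bits, _ = dfe_two_tap(rx, 0.7, 0.4)
slicer_errors = sum(p != b for p, b in zip(plain_slicer, bits))
print(slicer_errors, dfe_bits == bits)
```

In this noiseless example a plain slicer mis-detects the bits that follow a run of opposite polarity, while the DFE recovers the full sequence. The feedback relies on the earlier decisions being correct, which is why the sketch starts from a known (zero) history.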
The DFE successfully compensates for ISI under the condition that most of the data
decisions are correct. The data errors lead to the incorrect estimation of the ISI, which
leads to further errors [22]. Since the high-speed interconnects typically operate with BER
< 10^-12, decision errors are rare, and the DFE is an effective way of compensating for the ISI
without amplifying the noise.
The common property of the linear equalization and DFE is that both these approaches
cancel the ISI energy superimposed on the cursor energy, thus reducing the total signal
energy that reaches the decision circuit. This ISI cancellation allows the use of low-complexity
and low-cost decision circuits that recover a single bit at a time. In contrast with single-bit
detection, the sequence detection approach reuses the ISI energy in order to recover a sequence
of bits at a time. The Viterbi algorithm is an example of a maximum likelihood sequence detection
(MLSD) algorithm that is widely used in communication applications [4, 21]. The
MLSD-based receivers successfully compensate for high amounts of ISI in the received
signal at the cost of high circuit complexity. The MLSD algorithms are computationally
intensive, which leads to high receiver power consumption. As a result, the MLSD receivers
are typically used in high-performance applications or for channels with severe ISI such as
read channels in disk drives.
Since the equalizers modify the amplitude of the received signal, the position of the
equalizer with respect to the sampler depends on the sampling type in the receiver. Sec-
tion 2.2 shows that in a binary-sampling receiver, the equalizer must precede the sampler
in the analog domain, while Section 2.3 shows that in an ADC-based receiver the equalizer
can be implemented after the sampler in the digital domain. Before delving into the details
of sampling in the receivers in Sections 2.2 and 2.3, the following section reviews the basics
of clock and data recovery in a high-speed receiver.
2.1.3 Clock and Data Recovery
The role of the clock and data recovery (CDR) system in a high-speed receiver is to extract
the symbol timing from the received signal and then to use this timing for the data recovery
in the presence of timing uncertainties, or jitter, in the received signal. The magnitude of
the timing uncertainties compared to the UI determines the type of a clocking scheme and
the necessity for the CDR in an interconnect.
Figure 2.7 compares three clocking schemes at a system level. A short channel delay
compared to the UI allows for a global clocking scheme, shown in Figure 2.7(a), in which
a shared clock generator distributes a global clock to the two systems that are exchanging
data. This global clock synchronizes the transmitter with the receiver in every interconnect,
thus serving as a timing reference. With the increasing data rate and shrinking UI, the propa-
gation delay through the clock path becomes comparable to the UI, which makes the global
clocking scheme unsuitable for the high-speed interconnects. To align the propagation
delays in the clock and data paths, a source-synchronous clocking scheme, illustrated in
Figure 2.7(b), delivers a reference clock from the transmitter to receiver through a replica
of the data channel [34]. This scheme tolerates timing uncertainties of larger amplitude
compared to the UI at the price of doubling the number of channels per interconnect, thus
increasing the overall interconnect cost. To reduce this overhead, a CDR-based clocking
scheme, shown in Figure 2.7(c), first recovers a reference clock, CLK, from the received
data signal, and then uses this clock to recover the data, DT [14, 21, 22]. The CDR relies on
the clock embedded in the actual data stream in the form of data transitions. Compared to
the source-synchronous scheme, the CDR-based clocking scheme tolerates higher jitter
amplitudes and eliminates the need for a dedicated clock channel, thus reducing the
channel cost. Since the channel constitutes one of the dominant costs in the interconnect,
a large majority of the low-cost high-speed interconnects use the CDR-based clocking
scheme.
Figure 2.7: Clocking schemes. (Each panel shows System 1 and System 2 exchanging data in both directions.)
(a) Global scheme: a shared clock is distributed to both systems.
(b) Source-synchronous scheme: each transmitter sends a clock alongside the data.
(c) Clock and data recovery (CDR) scheme: a CDR at each receiving side produces CLK and DT from the data.
The choice of the CDR scheme for an interconnect depends on the receiver type. Figure 2.8
classifies the high-speed receivers based on the sampling circuit and on the clock
synchronization. The binary-sampling receivers sample the signal with a binary sampling
circuit, such as a flip-flop, while the ADC-based receivers sample with an analog-to-digital
converter (ADC), which is a multi-level sampling circuit. Both the binary-sampling and
ADC-based receivers are further classified into the phase-tracking and blind-sampling
categories. The phase-tracking receivers align a local clock with the received signal using
a phase-tracking feedback loop in order to synchronize the signal samples with the data
symbols. The blind-sampling receivers, in contrast, sample the signal with a clock that
is free-running, or blind, with respect to the received symbol boundaries. Figure 2.8 lists
four CDR examples corresponding to the four categories of the high-speed receivers. The
figure also annotates the sampling rates for every receiver category. A phase-tracking CDR [35]
and an oversampling CDR [36, 37] illustrate the clock and data recovery in the binary-sampling
receivers in Section 2.2, while a Mueller-Müller CDR [38] and an interpolating feedback
CDR [39, 40] serve as examples for the ADC-based receivers in Section 2.3.
Figure 2.8: Classification of high-speed receivers with corresponding CDR examples.
Receiver type     Clock synchronization   CDR example                  Sampling rate
Binary-sampling   Phase-tracking          Phase-tracking CDR           2x
Binary-sampling   Blind-sampling          Oversampling CDR             > 2x
ADC-based         Phase-tracking          Mueller-Müller CDR           1x
ADC-based         Blind-sampling          Interpolating feedback CDR   2x or less
2.1.4 Signal Energy Considerations
The sampling scheme in a receiver determines the amount of signal energy captured in the
samples in every UI. The signal energy per UI depends on two parameters: the number of
samples per UI and the position of the samples with respect to the UI boundaries. The exact
sample energy also depends on the received data pattern. However, a repeating ‘1010...’ sequence,
which can be approximated by a sinusoid, typically corresponds to the worst-case
signal energy when the signal passes through a band-limited channel. This approximation
of the received signal with a sinusoid simplifies the estimation of the worst-case sample
energy for a given sampling scheme. The sample energy is proportional to the signal amplitude
squared at the sampling instance.
Figure 2.9: Binary sampling. (Example: RXIN waveform with binary samples 1 1 0 0 0 0 1 1 1.)
With these simplifications, it is possible to compare the sampling schemes based on
the sample energy per UI. As an example, 2x phase-tracking sampling (with one sample
aligned with the UI center and the other sample aligned with the UI boundary) yields the
same sample energy per UI as the baud-rate phase-tracking sampling (with the sample
aligned with the UI center). This energy equivalence stems from the fact that the sample
aligned with the UI boundary yields zero signal energy. Furthermore, it is possible to show
that 2x blind sampling yields the same sample energy per UI as baud-rate and 2x phase-
tracking sampling schemes.
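The energy equivalence stated above can be checked with a short worked equation. The following is a sketch under the sinusoidal approximation already introduced: the ‘1010...’ pattern is modelled as a sinusoid with a period of 2 UI whose zero-crossings coincide with the UI boundaries. The amplitude A, the unit proportionality constant between energy and amplitude squared, and the phase offset θ of the first blind sample from a UI boundary are symbols assumed here for illustration.

```latex
\begin{align*}
x(t) &= A \sin\!\left(\frac{\pi t}{\mathrm{UI}}\right) \\
E_{1\times,\,\text{phase-tracking}} &= x^2\!\left(\tfrac{\mathrm{UI}}{2}\right) = A^2 \\
E_{2\times,\,\text{phase-tracking}} &= x^2\!\left(\tfrac{\mathrm{UI}}{2}\right) + x^2(\mathrm{UI}) = A^2 + 0 = A^2 \\
E_{2\times,\,\text{blind}} &= A^2 \sin^2\theta + A^2 \sin^2\!\left(\theta + \tfrac{\pi}{2}\right)
  = A^2\left(\sin^2\theta + \cos^2\theta\right) = A^2
\end{align*}
```

Two samples spaced by UI/2 are separated by a quarter of the sinusoid period, so their squared amplitudes always sum to A² regardless of θ; this is why the blind 2x scheme matches the phase-tracking schemes under this approximation.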
The signal energy per UI can be used to estimate the signal to noise ratio (SNR) in
the samples. SNR is commonly used to characterize analog circuits, such as amplifiers,
filters and analog-to-digital converters. However, this metric is not common in CDR sys-
tems. This thesis will briefly mention the signal energy per UI in the context of fractional
sampling rate CDRs in Chapter 4.
2.2 CDR Architectures for Binary-Sampling Receivers
A binary-sampling receiver takes binary samples of the received signal, preserving the sign
of the signal at the sampling instances and discarding the signal amplitude, as illustrated in
Figure 2.9. A relatively simple circuit, such as a flip-flop, is sufficient for this binary sam-
pling [35], which makes the binary-sampling receivers an attractive option for low-power
applications. Section 2.1.2 demonstrated that the equalization requires access to the amplitude
of the received signal. Since the binary sampler preserves only the sign of the signal,
the equalizer precedes the sampler in the binary-sampling receivers. As a consequence, all
the signal equalization is performed in the analog domain, with the equalizer loading the
high-speed input node of the receiver. This input loading, in turn, limits the amount of the
ISI compensation that can be practically implemented in an integrated receiver. In addition,
the analog circuits of the equalizer scale poorly with the IC technology scaling.
Figure 2.10: Simplified block-diagram of a phase-tracking CDR. (Blocks: EQ, binary sampler, PD, loop filter, VCO; signals: RXIN, DTREC, CKREC, φERR, VCTRL.)
The binary nature of sampling requires the receiver to rely only on the signal signs to
recover the clock and data. The remainder of this section reviews the CDR architectures for
the binary-sampling receivers through two examples: a conventional phase-tracking CDR
in Section 2.2.1, and an oversampling CDR in Section 2.2.2.
2.2.1 Phase-Tracking CDR Architecture
Figure 2.10 shows a simplified block-diagram of a phase-tracking CDR [35]. First, an
equalizer compensates the received signal, RXIN, for the channel ISI. Then, a recovered
clock, CKREC, triggers a binary sampler to take two samples of the signal in every UI such
that one of these samples is aligned with the UI center, while the other is aligned with the UI edge.
Next, a phase detector (PD) uses these binary samples to estimate the phase error, φERR,
which is the difference between the phase of the received data and the phase of the recovered
clock. A low-pass loop filter averages φERR to generate a control voltage, VCTRL, for a
voltage-controlled oscillator (VCO). Finally, the VCO adjusts the phase of CKREC to align
it with the received data phase, thus closing the phase-tracking feedback loop. Since the
samples are aligned with the UIs, the sample in the UI center becomes the recovered data,
DTREC. The two essential components of this CDR, the phase detector and the phase-tracking
loop, are discussed in greater detail next.
Figure 2.11 demonstrates the phase detection algorithm in a binary-sampling phase-tracking
CDR. The PD uses the transitions between the distinct data symbols in RXIN (‘1’ to
‘0’ or ‘0’ to ‘1’) in order to detect if the recovered clock, CKREC, is early or late compared to
RXIN. The PD uses the samples taken twice per UI: at the rising and falling edges of CKREC.
The samples taken at the rising edge, Di, are close to the UI center, while the samples taken
at the falling edge, Ei, are close to the UI edge. When two consecutive UI-center samples
are distinct, as in the example of Di and Di+1, the PD compares these samples with the
UI-edge sample between them, Ei. If Di and Ei are identical, the PD indicates that CKREC
is early compared to RXIN, as illustrated in Figure 2.11(a). Conversely, if Ei and Di+1
are identical, the PD indicates that CKREC is late, as shown in Figure 2.11(b). When two
consecutive UI-center samples are identical, as in the example of Di+1 and Di+2, the PD
holds its previous output. Since this type of PD detects only the sign of the phase error in a
non-linear binary manner (clock early or late), and not the magnitude of the phase error, it is
commonly referred to as a bang-bang PD. To align CKREC with RXIN using this bang-bang
PD, the CDR relies on a phase-tracking feedback loop, which effectively averages the PD
characteristic, so that the PD and the entire loop can be approximated with a linear model.
Figure 2.11: Phase detection in the phase-tracking CDR.
(a) Clock early: DiEiDi+1 = 110 or 001.
(b) Clock late: DiEiDi+1 = 100 or 011.
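The early/late decision rule described above can be written as a small truth-table function. This is a minimal sketch: the +1 (early) and −1 (late) output encoding and the function name `bang_bang_pd` are assumptions made for illustration, not part of the referenced design.

```python
def bang_bang_pd(d_i, e_i, d_next, previous=0):
    """Return +1 if the clock is early, -1 if it is late, else hold `previous`.

    d_i, d_next: consecutive UI-center samples (0 or 1)
    e_i:         the UI-edge sample taken between them (0 or 1)
    """
    if d_i == d_next:
        return previous   # no data transition: hold the previous output
    if e_i == d_i:
        return +1         # edge sample equals the earlier bit: clock is early
    return -1             # edge sample equals the later bit: clock is late


# The four transition patterns D(i) E(i) D(i+1) from the text.
for pattern in [(1, 1, 0), (0, 0, 1), (1, 0, 0), (0, 1, 1)]:
    print(pattern, bang_bang_pd(*pattern))
```

The output carries only the sign of the phase error, so a loop filter has to average many such decisions before the clock phase is adjusted.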
Figure 2.12 presents a linearized signal flow diagram of the phase-tracking loop [14].
The input to the system is the phase of the received data, φIN(s), and the output is the
phase of the recovered clock, φREC(s). In this diagram, a subtracter followed by a gain,
KPD, approximates the sampler and the PD; the loop filter has a low-pass transfer function
H(s); and KVCO/s models the VCO. Using this negative feedback loop, φREC(s) tracks the
low-frequency jitter in φIN(s), and attenuates the high-frequency jitter. First, the subtracter
detects the phase error, φERR(s), between the input and output phases. Then, the low-pass
filter averages φERR(s) into the control voltage, VCTRL(s), for the VCO. Finally, the VCO
adjusts φREC(s) by changing its oscillation frequency, closing the feedback loop. The transfer
function of this phase-tracking loop is commonly referred to as the jitter transfer function,
and it can be written as:
$$\frac{\phi_{REC}(s)}{\phi_{IN}(s)} = \frac{K_{PD} K_{VCO} H(s)}{s + K_{PD} K_{VCO} H(s)}. \tag{2.1}$$
Typically, this jitter transfer is a low-pass transfer function.
Figure 2.12: Phase-tracking feedback loop. (Blocks: subtracter and gain KPD (sampler and PD), loop filter H(s), VCO KVCO/s; signals: φIN(s), φERR(s), VCTRL(s), φREC(s).)
Since the recovered clock triggers the sampler to take a sample at the UI center for the
data recovery, an error-free data recovery requires φREC(s) to closely follow φIN(s). In fact,
the CDR makes a data decision error when φREC(s) deviates from φIN(s) by 0.5 UI or more
in either direction. Equivalently, the condition for the error-free data recovery is:
$$\left|\phi_{REC}(s) - \phi_{IN}(s)\right| < 0.5\,\mathrm{UI}. \tag{2.2}$$
This condition, in combination with (2.1), allows evaluating the maximum amplitude of
the sinusoidal data jitter that the CDR can tolerate (in UI):
$$\left|\phi_{IN}(s)\right| < 0.5 \left| 1 + \frac{K_{PD} K_{VCO} H(s)}{s} \right|. \tag{2.3}$$
Expressed as a peak-to-peak value in UIPP, this limit is commonly referred to as the jitter
tolerance:
$$\mathrm{JitTol}(s) = \left| 1 + \frac{K_{PD} K_{VCO} H(s)}{s} \right|, \tag{2.4}$$
which is the maximum sinusoidal jitter that the CDR can tolerate at a given frequency
without making data decision errors.
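Equations (2.1) and (2.4) can be evaluated numerically at s = jω to see the low-pass jitter transfer and the corresponding tolerance. The sketch below is illustrative only: it assumes the simplest loop filter, H(s) = 1, so that the loop bandwidth ω0 equals KPD·KVCO, and the loop-gain value `K_LOOP` is a made-up number rather than a parameter of any circuit in this thesis.

```python
import math


def jitter_transfer(omega, k_loop, h=lambda s: 1.0):
    """|phi_REC / phi_IN| from (2.1) at s = j*omega, with k_loop = KPD * KVCO."""
    s = 1j * omega
    g = k_loop * h(s)
    return abs(g / (s + g))


def jitter_tolerance_uipp(omega, k_loop, h=lambda s: 1.0):
    """Peak-to-peak jitter tolerance in UI from (2.4) at s = j*omega."""
    s = 1j * omega
    return abs(1 + k_loop * h(s) / s)


K_LOOP = 2 * math.pi * 5e6  # assumed loop gain in rad/s (hypothetical value)
for ratio in (0.01, 1.0, 100.0):
    w = ratio * K_LOOP
    print(ratio, jitter_transfer(w, K_LOOP), jitter_tolerance_uipp(w, K_LOOP))
```

Under this first-order assumption the tolerance grows as the jitter frequency drops below ω0 and approaches 1 UIPP well above ω0, which matches the qualitative shape described for the phase-tracking CDR.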
Figure 2.13: Jitter transfer and tolerance of the phase-tracking CDR.
(a) Jitter transfer: φREC(s)/φIN(s) in dB versus jitter frequency, with 0 dB and ω0 marked.
(b) Jitter tolerance: jitter amplitude in UIPP versus jitter frequency, with 1 UIPP and ω0 marked.
Figure 2.13 symbolically illustrates the jitter transfer and jitter tolerance of the conven-
tional phase-tracking CDR. The jitter transfer in Figure 2.13(a) reflects that the CDR tracks
the low-frequency input jitter as long as the jitter frequency remains below ω0, which is
the bandwidth of (2.1); and that the CDR attenuates the high-frequency jitter exceeding
ω0. The jitter tolerance in Figure 2.13(b) shows that the CDR tolerates over 1 UIPP of low-
frequency jitter below ω0, and up to 1 UIPP of high-frequency jitter above ω0. A number
of standards for high-speed interconnects include specifications for the jitter transfer and
jitter tolerance since they are convenient means for quantifying the CDR’s ability to recover
data with a low BER in the presence of jitter. Unlike the analytical jitter tolerance that re-
flects only the system-level limitations of the CDR, the simulated and measured tolerances
also reveal the amount of the jitter tolerance reduction due to the circuit implementation
of the CDR. The simulated and measured jitter tolerances are used to validate the CDR
architectures proposed in this thesis.
The phase-tracking CDR architecture is typically implemented in the analog domain:
high-speed flip-flops followed by a charge pump implement the PD, while an RC network
implements the loop filter. As a consequence, this architecture is challenging to scale between
the IC technology nodes, and therefore it takes only partial advantage of the IC technology
scaling. The oversampling CDR architecture, in contrast, is implemented entirely
in the digital domain, taking full advantage of the technology scaling. The following
subsection takes a closer look at the oversampling CDR.
Figure 2.14: Simplified block-diagram of an oversampling CDR. (Blocks: EQ, binary sampler, PD, N:1 selector, blind sampling clock with N > 2 phases; signals: RXIN, φPICK, DTREC.)
Figure 2.15: Phase detection in the 3x oversampling CDR. (RXIN sampled by clock phases CK φ = 0, 1/3 and 2/3; data zero-crossing phase 1/3 ≤ φ ≤ 2/3; φPICK = 0; recovered data Di to Di+4.)
2.2.2 Oversampling CDR Architecture
Figure 2.14 shows a simplified block-diagram of an oversampling CDR, which is a blind
binary-sampling CDR [36, 37]. First, a multi-phase sampling clock triggers the binary
sampler to oversample the equalized received signal by a factor of N above the data rate,
i.e. to take N samples per UI. Since the sampling instances have no phase relation to the
received signal, RXIN, this type of sampling is referred to as blind sampling. Then, out
of N clock phases, the PD identifies the data-picking phase, φPICK, which is the closest
phase to the UI center. Finally, an N-to-1 selector takes the sample corresponding to φPICK
as the recovered data, DTREC. Since there is no feedback in this CDR, it is also called a
feed-forward CDR. The key component in this CDR architecture is the PD.
Figure 2.15 demonstrates the phase detection algorithm through a 3x oversampling
example, i.e. N = 3. A three-phase blind sampling clock, CK, with phases 0, 1/3 and
2/3 UI triggers the sampler at the rising edge of each phase to take the total of 3 samples
per UI. The PD then performs the XOR function on each pair of adjacent samples to find
the two clock phases closest to the data symbol boundaries. In this example, the symbol
boundaries occur between the 1/3 and 2/3 UI phases of the sampling clock. The binary nature
of sampling allows detecting only the range, rather than the exact value, of the data
zero-crossing phase. A circular phase diagram in Figure 2.15 highlights this range with a shaded
sector between φ = 1/3 and φ = 2/3. The PD identifies the most distant phase from the
zero-crossing range as the data-picking phase, φPICK, since this phase is the closest to the
UI center. In our example, φPICK = 0, and the samples taken at the 0 UI phase of the
clock become the recovered data, Di (highlighted with circles in the figure). This data-picking
scheme results in a correct decision as long as both zero-crossings surrounding
φPICK occur within the shaded sector in the circular diagram of Figure 2.15. A deviation
of two consecutive zero-crossings by the total of 1/3 UI, or cycle-to-cycle jitter of 1/3 UI,
leads to a data-decision error. This observation allows estimating the high-frequency jitter
tolerance of a 3x oversampling CDR.
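The XOR-based phase detection and data picking can be sketched in a few lines. This is an illustration under stated assumptions rather than the detector of [36, 37]: the XOR results are accumulated as votes over the whole sample window, only odd oversampling ratios are handled so that a single phase is farthest from the zero-crossing range, and the names `pick_phase` and `recover` are hypothetical.

```python
def pick_phase(samples, n=3):
    """Return the data-picking phase index (0..n-1) for blind n-x samples.

    samples: flat list of binary samples, n per UI; phase = index % n.
    """
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd oversampling ratio")
    votes = [0] * n
    for k in range(len(samples) - 1):
        # XOR of adjacent samples flags a zero-crossing between
        # phase k % n and the following phase.
        votes[k % n] += samples[k] ^ samples[k + 1]
    p = max(range(n), key=lambda i: votes[i])  # crossing range lies between p and p+1
    return (p + (n + 1) // 2) % n              # phase farthest from that range


def recover(samples, n=3):
    """Return (phi_pick, recovered bits) by taking one sample per UI."""
    phase = pick_phase(samples, n)
    return phase, samples[phase::n]


# Assumed 3x stream whose symbol boundaries fall between phases 1/3 and 2/3.
bits = [1, 0, 0, 1, 0, 1, 1, 0]
samples = [bits[0]] * 2 + [b for bit in bits[1:] for b in [bit] * 3]
phase, data = recover(samples)
print(phase, data == bits)
```

As in the example of Figure 2.15, all transitions fall between the 1/3 and 2/3 UI phases, so phase 0 is picked and its samples reproduce the transmitted bits.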
Since the sampling clock is free-running with respect to the data in the oversampling
CDR, some phase and frequency offsets between the received data and the sampling clock
are inevitable. The blind-sampling CDRs compensate for these phase and frequency offsets
using a FIFO buffer and a data flow-control technique. A FIFO buffer retimes the recov-
ered data from the transmission rate to the consumption rate by the data consumer at the
output of the interconnect. This technique is limited by the FIFO size and it is suitable to
compensate primarily for the phase offsets. The data flow-control technique compensates
for small frequency offsets between the transmitter and receiver by means of negotiating
the data flow rate from the data source to consumer. The source adjusts the rate at which
the information bits are sent over the interconnect by inserting the padding bits into the
data stream. The padding bits carry no information and they only serve the purpose of com-
pensating for the frequency offsets. The data consumer then eliminates these padding bits
(if necessary) from the recovered data. The flow-control is typically implemented at the
level of the data source and consumer, and it does not affect the interconnect or the CDR
architecture.
Figure 2.16 shows the ideal jitter tolerance of a 3x oversampling CDR. The jitter tolerance
is limited at low and high frequencies by two distinct effects. At low frequencies,
below ω1, the FIFO restricts the maximum jitter tolerance to the FIFO size, while at high
frequencies, above ω2, the phase detection and data decision schemes restrict the jitter
tolerance to 1/3 UIPP. The values of ω1 and ω2 as well as the jitter tolerance between these
two frequencies depend on the implementation of the phase detector. This dependence is
thoroughly analyzed in [41].
Figure 2.16: Jitter tolerance of the 3x oversampling CDR. (Jitter amplitude in UIPP versus jitter frequency, with the FIFO size, 1/3 UIPP, ω1 and ω2 marked.)
Figure 2.17: Sampling with an ADC. (Example: RXIN waveform with ADC samples 1, 0.6, 0, 0, 0, 0.4, 1, 1, 1.)
The blind oversampling CDR architecture is typically implemented entirely in the dig-
ital domain, which allows this architecture to take advantage of the IC technology scaling.
This blind-sampling CDR architecture comes at the cost of reduced jitter tolerance at high
frequency compared to the phase-tracking CDR. Moreover, the oversampling requires a
multi-phase clock generation and distribution scheme which increases the circuit complex-
ity.
Similar to the conventional phase-tracking CDR, the oversampling CDR equalizes the
received signal in the analog domain prior to sampling. This analog equalization limits
the amount of channel compensation that can be practically implemented in an integrated
receiver. Sampling the received signal with an ADC, instead of a binary sampler, allows
for the equalization in the digital domain after sampling. The following section is devoted
to the CDR architectures for the ADC-based receivers.
2.3 CDR Architectures for ADC-Based Receivers
An ADC-based receiver samples the received signal with an ADC, preserving both the sign
and amplitude of the signal at the sampling instances, as shown in Figure 2.17 [38–40].
Figure 2.18: Simplified block-diagram of a Mueller-Muller CDR. (Blocks: ADC, EQ, PD, loop
filter, DAC, VCO / PI, and data decision; signals: RXIN, CKREC, φERR, φAVG, VCTRL, DTREC.)
The signal amplitude captured in the samples allows for the integration of extensive digital
signal processing (DSP) into the receiver to compensate for high channel distortion after
the signal is sampled [4, 5]. Furthermore, the equalization in the digital domain simplifies
the circuit design and takes full advantage of the IC technology scaling [7]. At the time
of writing, these benefits of the ADC-based receivers come at the cost of higher
power consumption by the ADC sampler compared to the binary-sampling flip-flops.
The digital samples capture more information about the signal at the sampling instances
than the binary samples. This extra information in the samples allows the ADC-based
receivers to recover the clock and data either using the phase-tracking sampling at 1x the
baud rate or using the blind sampling at 2x the baud rate. The remainder of this section
reviews the CDR architectures for the ADC-based receivers through two examples: a phase-
tracking Mueller-Muller CDR in Section 2.3.1, and a blind-sampling interpolating feedback
CDR in Section 2.3.2.
2.3.1 Mueller-Muller CDR Architecture
Figure 2.18 shows a simplified block-diagram of the Mueller-Muller CDR [4–6, 38]. Sim-
ilar to the conventional phase-tracking CDR, the Mueller-Muller architecture relies on a
phase-tracking feedback loop to align the recovered clock, CKREC, with the data symbols
in the received signal, RXIN. First, CKREC triggers the ADC to sample RXIN once per UI,
i.e. at 1x the baud rate, such that the samples are close to the UI centers. Then a digital
equalizer compensates the signal samples for the channel ISI. Next, the PD generates the
phase error, φERR, which indicates the deviation of the sampling instances from the UI
centers. A digital loop filter then averages φERR to obtain the average error between the data
and sampling phases, φAVG. A digital-to-analog converter (DAC) converts φAVG into the
analog control voltage, VCTRL, for the VCO or the phase interpolator (PI). Finally, the VCO
or PI adjusts the phase of CKREC to align it with the UI centers in RXIN, thus closing the
feedback loop. Since the samples are at the UI centers, the data decision algorithm uses
these samples to output the recovered data, DTREC. This feedback loop can be analyzed at
the system level similar to that of the conventional phase-tracking CDR. The phase
detection approach, however, significantly differs from the bang-bang PD. The Mueller-Muller
phase detection scheme is reviewed next.
Figure 2.19: Mueller-Muller timing recovery from an impulse response, with samples h−1, h0
and h1 spaced 1 UI apart: (a) τA = h−1 − h1; (b) τB = h−1.
The PD extracts the timing information from the digital baud-rate samples of the re-
ceived signal to drive the phase-tracking loop [38]. This timing recovery from the baud-
rate samples is illustrated in two steps. First, the timing recovery scheme from a channel
impulse response is shown; second, this scheme is extended to a continuous data stream.
Figure 2.19 illustrates the timing recovery from an impulse response. Some small amount
of ISI in the form of pre-cursor, h−1, and post-cursor, h1, is vital for the Mueller-Muller tim-
ing recovery scheme. In case of a symmetric ISI around the symbol cursor, h0, shown in
Figure 2.19(a), the sampling phase that makes the pre- and post-cursors equal, i.e. h−1 = h1,
places h0 close to the maximum of the impulse response, which is the desired sampling
phase. Hence, a function
$$\tau_A = h_{-1} - h_{1} \qquad (2.5)$$
indicates if the sampling phase is early or late for the optimum sampling, and it can guide
the timing recovery. However, replacing this symmetric-ISI channel with an asymmetric-
ISI channel, shown in Figure 2.19(b), prevents the timing function τA in (2.5) from correctly
detecting the data phase, and the phase tracking loop from converging. This asymmetric-ISI
channel requires a different timing function for a successful timing recovery. In the
example of Figure 2.19(b), a sampling phase that makes h−1 = 0 places the cursor sample,
h0, close to the maximum of the impulse response. Therefore, a function
$$\tau_B = h_{-1} \qquad (2.6)$$
recovers the signal timing in the case of an asymmetric impulse response. These two
functions, τA and τB, illustrate that for a successful timing recovery from the baud-rate
samples the timing function needs to match the channel impulse response.
Figure 2.20: Mueller-Muller timing recovery from continuous data (samples 1 UI apart).
(a) Early sampling: yn−1 = 0.75, ŷn−1 = 1, yn = −0.5, ŷn = −1, τA = (0.75 × −1) − (−0.5 × 1) = −0.25.
(b) Correct sampling: yn−1 = 1, ŷn−1 = 1, yn = −1, ŷn = −1, τA = (1 × −1) − (−1 × 1) = 0.
(c) Late sampling: yn−1 = 0.5, ŷn−1 = 1, yn = −0.75, ŷn = −1, τA = (0.5 × −1) − (−0.75 × 1) = 0.25.
As the second step of illustrating the Mueller-Muller timing recovery scheme, it can
be shown that a simple operation can estimate the timing function from a continuous data
stream in the average sense [38]. For instance, the timing function τA in (2.5) can be
estimated using
$$\tau_A = (y_{i-1} \cdot \hat{y}_{i}) - (y_{i} \cdot \hat{y}_{i-1}), \qquad (2.7)$$
where yi and yi−1 are the signal samples, while ŷi and ŷi−1 are the corresponding decision
bits. Figure 2.20 illustrates this timing recovery through a simplified example of two
consecutive pulses. The pulses exceed 1 UI in duration and therefore they interfere with each
other, forming some ISI. In this example, the timing function, τA, becomes negative, zero
or positive depending on the alignment between the sampling instances and the UI centers.
In a similar way, other timing functions can be evaluated in the average sense from the
samples of the continuous data stream.
Figure 2.21: Jitter tolerance of the Mueller-Muller CDR. (Axes: jitter amplitude in UIPP
versus jitter frequency; marked values: 1 on the amplitude axis and ω0 on the frequency axis.)
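The estimate in (2.7) can be checked numerically against the three cases of Figure 2.20. The following is a minimal Python sketch, not the implementation of any cited receiver; the function names `mm_timing_function` and `slice_bit` are hypothetical, and the slicer threshold of zero is an assumption.

```python
def slice_bit(y):
    """Decision bit for a sample: +1 or -1 (assumed slicer threshold of 0)."""
    return 1 if y >= 0 else -1


def mm_timing_function(y_prev, y_curr):
    """Timing estimate per (2.7): y[i-1] * d[i] - y[i] * d[i-1],
    where d[] are the decisions taken on the samples y[]."""
    d_prev, d_curr = slice_bit(y_prev), slice_bit(y_curr)
    return y_prev * d_curr - y_curr * d_prev


# Sample pairs (y[n-1], y[n]) read from Figure 2.20.
cases = {
    "early": (0.75, -0.5),
    "correct": (1.0, -1.0),
    "late": (0.5, -0.75),
}

for name, (y_prev, y_curr) in cases.items():
    print(name, mm_timing_function(y_prev, y_curr))
```

The sign of the result (negative, zero, positive) follows the early, correct and late cases of the figure; in a receiver this value would be averaged over many symbols before it steers the loop.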
Since in the Mueller-Muller CDR the samples are aligned with the UI centers and the
CDR adjusts its sampling frequency to the received data rate, the jitter tolerance of the
Mueller-Muller CDR is similar to that of a phase-tracking CDR. For convenience, this
jitter tolerance is repeated in Figure 2.21. At low frequency, the jitter tolerance of the
Mueller-Muller CDR is determined by the bandwidth of the phase-tracking loop, while at
high frequency the jitter tolerance is limited to 1 UIPP.
The Mueller-Muller CDR architecture is well suited for the high-performance intercon-
nects. The digital equalizer implementation satisfies the need for extensive compensation
for the channel ISI at high data rates. Typically, the standards for the high-performance
interconnects have a well defined channel response, which allows using a single timing
function in the Mueller-Muller CDR for the given interconnect standard. The low-cost
interconnects, in contrast, impose fewer restrictions on the channel characteristics. Some
standards, such as USB, SATA and HDMI [9,11,12], allow the end-user to pick the channel,
which can range in length from several centimeters to several meters. Such a wide range
of channel characteristics within a single standard makes it challenging to use the Mueller-
Muller CDR architecture for the low-cost interconnects since a single timing function is
unlikely to suit the entire range of possible channel responses. The high complexity of the
phase-tracking loop that spans across the analog and digital domain boundaries (see
Figure 2.18) contributes to the design and verification costs of the receivers with the
Mueller-Muller CDR. This poses further challenges in adopting the Mueller-Muller architecture for
the low-cost interconnects. The interpolating feedback CDR architecture, which is the
subject of the following subsection, is better suited for the low-cost interconnects since it has a
low sensitivity to the channel response and offers an all-digital circuit implementation.
Figure 2.22: Simplified block-diagram of an interpolating feedback CDR. (Blocks: ADC, EQ,
interpolator, PD, loop filter, interpolation index updater, and data decision, driven by a
blind sampling clock; signals: RXIN, φERR, φAVG, µ, DTREC.)
Figure 2.23: Blind and interpolated samples in the interpolating feedback CDR. (Blind
samples Si to Si+8 of RXIN and interpolated samples Ii to Ii+7.)
2.3.2 Interpolating Feedback CDR Architecture
Figure 2.22 shows a simplified block-diagram of the interpolating feedback CDR [39, 40].
This CDR samples the received signal blindly, and then emulates the phase-tracking in the
digital domain using interpolation. The blind nature of sampling rules out the sampling at
1x the baud rate since in the worst case, the baud-rate samples might fall on the UI edges,
which makes the error-free data recovery practically impossible. Hence, the interpolating
feedback CDRs typically sample the received signal at 2x. A digital equalizer then
compensates the signal samples for the channel ISI. To recover the clock and data, the CDR
interpolates a new set of samples, Ii, between the blind samples, Si, as illustrated in
Figure 2.23. Every UI has two interpolated samples, Ii, such that one sample is close to the UI
center, while the other sample is close to the UI edge. The PD then uses the interpolated
UI-edge samples to detect the phase error, φERR, which is the deviation of the interpolated
samples from the symbol boundaries in RXIN (see Figure 2.22). A digital loop filter then
recovers the average zero-crossing phase φAVG from φERR. An interpolation index updater
converts the recovered φAVG into an interpolation index, µ. This index adjusts the position
of the interpolated samples with respect to the blind samples to align Ii with the UI
boundaries, thus closing the digital feedback loop. The data decision block uses the interpolated
UI-center samples to generate the recovered data, DTREC. The interpolator in the feedback
loop enables this CDR architecture to emulate the phase-tracking entirely in the digital
domain. The interpolation operation is briefly discussed next.
Figure 2.24: Linear interpolation, Ii = (1 − µ)Si + µSi+1. (Example: Si = −3, Si+1 = 5,
µ = 0.45, Ii = 0.6.)
Figure 2.24 illustrates the function of the interpolator through an example of the first order,
i.e. linear, interpolation [39]. The inputs to the interpolator are two blind signal samples,
Si and Si+1, and the interpolation index, µ, which ranges from 0 to 1. The interpolator
joins the two samples with a line, and outputs the amplitude of this line at the interpolation
point, Ii, defined by the interpolation index, µ, as the proportion of the half-UI time interval
between Si and Si+1. The interpolator computes its output, Ii, according to
$$I_i = (1-\mu)\,S_i + \mu\,S_{i+1}, \qquad (2.8)$$
which requires two multipliers and two adders, amounting to a high circuit complexity of
the interpolator.
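The linear interpolation of (2.8) can be illustrated with the sample values of Figure 2.24. This is a minimal Python sketch in floating point; the function name `linear_interpolate` is hypothetical, and a hardware interpolator would operate on fixed-point samples instead.

```python
def linear_interpolate(s_i, s_next, mu):
    """Linear interpolation per (2.8): value on the line joining s_i and
    s_next at the fraction mu (0 to 1) of the interval between them."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("interpolation index mu must lie in [0, 1]")
    return (1.0 - mu) * s_i + mu * s_next


# Values from Figure 2.24: Si = -3, Si+1 = 5, mu = 0.45.
i_i = linear_interpolate(-3, 5, 0.45)
print(round(i_i, 3))  # close to the 0.6 shown in the figure
```

The two products and the sum in the return statement, together with the subtraction that forms 1 − µ, correspond to the multipliers and adders counted in the text.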
At the cost of significantly increasing the interpolation order beyond linear, this
interpolating feedback CDR architecture is able to recover the data from the samples taken at
rates between 1x and 2x the baud rate [39]. The high interpolation order, reaching 8th order
or higher in some examples [42], causes high system complexity and power consumption,
which makes this approach impractical for high-speed low-cost interconnects.
Figure 2.25: Jitter tolerance of the interpolating feedback CDR. (Axes: jitter amplitude in
UIPP versus jitter frequency; marked values: FIFO size, 1 and 0.5 on the amplitude axis,
ω1 and ω2 on the frequency axis.)
The jitter tolerance of the interpolating feedback CDR is shown in Figure 2.25. The
shape of this jitter tolerance curve is determined by four constraints. The first constraint
is due to the blind nature of sampling in the interpolating feedback CDR. With blind sam-
pling, the CDR’s sampling clock may have some frequency offset with respect to the trans-
mitter clock. As a consequence, similar to an oversampling CDR in Section 2.2.2, the
low-frequency jitter tolerance of the interpolating feedback CDR is limited by the size of
the data-retiming FIFO. The second constraint limiting the jitter tolerance is imposed by
the phase-tracking loop. This constraint is similar to that of a phase-tracking CDR in Sec-
tion 2.2.1. The loop properties also define the frequency range ω1 – ω2 in which the loop
determines the jitter tolerance. The third constraint stems from the interpolation with jitter
frequencies exceeding the bandwidth of the phase-recovery loop. The goal of the loop is to
guide the interpolation such that the interpolated samples are aligned with the UI centers.
With an ideal interpolation, this technique becomes equivalent to sampling the signal at the
UI centers and therefore the jitter tolerance above ω2 is 1 UIPP. The non-ideal
interpolation further reduces the jitter tolerance limit of the interpolating feedback CDR. The fourth
constraint limits the jitter tolerance near the maximum jitter frequency, which is half of the
baud rate.
This constraint is related to the sampling rate and the interpolation order. The CDR
fails to recover the data when none of the samples fall into a bit period and the interpolator
fails to calculate the correct signal value at the UI center. In the example of 2x sampling
and linear interpolation, this case occurs when a bit period reduces below 0.5 UI, which
is the time between two adjacent samples. In this case neither the sampling itself nor the linear
interpolation recovers the bit, which causes an error. As a consequence, at the maximum jitter
frequency, the jitter tolerance is limited to 0.5 UIPP in a 2x interpolating feedback CDR with
linear interpolation. In contrast, a higher interpolation order may recover a data bit even
when no samples fall in a UI. This improves the jitter tolerance limit at the cost of the
circuit complexity required to implement the higher order interpolation.
Figure 2.26: Simplified block-diagram of a joint-adaptation-based CDR [43]. (Blocks: ADC,
joint-adaptation filter (EQ + TR), PD, loop filter, digital timing recovery core with filter
control, and data decision, driven by a blind sampling clock.)
In addition to the interpolating-feedback approach, it is possible to use a joint-adaptation
approach in order to recover the clock and data from the blind samples taken at 2x the
baud rate. Figure 2.26 illustrates a simplified block diagram of a joint-adaptation-based
CDR, which combines the equalizer (EQ) and the timing-recovery (TR) interpolator into a
joint-adaptation filter [43]. The joint-adaptation filter simultaneously varies its magnitude
response and phase response in order to perform two actions. First, the filter compensates
the received signal for the channel ISI by adjusting its magnitude response, thus performing
the signal equalization. Second, the filter shifts the received signal in time by adjusting its
phase response, namely the group delay, such that at the filter output the signal is aligned
with the sampling clock. A feedback loop, consisting of a phase detector, loop filter and
digital timing recovery core, controls the joint-adaptation filter. The joint-adaptation-based
CDR successfully recovers the data under the condition that the frequency offset between
the transmitter and receiver is small. With the increasing frequency offset, the performance
of the CDR degrades. This high sensitivity to the frequency offset restricts this CDR to the
applications that guarantee small frequency offsets, such as backplane Ethernet channels.
Compared to the Mueller-Muller CDR architecture [38], the interpolating feedback
and joint-adaptation-based CDRs [39,40,43] replace the analog/digital phase-tracking loop
with an all-digital feedback loop, simplifying the overall receiver design. This comes at
the cost of doubling the sampling rate with the ADC and increasing the complexity of
the digital block due to the interpolator or filter. Unlike the Mueller-Muller CDR, the
interpolating feedback architecture is insensitive to the impulse response of the channel,
which makes it suitable for the low-cost interconnects with little control of the channel
properties. The interpolating feedback CDR only requires that the sampling rate exceeds
the bandwidth of the received signal by 2x or more in order to avoid aliasing during the
interpolation. This condition is typically satisfied by the limited channel bandwidth in the
high-speed interconnects: with high channel attenuation above fB/2, sampling at 2x, i.e. at
2 fB, is sufficient to avoid aliasing. A simple anti-aliasing filter preceding the ADC prevents
the aliasing if the channel bandwidth exceeds the value prescribed by the standard.
The interpolator in the digital feedback loop introduces an error into the set of inter-
polated samples due to the low-order interpolation. This interpolation error, which may
degrade the CDR performance, can be reduced by using a higher order interpolation in-
stead of a linear interpolation. Higher order interpolation, however, comes at the cost of
further increase of the interpolator circuit complexity, which in turn increases the interpo-
lator latency. Since the interpolator is in the feedback loop, this latency can compromise
the stability of the loop [6]. Thus, the interpolation order is a trade-off between the inter-
polation accuracy on the one hand, and the implementation complexity with the resulting
latency on the other hand.
The CDR topologies described in this chapter are widely used in the high-speed re-
ceivers. Table 2.1 summarizes the key characteristics of the recently published receivers.
The table lists the year of publication, the data rate, the power consumption, and the CDR
type in every receiver. The receivers listed in this table use the phase-tracking, Mueller-
Muller and oversampling CDR types. However, the papers presenting interpolating feed-
back CDRs with measured results were published over 10 years ago. The lack of recent
publications on the interpolating feedback CDRs suggests that the circuit complexity and
high sampling rate make the use of this CDR type challenging in the high-speed receivers.
The remainder of this thesis explores new CDR architectures for the blind-sampling
ADC-based receivers. First, the thesis proposes a feed-forward CDR architecture that
eliminates the interpolating feedback from the digital CDR to reduce the CDR circuit
complexity [17, 18]. This architecture recovers the phase and data from the blind ADC samples
taken at 2x the baud rate. Then, the thesis proposes a fractional-sampling-rate (FSR) CDR
architecture that reduces the sampling rate below 2x in order to reduce the ADC power
and area [19]. Both proposed architectures are experimentally validated through the design,
fabrication and measurements of the receiver test-chips.
Table 2.1: Recently published high-speed receivers.
Ref.   Year   Data Rate    Power     CDR Type
[23]   2009   10.3 Gb/s    260 mW    Phase-tracking
[24]   2005   10 Gb/s      133 mW    Phase-tracking
[44]   2010   12 Gb/s      130 mW    Phase-tracking
[45]   2006   6.4 Gb/s     310 mW    Mueller-Muller
[6]    2007   12.5 Gb/s    330 mW    Mueller-Muller
[4]    2008   10 Gb/s      4.5 W     Mueller-Muller with Viterbi MLSD
[36]   2007   44 Gb/s      910 mW    Oversampling
[46]   2007   3.5 Gb/s     115 mW    Oversampling
2.4 Summary
Two effects resulting from the non-ideal channels, the ISI and timing uncertainties, neces-
sitate the two blocks in the high-speed receivers: the equalizer and the CDR. The binary-
sampling receivers implement the pre-sampling equalizers in the analog domain, which
limits the amount of equalization that can be integrated in a receiver. Sampling the signal
with an ADC allows for the post-sampling equalization in the digital domain, which in turn
allows more extensive equalization schemes to be implemented. The receivers compensate for
the timing uncertainties using either a phase-tracking or a blind-sampling CDR. The
phase-tracking CDR architectures use a feedback loop that spans across the analog and digital
domains to align the sampling instances with the received data. The blind-sampling CDRs,
in contrast, sample the signal without any phase relation between the sampling and data
phases, which allows the CDRs to be implemented entirely in the digital domain. The ADC-based
blind-sampling receivers implement both the equalizer and the CDR in the digital domain,
which makes these receivers simple to scale with the IC technologies. The blind ADC
sampling, however, comes at the cost of a high power consumption by the ADC and a high
complexity of the digital CDR due to the interpolating digital feedback loop.
This thesis first proposes a low-complexity feed-forward CDR architecture for the blind
ADC-based receivers in Chapter 3, which eliminates the digital feedback loop and thus
reduces the power and area of the digital CDR. Then, the thesis proposes an FSR CDR
architecture in Chapter 4, which reduces the sampling rate below 2x to save the ADC
power and area.
Chapter 3
An ADC-Based Feed-Forward CDR Architecture
THE BLIND-SAMPLING ADC-based receivers discussed in Section 2.3.2 implement
the CDR phase-tracking feedback loop entirely in the digital domain thus allowing
for a single interface between the analog and digital domains through the ADC [39, 40].
This class of receivers also allows the received signal to be equalized in the digital domain. In
fact, with the exception of the ADC, the entire receiver can be implemented in the digital
domain. This aspect of the blind-sampling ADC-based receivers makes them highly scal-
able with the technology nodes, robust to process, voltage and temperature variations, and
allows for a short design time due to the automation of the digital design flow. However,
the digital feedback loop in the previously reported CDRs relies on interpolation to recover
the phase and data from the digital signal samples. This interpolating feedback leads to a
high complexity of the digital CDR thus restricting the data rates of this class of receivers.
This chapter proposes a feed-forward CDR architecture for the blind-sampling ADC-
based receivers. This architecture recovers the phase and data directly from the blind digital
samples of the received signal in a feed-forward manner, eliminating the need for the inter-
polating feedback loop. The feed-forward topology reduces the CDR’s circuit complexity,
making this architecture suitable for the high-speed interconnects. To experimentally vali-
date the proposed architecture, a 5 Gb/s 2x ADC-based receiver with the feed-forward CDR
was implemented in 65 nm CMOS [17, 18]. This chapter presents the feed-forward CDR
architecture using a top-down approach: an introduction of the architecture and receiver is
followed by a description of the building blocks essential to this architecture.
The remainder of this chapter is organized as follows. First, Section 3.1 introduces
the proposed feed-forward CDR architecture and presents an ADC-based receiver with the
feed-forward CDR. Then, Section 3.2 presents the phase detection scheme that enables
the feed-forward phase and data recovery. Next, Sections 3.3 and 3.4 describe the
phase-recovery filter and the data-decision scheme used in the proposed CDR. Section 3.5 presents
a data retiming scheme used in the feed-forward CDR to assure the error-free data recovery
in the presence of a frequency mismatch between the transmitter and receiver. Section 3.6
validates the proposed CDR architecture through the simulations and measurements of a
receiver test-chip implementing the feed-forward CDR. Finally, Section 3.7 summarizes
this chapter.
Figure 3.1: Proposed feed-forward CDR architecture (simplified block-diagram). (Blocks: ADC
driven by a blind sampling clock, phase detector, phase subtracter, filter, and data decision;
signals: RXIN, φX, φERR, φAVG, DTREC.)
3.1 Feed-Forward CDR Architecture
Figure 3.1 presents a simplified block-diagram of the proposed feed-forward CDR
architecture. First, a blind sampling clock triggers the ADC to sample the received signal, RXIN,
without any phase relation between the sampling instances and the UI boundaries in the
signal. Then, a phase detector (PD) estimates the data zero-crossing phase with respect to
the sampling clock for every transition in the received signal. This phase is further referred
to as the instantaneous data phase, φX. Next, the CDR recovers the average data phase,
φAVG, in two steps: a phase subtracter calculates a phase error, φERR, by subtracting φAVG
from φX, and then a filter averages φERR to obtain φAVG. Finally, a data decision block picks
a sliced sample of RXIN as recovered data, DTREC, based on the values of φX and φAVG.
The proposed architecture estimates the data phase directly from the blind digital sam-
ples of the received signal. Compared to the previously reported blind ADC-based CDRs
(see Figure 2.22), this direct phase estimation eliminates the need for the interpolation and
moves the PD outside the phase-recovery feedback loop. As a result, the phase-recovery
loop in the proposed architecture simplifies to the phase subtracter and the filter, as shown in
Figure 3.1. In fact, this phase-recovery loop can be viewed as an infinite-impulse-response
(IIR) phase-recovery filter with input φX and output φAVG, which is a convenient
implementation of the averaging function. Since the data-decision block uses only the input and
output of the phase-recovery filter, the proposed CDR architecture is referred to as a
feed-forward architecture. This feed-forward topology leads to a low circuit complexity of the
CDR as will be shown in the remainder of this chapter through a description of a sample
receiver with the proposed CDR and its building blocks.
Figure 3.2: Receiver with the proposed feed-forward CDR architecture. (Blocks: 5 GHz
two-phase blind sampling clock, 5-bit 5 GS/s ADCs, 2:32 DeMUX, and a digital CDR containing
the EQ, PD, filter, data decision and FIFO with a retiming clock; signals: RXIN at 5 Gb/s, S,
φX, φERR, φAVG, D, DTREC.)
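The view of the phase-recovery loop as an IIR filter with input φX and output φAVG can be sketched in a few lines of Python. This is a deliberately simplified first-order loop written only for illustration: the loop gain, the phase wrapping to [−0.5, 0.5) UI and the names `wrap_phase` and `recover_average_phase` are assumptions, and the filter actually used in the receiver is the one described in Section 3.3.

```python
def wrap_phase(phi):
    """Wrap a phase difference in UI to the interval [-0.5, 0.5)."""
    return (phi + 0.5) % 1.0 - 0.5


def recover_average_phase(phi_x_seq, gain=0.125, phi_avg=0.0):
    """First-order IIR averaging of the instantaneous phase (assumed form).

    For every instantaneous phase phi_x (in UI), the phase subtracter forms
    phi_err = phi_x - phi_avg and the filter accumulates a fraction of it.
    Returns the sequence of phi_avg values.
    """
    history = []
    for phi_x in phi_x_seq:
        phi_err = wrap_phase(phi_x - phi_avg)         # phase subtracter
        phi_avg = (phi_avg + gain * phi_err) % 1.0    # averaging filter
        history.append(phi_avg)
    return history


# A constant data phase of 0.3 UI: phi_avg settles towards 0.3.
trace = recover_average_phase([0.3] * 100)
print(round(trace[-1], 4))
```

Because the data decision only reads φX and φAVG, nothing in this loop feeds back into how the samples themselves are produced, which is the sense in which the architecture is feed-forward.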
Figure 3.2 illustrates a block-diagram of a 5 Gb/s 2x ADC-based receiver with the
proposed feed-forward CDR architecture. A 5 GHz two-phase sampling clock triggers two
time-interleaved 5-bit ADCs to sample the 5 Gb/s received signal, RXIN, blindly at 2x the
baud rate for the total sampling rate of 10 GS/s. The two phases of the sampling clock
are further referred to as 0° phase and 180° phase. To reduce the operating speed of the
digital block, a 2:32 DeMUX then feeds 32 samples at every 16-UI interval, or frame, to
the digital CDR. These 32 samples, each represented by 5 bits, correspond to 16
consecutive sampling cycles. A 1:16 clock divider divides the sampling clock to trigger the digital
CDR. An equalizer compensates the DeMUXed signal samples for the channel loss. Next,
the PD uses the equalized samples, S, to estimate φX for every UI with a data transition. A
phase subtracter and a filter average φX to generate φAVG. A data-decision block followed
by a FIFO compose the data recovery path of the CDR. In this path, the data-decision block
picks one sliced sample per UI as a data bit, D, by comparing φX with φAVG for every UI.
Then, the FIFO compensates the data bits for the frequency offset between the transmitter
and receiver, and outputs the recovered data, DTREC, by re-timing the data from the blind
sampling clock domain. To simplify the CDR verification, the re-timing clock is the
baud-rate clock (divided by 16), which assures that the data rates at the CDR input and output
are identical.
Figure 3.3: Proposed linear phase estimation scheme. (Example: Si = −3, Si+1 = 5; setting
(1 − µ)Si + µSi+1 = 0 gives µ = Si / (Si − Si+1) = 3/8.)
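The frame arithmetic of the receiver described above can be checked with a short Python sketch. The constants are the figures quoted in the text; the derived clock frequency of the digital block and the helper `demux_frames` are illustrative assumptions, not part of the test-chip description.

```python
SAMPLING_CLOCK_HZ = 5e9   # two-phase blind sampling clock
CLOCK_PHASES = 2          # 0 degree and 180 degree phases
ADC_BITS = 5              # resolution of each ADC sample
CLOCK_DIVIDER = 16        # 1:16 divider that clocks the digital CDR

sampling_rate = SAMPLING_CLOCK_HZ * CLOCK_PHASES       # total samples per second
samples_per_frame = CLOCK_PHASES * CLOCK_DIVIDER       # samples per 16-UI frame
digital_clock_hz = SAMPLING_CLOCK_HZ / CLOCK_DIVIDER   # clock rate of the digital CDR
bits_per_frame = samples_per_frame * ADC_BITS          # width of one DeMUXed frame


def demux_frames(samples, frame_len=samples_per_frame):
    """Group a serial sample stream into complete frames (hypothetical helper).
    An incomplete trailing frame is dropped."""
    usable = len(samples) - len(samples) % frame_len
    return [samples[i:i + frame_len] for i in range(0, usable, frame_len)]


print(sampling_rate, samples_per_frame, digital_clock_hz, bits_per_frame)
```

With these numbers the digital CDR processes 32 five-bit samples per cycle of a clock that is 16 times slower than the sampling clock, which is the reduction in operating speed that the DeMUX is introduced for.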
The remainder of this chapter first presents the implementation details of the CDR’s
building blocks in Sections 3.2–3.5, and then validates the proposed architecture through
the simulations and measurements of the receiver with the feed-forward CDR in Section 3.6.
3.2 Phase-Detection Scheme
Figure 3.3 presents the proposed linear phase estimation scheme used in the feed-forward
CDR architecture. Similar to the operation of linear interpolation, the linear phase
estimation joins two samples of opposite polarities, Si and Si+1, with a line. The equation of this
line allows a new sample value, Ii, to be interpolated between Si and Si+1 at the time marked by
the interpolation index, µ:
$$I_i(\mu) = (1-\mu)\,S_i + \mu\,S_{i+1}. \qquad (3.1)$$
However, instead of calculating Ii(µ) as in the interpolator (see Figure 2.24), the proposed
phase estimation scheme sets Ii(µ) = 0 and calculates the corresponding interpolation
index, µ, using:
$$\mu = \frac{S_i}{S_i - S_{i+1}}. \qquad (3.2)$$
In fact, this interpolation index estimates the time of the zero-crossing between Si and Si+1
(with respect to Si) directly from the sample values, which enables the phase detection in
the feed-forward manner. Furthermore, limiting the resolution of µ to 2 bits allows for a
low-complexity circuit implementation of the proposed phase estimation scheme as shown
later in this section. Since both the interpolation and phase estimation rely on the same
relation of (3.1), these two operations have similar effects on the phase detection accuracy.
Figure 3.4: Linear estimation of instantaneous phase, φX. (a) Transition between A and B,
φX < 0.5 UI. (b) Transition between B and C, φX > 0.5 UI. (Samples A, B and C correspond to
the 0°, 180° and 360° phases of the sampling clock; φX codes 000 to 011 cover [0, 0.5) UI and
codes 100 to 111 cover [0.5, 1) UI.)
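The zero-crossing estimate of (3.2) can be reproduced with the sample values of Figure 3.3. The sketch below is a minimal Python illustration using exact fractions; the function name `zero_crossing_index` is hypothetical, and truncating µ to 2 bits by flooring is an assumption made here for illustration.

```python
from fractions import Fraction


def zero_crossing_index(s_i, s_next):
    """Interpolation index at which the line through two integer samples
    of opposite polarity crosses zero, per (3.2)."""
    if s_i * s_next >= 0:
        raise ValueError("samples must have opposite polarities")
    return Fraction(s_i, s_i - s_next)


s_i, s_next = -3, 5                      # values from Figure 3.3
mu = zero_crossing_index(s_i, s_next)
print(mu)                                # 3/8

# Substituting mu back into (3.1) gives a zero interpolated value.
print((1 - mu) * s_i + mu * s_next)      # 0

# 2-bit truncation of mu (assumed floor quantization): quarter index 0 to 3.
print(int(mu * 4))                       # 1
```

The point of the scheme is visible here: the index is obtained from the two samples alone, with no loop supplying µ from a previous estimate.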
The PD estimates the data zero-crossing phase, φX, from the equalized digital samples
of the received signal. The PD processes the samples from 16 cycles of the sampling
clock in parallel (2 samples per cycle) and outputs φX for every UI with a data transition.
Figure 3.4 illustrates the phase detection scheme based on the linear estimation through
an example of a single cycle of the sampling clock. The PD looks at three consecutive
samples: A, B and C, which correspond to 0°, 180° and 360° phases of the blind sampling
clock. Since sample C corresponds to 360° phase, it is also sample A (0° phase) in the
following cycle of the sampling clock. When two adjacent samples have opposite signs,
the PD linearly estimates the time of zero-crossing between these two samples with respect
to 0° phase of the sampling clock. The estimated zero crossing is marked as point X in
Figure 3.4. Since the adjacent samples are 0.5 UI apart in time, φX is calculated as
$$\phi_X = \frac{0.5\,|A|}{|A| + |B|} \qquad (3.3)$$
when the transition occurs between A and B in the example of Figure 3.4(a), and as
$$\phi_X = 0.5 + \frac{0.5\,|B|}{|B| + |C|} \qquad (3.4)$$
when the transition occurs between B and C in the example of Figure 3.4(b). The phase
calculation occurs only when the adjacent samples have opposite polarities, which allows
the absolute values of the samples to be used in (3.3) and (3.4). The use of the
absolute values, in turn, allows ‘0 – 1’ and ‘1 – 0’ data transitions to be treated identically, which
simplifies the PD circuit implementation.
Figure 3.5: Flowchart of 2-bit accurate division for calculating φX. (The flowchart evaluates
|A| / (|A| + |B|) using the comparisons |A| < |B|, |A| + |B| > 4|A| and |A| + |B| < 4|B|, and
outputs φX = 000, 001, 010 or 011.)
To maintain low circuit complexity, the division accuracy in (3.3) and (3.4) is limited
to 2 bits. The flowchart in Figure 3.5 shows that this 2-bit accurate division requires only
simple operations: addition, comparison and left shift by 2 (multiply by 4 in decimal).
Since this division operation covers only 0.5 UI, the total φX resolution is 3 bits per UI.
The third, most significant, bit (MSB) of φX depends on the position of the zero crossing:
a crossing between A and B makes MSB=‘0’, while a crossing between B and C makes
MSB=‘1’. The effect of limiting the φX accuracy to 3 bits is discussed in
Section 3.4.
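The 2-bit division and the resulting 3-bit phase code can be sketched in a few lines of Python. This is an illustration only, not the circuit described in this thesis: the function names `quantize_half_ui` and `phase_code` are hypothetical, the samples are assumed to be signed integers, a zero-valued sample is assumed to count as non-negative, and the mapping of each comparison outcome to a 2-bit value is inferred from the inequalities in Figure 3.5 (truncation of |A|/(|A| + |B|) to quarters). Only the first transition in a sampling cycle is handled.

```python
def quantize_half_ui(first, second):
    """2-bit estimate (0..3) of |first| / (|first| + |second|).

    Uses only addition, comparison and a left shift by 2, as in the
    flowchart; the branch-to-code mapping is an assumption.
    """
    x, y = abs(first), abs(second)
    total = x + y
    if x < y:                          # ratio below 1/2
        return 0 if total > (x << 2) else 1
    return 2 if total < (y << 2) else 3


def phase_code(a, b, c):
    """3-bit zero-crossing phase in units of 1/8 UI, or None.

    a, b, c are the samples at 0, 180 and 360 degrees of the sampling
    clock. The MSB is 0 for a crossing between A and B and 1 for a
    crossing between B and C.
    """
    if (a < 0) != (b < 0):
        return quantize_half_ui(a, b)
    if (b < 0) != (c < 0):
        return 4 + quantize_half_ui(b, c)
    return None                        # no data transition in this cycle
```

For example, `phase_code(3, -5, -5)` evaluates 0.5·3/8 ≈ 0.19 UI and returns the code 1, i.e. the bin starting at 1/8 UI.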
Nominally, there is at most one data transition in every cycle of the sampling clock:
either between A and B or between B and C. However, duty-cycle distortion (DCD) might
cause two transitions per sampling cycle: between A and B as well as between B and C.
[Figure 3.6: Phase recovery filter. Left inset: phase subtracter that forms φERR from φX[1] to φX[16] and φAVG using modulo subtraction. Right inset: signal flow of the 3rd order low-pass filter.]
When two such transitions occur, the PD calculates φX as a modulo-1 sum of both zero-
crossing phases so that both transitions contribute to the average phase recovery. This
allows the phase detection scheme to estimate the data phase in the presence of DCD.
The following section describes the phase recovery filter implemented in the CDR.
3.3 Phase-Recovery Filter
The phase recovery filter averages the instantaneous zero-crossing phase, φX, to recover
φAVG, which tracks the data phase in the average sense. For this phase tracking, the CDR
uses a discrete-time IIR filter shown in Figure 3.6. The filter consists of a phase subtracter
and a 3rd order low-pass filter in a feedback loop.
The phase subtracter, shown in the left inset of Figure 3.6, calculates the phase difference
between φX and φAVG for 16 UIs at a time and outputs the combined phase error, φERR,
for these 16 UIs. To ensure that the phase recovery converges for any phase offset between
the data and sampling phases, φERR is calculated in a modulo manner such that φERR[i] is
in the range [–0.5, 0.5) UI. The subtracter excludes the UIs without data transitions from
contributing to φERR. In the feed-forward CDR architecture, φERR plays the same role as the
PD output in a conventional phase-tracking CDR.

[Figure 3.7: Discrete-time integrator with programmable gain. The control signal gain[2:0] selects KPROG from 0, 0.25, 0.5, 0.75, 1, 1.5, 2 and 3.]
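The modulo phase subtraction described above can be illustrated with a short Python sketch. It is not the implemented circuit: the names `wrap_phase` and `combined_phase_error` are hypothetical, phases are modelled as floating-point numbers in UI, a UI without a transition is represented by `None`, and the combination of the 16 per-UI errors is assumed to be a sum scaled by 1/16.

```python
def wrap_phase(delta):
    """Wrap a phase difference (in UI) into the range [-0.5, 0.5)."""
    return (delta + 0.5) % 1.0 - 0.5


def combined_phase_error(phi_x, phi_avg):
    """Combine the per-UI phase errors of one 16-UI frame.

    phi_x   -- 16 entries, each a zero-crossing phase in UI or None
               when the UI has no data transition (excluded).
    phi_avg -- recovered average phase in UI.
    The 1/16 scaling of the sum is an assumption of this sketch.
    """
    errors = [wrap_phase(p - phi_avg) for p in phi_x if p is not None]
    return sum(errors) / 16.0
```

Because of the wrap, a crossing at 0.9 UI compared with an average phase of 0.1 UI yields an error of about −0.2 UI rather than +0.8 UI, so the loop always moves along the shorter path.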
The phase error is fed into the low-pass filter (LPF), which consists of three cascaded
discrete-time delaying integrators with programmable gains K1, K2 and K3. These programmable
gains allow adjusting the CDR’s jitter-tracking bandwidth. Figure 3.7 presents
the implementation of the integrators used in the LPF. First, the input signal, IN, is scaled
by the product of a constant gain, KCONST, and a programmable gain, KPROG. Then, the
scaled signal is accumulated using an adder and a flip-flop (FF) in a feedback configuration
to generate the output, OUT. A control signal, gain[2:0], sets the value of KPROG
through an 8-to-1 selector. The eight possible KPROG values are chosen such that they can
be calculated using only simple-to-implement operations: left/right shift (multiply/divide
by 2) and addition. The resulting KPROG values range from 1/4 to 3, while
KPROG = 0 is used for debugging purposes. The resolution of the intermediate phase values
in the integrators is 16 bits: the 10 least significant bits represent the fractional part of the phase
(1 UI long period), while the 6 most significant bits represent the integer part. To tolerate
jitter exceeding 64 UIs (2⁶ UIs), ‘roll-over’ rather than ‘saturating’ counters are used in the
integrators.
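A minimal Python sketch of such a delaying integrator is given below, under several assumptions that are not taken from the thesis: the names `scale_kprog` and `DelayingIntegrator` are hypothetical, the ascending ordering of the `sel` codes is arbitrary (the actual gain[2:0] encoding is not reproduced here), KCONST is omitted, and the 16-bit roll-over is modelled with a bit mask on an unsigned accumulator.

```python
PHASE_BITS = 16                      # 6 integer bits + 10 fractional bits
MASK = (1 << PHASE_BITS) - 1

# KPROG = 0, 1/4, 1/2, 3/4, 1, 3/2, 2, 3 built from shifts and additions.
# The index order is an assumption, not the chip's gain[2:0] encoding.
_KPROG_OPS = (
    lambda v: 0,
    lambda v: v >> 2,
    lambda v: v >> 1,
    lambda v: (v >> 1) + (v >> 2),
    lambda v: v,
    lambda v: v + (v >> 1),
    lambda v: v << 1,
    lambda v: (v << 1) + v,
)


def scale_kprog(x, sel):
    """Scale the integer x by the programmable gain selected by sel (0..7)."""
    return _KPROG_OPS[sel](x)


class DelayingIntegrator:
    """Models K*z^-1 / (1 - z^-1) with a roll-over accumulator."""

    def __init__(self, sel):
        self.sel = sel
        self.state = 0               # flip-flop contents

    def step(self, x):
        out = self.state             # output lags the input by one cycle
        self.state = (self.state + scale_kprog(x, self.sel)) & MASK
        return out
```

The mask makes the accumulator wrap from 0xFFFF to 0 instead of saturating, which mirrors the roll-over behaviour described above.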
Three of these integrators form three forward paths in the LPF: 1st, 2nd and 3rd order
paths, as shown in the signal flow diagram in the right inset of Figure 3.6. These paths
add up to the average (recovered) phase, φAVG. The transfer function of the entire phase
recovery filter is:

$$\frac{\phi_{AVG}}{\phi_X} = \frac{A_{FW}}{1 + A_{FW}}, \tag{3.5}$$

where AFW is the forward gain of the LPF:

$$A_{FW} = \frac{K_1 z^{-1}}{1 - z^{-1}} + \frac{K_1 K_2 z^{-2}}{(1 - z^{-1})^2} + \frac{K_1 K_2 K_3 z^{-3}}{(1 - z^{-1})^3}. \tag{3.6}$$

[Figure 3.8: Jitter tolerance dependence on the LPF order (simulated, BER ≤ 5·10⁻⁶). Axes: jitter amplitude (UIPP) versus jitter frequency (Hz). Curves: 1st order, 2nd order and 3rd order LPF.]
Three criteria determine the filter gain values: the desired jitter-tracking bandwidth
of the CDR, the absence of gain peaking in the jitter-transfer function of (3.5), and a
low-circuit-complexity filter implementation. First, the CDR jitter-tracking bandwidth was
selected (approximately 5 MHz in the proposed receiver). Then, the gain values that achieve
this bandwidth while minimizing the gain peaking in the jitter transfer function were
determined through simulations. Finally, the gain values were rounded to the nearest
easy-to-implement values in binary. This procedure leads to K1 = 3/64, K2 = 7/2048, and K3 =
5/2048. To illustrate the low-complexity gain implementation, K1 = 3/64 is implemented
as K1 = 1/32 + 1/64, where gains of 1/32 and 1/64 are obtained through right-shifting the
input value by 5 and 6 bits. In a similar manner, K2 and K3 are composed of right-shift and
addition operations to maintain the low circuit complexity. These gain values are used in
the simulations and measurements presented in Section 3.6.
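The shift-and-add decomposition of the gains can be checked with a short Python sketch. Only the decomposition of K1 is stated above; the shift amounts used here for K2 (1/512 + 1/1024 + 1/2048) and K3 (1/512 + 1/2048) are one possible choice and are assumptions, as are the names `GAIN_SHIFTS`, `gain_value` and `apply_gain`.

```python
from fractions import Fraction

# Right-shift amounts per gain. K1 follows the text; the K2 and K3
# decompositions are assumptions that merely add up to the stated values.
GAIN_SHIFTS = {
    "K1": (5, 6),        # 1/32 + 1/64            = 3/64
    "K2": (9, 10, 11),   # 1/512 + 1/1024 + 1/2048 = 7/2048
    "K3": (9, 11),       # 1/512 + 1/2048          = 5/2048
}


def gain_value(shifts):
    """Exact gain realised by summing right-shifted copies of the input."""
    return sum(Fraction(1, 1 << s) for s in shifts)


def apply_gain(x, shifts):
    """Apply the gain to a non-negative integer x using shifts and adds.

    Each shift truncates, so small inputs lose precision, as they
    would in fixed-point hardware.
    """
    return sum(x >> s for s in shifts)
```

With these shift amounts, `apply_gain(2048, GAIN_SHIFTS["K2"])` returns 7, matching 2048 · 7/2048.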
To explore the effect of the order of the phase-recovery filter on the CDR performance,
the filter order was reduced from 3rd to 2nd and 1st, and the CDR’s jitter tolerance was
simulated, as illustrated in Figure 3.8. Note that in all three cases, the CDR’s jitter-tracking
bandwidth remains constant. As the order changes from 1st to 2nd, the high-frequency
jitter tolerance improves by approximately 0.2 UIPP. Furthermore, with the use of the 2nd
order filter, the jitter tolerance roll-off slope increases, allowing for a higher tolerance at
low frequencies. The 3rd order filter shows a small improvement of the jitter tolerance
compared to the 2nd order filter: the high-frequency jitter tolerance remains unchanged, but
the low-frequency jitter tolerance increases by up to 3× (at 32 kHz, for instance). For a
safe design with a high tolerance to low-frequency jitter, the 3rd order filter was used in the
proposed feed-forward CDR.
The CDR uses the recovered φAVG along with φX for the data recovery according to the
scheme presented in the following section.
3.4 Data-Decision Scheme
The proposed feed-forward clock recovery eliminates the interpolator from the CDR thus
reducing the circuit complexity. As a consequence of this interpolator elimination, the
value of the signal at the UI center is not interpolated and therefore is unknown. To enable
error-free data recovery along with the feed-forward clock recovery, a data decision scheme
is essential to the proposed feed-forward CDR. The role of the data decision block is to
estimate the sign of the received signal near the maximum eye opening, i.e., near the UI
center. Since φAVG indicates the average position of the UI boundaries, the average position
of the UI centers is calculated by adding 0.5 UI to φAVG using modulo-1 addition. This
UI-center phase is referred to as the data-picking phase, φPICK. The data decision block
takes the signs of the samples from 16 sampling cycles and picks one decision sample for
every UI by comparing φX and φPICK.
Figure 3.9 illustrates the proposed data-picking scheme through an example of a single
sampling cycle. The data decision block takes three consecutive sliced samples (A, B and
C) and picks one of these samples as the decision bit. This decision bit is picked such that
it belongs to the UI marked by the average UI-center phase φPICK. Three sample cases
demonstrate the data decision scheme: a jitter-free case and two cases of jitter.
In a jitter-free case shown in Figure 3.9(a), φX coincides with φAVG, and hence φX is
0.5 UI away from φPICK. φPICK marks the UI from which the data is recovered (shaded
in the figure). The decision scheme thus picks one of the two samples adjacent to φPICK:
either A or B in this example. In a jitter-free case, both samples adjacent to φPICK have the
same sign and hence the decision is trivial.

[Figure 3.9: Data decision scheme. (a) Jitter-free case; (b) jitter example 1; (c) jitter example 2.]
In the presence of jitter, φX deviates from φAVG, and the separation between φX and
φPICK differs from 0.5 UI. As a consequence, the two samples adjacent to φPICK might
belong to different UIs, and these samples might have opposite signs, as illustrated in
Figs. 3.9(b) and 3.9(c). In this case, the data decision scheme picks the sample that belongs
to the UI marked by φPICK (shaded UI in the figure). For instance, in Figure 3.9(b)
the jitter causes φX to shift left compared to Figure 3.9(a) and A is picked as the decision
data. In the example of Figure 3.9(c), φX shifts right compared to Figure 3.9(a) and sample
B is picked. This scheme requires a single comparison between φX and φPICK for every UI
with a data transition.
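The picking rule described above can be sketched in Python as follows. This is one possible reading of the scheme, not the implemented logic: the names `pick_phase` and `pick_decision` are hypothetical, phases are floating-point numbers in UI within a single sampling cycle, the samples are given as signs (+1 or −1), and it is assumed that φX refers to a single crossing lying in the same half-UI interval as φPICK. Isolated pulses and wrap-around between cycles are not handled.

```python
def pick_phase(phi_avg):
    """Data-picking phase: the UI centre, 0.5 UI away from phi_avg (modulo 1)."""
    return (phi_avg + 0.5) % 1.0


def pick_decision(a, b, c, phi_x, phi_pick):
    """Return the sign (+1 or -1) chosen as the decision bit.

    a, b, c  -- signs of the samples at 0, 0.5 and 1.0 UI.
    phi_x    -- zero-crossing phase in UI (ignored when the two samples
                adjacent to phi_pick agree).
    phi_pick -- data-picking phase in UI, in [0, 1).
    """
    # The two samples adjacent to phi_pick.
    left, right = (a, b) if phi_pick < 0.5 else (b, c)
    if left == right:
        return left                  # same UI: the decision is trivial
    # Opposite signs: take the sample on the same side of the crossing
    # as phi_pick, using a single comparison.
    return left if phi_pick < phi_x else right
```

For instance, with φPICK = 0.2 UI and a crossing at 0.3 UI between A and B, the sketch returns the sign of A; with the crossing at 0.1 UI it returns the sign of B.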
Limited channel bandwidth and DCD reduce the width of UI-long data pulses and thus
cause two transitions per sampling cycle. This case is referred to as an isolated pulse.
Figure 3.10 illustrates nominal and isolated UI-long pulses. In the nominal case of Figure 3.10(a),
samples B and C are equidistant from φPICK, which makes both samples equally
correct decisions. However, in the presence of isolated pulses (Figs. 3.10(b) and 3.10(c)),
two transitions per UI prohibit defining a single instantaneous phase value, φX. As a consequence,
a comparison between φX and φPICK proves insufficient for a correct data decision.

[Figure 3.10: Data decision with isolated pulses. (a) Nominal pulse width, jitter-free case; (b) two transitions in the same cycle; (c) two transitions in adjacent cycles.]

The data decision scheme detects these isolated pulses using an XOR operation on every pair
of consecutive samples. It then disregards the phase information and picks the sample at
the center of the pulse, i.e., farthest from both transitions, as shown in Figure 3.10. When
two transitions occur in the same sampling cycle between A and B, and between B and C
(see Figure 3.10(b)), B is picked as the decision data. In a similar manner, the decision block
checks for an isolated pulse at the boundary between two consecutive sampling cycles. As
Figure 3.10(c) illustrates, a transition between Bi and Ci in sampling cycle i followed by a
transition between Ai+1 and Bi+1 in cycle i+1 causes the decision scheme to pick Ci (Ai+1)
as the decision bit.
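The XOR-based detection of isolated pulses can be sketched as follows. The function name `isolated_pulse_centres` is hypothetical, and the sketch operates on one continuous stream of sliced samples at 2x the baud rate, so that the same test covers both a pulse centred on B and a pulse centred on Ci (Ai+1); the per-cycle parallel organisation of the actual decision block is not modelled.

```python
def isolated_pulse_centres(bits):
    """Return the indices of samples sitting at the centre of an isolated pulse.

    bits -- sliced samples (0 or 1) taken at 2x the baud rate.
    A sample is a pulse centre when the XOR with its left neighbour and
    the XOR with its right neighbour both indicate a transition.
    """
    centres = []
    for k in range(1, len(bits) - 1):
        left_transition = bits[k - 1] ^ bits[k]
        right_transition = bits[k] ^ bits[k + 1]
        if left_transition and right_transition:
            centres.append(k)        # this sample is picked as the decision bit
    return centres
```

A nominal-width pulse such as `[0, 0, 1, 1, 0, 0]` produces no detection, whereas the narrowed pulse `[0, 0, 1, 0, 0]` flags the single sample at index 2.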
The proposed data decision scheme based on the comparison between φX and φPICK
recovers the data correctly when φX deviates from φAVG by up to 0.5 UI in either direction.
Hence the CDR has the theoretical maximum jitter tolerance of 1 UIPP at jitter frequencies
exceeding the bandwidth of the phase recovery filter. Estimating φX with 3-bit accuracy
(instead of infinite accuracy) results in the reduction of the high-frequency tolerance by
only 1/8 UI. Further discussion of the jitter tolerance is deferred to Section 3.6.
The data decision schemes in the proposed feed-forward CDR and the interpolating
feedback CDR are functionally equivalent to each other. Figure 3.11 illustrates this equivalence
by comparing the decision schemes in the two CDRs for a jitter-free case and a jitter
example in a tabular form. The interpolating feedback CDR first interpolates a new sample,
IAB, between A and B at time φPICK and then takes the sign of this interpolated sample, i.e.,
slices IAB, to get the decision bit. In fact, the decision bit, sign(IAB), inherits the sign of
either A or B (the two samples adjacent to φPICK), depending on the values of A, B and
φPICK. In the jitter-free case, A and B belong to the same UI and therefore they have the
same sign, making the interpolation and slicing redundant. Jitter or, equivalently, voltage
noise may cause a sign inversion of either A or B. In the example of Figure 3.11, this sign
inversion occurs at sample B. In this case, the interpolated sample, IAB, is a weighted average
of A and B at time φPICK, which is the best estimate of the signal value at the UI center
using linear interpolation.

[Figure 3.11: Data decision in the interpolating feedback and feed-forward CDRs, compared for a jitter-free case and a jitter example.]
The feed-forward CDR, in contrast, simply takes the sign of A or B as the decision bit.
In the jitter-free case, when A and B have identical signs, the feed-forward CDR assigns
sign(A) or sign(B) to be the decision bit. This yields the same result as the interpolating
feedback CDR. When jitter leads to a sign difference between A and B, the feed-forward
CDR marks with φX the time at which the interpolation line changes the sign (from positive
to negative in the example of Figure 3.11). The feed-forward CDR picks the sign of the
sample that is on the side of φPICK (that is, sign(A) in this example). Again, this yields the
same result as the interpolating feedback CDR.
In fact, in the interpolating feedback CDR, IAB carries the weighted-average information
in the voltage domain, while in the feed-forward CDR, φX carries the same information in the
time domain. Since the decision is represented with only one bit, both schemes lead to identical
decisions when subjected to identical jitter or voltage noise conditions.
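The equivalence argued above can be checked numerically with a small Python sketch. It assumes ideal (unquantized) arithmetic, uses exact fractions to avoid rounding effects, restricts φPICK to the interval between A and B, and skips the degenerate case where the interpolated value is exactly zero. All function names are hypothetical.

```python
from fractions import Fraction

HALF_UI = Fraction(1, 2)


def interpolated_value(a, b, phi_pick):
    """Linear interpolation between A (at 0 UI) and B (at 0.5 UI)."""
    return a + (b - a) * (phi_pick / HALF_UI)


def interpolating_decision(a, b, phi_pick):
    """Interpolating feedback CDR: slice the interpolated sample."""
    return interpolated_value(a, b, phi_pick) > 0


def feed_forward_decision(a, b, phi_pick):
    """Feed-forward CDR: pick sign(A) or sign(B) by comparing phases."""
    if (a > 0) == (b > 0):
        return a > 0
    phi_x = HALF_UI * Fraction(abs(a), abs(a) + abs(b))   # as in (3.3)
    return (a > 0) if phi_pick < phi_x else (b > 0)


def schemes_agree(max_level=7, steps=32):
    """Exhaustively compare both decisions over a grid of samples and phases."""
    levels = [v for v in range(-max_level, max_level + 1) if v != 0]
    for a in levels:
        for b in levels:
            for k in range(steps):
                phi_pick = Fraction(k, 2 * steps)          # in [0, 0.5)
                if interpolated_value(a, b, phi_pick) == 0:
                    continue                               # sign undefined
                if interpolating_decision(a, b, phi_pick) != \
                        feed_forward_decision(a, b, phi_pick):
                    return False
    return True
```

The agreement follows from the fact that the interpolation line crosses zero exactly at φX, so the sign of IAB flips at the same phase at which the feed-forward comparison switches from A to B.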
The feed-forward and the interpolating feedback CDRs differ from the point of view
of decision feedback equalization (DFE), which requires a sample value at the UI center
to cancel ISI. The interpolating feedback CDR interpolates a sample at the UI center, IAB
in Figure 3.11, which allows using conventional DFE schemes. The feed-forward CDR,
in contrast, avoids interpolating a sample at the UI center, which requires adjustments to
the DFE approach. The authors in [47] present a DFE scheme for the feed-forward CDRs
that modifies the sample values to cancel ISI instead of modifying the interpolated UI
center value. Hence, [47] demonstrates that the proposed feed-forward CDR can be used
in conjunction with the DFE.
The following section discusses the frequency offset compensation scheme that pre-
vents the data errors due to a frequency mismatch between the transmitter and receiver.
3.5 Data Retiming Scheme
The transmitter clock determines the data rate at the input of the CDR, while the blind
sampling clock determines the data rate at the output of the data decision block. Since
these two clocks are free-running with respect to each other, a frequency offset between
them is inevitable. This frequency offset, in turn, leads to a mismatch between the data
rates at the CDR input and at the data decision output. A FIFO absorbs this data rate
mismatch by retiming the decision bits from the blind sampling clock domain to a retiming
clock domain as shown in Figure 3.2.
Figure 3.12 illustrates three possible data retiming schemes. Since the average phase,
φAVG, indicates the data phase with respect to the blind sampling clock, a phase interpolator
(PI) in a feed-forward path controlled by φAVG can generate the recovered clock, CKREC,
from the sampling clock as shown in Figure 3.12(a). The recovered clock then retimes the
decision bits, D, through a FIFO such that the rate of the recovered data, DTREC, matches
the rate of the received data. Since the recovered clock is available in this configuration,
the feed-forward CDR with the PI retiming scheme becomes similar to a phase-tracking
CDR in the ability to track frequency offsets. This PI retiming method requires an analog
component (a PI running at fB/16) to generate CKREC, thus increasing the system
complexity.

[Figure 3.12: Data retiming schemes. (a) Phase interpolator (PI) generates the recovered clock; (b) data consumer retimes the recovered data; (c) clock-forwarded system.]
Typically, the recovered data is retimed one more time to the clock domain of the data
consumer. In fact, the feed-forward CDR architecture allows the decision bits to be retimed
from the blind sampling clock directly to the data consumer clock, CKCONS, as illustrated
in Figure 3.12(b). Since CKCONS is free-running with respect to the transmitter clock, this
retiming method requires the transmitter and receiver to negotiate the data flow rate using
a flow control technique mentioned in Section 2.2.2. Retiming the data directly to the
consumer clock eliminates the need for a PI in the receiver, and it allows for a fully digital
implementation of the CDR and FIFO.

[Figure 3.13: Simplified FIFO diagram. The write port places 15 to 17 bits and the read port removes 16 bits; WrPtr and RdPtr mark the port positions before and after an access.]
The third data retiming scheme, shown in Figure 3.12(c), is applicable to clock-forwarded
interconnects. In these interconnects, a divided baud-rate clock, fB/16, is transmitted along
with data such that this divided clock is available at the receiver side. The received clock
has no phase relation to the received data; however, its frequency matches the transmitter
data rate (divided), and therefore the clock can be used to retime the decision bits. This
retiming scheme is also convenient for characterizing blind-sampling receivers in labora-
tory conditions, and therefore it was used in the measurements of the receiver test-chip
presented in Section 3.6.
Figure 3.13 illustrates a simplified diagram of the FIFO. The FIFO is a circular register
with two ports: a write port (shaded in the figure) and a read port (unshaded in the figure).
The write port, which is synchronized to the sampling clock, places 15, 16 or 17 decision
bits at a time into the register. This variable number of bits at the write port allows the FIFO
to compensate for the frequency offset between the sampling and the recovered clocks. The
read port, which is synchronized to the retiming clock, removes 16 retimed bits at a time.
A write pointer, WrPtri, and a read pointer, RdPtri, define the positions of the ports in the
register. The write and read take place on the opposite sides of the circular register to avoid
metastability. After each write/read access, the pointers are updated to their new positions
in the counter-clockwise direction, WrPtri+1 and RdPtri+1.
A small frequency offset between the transmitter and receiver clocks causes the recovered
average phase, φAVG, to constantly shift in one direction. For instance, if the transmitter
clock has a higher frequency than the receiver clock, φAVG constantly decreases, indicating
that the received UI is shorter than the period of the sampling clock. Conversely, φAVG
constantly increases if the transmitter clock has a lower frequency than the receiver clock.
To ensure that the CDR can tolerate frequency offsets, φAVG is generated using roll-over (instead
of saturating) registers, which allows continuous phase shifts in one direction,
i.e., frequency offsets, to be tracked. Since φPICK is a 0.5 UI shifted version of φAVG, both φPICK and φAVG
can be used as the frequency offset indicators. The FIFO uses φPICK for this purpose. Once
φPICK crosses the UI boundary as it decreases from one frame of 16 UIs to another, the write
port places 17 bits into the FIFO instead of the nominal 16 bits, thus compensating for a higher
data rate at the CDR input. Conversely, the write port places 15 bits into the FIFO once
φPICK crosses the UI boundary in the opposite direction as it increases from one frame to
another, thus compensating for a lower data rate at the CDR input. The write port places
the nominal 16 bits into the FIFO as long as φPICK remains within the UI boundaries going
from frame to frame. This scheme allows the CDR to sustain up to 1 UIPP jitter in every
16 UI frame.
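The selection of the write-port width can be sketched in Python as follows. The function name `bits_to_write` is hypothetical, φPICK is modelled as a floating-point phase in [0, 1) UI, and the direction of a boundary crossing is inferred by assuming that the frame-to-frame change of φPICK is smaller than 0.5 UI in magnitude; how the hardware detects the crossing is not specified here.

```python
def bits_to_write(prev_pick, new_pick):
    """Number of decision bits written to the FIFO for the current frame.

    prev_pick, new_pick -- phi_pick (in UI, range [0, 1)) in the previous
    and the current 16-UI frame.
    """
    # Shortest signed phase step between frames, wrapped to [-0.5, 0.5).
    delta = (new_pick - prev_pick + 0.5) % 1.0 - 0.5
    unwrapped = prev_pick + delta
    if unwrapped < 0.0:
        return 17        # phase decreased across the UI boundary: extra bit
    if unwrapped >= 1.0:
        return 15        # phase increased across the UI boundary: one bit less
    return 16            # phase stayed within the UI boundaries
```

For example, a change of φPICK from 0.05 UI to 0.95 UI is interpreted as a decrease of 0.1 UI across the boundary, so 17 bits are written.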
The FIFO retimes data error-free as long as the write and read ports do not overlap. The
FIFO consists of 64 registers, which makes the nominal separation between the ports 16
registers in either direction. Every time φPICK crosses the UI boundaries, the read and write
ports become one register closer to each other. Thus, the FIFO can compensate for up to
32 UIs of jitter to allow the retiming clock to catch up with the transmitted data rate. Once
the transmitter and retiming frequencies are close to each other, the CDR can sustain jitter
exceeding 32 UIPP. If the write and read ports overlap, the FIFO resets the position of the
read port to the opposite side of the circular buffer from the write port. This reset of the
port position is also performed at the receiver start-up.
The results of the CDR simulations and test-chip measurements, presented in the fol-
lowing section, confirm that the retiming scheme successfully compensates for frequency
offsets between the transmitter and receiver.
[Figure 3.14: Simulated jitter tolerance (BER ≤ 5·10⁻⁶). Axes: jitter amplitude (UIPP) versus jitter frequency (Hz). Curves: with SSC, DJ = 0.15 UIPP; with SSC, DJ = 0.05 UIPP; no SSC, DJ = 0.15 UIPP; no SSC, DJ = 0.05 UIPP.]

Table 3.1: Jitter tolerance simulation conditions (in Figure 3.14).

Parameter                   Value
Input                       5 Gb/s, 2³¹ − 1 PRBS
Channel loss                13 dB at 2.5 GHz
TSIM (fJIT > 250 kHz)       2·10⁵ UIs
TSIM (fJIT ≤ 250 kHz)       1 jitter period
BER                         ≤ 5·10⁻⁶
TX pre-emphasis             3 dB
Tx-Rx ∆fCLK                 600 ppm (nominal)
SSC freq. modulation        0...−5000 ppm at 32 kHz
Tx-Rx ∆fCLK (with SSC)      10600 ppm, 1.06 % (max)
Tx RJ                       0.17 UIPP (Gaussian)
Tx DJ                       0.19 UIPP (dual-Dirac)
Rx RJ                       0.23 UIPP (Gaussian)
Rx DJ                       legend in Fig. 3.14 (dual-Dirac)
3.6 Simulation and Measurement Results
To validate the proposed CDR architecture, a receiver with the feed-forward CDR shown
in Figure 3.2 was first simulated on a behavioral level, and then it was fabricated and
characterized in a test-chip in 65 nm CMOS.
The CDR was simulated using an event-driven behavioral model [48] in Simulink. This
model accounts for a limited channel BW, supports asynchronous clock domains, and allows
adding multiple jitter sources into the simulation. Figure 3.14 presents the simulated
jitter tolerance and Table 3.1 summarizes the simulation conditions. In these simulations,
the sinusoidal jitter was superimposed on random and deterministic jitter (RJ and DJ) and a
frequency offset between the transmitter and receiver. The CDR was simulated with a 5 Gb/s
2³¹ − 1 PRBS sequence passed through a channel with 13 dB attenuation at 2.5 GHz. The
transmitter pre-emphasis is 3 dB. To maintain reasonable simulation time, the simulations
were performed for 2·10⁵ UIs for jitter frequencies, fJIT, above 250 kHz, and for one full
jitter period for fJIT below 250 kHz, which corresponds to BER ≤ 5·10⁻⁶. The nominal
frequency offset between transmitter and receiver clocks, ∆fCLK, was set to 600 ppm. In
addition to this nominal ∆fCLK, an offset of up to 5000 ppm was introduced to emulate a
spread-spectrum clocking (SSC) at 32 kHz. This SSC-induced offset was added both at the
transmitter and at the receiver for a total ∆fCLK of up to 10600 ppm or 1.06 %. The variance
of RJ was adjusted to reach the reported peak-to-peak values within each simulation run.
These simulations confirm that the proposed CDR recovers error-free data at 5 Gb/s in the
presence of jitter, frequency offset and channel attenuation. The simulated jitter tolerance
is below the maximum theoretical tolerance limit of 1 UIPP at high frequencies due to the
random and deterministic jitter, and the frequency offset in the simulation setup. As the
jitter frequency approaches half the baud rate, the jitter tolerance slightly reduces, which is
the expected effect introduced in Section 2.3.2.
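The two numerical figures quoted above follow from simple arithmetic, sketched here in Python. The function names are hypothetical. The BER figure is read as the resolution of an error-free run (one error in N bits), which is an interpretation rather than a statistical confidence bound, and the total offset assumes the worst case in which the SSC deviations at the transmitter and receiver add.

```python
def ber_resolution(n_bits):
    """Smallest BER that an error-free run of n_bits can resolve (1/N)."""
    return 1.0 / n_bits


def total_offset_ppm(nominal_ppm, ssc_max_ppm):
    """Worst-case Tx-Rx frequency offset with SSC applied on both sides."""
    return nominal_ppm + 2 * ssc_max_ppm


def ppm_to_percent(ppm):
    """Convert parts per million to percent."""
    return ppm / 10_000
```

With 2·10⁵ simulated UIs the resolution is 5·10⁻⁶, and 600 ppm plus twice 5000 ppm gives 10600 ppm, i.e. 1.06 %.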
A receiver test-chip with the feed-forward CDR architecture was fabricated in 65 nm
standard-logic CMOS. Figure 3.15 illustrates a simplified design flow of the proposed
feed-forward CDR. First, the event-driven model of the CDR is implemented in Simulink and
simulated at behavioral level. This simulation generates two sets of test-vectors: the input
vectors that excite the CDR, and the output vectors that the CDR produces in response
to the input vectors. Then, the CDR is implemented in RTL, and this RTL is simulated
at behavioral level in Verilog. The Verilog simulation excites the CDR using the input
test-vectors generated in Simulink, and produces the output test-vectors. Finally, the output
test-vectors from Verilog and Simulink are compared with each other. If some discrepancies
are found between the two sets of output vectors, the CDR RTL is passed through a cycle
of corrections and further verifications. When the two sets of output vectors are identical,
the CDR is ready for the circuit-level implementation. Figure 3.16 presents the die photo of
the fabricated receiver. The ADC, frequency divider and the 2:8 portion of the 2:32 DeMUX
are analog custom-designed blocks. The 8:32 portion of the 2:32 DeMUX, the FFE, CDR
and the test structures (PRBS comparator and test register) are all synthesized. Table 3.2
summarizes the test-chip parameters.

[Figure 3.15: Simplified design flow of the proposed feed-forward CDR. CDR model in Simulink, behavioral CDR simulation in Simulink, input and output test-vectors; CDR implementation in RTL, behavioral CDR simulation in Verilog, output test-vectors; if the vectors do not match, RTL corrections; if they match, CDR circuit implementation.]

Table 3.2: Test-chip parameters.

Parameter        Value
Process          65 nm CMOS
Data rate        5 Gb/s
Supply           1.2 V
Power            178.4 mW
Receiver area    0.51 mm²
[Figure 3.16: Test-chip die photograph, showing the analog front-end with custom layout (ADC and 2:8 DeMUX, frequency divider) and the synthesized digital modules (8:32 DeMUX, FFE, CDR, PRBS comparator and test register).]

The receiver test-chip uses the ADC and FFE similar to those presented in [18]. The
ADC consists of two time-interleaved 5 GS/s 5-bit interpolating flash ADCs to achieve the
total sampling rate of 10 GS/s. To reduce the receiver input loading, the ADC evaluates
the four most significant bits (MSBs) using 17 comparators at the front end, and resistively
interpolates the least significant bit (LSB) to achieve the 5-bit resolution. The ADC has a
measured ENOB of 4.2 bits and a power consumption of 110 mW. After the signal samples
are DeMUXed, the samples are compensated for the channel loss using a half-UI-spaced
2-tap FIR filter as an FFE. The filter tap coefficients are programmable through a serial
shift register. The FFE compensates for up to 15 dB of channel attenuation at 2.5 GHz. It
was experimentally shown in [18] that the feed-forward CDR architecture can be used with
an adaptive FFE using a constant modulus algorithm (CMA) for adjusting the tap weights.
Since both unequalized and equalized samples are available in the digital domain, the CMA
adaptation circuits in [18] were implemented entirely in the digital domain.
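A half-UI-spaced 2-tap FIR filter of the kind mentioned above can be sketched in Python as follows. The function name `ffe_2tap` and the coefficient values in the usage example are hypothetical; the actual tap values programmed on the test-chip are not given here, and the sample before the start of the stream is assumed to be zero.

```python
def ffe_2tap(samples, c0, c1):
    """Apply y[n] = c0 * x[n] + c1 * x[n-1] to a stream of 2x samples.

    samples -- ADC samples spaced half a UI apart.
    c0, c1  -- main and post-cursor tap coefficients (illustrative only).
    """
    out = []
    prev = 0                         # assumed initial condition
    for x in samples:
        out.append(c0 * x + c1 * prev)
        prev = x
    return out
```

With the illustrative taps c0 = 1 and c1 = −0.5, a negative second tap subtracts part of the previous sample, which boosts transitions relative to long runs and thereby counteracts the low-pass behaviour of the channel.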
Figure 3.17 presents the measured jitter tolerance of the fabricated receiver. In these
measurements, a 2⁷ − 1 PRBS sequence running at 5 Gb/s was used as the data source.
The channel attenuation is 10 dB at 2.5 GHz. The transmitter pre-emphasis is 3 dB with
the launch amplitude of 750 mVPP. The receiver is triggered with a blind sampling clock
that has 760 ps random jitter (RMS) and -128 dBc/Hz phase noise at 1 MHz offset. The
measured jitter tolerance was recorded at BER ≤ 10⁻¹². For a comparison between the
simulated and measured results, Figure 3.17 includes a simulated jitter tolerance with the
same data source and channel loss. The measured and simulated jitter tolerances closely
match each other. Table 3.3 lists the jitter tolerance measurement and simulation conditions.
[Figure 3.17: Measured jitter tolerance. Axes: jitter amplitude (UIPP) versus jitter frequency (Hz). Curves: simulated, BER ≤ 5·10⁻⁶; measured, BER ≤ 10⁻¹².]

Table 3.3: Jitter tolerance measurement and simulation conditions (in Figure 3.17).

Parameter                   Value
Input                       5 Gb/s, 2⁷ − 1 PRBS
Channel loss                10 dB at 2.5 GHz
TSIM (sim)                  2·10⁵ UIs
BER (sim)                   ≤ 5·10⁻⁶
BER (meas)                  ≤ 10⁻¹²
TX pre-emphasis             3 dB
Tx-Rx ∆fCLK                 0
SSC freq. modulation        0
Tx-Rx ∆fCLK (with SSC)      0
Tx RJ (sim)                 0 UIPP
Tx DJ (sim)                 0.05 UIPP (dual-Dirac)
Rx RJ (sim)                 0.25 UIPP (Gaussian)
Rx DJ (sim)                 0.05 UIPP (dual-Dirac)

The receiver consumes 178.4 mW at 5 Gb/s, including the ADC. The entire receiver
occupies the chip area of 0.51 mm² (test structures excluded).
3.7 Summary
This chapter presented the proposed blind-sampling ADC-based feed-forward CDR archi-
tecture. In this architecture, the ADC samples the received signal blindly at twice the
baud-rate. The blind sampling allows removing the phase-tracking feedback loop from the
CDR, thus simplifying the receiver architecture. The CDR recovers the data phase directly
from digital signal samples in a feed-forward manner, hence eliminating the need for a
digital interpolating feedback loop. This feed-forward topology reduces the CDR circuit
complexity compared to the previously reported blind-sampling interpolating CDRs.
The feed-forward CDR architecture was fabricated in a test-chip receiver in 65 nm
CMOS. The test-chip successfully recovers data at 5 Gb/s in the presence of channel
attenuation of 2.5 dB at 2.5 GHz. The receiver occupies 0.51 mm² of die area and consumes
178.4 mW of power. The CDR simulations and test-chip measurements confirm that the
proposed architecture is suitable for high-speed serial links.
In the presented receiver, the sampling ADC consumes a significant portion of the
receiver power. To reduce the ADC power, the following chapter proposes a fractional sam-
pling rate (FSR) CDR architecture, which reduces the ADC conversion rate by sampling
the received signal at 1.45x the baud rate (instead of 2x). The CDR then recovers the data
from the signal samples using a feed-forward topology.
Chapter 4
A Fractional-Sampling-Rate CDR Architecture
THE FEED-FORWARD CDR architecture presented in the previous chapter reduces
the circuit complexity of the ADC-based receivers at the cost of a high sampling rate.
In this architecture, the ADC samples the received signal blindly at 2x the baud rate [17,18],
which is twice the sampling rate of the phase-tracking ADC-based receivers
with the Mueller-Müller CDR architecture [4, 5, 38]. This high sampling rate leads
to high ADC power consumption and area in the feed-forward ADC-based receivers. To
reduce the ADC power and area, this chapter proposes a fractional-sampling-rate (FSR)
feed-forward CDR architecture for the blind-sampling ADC-based receivers. In this archi-
tecture, the ADC samples the received signal blindly at a fractional rate between 2x and
1x, thus reducing the ADC conversion rate below 2x while maintaining a low circuit com-
plexity of the receiver. The proposed CDR then recovers the phase and data from the blind
fractionally-spaced samples using a feed-forward topology. To validate the proposed CDR
architecture, a 6.875 Gb/s 1.45x ADC-based receiver with FSR CDR was implemented and
characterized in 65 nm CMOS [19]. This chapter first introduces the proposed FSR CDR
architecture, and then presents the implementation of the CDR building blocks.
The remainder of this chapter is organized into eight sections. First, Section 4.1 intro-
duces the concept of fractional sampling rate for the feed-forward CDRs and presents an
ADC-based receiver with FSR CDR. Then, Sections 4.2 – 4.6 present the implementation
details of the FSR CDR. Section 4.2 proposes two phase detection schemes for the blind
sampling at fractional rates. Sections 4.3 and 4.4 describe the phase-recovery filter and the
data-decision scheme used in the FSR CDR. Section 4.5 presents two vector compaction
schemes that restore the correspondence between the number of samples and the number
of data bits with fractional sampling rates. Section 4.6 describes the data retiming scheme
used in the proposed CDR. Next, Section 4.7 validates the FSR CDR architecture through
the simulations and measurements of a receiver test-chip. Finally, Section 4.8 concludes
this chapter with a summary.
Figure 4.1: Sampling rates in feed-forward CDR architectures. (a) Sampling at 2x. (b) Sampling at 1.45x.
4.1 Fractional-Sampling-Rate CDR Architecture
To introduce the concept of fractional sampling rate, Figure 4.1 illustrates sampling the
received signal at two different rates: at 2x the baud rate and at 1.45x the baud rate. With
sampling at an integer rate of 2x, shown in Figure 4.1(a), every UI is sampled twice. This
sampling rate is typical for the blind-sampling ADC-based receivers [39, 40]. Sampling at
2x requires the ADC conversion rate to exceed the baud rate by twofold, which leads to
high ADC power and area. Figure 4.1(b) illustrates the concept of a fractional sampling
rate through an example of sampling at 1.45x the baud rate. With this rate, some UIs are
sampled once while other UIs are sampled twice for an average rate of 1.45x. This reduction
of the sampling rate makes it possible to reduce the ADC power and area by 27.3 % compared to
sampling at 2x. The remainder of this section proposes a CDR architecture that recovers the
data from the ADC samples taken blindly at a fractional sampling rate. The rate of 1.45x is
used as an example in the presentation of the proposed FSR CDR architecture, while this
architecture is applicable to a wide range of sampling rates between 2x and 1x.
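The sampling pattern described above can be checked numerically. The following is a minimal sketch, not part of the receiver implementation: the function name `samples_per_ui` and the `phase_offset` parameter (the unknown position of the first blind sample inside its UI) are assumptions introduced for illustration only.

```python
from collections import Counter
from fractions import Fraction


def samples_per_ui(num_samples=16, num_uis=11, phase_offset=Fraction(0)):
    """Count how many blind samples land in each UI of one frame.

    phase_offset is the position of the first sample inside its UI, in UI.
    It must lie in [0, SI) so that all samples stay inside the frame.
    """
    si = Fraction(num_uis, num_samples)            # sampling interval in UI
    times = [phase_offset + k * si for k in range(num_samples)]
    counts = Counter(int(t) for t in times)        # int() floors non-negative values
    return [counts[ui] for ui in range(num_uis)]


counts = samples_per_ui()
print(counts)                                      # samples per UI over the 11-UI frame
print(counts.count(2), "UIs sampled twice,", counts.count(1), "UIs sampled once")
# Relative reduction of the ADC conversion rate with respect to sampling at 2x
print(float(1 - Fraction(16, 11) / 2))
```

For the rate of 16/11, the count of UIs sampled twice does not depend on the chosen offset, since 16 samples always have to be distributed over 11 UIs with one or two samples each.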
Figure 4.2: Receiver with the proposed fractional-sampling-rate CDR architecture.
Figure 4.2 presents a block-diagram of a 6.875 Gb/s 1.45x ADC-based receiver with the
proposed FSR CDR architecture. A 4-way time-interleaved ADC samples the 6.875 Gb/s
received signal, RXIN, blindly at 10 GS/s for the sampling rate of 1.45x. A 4:16 DeMUX
then feeds 16 samples at a time into the digital CDR for the data recovery. These 16 sam-
ples correspond to an 11-UI interval, or frame. The ADC and the DeMUX are triggered
by a 5 GHz blind sampling clock divided by 2. This clock is further divided by 4 to trigger
the digital CDR at 625 MHz. The digital CDR uses the feed-forward topology presented
in Chapter 3. The CDR consists of a phase-recovery path and a data-recovery path. In the
phase path, first, a phase detector, PD, uses the digital samples, S, to estimate the
instantaneous zero-crossing phase, φX. Then, a phase subtracter and a filter recover the average
zero-crossing phase, φAVG, from φX. The data-recovery path consists of three blocks: a data
decision, a data compactor and a FIFO. The data path uses the recovered phase values φX
and φAVG to recover the data from the digital samples of the received signal. Since the 16
samples at the input of the data path correspond to 11 UIs, 5 of these 11 UIs are sampled
twice, or equivalently, they have duplicate samples. First, the data decision block picks
11 data bits among the 16 samples and marks the remaining 5 samples as duplicates. To
mark the samples as decision bits or duplicates, the decision block assigns a binary flag
to every sample, and hence the block outputs 16 sliced samples along with 16 flags. The
data compactor then removes the duplicate samples to reduce the recovered data to 11 bits.
Finally, the FIFO retimes the decision bits from the sampling clock domain to a retiming
clock and outputs 16 recovered data bits, DTREC, at a time. The FIFO also compensates the
data bits for frequency offsets between the transmitter and receiver. To simplify the CDR
simulations and measurements, the baud-rate clock (divided by 16) is used as a retiming
clock to ensure that the data rates at the CDR input and output are identical.
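The clock frequencies quoted in this section are mutually consistent, which the short calculation below illustrates. It is only an arithmetic sketch: the variable names are hypothetical, and it assumes that each of the four interleaved ADC channels converts one sample per cycle of the divided clock.

```python
# Clock plan of the example receiver, rebuilt from the numbers quoted in the text.
BLIND_CLOCK_HZ = 5_000_000_000            # 5 GHz blind sampling clock
UIS_PER_FRAME = 11                        # UIs covered by one frame of samples

adc_clock_hz = BLIND_CLOCK_HZ // 2        # clock divided by 2 for the ADC and DeMUX
sample_rate = 4 * adc_clock_hz            # 4-way time interleaving (assumed 1 sample/way/cycle)
cdr_clock_hz = adc_clock_hz // 4          # further divided by 4 for the digital CDR
samples_per_frame = sample_rate // cdr_clock_hz
data_rate = UIS_PER_FRAME * cdr_clock_hz  # 11 UIs are processed per CDR clock cycle
retiming_clock_hz = data_rate / 16        # baud-rate clock divided by 16

print(sample_rate, cdr_clock_hz, samples_per_frame)
print(data_rate, retiming_clock_hz)
```

The result reproduces the 10 GS/s sampling rate, the 625 MHz CDR clock, the 16-sample frame and the 6.875 Gb/s data rate, and gives a retiming clock of about 430 MHz.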
In the remainder of this chapter, first Sections 4.2 – 4.6 present the implementation of
the CDR’s blocks, and then Section 4.7 validates the proposed CDR architecture through
the simulations and measurements of a test-chip receiver with the FSR CDR.
4.2 Phase-Detection Schemes
The goal of the phase-detection scheme is to estimate the instantaneous zero-crossing phase
from the digital samples of the received signal. In the blind-sampling receivers, this instan-
taneous data phase is typically expressed with respect to one of the sampling phases of the
blind clock in terms of unit intervals. With an integer sampling rate, the sampling phases
repeat every UI. In the example of sampling at 2x the baud rate, presented in Chapter 3, the
sampling phases are referred to as 0° and 180°, and all the phase values are expressed with
respect to the 0° phase of the sampling clock. With a fractional sampling rate, in contrast,
the sampling phase changes from one UI to another. This variable sampling phase poses
the primary challenge in the phase detection in the FSR receivers.
Sections 4.2.1 and 4.2.2 propose two alternative phase detection schemes for the FSR
CDRs: an eye-based scheme and a transition-based scheme. The eye-based phase detection scheme
first uses the variable sampling phase to accumulate the eye diagram of the received signal,
and then estimates the data phase from the accumulated eye. The transition-based phase de-
tection scheme takes an alternative approach: it linearly estimates the data phase for every
transition, and then adjusts the estimated phase values to account for the variable sampling
phase.
4.2.1 Eye-Based Phase Detector
The eye-based PD extracts the instantaneous data phase from the eye diagram of the re-
ceived signal. To accumulate the eye diagram, the PD needs to keep track of the sampling
phases for every sample. In general, it is possible to calculate the sampling phases knowing
the sampling rate or, equivalently, the sampling interval — the time between two adjacent
samples in terms of UI. This calculation of the sampling phases, however, requires circuit
resources to perform the calculations, increasing the power and area of the PD. In order to
reduce the power and area, instead of calculating the sampling phases, the proposed FSR
CDR restricts the sampling rates such that the sampling phases become periodic, which
causes a repetition of the sampling phases after a known number of UIs.
Figure 4.3: Eye diagram accumulation with fractional sampling rate (16 samples in 11 UIs ≈ 1.45x).
Figure 4.3 illustrates the proposed sampling at a fractional rate, and the accumulation of
the eye diagram for the phase detection. The received signal is sampled such that an integer
number of samples fall into an integer number of UIs. In the example of Figure 4.3, 16
samples fall into 11 UIs for the sampling rate of 16/11≈1.45x. Since the sampling interval,
SI, is 11/16 = 0.6875 UI long, the sampling phase changes from one UI to another. In fact,
the sampling phase sweeps the UI, and folding the samples into a single UI reveals the eye
diagram of the received signal, as shown at the bottom of Figure 4.3.
The sampling rate of 16/11 causes the sampling phase to repeat every 16 samples or,
equivalently, every 11 UIs. In fact, the sampling phase for every sample can be calculated
using a simple relation:
$$\text{Sampling Phase, UI} = \frac{1}{16}\,\mathrm{mod}_{16}\left[11 \cdot (\text{Sample Number} - 1)\right] \qquad (4.1)$$
Table 4.1 lists the sampling phases for the 16 samples that follow from (4.1). The sampling
phases are expressed with respect to the first sample, S1, whose phase is 0. Since the frac-
tion 16/11 cannot be simplified, all 16 samples in Table 4.1 have unique sampling phases
that span the entire UI. The sampling phase repetition period of 16 samples matches the
number of DeMUX channels in the receiver (see Figure 4.2), and therefore every DeMUX
channel has a constant sampling phase, or time stamp, associated with it. This correspondence
between the DeMUX channels and the sampling phases allows accumulating the eye
diagram with no phase calculations. In fact, every DeMUX channel maps to a vertical slice
of the eye diagram at a constant position, and hence the eye diagram can be constructed by
routing the samples from every DeMUX channel to its corresponding eye diagram slice.
Table 4.1: Sampling phases for the sampling rate of 16/11 ≈ 1.45x.
Sample   Phase, UI   Sample   Phase, UI
1        0/16        9        8/16
2        11/16       10       3/16
3        6/16        11       14/16
4        1/16        12       9/16
5        12/16       13       4/16
6        7/16        14       15/16
7        2/16        15       10/16
8        13/16       16       5/16
Figure 4.4: Phase detection from the eye diagram.
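The sampling phases of Table 4.1 and the fixed mapping of DeMUX channels to eye-diagram slices can be reproduced with a few lines. This is an illustrative sketch only: the helper `sampling_phase` and the variable `slice_order` are hypothetical names, and the ordering of slices by increasing phase is an assumption about how the eye diagram is laid out.

```python
from fractions import Fraction


def sampling_phase(sample_number, num_samples=16, num_uis=11):
    """Sampling phase in UI of a 1-based sample, relative to sample 1."""
    return Fraction((num_uis * (sample_number - 1)) % num_samples, num_samples)


phases = {n: sampling_phase(n) for n in range(1, 17)}
for n, phase in phases.items():
    print(n, f"{int(phase * 16)}/16")       # same notation as Table 4.1

# Each DeMUX channel has a constant phase, so sorting the channels by phase
# gives the left-to-right order of the eye-diagram slices.
slice_order = sorted(phases, key=phases.get)
print(slice_order)
```

Since 16 and 11 have no common factor, the 16 phases are all different and cover the UI in steps of 1/16 UI.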
To estimate the instantaneous data phase from the accumulated eye diagram, the PD
uses fictitious transitions extracted from the eye diagram. Figure 4.4 illustrates the process
of forming these fictitious transitions through four sample transitions. First, the PD divides
the samples into those belonging to ‘1–0’ transitions, and those belonging to ‘0–1’ transi-
tions. Then, the PD joins all the positive samples with all the negative samples belonging to
the same branch, thus forming the fictitious transitions from the samples accumulated in the
eye diagram. Figure 4.4 shows two examples of ‘1–0’ transitions and two examples of ‘0–1’
transitions, while the total number of fictitious transitions is larger. To ensure that these
transitions approximate the actual zero crossing in the eye diagram, the samples forming
these fictitious transitions are restricted to be at most 0.5 UI apart in the eye diagram. Next,
the PD linearly estimates the zero-crossing phase for all the fictitious transitions with a 2-bit
accuracy using the method presented in Chapter 3. Finally, the PD averages the resulting
phase values to output a single instantaneous phase value for the frame of 16 samples.
Figure 4.5: Simplified block-diagram of the transition-based phase detector.
Behavioral simulations confirm that the proposed eye-based phase detection scheme
leads to a successful phase recovery in an FSR CDR. The implementation of the eye-
based PD, however, requires a large number of phase calculators since the number of fic-
titious transitions exceeds the number of actual transitions and these fictitious transitions
are spread across the eye diagram. This high complexity of the eye-based phase detection
scheme motivates the low-complexity transition-based phase detection scheme proposed in
the following section.
4.2.2 Transition-Based Phase Detector
The transition-based phase detection scheme estimates the instantaneous data phase from
the actual, rather than fictitious, transitions in the received signal. The scheme then adjusts
the estimated phase values such that they are expressed in terms of UIs with respect to a
common reference.
Figure 4.5 presents a simplified block-diagram of the transition-based PD, which con-
sists of an average-transition-slope calculator and a data-phase calculator. The PD takes 16
digital samples, S, as the input and generates 16 instantaneous data phase values, φX, as the
output. With the fractional sampling rate of 1.45x, the 16 samples correspond to 11 UIs,
and there are fewer than 16 transitions among the 16 samples. To mark the actual, or valid,
transitions, the PD generates 16 transition flags. The transition flags corresponding to the
valid transitions are set to ‘1’, while the flags corresponding to no transitions are set to
‘0’. The data-phase calculator linearly estimates φX for every pair of adjacent samples with
opposite polarities using the slopes of the transitions between the samples. However, due
to the FSR, the time between two adjacent samples exceeds 0.5 UI, and the sampling phase
changes from one UI to another. As a consequence, some transition slopes lead to more
accurate estimates of φX, while other slopes lead to less accurate estimates of φX. To improve
the phase detection accuracy, the average-transition-slope calculator recovers the average
value of the slopes leading to the more accurate estimates of φX. The data-phase calculator
then uses the recovered average slope to improve the accuracy of the phase estimation.
The remainder of this section first presents the average-transition-slope calculator and then
describes the data-phase calculator.
Figure 4.6: Selection of transitions leading to low-error phase detection. (a) Case 1: compare Si with Si+2. (b) Case 2: compare samples with VTH.
Figure 4.6 illustrates through two cases how the transition slopes are divided into two
categories: those leading to more accurate phase detection (shown as solid bold lines),
and those leading to less accurate phase detection (shown as dashed lines). The
average-transition-slope calculator distinguishes between the slopes using the amplitudes of
the samples. In the first case, shown in Figure 4.6(a), two transitions are adjacent to sample
Si+1. Since the slope of the phase-estimation line cannot exceed the actual transition slope,
the transition with the larger slope leads to a more accurate phase estimation (solid bold
line), while the other transition leads to a lower accuracy. To select the larger of the two
slopes, it is sufficient to compare the amplitude of Si with the amplitude of Si+2. In the
second case, shown in Figure 4.6(b), this comparison of two adjacent slopes is not possible
since the two slopes are several samples apart from each other. In this case, the average-slope
calculator compares the samples’ magnitudes with a threshold level, VTH, which is
also recovered from the samples’ magnitudes. A transition leads to a more accurate phase
estimation when both its samples exceed the threshold in magnitude (solid bold line). In
contrast, when one of the samples has a magnitude below the threshold level, the transition
is likely to lead to a less accurate phase estimation (dashed line). The average-slope
calculator then recovers the average value of the slopes shown with the solid bold lines in
Figure 4.6.
Figure 4.7: Average-slope-recovery filter.
Figure 4.7 demonstrates the implementation of the average-slope-recovery filter. Since
the time between two adjacent samples is constant (it equals one sampling interval), the
sum of magnitudes of two samples carries the same information as the actual slope of the
line between these two samples. Therefore, the filter takes the sums of magnitudes of the
samples at the input, |Si| + |Si+1|, along with the binary sum flags, SumFlagi, that mark
the transitions contributing to the average. The filter outputs the average value of the sums,
(|Si| + |Si+1|)AVG. The filter is of feedback type, and it resembles a simple phase-recovery
loop. First, the subtracters calculate the errors between the instantaneous sums and the
average sum. The binary sum flags then multiply the error values such that only the errors
with their flags set to ‘1’ contribute to the average. These multipliers are implemented as
bit-wise AND between the flags and error values. Next, the sum errors are combined into
a single value that is scaled by gain A. Finally, a discrete-time integrator, consisting of an
adder and a register, uses the error values to recover the average sum, (|Si| + |Si+1|)AVG,
thus closing the feedback loop. Similar to the sample values, the recovered average slope is
a 5-bit value. The data-phase calculator then uses this average transition slope to improve
the phase-detection accuracy according to the scheme presented next.
Figure 4.8: Reduction of phase-detection error using average transition slope.
Figure 4.9: Linear estimation of instantaneous zero-crossing phase, φZC (SI = 11/16 UI; 2-bit phase bins 00, 01, 10, 11).
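The feedback operation of the average-slope-recovery filter can be sketched behaviorally as follows. This is a floating-point illustration under stated assumptions, not the fixed-point circuit: the function name `update_average_sum` is hypothetical, the gain value of 1/64 is a placeholder for A, and the errors are assumed to be combined by a plain sum.

```python
def update_average_sum(avg, samples, sum_flags, gain=1 / 64):
    """One frame update of the average of |S_i| + |S_{i+1}|.

    samples:   17 values (S1..S17), so that 16 adjacent pairs exist.
    sum_flags: 16 values in {0, 1} marking the pairs that contribute.
    gain:      loop gain A (placeholder value, not taken from the thesis).
    """
    errors = []
    for i, flag in enumerate(sum_flags):
        pair_sum = abs(samples[i]) + abs(samples[i + 1])
        errors.append(flag * (pair_sum - avg))   # the flag gates the error (AND in hardware)
    return avg + gain * sum(errors)              # adder plus register: discrete-time integrator


# Usage: alternating samples of magnitude 5 give a pair sum of 10.
frame = [5, -5] * 8 + [5]
avg = 0.0
for _ in range(100):
    avg = update_average_sum(avg, frame, [1] * 16)
print(avg)
```

With all flags cleared the integrator holds its value, which mirrors the gating of the non-contributing transitions described above.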
Figure 4.8 illustrates a method of improving the phase-estimation accuracy for the high-error
transitions. For this type of transition, shown as a dashed line in the figure, instead of
the actual slope, the phase calculator uses the average transition slope, shown as a solid line.
This phase-estimation line is defined by the average slope and the one sample that is closer to
the zero-crossing in magnitude, Si+2 in the example of Figure 4.8.
The data-phase calculator uses either the actual or the average transition slope to define
the line for the phase estimation. The linear phase-estimation scheme, shown in Figure 4.9,
is similar in principle to the scheme used in the 2x feed-forward CDR architecture presented
in Chapter 3. Figure 4.9 illustrates an example of linear phase detection using a line defined
by two samples, Si and Si+1. The PD calculates the phase of the zero-crossing point, X,
using:
$$\varphi_{ZC\,i} = \frac{|S_i|}{|S_i| + |S_{i+1}|}. \qquad (4.2)$$
Since the time between two adjacent samples is 1 SI = 0.6875 UI, φZC i expresses the data
phase as a proportion of SI with respect to sample Si. The changing sampling phase due
to the FSR makes φZC i from different sampling intervals inconsistent with each other. In
order to use the instantaneous phase values, φZC i needs to be expressed in terms of UIs with
respect to a common reference. To obtain this converted instantaneous phase, φX i, φZC i is
scaled by SI and offset by the sampling phase, or the time stamp, TSi, of every sample:
$$\varphi_{X\,i} = TS_i + SI \cdot \varphi_{ZC\,i}. \qquad (4.3)$$
Since the sampling phases are defined with respect to the first sample for every 16 samples
(see Table 4.1), φX i is referenced to the same first sample. A modulo-1 addition in (4.3)
confines φX i to the range [0, 1) UI.
Figure 4.10: Selector converting phase values from sampling intervals (SI) to unit intervals (UI): the 2-bit codes 00, 01, 10 and 11 of φZC i select TSi + SI·1/8, TSi + SI·3/8, TSi + SI·5/8 and TSi + SI·7/8 as φX i.
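As a worked example of (4.2) and (4.3), the sketch below evaluates both relations with exact fractions before any quantization. The function names and the sample values (5 and −2 for samples 2 and 3) are chosen purely for illustration; integer sample values are assumed.

```python
from fractions import Fraction

SI = Fraction(11, 16)                 # sampling interval in UI for the 16/11 rate


def zero_crossing_phase(s_i, s_next):
    """Eq. (4.2): zero crossing as a fraction of SI after sample S_i (integer samples)."""
    return Fraction(abs(s_i), abs(s_i) + abs(s_next))


def to_unit_intervals(ts_i, phi_zc):
    """Eq. (4.3) with modulo-1 addition: phase in UI referenced to sample 1."""
    return (ts_i + SI * phi_zc) % 1


# Example: a transition between samples 2 and 3, whose time stamp is TS_2 = 11/16 UI.
phi_zc = zero_crossing_phase(5, -2)
phi_x = to_unit_intervals(Fraction(11, 16), phi_zc)
print(phi_zc, phi_x, float(phi_x))
```

Here the zero crossing lies 5/7 SI after sample 2, which corresponds to 5/28 UI once the time stamp is added and the result is wrapped into [0, 1) UI.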
To maintain a low circuit complexity of the PD, φZC i is calculated with a 2-bit accuracy,
which allows the division operation to be replaced with a few additions and subtractions (see
Figure 3.5). Note that replacing the instantaneous sum in the denominator with the average
sum (slope) allows the same phase calculator to estimate the data phase for the
high-phase-error transitions that require the slope substitution.
The conversion of the phase values from SI to UI in (4.3) requires one multiplier and
one adder per sample. To reduce the complexity of the conversion operation, the PD makes
use of the low accuracy of φZC i, which is a 2-bit value. Since φZC i takes one of four
values, its corresponding φX i also has four possible values. Hence, instead of calculating
φX i, the PD uses a 4-to-1 selector to perform the conversion as shown in Figure 4.10. The
selector takes φZC i as the control input and picks one of four values as the output φX i. The
four possible values of φX i are pre-calculated constants. Once the
sampling rate is selected for the receiver, the sampling phases, TSi, are constant for every
sample (Table 4.1 lists the sampling phases for sampling at 1.45x). The sampling interval,
SI, is also a constant. The four possible values for φZC i are 1/8, 3/8, 5/8 and 7/8, which
correspond to the middles of the four quantization bins shown at the bottom of Figure 4.9.
Since SI = 0.6875 UI and the linear phase estimation is 2-bit accurate, the effective PD
resolution is log2(4/0.6875) ≈ 2.54 bits/UI.
Figure 4.11: Phase recovery filter (a phase subtracter followed by a 3rd-order low-pass filter built from the integrator stages K1·z⁻¹/(1 − z⁻¹), K2·z⁻¹/(1 − z⁻¹) and K3·z⁻¹/(1 − z⁻¹)).
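The selector-based conversion amounts to a small table of constants per DeMUX channel. The sketch below builds such a table and recomputes the effective resolution; the names `build_selector_table`, `SELECTOR` and `phi_x`, as well as the zero-based channel and code indexing, are assumptions made for illustration.

```python
import math
from fractions import Fraction

NUM_SAMPLES, NUM_UIS = 16, 11
SI = Fraction(NUM_UIS, NUM_SAMPLES)                    # sampling interval in UI
BIN_CENTERS = [Fraction(1, 8), Fraction(3, 8), Fraction(5, 8), Fraction(7, 8)]


def build_selector_table():
    """Pre-compute phi_X for every DeMUX channel and every 2-bit phi_ZC code."""
    table = []
    for channel in range(NUM_SAMPLES):                 # channel 0 corresponds to S1
        ts = Fraction((NUM_UIS * channel) % NUM_SAMPLES, NUM_SAMPLES)
        table.append([(ts + SI * center) % 1 for center in BIN_CENTERS])
    return table


SELECTOR = build_selector_table()


def phi_x(channel, code):
    """4-to-1 selection: the 2-bit code picks one of four constants of the channel."""
    return SELECTOR[channel][code]


print(phi_x(0, 0), phi_x(1, 3))
# Four bins per sampling interval of 0.6875 UI, expressed in bits per UI
resolution_bits_per_ui = math.log2(len(BIN_CENTERS) / float(SI))
print(round(resolution_bits_per_ui, 2))
```

In this form no multiplier or adder is evaluated at run time; the arithmetic of (4.3) is moved entirely into the constants.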
With the instantaneous phase values, φX, expressed in UI with a common reference for
all sampling intervals, the CDR recovers the average data phase using a filter presented in
the following section.
4.3 Phase-Recovery Filter
The filter averages the instantaneous phase, φX, into φAVG that tracks the data phase in the
average sense. Figure 4.11 shows a simplified diagram of the phase-recovery filter used in
the proposed FSR CDR. The filter consists of a phase subtracter and a discrete-time low-
pass filter in a feedback configuration. The filter topology is similar to the topology used in
the 2x feed-forward CDR architecture of Chapter 3. The filter is presented in Section 3.3,
while this section highlights the filter details specific to the FSR CDR.
Due to sampling at 1.45x, among the 16 samples, there are at most 11 valid zero crossings.
The binary transition flags corresponding to these valid transitions are set to ‘1’. To
ensure that only the actual crossings contribute to the recovery of φAVG, the phase subtracter
uses the transition flags to pass only the valid phase error values, φERR, to the low-pass filter.
Figure 4.12: Phase subtracter.
Figure 4.12 illustrates the operation of the phase subtracter in the FSR CDR. First,
the phase subtracter calculates the difference between the instantaneous and average data
phases, φX i and φAVG. Then the phase differences are multiplied by their corresponding
binary transition flags such that only the valid transitions contribute to the average phase
recovery. In the binary domain, this multiplication by the transition flag is implemented as a
bit-wise AND operation of φX i and its corresponding flag. Next, modulo blocks confine the
values of the phase errors, φERRi, to the range of [-0.5, 0.5) UI. Finally, the 16 phase errors,
φERRi, are combined to output a single phase error, φERR, for the frame of 16 samples.
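The operation of the phase subtracter can be summarized in a short behavioral sketch. The function names are hypothetical, and combining the 16 phase errors by a plain sum is an assumption, since the text only states that they are combined into a single value.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def wrap_half_ui(phase):
    """Modulo block: map a phase difference in UI into the range [-0.5, 0.5)."""
    return (phase + HALF) % 1 - HALF


def phase_subtracter(phi_x, transition_flags, phi_avg):
    """Gate the phase differences with the transition flags, wrap them, and combine them."""
    errors = []
    for px, flag in zip(phi_x, transition_flags):
        gated = flag * (px - phi_avg)      # invalid transitions contribute zero
        errors.append(wrap_half_ui(gated))
    return sum(errors)                     # assumption: errors are combined by summation


# Usage: only the first of the two phase values belongs to a valid transition.
print(phase_subtracter([Fraction(9, 10), Fraction(3, 10)], [1, 0], Fraction(1, 10)))
```

In the example, the difference of 0.8 UI wraps to −0.2 UI, which reflects that the instantaneous phase is closer to the average phase when measured across the UI boundary.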
The low-pass filter averages φERR to calculate φAVG, as shown in Figure 4.11. The
instantaneous and average phase values allow the CDR to recover the data bits from the
samples of the received signal according to the scheme presented in the next section.
4.4 Data-Decision Scheme
The data-decision scheme in the FSR CDR first detects the number of samples in every
UI, and then for the UIs with two samples the scheme picks one sample as the decision
bit while marking the other sample as the duplicate. Due to the fractional sampling rate
the number of samples exceeds the number of UIs: some UIs are sampled twice, while
others once. With the blind sampling, it is unknown a priori which UIs are sampled twice,
and therefore the decision block needs to find the UIs with duplicate samples. For the UIs
sampled once, the decision is trivial. In contrast, for the UIs sampled twice, the decision
block picks the sample that is closer to the UI center as the decision. The remainder of
this section first presents the method of detecting the number of samples per UI, and then
describes the selection of the decision bit for the UIs with two samples.
Figure 4.13: Detecting number of samples per UI (jitter-free case). (a) One sample per UI. (b) Two samples per UI.
Figure 4.13 illustrates the detection of the number of samples per UI through a jitter-
free example. For this detection, the decision scheme relies on φAVG, which coincides
with φX in the jitter-free case. The scheme counts the number of samples within a 1-UI
window from φAVG using the values of the sampling phases for every sample. The figure
highlights this 1-UI window with a grey outline, and it shows the UI of interest as a shaded
UI. When a UI is sampled once, as shown in Figure 4.13(a), the decision scheme takes the
sign of the sample as the decision bit. In contrast, when a UI is sampled twice, as shown in
Figure 4.13(b), the decision scheme picks one of the samples as the decision and marks the
other sample as a duplicate according to the algorithm described next.
Figure 4.14 presents the data decision scheme for the case of two samples per UI. The
goal of the decision block is to pick the sample that is closer to the center of the UI. To
find the UI center, the average UI-center phase, φPICK, is calculated by adding 0.5 UI to
the average zero-crossing phase, φAVG, using modulo-1 addition. In the jitter-free case of
Figure 4.14(a), the two samples adjacent to φPICK, samples A and B, have identical signs and
therefore either sample can be selected as the decision, while the other sample is marked as
a duplicate. Jitter causes the instantaneous phase to deviate from the average phase, and in
fact may cause a transition between the two samples that nominally belong to the same UI,
as shown in Figures 4.14(b) and 4.14(c). In this case, the decision block picks the sample
belonging to the same shaded UI to which φPICK points. The data-picking then reduces to
a comparison between φPICK and φX. If φX is larger than φPICK, as shown in Figure 4.14(b),
then the scheme chooses the sign of A as the decision bit. Conversely, if φX is smaller than
φPICK, as shown in Figure 4.14(c), the scheme chooses the sign of B. The remaining sample
in both cases is marked as a duplicate for the subsequent removal from the data vector.
Figure 4.14: Data decision in the presence of jitter. (a) Jitter-free case. (b) Jitter example 1. (c) Jitter example 2.
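The jitter-free part of the decision scheme, namely grouping the samples by UI and keeping the sample closer to the UI center, can be sketched as follows. This is a simplified illustration and not the implemented decision block: the function `decide_frame` is hypothetical, the frame is assumed to repeat periodically so that a UI straddling the frame boundary is shared by the first and last samples, ties are resolved in favor of the earlier sample, and the comparison between φPICK and φX used in the presence of jitter is not modeled.

```python
from fractions import Fraction
from math import floor


def decide_frame(samples, phi_avg, num_uis=11):
    """Slice the samples and flag one decision sample per UI (1 = decision, 0 = duplicate)."""
    n = len(samples)
    si = Fraction(num_uis, n)                       # sampling interval in UI
    groups = {}
    for k in range(n):
        pos = k * si - phi_avg                      # sample time relative to a UI edge
        ui = floor(pos) % num_uis                   # UI index, wrapped across the frame boundary
        dist = abs(pos - floor(pos) - Fraction(1, 2))   # distance from the UI centre
        groups.setdefault(ui, []).append((dist, k))
    flags = [0] * n
    for members in groups.values():
        _, best = min(members)                      # closest to the centre wins
        flags[best] = 1
    bits = [1 if s > 0 else 0 for s in samples]     # sliced samples
    return bits, flags


bits, flags = decide_frame([3, 4, -2, -5, -4, 1, 2, 5, -1, -4, -3, 2, 4, 3, -2, -5], Fraction(0))
print(flags, sum(flags))
```

For 16 samples per 11 UIs, the sketch always flags 11 decision samples and 5 duplicates, in agreement with the counts stated in Section 4.1.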
The proposed data-decision scheme recovers error-free data as long as the cycle-to-
cycle jitter remains within the scheme’s limit, which depends on the sampling rate. The
worst case jitter conditions occur when two samples fall in a UI and the samples are cen-
tered in the UI, i.e., they are equidistant from the UI edges (see Figure 4.14(a)). In the
worst case, a cycle-to-cycle jitter of (1 UI − 1 SI) causes the UI edges to move towards each
other such that none of the samples fall in the UI. In the example of sampling at 1.45x, this
maximum tolerance is (1 UI − 0.6875 UI) = 0.3125 UIpp. This estimated tolerance to the
cycle-to-cycle jitter is confirmed through simulations in Section 4.7.
To ensure that the recovered data contains only the valid data bits, the CDR removes
the duplicates marked by the decision block using a data compactor, which is presented in
the following section.
4.5 Data Compaction Schemes
In contrast with an integer sampling rate, the fractional sampling rate leads to a variable
number of samples per UI, causing duplicate samples in some UIs. Since the location of
the duplicates among the recovered data bits is unknown a priori, the CDR requires a
mechanism for removing the duplicates in order to avoid data recovery errors. Two data
compaction schemes are proposed here for the FSR CDR: a shift-register scheme in Sec-
tion 4.5.1 and a selector-array scheme in Section 4.5.2.
4.5.1 Shift-Register Data Compactor
Figure 4.15 presents a simplified diagram and a signal-flow graph of an example 5-to-3
shift-register data compactor. The compactor takes 5 sliced samples, Si, along with their valid
flags, VFi, at the input on the left, removes two duplicate samples, S2 and S4, and outputs
on the right 3 data bits free of duplicates, Di, as shown in Figure 4.15(a). The compactor
consists of three registers and two sets of 2-to-1 selectors between them. As the data bits
and their flags are shifted from left to right, the selectors eliminate the duplicates (shaded
in the figure) one at a time. The valid flags, VFi, are set to ‘1’ for the valid bits and to ‘0’ for
the duplicates. These flags guide the conditional selectors to remove the duplicates from
the data vector. The bold lines in the diagram highlight the path of the valid data bits.
Behavioral simulations confirm that the shift-register data compactor removes the du-
plicate samples leading to an error-free data recovery. This method of duplicate removal,
however, comes at the cost of large latency and multiple registers. In fact, the signal-flow
graph in Figure 4.15(b) shows that the number of latency cycles in this compactor equals
the number of duplicates to be removed. With sampling at 1.45x, among the 16 samples,
there are 11 valid bits and 5 duplicates, which requires 5 cycles. Every cycle in the pro-
posed CDR corresponds to 11 UIs, leading to the effective latency of 55 UIs. Every latency
cycle corresponds to a register stage in the data compactor. With 5 duplicates to remove,
the compactor requires at least 5 registers, with each register containing between 11 and 16
positions for the data bit and its flag. This large number of registers causes high power
consumption and large area in the shift-register compactor. In an attempt to reduce the power,
area and latency of the data compaction, the following section proposes an alternative data
compaction scheme.
Figure 4.15: Shift-register data compactor. (a) Simplified block-diagram. (b) Signal-flow graph: 5 bits (3 valid, 2 duplicates) enter Stage 1, 4 bits (3 valid, 1 duplicate) enter Stage 2, and 3 bits (3 valid, 0 duplicates) leave the compactor.
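The stage-by-stage behavior of the shift-register compactor, and the resulting latency, can be illustrated with a short behavioral model. The function names are hypothetical, and the choice that each stage removes the first remaining duplicate is an assumption; the text only states that the duplicates are removed one at a time.

```python
def compactor_stage(bits, flags):
    """One register stage: drop the first entry whose valid flag is 0."""
    for i, vf in enumerate(flags):
        if vf == 0:
            return bits[:i] + bits[i + 1:], flags[:i] + flags[i + 1:]
    return bits, flags                      # nothing left to remove


def shift_register_compactor(bits, flags):
    """Apply stages until no duplicate remains; return the data and the stage count."""
    stages = 0
    while 0 in flags:
        bits, flags = compactor_stage(bits, flags)
        stages += 1
    return bits, stages


# 5-to-3 example of Figure 4.15: S2 and S4 are duplicates.
print(shift_register_compactor(["S1", "S2", "S3", "S4", "S5"], [1, 0, 1, 0, 1]))

# 16-sample frame with 5 duplicates: 5 stages of 11 UIs each.
frame_flags = [0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1]
_, stages = shift_register_compactor(list(range(16)), frame_flags)
print(stages, "cycles,", stages * 11, "UIs of latency")
```

The number of stages equals the number of duplicates, which is the source of the 55-UI latency noted above.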
4.5.2 Selector-Array Data Compactor
Figure 4.16 presents a simplified diagram of the selector-array data compactor. The compactor
takes the inputs on the left and outputs the duplicate-free data vector at the bottom.
To route valid bits from the input to the output, the compactor uses an array of conditional
selectors, which pass the bits in one of two directions: either from the left to bottom,
or from the top to bottom. The valid flags, VFi, control the direction of passing the data
bits such that the duplicates (shaded in the figure) are eliminated. In the diagram, the bold
lines highlight the paths of the valid bits.
Figure 4.16: Selector-array data compactor.
The inset in Figure 4.16 depicts the circuit diagram of a conditional selector cell used
in the vector compactor. The cell consists of a data selector and three logic gates. The
selector passes either DTIN TOP or DTIN LEFT to DTOUT BOT according to the values of the
enable flags ENIN TOP and ENIN LEFT . The cell also updates the enable flags, ENOUT RIGHT
and ENOUT BOT , for its surrounding cells according to the truth table shown in Table 4.2.
The cell passes the data from the left to bottom only when both ENIN TOP and ENIN LEFT
are set to ‘1’, which also makes the cells in the remainders of the row and column to pass
the data only from the top to bottom. As a result, the bits with their V Fi set to ‘0’ never
propagate to the output. With this method of data compaction, it takes one cycle to remove
the duplicates regardless of the number of duplicates.
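The behaviour of the selector array can be illustrated with a small behavioral sketch in Python. It follows the truth table in Table 4.2 for a single cell; the boundary conditions are assumptions made for illustration and are not stated in the text: every column is assumed to start with its top enable set to '1', each input bit is assumed to be available to every cell in its row, and the left enable of a row is assumed to be its valid flag VFi. The function names are hypothetical.

```python
def selector_cell(dt_top, en_top, dt_left, en_left):
    """One conditional selector cell, following Table 4.2."""
    select_left = en_top and en_left          # both enables set: take the left input
    dt_bot = dt_left if select_left else dt_top
    en_bot = en_top and not en_left           # column is still waiting for a valid bit
    en_right = en_left and not en_top         # row bit is still waiting for a free column
    return dt_bot, en_bot, en_right


def compact(samples, valid_flags, n_outputs):
    """Route the valid bits, in order, to the first free output columns."""
    # Assumed boundary conditions: all columns enabled, no data placed yet.
    dt_col = [None] * n_outputs
    en_col = [True] * n_outputs
    for bit, vf in zip(samples, valid_flags):
        en_left = bool(vf)                    # assumed: row enable is the valid flag
        for j in range(n_outputs):
            dt_col[j], en_col[j], en_left = selector_cell(
                dt_col[j], en_col[j], bit, en_left
            )
    return dt_col


if __name__ == "__main__":
    # Five inputs, two of them flagged as duplicates (flag 0).
    print(compact([1, 0, 0, 1, 1], [1, 1, 0, 1, 0], n_outputs=4))
```

In hardware the whole array settles combinationally in one cycle; the nested loops here only mimic the order in which the enable flags ripple from the top left towards the bottom right.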
Nominally, the array consists of 16 rows and 11 columns for a total of 176 cells.
Because the first few inputs must correspond to the first few outputs, some cells in the
top right part of the array can be eliminated. In a similar manner, some of the cells
at the bottom left part of the array can be replaced with wires, since these cells always pass
the data bits from the top to bottom. Following this reasoning, the compactor is reduced to
Table 4.2: Conditional selector truth table.
ENIN_TOP   ENIN_LEFT   ENOUT_BOT   ENOUT_RIGHT   DTOUT_BOT
0          0           0           0             DTIN_TOP
0          1           0           1             DTIN_TOP
1          0           1           0             DTIN_TOP
1          1           0           0             DTIN_LEFT
Figure 4.17: Simplified FIFO diagram. Write port: 10 to 12 bits, clocked at sampling rate/16 = 625 MHz; read port: 16 bits, clocked at data rate/16 ≈ 430 MHz.
33 cells located close to the diagonal of the array. In addition to reducing the data compactor
power and area, this small number of cells makes it possible to remove the duplicates in a single
cycle, thus reducing the compactor latency to 11 UIs, which is 5x smaller than the
shift-register data compactor latency.
The compacted data vector, like most of the digital CDR, is in the sampling clock domain,
which has a fractional rate with respect to the baud rate. To simplify the verification of
the proposed CDR architecture, the recovered data is retimed according to the scheme
presented next.
4.6 Data Retiming Scheme
The data retiming scheme in the FSR CDR is similar to the retiming scheme in the 2x feed-
forward CDR architecture presented in Section 3.5. This section only highlights the aspects
of the retiming scheme that are specific to the FSR CDR architecture. Figure 4.17 shows a
simplified diagram of the FIFO that retimes the data bits from the fractional sampling clock
domain to the integer baud-rate clock domain. The FIFO is a circular register with a write
port and a read port. The write port (shaded in the diagram) places 10 to 12 data bits into
the FIFO and it is synchronized to the divided sampling clock running at 625 MHz. The
variable number of bits at the write port allows the FIFO to compensate for the frequency
offset between the transmitter and receiver, which is inevitable in blind-sampling receivers.
The read port is triggered by a divided baud-rate clock at approximately 430 MHz to remove
16 data bits at a time from the FIFO. In contrast with the vector compactor that removes
the duplicate samples, the FIFO only re-arranges the data for the purpose of retiming it to
a convenient clock domain.
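The clock rates quoted for the two FIFO ports follow from a simple rate balance, which the short Python sketch below reproduces. It uses the 10 GS/s sampling rate, the 6.875 Gb/s data rate and the 16-way DeMUX stated in this chapter; the helper name `bits_per_write` and the ±0.98 % offset applied to the sampling rate in the usage lines are illustrative assumptions.

```python
from fractions import Fraction

DEMUX_WAYS = 16
SAMPLING_RATE_HZ = 10_000_000_000   # 10 GS/s
DATA_RATE_HZ = 6_875_000_000        # 6.875 Gb/s


def bits_per_write(data_rate_hz, sampling_rate_hz, demux_ways=DEMUX_WAYS):
    """Average number of recovered bits per cycle of the divided sampling clock."""
    write_clock_hz = Fraction(sampling_rate_hz, demux_ways)
    return Fraction(data_rate_hz) / write_clock_hz


write_clock_hz = Fraction(SAMPLING_RATE_HZ, DEMUX_WAYS)   # 625 MHz
read_clock_hz = Fraction(DATA_RATE_HZ, DEMUX_WAYS)        # 429.6875 MHz, i.e. about 430 MHz

if __name__ == "__main__":
    print(float(write_clock_hz) / 1e6, "MHz write clock")
    print(float(read_clock_hz) / 1e6, "MHz read clock")
    # Nominal case: exactly 11 bits per write on average.
    print(float(bits_per_write(DATA_RATE_HZ, SAMPLING_RATE_HZ)))
    # Assumed sampling-rate offsets of +/-0.98 %: the average stays between 10 and 12.
    print(float(bits_per_write(DATA_RATE_HZ, 10_098_000_000)))
    print(float(bits_per_write(DATA_RATE_HZ, 9_902_000_000)))
```

With a frequency offset the average moves slightly away from 11 bits per write, which is why the write port accepts a variable 10 to 12 bits while the read port always removes 16.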
The simulation and measurement results of a receiver with the FSR CDR, presented in
the following section, confirm that the FIFO successfully retimes the data from a fractional
to an integer rate clock domain, and it compensates for the frequency offsets between the
transmitter and receiver.
4.7 Simulation and Measurement Results
In order to validate the proposed FSR CDR architecture, the CDR was first simulated at
the behavioral level with various sampling rates. Then, a receiver test-chip with a 1.45x FSR
CDR, shown in Figure 4.2, was fabricated and characterized in 65 nm CMOS. The FSR
CDR uses the transition-based phase detector and the selector-array vector compactor.
An event-driven approach [48] was used to build a behavioral model of the FSR CDR
in Simulink. To explore the effect of the sampling rate on the jitter tolerance, the CDR
was simulated with four different sampling rates. Figure 4.18 presents the simulated jitter
tolerance of the FSR CDR with the sampling rates annotated in the legend. Each sampling
rate is the ratio of 16 samples, which is the number of DeMUX
channels (see Figure 4.2), to an integer number of UIs ranging from 9 to 15. In these
simulations, the sampling clock is fixed at 10 GS/s while the baud rate is adjusted to achieve the
desired sampling rates. The simulations were performed with a 2^31 − 1 PRBS sequence
and BER ≤ 5·10^−6.
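As a quick check of the sampling-rate choices, the sketch below computes the ratio 16/N for the four values of N shown in the legend of Figure 4.18, together with the baud rate implied by a fixed 10 GS/s sampling clock. Only the 6.875 Gb/s value (N = 11) is quoted in the text; the other baud rates are derived here from the stated rule and should be read as illustrative. The function name is hypothetical.

```python
from fractions import Fraction

DEMUX_CHANNELS = 16
SAMPLING_RATE_GSPS = Fraction(10)


def fsr_point(uis_per_block):
    """Return (samples per UI, baud rate in Gb/s) for 16 samples spanning N UIs."""
    ratio = Fraction(DEMUX_CHANNELS, uis_per_block)
    baud_gbps = SAMPLING_RATE_GSPS / ratio
    return ratio, baud_gbps


if __name__ == "__main__":
    for n in (15, 13, 11, 9):
        ratio, baud = fsr_point(n)
        print(f"16/{n} = {float(ratio):.2f}x -> {float(baud):.3f} Gb/s")
```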
The low-frequency jitter tolerance is weakly affected by the sampling rate. However,
the high-frequency jitter tolerance shows a higher dependence on the sampling rate. As
the sampling rate increases from 1.07x, the high-frequency jitter tolerance improves. With
Figure 4.18: Simulated jitter tolerance (BER ≤ 5·10^−6). Axes: jitter frequency, Hz (10^3 to 10^10) versus jitter amplitude, UIPP (0.01 to 1000); curves for sampling rates 16/15 ≈ 1.07, 16/13 ≈ 1.23, 16/11 ≈ 1.45 and 16/9 ≈ 1.78.
the further increase of the sampling rate, the improvements diminish. This diminishing
improvement in the jitter tolerance guided the choice of the 1.45x sampling rate for the
fabricated receiver test-chip. The variation of the high-frequency jitter tolerance with the
sampling rate is consistent with the sample energy per UI. As the sampling rate approaches
1x, with blind sampling the worst-case sample energy approaches zero when the samples
occur close to the UI boundaries. This worst-case sample energy per UI increases with the
increasing sampling rate.
The simulations also confirm the expected tolerance to cycle-to-cycle jitter. With sam-
pling at 1.45x, the time between two adjacent samples is 0.6875 UI, and the expected jitter
tolerance at half the data rate is 0.3125 UIPP (see Section 2.3.2 for details). In the sim-
ulations, the jitter tolerance at 2 GHz reduces to approximately 0.3 UIPP, which is close
to the expected jitter tolerance. The high-frequency jitter tolerance is 0.65 UIPP, which is
below the expected 1 UIPP due to the linear nature of the phase estimation in the proposed
CDR. The behavioral simulations indicate that the proposed CDR architecture is functional,
which provides grounds for the experimental verification of the FSR CDR. The design flow
of the proposed FSR CDR is the same as the flow illustrated in Figure 3.15 and outlined in
Section 3.6.
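The two cycle-to-cycle figures quoted above for 1.45x sampling can be reproduced with the arithmetic below. The sample spacing follows directly from 16 samples per 11 UIs. The relation "tolerance = 1 UI minus the sample spacing" is an assumption inferred from the two quoted values (0.6875 + 0.3125 = 1); the actual derivation is the one in Section 2.3.2.

```python
from fractions import Fraction


def sample_spacing_ui(samples_per_block=16, uis_per_block=11):
    """Time between two adjacent samples, in UI."""
    return Fraction(uis_per_block, samples_per_block)


def expected_tolerance_ui_pp(spacing_ui):
    """Assumed relation: what remains of one UI after one sample spacing."""
    return 1 - spacing_ui


spacing = sample_spacing_ui()
tolerance = expected_tolerance_ui_pp(spacing)

if __name__ == "__main__":
    print(float(spacing), "UI between adjacent samples")
    print(float(tolerance), "UIPP expected at half the data rate")
```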
Figure 4.19 illustrates the die photo of the test-chip receiver that implements the 1.45x
FSR CDR in 65 nm standard-logic CMOS. In the receiver, the input buffers, ADC, clock
Figure 4.19: Test-chip die photograph. Annotated blocks: input buffers (50x60 µm²), 4-channel 2.5 GS/s ADCs (400x490 µm²), 4:16 DeMUX (60x490 µm²), CDR (430x270 µm²), synthesized logic, test structures, bias generator and clock divider (170x140 µm²), output buffers; marked dimension: 1900 µm.
Table 4.3: Test-chip parameters.
Process         65 nm CMOS
Data rate       6.875 Gb/s
Sampling rate   10 GS/s
Supply          1.2 V
Power           175.2 mW
Receiver area   0.37 mm²
divider and the DeMUX are custom blocks, while the digital CDR and the test-structures
are all synthesized. The test-chip has an integrated PRBS comparator for the receiver verifi-
cation. Table 4.3 summarizes the test-chip parameters. The receiver samples the 6.875 Gb/s
signal at 10 GS/s for the sampling rate of 1.45x. It consumes 175.2 mW of power from a
1.2 V supply, and it occupies 0.37 mm² of die area.
To verify the operation of the ADC and DeMUX, Figure 4.20 presents an eye diagram
that was reconstructed from the samples measured at the DeMUX output (see Figure 4.2).
To reconstruct this eye, a 6.875 Gb/s 2^7 − 1 PRBS sequence was applied at the receiver
input, and then 0.5 million DeMUXed samples were captured with a logic analyzer. Since
every DeMUX channel, i, has a constant sampling phase, or time stamp, TSi, associated
with it, the DeMUX channels were arranged along the horizontal axis in the ascending
order of their sampling phases. The figure also annotates the DeMUX channel numbers
Figure 4.20: Measured eye diagram at the DeMUX output. Axes: time, UI (0 to 1) versus ADC sample value.
DeMUX channel, i:  1     4     7     10    13    16    3     6     9     12    15     2      5      8      11     14
Time stamp, TSi:   0/16  1/16  2/16  3/16  4/16  5/16  6/16  7/16  8/16  9/16  10/16  11/16  12/16  13/16  14/16  15/16
and their corresponding time stamps under the eye diagram. The sampling rate of 16/11
leads to the total of 16 unique sampling phases, which quantizes the eye diagram to 16 bins
in the horizontal direction. The 5-bit ADC resolution quantizes the diagram to 32 bins in
the vertical direction. The open eye at the ADC output indicates that both the ADC and
the DeMUX are functional, and that the error-free data recovery is possible with the FSR
CDR.
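The channel ordering annotated under the eye diagram can be reproduced with modular arithmetic. The sketch below assumes that DeMUX channel 1 samples at phase 0 of a UI (any constant offset between the sampling clock and the data is ignored) and that consecutive channels are 11/16 UI apart, as implied by the 16/11 sampling rate. The function name is hypothetical.

```python
from fractions import Fraction


def time_stamps(n_channels=16, uis_per_block=11):
    """Sampling phase of each DeMUX channel within a UI (channel 1 assumed at 0)."""
    step = Fraction(uis_per_block, n_channels)   # spacing between adjacent samples, in UI
    return {i: ((i - 1) * step) % 1 for i in range(1, n_channels + 1)}


stamps = time_stamps()
# Channels arranged in ascending order of their sampling phase.
order = sorted(stamps, key=stamps.get)

if __name__ == "__main__":
    for ch in order:
        print(f"channel {ch:2d}: TS = {stamps[ch].numerator * 16 // stamps[ch].denominator}/16")
```

Because 11 and 16 share no common factor, the 16 channels land on 16 distinct phases, which is why the eye diagram is quantized to 16 horizontal bins.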
Figure 4.21 shows the measured jitter tolerance of the fabricated receiver test-chip. This
jitter tolerance was measured with a 6.875 Gb/s 2^7 − 1 PRBS input and the sampling rate of
10 GS/s. The blind sampling clock has 760 ps random jitter (RMS) and −128 dBc/Hz phase
noise at 1 MHz offset. The jitter tolerance was recorded at BER ≤ 10^−12. For a conve-
nient comparison between the measured and simulated results, Figure 4.21 also illustrates
a simulated jitter tolerance with the same input sequence. To maintain a reasonable simula-
tion time, the simulated jitter tolerance was recorded at BER ≤ 5·10^−6. A close match between the
simulated and measured jitter tolerances experimentally confirms the functionality of the
proposed FSR CDR architecture.
The measurements show that the FSR CDR tolerates up to 0.98 % (9800 ppm) of the
Figure 4.21: Measured jitter tolerance. Axes: jitter frequency, Hz (10^3 to 10^8) versus jitter amplitude, UIPP (0.1 to 1000); curves: simulated (BER ≤ 5·10^−6) and measured (BER ≤ 10^−12).
frequency offset between the transmitter and receiver with BER ≤ 10^−12. In these measurements,
the sampling clock frequency was shifted from the nominal 5 GHz to 4.951 GHz
and 5.049 GHz while the data rate remained at the nominal value of 6.875 Gb/s.
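The quoted offset follows directly from the two shifted clock frequencies, as the short calculation below shows. The helper name `offset_ppm` is hypothetical.

```python
def offset_ppm(f_actual_hz, f_nominal_hz):
    """Relative frequency offset in parts per million."""
    return (f_actual_hz - f_nominal_hz) / f_nominal_hz * 1e6


NOMINAL_HZ = 5_000_000_000

if __name__ == "__main__":
    for f_hz in (4_951_000_000, 5_049_000_000):
        ppm = offset_ppm(f_hz, NOMINAL_HZ)
        print(f"{f_hz / 1e9:.3f} GHz: {ppm:+.0f} ppm ({ppm / 1e4:+.2f} %)")
```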
By reducing the sampling rate from 2x to 1.45x, the proposed FSR CDR reduces the
ADC power per Gb/s of data rate by 27.3 % compared to a 2x ADC-based receiver. This
reduction of the ADC power comes at the cost of doubling the gate count in the digital
CDR in comparison with the 2x feed-forward CDR presented in Chapter 3; however, the
overall receiver power per Gb/s of data rate and the total receiver area are still reduced by 12.5 %.
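The 27.3 % figure is consistent with a simple scaling argument, sketched below under the assumption that the ADC power per Gb/s is proportional to the number of samples taken per UI. This proportionality is an inference from the quoted numbers, not a statement taken from the measurements; the function name is hypothetical.

```python
from fractions import Fraction


def adc_saving(rate_new, rate_ref=Fraction(2)):
    """Fractional reduction in samples per UI relative to the reference rate."""
    return 1 - Fraction(rate_new) / Fraction(rate_ref)


saving = adc_saving(Fraction(16, 11))   # 1.45x versus 2x sampling

if __name__ == "__main__":
    print(f"{float(saving) * 100:.1f} %")
```

The 12.5 % overall saving cannot be derived this way, since it also depends on the increased gate count of the digital CDR.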
4.8 Summary
This chapter presented the blind-sampling fractional-sampling-rate ADC-based CDR ar-
chitecture. This architecture reduces the ADC sampling rate from 2x to a fractional rate
between 2x and 1x thus saving the ADC power and area per Gb/s of data rate. The dig-
ital CDR then recovers the data from the fractionally-spaced digital samples of the sig-
nal using a feed-forward topology similar to that proposed in Chapter 3. The FSR CDR
accommodates the fractional sampling rate using the phase-detection, data-decision and
vector-compaction schemes presented in this chapter.
The proposed CDR was implemented in a receiver test-chip that samples a 6.875 Gb/s
signal at 10 GS/s for the sampling rate of 1.45x. The CDR then successfully recovers
the data in the digital domain, which is confirmed through the measured jitter tolerance.
The receiver test-chip consumes 175.2 mW from a 1.2 V supply and occupies an area of
0.37 mm². Sampling at 1.45x reduces the ADC area and power per Gb/s of data rate by
27.3 % compared to sampling at 2x. The simulation and measurement results show that the
proposed FSR CDR architecture is functional and applicable to high-speed serial
interconnects.
Chapter 5
Conclusions
THIS THESIS has explored the blind-sampling ADC-based receivers for high-speed
signaling applications. As a result of this exploration, the thesis has proposed two
new CDR architectures for the ADC-based receivers.
First, the proposed feed-forward CDR architecture recovers the phase and data directly
from the blind digital samples of the received signal in a feed-forward manner, eliminating
the need for an interpolating feedback loop used in the previously reported blind ADC-
based CDRs. The feed-forward topology reduces the CDR’s circuit complexity, making
this architecture suitable for high-speed interconnects. To experimentally validate the pro-
posed architecture, a 5 Gb/s 2x ADC-based receiver with the feed-forward CDR was imple-
mented in 65 nm CMOS. The measurements of the receiver test-chip show that the CDR
successfully recovers the data, which validates the proposed architecture.
Second, to reduce the ADC power and area, the proposed FSR CDR architecture re-
duces the sampling rate from an integer rate of 2x to a fractional rate between 2x and 1x
the baud rate. The CDR then relies on the feed-forward topology to recover the phase and
data from the fractionally-spaced samples of the received signal. The feed-forward topology
enables the FSR CDR to maintain a sufficiently low circuit complexity, which leads to
the overall receiver power and area savings. To verify the proposed FSR CDR architecture,
a 1.45x ADC-based receiver was implemented in 65 nm CMOS. The receiver successfully
recovers 6.875 Gb/s data from the samples taken at 10 GS/s. Reducing the sampling rate to
1.45x reduces the ADC power and area per Gb/s of data rate by 27.3 % compared to the 2x
receiver with the feed-forward CDR, while the overall receiver power and area reduce by
12.5 %. These measurement results confirm that the FSR CDR architecture is applicable
to the high-speed interconnects, and that it reduces the area and power compared to the 2x
CDR architecture.
5.1 Thesis Contributions
The contributions of this thesis are two new CDR architectures for the high-speed blind-
sampling ADC-based receivers. These architectures are:
• A 2x feed-forward CDR architecture that reduces the circuit complexity of the blind-
sampling ADC-based receivers, making this type of receivers suitable for high-speed
interconnects. The architecture was implemented and characterized in a 5 Gb/s re-
ceiver test-chip. This work was accepted for publication in IEEE Journal of Solid-
State Circuits (JSSC), to appear in June 2010 issue [17]. This architecture was also
used in an ADC-based receiver presented at IEEE International Solid-State Circuits
Conference (ISSCC), 2010 [18].
• A fractional-sampling-rate (FSR) CDR architecture that reduces the sampling rate
below 2x in order to save the ADC power and area in the blind-sampling ADC-
based receivers. The architecture was implemented and characterized in a 6.875 Gb/s
1.45x receiver test-chip. This work was presented at IEEE International Solid-State
Circuits Conference (ISSCC), 2010 [19].
5.2 Future Directions
The contributions of this thesis have shown that clock and data recovery is possible in
the blind-sampling ADC-based receivers at high data rates. However, to make the ADC-
based receivers competitive with the binary-sampling phase-tracking receivers in terms of
power efficiency and resilience to limited channel bandwidth, further work is required. This
section outlines potential future directions towards making the blind-sampling ADC-based
receivers feasible for practical high-speed interconnects.
In the ADC-based receivers, the ADC typically consumes a significant portion of the
receiver power. In the examples of the receiver test-chips presented in this thesis, the ADC
consumes approximately 2/3, while the digital CDR consumes about 1/3 of the power. This
high ADC power consumption makes it challenging to use the ADC-based receivers in the
low-power interconnects. Typically, the high-speed receivers use flash ADCs for their low-
latency conversion. One way to reduce the receiver power is to replace the low-latency high-
power flash ADCs with high-latency low-power successive approximation (SAR) ADCs.
In contrast with the feedback topologies, the feed-forward topology of the proposed CDR
architectures is insensitive to the ADC latency, which makes SAR ADCs feasible for the
blind-sampling ADC-based receivers. Further work is required, however, to assure that the
SAR ADCs for the high-speed signaling applications have a sufficiently small sampling
window compared to the bit-interval, and a sufficiently high conversion rate.
One of the primary advantages of the ADC-based receivers is the digital representation
of the signal samples, which allows for the signal equalization in the digital domain after
sampling. In fact, an adaptive feed-forward equalizer (FFE) for the 2x feed-forward CDR
is presented in [18]. This linear FFE, however, enhances the quantization noise of the ADC.
To mitigate the noise enhancement, a 1-tap speculative decision-feedback equalizer (DFE)
with a programmable tap weight is presented in [47]. To improve the resilience of the blind-
sampling ADC-based receivers to the limited channel bandwidth, it is desirable to extend
the 1-tap DFE to a multi-tap DFE as well as to develop an adaptation algorithm suitable for
this DFE scheme. A combination of an adaptive FFE with an adaptive DFE would allow
the ADC-based receivers to operate in the presence of high channel attenuation, which is
required by the high-speed signaling standards.
The equalization schemes presented in the recent publications focus on blind-sampling
ADC-based receivers with an integer sampling rate of 2x [18, 47]. However, to date, no
equalization schemes are reported for the fractional sampling rates. Extending the FFE
and DFE schemes to the fractional sampling rates would make the FSR CDR architecture
suitable for practical high-speed interconnects.
Through simulations, this thesis has shown that the FSR CDR architecture is able to
recover the data with various sampling rates. Reducing the sampling rate reduces
the ADC power while maintaining the data rate. The reduction of the sampling rate, however,
comes at the cost of reducing the high-frequency jitter tolerance. The jitter tolerance
is also related to the quality of the channel in an interconnect. Typically, the high channel
attenuation degrades the jitter tolerance. It would be desirable to make the sampling rate
adaptable to the channel quality in the FSR CDR architecture. This adaptability can use a
trade-off between the sampling rate and the jitter tolerance in order to save the ADC power
when the channel quality allows it. As an example, to maintain a desired jitter tolerance, a
low-attenuation channel allows for a lower sampling rate to reduce the ADC power, while
a high-attenuation channel requires a nominal sampling rate. The sampling rate adaptation
scheme might be similar to an equalizer adaptation scheme since they both depend on the
amount of channel attenuation.
References
[1] C. Combs, Printed Circuits Handbook, 5th ed. McGraw Hill, New York, 2001.
[2] J. Bulzacchelli, M. Meghelli, S. Rylov, W. Rhee, A. Rylyakov, H. Ainspan, B. Parker,
M. Beakes, A. Chung, T. Beukema, P. Pepeljugoski, L. Shan, Y. Kwark, S. Gowda,
and D. Friedman, “A 10-Gb/s 5-tap DFE/4-tap FFE transceiver in 90-nm CMOS tech-
nology,” IEEE Journal of Solid-State Circuits, vol. 41, no. 12, pp. 2885–2900, De-
cember 2006.
[3] H. Sugita, K. Sunaga, K. Yamaguchi, and M. Mizuno, “A 16Gb/s 1st-tap FFE and
3-tap DFE in 90nm CMOS,” in IEEE International Solid-State Circuits Conference
Technical Digest, vol. 53, February 2010, pp. 162–163.
[4] O. Agazzi, M. Hueda, D. Crivelli, H. Carrer, A. Nazemi, G. Luna, F. Ramos, R. Lopez,
C. Grace, B. Kobeissy, C. Abidin, M. Kazemi, M. Kargar, C. Marquez, S. Ramprasad,
F. Bollo, V. Posse, S. Wang, G. Asmanis, G. Eaton, N. Swenson, T. Lindsay, and
P. Voois, “A 90 nm CMOS DSP MLSD transceiver with integrated AFE for electronic
dispersion compensation of multimode optical fibers at 10 Gb/s,” IEEE Journal of
Solid-State Circuits, vol. 43, no. 12, pp. 2939–2957, December 2008.
[5] H.-M. Bae, J. Ashbrook, J. Park, N. Shanbhag, A. Singer, and S. Chopra, “An MLSE
receiver for electronic dispersion compensation of OC-192 fiber links,” IEEE Journal
of Solid-State Circuits, vol. 41, no. 11, pp. 2541–2554, November 2006.
[6] M. Harwood, N. Warke, R. Simpson, T. Leslie, A. Amerasekera, S. Batty, D. Col-
man, E. Carr, V. Gopinathan, S. Hubbins, P. Hunt, A. Joy, P. Khandelwal, B. Kil-
lips, T. Krause, S. Lytollis, A. Pickering, M. Saxton, D. Sebastio, G. Swanson,
A. Szczepanek, T. Ward, J. Williams, R. Williams, and T. Willwerth, “A 12.5Gb/s
SerDes in 65nm CMOS using a baud-rate ADC with digital receiver equalization and
clock recovery,” in IEEE International Solid-State Circuits Conference Technical Di-
gest, February 2007, pp. 436–437, 591.
[7] J. Cao, B. Zhang, U. Singh, D. Cui, A. Vasani, A. Garg, W. Zhang, N. Kocaman, D. Pi,
B. Raghavan, H. Pan, I. Fujimori, and A. Momtaz, “A 500mW digitally calibrated
AFE in 65nm CMOS for 10Gb/s serial links over backplane and multimode fiber,” in
IEEE International Solid-State Circuits Conference Technical Digest, February 2009,
pp. 370–371.
[8] “Assembly and Packaging,” The International Technology Roadmap for
Semiconductors (ITRS), pp. 4–7, December 2007. [Online]. Available:
http://www.itrs.net/Links/2007ITRS/Home2007.htm
[9] “HDMI Specification Ver. 1.3a,” HDMI Licensing, LLC, Sunnyvale, CA,
USA, November 2006. [Online]. Available: http://www.hdmi.org/manufacturer/
specification.aspx
[10] “PCI Express Base 2.1 Specification,” PCI-SIG, Beaverton, OR, USA, March 2009.
[Online]. Available: http://www.pcisig.com/specifications/pciexpress/
[11] “Serial ATA Revision 3.0 Specification,” SATA-IO Administration, Beaverton,
OR, USA, June 2009. [Online]. Available: https://www.sata-io.org/developers/
purchase spec.asp
[12] “Universal Serial Bus Revision 3.0 Specification,” USB Implementers Forum, Inc.,
Beaverton, OR, USA, November 2008. [Online]. Available: http://www.usb.org/
developers/docs/
[13] J. Buckwalter, M. Meghelli, D. Friedman, and A. Hajimiri, “Phase and amplitude pre-
emphasis techniques for low-power serial links,” IEEE Journal of Solid-State Circuits,
vol. 41, no. 6, pp. 1391–1399, June 2006.
[14] B. Razavi, Design of Integrated Circuits for Optical Communications. McGraw Hill,
2003.
[15] I. Mehr and D. Dalton, “A 500-MSample/s, 6-bit Nyquist-rate ADC for disk-drive
read-channel applications,” IEEE Journal of Solid-State Circuits, vol. 34, no. 7, pp.
912–920, July 1999.
[16] B. Razavi, Phase-Locking in High-Performance Systems: From Devices to Architectures.
IEEE Press, 2003.
[17] O. Tyshchenko, A. Sheikholeslami, H. Tamura, M. Kibune, H. Yamaguchi, and
J. Ogawa, “A 5-Gb/s ADC-based feedforward CDR in 65 nm CMOS,” IEEE Jour-
nal of Solid-State Circuits, vol. 45, no. 6, pp. 1091–1098, June 2010.
[18] H. Yamaguchi, H. Tamura, Y. Doi, Y. Tomita, T. Hamada, M. Kibune, S. Ohmoto,
K. Tateishi, O. Tyshchenko, A. Sheikholeslami, T. Higuchi, J. Ogawa, T. Saito,
H. Ishida, and K. Gotoh, “A 5Gb/s transceiver with an ADC-based feedforward CDR
and CMA adaptive equalizer in 65nm CMOS,” in IEEE International Solid-State Cir-
cuits Conference Technical Digest, vol. 53, February 2010, pp. 168–169.
[19] O. Tyshchenko, A. Sheikholeslami, H. Tamura, Y. Tomita, H. Yamaguchi, M. Kibune,
and T. Yamamoto, “A fractional-sampling-rate ADC-based CDR with feedforward
architecture in 65nm CMOS,” in IEEE International Solid-State Circuits Conference
Technical Digest, vol. 53, February 2010, pp. 166–167.
[20] M. Horowitz, K. Y. Chih-Kong, and S. Sidiropoulos, “High-speed electrical signaling:
overview and limitations,” IEEE Micro, vol. 18, no. 1, pp. 12–24, January/February
1998.
[21] J. R. Barry, E. A. Lee, and D. G. Messerschmitt, Digital Communication. Springer,
2004.
[22] E. Sackinger, Broadband Circuits for Optical Fiber Communication. John Wiley,
2005.
[23] Y. Hidaka, G. Weixin, T. Horie, H. J. Jian, Y. Koyanagi, and H. Osone, “A 4-channel
1.25–10.3 Gb/s backplane transceiver macro with 35 dB equalizer and sign-based zero-forcing
adaptive control,” IEEE Journal of Solid-State Circuits, vol. 44, no. 12, pp.
3547–3559, December 2009.
[24] Y. Tomita, M. Kibune, J. Ogawa, W. Walker, H. Tamura, and T. Kuroda, “A 10-Gb/s
receiver with series equalizer and on-chip ISI monitor in 0.11-um CMOS,” IEEE Jour-
nal of Solid-State Circuits, vol. 40, no. 4, pp. 986–993, April 2005.
[25] S. Gondi, J. Lee, D. Takeuchi, and B. Razavi, “A 10Gb/s CMOS adaptive equalizer
for backplane applications,” in IEEE International Solid-State Circuits Conference
Technical Digest, vol. 48, February 2005, pp. 328–601.
[26] A. Fiedler, R. Mactaggart, J. Welch, and S. Krishnan, “A 1.0625 Gbps transceiver
with 2x-oversampling and transmit signal pre-emphasis,” in IEEE International Solid-
State Circuits Conference Technical Digest, vol. 43, February 1997, pp. 238–239.
[27] R. Farjad-Rad, C.-K. Yang, M. Horowitz, and T. Lee, “A 0.4-um CMOS 10-Gb/s
4-PAM pre-emphasis serial link transmitter,” IEEE Journal of Solid-State Circuits,
vol. 34, no. 5, pp. 580–585, May 1999.
[28] V. Stojanovic, G. Ginis, and M. Horowitz, “Transmit pre-emphasis for high-speed
time-division-multiplexed serial-link transceiver,” in IEEE International Conference
on Communications, vol. 3, August 2002, pp. 1934–1939.
[29] H. Higashi, S. Masaki, M. Kibune, S. Matsubara, T. Chiba, Y. Doi, H. Yam-
aguchi, H. Takauchi, H. Ishida, K. Gotoh, and H. Tamura, “A 5-6.4-Gb/s 12-channel
transceiver with pre-emphasis and equalization,” IEEE Journal of Solid-State Circuits,
vol. 40, no. 4, pp. 978–985, April 2005.
[30] T. Beukema, M. Sorna, K. Selander, S. Zier, B. Ji, P. Murfet, J. Mason, W. Rhee,
H. Ainspan, B. Parker, and M. Beakes, “A 6.4-Gb/s CMOS SerDes core with feed-
forward and decision-feedback equalization,” IEEE Journal of Solid-State Circuits,
vol. 40, no. 12, pp. 2633–2645, December 2005.
[31] J. Zerbe, C. Werner, V. Stojanovic, F. Chen, J. Wei, G. Tsang, D. Kim, W. Stonecypher,
A. Ho, T. Thrush, R. Kollipara, M. Horowitz, and K. Donnelly, “Equalization and
clock recovery for a 2.5-10-Gb/s 2-PAM/4-PAM backplane transceiver cell,” IEEE
Journal of Solid-State Circuits, vol. 38, no. 12, pp. 2121–2130, December 2003.
[32] S. Shekhar, J. Walling, and D. Allstot, “Bandwidth extension techniques for CMOS
amplifiers,” IEEE Journal of Solid-State Circuits, vol. 41, no. 11, pp. 2424–2439,
November 2006.
[33] A. Momtaz and M. Green, “An 80mW 40Gb/s 7-tap T/2-spaced FFE in 65nm CMOS,”
in IEEE International Solid-State Circuits Conference Technical Digest, vol. 52,
February 2009, pp. 364–365.
[34] J. Jaussi, G. Balamurugan, D. Johnson, B. Casper, A. Martin, J. Kennedy,
N. Shanbhag, and R. Mooney, “8-Gb/s source-synchronous I/O link with adaptive
receiver equalization, offset cancellation, and clock de-skew,” IEEE Journal of Solid-
State Circuits, vol. 40, no. 1, pp. 80–88, January 2005.
[35] B. Razavi, Monolithic Phase-Locked Loops and Clock Recovery Circuits. IEEE
Press, 1996.
[36] N. Nedovic, N. Tzartzanis, H. Tamura, F. Rotella, M. Wiklund, Y. Mizutani,
Y. Okaniwa, T. Kuroda, J. Ogawa, and W. Walker, “A 40–44 Gb/s 3x oversampling
CMOS CDR/1:16 DEMUX,” IEEE Journal of Solid-State Circuits, vol. 42, no. 12,
pp. 2726–2735, December 2007.
[37] J. Kim and D.-K. Jeong, “Multi-gigabit-rate clock and data recovery based on blind
oversampling,” IEEE Communications Magazine, vol. 41, no. 12, pp. 68–74, Decem-
ber 2003.
[38] K. Mueller and M. Muller, “Timing recovery in digital synchronous data receivers,”
IEEE Transactions on Communications, vol. 24, no. 5, pp. 516–531, May 1976.
[39] F. Gardner, “Interpolation in digital modems – Part I: Fundamentals,” IEEE Transac-
tions on Communications, vol. 41, no. 3, pp. 501–507, March 1993.
[40] M. Spurbeck and R. Behrens, “Interpolated timing recovery for hard disk drive read
channels,” in IEEE International Conference on Communications, vol. 3, June 1997,
pp. 1618–1624.
[41] M. van Ierssel, “Circuit techniques for high-speed chip-to-chip signaling,” Ph.D. dis-
sertation, University of Toronto, 2006.
[42] G. D. Vishakhadatta, R. Croman, M. Goldenberg, J. Hein, P. Katikaneni, D. Kuai,
C. Lee, I. C. Tesu, R. Trujillo, L. Zhang, K. Anderson, R. Behrens, W. Bliss, L. Du,
T. Dudley, G. Feyh, W. Foland, M. Kastner, Q. Li, J. Mitchem, D. Reed, S. She,
M. Spurbeck, L. Sundell, H. Tran, M. Wei, and C. Zook, “An EPR4 read/write channel
with digital timing recovery,” IEEE Journal of Solid-State Circuits, vol. 33, no. 11, pp.
1851–1857, November 1998.
[43] W. Zhang and R. Spencer, “Timing recovery for backplane ethernet,” IEEE Transac-
tions on Circuits and Systems I, vol. 54, no. 8, pp. 1711–1723, August 2007.
[44] M. Pozzoni, S. Erba, D. Sanzogni, M. Ganzerli, P. Viola, D. Baldi, M. Repossi,
G. Spelgatti, and F. Svelto, “A 12Gb/s 39dB loss-recovery unclocked-DFE receiver
with bi-dimensional equalization,” in IEEE International Solid-State Circuits Confer-
ence Technical Digest, vol. 53, February 2010, pp. 164–165.
[45] V. Balan, J. Caroselli, J.-G. Chern, C. Chow, R. Dadi, C. Desai, L. Fang, D. Hsu,
P. Joshi, H. Kimura, C. Liu, T.-W. Pan, R. Park, C. You, Y. Zeng, E. Zhang, and
F. Zhong, “A 4.8-6.4-Gb/s serial link for backplane applications using decision feed-
back equalization,” IEEE Journal of Solid-State Circuits, vol. 40, no. 9, pp. 1957–
1967, September 2005.
[46] M. van Ierssel, A. Sheikholeslami, H. Tamura, and W. Walker, “A 3.2 Gb/s CDR using
semi-blind oversampling to achieve high jitter tolerance,” IEEE Journal of Solid-State
Circuits, vol. 42, no. 10, pp. 2224–2234, October 2007.
[47] S. Sarvari, T. Tahmoureszadeh, A. Sheikholeslami, H. Tamura, and M. Kibune, “A
5Gb/s speculative DFE for 2x blind ADC-based receivers in 65-nm CMOS,” in IEEE
Symposium on VLSI Circuits Digest of Technical Papers, June 2010, pp. 69–70.
[48] M. van Ierssel, H. Yamaguchi, A. Sheikholeslami, H. Tamura, and W. Walker, “Event-
driven modeling of CDR jitter induced by power-supply noise, finite decision-circuit
bandwidth, and channel ISI,” IEEE Transactions on Circuits and Systems I, vol. 55,
no. 5, pp. 1306–1315, June 2008.