Equalizer Design and Performance Trade-Offs in ADC-Based Serial Links


2096 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 58, NO. 9, SEPTEMBER 2011

Equalizer Design and Performance Trade-Offs in ADC-Based Serial Links

Jaeha Kim, Senior Member, IEEE, E.-Hung Chen, Member, IEEE, Jihong Ren, Member, IEEE, Brian S. Leibowitz, Member, IEEE, Patrick Satarzadeh, Member, IEEE, Jared L. Zerbe, Member, IEEE, and Chih-Kong Ken Yang, Fellow, IEEE

Abstract—This paper investigates the performance benefit of using nonuniformly quantized ADCs for implementing high-speed serial receivers with decision-feedback equalization (DFE). A way of determining an optimal set of ADC thresholds to achieve the minimum bit-error rate (BER) is described, which can yield a very different set from the one that minimizes signal quantization errors. By recognizing that both the loop-unrolling DFE receiver and ADC-based DFE receiver decide each received bit based upon the result of a single slicer, an efficient architecture named reduced-slicer partial-response DFE (RS-PRDFE) receiver is proposed. The RS-PRDFE receiver eliminates redundant or unused slicers from the previous DFE receiver implementations. Both the simulation and measurement results from a 10 Gb/s ADC-based receiver fabricated in 65 nm CMOS technology and multiple backplane channels demonstrate that the RS-PRDFE can achieve the BER of a 3–4-bit uniform ADC with only 4 data slicers. Also, the combined use of linear equalizers (LEs) can further reduce the required slicer count in RS-PRDFE receivers, but only when the LEs are realized in the analog domain.

Index Terms—Analog-digital conversion, data communication, equalizers, receivers.

I. INTRODUCTION

As the complexity of electrical and optical communication links increases, there is a growing interest in implementing transceivers based on analog-to-digital converters (ADCs) and digital signal processors (DSPs) [1]–[4]. As data rates have risen, various channel impairments including skin loss, dielectric loss, reflections, and crosstalk have become more pronounced and call for advanced coding and modulation schemes. While the aggressive scaling of CMOS has made it feasible to build fast digital logic that can perform such sophisticated signal processing algorithms in the digital domain, it is still very challenging to design an ADC with above 10 Gbps sampling rates and high enough resolution. For example, even at a moderate resolution of 6 bits, an ADC may dissipate more than 1 W [5]–[8]. Such high power consumption of ADCs has been a discouraging factor for their full adoption in backplane transceivers, and the design of low-power ADCs has been one of the primary research directions. For instance, a recently published work demonstrated a 10 Gbps ADC-based backplane transceiver consuming 0.5 W [7].

Manuscript received March 07, 2011; revised May 19, 2011; accepted June 23, 2011. Date of current version September 14, 2011. This paper was recommended by Editor G. Manganaro.

J. Kim is with the School of Electrical Engineering and Computer Science, Seoul National University, Seoul 151-742, Korea (e-mail: [email protected]).

E.-H. Chen and C.-K. K. Yang are with the Electrical Engineering Department, University of California, Los Angeles, CA 90095 USA (e-mail: [email protected]; [email protected]).

J. Ren, B. S. Leibowitz, and J. L. Zerbe are with Rambus, Inc., Sunnyvale, CA 94089 USA (e-mail: [email protected]; [email protected]; [email protected]).

P. Satarzadeh was with Rambus, Inc., Sunnyvale, CA 94089 USA. He is now with Texas Instruments, Inc., Dallas, TX 75243 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSI.2011.2162465

In recognition of these trends, this paper aims to strike a

balance between the flexibility and power efficiency in ADC-based transceiver designs. In particular, it describes a way of

maximizing the performance of an ADC-based receiver with

a coarse-resolution ADC, performing linear equalization (LE)

and decision feedback equalization (DFE). We find that for a

given target bit-error rate (BER) performance, the required ADC

resolution can be greatly relaxed when the ADC is allowed to

have nonuniform quantization levels.

It is noteworthy that the optimal placement of such nonuni-

form ADC decision thresholds is not necessarily the one that

minimizes the quantization errors, especially for low-resolution

ADCs. We explain this by demonstrating that an ADC-based

DFE receiver is in fact equivalent to a loop-unrolling DFE receiver [9]–[11]. The optimal threshold placement for the minimum bit error rate (BER) is the one that maximizes the signal

margin of the selected data slicer. The equivalence between the

ADC-based DFE receiver and the loop-unrolling DFE receiver

is discussed in Section II.

Based on this observation, this paper proposes an opti-

mally configured, nonuniform ADC-based receiver, called a

reduced-slicer partial-response DFE (RS-PRDFE) [15], which

differentiates itself from the conventional, fully expanded,

loop-unrolling DFE, also referred to as partial-response DFE

(PRDFE). Section III describes a dynamic programming algo-

rithm that can determine the optimal placement of the ADC

thresholds for given channel characteristics. The performance of the RS-PRDFE receivers is demonstrated both in simulation

and in measurement, as described in Section IV. Section V

then addresses the topic of jointly optimizing the RS-PRDFE

receiver with various types of linear equalizers.

II. REDUCED-SLICER PARTIAL-RESPONSE DECISION FEEDBACK EQUALIZER (RS-PRDFE)

A. Equivalence Between ADC-Based DFE and Loop-Unrolling DFE Receivers

Fig. 1 describes the signal flow in an ADC-based receiver performing decision feedback equalization (DFE). Once the ADC converts the received signal into a digital form, the DSP performs the DFE operation, which computes and subtracts the appropriate amount of offset from the digitized input based on the prior bit decisions. The DSP also contains the decision slicer, which compares the resulting value with a threshold and determines the current bit. To minimize the BER, the signal-to-noise ratio (SNR) at this decision slicer's input must be maximized. The quantization errors introduced by the ADC count towards the unwanted noise, and hence the ADC strives to have as high a resolution as possible.

Fig. 1. An ADC-based DFE receiver. (a) Its architecture. (b) Its signal flow diagram where the ADC is modeled as a source of quantization noise.

On the other hand, an analog DFE receiver subtracts the offset

voltage in the analog domain as illustrated in Fig. 2(a). Another

difference is that its decision slicer compares the resulting signal

with an analog threshold (analog comparison) while that in the

ADC-based receiver in Fig. 1 compares the two inputs in digital forms (digital comparison). Since the slicer output is always a

binary value, the signal around the DFE loop crosses the analog-

digital boundary twice: once through the analog comparator and

once through the feedback path generating the analog offset

from the prior bits. The two conversion steps make it difficult

to close the timing around the loop within one bit period.

Loop-unrolling DFEs or partial-response DFEs (PRDFEs)

mitigate this difficulty by shifting this timing loop entirely into

the digital domain [9]–[11]. As illustrated in Fig. 2(b), the re-

ceiver precomputes all possible offset values and compares the

input signal with each and every offset. Once all the results enter

into the digital domain, one of them is selected based on the prior bit history. Since the decision feedback loop is now entirely

within the digital domain, higher frequency operation is pos-

sible. However, a drawback is that the number of offset values

to be compared with, and hence the number of decision slicers, grows exponentially (as $2^N$) with the number of DFE tap coefficients (N).

The seemingly different ADC-based DFE receiver in Fig. 1(a) and loop-unrolling PRDFE receiver in Fig. 2(b) are in fact equivalent and can be optimized using the same principles. Recall that the core DFE operation is to subtract a proper offset from the received signal before the bit decision. For both types of receivers, the bit decision is made based on a single, critical analog comparison. For the PRDFE, this is quite apparent since one of the slicer outputs is selected as the current bit decision. For the ADC-based DFE, determining whether the quantized ADC output is greater than a certain offset is equivalent to determining whether the analog input signal is above the quantization threshold that is closest to this offset. In the case of a flash ADC, the bit is decided solely upon the one slicer output that has the corresponding threshold.

Fig. 2. Analog DFE receivers. (a) Conventional DFE. (b) Loop-unrolling PRDFE.

This observation suggests a DFE receiver architecture that we

call reduced-slicer partial-response DFE (RS-PRDFE), shown

in Fig. 3. This architecture is similar to the PRDFE in Fig. 2(b)

in that it selects one of the loop-unrolled slicer decisions as the

current bit value. But the key difference is that a slicer is selected

through a look-up table (LUT) rather than being direct-mapped, and therefore a single slicer may be selected for multiple prior

bit histories. That is, if some of the slicers in the PRDFE have

similar threshold values, RS-PRDFE can merge those redun-

dant slicers into one. On the other hand, an ADC-based DFE

receiver may contain slicers whose outputs are never used for

bit decision, especially when the ADC has uniformly fine reso-

lution. RS-PRDFE eliminates those unused slicers and replaces

the thermometer-to-binary conversion, the digital feedback equalizer in Fig. 1, and the binary subtraction unit with a simple look-up table and multiplexer.

Therefore, the proposed RS-PRDFE can save power and area

by removing redundant or unused slicers in the loop-unrolling

PRDFE and ADC-based DFE receivers without degrading the BER performance.
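As a minimal sketch of the selection step in Fig. 3, the Python fragment below builds a look-up table from prior-bit histories to slicer indices and uses it to pick one slicer output as the bit decision. The function names (`build_lut`, `rs_prdfe_decide`), the 2-tap coefficients, and the two threshold values are hypothetical and chosen only for illustration; bits are represented as +1/-1 and `history[0]` is assumed to be the most recent prior bit.

```python
from itertools import product


def build_lut(offsets, thresholds):
    """Map each prior-bit history to the index of the nearest slicer threshold."""
    lut = {}
    for history, offset in offsets.items():
        lut[history] = min(range(len(thresholds)),
                           key=lambda i: abs(thresholds[i] - offset))
    return lut


def rs_prdfe_decide(sample, history, thresholds, lut):
    """Return the current bit (+1/-1) from the single slicer the LUT selects."""
    slicer_outputs = [1 if sample > th else -1 for th in thresholds]  # every slicer compares
    return slicer_outputs[lut[history]]                               # multiplexer picks one


# Hypothetical 2-tap example; ISI taps are normalized to a main cursor of 1.
taps = (0.30, 0.05)
offsets = {h: sum(b * c for b, c in zip(h, taps))
           for h in product((-1, 1), repeat=len(taps))}
thresholds = [-0.30, 0.30]   # two shared slicers instead of 2**2 = 4
lut = build_lut(offsets, thresholds)
```

In this example the four ISI offsets fall into two tight pairs, so each slicer serves two prior-bit histories. A direct-mapped PRDFE would spend one slicer per history.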




Fig. 3. The proposed reduced-slicer PRDFE receiver (RS-PRDFE).

Fig. 4. The factors determining the bit-error rate (BER) of an ADC-based DFE receiver in the presence of channel ISI. The signal margin is degraded by the ISI from the bits within the DFE tap range and the ISI from those outside the DFE range. The lowest BER is achieved when the decision threshold is equal to the ISI from the bits within the DFE tap range.

B. BER Model for ADC-Based DFE Receivers

Since the ADC-based DFE, PRDFE, and RS-PRDFE are all

functionally equivalent in that each bit decision maps to a single

critical analog comparison, their BER performance can be mod-

eled in the same way. The BER is set by the probability of the

critical comparison resolving incorrect results.

To derive the expression for the BER, consider an N-tap DFE receiver for a signaling system whose intersymbol interference (ISI) spans L bit periods. For each of the $2^N$ possible prior N-bit histories, there is a critical slicer whose comparison result will be used as the current bit decision. Given the threshold of that critical slicer and the total ISI that the received signal for the current bit experiences from the neighboring L bits, the probability of detecting the current bit incorrectly can be expressed as

(1)

where the ISI is assumed normalized to the received signal level and $Q(\cdot)$ denotes the right tail probability of the standard normal distribution. An additive Gaussian noise is assumed at the input of the slicer.

The ISI contribution can be decomposed into two parts: the part that can be canceled by the N-tap DFE and the part that cannot (i.e., the ISI contribution from the L-N bits outside the DFE range). By enumerating all possible bit patterns, each resulting in a different amount of ISI, one can derive the expression for the average BER for the 1- and 0-level received bits, given in (2).

The expression in (2) tells how to choose the decision thresholds for the best performance of an ADC-based DFE. To achieve the minimum BER, the worst case argument of the Q function must be maximized. In other words, one must maximize the worst case signal margin for the slicer. To do so, the decision thresholds should be placed as close as possible to the predictable ISI levels to minimize the threshold error.
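The model described above can be sketched numerically. The Python fragment below assumes bit levels of +1/-1, Gaussian noise with rms value `sigma` in the same normalized units, and equally likely bit patterns. It is one plausible form of the average BER under those assumptions, not a transcription of (1) and (2), and the names `q_func`, `average_ber`, and `threshold_for` are hypothetical.

```python
import math
from itertools import product


def q_func(x):
    """Right-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def average_ber(dfe_taps, residual_taps, threshold_for, sigma):
    """Average BER over all prior-bit patterns, with signal levels of +/-1.

    dfe_taps:      ISI taps inside the N-tap DFE range
    residual_taps: ISI taps outside the DFE range (the other L-N bits)
    threshold_for: function mapping an N-bit history to the selected threshold
    sigma:         rms Gaussian noise at the slicer input
    """
    total, count = 0.0, 0
    for hist in product((-1, 1), repeat=len(dfe_taps)):
        isi_n = sum(b * c for b, c in zip(hist, dfe_taps))
        e = isi_n - threshold_for(hist)           # threshold error for this history
        for rest in product((-1, 1), repeat=len(residual_taps)):
            isi_rest = sum(b * c for b, c in zip(rest, residual_taps))
            # A transmitted 1 fails if the sample falls below the threshold,
            # a transmitted 0 fails if the sample rises above it.
            total += 0.5 * q_func((1.0 + e + isi_rest) / sigma)
            total += 0.5 * q_func((1.0 - e - isi_rest) / sigma)
            count += 1
    return total / count
```

With the threshold placed exactly on the predictable ISI level, the threshold error is zero and only the residual ISI and noise erode the margin. Any nonzero threshold error shrinks the worst case argument of the Q function and raises the BER.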

This leads to a very important observation: the best BER per-

formance for an ADC-based DFE is achieved when it is opti-

mized like a PRDFE. It is noteworthy that the nonuniform ADC

quantization levels resulting from this principle can be very different from the existing schemes that aim to minimize the signal

quantization errors, such as adaptive differential pulse-coded

modulation (ADPCM) used in voice applications.

When trying to minimize the signal quantization error, one must place the decision thresholds where the signal is most expected. In binary signaling, such an optimization tends to place the thresholds near the likely levels for logic 0 and 1 as shown in Fig. 6(b). However, the next section will demonstrate that the optimal placement for the minimum BER minimizes the threshold error and puts the thresholds near the center of the eye, as shown in Fig. 6(a).

The PRDFE receiver has $2^N$ slicers and therefore can achieve the ideal N-tap DFE performance (i.e., zero threshold error) by assigning each slicer threshold to one of the $2^N$ possible values of the ISI within the DFE tap range. However, when the receiver has fewer than $2^N$ slicers, it is necessary to optimize the slicer thresholds to minimize the overall BER.

(2)




Fig. 5. Illustration of the optimal placement problem of M slicer thresholds in an N-tap RS-PRDFE receiver for the minimum peak error. The problem is equivalent to contiguously grouping the sorted ISI levels into M disjoint groups while minimizing the largest group span.

Fig. 6. The optimal placements of four slicer thresholds for: (a) the minimum threshold error (i.e., the lowest BER; RS-PRDFE) and (b) the minimum signal quantization error. 5 Gb/s operation on the FR4 channel.

III. OPTIMIZATION OF RS-PRDFE SLICER THRESHOLDS

As mentioned previously, the drawback of a PRDFE receiver

is that the number of decision slicers grows exponentially with

the number of DFE taps. However, for voice-line modems, it is

known that the number of quantization steps in the ADC need

not grow with the number of DFE taps once the resolution is fine enough for the quantization errors not to limit the overall BER.

The proposed RS-PRDFE receiver was inspired by this observa-

tion and relaxes the required number of slicers for the PRDFE by

merging the slicers with similar threshold values while keeping

the threshold quantization error small.

When an RS-PRDFE receiver is constrained to M slicers where $M < 2^N$, one must choose the M slicer threshold levels so that the threshold quantization error $e$ is minimized. Note that the set of thresholds can take only M unique values, meaning that some of the slicer outputs may be selected for more than one prior bit pattern. Instead of minimizing the BER expressed in (2) directly, it is easier to minimize the worst case error $\max|e|$ based on the following approximation:

(3)

The approximation is justified because $Q(\cdot)$ is a very steep function and, for an extremely low target BER, the BER is easily dominated by the worst case error among its possible values.

The problem of placing M slicer thresholds for the minimum worst case error can be thought of as first clustering the set of ISI values into M disjoint groups and then assigning the center point of each group as the slicer threshold (Fig. 5). The worst case threshold error within each group is equal to one half of the span (i.e., the difference between the minimum and maximum values) of that group. Therefore, minimizing the maximum threshold error is equivalent to finding the contiguous grouping of the ISI levels into M groups that minimizes the largest span of the groups.

The optimal grouping of the ISI levels into any number M of groups can be found via a recursive, dynamic programming procedure. Assuming that the ISI levels $x_1 \le x_2 \le \dots \le x_{2^N}$ are sorted in ascending order, let $S(i, j)$ be the largest group span for the first $i$ ISI levels optimally split into $j$ groups. To recursively express $S(i, j)$ in terms of $S(i', j')$ with $i'$ and $j'$ smaller than $i$ and $j$, respectively, we categorize the possible $j$-groupings of $i$ ISI points based on the number of elements $k$ in the last ($j$th) group. This last group can have as few as one element (i.e., $k = 1$) and as many as $i - j + 1$ elements (since the other $j - 1$ groups must each have at least one element). If the last group has $k$ elements, from $x_{i-k+1}$ through $x_i$, then its span is simply $x_i - x_{i-k+1}$, where $k$ can vary from 1 to $i - j + 1$. The minimum largest group span possible with the rest of the ISI levels is $S(i - k, j - 1)$. One should choose the number of elements $k$ for the last group so that it minimizes the overall largest group span $S(i, j)$, which can be expressed in the following recursive relationship:

$$S(i, j) = \min_{1 \le k \le i - j + 1} \max\bigl\{\, x_i - x_{i-k+1},\; S(i - k, j - 1) \,\bigr\} \tag{4}$$

$$S(i, 1) = x_i - x_1 \tag{5}$$

With this definition, $S(2^N, M)$ is the minimum largest span achievable for grouping the $2^N$ ISI levels into $M$ groups. The optimal thresholds are given by the center of each group's span. The minimum worst case threshold error is hence $S(2^N, M)/2$.
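The grouping procedure described above can be sketched as a small dynamic program. The Python fragment below is an illustrative implementation under assumed names (`place_thresholds`, the table `s`, and the split-point table `start` are all hypothetical). It sorts the ISI levels, fills a table of the smallest achievable largest span for the first i levels in j groups, and then walks back through the recorded split points to recover the group midpoints.

```python
def place_thresholds(isi_levels, m):
    """Split sorted ISI levels into m contiguous groups minimizing the largest span.

    Returns (largest_span, thresholds), where each threshold is a group's midpoint.
    """
    x = sorted(isi_levels)
    n = len(x)
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= number of ISI levels")
    inf = float("inf")
    # s[j][i]: smallest achievable largest span for the first i levels in j groups
    s = [[inf] * (n + 1) for _ in range(m + 1)]
    # start[j][i]: 0-based index at which the last of the j groups begins
    start = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, n + 1):
        s[1][i] = x[i - 1] - x[0]                 # a single group spans everything
    for j in range(2, m + 1):
        for i in range(j, n + 1):
            for k in range(1, i - j + 2):         # k = size of the last group
                cand = max(x[i - 1] - x[i - k], s[j - 1][i - k])
                if cand < s[j][i]:
                    s[j][i], start[j][i] = cand, i - k
    # Recover the groups from the recorded split points, last group first.
    thresholds, i = [], n
    for j in range(m, 0, -1):
        lo = start[j][i]
        thresholds.append(0.5 * (x[lo] + x[i - 1]))
        i = lo
    return s[m][n], thresholds[::-1]
```

For example, `place_thresholds([0, 1, 10, 11, 20], 3)` groups the levels as {0, 1}, {10, 11}, {20}. The worst case threshold error is half of the returned span. The triple loop costs on the order of m·n² operations, which is modest when run offline but grows quickly because n itself is exponential in the number of taps.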

While the described dynamic programming procedure is

guaranteed to find the optimal threshold placement for any

given channel response, it may not be suitable for an online

calibration scheme that can incrementally update the threshold

levels of the individual slicers and their assignments to the prior

bit patterns. One difficulty stems from the fact that the ISI levels need to be sorted first, and the resulting order can vary strongly with the channel characteristics. Until an effective, yet

low-cost scheme of incremental adaptation is found, a possible

solution is to periodically characterize the channel response

(e.g., the single-bit response) and compute the optimal slicer

thresholds and assignments by firmware or software.

A. ADC Threshold Placements for Minimum BER Versus Minimum Quantization Error

Fig. 6 compares the optimal threshold placements for the

minimum threshold error (i.e., the optimal RS-PRDFE) and for

the minimum signal quantization error with 4 slicers. Notice

the vast difference between the two placements. The optimal

RS-PRDFE places the thresholds near the center of the eye

while the minimum quantization ADC places those near the signal levels.




Fig. 7. Simulated bit-error rates of various types of ADC-based DFE receivers operating at 5 Gb/s on the FR4 channel.

Prior to this work, it was reported that reducing the full-scale

range (FSR) of a uniform ADC can improve the DFE perfor-

mance, even though the ADC cannot faithfully represent the

signal due to overflows and underflows [12]. Fig. 6 provides an

explanation to these surprising results; a uniform ADC with re-

duced FSR has the threshold levels resembling those of the op-

timal RS-PRDFE receiver.

Fig. 7 compares the simulated BER for the four different

types of DFE receivers: the optimal RS-PRDFE, the DFE with

reduced-FSR ADC, the DFE with nonuniform ADC optimized

for minimum quantization error, and the conventional DFE with uniform ADC. The simulations are carried out with an in-house

statistical link simulation tool, LinkLab [13], which can simu-

late system BERs given the single-bit response of the channel.

Each analog slicer is assumed to have 10-mV deadband due to

hysteresis and metastability and 1-mV input-referred noise.

The results show that the RS-PRDFE indeed achieves the

lowest BER. The uniform ADC with reduced FSR achieves

BERs close to the minimum possible values. The results also

demonstrate that minimum quantization error is clearly a poor

criterion for a DFE receiver as the resulting performance is

sometimes even worse than that of the conventional, uniform

ADC. In this case, note that an increase in the slicer count may even degrade the BER, since the minimum quantization error ADC would place the slicer thresholds even closer to the expected signal levels, which may be farther from the optimal levels from the BER perspective.

The next section reconfirms this finding from a 10 Gb/s

RS-PRDFE receiver prototype implemented in 65 nm CMOS

technology [14].

IV. EXPERIMENTAL RESULTS

A simplified diagram of the prototype RS-PRDFE receiver is shown in Fig. 8 along with the photograph of the chip implemented in 65 nm CMOS [14]. The receiver frontend is 4-way interleaved to achieve a 10 Gb/s data rate and can use up to 16 slicers for RS-PRDFE operation, whose reference voltages can be individually adjusted within a 100-mV range in 2-mV steps. Each slicer runs at 2.5 GS/s and consumes 0.75 mW including the clock buffers. The RS-PRDFE slicer selection is performed by a tap-assignment block, which routes each slicer's decision output to the proper input position of the subsequent 32:1 multiplexer. The 32:1 multiplexer then forwards the preselected input to the receiver's final output based on the 5-bit prior history. By performing the slicer selection in the feedforward path rather than in the feedback path as shown in the conceptual architecture in Fig. 3, one can shorten the critical feedback path delay around the multiplexer. Furthermore, when implemented as a tree, the 32:1 multiplexer can be pipelined, exploiting the fact that the selection input bits arrive serially, to achieve a high throughput of 10 Gb/s using synthesized circuits. Also, the tap-assignment block can be utilized to reorder the slicers and minimize the offset errors due to mismatch [16].

The receiver is tested with a 10 Gb/s, 700 mV PRBS data pattern transmitted via a 25"-long Nelco backplane channel that has 17 dB loss at 5 GHz. The frequency response and single-bit response (SBR) of the channel are shown in Fig. 9.

To reduce the precursor ISIs and to effectively explore the re-

ceiver performance with different single-bit responses, a tunable

prefiltering circuit consisting of a high-pass filter (HPF), a 1-tap

discrete-time FIR filter, and a variable-gain amplifier (VGA) is

implemented in front of the RS-PRDFE receiver. In one setting,

the effective SBR was as shown in Fig. 10(a).

The effective voltage margin of the receiver is measured by

inserting an extra slicer that samples the incoming signal with

an adjustable, deliberate voltage offset. Its output is fed into a

replica datapath which is identical to the main datapath but replaces one of the main slicer outputs with the extra slicer output,

as shown in Fig. 8(a). Since the output data stream of the replica

path should be identical to that of the main path except the data

originating from the extra slicer, the two data streams can be

XORed to measure the voltage margin of the data slicer being

replaced.
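The margin measurement described above can be mimicked in software. The Python fragment below is a simplified sketch: it sweeps a deliberate offset on an extra slicer, compares its decisions with those of the main slicer (the XOR of the two streams), and reports the largest offset magnitude tested before any disagreement appears. The function name `voltage_margin` and the zero-mismatch criterion are assumptions made for illustration, and no particular BER level is implied.

```python
def voltage_margin(samples, threshold, offsets):
    """Largest |offset| tested before the extra slicer disagrees with the main one.

    samples:   input voltages seen when the slicer under test is selected
    threshold: reference level of the main slicer
    offsets:   candidate deliberate offsets, in the same units as samples
    """
    main = [s > threshold for s in samples]
    margin = 0.0
    for off in sorted(offsets, key=abs):          # sweep outward from the threshold
        extra = [s > threshold + off for s in samples]
        mismatches = sum(a != b for a, b in zip(main, extra))  # XOR of the streams
        if mismatches:
            break
        margin = abs(off)
    return margin
```

Because the sweep alternates between negative and positive offsets of growing magnitude, the returned value is limited by whichever side of the eye closes first.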

Fig. 10(b) compares the measured voltage margin at

of various receiver architectures that can be configured by

the described prototype chip: the RS-PRDFE receiver, loop-un-

rolling PRDFE receiver, ADC-based receiver with uniform res-

olution, and ADC-based receiver with reduced full-scale range

(FSR). Since the effective SBR has one large postcursor ISI of 53 mV followed by many small ones [Fig. 10(a)], the possible ISI offsets and hence the optimal slicer thresholds are clustered around the $+53$ mV and $-53$ mV levels. Such an uneven distribution of the required ADC thresholds is difficult to realize with a low-resolution uniform ADC and, as a result, the uniform ADC receiver performs the worst in Fig. 10(b). Reducing the FSR of the ADC improves the voltage margin somewhat. The RS-PRDFE receiver always outperforms both the reduced-FSR ADC and PRDFE receivers for the same number of slicers used. In particular, the 4-slicer RS-PRDFE achieves the equivalent voltage margin of the 16-slicer (4-bit) uniformly spaced ADC.

The extra slicer performing the pseudo-BER detection is triggered by an independently adjustable clock phase, and the effective eye diagram seen by the DFE receiver can be obtained by measuring the BERs as a function of voltage and timing offsets. Fig. 11 plots such effective eye diagrams measured for the receiver configurations mentioned earlier with 4 slicers. As expected, the RS-PRDFE receiver achieves the widest eye opening.

Fig. 8. (a) Block diagram of the prototype RS-PRDFE receiver with a voltage margin detection circuit. (b) Chip photograph. The receiver contains an analog frontend, a 16-slicer flash ADC with adjustable reference, and a 5-tap digital DFE. Total active area is 0.26 mm².

Fig. 9. The measured (a) frequency response and (b) 10 Gb/s single-bit response of a 25" Nelco backplane channel.

Fig. 10. (a) The measured single-bit response of a 24" Nelco backplane channel with prefiltering. (b) Measured voltage margin versus the number of slicers (M) for different ADC-based receiver configurations.

Fig. 11. Measured eye diagram of the 4-slicer ADC receivers: (a) the partial response eye diagrams of the individual 4 slicers; the effective eye diagrams of (b) RS-PRDFE, (c) reduced-FSR ADC, and (d) uniform quantization ADC.




Fig. 12. (a) The measured single-bit response with a different prefiltering setting. (b) Measured voltage margin versus the number of slicers (M) for different ADC-based receiver configurations.

It should be noted that the performance benefit of an RS-PRDFE receiver depends strongly on the SBR characteristics. For example, Fig. 12 compares the receiver voltage

margins for a different prefiltering setting. For the effective

SBR shown in Fig. 12(a), the RS-PRDFE continues to out-per-

form the uniform ADC and reduced-FSR ADC receivers, but

it has no performance gain over the PRDFE receiver. This is because the SBR has 5 postcursor ISIs with very distinct values, while the previous SBR in Fig. 10(a) has 3 ISIs at 5 mV. The latter distribution of ISIs results in some overlaps

among the possible ISI offsets, allowing the RS-PRDFE to

save slicers for the same BER performance. For instance, if

all the postcursor ISIs had equal magnitudes, the required

number of slicers would grow only linearly with the number of

DFE taps, rather than exponentially as in a PRDFE receiver.
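The linear-versus-exponential growth can be checked by enumerating the ISI offsets directly. The Python fragment below is a small sketch with a hypothetical helper, `unique_isi_offsets`, that lists the distinct offsets produced by all +1/-1 prior-bit patterns; the tap values used with it are illustrative only.

```python
from itertools import product


def unique_isi_offsets(taps, tol=1e-9):
    """Return the distinct ISI offsets produced by all +/-1 prior-bit patterns."""
    sums = sorted(sum(b * c for b, c in zip(bits, taps))
                  for bits in product((-1, 1), repeat=len(taps)))
    distinct = []
    for s in sums:
        if not distinct or s - distinct[-1] > tol:   # merge values closer than tol
            distinct.append(s)
    return distinct
```

With three equal taps such as `[5, 5, 5]`, the eight bit patterns collapse onto four offsets (N + 1 for N equal taps), whereas three distinct taps such as `[5, 3, 1]` produce all eight.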

The reduction in the required slicer count leads to savings

in power. The measured power dissipation of this prototype

receiver including the analog frontend was 130 mW (13 pJ/bit)

with 16-slicer RS-PRDFE and only 106 mW (10.6 pJ/bit) with

8-slicer RS-PRDFE configuration.

This observation suggests that it might be possible to further improve the performance of the RS-PRDFE receivers by shaping the effective SBRs in a certain way with linear equalizers. The next section investigates this possibility of optimizing the RS-PRDFE receiver jointly with linear equalizers, such as transmit and receive equalizers.

Fig. 13. Different approaches to reduce the RS-PRDFE slicer count with linear equalizers: (a) suppress the far-end ISIs to zeros, (b) suppress any ISIs within the DFE tap range, (c) make the ISIs specific values to force overlaps in ISI offsets.

V. JOINT OPTIMIZATION WITH LINEAR EQUALIZERS

Combining the described RS-PRDFE receiver with linear

equalizers presents interesting new opportunities to further

reduce the slicer count. With linear equalizers either on the

transmitter side or on the receiver side, one can change the

effective channel characteristics to some degree. Then the

question is: how should one shape the channel to achieve the

lowest BER with minimal hardware? A typical answer for

conventional DFE receivers is to minimize the ISIs outside the DFE tap range, since the number of decision slicers does not

depend on the ISI offset values being subtracted.

In an RS-PRDFE receiver, on the other hand, the number

of required slicers does depend on the distribution of the ISI

offsets created by the N prior bits within the DFE tap range.

As described in Section III, RS-PRDFE receivers leverage the

fact that some ISI offsets can be close to one another and share

a common slicer to reduce the hardware cost. Therefore, it is

also possible to reduce the slicer count not only by making the

channel cleaner (i.e., suppressing more ISIs to zero) but also by

creating deliberate overlaps among the ISI offsets. Fig. 13 illustrates this point. Suppose a channel with three postcursor ISIs. An RS-PRDFE receiver would need 8 slicers if their ISI offsets

are not sufficiently close to each other. For traditional PRDFE

receivers, the slicer count can be reduced only if the last ISI is

suppressed to zero [Fig. 13(a)]. However, with RS-PRDFE re-

ceivers, the slicer count can also be reduced if any ISIs within

the DFE tap range become zero [Fig. 13(b)]. In this case, the

RS-PRDFE receiver acts as a roving-tap DFE whose nonzero

tap positions can freely move within the range. In addition, fur-

ther reduction in the slicer count can be achieved if the first

and second ISIs become equal values, for example [Fig. 13(c)].

Since this channel generates only 3 unique ISI offsets, 3 slicers

can cancel all the possible ISI offsets.

Which option is the best? The answer depends on the cost of the linear equalizer shaping the channel for each option listed in Fig. 13. Different types of equalizers such as the transmit FIR equalizer, receive FIR equalizer, and receive continuous-time equalizer have different costs and will be examined. Since the cost also depends on the amount of change the linear equalizer should effect, the answer strongly depends on the channel loss and dispersion characteristics.

Fig. 14. An example circuit implementation of transmit FIR equalizer.

A. RS-PRDFE With Transmit FIR Equalizers

Transmit FIR filters are widely used in today's chip-to-chip and backplane transceivers. One reason is that their high-speed operation can be easily achieved with low-resolution, digital-to-analog-converter-like circuits with nonuniform quantization steps. Fig. 14 shows an example of a transmit linear equalizer.

The main drawback of transmit equalizers is that their peak output swing is constrained. That is, the peak output voltage or current must not exceed given limits, typically set by the circuit topology or the operating conditions (e.g., supply voltage). For a transmit equalizer with a given impulse response, the peak output is proportional to the sum of the absolute values of its tap coefficients and must be limited to a certain value:

(6)

Equation (6) implies that every nonzero tap coefficient in the transmit FIR equalizer leads to a reduction in the main signal swing. In other words, the transmit equalizer may significantly degrade signal margins if it tries to alter the channel response too much.
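The effect of the peak swing constraint on the main tap can be illustrated with a short calculation. The Python fragment below is a sketch under the assumption that the constraint takes the form "sum of absolute tap values equals a fixed peak"; the helper name `scale_to_peak`, the unit peak, and the example tap values are hypothetical.

```python
def scale_to_peak(taps, peak=1.0):
    """Scale FIR taps so the sum of their absolute values equals the peak limit."""
    total = sum(abs(w) for w in taps)
    if total == 0:
        raise ValueError("at least one tap must be nonzero")
    return [w * peak / total for w in taps]
```

For instance, adding a single postcursor tap of -0.25 relative to a main tap of 1 scales the main tap down to 0.8 of the peak, so one fifth of the available swing is spent on equalization rather than on the signal itself.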

Due to this peak swing constraint, for many channels encountered in chip-to-chip and backplane applications, the

optimal configurations for transmit linear equalizer combined

with RS-PRDFE are generally the ones that use the transmit

LE to suppress the far-end ISIs first. The far-end ISIs refer

to the postcursor ISIs that are positioned far from the main

cursor. Since those ISIs are usually smaller in magnitudes than

the near-end ones (the postcursor ISIs that are nearer to the

main cursor), using the transmit FIR equalizer to suppress them

results in less degradation in the signal margin. On the other

hand, changing the larger ISIs in order to create overlaps among

the ISI offsets may require large tap coefficients and reduce the

main signal swing significantly.

To explore such trade-offs in combining RS-PRDFE with various types of linear equalizers, a few representative backplane channels were chosen. The measured S-parameter characteristics and single-bit responses (SBRs) of those channels are shown in Figs. 15 and 16, respectively. The first channel, a 3"-long trace on an FR4 backplane, has a low loss of 15 dB at 5 GHz but strong reflections, while the second channel, a 10.6"-long trace on an Elma ATCA Dual Star backplane, has higher dispersion loss with the total loss being 20 dB at 5 GHz.

Fig. 15. Measured S-parameters (S21) of two example backplane channels. (a) A 3"-long trace on an FR4 backplane. (b) A 10.6"-long trace on an Elma ATCA Dual Star backplane. The measurement includes the characteristics of the connectors, line card traces, and the 5-feet-long, low-loss SMA cables.

Fig. 16. The measured 10-Gb/s single-bit responses for the two channels.

The effects of a linear equalizer were emulated by convolving the measured 10-Gb/s PRBS time-domain waveforms seen at the channel output with the impulse response of the linear equalizer in MATLAB. The time-domain waveforms were collected with a sampling oscilloscope with pattern lock capability (Agilent DCA-J 86100C). The time and voltage resolutions were 6.25 ps (32 points per unit interval) and 1 mV, respectively. Since the measured waveforms include noise, this procedure can also predict the noise enhancement effects of certain linear equalizers.
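The emulation step can be sketched in a few lines. The paper used MATLAB; the Python version below is only an assumed equivalent, with a synthetic noisy waveform standing in for the oscilloscope capture and a hypothetical two-tap equalizer. The helper name `apply_linear_equalizer` is likewise illustrative.

```python
import numpy as np

SAMPLES_PER_UI = 32  # points per unit interval, as quoted in the text

def apply_linear_equalizer(waveform, taps, samples_per_ui=SAMPLES_PER_UI):
    """Convolve an oversampled waveform with a UI-spaced FIR impulse response."""
    # Place the taps one unit interval apart on the oversampled time grid.
    impulse = np.zeros((len(taps) - 1) * samples_per_ui + 1)
    impulse[::samples_per_ui] = taps
    # Truncate so the output stays aligned with, and as long as, the input.
    return np.convolve(waveform, impulse)[: len(waveform)]

# Synthetic stand-in for a measured waveform (assumption: NRZ levels plus noise).
rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=64) * 2 - 1
waveform = np.repeat(bits.astype(float), SAMPLES_PER_UI)
waveform += 0.01 * rng.standard_normal(waveform.size)

# Hypothetical equalizer: main tap plus one negative postcursor tap.
equalized = apply_linear_equalizer(waveform, taps=[1.0, -0.25])
```

Because the noise is part of the waveform being convolved, it passes through the same filter as the signal, which is how the procedure captures noise enhancement.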

The signal quality seen by the RS-PRDFE receiver can be visualized by constructing an effective composite eye diagram, as illustrated in Fig. 17. The eye diagrams for the individual slicers were first composed by accumulating the input traces only when the corresponding slicer output was selected as the received bit. Then, these individual eye diagrams were folded into one after adjusting their decision thresholds to net zero. The opening in the resulting effective eye diagram indicates the voltage and timing margins of the RS-PRDFE receiver.

Fig. 17. The construction of the effective eye diagram for an RS-PRDFE receiver. The eye diagrams seen by the individual decision slicers (left) are folded onto a single eye diagram, after being adjusted for their different slicer threshold levels (right).

Fig. 18. The benefits of combining transmit FIR equalizers with RS-PRDFE. (a) Equalized eye diagrams by the optimal transmit FIR filters. (b) Effective eye diagrams seen by the optimal 4-slicer RS-PRDFE receivers.
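The folding procedure illustrated in Fig. 17 can be sketched as follows. This is a minimal illustration under assumed data structures: `traces` holds one row of samples per unit interval, `selected_slicer` records which slicer decided each bit, and `thresholds` lists the slicer threshold levels. The toy one-tap example (main cursor 0.5, postcursor 0.2) is hypothetical.

```python
import numpy as np

def fold_effective_eye(traces, selected_slicer, thresholds):
    """Shift each UI-long trace by the threshold of the slicer that decided it."""
    traces = np.asarray(traces, dtype=float)
    offsets = np.asarray(thresholds, dtype=float)[np.asarray(selected_slicer)]
    return traces - offsets[:, None]

def eye_height(eye, bits, phase):
    """Vertical opening at one sampling phase: lowest '1' minus highest '0'."""
    bits = np.asarray(bits)
    column = np.asarray(eye)[:, phase]
    return column[bits == 1].min() - column[bits == 0].max()

# Toy example: received level = 0.5 * current bit + 0.2 * previous bit.
cur = np.array([1, 1, -1, -1])
prev = np.array([1, -1, 1, -1])
level = 0.5 * cur + 0.2 * prev
traces = np.repeat(level[:, None], 4, axis=1)   # flat traces, 4 samples per UI
thresholds = np.array([-0.2, 0.2])              # one slicer per previous-bit value
selected = (prev > 0).astype(int)
bits = (cur > 0).astype(int)

folded = fold_effective_eye(traces, selected, thresholds)
raw_height = eye_height(traces, bits, phase=2)
folded_height = eye_height(folded, bits, phase=2)
print(f"raw eye height = {raw_height:.2f}, folded eye height = {folded_height:.2f}")
```

In this toy case the unfolded eye is 0.6 high while the folded eye is 1.0 high, since shifting by the selected threshold removes the postcursor contribution.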

Fig. 18 shows the eye diagrams after the jointly optimized transmit equalizer and the effective eye diagrams seen by the RS-PRDFE receiver with 4 effective taps and 4 slicers (i.e., $N = 4$ and $M = 4$) for the two example channels described. For both channels, the optimal transmit equalizers suppress mainly the far-end ISIs and widen the eye openings. The benefits of combining the transmit equalizer with RS-PRDFE are also shown in Fig. 19, which plots the simulated signal margins of the optimal 4-tap RS-PRDFE receiver with different numbers of slicers. For both channel examples, the RS-PRDFE receiver with only 4 slicers can achieve signal margins comparable to those of a 4-tap PRDFE receiver that would require 16 slicers.
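The 16-slicer figure follows from one threshold per pattern of four previous bits. The sketch below enumerates those ISI offsets and counts how many remain distinct once some postcursors are zeroed or made to overlap. It is a simplification: it merges only exactly coinciding offsets, whereas the RS-PRDFE described in this paper chooses thresholds for the best margins. The postcursor values and function names are assumptions.

```python
from itertools import product

def isi_offsets(postcursors):
    """Threshold offset for every pattern of previous bits (+1/-1)."""
    return [sum(b * h for b, h in zip(pattern, postcursors))
            for pattern in product((-1, 1), repeat=len(postcursors))]

def distinct_thresholds(postcursors, tol=1e-9):
    """Count the offsets left after merging those closer than tol."""
    merged = []
    for offset in sorted(isi_offsets(postcursors)):
        if not merged or offset - merged[-1] > tol:
            merged.append(offset)
    return len(merged)

# Hypothetical postcursor ISI values, nearest tap first.
unequalized = distinct_thresholds([0.3, 0.17, 0.08, 0.03])
far_end_suppressed = distinct_thresholds([0.3, 0.17, 0.0, 0.0])
overlapping = distinct_thresholds([0.2, 0.2])

print(unequalized, far_end_suppressed, overlapping)
```

Under these assumptions, four unrelated postcursors need all 16 thresholds, zeroing the two far-end postcursors leaves 4, and two equal postcursors collapse 4 offsets into 3.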

B. RS-PRDFE With Receive Linear Equalizers

LEs on the receiver side, on the other hand, are not subject to a peak swing constraint like that of the transmit equalizers. The difference is that the equalization is applied after the signal has been attenuated by the channel rather than before. However, boosting the high-frequency content of the received signal can enhance noise, degrading the SNR and the BER.

Fig. 19. The simulated signal margins versus the number of slicers (M) of the RS-PRDFE receivers with and without the transmit FIR equalizers. (a) 3” FR4 channel. (b) 10.6” ATCA channel.

At the moment, one of the most widely used receive equalizer types is the continuous-time linear equalizer (CTLE), an example circuit of which is shown in Fig. 20(a). This circuit realizes a high-pass filter by enhancing the transconductance of the input stage at high frequencies. However, most CTLEs used in practice are of low order, lacking enough degrees of freedom to shape the channel responses into the ones in Fig. 13(b) or (c). Instead, their high-pass characteristics are best utilized for improving the channel bandwidth and thus suppressing the far-end ISIs first. The remaining near-end ISIs are handled by the DFE.

It is also possible to implement an FIR equalizer on the receiver side, an example of which is shown in Fig. 20(b) [1]. This circuit looks similar to the transmit equalizer circuit in Fig. 14 except that the input to each stage is a discrete-time sampled analog input instead of a full-swing digital one. Unlike CTLEs, this discrete-time FIR filter has the ability to individually adjust the tap coefficients. For example, the first postcursor tap of the circuit in Fig. 20(b) is adjusted by the corresponding current amplitude.

Fig. 20. Example implementations of receive linear equalizers. (a) Continuous-time linear equalizer (CTLE). (b) Receive FIR equalizer.

Fig. 21. The benefits of combining receive FIR equalizers with RS-PRDFE. (a) Equalized eye diagrams by the optimal receive FIR filters. (b) Effective eye diagrams seen by the optimal 4-slicer RS-PRDFE receivers.

Since a receive FIR equalizer can adjust the individual tap weights without being subject to a peak swing constraint, it can explore the opportunities illustrated in Fig. 13(b) or (c). Fig. 21 shows the eye diagrams after the jointly optimized receive FIR equalizers and their effective eye diagrams seen by the RS-PRDFE receiver. For the FR4 channel, the optimal receive equalizer suppresses the far-end ISIs first, making the third and fourth ISIs zero. However, for the ATCA channel, it is interesting to note that the FIR equalizer suppresses the second and third postcursor ISIs, leaving the fourth ISI positive. Nonetheless, for both cases, Fig. 22 shows that the RS-PRDFE with $N = 4$ can effectively cancel the remaining ISI with only 4 slicers, and the achieved signal margins are superior to those with the 4-slicer PRDFE receivers.

C. Digital Versus Analog Receive Linear Equalizers

In the discussions so far, the receive FIR equalizer showed the highest flexibility in reducing the required slicer count in an RS-PRDFE receiver. However, implementing an analog FIR filter with wide signal bandwidth may incur large power consumption [1]. Therefore, one may be interested in the potential benefits of implementing the receive FIR equalizer in the digital domain, after the received signal is quantized by an ADC.

Fig. 22. The simulated signal margins versus the number of slicers (M) of the RS-PRDFE receivers with and without the receive FIR equalizers. (a) 3” FR4 channel. (b) 10.6” ATCA channel.

Analysis shows that implementing the receive FIR equalizer in the digital domain does more harm than good unless the ADC resolution is sufficiently high. This is evidenced by the simulation results shown in Fig. 23, which plot the signal margins of the optimal RS-PRDFE receivers combined with analog and digital receive FIR equalizers. For both cases, for slicer counts less than 16 (i.e., less than 4 bits of ADC resolution), an RS-PRDFE with an analog FIR equalizer outperforms one with a digital equalizer. This is because the RS-PRDFE and digital FIR equalizers have conflicting requirements on the ADC threshold placements. The former needs placement for minimum threshold error [Fig. 6(a)] while the latter needs placement for minimum signal quantization error [Fig. 6(b)]. This gap is more pronounced at lower resolution ranges.
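One way to see the conflict is to measure how far the ISI-dependent thresholds a DFE would like are from the thresholds an evenly spaced quantizer provides. The sketch below is a simplification resting on two assumptions made here: uniform spacing stands in for quantization-error-driven placement, and threshold error means the distance from each desired threshold to the nearest available one. The numeric values and function names are hypothetical.

```python
import numpy as np

def uniform_thresholds(num_slicers, full_scale):
    """Evenly spaced thresholds strictly inside [-full_scale, +full_scale]."""
    step = 2.0 * full_scale / (num_slicers + 1)
    return -full_scale + step * np.arange(1, num_slicers + 1)

def worst_threshold_error(required, available):
    """Largest distance from a required threshold to its nearest available one."""
    required = np.asarray(required, dtype=float)[:, None]
    available = np.asarray(available, dtype=float)[None, :]
    return float(np.max(np.min(np.abs(required - available), axis=1)))

# Assumed ISI offsets a 4-slicer DFE would like to slice at.
dfe_offsets = [-0.47, -0.13, 0.13, 0.47]

uniform_error = worst_threshold_error(dfe_offsets, uniform_thresholds(4, 1.0))
matched_error = worst_threshold_error(dfe_offsets, dfe_offsets)
print(f"uniform: {uniform_error:.2f}, matched: {matched_error:.2f}")
```

With only four slicers the uniform grid misses the desired offsets by up to 0.13 in this example, while a denser grid would shrink that distance, which is in line with the gap narrowing at higher resolutions.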




Fig. 23. Performance comparison of the analog and digital receive FIR equalizers when combined with RS-PRDFE. (a) 3” FR4 channel. (b) 10.6” ATCA channel.

Therefore, a practical way of implementing a high-performance receiver with a low-resolution ADC is to combine the proposed RS-PRDFE with an analog-type linear equalizer.

VI. CONCLUSION

This paper introduced a way of designing high-performance equalizing receivers with low-resolution ADCs. The quantization thresholds of the ADC may have to be individually adjusted and optimized for the best signal margins rather than for the least quantization errors. The described RS-PRDFE receiver with only 4 slicers demonstrated performance equivalent to that of a receiver with a 3–4-bit uniformly quantizing ADC. The paper also showed some synergistic effects of combining RS-PRDFE with LEs, especially with the receive FIR equalizers. It was shown to be preferable to leave the LEs in the analog domain, since the DFE and LE have conflicting requirements on the ADC quantization thresholds.

ACKNOWLEDGMENT

The authors would like to thank Dr. Ravi Kollipara and My

Nguyen for measuring the response characteristics of various

backplane channels used in this work.

REFERENCES

[1] C.-K. K. Yang and E.-H. Chen, “ADC-based serial I/O receivers,” in Proc. Custom Integr. Circuits Conf., Sep. 2009, pp. 323–330.
[2] H. Chung and G.-Y. Wei, “Design-space exploration of backplane receivers with high-speed ADCs and digital equalization,” in Proc. Custom Integr. Circuits Conf., Sep. 2009, pp. 555–558.
[3] M. Harwood et al., “A 12.5 Gb/s SerDes in 65 nm CMOS using a baud-rate ADC with digital receiver equalization and clock recovery,” in ISSCC Dig. Tech. Papers, Feb. 2007, pp. 436–437.
[4] O. E. Agazzi et al., “A 90 nm CMOS DSP MLSD transceiver with integrated AFE for electronic dispersion compensation of multimode optical fibers at 10 Gb/s,” J. Solid-State Circuits, pp. 2939–2957, Dec. 2008.
[5] P. Schvan et al., “A 24 GS/s 6 b ADC in 90 nm CMOS,” in ISSCC Dig. Tech. Papers, Feb. 2008, pp. 544–545.
[6] Y. M. Greshishchev et al., “A 40 GS/s 6 b ADC in 65 nm CMOS,” in ISSCC Dig. Tech. Papers, Feb. 2010, pp. 390–391.
[7] J. Cao et al., “A 500-mW ADC-based CMOS AFE with digital calibration for 10 Gb/s serial links over KR-backplane and multimode fiber,” J. Solid-State Circuits, pp. 1172–1185, Jun. 2010.
[8] B. Murmann, “A/D converter trends: Power dissipation, scaling and digitally assisted architectures,” in Proc. Custom Integr. Circuits Conf., Sep. 2008, pp. 105–112.
[9] S. Kasturia and H. J. Winters, “Techniques for high-speed implementation of non-linear cancellation,” IEEE J. Sel. Areas Commun., vol. 9, no. 5, pp. 711–717, Jun. 1991.
[10] Y.-S. Sohn et al., “A 2.2-Gbps CMOS look-ahead DFE receiver for multidrop channel with pin-to-pin time skew compensation,” in Proc. Custom Integr. Circuits Conf., Sep. 2003, pp. 473–476.
[11] V. Stojanovic et al., “Adaptive equalization and data recovery in a dual-mode (PAM2/4) serial link transceiver,” in Proc. VLSI Circuits Symp., Jun. 2004, pp. 348–351.
[12] E.-H. Chen et al., “Adaptation of CDR and full scale range of ADC-based Serdes receiver,” in Proc. VLSI Circuits Symp., Jun. 2009, pp. 12–13.
[13] D. Oh et al., “Accurate system voltage and timing margin simulation in high-speed I/O system designs,” IEEE Trans. Adv. Packag., vol. 31, no. 4, pp. 722–730, Nov. 2008.
[14] E.-H. Chen et al., “10 Gb/s serial I/O receiver based on variable reference ADC,” in Proc. VLSI Circuits Symp., 2011, in review.
[15] J. Kim et al., “Equalizer design and performance trade-offs in ADC-based serial links,” in Proc. Custom Integr. Circuits Conf., Sep. 2010, pp. 1–8.
[16] C. Donovan and M. P. Flynn, “A digital 6-bit ADC in 0.25-μm CMOS,” IEEE J. Solid-State Circuits, vol. 37, no. 3, pp. 432–437, Mar. 2002.

Jaeha Kim (S’94–M’03–SM’10) received the B.S. degree in electrical engineering from Seoul National University (SNU), Seoul, Korea, in 1997, and received the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1999 and 2003, respectively.

From 2001 to 2003, he was with True Circuits, Inc., Los Altos, CA as Circuit Designer; with Inter-university Semiconductor Research Center (ISRC), SNU, as Postdoctoral Researcher from 2003 to 2006; with Rambus, Inc., Los Altos, CA as Principal Engineer from 2006 to 2009; and with Stanford University, CA as Acting Assistant Professor from 2009 to 2010. He is currently Assistant Professor at SNU and his research interests include low-power mixed-signal systems and their design methodologies.

Prof. Kim is a recipient of the Takuo Sugano award for outstanding far-east paper at 2005 International Solid-State Circuits Conference (ISSCC) and the Low Power Design Contest Award at 2001 International Symposium on Low Power Electronics and Design (ISLPED). He served on the technical program committees of Design Automation Conference (DAC), International Conference on Computer Aided Design (ICCAD), and Asian Solid-State Circuit Conference (A-SSCC).




E-Hung Chen (S’05–M’06) was born in Taipei, Taiwan. He received the B.S. degree in electrical engineering from National Taiwan University and the M.S. degree in electrical engineering from the University of California, Los Angeles (UCLA), respectively. He is currently working toward the Ph.D. degree at UCLA.

Since 2005, he has been with Broadcom, Rambus, and Texas Instruments as a summer intern working on channel equalization technique and receiver modeling. His research interests are high-speed serial link design and its adaptation.

Jihong Ren (S’03–M’06) received the Ph.D. degree in computer science from the University of British Columbia, Vancouver, Canada, in 2006, where she worked on optimal equalization for chip-to-chip high-speed buses.

She has been with Rambus, Sunnyvale, CA, since January 2006, where she has worked on equalization algorithms and link performance analysis.

Brian Leibowitz (S’97–M’05) was born in New Jersey in 1976. He received the B.Sc. degree in electrical engineering from Columbia University, New York, in 1998 and the Ph.D. degree in electrical engineering and computer science from the University of California, Berkeley, in 2004, where his doctoral research included the development of a fully integrated CMOS imaging receiver for free-space optical communication.

Since 2004 he has been with Rambus, Inc., Sunnyvale, CA, where he has worked on equalization and mixed-signal circuit design for a variety of high-speed and low power serial links and memory interfaces.

Dr. Leibowitz received the Edwin H. Armstrong Award from Columbia University. His graduate studies at Berkeley were supported by a fellowship from the Fannie and John Hertz Foundation.

Patrick Satarzadeh (S’04–M’09) received the B.S., M.S., and Ph.D. degrees in electrical and computer engineering from the University of California, Davis, in 2004, 2006, and 2009, respectively.

He has held internships with MIT Lincoln Lab, Lexington, MA, and Rambus, Inc., Sunnyvale, CA. In 2009, he joined Texas Instruments Inc., Dallas, TX, as a Member of the Technical Staff, where he has since worked on continuous time delta-sigma data converters. His research interests include signal processing and mixed signal circuit design.

Jared Zerbe (M’90–SM’10) was born in New York City in 1965. He received the B.S. degree in electrical engineering from Stanford University, Stanford, CA, in 1987.

From 1987 to 1992, he worked in circuit design at VLSI Technology and MIPS Computer Systems. In 1992 he joined Rambus Inc., Sunnyvale, CA, where he has since specialized in the design of high-speed I/O, PLL/DLL clock recovery, and equalization and data-synchronization circuits. He has authored or coauthored over 30 papers and over 50 patents in the area of high-speed signaling and clocking and has taught courses at Berkeley and Stanford in high-speed serial link design. He is currently a Technical Director at Rambus where he is focused on development of future high-performance and low-power signaling technologies.

Chih-Kong Ken Yang (S’94–M’98–SM’07–F’10) was born in Taipei, Taiwan. He received the B.S. and M.S. degrees in electrical engineering in 1992 and the Ph.D. degree in electrical engineering in 1998 from Stanford University, Stanford, CA.

He joined the University of California, Los Angeles, as an Assistant Professor in 1999 and has been a Professor since 2009. His current research area is high-performance mixed-mode circuit design for VLSI systems such as clock generation, high-performance signaling, low-power digital functional blocks, and analog-to-digital conversion.
