
JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.15, NO.3, JUNE, 2015 ISSN(Print) 1598-1657 http://dx.doi.org/10.5573/JSTS.2015.15.3.404 ISSN(Online) 2233-4866

Manuscript received Mar. 5, 2015; accepted May. 19, 2015 Pohang University of Science and Technology Dept. of Electrical Engineering E-mail : [email protected]

An Adaptive-Bandwidth Referenceless CDR with Small-area Coarse and Fine Frequency Detectors

Hye-Jung Kwon, Ji-Hoon Lim, Byungsub Kim, Jae-Yoon Sim, and Hong-June Park

Abstract—Small-area, low-power coarse and fine frequency detectors (FDs) are proposed for an adaptive-bandwidth referenceless CDR with a wide range of input data rates. The coarse FD, implemented with two flip-flops, eliminates harmonic locking as long as the initial frequency of the CDR is lower than the target frequency. The fine FD samples the incoming data with half-rate four-phase clocks, whereas the conventional rotational FD samples the full-rate clock signal with the incoming data. The fine FD uses only half as many flip-flops as the rotational FD by sharing the sampling and retiming circuitry with the PLL. The proposed CDR chip, fabricated in a 65-nm CMOS process, satisfies the jitter tolerance specifications of both USB 3.0 and USB 3.1. It works for input data rates of 2 Gb/s to 8 Gb/s at 1.2 V and 4 Gb/s to 11 Gb/s at 1.5 V. It consumes 26 mW at 5 Gb/s and 1.2 V, and 41 mW at 10 Gb/s and 1.5 V. The measured phase noise was -97.76 dBc/Hz at a 1-MHz offset from the center frequency of 2.5 GHz. The measured rms jitter was 5.0 ps at 5 Gb/s and 4.5 ps at 10 Gb/s.

Index Terms—Clock and data recovery circuit, fine frequency detection, jitter tolerance, referenceless, adaptive bandwidth

I. INTRODUCTION

The clock and data recovery (CDR) circuit is widely used at the receiver of high-speed serial link interfaces such as USB, PCIe, SATA, and DisplayPort. The CDR circuit extracts the data and clock signals from the received signal. There are two kinds of CDRs: a reference-based CDR (Fig. 1(a)) and a referenceless CDR (Fig. 1(b)). The reference-based CDR [1, 2] generates a clock signal from the reference clock source of the receiver (CKREF2 of Fig. 1(a)) and adjusts the clock phase to the center of the received data eye. The CDR circuit is implemented by using a dual-loop architecture, which consists of a frequency-locked loop (FLL) and a phase-locked loop (PLL). Initially only the FLL is enabled. It adjusts the frequency of the extracted clock to within the PLL pull-in range around the target frequency. After the FLL is locked, the PLL is enabled.

Fig. 1. Serial link transceiver (a) w/ reference-based CDR, (b) w/ referenceless CDR.

With the reference-based CDR circuit, the usable data rate is limited to one or a few discrete values. If the usable data rate of a CDR can change over a continuous range, a single CDR can serve different applications, which reduces the design cost. The referenceless CDR satisfies this requirement. The referenceless CDR [3-16] extracts the clock signal from the received data signal alone, without using any reference clock source (Fig. 1(b)). It can be used for many different applications with a wide range of input data rates.

The most significant challenge of the referenceless CDR is the harmonic locking problem, in which the frequency of the clock signal extracted by the CDR is a sub-harmonic of the target frequency. One solution to this problem is to limit the output frequency of the voltage-controlled oscillator (VCO) in the referenceless CDR to within ±50 % of the target frequency [3-7]. However, this restriction limits the usable input data rate to a narrow range. The second solution detects harmonic locking by checking the maximum run-length of the CDR output data, for the case in which the maximum run-length of the CDR input data is fixed to a constant value [8, 9]. This method is limited to a specific encoding scheme, such as the 2^7-1 PRBS data for [8] and the 8B10B-encoded data for [9]. The third solution recovers the clock signal by using the randomness of input data [10-13]. The input data stream is divided by more than 1000 and the resultant output is applied to a frequency multiplier to recover the clock signal. This method works only for random data that have a transition density close to 0.5, because only then is the frequency offset of the FLL small enough that the output frequency of the locked FLL falls within the PLL pull-in range (0.2 % of the target frequency). To achieve a recovered clock with a reasonable jitter, this solution requires an excessively narrow bandwidth for the FLL used for the multiplication. The fourth solution [14] uses an extra delay-locked loop (DLL) for a wide-range referenceless CDR.

In this paper, a small-area FLL for the referenceless CDR is proposed to achieve a wide usable range of input data rate and a small frequency offset of the locked FLL (<0.2 % of the target frequency), without using an extra DLL and without imposing limits on the maximum run-length or transition density of the input data.

In a CDR circuit, the PLL bandwidth must be reduced as much as possible to minimize the jitter of the recovered clock and data, because the input data of the CDR usually has a large jitter. However, the jitter tolerance is reduced as the PLL bandwidth is reduced. In the referenceless CDR with a wide range of input data rate, the PLL bandwidth is usually fixed to ~1/1000 of the minimum input data rate to minimize the jitter of the recovered data and clock for the entire range of input data rate. However, this method degrades the jitter tolerance significantly at the maximum input data rate [15, 16]. In this work, an adaptive-bandwidth tracking scheme is used such that the PLL bandwidth is proportional to the input data rate. Also, a digital loop filter is used to compensate for PVT variations.

The proportionality constant of the PLL bandwidth to the input data rate is fixed to ~1/1000 over the entire range of input data rate. The proposed adaptive-bandwidth scheme provides both a large jitter tolerance and a small jitter of the recovered clock and data for the entire range of input data rate.
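The difference between the fixed and the adaptive bandwidth settings can be illustrated with simple arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the 2 Gb/s minimum rate is taken from the 1.2-V operating range reported in the abstract.

```python
RATIO_DENOM = 1000      # BW_PLL is ~1/1000 of the data rate (from the text)
MIN_RATE_GBPS = 2       # lowest data rate reported at 1.2 V


def fixed_bw_mhz():
    """Fixed scheme: bandwidth pinned to ~1/1000 of the minimum data rate."""
    return MIN_RATE_GBPS * 1000 / RATIO_DENOM   # Gb/s -> Mb/s, then / 1000 -> MHz


def adaptive_bw_mhz(rate_gbps):
    """Adaptive scheme: bandwidth tracks the actual data rate."""
    return rate_gbps * 1000 / RATIO_DENOM


def ratio_denominator(rate_gbps, bw_mhz):
    """Return N such that bandwidth = data rate / N."""
    return rate_gbps * 1000 / bw_mhz


for rate in (2, 5, 8):
    n_fixed = ratio_denominator(rate, fixed_bw_mhz())
    n_adapt = ratio_denominator(rate, adaptive_bw_mhz(rate))
    print(f"{rate} Gb/s: fixed BW = rate/{n_fixed:.0f}, adaptive BW = rate/{n_adapt:.0f}")
```

With a fixed bandwidth the ratio shrinks as the data rate rises, which is the jitter-tolerance degradation noted above; with the adaptive setting the ratio stays at 1/1000.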

Section II presents the architecture of the proposed referenceless CDR, the FLL and the adaptive-bandwidth tracking scheme of PLL. Section III explains the circuit implementations. Section IV shows the measurement results. Section V concludes this work.

II. ARCHITECTURE

The proposed CDR (Fig. 2) consists of a dual-loop architecture of a PLL and an FLL. It accepts an input data DIN and generates a four-phase recovered clock signal CLKOUT[0:3] and a recovered data DOUT[0:1]. The PLL is implemented by using digital circuits to maintain an accurate ratio between the PLL bandwidth and the input data rate and also to eliminate a large capacitor from the loop filter. The FLL is also implemented by using digital circuits to eliminate a huge capacitor from the loop filter, which would otherwise be required to get an extremely low FLL bandwidth (a few tens of kilohertz). After the power-on reset, the FLL is enabled first and adjusts the frequency of CLKOUT[0:3] to within ±0.2 % of the target frequency, which is one half of the input data rate. A 2x oversampling is used to extract the recovered data at the Alexander PD of the PLL. After the FLL is locked, both the PLL and the FLL are enabled. To avoid interaction between the two loops, the FLL bandwidth is set to a constant value, which is < 0.01 times the PLL bandwidth.

Fig. 2. Proposed CDR.

The proposed FLL has a wide range of frequency acquisition. This helps to maximize the usable range of input data rate of the CDR. The frequency acquisition range refers to the range of initial ICO (current controlled oscillator) frequency over which frequency locking can be achieved. The PLL bandwidth is adjusted proportionately to the input data rate such that the ratio of the PLL bandwidth to the input data rate is maintained constant at ~ 1/1000 for the entire range of usable input data rate. This scheme improves the jitter tolerance at frequencies near the PLL bandwidth.

1. Frequency-locked Loop (FLL)

The FLL sets the ICO frequency within ±0.2 % of the target frequency. Rotational frequency detectors (RFDs) [17, 18] are widely used in conventional FLL circuits, because RFDs lock for any input data pattern that includes ‘010’ or ‘101’. The input data for an RFD are not restricted in maximum run-length or data transition density. An RFD is usually used in a single-loop FLL. However, in a CDR with a single-loop FLL using an RFD, the frequency acquisition range is limited to ±50 % of the target frequency.

A dual-loop FLL that consists of a coarse and a fine frequency loop was proposed in [14]. The coarse frequency loop includes an additional DLL to increase the frequency acquisition range of the CDR without imposing any limits on the maximum run-length or the data transition density of input data. The DLL eliminates the upper limit of the CDR frequency acquisition range, as long as the delay of the voltage-controlled delay line (VCDL) can be smaller than one period of the output clock at the target frequency. However, the additional DLL significantly increases the chip area.

In this work, a dual-loop FLL is proposed to achieve a wide frequency acquisition range of CDR with relatively small-area frequency detectors (Fig. 3). This work eliminates the lower limit of the CDR frequency acquisition range by using only a simple coarse frequency detector (FD) with two flip-flops. Because the FLL shares the ICO with the PLL, no additional DLL is needed. The coarse frequency loop of this work works as follows. After the power-on reset, the initial frequency of the ICO is set to the minimum value of its oscillation range. This value is guaranteed to be lower than the target frequency. The coarse frequency loop increases the ICO frequency in uniform steps until it exceeds the target frequency. At this point the coarse frequency loop is declared to be locked. This operation of the coarse frequency loop eliminates the lower limit of the CDR frequency acquisition range. This also guarantees that the proposed CDR can lock at any target frequency that is within the ICO’s oscillation range.

After the power-on reset, only the coarse frequency loop is enabled. It sets the ICO frequency to within ±2 % of the target frequency. After the coarse frequency loop is locked, the coarse frequency loop is disabled and the fine frequency loop is enabled. The fine frequency loop sets the ICO frequency within ±0.2 % of the target frequency. This range is smaller than the PLL pull-in range in most cases. The fine FD works similarly to the conventional RFD. The fine FD consists of a sampling circuit followed by a transition detector. The sampling and retiming circuit is shared with the Alexander PD of the PLL. This sharing greatly reduces the chip area of the fine FD.

Fig. 3. FLL of proposed CDR.
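The coarse acquisition sweep described above can be sketched as a simple loop. This is a behavioral illustration only: the function name, the ICO minimum frequency, and the step size are hypothetical values chosen for the example, not figures from the paper.

```python
def coarse_acquire(f_min_khz, f_target_khz, step_khz):
    """Step the ICO up from its minimum until it is no longer below the target.

    Returns the frequency at coarse lock and the number of steps taken.
    """
    if f_min_khz >= f_target_khz:
        raise ValueError("the ICO minimum must start below the target frequency")
    f = f_min_khz
    steps = 0
    while f < f_target_khz:       # coarse FD asserts FC_UP while the ICO is too slow
        f += step_khz             # uniform frequency step
        steps += 1
    return f, steps


# Hypothetical numbers: 1 GHz ICO minimum, 2.5 GHz target (5 Gb/s), 40 MHz steps.
TARGET_KHZ = 2_500_000
f_lock, n_steps = coarse_acquire(1_000_000, TARGET_KHZ, 40_000)
offset_pct = 100 * (f_lock - TARGET_KHZ) / TARGET_KHZ
print(f"coarse lock at {f_lock} kHz after {n_steps} steps, offset {offset_pct:.2f} %")
```

Because the sweep always starts below the target and only moves upward, the loop stops at the first frequency at or above the target, so the residual offset is bounded by one step and never lands on a sub-harmonic.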


2. Adaptive-bandwidth tracking

The ratio of the PLL bandwidth to the input data rate is fixed to a constant value in the proposed referenceless CDR to maintain a good jitter tolerance performance for a wide range of input data rate. This scheme, which is called the adaptive-bandwidth tracking scheme, is applied to the referenceless CDR for the first time in this work. By using a simplified s-domain model of the PLL (Fig. 4), the PLL bandwidth BWPLL can be derived as follows if the proportional-path DAC (DACP) gain KP is much larger than the integral-path DAC gain KI [19, 20]:

$$BW_{PLL} \approx K_{PD} \times K_{P} \times K_{ICO} \propto I_{P} \times K_{ICO} \qquad (1)$$

where KPD is the gain of the Alexander PD, KICO is the ICO gain, and IP is the output current of DACP (Fig. 2). KPD and KICO are kept constant independently of the input data rate. By setting KP to be proportional to the input data rate, the ratio of BWPLL to the input data rate is maintained constant at ~1/1000. This achieves the adaptive-bandwidth tracking. IP is proportional to KP.

After the frequency lock is achieved, both the phase loop and the fine frequency loop work simultaneously. To maintain the loop stability, the bandwidth BWFLL of the fine frequency loop is set to a constant value which is < 0.01BWPLL for the entire range of input data rate. BWFLL is determined as

$$BW_{FLL} \approx K_{FINE\_FD} \times K_{F} \times K_{ICO} \qquad (2)$$

where KFINE_FD is the gain of the fine FD, and KF is the gain of the frequency loop DAC (DACF).
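Because BWFLL is a constant while BWPLL scales with the data rate, the loop-separation condition is tightest at the lowest data rate. The sketch below works through that bound, assuming the ~1/1000 bandwidth ratio stated in the text; the function names and the list of data rates are illustrative assumptions.

```python
PLL_RATIO_DENOM = 1000   # BW_PLL ~ data rate / 1000 (from the text)
LOOP_SEPARATION = 100    # BW_FLL < BW_PLL / 100, i.e. < 0.01 * BW_PLL


def bw_pll_khz(rate_mbps):
    """PLL bandwidth in kHz for a data rate given in Mb/s."""
    return rate_mbps * 1000 / PLL_RATIO_DENOM


def max_bw_fll_khz(rates_mbps):
    """Largest constant FLL bandwidth that satisfies the bound at every rate."""
    return min(bw_pll_khz(r) for r in rates_mbps) / LOOP_SEPARATION


rates = [2000, 5000, 8000]                      # Mb/s, example operating points
print(f"BW_FLL must stay below {max_bw_fll_khz(rates):.0f} kHz")
```

For a 2 Gb/s minimum rate the bound is 20 kHz, which is in line with the FLL bandwidth of a few tens of kilohertz mentioned in Section II.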

III. CIRCUIT IMPLEMENTATION

A PLL and an FLL are combined to implement the proposed referenceless CDR (Fig. 2). The PLL consists of an Alexander PD, a digital loop filter, a DSM (delta-sigma modulator), two DACs, and an ICO. The Alexander PD converts the phase difference between the input data and the ICO output clock into two digital codes, ‘E’ and ‘L’, which represent the three cases of ‘early’ (E = 1, L = 0), ‘late’ (E = 0, L = 1), and ‘no action’ (E = 0, L = 0). The two output codes of the Alexander PD are sent to DACP of the proportional path and to the accumulator of the integral path of the PLL. The Alexander PD is implemented by using sense-amp flip-flops to minimize the static phase offset between the input data and the ICO output clock. DACP converts the two output codes of the Alexander PD into three current levels: 0, IP0, and 2IP0, where IP0 corresponds to the proportional path gain KP. An 18-bit accumulator, a DSM, and a 7-bit DAC are used in the integral path of the PLL. The DSM enables use of a low-resolution DAC. The four least significant bits (LSBs) of the accumulator are discarded to reduce the dithering jitter of the ICO output clock.

The FLL is a first-order loop, which includes a DSM to reduce the DAC size. A 7-bit R-2R DAC is followed by an RC low-pass filter with a bandwidth of around 1 MHz. The four LSBs of the 18-bit accumulator output are discarded to eliminate the steady-state dithering jitter. The MUX shifter block reduces the fine frequency loop gain to 1/128 of the coarse frequency loop gain. This enables a relatively fast lock time for the coarse frequency loop and a fine frequency resolution for the fine frequency loop. The FD gain is the same for both the fine and coarse frequency loops. For both the FLL and the PLL, a digital loop filter is used to avoid huge capacitors [21, 22]. The circuit operations of the coarse FD, the fine FD, and the ICO are explained in the following paragraphs.
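The proportional path can be illustrated with a small behavioral model. The paper does not give the truth table of its Alexander PD, so the decision logic below is the textbook bang-bang form and should be read as an assumption; the function names and the normalized IP0 value are likewise hypothetical. The mapping from (E, L) to the three DACP levels follows the text.

```python
def alexander_pd(prev_bit, edge_bit, curr_bit):
    """Return (E, L) from two data samples and the edge sample between them."""
    if prev_bit == curr_bit:
        return (0, 0)          # no data transition: no phase information
    if edge_bit == prev_bit:
        return (1, 0)          # edge sample taken before the transition: early
    return (0, 1)              # edge sample taken after the transition: late


def dacp_current(e, l, ip0=1.0):
    """Map (E, L) to the DACP output levels 0, IP0 and 2*IP0."""
    levels = {(1, 0): 0.0, (0, 0): ip0, (0, 1): 2 * ip0}
    return levels[(e, l)]


# Example: a 0 -> 1 transition where the edge sample still reads 0 (clock early).
e, l = alexander_pd(0, 0, 1)
print(e, l, dacp_current(e, l))
```

The middle level IP0 for ‘no action’ keeps the proportional current centered, so early and late decisions move it symmetrically by one IP0 in either direction.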

1. Coarse Frequency Detector

The coarse FD of this work is similar to that of [23]; a two-phase version is used in [23] while a four-phase version is used in this work.

Fig. 4. s-domain approximation of PLL.

The coarse FD of this work identifies whether the ICO frequency is higher than half the input data rate by counting the maximum number of rising transitions of input data during one period of the ICO output clock. For this, the coarse FD is implemented by using a series connection of two flip-flops (Fig. 5(a)).

The ICO generates four-phase clocks (CLKOUT[0:3]), of which the target frequency is set to half the input data rate for 2x oversampling. After the power-on reset, the ICO output frequency is set to the minimum frequency available from the ICO. Because the ICO minimum frequency is designed to be lower than the target frequency, the maximum number of rising data transitions during one clock period is then two or more. Therefore, the coarse FD initially sets the FC_UP signal to ‘1’ (Fig. 5(b)), and the ICO output frequency increases continuously with time. When the ICO output frequency exceeds the target frequency, the maximum number of rising transitions of input data during one period of the ICO output clock is one and the FC_UP signal is set to ‘0’ (Fig. 5(c)). If the FC_UP signal remains at ‘0’ during 1024 consecutive rising transitions of data, the FLL declares the coarse lock, and the ICO output frequency is located within the range from 0 to +2 % of the target frequency. This satisfies the requirement of the following fine frequency loop: the initial ICO output frequency must be located within the range from 50 % to 150 % of the target frequency for the fine frequency loop to lock. The maximum number of data transitions during one clock period occurs for the input data pattern of ‘0101’. The proposed coarse frequency loop always locks to the target frequency without harmonic locking, as long as the minimum frequency of the ICO output is lower than the target frequency.
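The edge-counting decision of the coarse FD can be modeled behaviorally. The sketch below assumes an ideal ‘0101…’ input with no jitter, represents time in integer femtoseconds to avoid rounding, and asserts FC_UP when two or more rising edges fall in one clock period; the function names and the window count are hypothetical, and the flip-flop-level circuit of Fig. 5(a) is not modeled.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def max_rising_edges_per_clock(ui_fs, clk_period_fs, n_windows=1000):
    """Maximum number of DIN rising edges seen in any single clock period."""
    edge_period = 2 * ui_fs                 # '0101...': one rising edge every 2 UI
    best = 0
    for n in range(n_windows):
        start = n * clk_period_fs
        stop = start + clk_period_fs
        # Rising edges sit at multiples of edge_period; count those in [start, stop).
        count = ceil_div(stop, edge_period) - ceil_div(start, edge_period)
        best = max(best, count)
    return best


def fc_up(ui_fs, clk_period_fs):
    """FC_UP = 1 while the clock is slower than half the data rate."""
    return 1 if max_rising_edges_per_clock(ui_fs, clk_period_fs) >= 2 else 0


UI_FS = 200_000                             # 5 Gb/s -> 200 ps unit interval
print(fc_up(UI_FS, 500_000))                # 2.0 GHz clock, below the 2.5 GHz target
print(fc_up(UI_FS, 400_000))                # 2.5 GHz clock, at the target
print(fc_up(UI_FS, 800_000))                # 1.25 GHz clock, a sub-harmonic
```

In this model a clock at a sub-harmonic of the target still sees two rising edges per period, so FC_UP stays asserted and the sweep continues past it.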

In Fig. 5(a), the two flip-flops are clocked by the input data (DIN). The first flip-flop is reset while the divided clock (CLKOUT[0]/2) is high. The second flip-flop is reset at the power-on reset. CLKOUT[0]/2 is generated by dividing one of the four-phase ICO output clocks (CLK[0]) by two. By using four copies of this flip-flop circuit in parallel (Fig. 5(a)), the lock time of the coarse frequency loop is reduced from 5 μs to 1.5 μs.

Instead of counting the rising edges of input data during one clock period of CLK[0], we can count them during either a half clock period or two clock periods (Table 1). With half-clock-period counting, the lock time is shorter than with one-clock-period counting, but a frequency offset from the target frequency occurs that depends on the clock duty cycle and on the intersymbol interference (ISI) of the incoming data signal. With two-clock-period counting, the lock time is longer and the circuit complexity is higher than with one-clock-period counting. Therefore, one-clock-period counting was chosen in this work.

2. Fine Frequency Detectors


After the coarse frequency loop is locked, the coarse frequency loop is disabled and the fine frequency loop is enabled. As long as the initial frequency of the ICO output clock is located within ±50 % of the target frequency, the fine frequency loop is required to lock such that the final frequency of the ICO output clock is located within ±0.2 % of the target frequency. The proposed fine FD generates the FF_UP and FF_DN signals by comparing the input data signal (DIN) with the four-phase ICO clock signals (CLKOUT[0:3]). The fine FD responds only to the consecutive 4-bit input data pattern ‘0101’. For any other input data pattern, both the FF_UP and FF_DN signals of the fine FD are set to ‘0’. The fine frequency loop is declared to be locked if both the FF_UP and FF_DN signals remain at ‘0’ for a consecutive time interval of 1024 UI (unit intervals of the data period) (Table 2).

Fig. 5. Coarse FD (a) circuit, (b) operation (ICO freq. < target freq.), (c) operation (ICO freq. > target freq.).

Table 1. Comparison of coarse FDs with different time intervals for counting (5 Gb/s PRBS-7 input data)

| Time interval for counting (clock period) | 1/2 | 1 (this work) | 2 |
|---|---|---|---|
| Lock frequency offset | 3~4% | 1~2% | 0.5~1% |
| Lock time (coarse lock) | 0.6 μs | 1.5 μs | 41 μs |
| Circuit complexity | 8 FF + 1 OR | 12 FF + 1 OR | 16 FF + 1 OR |
| Power | 0.84 mW | 1.17 mW | 1.48 mW |
| Sensitivity to duty cycle of input data | Sensitive | Insensitive | Insensitive |

The proposed fine FD consists of a sampling and retiming block and a transition detector (Fig. 6(a)). The operation of the proposed fine FD is basically the same as that of the conventional RFD, except that the proposed fine FD samples DIN at the rising edges of the half-rate ICO clock (CLKOUT[0:3]), whereas the conventional RFD samples the full-rate ICO clock at the rising and falling edges of DIN. 32 flip-flops are used for the conventional RFD. Although the same number of flip-flops is used for the fine FD of this work (Fig. 6), 16 flip-flops for the sampling and retiming block are shared with the Alexander PD of the PLL. Therefore, only 16 flip-flops are added for the fine FD of this work, which occupies only half the area of the conventional RFD. Because the half-rate clock is used for sampling in the proposed fine FD, a significant power reduction is achieved in the clock drivers and flip-flops compared to the conventional RFD. Also, no full-swing DIN is required, and the fine FD operation is insensitive to the duty cycle change of DIN due to ISI, because DIN is sampled by the ICO clock in this work.

When the input data rate is faster than the clock frequency (Fig. 6(b)), the data transition interval changes from A to D between two consecutive rising edges of CLKOUT[0]. A, B, C and D represent the sequence of unit data intervals (UI) synchronized to the ICO output clocks (CLK[0:3]). In this case, the FF_UP signal is set to ‘1’ and this increases the ICO frequency. When the input data rate is lower than the clock frequency (Fig. 6(c)), the data transition interval changes from D to A and the FF_DN signal is set to ‘1’. The input data pattern is assumed to be ‘0101’ in these two cases. Between two consecutive rising edges of CLKOUT[0], the data transition interval can move only to the adjacent one. This is because the initial ICO frequency at the start of the fine frequency loop operation is within ±50 % of the target frequency and the ICO frequency is always adjusted to approach the target frequency during the fine frequency loop operation.

Table 2. Comparison of fine FDs (5 Gb/s PRBS-7 input data)

| Fine FDs | Conventional RFD | Proposed fine FD (not shared) | Proposed fine FD (shared w/ Alexander PD) |
|---|---|---|---|
| Clock rate | Full-rate | Half-rate | Half-rate |
| Clock time interval between adjacent F/F | 0.25 UI | 0.5 UI | 0.5 UI |
| Circuit complexity | 12 FF + 6 Gates | 24 FF + 14 Gates | 8 FF + 14 Gates (16 FF: shared with Alexander PD) |
| Power | 2.41 mW | 1.74 mW | 0.28 mW |
| Sensitivity to duty cycle of input data | Sensitive | Insensitive | Insensitive |

Fig. 6. Fine FD with DIN of ‘0101’ (a) circuit, (b) operation (ICO freq. < target freq.), (c) operation (ICO freq. > target freq.).
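The A-to-D and D-to-A rule described above can be written as a small decision function. This is a behavioral sketch only: it takes the interval labels as given inputs and does not model how the sampling block derives them from DIN and CLKOUT[0:3]. The function names are hypothetical, and the lock check assumes one detector output per UI.

```python
A, B, C, D = range(4)    # labels for the four data-transition intervals


def fine_fd(prev_interval, curr_interval):
    """Return (FF_UP, FF_DN) for two consecutive CLKOUT[0] periods."""
    if prev_interval == A and curr_interval == D:
        return (1, 0)    # data faster than the clock: raise the ICO frequency
    if prev_interval == D and curr_interval == A:
        return (0, 1)    # data slower than the clock: lower the ICO frequency
    return (0, 0)        # every other case: no frequency correction


def fine_locked(outputs, window=1024):
    """Lock is declared when the last `window` outputs are all (0, 0)."""
    if len(outputs) < window:
        return False
    return all(o == (0, 0) for o in outputs[-window:])


print(fine_fd(A, D), fine_fd(D, A), fine_fd(B, B))
```

Any single FF_UP or FF_DN pulse inside the window restarts the lock count in this model, which mirrors the requirement that both signals stay at ‘0’ for 1024 consecutive UI.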

3. Adaptive BW Tracking ICO

Conventional adaptive-bandwidth CDRs are divided into an analog type and a digital type. The analog type uses parallel charge-pump circuits [22] that are turned on or off by a thermometer code. The digital type uses parallel switches at the VDD side of the DCO [24] that are turned on or off by a thermometer code. Both [22] and [24] are reference-based CDRs; no adaptive-bandwidth scheme had been published for referenceless CDRs. In this work, the adaptive-bandwidth tracking scheme was applied to a referenceless CDR for the first time, by fixing the ratio of the PLL bandwidth (BWPLL) to the input data rate to a constant value. This gives both a good jitter tolerance and a small jitter in the recovered clock throughout a wide range of input data rate.

The ICO is implemented by a 2-stage pseudo-differential inverter-type ring oscillator (Fig. 7) with three separate current sources (IF, IP, and II); IF is the DACF output of the frequency loop, IP and II are the outputs of DACP and DACI associated with the proportional and integral paths of the phase loop (Fig. 2). The ICO output frequency fCLKOUT can be derived as

$$f_{CLKOUT} \approx (I_{F} + I_{I} + I_{P}) \times K_{ICO} \qquad (3)$$

After the fine frequency lock is achieved, fCLKOUT is located within ±0.2 % of the target frequency. IF is then proportional to fCLKOUT, since it is dominant over IP and II. Because fCLKOUT is the same as half the input data rate during the locked state, IF is proportional to the input data rate. BWPLL is proportional to IP as in (1). In this work, the adaptive-bandwidth tracking is achieved by setting IP to be proportional to IF as shown in Fig. 8. IF is generated by a 7-bit DAC (DACF) with the digital input code of DF (Fig. 8(a)). II is generated similarly by DACI with the digital input code of DI. IP is generated by sharing the analog voltage VF’ from the IF generation circuit (Fig. 8(b)). A 2-bit digital code (E, L) is used to generate IP: IP = 0, IP0 and 2IP0 when (E, L) = (‘1’, ‘0’), (‘0’, ‘0’) and (‘0’, ‘1’), respectively. In this work, IP0 = IF/80. The DAC gains (KP, KI, KF of Fig. 2) include the gains of the current mirror circuits of Fig. 8. Compared to the analog-type CDR [22], the proposed adaptive-bandwidth CDR has a smaller phase noise and a smaller area. Compared to the conventional digital-type CDR [24], it has a better immunity to VDD noise, a smaller area, and a better linearity.
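The proportionality argument of this subsection can be written out as a short chain, using (1) and (3) and the relation IP0 = IF/80. The constant c is a hypothetical symbol introduced here for the proportionality factor in (1); it is not defined in the paper, and the chain is a simplified reading that neglects IP and II next to IF.

```latex
\begin{aligned}
f_{CLKOUT} &\approx I_{F} \times K_{ICO}
  && \text{from (3), with } I_{F} \gg I_{P},\, I_{I} \\
BW_{PLL} &= c \times I_{P0} \times K_{ICO}
  && \text{from (1), with } c \text{ constant} \\
 &= c \times \frac{I_{F}}{80} \times K_{ICO}
  \approx \frac{c}{80}\, f_{CLKOUT}
  && \text{using } I_{P0} = I_{F}/80
\end{aligned}
```

Since fCLKOUT equals half the input data rate in lock, the ratio of BWPLL to the data rate is approximately c/160 under these assumptions, independent of the data rate.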

Fig. 7. Current controlled oscillator (ICO).

Fig. 8. Generation of ICO current (a) IF, (b) IP.

IV. MEASUREMENT RESULTS

The proposed adaptive-bandwidth referenceless CDR was fabricated in a 65-nm standard CMOS process (Fig. 9) and mounted in an 80-pin QFN package. The chip area is 0.17 mm² excluding the input and output buffers.

The recovered half-rate data and clock (Fig. 10) were measured at a supply voltage of 1.2 V for data rates from 2 Gb/s to 8.4 Gb/s. The CDR chip works at data rates up to 11.2 Gb/s at a supply voltage of 1.5 V. It consumes 26 mW at 5 Gb/s and 1.2 V.

The jitter of the recovered clock was affected by the supply voltage and the data rate (Fig. 11). The rms and peak-to-peak jitters of the recovered clock were 5.0 ps and 41.1 ps (Fig. 11(a)) at 5 Gb/s and 1.2 V for 2^7-1 PRBS data. The rms jitter was reduced as the data rate was increased (Fig. 11(b)), where 2^7-1 PRBS data were used. This is because the update period of the phase detector output is reduced as the data rate is increased. The rms jitter increased as the maximum run-length of data was increased (Fig. 11(c)), where the maximum run-length is N for the 2^N-1 PRBS data.

Fig. 9. (a) Layout, (b) chip photograph.

Fig. 10. Recovered clock and data (measurements) at 2 Gb/s (1.2 V), 5 Gb/s (1.2 V), 8 Gb/s (1.2 V), and 10 Gb/s (1.5 V).

Fig. 11. Measured jitter of recovered clock (a) jitter histogram, (b) rms jitter versus data rate, (c) rms jitter versus maximum run-length of data.

The frequency offset of the recovered clock (Fig. 12) from the target frequency (half the data rate) was measured not to exceed 1000 ppm after the FLL was locked. The offset is smaller than the design target of ±2000 ppm. In this measurement, 2^7-1 PRBS data were used at a supply voltage of 1.5 V.

The frequency spectrum of the recovered clock was measured for 5-Gb/s 2^7-1 PRBS data at 1.2 V (Fig. 13). The phase noise at a 1-MHz offset was -97.76 dBc/Hz, and the jitter obtained by integrating the phase noise was 4.21 ps (Fig. 13(a)). The reference spur was -32.4 dBc (Fig. 13(b)).

The measured jitter tolerance curve satisfies the USB 3.0 spec at 5 Gb/s and the USB 3.1 spec at 10 Gb/s (Fig. 14). The measured corner frequency was 6 MHz at 5 Gb/s, 9 MHz at 8 Gb/s and 16 MHz at 10 Gb/s. It is almost proportional to the data rate and is approximately the same as the PLL bandwidth, which verifies the adaptive-bandwidth tracking operation.
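The proportionality claim can be checked by dividing each measured corner frequency by its data rate. The sketch below uses only the three measured points quoted above; the variable and function names are hypothetical.

```python
# Measured points from the text: data rate (Gb/s) -> corner frequency (MHz).
measured = {5: 6, 8: 9, 10: 16}


def corner_ratio_per_mille(rate_gbps, corner_mhz):
    """Corner frequency divided by data rate, in units of 1/1000."""
    return corner_mhz / rate_gbps          # MHz / (Gb/s) = 1e-3


for rate, corner in measured.items():
    ratio = corner_ratio_per_mille(rate, corner)
    print(f"{rate} Gb/s: corner / data rate = {ratio:.3f} x 1/1000")
```

All three ratios stay close to the ~1/1000 design value, with the 10 Gb/s point (measured at 1.5 V) sitting somewhat higher than the two 1.2-V points.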

The proposed CDR was compared with recently published referenceless CDRs (Table 3). This work achieves an FOM of 4.1 mW/Gb/s.

Fig. 12. Measured frequency offset of recovered clock.

Fig. 13. Measured frequency spectrum of recovered clock at 5 Gb/s: (a) phase noise, (b) reference spur.

Fig. 14. Measured jitter tolerance.

V. CONCLUSIONS

A low-power small-area FD is proposed for an adaptive-bandwidth referenceless CDR. The FD consists of a coarse FD and a fine FD. The coarse FD, implemented with two flip-flops, eliminates the harmonic locking problem by extending the frequency lock range to cover 0% to 100% of the target frequency. The fine FD saves power and area by sharing the sampler and re-timer circuitry with an Alexander PD in the PLL. The coarse and fine FDs adjust the ICO frequency only when the data pattern ‘0101’ is detected in the incoming data, so frequency locking does not depend on the maximum run-length or the transition density of the data. Because the coarse and fine FDs respond only to the rising edges of the incoming data, the proposed algorithm is insensitive to the duty cycle of the incoming data.

To maintain good jitter tolerance over a wide range of input data rates, the adaptive-bandwidth scheme keeps the ratio of the PLL bandwidth to the input data rate constant. To achieve this, the proportional-path gain of the PLL is kept proportional to the input data rate by setting the DAC gain of the proportional path proportional to the DAC input code of the FLL.

The proposed adaptive-bandwidth referenceless CDR was implemented in a 65-nm standard CMOS process. The CDR worked for input data rates from 2 Gb/s to 8 Gb/s at a supply voltage of 1.2 V, and from 4 Gb/s to 11 Gb/s at a supply voltage of 1.5 V. The power consumption of the proposed CDR was 26 mW at 5 Gb/s and 1.2 V. The rms jitter of the recovered clock was 5.0 ps at 5 Gb/s and 1.2 V. The phase noise of the recovered clock was -97.76 dBc/Hz at the 1-MHz frequency offset from the center frequency of 2.5 GHz (5 Gb/s). Because of the adaptive-bandwidth scheme used in this work, the proposed CDR satisfies the jitter tolerance specifications of USB 3.0 and USB 3.1 at 5 Gb/s and 10 Gb/s, respectively.
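The adaptive-bandwidth idea summarized above can be illustrated with a minimal behavioral sketch. It assumes a linear ICO whose frequency is set by the FLL DAC code, and a proportional-path DAC whose gain is scaled by that same code. All constants (`K_ICO_HZ_PER_UA`, `I_LSB_UA`, `BETA`) and function names are hypothetical and do not come from the paper.

```python
# Hypothetical behavioral constants (not taken from the paper).
K_ICO_HZ_PER_UA = 10e6   # assumed ICO gain [Hz per microampere]
I_LSB_UA = 1.0           # assumed LSB current of the FLL DAC [uA]
BETA = 0.002             # assumed proportional-path gain relative to the FLL DAC


def ico_frequency_hz(fll_code):
    """ICO frequency set by the FLL DAC, assuming a linear ICO."""
    return K_ICO_HZ_PER_UA * I_LSB_UA * fll_code


def proportional_step_hz(fll_code):
    """Frequency step of one bang-bang decision.

    The proportional-path DAC gain is scaled by the FLL code,
    so the step grows with the locked frequency.
    """
    return K_ICO_HZ_PER_UA * (BETA * I_LSB_UA * fll_code)


def relative_step(fll_code):
    """Proportional step normalized to the ICO frequency."""
    return proportional_step_hz(fll_code) / ico_frequency_hz(fll_code)


for code in (100, 250, 500):
    print(
        f"code {code}: f_ICO = {ico_frequency_hz(code) / 1e9:.2f} GHz, "
        f"step = {proportional_step_hz(code) / 1e6:.1f} MHz, "
        f"step/f_ICO = {relative_step(code):.4f}"
    )
```

In this simplified model the normalized step is the same at every FLL code, which is the property that keeps the loop bandwidth at a fixed fraction of the data rate.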

ACKNOWLEDGMENTS

This work was supported by the National Research Foundation of the MSIP Korea under the contract numbers of 2014-048650 and 2014-052875, and the ITRC support program (NIPA-2014-H0301-14-1007) supervised by the NIPA Korea, and IDEC.

REFERENCES

[1] Pavan Kumar Hanumolu, Gu-Yeon Wei, and Un-Ku Moon, “A Wide-Tracking Range Clock and Data Recovery Circuit,” IEEE J. Solid-State Circuits, vol. 43, no. 2, pp. 268-278, Feb. 2008.

[2] Arnoud P. van der Wel and Gerrit W. den Besten, “A 1.2–6 Gb/s, 4.2 pJ/Bit Clock & Data Recovery Circuit With High Jitter Tolerance in 0.14 μm CMOS,” IEEE J. Solid-State Circuits, vol. 47, no. 7, pp. 1768-1775, Jul. 2012.

[3] Fan-Ta Chen, Min-Sheng Kao, Yu-Hao Hsu, Chih-Hsing Lin, Jen-Ming Wu, Ching-Te Chiu, and Shuo-Hung Hsu, “A 10 to 11.5GHz Rotational Phase and Frequency Detector for Clock Recovery Circuit,” in Proc. IEEE International Symposium on Circuits and Systems (ISCAS), pp. 185–188, May 2011.

[4] Jri Lee and Ke-Chung Wu, “A 20-Gb/s Full-Rate Linear Clock and Data Recovery Circuit With Automatic Frequency Acquisition,” IEEE J. Solid-State Circuits, vol. 44, no. 12, pp. 3590-3602, Dec. 2009.

[5] Namik Kocaman, Siavash Fallahi, Mahyar Kargar, Mehdi Khanpour, Ali Nazemi, Ullas Singh, and Afshin Momtaz, “An 8.5–11.5-Gbps SONET Transceiver With Referenceless Frequency Acquisition,” IEEE J. Solid-State Circuits, vol. 48, no. 8, pp. 1875-1884, Aug. 2013.

Table 3. Performance comparison of referenceless CDRs

| CDR | ISSCC09 [14] | JSSC13 [22] | JSSC11 [10] | This work |
|---|---|---|---|---|
| Technology | 65 nm | 0.18 μm | 0.13 μm | 65 nm |
| Data rate [Gb/s] | 0.65-8 | 4.6-5.3 / 9.2-10.6 | 0.5-2.5 | 2-11 |
| FLL lock range (UI) | [1, ∞] | [0.85, 1.15] | [0, ∞] | [0, 1.5] |
| Architecture | Quarter-rate analog DLL | Full-rate analog PLL | Half-rate digital PLL | Half-rate digital PLL |
| Supply [V] | 1.2 | 1.8 | 0.8 / 1.2 | 1.2 / 1.5 |
| Jitter [ps rms / ps pp @ Gb/s] | 9.7 / 53.3 | 1.04 / 7.5 | 5.4 / 44 | 5.0 / 41 @ 5; 4.2 / 31 @ 10 |
| Power [mW @ Gb/s] | 20.6 @ 0.65; 88.6 @ 8 | 110.6 @ 10 | 6.1 @ 2 | 26 @ 5; 41 @ 10 |
| FOM [mW/Gb/s] | 31.7; 11.1 | 11 | 3.05 | 5.2; 4.1 |

[6] Junyoung Song, Inhwa Jung, Minyoung Song, Young-Ho Kwak, Sewook Hwang, and Chulwoo Kim, “A 1.62 Gb/s–2.7 Gb/s Referenceless Transceiver for DisplayPort v1.1a With Weighted Phase and Frequency Detection,” IEEE Trans. on Circuits and Systems I: Regular Papers, vol. 60, no. 2, pp. 268-278, Feb. 2013.

[7] R.-J. Yang, S.-P. Chen, and S.-I. Liu, “A 3.125-Gb/s Clock and Data Recovery Circuit for the 10-Gbase-LX4 Ethernet,” IEEE J. Solid-State Circuits, vol. 39, no. 8, pp. 1356–1360, Aug. 2004.

[8] Rong-Jyi Yang, Kuan-Hua Chao, Sy-Chyuan Hwu, Chuan-Kang Liang, and Shen-Iuan Liu, “A 155.52 Mbps–3.125 Gbps Continuous-Rate Clock and Data Recovery Circuit,” IEEE J. Solid-State Circuits, vol. 41, no. 6, pp. 1380-1390, Jun. 2006.

[9] M.-S. Hwang, S.-Y. Lee, J.-K. Kim, S. Kim, and D.-K. Jeong, “A 180-Mb/s to 3.2-Gb/s, continuous-rate, fast-locking CDR without using external reference clock,” in Proc. IEEE Asian Solid-State Circuits Conf., pp. 144–147, Nov. 2007.

[10] Rajesh Inti, Wenjing Yin, Amr Elshazly, Naga Sasidhar, and Pavan Kumar Hanumolu, “A 0.5-to-2.5 Gb/s Reference-Less Half-Rate Digital CDR With Unlimited Frequency Acquisition Range and Improved Input Duty-Cycle Error Tolerance,” IEEE J. Solid-State Circuits, vol. 46, no. 12, pp. 3150-3162, Dec. 2011.

[11] Jinho Han, Jaehyeok Yang, and Hyeon-Min Bae, “Analysis of a Frequency Acquisition Technique With a Stochastic Reference Clock Generator,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 59, no. 6, pp. 336-340, Jun. 2012.

[12] Jinho Han, Hyosup Won, and Hyeon-Min Bae, “0.6–2.7-Gb/s Referenceless Parallel CDR With a Stochastic Dispersion-Tolerant Frequency Acquisition Technique,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, no. 6, Jun. 2014.

[13] Guanghua Shu, Woo-Seok Choi, Saurabh Saxena, Tejasvi Anand, Amr Elshazly, and Pavan Kumar Hanumolu, “A 4-to-10.5Gb/s 2.2mW/Gb/s Continuous-Rate Digital CDR with Automatic Frequency Acquisition in 65nm CMOS,” in IEEE ISSCC Dig. Tech. Papers, pp. 150-152, Feb. 2014.

[14] S.-K. Lee, Y.-S. Kim, H. Ha, Y. Seo, H.-J. Park, and J.-Y. Sim, “A 650Mb/s-to-8Gb/s referenceless CDR circuit with automatic acquisition of data rate,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pp. 184–185, Feb. 2009.

[15] Shao-Hung Lin and Shen-Iuan Liu, “Full-Rate Bang-Bang Phase/Frequency Detectors for Unilateral Continuous-Rate CDRs,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 55, no. 12, Dec. 2008.

[16] Chang-Lin Hsieh and Shen-Iuan Liu, “A 1–16-Gb/s Wide-Range Clock/Data Recovery Circuit With a Bidirectional Frequency Detector,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 58, no. 8, Aug. 2011.

[17] David G. Messerschmitt, “Frequency Detectors for PLL Acquisition in Timing and Carrier Recovery,” IEEE Transactions on Communications, vol. COM-27, no. 9, Sep. 1979.

[18] B. Razavi, “Frequency Detectors for PLL Acquisition in Timing and Carrier Recovery,” Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design, 1996, pp. 107-114.

[19] Amr Elshazly, Rajesh Inti, Wenjing Yin, Brian Young, and Pavan Kumar Hanumolu, “A 0.4-to-3 GHz Digital PLL With PVT Insensitive Supply Noise Cancellation Using Deterministic Background Calibration,” IEEE J. Solid-State Circuits, vol. 46, no. 12, Dec. 2011.

[20] Mrunmay Talegaonkar, Rajesh Inti, and Pavan Kumar Hanumolu, “Digital Clock and Data Recovery Circuit Design: Challenges and Tradeoffs,” Custom Integrated Circuits Conference (CICC), 2011 IEEE, 2011, pp. 1–8.

[21] Pyung-Su Han and Woo-Young Choi, “1 Gb/s gated-oscillator burst mode CDR for half-rate clock recovery,” IEEE J. Semiconductor Technology and Science, vol. 4, no. 4, Dec. 2004.

[22] Hyung-Joon Jeon, Raghavendra Kulkarni, Yung-Chung Lo, Jusung Kim, and Jose Silva-Martinez, “A Bang-Bang Clock and Data Recovery Using Mixed Mode Adaptive Loop Gain Strategy,” IEEE J. Solid-State Circuits, vol. 48, no. 6, Jun. 2013.

[23] D. Dalton, K. Chai, E. Evans, M. Ferriss, D. Hitchcox, P. Murray, S. Selvanayagam, P. Shepherd, and L. Devito, “A 12.5-Mb/s to 2.7-Gb/s continuous-rate CDR with automatic frequency acquisition and data-rate read back,” IEEE J. Solid-State Circuits, vol. 40, no. 12, pp. 2713–2725, Dec. 2005.

[24] Heesoo Song, Deok-Soo Kim, Do-Hwan Oh, Suhwan Kim, and Deog-Kyoon Jeong, “A 1.0–4.0-Gb/s All-Digital CDR With 1.0-ps Period Resolution DCO and Adaptive Proportional Gain Control,” IEEE J. Solid-State Circuits, vol. 46, no. 2, Feb. 2011.

Hye-Jung Kwon was born in Pohang, Korea, in 1985. She received the B.S. (2007) and the M.S. and Ph.D. (2014) degrees from the Department of Electronic and Electrical Engineering, Pohang University of Science and Technology (POSTECH), Gyeongbuk, Korea. She is currently working at Samsung Electronics, Korea. Her research interests include PLL/CDR circuits, PLL/CDR behavioral simulators, and on-chip PVT variation monitoring.

Ji-Hoon Lim was born in Seoul, Korea, in 1989. He received the B.S. degree from the Department of Electronic and Electrical Engineering, Pohang University of Science and Technology (POSTECH), Korea, in 2011. He is currently pursuing the M.S. and Ph.D. degrees in the Department of Electronic and Electrical Engineering at POSTECH. His interests include data converters, clock and data recovery, high-speed interface circuits, and ultra-low-voltage analog circuits.

Byungsub Kim received the B.S. degree in Electronic and Electrical Engineering (EEE) from Pohang University of Science and Technology (POSTECH), Pohang, Korea, in 2000, and the M.S. (2004) and Ph.D. (2010) degrees in Electrical Engineering and Computer Science (EECS) from Massachusetts Institute of Technology (MIT), Cambridge, USA. From 2010 to 2011, he worked as an analog design engineer at Intel Corporation, Hillsboro, OR, USA. In 2012, he joined the faculty of the Department of Electronic and Electrical Engineering at POSTECH, where he is currently working as an assistant professor. He has received several awards. In 2011, Dr. Kim received the MIT EECS Jin-Au Kong Outstanding Doctoral Thesis Honorable Mention and the IEEE 2009 Journal of Solid-State Circuits Best Paper Award. In 2009, he received the Analog Devices Inc. Outstanding Student Designer Award from MIT, and was also a co-recipient of the Beatrice Winner Award for Editorial Excellence at the 2009 IEEE International Solid-State Circuits Conference.

Jae-Yoon Sim received the B.S., M.S., and Ph.D. degrees in Electronic and Electrical Engineering from Pohang University of Science and Technology (POSTECH), Korea, in 1993, 1995, and 1999, respectively. From 1999 to 2005, he worked as a senior engineer at Samsung Electronics, Korea. From 2003 to 2005, he was a post-doctoral researcher with the University of Southern California, Los Angeles. From 2011 to 2012, he was a visiting scholar with the University of Michigan, Ann Arbor. In 2005, he joined POSTECH, where he is currently an Associate Professor. He has served on the Technical Program Committees of the International Solid-State Circuits Conference (ISSCC), the Symposium on VLSI Circuits, and the Asian Solid-State Circuits Conference. He is a co-recipient of the Takuo Sugano Award at ISSCC 2001. His research interests include high-speed serial/parallel links, PLLs, data converters, and power modules for plasma generation.


Hong-June Park received the B.S. degree from the Department of Electronic Engineering, Seoul National University, Seoul, Korea, in 1979, the M.S. degree from the Korea Advanced Institute of Science and Technology, Taejon, in 1981, and the Ph.D. degree from the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, in 1989. He was a CAD engineer with ETRI, Korea, from 1981 to 1984 and a Senior Engineer in the TCAD Department of INTEL from 1989 to 1991. In 1991, he joined the Faculty of Electronic and Electrical Engineering, Pohang University of Science and Technology (POSTECH), Gyeongbuk, Korea, where he is currently a Professor. His research interests include CMOS analog circuit design such as high-speed interface circuits, ROIC of touch sensors, and analog/digital beamformer circuits for ultrasound medical imaging. Prof. Park is a senior member of IEEE and a member of IEEK. He served as the Editor-in-Chief of the Journal of Semiconductor Technology and Science, an SCIE journal (http://www.jsts.org), from 2009 to 2012, as the Vice President of IEEK in 2012, and as a technical program committee member of ISSCC, SOVC, and A-SSCC for several years.