632 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 5, MAY 1999

A Portable Digital DLL for High-Speed CMOS Interface Circuits

Bruno W. Garlepp, Kevin S. Donnelly, Associate Member, IEEE, Jun Kim, Pak S. Chau, Jared L. Zerbe, Charles Huang, Chanh V. Tran, Clemenz L. Portmann, Member, IEEE, Donald Stark, Yiu-Fai Chan, Member, IEEE, Thomas H. Lee, Member, IEEE, and Mark A. Horowitz

Abstract: A digital delay-locked loop (DLL) that achieves infinite phase range and 40-ps worst case phase resolution at 400 MHz was developed in a 3.3-V, 0.4-µm standard CMOS process. The DLL uses dual delay lines with an end-of-cycle detector, phase blenders, and duty-cycle correcting multiplexers. This more easily process-portable DLL achieves jitter performance comparable to a more complex analog DLL when placed into identical high-speed interface circuits fabricated on the same test-chip die. At 400 MHz, the digital DLL provides 250 ps peak-to-peak long-term jitter at 3.3 V and operates down to 1.7 V, where it dissipates 60 mW. The DLL occupies 0.96 mm².

Index Terms: Delay circuits, delay-locked loops (DLLs), digital control, digital DLL, phase blending, phase control, phase synchronization.

I. INTRODUCTION

IN RECENT years, there has been a great deal of interest in delay-locked loops (DLLs) for clock alignment. Both analog and digital DLLs have been developed [1]–[6], with analog loops generally providing better jitter performance at the expense of greater complexity. This paper describes a digital DLL that achieves jitter performance comparable to an analog DLL. Although the digital DLL uses more area and power than the analog DLL, its greater simplicity, easier portability, and lower minimum required supply voltage make it very attractive in many clock alignment applications. Additionally, the digital DLL not only operates at lower supply voltages than the analog DLL but also demonstrates that digital DLLs have the potential for good power-consumption scaling as supply voltage is decreased.

The motivation for the development of this digital DLL was the need for a clock alignment circuit for use in the CMOS interface cells [6] of a high-speed memory system as in [7].¹ The memory system operates at 400 MHz, with data transferred on both edges of the clock, producing an effective 800-Mb/s/pin transfer rate. This corresponds to a 1.25-ns bit time. With such tight timing requirements, it becomes imperative to include clock alignment circuits in the interface cells to provide internal on-chip clocks that are aligned in phase with an external system clock. The clock alignment circuits must provide a phase resolution better than 50 ps and produce a worst case long-term jitter of less than 250 ps peak-to-peak (pp). To facilitate the use of many different application-specific integrated-circuit controllers with the memory system, the clock alignment circuit should be easily portable across multiple processes without compromising performance.

Manuscript received September 15, 1998; revised December 23, 1998.

B. W. Garlepp, K. S. Donnelly, J. Kim, P. S. Chau, J. L. Zerbe, C. Huang, C. V. Tran, C. L. Portmann, D. Stark, and Y.-F. Chan are with Rambus, Inc., Mountain View, CA 94040 USA.

T. H. Lee and M. A. Horowitz are with the Center for Integrated Systems, Stanford University, Stanford, CA 94305 USA.

Publisher Item Identifier S 0018-9200(99)03668-9.

¹ Documentation is available at http://www.rambus.com/html/direct_documentation.html.

The clock alignment function can be provided using either phase-locked loops (PLLs) or DLLs. Because frequency synthesis is not needed in this application, DLLs are preferred for their unconditional stability, lower phase-error accumulation, and faster locking time. In previous designs of the interface cells for this memory system, we have used an analog DLL with a two-step coarse/fine architecture. A high-level drawing of this approach is shown in Fig. 1. This analog DLL includes a quadrature generator, which produces four reference signals spaced 90° apart in phase to evenly cover the full 360° of phase space. A phase interpolator circuit in the analog DLL receives these reference signals and selects a phase-adjacent pair that defines a phase quadrant for interpolation to produce an output signal phase-aligned to a reference signal, RefClk.

Analog DLLs constructed with this approach provide several significant benefits. Because most of the elements in the signal path can be made from differential analog blocks with good power-supply rejection ratio (PSRR), the analog DLL architecture of Fig. 1 can provide very good jitter performance. Additionally, it can be carefully designed to occupy relatively little area and consume relatively little current. Furthermore, the analog DLL can provide very small phase steps when locked (50 ps). Finally, the architecture of Fig. 1 provides infinite phase range, and one set of quadrature reference signals can be fed to multiple phase interpolators, allowing phase alignment to multiple reference signals simultaneously. However, because of the relatively high analog complexity of this DLL and its individual elements, the analog DLL of Fig. 1 requires a detailed, process-specific implementation, making it relatively labor intensive to port across multiple processes.

Although we have traditionally used analog DLLs to provide the clock alignment function in the CMOS interface cells of the memory system described above, we decided to consider using a digital DLL. Digital DLLs are characterized by their use of a digital delay line and are typically made from simple, digital circuit elements. This facilitates their design and portability across multiple processes. Additionally, because phase information in a digital DLL is stored as a digital state, digital DLLs can provide very fast timing recovery after being placed into a low power mode. However, conventional digital DLLs provide only moderate phase resolution and jitter performance [8], [9].

Fig. 1. Block diagram of a two-step, coarse/fine analog DLL architecture.

0018-9200/99$10.00 © 1999 IEEE

Another benefit of digital DLLs is their ability to readily operate at lower voltages than analog DLLs. Because analog DLLs require the use of saturated current sources, they experience voltage headroom problems as supply voltages decrease. Digital DLLs, on the other hand, need only enough voltage to ensure the proper operation of their digital gate elements. For the same reason, digital DLLs better utilize the power-saving benefits of digital CMOS voltage scaling than analog DLLs. The power of an analog DLL is typically distributed between IV power (where I is current and V is voltage) from the constant current (differential) stages and CV²f power (where C is capacitance and f is frequency) from the CMOS (single-ended) stages (if any). The power of digital DLLs, on the other hand, is determined primarily by CV²f power, which decreases quadratically with supply voltage.
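The quadratic dependence can be illustrated with a minimal sketch. It assumes purely dynamic CV²f power with C and f held fixed, and ignores leakage and short-circuit current; the function name is illustrative and not taken from the paper.

```python
def dynamic_power_ratio(v_new: float, v_ref: float) -> float:
    """Ratio of CV^2f power at v_new to the power at v_ref (C and f unchanged)."""
    return (v_new / v_ref) ** 2


# Assumed example: lowering the supply from 3.3 V to 1.7 V at constant frequency.
ratio = dynamic_power_ratio(1.7, 3.3)
print(f"Relative dynamic power: {ratio:.3f}")
```

Under these assumptions the dynamic power at 1.7 V is roughly a quarter of its 3.3-V value, whereas an IV-dominated stage with a fixed bias current would scale only linearly with the supply.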

This paper describes a digital DLL [10] used as the clock alignment circuit in the CMOS interface cells of a high-speed memory system. This work improves upon the performance of previous digital DLLs by paralleling the two-step coarse/fine analog DLL architectures presented in [4], [5], [7], and [11], allowing the digital DLL to achieve jitter performance comparable to the analog DLLs.

This paper is arranged as follows. Section II describes delay-generation techniques used in conventional digital DLLs and describes the improved techniques implemented in the new DLL. This section also describes infinite phase generation with the new delay-line scheme. Section III describes several new circuit techniques used for enhancing the phase resolution and signal quality in the new digital DLL. Section IV describes the overall DLL architecture. Section V discusses our test chip and measured results, with special attention given to making a direct, side-by-side comparison of the new digital DLL with an analog DLL placed into identical CMOS interface cells on the same test-chip die. Section VI concludes this paper.

The terms phase and delay are used throughout this paper to describe the DLL's operation. It is helpful to recall that at a given system frequency, the two quantities are related by the simple equation

$$\phi = 360^\circ \cdot t_d \cdot f \tag{1}$$

where $\phi$ is phase in degrees, $t_d$ is delay in seconds, and $f$ is frequency in hertz.
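As a quick illustration of this phase-delay relationship, the sketch below converts in both directions. The function names are illustrative; the 50-ps and 400-MHz figures are the resolution target and system frequency quoted in the Introduction.

```python
def delay_to_phase_deg(delay_s: float, freq_hz: float) -> float:
    """Phase in degrees corresponding to a delay at a given frequency."""
    return 360.0 * delay_s * freq_hz


def phase_deg_to_delay(phase_deg: float, freq_hz: float) -> float:
    """Delay in seconds corresponding to a phase in degrees at a given frequency."""
    return phase_deg / (360.0 * freq_hz)


f_sys = 400e6  # system clock frequency in hertz
print(delay_to_phase_deg(50e-12, f_sys))   # a 50-ps step expressed in degrees
print(phase_deg_to_delay(180.0, f_sys))    # half a cycle expressed in seconds
```

At 400 MHz a 50-ps step is about 7.2° of phase, and 180° corresponds to the 1.25-ns bit time.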

II. DIGITAL DELAY CIRCUIT TECHNIQUES

A. Conventional Digital Delay Lines

As mentioned above, the purpose of a DLL in a clock alignment application is to provide an output clock signal that is aligned in phase with a reference clock signal of the same frequency. To do this, the DLL must include a mechanism for providing a variable delay to an input signal. The DLL then adjusts this variable delay such that the input signal passes through the delay mechanism and emerges at the output of the DLL aligned in phase with the reference signal.

Digital DLLs generally incorporate a tapped digital delay line as the variable-delay mechanism. The delay line receives an input clock signal (e.g., a buffered version of the reference signal) and passes it through a series of delay elements. The outputs of the delay elements are tapped and buffered to provide a series of phase-adjacent signals. The DLL then selects the delay-line tap that provides the signal that produces an output with a phase that most closely matches the desired phase.

A conventional delay line suitable for a CMOS digital DLL is shown in Fig. 2. The delay elements could be implemented with almost any circuit block, but because the phase resolution of the delay line is determined by the delay through the delay elements, delay elements that provide minimal delay are generally preferred. Thus, the delay line of Fig. 2 uses inverters, since they provide the shortest delay of any CMOS digital gate. Because of the inverting characteristic of all standard CMOS gates, the delay line is tapped only at every other inverter output to ensure that each successive tap provides a signal that is adjacent in phase to the signals at its adjacent taps.

Fig. 2. Conventional digital delay line with inverter delay elements.

Fig. 3. Complementary delay line with inverter delay elements for improved phase resolution.

Although conventional delay lines are attractive for their simplicity, DLLs designed around such conventional delay lines suffer from several significant limitations. First, the delay line provides fairly coarse phase resolution. For example, the delay line in Fig. 2 provides a minimum phase step corresponding to two inverter delays. Such coarse phase resolution is not fine enough for our clock alignment application. Second, conventional delay lines deliver only a finite phase range. Typically, in order to cover at least one full cycle of phase, the delay-line length and element delays are adjusted to provide at least 360° of phase under the fastest process, voltage, and temperature (PVT) conditions and minimum operating frequency. More often, however, the delay line is designed with as much as 720° (i.e., two cycles) of phase under these conditions. This requires the use of a long delay line, occupying a large silicon area and dissipating additional power as the input signal propagates through the many delay elements. Additionally, because inverters offer poor PSRR, voltage supply noise-induced jitter can accumulate as the signal propagates down the delay line. This causes the signals available from the later taps in the delay line to be more jitter prone than the signals from the earlier taps. Last, even with an extended delay line, the DLL can nonetheless run out of phase range and lose lock in a system with slowly drifting phase (e.g., spread-spectrum clocking). These limitations prohibited the use of a conventional delay line in our DLL design.

B. Delay-Line Improvements

To overcome some of these limitations, we developed a complementary delay line as shown in Fig. 3 for our DLL. In this architecture, two parallel delay lines with weak cross coupling are driven by complementary input signals ClkIn and ClkInb. Because of the use of complementary inputs, the two delay lines are tapped after every inverter to provide phase-adjacent signals separated by only one inverter delay, thereby improving the phase resolution by a factor of two. An example of how this delay-line scheme provides single inverter delay resolution is shown by the shaded paths in Fig. 3. The signal that emerges from Tap 2 has passed through three inverter delays, while the signal that emerges from Tap 3 has passed through four inverter delays. However, ClkInb is exactly 180° out of phase with ClkIn, providing the additional inversion required to ensure that the signals emerging from Taps 2 and 3 are indeed separated in phase by exactly one inverter delay.

This complementary delay-line architecture also allows the delay lines to be made shorter. The true taps from the delay line can provide the first 180° of phase, while the complement taps can provide the second 180° of phase. Thus, each of the two delay lines can be tuned for only 180° of phase under the fastest PVT conditions and minimum operating frequency. Shorter delay lines provide the additional benefits of reduced maximum jitter accumulation, smaller silicon area, and lower power consumption. The problem that this design creates is a need to determine when to switch from the true taps to the complement taps and vice versa to ensure full and even coverage of the entire 360° phase plane. This is particularly important because the number of delay elements (and output taps) needed to cover 180° changes with PVT conditions and operating frequency.
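The length savings can be estimated with a short sketch. It assumes 400 MHz as a stand-in for the minimum operating frequency and uses the 100-ps fast-corner inverter delay quoted later in the paper for the 0.4-µm process; both choices, and the helper name, are illustrative assumptions rather than the authors' design procedure.

```python
def stages_needed(phase_deg: int, period_ps: int, inverter_delay_ps: int) -> int:
    """Inverters needed to span phase_deg of one clock period, rounded up."""
    numerator = phase_deg * period_ps
    denominator = 360 * inverter_delay_ps
    return -(-numerator // denominator)  # integer ceiling division


period_ps = 2500         # 400 MHz, assumed here to be the minimum frequency
fast_inverter_ps = 100   # assumed fastest inverter delay

# Conventional line: one line must span 360 degrees, tapped every two inverters.
conventional = stages_needed(360, period_ps, fast_inverter_ps)
# Complementary lines: each line spans only 180 degrees, tapped every inverter.
per_line = stages_needed(180, period_ps, fast_inverter_ps)

print(conventional, 2 * fast_inverter_ps)  # inverters in the path, step in ps
print(per_line, fast_inverter_ps)          # inverters per line, step in ps
```

With these numbers the longest signal path shrinks from 25 inverters to 13 while the coarse step halves from 200 ps to 100 ps, which is the source of the reduced jitter accumulation described above.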

C. Infinite Phase Generation

To solve the problem of determining when to switch between the true and complement taps of the complementary delay line, we developed an end-of-cycle (EOC) detector, as shown in Fig. 4, for use with the complementary delay line. An EOC detector is essentially a bank of data flip-flops arranged as a time-to-digital converter for measuring the delay through the delay line. The EOC detector produces a thermometer code indicating the first 180° of delay in the delay lines. The first state transition in the EOC code indicates the first true tap from the delay line that provides a signal with phase that lags the phase of the signal from Tap 1 by more than 180°. With this information, the DLL logic knows when to switch between the true and complement taps of the delay line to ensure full coverage of all 360° of phase space, with phase steps of at most one inverter delay. Use of the EOC code also prevents negative phase steps in the phase-transfer function as taps are successively selected from the delay line. This allows the complementary delay lines to provide infinite, monotonic phase range for the DLL. The clocking signal for the EOC detector, SampClk, is synchronized to the signal from Tap 1 by a replica timing network (not shown).

Fig. 4. EOC detector circuit (180°).

Fig. 5. Phasor diagram with phasors of signals from the taps of a complementary delay line with one inverter delay equal to 50°.

To illustrate the principle of infinite phase generation using the EOC code with this delay-line scheme, refer to Fig. 5, which shows a phasor diagram of the signals from the first five true and complement taps of a complementary delay line like the one shown in Fig. 3. The figure assumes that the PVT conditions and operating frequency are such that the propagation delay of each inverter stage is equal to 50° of phase. In the figure, the solid lines correspond to signals from the true taps, while dashed lines correspond to signals from the complement taps. Because Tap 5 delivers a signal that is delayed by 200° from the signal at Tap 1, the EOC detector's thermometer code would indicate that Tap 5 is the first true tap to provide a signal with phase beyond 180° relative to the signal from Tap 1. With this information, the DLL knows to switch between the true and complement taps after four stages. In other words, to travel counterclockwise around the phase plane, the DLL would successively select Taps 1–4, then Taps 1b–4b, then Taps 1–4, etc., to provide infinite phase range. In this manner, all phase steps are equivalent to at most one inverter delay (i.e., 50°), except for the Tap 4 to Tap 1b and the Tap 4b to Tap 1 transitions, which are less (30°).
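The Fig. 5 example can be reproduced with a small behavioral sketch. The bit polarity of the thermometer code and the function names are assumptions made for illustration; the paper does not specify the flip-flop encoding.

```python
def eoc_code(n_stages: int, stage_deg: int) -> list:
    """Thermometer code: bit i is 1 once true tap i+1 lags Tap 1 by more than 180 degrees."""
    return [1 if i * stage_deg > 180 else 0 for i in range(n_stages)]


def phase_walk(n_stages: int, stage_deg: int, n_steps: int) -> list:
    """Phases (degrees) visited when stepping through true, then complement taps, and so on.

    Assumes the delay line is long enough that the code contains at least one 1.
    """
    usable = eoc_code(n_stages, stage_deg).index(1)  # taps inside the first 180 degrees
    phases = []
    for step in range(n_steps):
        half, tap = divmod(step, usable)  # odd 'half' values select the complement taps
        phases.append((180 * half + tap * stage_deg) % 360)
    return phases


code = eoc_code(5, 50)
phases = phase_walk(5, 50, 9)
steps = [(b - a) % 360 for a, b in zip(phases, phases[1:])]
print(code)    # first 1 marks Tap 5
print(phases)
print(steps)
```

For a 50° inverter delay the walk visits 0°, 50°, 100°, 150°, 180°, 230°, 280°, 330°, and 0° again, so every step is positive and the two wrap transitions are 30°, matching the text.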

III. RESOLUTION-ENHANCING CIRCUIT TECHNIQUES

A. Phase Blending

Although the delay-line improvements discussed above reduced the required power and area of the delay line, improved its jitter accumulation performance, enabled infinite phase range, and improved the available phase resolution by a factor of two, this phase resolution was still not good enough to meet the requirements of our memory system. In the 0.4-µm process we used, the propagation delay of one inverter over all anticipated PVT conditions varied from 100 to 300 ps. This is much larger than the worst case phase step specification of 50 ps. Therefore, to ensure compliance with this specification, the DLL's phase resolution needed to be improved by at least six times over what the delay line provided.

To solve this problem, we used inverter phase blending. A simple, single-stage phase-blender circuit is shown in Fig. 6(a). This circuit receives two phase-adjacent input signals, which are separated in phase by one inverter delay. The phase blender directly passes these two signals with a simple delay to produce two of its output signals. However, it also uses a pair of phase-blending inverters to interpolate between these two input signals to produce a third output signal, having a phase between that of the two pass-through outputs. This effectively doubles the available phase resolution.

However, it is not sufficient to use equal-sized inverters for the phase blending. Fig. 6(b) illustrates a simple model [12] used for determining the ideal relative sizes of the two phase-blending inverters to ensure that the phase of the blended output lies directly between that of the two pass-through outputs. The model approximates the two inverters with two simple switched current sources sharing a common resistance–capacitance (RC) load. For two rising edge input signals separated in time by $\Delta t$, the model yields the equation

$$V_{out}(t) = V_{DD} + wIR\left[e^{-t/RC} - 1\right]u(t) + (1 - w)IR\left[e^{-(t - \Delta t)/RC} - 1\right]u(t - \Delta t) \tag{2}$$

where $R$ is the total resistive load, $C$ is the output capacitance, $I$ is the total pulldown current of the two phase-blending inverters, $u(t)$ is the unit step function, and $w$ is the phase-blending inverter relative size ratio [refer to Fig. 6(a), where $w$ is the ratio of the device widths in the inverter driven by the first input to the total device widths in both inverters]. Equation (2) is the sum of two decaying exponential terms, and Fig. 6(c) shows a plot of the resulting waveform according to this equation for one choice of $w$ and $\Delta t$. Because the second exponential term is delayed in time by $\Delta t$ relative to the first, it only begins to affect the slope of the decay after this delay has elapsed. Therefore, without explicitly solving the equation for each case of $w$ and $\Delta t$, it is not obvious when the output will cross the switching threshold.

Fig. 6. Phase blending for phase-resolution improvement. (a) Single-stage phase-blender circuit, (b) simple model of phase-blending inverters, (c) plot of signal voltages in the simple model, and (d), (e) phase-blender output signal edges for two different inverter size ratios.

For input signals separated in phase by one inverter delay, the model specifies that in order to ensure that the phase of the blended output lies directly in between that of the two pass-through outputs, the phase-blending inverters must be sized asymmetrically, such that the leading phase is coupled to an inverter that is bigger than the one that receives the lagging phase. This ratio was also confirmed empirically with simulations. The effect of the relative sizing of the phase-blending inverters is illustrated in Fig. 6(d) and (e), which show the resulting output signal edges for two different size ratios. Clearly, for one of the two size ratios, the phase of the blended output signal is closer to that of one pass-through output than to that of the other. Although asymmetrical inverter sizing ensures good, evenly spaced edge placement of the three output signals, it requires that the input coupled to the larger inverter lead the other input. Reversing the phase of these two input signals would result in a severely misplaced blended output, since the effective sizing ratio would then be reversed.
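The need for asymmetric sizing can be checked numerically with a sketch of the switched-current-source model. Everything specific here is an assumption made for illustration: the output is normalized so that the supply and the full swing IR both equal 1, the switching threshold is half the supply, and the input spacing is set to RC·ln 2 (the time a single inverter takes to reach the threshold). The function names are hypothetical.

```python
import math


def v_out(t: float, w: float, dt: float, rc: float = 1.0) -> float:
    """Normalized model output for rising inputs at t = 0 (weight w) and t = dt (weight 1 - w)."""
    v = 1.0
    if t >= 0.0:
        v += w * (math.exp(-t / rc) - 1.0)
    if t >= dt:
        v += (1.0 - w) * (math.exp(-(t - dt) / rc) - 1.0)
    return v


def crossing_time(w: float, dt: float, rc: float = 1.0, threshold: float = 0.5) -> float:
    """Time at which v_out falls through the threshold, found by bisection."""
    lo, hi = 0.0, dt + 50.0 * rc
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if v_out(mid, w, dt, rc) > threshold:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def midpoint_ratio(dt: float, rc: float = 1.0) -> float:
    """Size ratio w that centers the blended edge between the two pass-through edges."""
    target = 0.5 * (crossing_time(1.0, dt, rc) + crossing_time(0.0, dt, rc))
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if crossing_time(mid, dt, rc) > target:
            lo = mid  # blended edge is too late: enlarge the leading inverter
        else:
            hi = mid
    return 0.5 * (lo + hi)


dt = math.log(2.0)  # assumed input spacing, in units of RC
print(midpoint_ratio(dt))       # ratio that centers the blended edge
print(crossing_time(0.5, dt))   # blended edge position with equal-sized inverters
```

Under these assumptions the centering ratio comes out near 0.59, so the inverter on the leading input must be the larger one, and equal sizing places the blended edge later than the midpoint, closer to the lagging edge. The exact ratio depends on the assumed spacing and threshold, so this is a qualitative check rather than a sizing rule.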

Another design constraint of the phase-blender circuit is that all paths through the circuit must provide precisely the same loading and delay to ensure that the phase relationship between the two input signals is maintained by the corresponding pass-through output signals.

The phase-blender idea can be extended to multiple cascaded stages for further phase-resolution improvement, with each additional stage improving the resolution by a factor of two. Fig. 7 shows a two-stage cascaded phase-blender circuit that provides a 4x improvement in phase resolution from input to output. Although it is theoretically possible to increase phase resolution indefinitely by adding more and more phase-blender stages, there is a practical limit. The number of inverters in each signal path increases by two with each additional phase-blending stage, making the circuit increasingly susceptible to voltage supply noise-induced jitter due to the additional delay in the signal path. Therefore, it is prudent to increase the number of blending stages to improve phase resolution only until the output phase step size from the phase blender is approximately equivalent to the anticipated voltage supply noise-induced jitter.
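A minimal sketch of the stage-count arithmetic follows, using the 300-ps slow-corner inverter delay and the 50-ps step specification quoted above. The helper name is illustrative, and ideal, evenly spaced blending is assumed.

```python
import math


def blender_stages(coarse_step_ps: float, target_step_ps: float) -> int:
    """Smallest number of 2x blending stages that brings the step to the target or below."""
    return max(0, math.ceil(math.log2(coarse_step_ps / target_step_ps)))


stages = blender_stages(300, 50)       # slow-corner inverter delay vs. specification
fine_step_ps = 300 / 2 ** stages       # resulting worst-case step, assuming ideal blending
extra_inverters = 2 * stages           # added inverters per signal path

print(stages, fine_step_ps, extra_inverters)
```

Three stages give an 8x improvement and a worst-case step of 37.5 ps, in line with the roughly 40-ps resolution quoted in the abstract, at the cost of six more inverters in each signal path.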

There are several design limitations that must be considered when designing a cascaded phase blender. First, the importance of proper (asymmetrical) sizing of the phase-blending inverters grows with the number of cascaded blending stages because edge misplacement has a compounding effect as the signals travel through the multiple stages. Additionally, close attention must be paid to ensuring equal loading for equal delay through all paths, requiring the use of dummy devices on otherwise unbalanced paths. Finally, like a single-stage phase blender, a cascaded phase blender also requires that one particular input lead the other in phase to ensure even output phase spacing.

Fig. 7. Two-stage, cascaded phase-blender circuit for 4x phase-resolution improvement.

Fig. 8. Three-stage, symmetrical phase-blender circuit.

To overcome these design limitations of the cascaded phase blender, we developed a symmetrical phase blender. A block diagram of a three-stage symmetrical phase blender is shown in Fig. 8. This circuit is essentially two parallel cascaded phase-blender circuits, sharing some common paths. When the first input leads the second, one set of outputs provides equal output phase spacing. When the second input leads the first, the other set of outputs provides equal output phase spacing. Therefore, the circuit provides phase blending with an 8x improvement in phase resolution and equally spaced output signals regardless of which input signal leads in phase.

Additionally, the symmetrical blender allows for seamless input switching for continuous phase blending over multiple input delays. For example, assume that the first input leads the second in phase. Beginning with the output that corresponds to the first input, outputs can be successively selected to evenly span the phase range between the two inputs. Once the output that corresponds to the second input is selected, the first input can be changed to another signal that lags the second input. This switching is possible without affecting the selected output signal because that output has no dependence on or coupling from the first input. Then outputs can be successively selected to evenly span the phase range between the second input and the new first input. Once the output that corresponds to the first input is selected, the second input can be changed to yet another signal that lags the first input. Again, this is possible without any change in the selected output signal because that output has no dependence on or coupling from the second input. This process can continue indefinitely. Also, because all paths through the symmetrical phase blender are inherently balanced, no dummy devices are needed.
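The input-switching sequence can be sketched behaviorally as follows. The labels "A" and "B" for the two blender inputs, the uniform tap delay, and the ideal blending are all assumptions made for illustration; the model ignores wrap-around at the end of the delay line.

```python
def leapfrog_phases(tap_delay_ps: float, n_steps: int, stages: int = 3) -> list:
    """Output delays (ps) as fine codes advance and the two blender inputs leapfrog."""
    sub = 2 ** stages              # fine steps per coarse tap spacing
    taps = {"A": 0, "B": 1}        # coarse tap currently feeding each blender input
    lead, lag = "A", "B"
    k = 0                          # fine position, measured from the leading input
    out = []
    for _ in range(n_steps):
        out.append(tap_delay_ps * (taps[lead] + k / sub))
        k += 1
        if k == sub:
            # The selected output now depends only on the lagging input, so the
            # former leader can be re-pointed to the next tap without a glitch.
            taps[lead] = taps[lag] + 1
            lead, lag = lag, lead
            k = 0
    return out


phases = leapfrog_phases(200.0, 20)
deltas = [b - a for a, b in zip(phases, phases[1:])]
print(phases[:10])
print(set(deltas))
```

With a 200-ps tap spacing and three blending stages, every step in this model is 25 ps, including the steps taken immediately after an input is re-pointed.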


Fig. 9. (a) A 16 : 1 duty-cycle correcting multiplexer circuit. (b) Duty-cycle correction control circuit.

B. Signal Selection and Duty-Cycle Correction

Since the digital DLL was to be placed into a memory system that exchanges data on both edges of the clock, good duty cycle (i.e., close to 50%) is required to ensure that the data exchanged on either edge of the clock have equal bit times. Duty-cycle distortion is usually addressed in PLLs by simply running the PLL's voltage-controlled oscillator (VCO) at twice the system frequency and using a postdivider triggered on one edge of the VCO output to produce the output clock from the PLL [13]–[15]. This ensures good, 50% duty cycle. In a DLL, however, no frequency multiplication is possible. The duty cycle of the output signal must be directly corrected to 50%, for example, by using a duty-cycle correcting amplifier in the signal path as in Fig. 1 and in [4].
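The effect of duty-cycle error on bit time can be seen with a short sketch. The 55% duty cycle is an arbitrary illustrative value, not a measurement from the paper, and the function name is hypothetical.

```python
def bit_times_ps(freq_hz: float, duty_cycle: float):
    """Durations (ps) of the high-phase and low-phase bit times of a double-data-rate clock."""
    period_ps = 1e12 / freq_hz
    return duty_cycle * period_ps, (1.0 - duty_cycle) * period_ps


high_ps, low_ps = bit_times_ps(400e6, 0.55)
print(high_ps, low_ps, high_ps - low_ps)
```

At 400 MHz a 55% duty cycle splits the 2.5-ns period into bit times of about 1.375 ns and 1.125 ns, a 250-ps imbalance that is as large as the entire long-term jitter budget.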

Although duty-cycle correction can be addressed by placing a duty-cycle corrector at the output of the DLL, this approach has several limitations. First, since duty cycle is corrected only at the output of the DLL, internal DLL signals may have poor duty cycle. It is good practice, however, to maintain 50% duty cycle throughout the signal path to maximize signal propagation as frequency is increased. Second, performing all the duty-cycle correction in one stage at the output of the DLL places a great deal of strain on the duty-cycle correcting circuit; it must have a large duty-cycle correction range to compensate for all the duty-cycle distortion that can accumulate in the signal path. Finally, adding a duty-cycle corrector directly into the signal path increases signal path delay, and thus susceptibility to voltage supply noise-induced jitter.

To address the issue of duty cycle, we developed the idea of duty-cycle correcting multiplexers. Since multiplexers would be needed in our DLL regardless, by adding duty-cycle correcting functionality to the multiplexing circuitry, we implemented duty-cycle correction while requiring minimal additional power, area, and delay.

A 16 : 1 duty-cycle correcting multiplexer is shown in Fig. 9(a) with a corresponding control circuit in Fig. 9(b). To facilitate understanding of this circuit's operation, consider an example. Assume that one input signal is selected and has duty-cycle distortion such that the multiplexer output signal has a high duty cycle. Assume also that the output is sensed by a duty-cycle error detector, which produces a differential output error signal proportional to the difference in duty cycle between the output and the ideal 50%. Thus, in our example, one side of the differential error signal will be greater than the other, causing more current to be steered through the right branch of the control circuit in Fig. 9(b) than through the left branch. This in turn increases the strength of one pair of correction transistors compared to the opposing pair in the duty-cycle correcting multiplexer of Fig. 9(a). These transistors alter the duty cycle of the signal as it passes from the selected input to the output, driving the output to the ideal 50% duty cycle. The use of both PMOS and NMOS devices to perform the duty-cycle correction ensures a symmetrical duty-cycle correction range. Furthermore, because duty-cycle correction has been distributed through two stages, the requirements on each individual duty-cycle correcting stage are reduced. By combining both necessary functions of signal selection and duty-cycle correction, this circuit minimizes signal path delay, jitter accumulation, circuit area, and power compared to performing both functions separately.

IV. DLL ARCHITECTURE

Fig. 10 is a block diagram of the entire digital DLL, with shading indicating the circuit blocks that were described in greater detail above. The DLL receives an input clock ExtClk and passes it through a clock amplifier and splitter to provide the two complementary input signals (ClkIn and ClkInb) to a 16-stage, 32-tap complementary delay line with EOC detector. The delay line provides 32 signals at its output taps, which then feed into two 32 : 1 duty-cycle correcting multiplexers. Each multiplexer selects one of a pair of phase-adjacent signals from the delay line. The two selected signals then pass to a three-stage, 2 : 16 symmetrical phase-blender circuit, which improves the phase resolution by a factor of eight. A final 16 : 1 duty-cycle correcting multiplexer selects one of the phase-blender output signals and passes it through a clock tree to provide the DLL's output signal ClkOut. The digital DLL also includes two independent duty-cycle correction loops as shown in the figure. By using two separated duty-cycle correcting loops, duty-cycle correction is distributed throughout the signal path. This ensures a good duty cycle throughout the signal path and reduces the duty-cycle correcting requirements of any one stage.

Fig. 10. Complete block diagram of the new digital DLL.

The DLL uses bang-bang-type, all-digital feedback to lock the phase of its output signal ClkOut to that of a reference signal RefClk. A phase detector compares the phase of ClkOut to RefClk and produces a binary error signal, which passes through an optional digital filter to a control logic circuit. The digital filter is a simple majority detector, which has no effect when the loop is acquiring lock but reduces dithering once lock is acquired. The control logic is composed of simple combinational logic and counters that drive the multiplexers to select the two phase-adjacent coarse phase signals from the delay line and the fine phase signal from the phase blender that minimize the phase error between ClkOut and RefClk. Because the phase information is stored in this DLL as a digital state, the DLL can quickly recover from low-power modes, requiring only enough time for the signals to propagate through the signal path of the circuit from ExtClk to ClkOut to provide a phase-locked output signal.

Fig. 11. Test-chip micrograph showing on the left side (a) the analog DLL of [6] and on the right side (b) the new digital DLL integrated into identical interface cells.
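The feedback scheme can be sketched as a behavioral model. This is an idealized illustration, not the authors' control logic: it assumes a uniform 40-ps phase step (the quoted worst-case resolution), a noise-free early/late phase detector, and a three-sample majority vote; all names are hypothetical.

```python
def phase_error_ps(target_ps: int, phase_ps: int, period_ps: int) -> int:
    """Signed phase error wrapped into the range [-period/2, period/2)."""
    half = period_ps // 2
    return (target_ps - phase_ps + half) % period_ps - half


def majority(votes: list) -> bool:
    """True when more than half of the early/late votes ask for more delay."""
    return 2 * sum(votes) > len(votes)


def bang_bang_lock(target_ps: int, step_ps: int = 40, period_ps: int = 2500,
                   n_votes: int = 3, n_updates: int = 60) -> list:
    """Output phase (ps) seen at each update as the digital phase code tracks the target."""
    code = 0  # digital state holding the selected phase
    history = []
    for _ in range(n_updates):
        phase = (code * step_ps) % period_ps
        # Each vote is one binary decision from the phase detector.
        votes = [phase_error_ps(target_ps, phase, period_ps) > 0 for _ in range(n_votes)]
        code += 1 if majority(votes) else -1
        history.append(phase)
    return history


history = bang_bang_lock(1000)
print(history[-6:])
```

In this noise-free model the loop steps monotonically toward the target and then dithers between the two codes that straddle it, so the residual error stays within one phase step. With a noisy detector the majority vote would suppress isolated wrong decisions, which is the dithering reduction the text describes.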

It is important to recognize the role of the EOC detector and code in this architecture. Because the delay line and blender are uncontrolled, open-loop circuits, the architecture relies on the control circuit's use of the EOC code to ensure proper coarse phase selection, small maximum phase step size, and phase transfer function monotonicity. The EOC code enables the control logic to determine when to switch between the true and complement taps of the delay line to ensure that phase-adjacent taps are always selected by the coarse multiplexers for the phase blender. The EOC code also enables the control logic to determine which set of blender taps provides evenly spaced output signals.

Fig. 12. Measured transmit eye diagrams at 3.3 V and 400 MHz of the high-speed interface cells with (a) the analog DLL of [6] and (b) the new digital DLL.

V. MEASURED PERFORMANCE

A. Test Chip

Both the digital DLL presented here and an implementation of the analog DLL of Donnelly et al. [6] were integrated into identical high-speed CMOS interface cells on opposite sides of a single test chip. A micrograph of this test chip is shown in Fig. 11. The test chip I/O was laid out symmetrically so that either interface cell could be tested on the same hardware by simply removing the test chip from the test socket, rotating it 180°, and reinserting it into the socket. This allowed a true side-by-side comparison of the two DLLs operating in a system. The test-chip circuits were fabricated using a standard 0.4-µm, 3.3-V CMOS process with 0.65-V threshold voltages.

B. Test Results

Unless indicated otherwise, all test results described in this

section were measured with the analog and digital DLLs

operating in their respective high-speed interface cells at 3.3 V

and 400 MHz (800 Mb/s/pin) using the same test vectors.

Additionally, the test chip included noise-generator circuits, which produced digital switching noise during the testing of

both interfaces.

Fig. 12(a) and (b) shows eye diagrams of the two interfaces

with the analog and digital DLLs, respectively. The diagrams

indicate the output timing performance of the interface cells

in the test system. Although the interface with the analog

DLL provided slightly better timing performance, 320 ps peak-to-peak

versus 380 ps peak-to-peak for the interface with the digital DLL, the

performances of both interfaces (and therefore, both DLLs)

were comparable. This is surprisingly good considering the

extensive use of poor PSRR elements, such as inverters, in

the signal path of the digital DLL. (Note: I/O circuit duty-

cycle distortion produced the unequal eyes in both diagrams.

This is unrelated to the DLLs.)

Fig. 13(a) and (b) shows receive shmoo diagrams for the

two interfaces with the analog and digital DLLs, respectively.

The diagrams indicate the CMOS interfaces' valid timing windows

for receiving data. On the diagrams, one axis is supply

voltage (2.5 V to 4.0 V) while the other axis indicates input

data positioning along a bit period (800 Mb/s corresponds to 1.25 ns).

The normal data position is in the center of the bit period. A

black dot in the diagram indicates incorrectly received data for

that combination of bit position and supply voltage. Ideally, the window

should be entirely white, but realistically, it is limited by jitter

from the DLL and other sources. Therefore, this test measures

the amount of tolerable skew on the input timing over a range

of supply voltages. Although the interface with the analog DLL

delivers better timing performance than the interface with the

digital DLL (1.02 versus 0.92 ns), both meet the component

specification of 0.85 ns.
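The receive-window figures above can be put in proportion with a short calculation. The sketch below uses only numbers quoted in this section (800 Mb/s/pin, the 0.85-ns specification, and the 1.02-ns and 0.92-ns measured windows); the function names are hypothetical.

```python
BIT_RATE_BPS = 800e6    # 800 Mb/s/pin, from the stated test conditions
SPEC_WINDOW_NS = 0.85   # component specification quoted in the text


def bit_period_ns(bit_rate_bps: float) -> float:
    """Bit period in nanoseconds for a given per-pin data rate."""
    return 1e9 / bit_rate_bps


def window_summary(window_ns: float,
                   spec_ns: float = SPEC_WINDOW_NS,
                   bit_rate_bps: float = BIT_RATE_BPS) -> dict:
    """Express a measured receive window relative to the bit period and spec."""
    period = bit_period_ns(bit_rate_bps)
    return {
        "fraction_of_bit": window_ns / period,  # share of the bit period that is valid
        "margin_ns": window_ns - spec_ns,       # headroom over the specification
        "meets_spec": window_ns >= spec_ns,
    }


if __name__ == "__main__":
    for name, window in (("analog", 1.02), ("digital", 0.92)):
        print(name, window_summary(window))
```

Both interfaces clear the specification; the digital DLL simply does so with less headroom.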

Fig. 14 is a circle plot of the measured phase of the DLL's

output signal ClkOut, illustrating the DLL's ability to provide infinite phase range. One axis indicates delay [or phase, as in

(1)] of the ClkOut signal relative to a fixed 400-MHz signal.

The other axis indicates cycle count. These data were measured by

probing the on-chip DLL output signal (ClkOut) and forcing

the DLL's phase-detector output low. This caused the DLL's

output phase to continually advance over time. The term "circle

plot" is used because this diagram is equivalent to sweeping a

phasor that represents the phase of ClkOut around the phase

plane, thereby drawing a circle in the phase plane. Because

the phase of ClkOut is measured relative to a fixed 400-MHz

signal, the plotted delay appears modulo 2.5 ns, where the cycle time is 2.5 ns



Fig. 13. Measured shmoo diagrams showing the 400-MHz receive timing windows of the high-speed interface cells with (a) the analog DLL of [6] and (b) the new digital DLL.

Fig. 14. Measured circle plot illustrating the infinite phase transfer characteristic of the digital DLL.

at 400 MHz. The absolute value of delay (i.e., from 3.4

to 5.9 ns) is irrelevant since it includes some test-system setup

time. The data were measured and plotted using a time-interval

analyzer.
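The modulo behavior of the plotted delay can be reproduced with a small sketch. It assumes the usual conversion of 360° per clock period, which may differ in form from the paper's (1), and uses the 3.4-ns lower edge of the plotted range as a hypothetical window start; all function names are illustrative.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1e3 / freq_mhz


def wrap_delay_ns(delay_ns: float, period_ns: float,
                  window_start_ns: float = 0.0) -> float:
    """Fold a delay into one clock period beginning at window_start_ns.

    This mirrors how a delay measured against a fixed reference clock
    appears modulo the clock period on the plot.
    """
    return window_start_ns + (delay_ns - window_start_ns) % period_ns


def delay_to_phase_deg(delay_ns: float, period_ns: float) -> float:
    """Convert a delay to phase, assuming 360 degrees per clock period."""
    return 360.0 * (delay_ns % period_ns) / period_ns


if __name__ == "__main__":
    period = clock_period_ns(400.0)  # 2.5 ns at 400 MHz
    # A steadily advancing output delay folds back into the 3.4 ns to 5.9 ns range.
    for delay in (3.4, 4.65, 5.8, 6.0, 8.5):
        print(delay, wrap_delay_ns(delay, period, 3.4),
              delay_to_phase_deg(delay, period))
```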

The circle plot illustrates the DLL's phase transfer function,

showing its reasonably good linearity, monotonicity, and lack

of discontinuities. The small bumps in the transfer function

indicate a change in coarse reference phase selected from

the delay line. The slope of the transfer function depends on

PVT conditions and system frequency, since these conditions

determine how many delay-line taps are required to provide

180° of phase. In this case, nine taps were required, resulting

in an average phase step size of 20 ps or 2.9°.
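As a quick check of the step-size figures, the conversion between picoseconds and degrees at 400 MHz can be written out. This is plain arithmetic on the quoted 20-ps average step and the 2.5-ns clock period; the function names are hypothetical.

```python
def step_to_degrees(step_ps: float, period_ps: float) -> float:
    """Phase step in degrees, assuming 360 degrees per clock period."""
    return 360.0 * step_ps / period_ps


def steps_per_cycle(period_ps: float, step_ps: float) -> float:
    """Number of average-size steps needed to cover one full clock period."""
    return period_ps / step_ps


if __name__ == "__main__":
    period_ps = 2500.0  # 400-MHz clock
    step_ps = 20.0      # average phase step quoted in the text
    print(round(step_to_degrees(step_ps, period_ps), 1))  # about 2.9 degrees
    print(steps_per_cycle(period_ps, step_ps))            # 125 steps per cycle
```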

Table I presents a summary of many of the measured and

simulated results of the analog and digital DLLs operating in

their respective CMOS interfaces. Although the analog DLL



Fig. 15. Measured DLL power consumption (a) as a function of frequency for V_DD = 3.3 V and (b) as a function of supply voltage for f = 400 MHz.

TABLE I. ANALOG AND DIGITAL DLL PERFORMANCE SUMMARY AT 3.3 V AND 400 MHz

uses less power and area, and provides better timing performance (smaller long-term jitter) and phase resolution (smaller

maximum phase step), both DLLs enable the interface cells to

meet the component requirements when operating in the test

system. Additionally, the digital DLL has a higher maximum

operating frequency, works at lower supply voltages, and

requires much less effort to port to other processes (one versus

four man-months).

Fig. 15(a) and (b) shows plots of measured DLL power

versus frequency at V_DD = 3.3 V and measured DLL power

versus supply voltage at f = 400 MHz, respectively. Although

both plots show that the digital DLL dissipated more power

than the analog DLL for all measured conditions, the plots illustrate

the different characteristics of the power consumed by the two DLLs. As mentioned earlier, the power of both DLLs

is distributed between IV power in the constant-current stages

and CV²f power in the CMOS stages. The curves in Fig. 15(a)

show that the digital DLL's power dissipation has a greater

dependence on frequency than does the analog DLL's power.

The curves in Fig. 15(b) show that the digital DLL's power

dissipation has a predominantly square-law dependence on

supply voltage, whereas the analog DLL's power dissipation

has a mixed square-law and linear dependence. These trends

confirm that the power of the analog DLL has a relatively

higher IV term, whereas the power of the digital DLL has a

relatively higher CV²f term. This indicates that digital DLLs

have the potential for providing better power scaling than

analog DLLs as supply voltages decrease in the future.

Finally, we have shown in Table I and in Fig. 15(b) that

the digital DLL operates at lower supply voltages than the

analog DLL. Although the operation of the digital DLL was

limited to 1.7 V, this limitation was due to our use of several

analog elements in the digital DLL (i.e., it was a "mostly digital"

DLL). The digital DLL used an analog clock amplifier, two

analog duty-cycle error detectors (see Fig. 10), and an analog

quadrature phase detector (in a second loop, not shown). Using

an analog design for these circuit blocks in the digital DLL

was faster to implement without preventing evaluation of the

key digital blocks in the DLL, but their use determined the

minimum supply voltage of the digital DLL.
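The IV and CV²f decomposition used in the power discussion above can be illustrated with a two-term model. The sketch below is not fitted to the measured curves: the static current and effective capacitance values are hypothetical and chosen only to show how each term scales with supply voltage.

```python
def dll_power_w(vdd_v: float, freq_hz: float,
                i_static_a: float, c_eff_f: float) -> float:
    """Two-term power model: constant-current (IV) plus switching (CV^2 f)."""
    return i_static_a * vdd_v + c_eff_f * vdd_v ** 2 * freq_hz


def scaling_ratio(v_new: float, v_ref: float, freq_hz: float,
                  i_static_a: float, c_eff_f: float) -> float:
    """Power at v_new relative to power at v_ref, at a fixed frequency."""
    return (dll_power_w(v_new, freq_hz, i_static_a, c_eff_f)
            / dll_power_w(v_ref, freq_hz, i_static_a, c_eff_f))


if __name__ == "__main__":
    f = 400e6
    # Hypothetical coefficients, for illustration only.
    switching_only = dict(i_static_a=0.0, c_eff_f=50e-12)   # CV^2 f dominated
    mixed = dict(i_static_a=0.02, c_eff_f=20e-12)           # IV plus CV^2 f
    for label, params in (("switching-dominated", switching_only),
                          ("mixed", mixed)):
        ratio = scaling_ratio(1.65, 3.3, f, **params)
        print(label, round(ratio, 3))
```

Halving the supply cuts the switching-dominated design to one quarter of its reference power, while the mixed design falls by less because its IV term only scales linearly. This is the sense in which a relatively higher CV²f term gives better power scaling as supply voltages decrease.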

VI. CONCLUSION

We have described the architecture of a portable digital

DLL and demonstrated that it provides jitter performance

comparable to an analog DLL when fabricated in the same

3.3-V, 0.4-µm standard CMOS process. Several circuits were

developed to enable the DLL to provide very fine phase

resolution, infinite phase range, and good duty-cycle performance

throughout the signal path. Despite its relatively simple

architecture, the digital DLL meets all system specifications,

and it operates down to lower supply voltages than its analog

counterpart. Utilizing essentially only simple digital CMOS

gates, the DLL can be ported to new processes in minimal time. For these reasons, this digital DLL provides an

alternative to analog DLLs for clock alignment applications.

ACKNOWLEDGMENT

The authors thank J. McBride and P. Gordon for layout

support and S. Sidiropoulos for helpful insights.

REFERENCES

[1] A. Efendovich, Y. Afek, C. Sella, and Z. Bikowsky, "Multifrequency zero-jitter delay-locked loop," IEEE J. Solid-State Circuits, vol. 29, pp. 67–70, Jan. 1994.




[2] J.-M. Han, J. Lee, S. Yoon, S. Jeong, C. Park, I. Cho, S. Lee, and D. Seo, "Skew minimization techniques for 256 Mb synchronous DRAM and beyond," in VLSI Circuits Dig. Tech. Papers, June 1996, pp. 192–193.

[3] A. Hatakeyama, H. Mochizuki, T. Aikawa, M. Takita, Y. Ishii, H. Tsuboi, S. Fujioka, S. Yamaguchi, M. Koga, Y. Serizawa, K. Nishimura, K. Kawabata, Y. Okajima, M. Kawano, H. Kojima, K. Mizutani, T. Anezaki, M. Hasegawa, and M. Taguchi, "A 256 Mb SDRAM using register-controlled digital DLL," in ISSCC 1997 Dig. Tech. Papers, Feb. 1997, pp. 72–73.

[4] T. Lee, K. Donnelly, J. Ho, J. Zerbe, M. Johnson, and T. Ishikawa, "A 2.5 V CMOS delay-locked loop for 18 Mbit, 500 megabyte/s DRAM," IEEE J. Solid-State Circuits, vol. 29, pp. 1491–1496, Dec. 1994.

[5] S. Sidiropoulos and M. Horowitz, "A semidigital dual delay-locked loop," IEEE J. Solid-State Circuits, vol. 32, pp. 1683–1692, Nov. 1997.

[6] K. Donnelly, Y. Chan, J. Ho, C. Tran, S. Patel, B. Lau, J. Kim, P. Chau, C. Huang, J. Wei, L. Yu, R. Tarver, R. Kulkarni, D. Stark, and M. Johnson, "A 660 MB/s interface megacell portable circuit in 0.3 µm–0.7 µm CMOS ASIC," IEEE J. Solid-State Circuits, vol. 31, pp. 1995–2003, Dec. 1996.

[7] N. Kushiyama, S. Ohshima, D. Stark, H. Noji, K. Sakurai, S. Takase, T. Furuyama, R. Barth, A. Chan, J. Dillon, J. Gasbarro, M. Griffin, M. Horowitz, T. Lee, and V. Lee, "A 500-Megabyte/s data-rate 4.5M DRAM," IEEE J. Solid-State Circuits, vol. 28, pp. 490–508, Apr. 1993.

[8] M. Hasegawa, M. Nakamura, S. Narui, S. Ohkuma, Y. Kawase, H. Endoh, S. Miyatake, T. Akiba, K. Kawakita, M. Yoshida, S. Yamada, T. Sekigguchi, I. Asano, Y. Tadaki, R. Nagai, S. Miyaoka, K. Kajigaya, M. Horiguchi, and Y. Nakagome, "A 256 Mb SDRAM with subthreshold leakage current suppression," in ISSCC 1998 Dig. Tech. Papers, Feb. 1998, pp. 80–81.

[9] T. Saeki, Y. Nakaoka, M. Fujita, A. Tanaka, K. Nagata, K. Sakakibara, T. Matano, Y. Hoshino, K. Miyano, S. Isa, E. Kakehashi, J. Drynan, M. Komuro, T. Fukase, H. Iwasaki, J. Sekine, M. Igeta, N. Nakanishi, T. Itani, K. Yoshida, H. Yoshino, S. Hashimoto, T. Yoshii, M. Ichinose, T. Imura, M. Uziie, K. Koyama, Y. Fukuzo, and T. Okuda, "A 2.5 ns clock access 250 MHz 256 Mb SDRAM with synchronous mirror delay," in ISSCC 1996 Dig. Tech. Papers, Feb. 1996, pp. 374–375.

[10] B. Garlepp, K. Donnelly, J. Kim, P. Chau, J. Zerbe, C. Huang, C. Tran, C. Portmann, D. Stark, Y. Chan, T. Lee, and M. Horowitz, "A portable digital DLL architecture for CMOS interface circuits," in VLSI Circuits Dig. Tech. Papers, June 1998, pp. 214–215.

[11] M. Griffin, J. Zerbe, A. Chan, Y. Jun, Y. Tanaka, W. Richardson, G. Tsang, M. Ching, C. Portmann, Y. Li, B. Stonecypher, L. Lai, K. Lee, V. Lee, D. Stark, H. Modarres, P. Batra, J. Louis-Chandran, J. Privitera, T. Thrush, B. Nickell, J. Yang, V. Hennon, and R. Sauve, "A process independent 800 MB/s DRAM bytewide interface featuring command interleaving and concurrent memory operation," in ISSCC 1998 Dig. Tech. Papers, Feb. 1998, pp. 156–157.

[12] S. Sidiropoulos, "High-performance interchip signalling," Ph.D. dissertation, Computer Systems Laboratory, Stanford University, Stanford, CA, Apr. 1998. Available as Tech. Rep. CSL-TR-98-760 from http://elib.stanford.edu/.

[13] I. Young, M. Mar, and B. Bhushan, "A 0.35 µm CMOS 3-880 MHz PLL N/2 multiplier and distribution network with low jitter for microprocessors," in ISSCC 1997 Dig. Tech. Papers, Feb. 1997, pp. 330–331.

[14] V. von Kaenel, D. Aebischer, C. Piguet, and E. Dijkstra, "A 320 MHz, 1.5 mW at 1.35 V CMOS PLL for microprocessor clock generation," in ISSCC 1996 Dig. Tech. Papers, Feb. 1996, pp. 132–133.

[15] V. von Kaenel, D. Aebischer, R. van Dongen, and C. Piguet, "A 600 MHz CMOS PLL microprocessor clock generator with a 1.2 GHz VCO," in ISSCC 1998 Dig. Tech. Papers, Feb. 1998, pp. 396–397.

Bruno W. Garlepp was born in Bahia, Brazil, on October 29, 1970. He received the B.S.E.E. degree from the University of California, Los Angeles, in 1993 and the M.S.E.E. degree from Stanford University, Stanford, CA, in 1995.

In 1993, he joined the Hughes Aircraft Advanced Circuits Technology Center, Torrance, CA. There, he designed high-precision analog integrated circuits for A/D applications, as well as CMOS, bipolar, and SiGe RF circuits for wide-band communications applications. In 1996, he joined Rambus, Inc., Mountain View, CA, where he designs and develops high-speed CMOS clocking and I/O circuits for synchronous chip-to-chip communication.

Kevin S. Donnelly (A'93) was born in Los Angeles, CA, in 1961. He received the B.S. degree in electrical engineering and computer science from the University of California, Berkeley, in 1985 and the M.S. degree in electrical engineering from San Jose State University, San Jose, CA, in 1992.

He was with Memorex, Sipex, and National Semiconductor, specializing in bipolar and BiCMOS analog circuits for disk-drive read/write and servo channels. In 1992, he joined Rambus, Inc., Mountain View, CA, where he has designed high-speed CMOS PLL circuits for clock recovery and data synchronization, and high-speed I/O circuits. He currently manages a group developing I/O circuits and PLLs. His interests include PLLs and DLLs, I/O circuits, and data converters. He is a Member of the ISSCC Digital Subcommittee. He has received several circuit design patents.

Mr. Donnelly is a coauthor of the paper that won the Best Paper Award at the 1994 ISSCC.

Jun Kim was born in Tokyo, Japan, on November 14, 1966. He received the B.S.E.E. degree from the University of California, Berkeley, in 1989.

From 1989 to 1991, he was with Vitelic, Inc., where he worked on SRAM and DRAM development. Between 1991 and 1994, he was with Sun Microsystems, where he was involved in microprocessor and digital circuit design. Since 1994, he has been with Rambus, Inc., Mountain View, CA, as a Designer of high-speed CMOS I/O and DLL circuits.

Pak S. Chau was born in Hong Kong in 1966. He received the B.S. degree in computer system engineering from the University of Massachusetts, Amherst, in 1989 and the M.S. degree in electrical engineering from the University of California, Davis, in 1991.

He was with National Semiconductor and Chrontel, Inc., where he worked as an Analog Circuit Designer. In 1994, he joined Rambus, Inc., Mountain View, CA, where he has engaged in designing high-speed I/O and DLL circuits.

Jared L. Zerbe was born in New York, NY, in 1965. He received the B.S. degree in electrical engineering from Stanford University, Stanford, CA, in 1987.

He joined VLSI Technology, Inc., in 1987, where he worked on semicustom ASIC design. In 1989, he joined MIPS Computer Systems, where he designed high-performance floating-point blocks. Since 1992, he has been with Rambus Inc., Mountain View, CA, where he has specialized in the design of high-speed I/O and PLL/DLL clock recovery and data synchronization circuits.

Charles Huang received the B.S. degree in electrical engineering from the University of Fuzhou, China, in 1982 and the M.S. degree in electrical engineering from the University of Arkansas, Fayetteville, in 1990.

He was with ULSI and SGI, working in the area of PLL and cache circuit design. He joined Rambus, Inc., Mountain View, CA, in 1994, where he has been engaged in high-speed CMOS DLL and I/O circuit design.




Chanh V. Tran was born in Vietnam in 1964. He received the B.S. degree in electrical engineering and computer science from the University of California, Berkeley, in 1989.

From 1989 to 1992, he was with National Semiconductor Corp., Santa Clara, CA, where he worked on CMOS mixed-signal IC design in the Data Acquisition Group. In 1992, he joined Rambus Inc., Mountain View, CA, where he has been involved in DLL and high-speed I/O design.

Clemenz L. Portmann (S'92–M'95) received the B.S.E.E. degree from the University of Washington, Seattle, in 1986, the M.S.E.E. degree from the University of Hawaii at Manoa, Honolulu, in 1988, and the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, in 1995.

From 1988 to 1989, he was a Visiting Researcher at Nagoya University, Nagoya, Japan, and the Toyohashi University of Technology, Toyohashi, Japan, under the Monbusho (Ministry of Education) scholarship program. From 1989 to 1990, he was a Design Engineer for VLSI Technology, Inc., San Jose, CA, where he designed standard cell libraries and SRAMs for ASIC designs. In 1995, he joined Rambus, Inc., Mountain View, CA, where he is engaged in the design of high-speed I/O circuits and DLLs for DRAM interfaces.

Donald Stark received the B.S. degree from the Massachusetts Institute of Technology, Cambridge, in 1985 and the M.S. and Ph.D. degrees from Stanford University, Stanford, CA, in 1987 and 1991, respectively, all in electrical engineering.

His research interests at Stanford included circuit design and CAD tools for analysis of voltage and current distributions in VLSI circuits. From 1987 to 1991, he was also a Member of the Western Research Laboratory, Digital Equipment Corp., Palo Alto, CA, working on CAD development and ECL circuit design. From 1991 to 1993, he was with the Semiconductor Device Engineering Laboratory, Toshiba Corp., Kawasaki, Japan, working on DRAM design. In 1993, he joined Rambus, Inc., Mountain View, CA, where he currently works on DRAM, high-speed I/O design, and CAD.

Yiu-Fai Chan (S'76–M'78) received the B.S. and M.S. degrees in electrical engineering and computer science (with highest honors) from the University of California (UC), Berkeley, in 1972 and 1973, respectively.

He joined Rambus, Inc., Mountain View, CA, in 1992, where he is Director of Engineering, responsible for the development, application engineering, and customer support of high-speed mixed-signal circuits, device packaging, signal integrity, and system engineering. Prior to that, he was with Tera Microsystems in charge of developing chips for workstations based on the Sparc architecture. He was with Altera Corp. from 1983 to 1990, where he led a team of engineers to develop the industry's first CMOS programmable logic devices. From 1976 to 1983, he held various technical and management positions at Intersil, Inc. (later a division of General Electric), where he was engaged in the development of various CMOS memories, microprocessors, and peripheral devices. It was there that he developed the first EPROM devices in CMOS technology. From 1974 to 1976, he designed calculator and TV game integrated circuits at National Semiconductor. He has received several patents in circuits and systems technologies.

Mr. Chan is a member of Tau Beta Pi, Phi Beta Kappa, and Eta Kappa Nu. He received the University Science Fellowship from UC Berkeley and conducted research on solid-state devices and microwave acoustics. He has published in various IEEE technical publications and presented papers at IEEE technical conferences.

Thomas H. Lee (S'87–M'87), for a photograph and biography, see this issue, p. 585.

Mark A. Horowitz, for a photograph and biography, see p. 528 of the April 1999 issue of this JOURNAL.
