    632 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 5, MAY 1999

    A Portable Digital DLL for High-Speed CMOS Interface Circuits

    Bruno W. Garlepp, Kevin S. Donnelly, Associate Member, IEEE, Jun Kim, Pak S. Chau, Jared L. Zerbe, Charles Huang, Chanh V. Tran, Clemenz L. Portmann, Member, IEEE, Donald Stark, Yiu-Fai Chan, Member, IEEE, Thomas H. Lee, Member, IEEE, and Mark A. Horowitz

    Abstract: A digital delay-locked loop (DLL) that achieves infinite phase range and 40-ps worst case phase resolution at 400 MHz was developed in a 3.3-V, 0.4-µm standard CMOS process. The DLL uses dual delay lines with an end-of-cycle detector, phase blenders, and duty-cycle correcting multiplexers. This more easily process-portable DLL achieves jitter performance comparable to a more complex analog DLL when placed into identical high-speed interface circuits fabricated on the same test-chip die. At 400 MHz, the digital DLL provides 250 ps peak-to-peak long-term jitter at 3.3 V and operates down to 1.7 V, where it dissipates 60 mW. The DLL occupies 0.96 mm².

    Index Terms: Delay circuits, delay-locked loops (DLLs), digital control, digital DLL, phase blending, phase control, phase synchronization.

    I. INTRODUCTION

    IN RECENT years, there has been a great deal of interest in delay-locked loops (DLLs) for clock alignment. Both analog and digital DLLs have been developed [1]–[6], with

    analog loops generally providing better jitter performance

    at the expense of greater complexity. This paper describes

    a digital DLL that achieves jitter performance comparable

    to an analog DLL. Although the digital DLL uses more

    area and power than the analog DLL, its greater simplicity, easier portability, and lower minimum required supply voltage

    make it very attractive in many clock alignment applications.

    Additionally, the digital DLL not only operates at lower supply

    voltages than the analog DLL but it also demonstrates that

    digital DLLs have the potential for good power-consumption

    scaling as supply voltage is decreased.

    The motivation for the development of this digital DLL

    was the need for a clock alignment circuit for use in the

    CMOS interface cells [6] of a high-speed memory system

    as in [7] (see footnote 1). The memory system operates at 400 MHz, with

    data transferred on both edges of the clock, producing an

    effective 800-Mb/s/pin transfer rate. This corresponds to a

    1.25-ns bit time. With such tight timing requirements, it

    becomes imperative to include clock alignment circuits in

    Manuscript received September 15, 1998; revised December 23, 1998.

    B. W. Garlepp, K. S. Donnelly, J. Kim, P. S. Chau, J. L. Zerbe, C. Huang, C. V. Tran, C. L. Portmann, D. Stark, and Y.-F. Chan are with Rambus, Inc., Mountain View, CA 94040 USA.

    T. H. Lee and M. A. Horowitz are with the Center for Integrated Systems, Stanford University, Stanford, CA 94305 USA.

    Publisher Item Identifier S 0018-9200(99)03668-9.

    Footnote 1: Documentation is available at http://www.rambus.com/html/direct_documentation.html.

    the interface cells to provide internal on-chip clocks that

    are aligned in phase with an external system clock. The

    clock alignment circuits must provide a phase resolution

    better than 50 ps and produce a worst case long-term jitter

    of less than 250 ps peak-to-peak (pp). To facilitate the

    use of many different application-specific integrated-circuit

    controllers with the memory system, the clock alignment

    circuit should be easily portable across multiple processes

    without compromising performance.

    The clock alignment function can be provided using either phase-locked loops (PLLs) or DLLs. Because frequency synthesis is not needed in this application, DLLs are preferred for

    their unconditional stability, lower phase-error accumulation,

    and faster locking time. In previous designs of the interface

    cells for this memory system, we have used an analog DLL

    with a two-step coarse/fine architecture. A high-level drawing

    of this approach is shown in Fig. 1. This analog DLL includes

    a quadrature generator, which produces four reference signals

    spaced 90° apart in phase to evenly cover the full 360° of phase space. A phase interpolator circuit in the analog DLL receives these reference signals and selects a phase-adjacent pair that defines a phase quadrant for interpolation to produce an output signal phase-aligned to a reference signal, RefClk.

    Analog DLLs constructed with this approach provide sev-

    eral significant benefits. Because most of the elements in the

    signal path can be made from differential analog blocks with

    good power-supply rejection ratio (PSRR), the analog DLL

    architecture of Fig. 1 can provide very good jitter performance.

    Additionally, it can be carefully designed to occupy relatively

    little area and consume relatively little current. Furthermore,

    the analog DLL can provide very small phase steps when

    locked ( 50 ps). Finally, the architecture of Fig. 1 provides

    infinite phase range, and one set of quadrature reference

    signals can be fed to multiple phase interpolators, allowing phase alignment to multiple reference signals simultaneously.

    However, because of the relatively high analog complexity of

    this DLL and its individual elements, the analog DLL of Fig. 1

    requires a detailed, process-specific implementation, making it

    relatively labor intensive to port across multiple processes.

    Although we have traditionally used analog DLLs to pro-

    vide the clock alignment function in the CMOS interface

    cells of the memory system described above, we decided to

    consider using a digital DLL. Digital DLLs are characterized

    by their use of a digital delay line and are typically made from

    0018-9200/99$10.00 © 1999 IEEE

    GARLEPP et al.: PORTABLE DIGITAL DLL 633

    Fig. 1. Block diagram of a two-step, coarse/fine analog DLL architecture.

    simple, digital circuit elements. This facilitates their design and

    portability across multiple processes. Additionally, because

    phase information in a digital DLL is stored as a digital

    state, digital DLLs can provide very fast timing recovery after being placed into a low power mode. However, conventional

    digital DLLs provide only moderate phase resolution and jitter

    performance [8], [9].

    Another benefit of digital DLLs is their ability to readily

    operate at lower voltages than analog DLLs. Because analog

    DLLs require the use of saturated current sources, they

    experience voltage headroom problems as supply voltages

    decrease. Digital DLLs, on the other hand, need only enough

    voltage to ensure the proper operation of their digital gate

    elements. For the same reason, digital DLLs better utilize

    the power-saving benefits of digital CMOS voltage scaling

    than analog DLLs. The power of an analog DLL is typically distributed between $IV$ power (where $I$ is current and $V$ is voltage) from the constant current (differential) stages and $CV^2 f$ power (where $C$ is capacitance and $f$ is frequency) from the CMOS (single-ended) stages (if any). The power of digital DLLs, on the other hand, is determined primarily by $CV^2 f$ power, which decreases quadratically with supply voltage.
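The scaling argument above can be made concrete with a short calculation. The following is only an illustrative sketch: the capacitance and bias-current values are hypothetical placeholders (they cancel in the ratios), and it assumes a constant-current stage keeps the same bias current at both supplies. Only the 3.3-V and 1.7-V supply values and the 400-MHz clock come from the text.

```python
def dynamic_power(c_farads, v_volts, f_hertz):
    """Switching power of single-ended CMOS stages: P = C * V^2 * f."""
    return c_farads * v_volts ** 2 * f_hertz


def bias_power(i_amps, v_volts):
    """Power of a constant-current (differential) stage: P = I * V."""
    return i_amps * v_volts


# Hypothetical component values; they cancel out of the ratios below.
C_EFF = 50e-12   # effective switched capacitance (assumed)
I_BIAS = 1e-3    # bias current of an analog stage (assumed)
F_CLK = 400e6    # clock frequency from the text

ratio_dynamic = dynamic_power(C_EFF, 1.7, F_CLK) / dynamic_power(C_EFF, 3.3, F_CLK)
ratio_bias = bias_power(I_BIAS, 1.7) / bias_power(I_BIAS, 3.3)

print(f"CV^2f power at 1.7 V relative to 3.3 V: {ratio_dynamic:.3f}")
print(f"IV power at 1.7 V relative to 3.3 V:    {ratio_bias:.3f}")
```

Under these assumptions the quadratic term falls to roughly a quarter of its 3.3-V value, while the linear term only falls to about half, which is the qualitative point the paragraph makes.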

    This paper describes a digital DLL [10] used as the clock

    alignment circuit in the CMOS interface cells of a high-speed

    memory system. This work improves upon the performance of

    previous digital DLLs by paralleling the two-step coarse/fine

    analog DLL architectures presented in [4], [5], [7], and [11],

    allowing the digital DLL to achieve jitter performance comparable to the analog DLLs.

    This paper is arranged as follows. Section II describes

    delay-generation techniques used in conventional digital

    DLLs and describes the improved techniques implemented

    in the new DLL. This section also describes infinite phase

    generation with the new delay-line scheme. Section III

    describes several new circuit techniques used for enhancing

    the phase resolution and signal quality in the new digital DLL.

    Section IV describes the overall DLL architecture. Section V

    discusses our test chip and measured results, with special

    attention given to making a direct, side-by-side comparison of

    the new digital DLL with an analog DLL placed into identical

    CMOS interface cells on the same test-chip die. Section VI

    concludes this paper.

    The terms phase and delay are used throughout this paper to describe the DLL's operation. It is helpful to recall that at a given system frequency, the two quantities are related by the simple equation

    $$\phi = 360^\circ \cdot t_d \cdot f \qquad (1)$$

    where $\phi$ is phase in degrees, $t_d$ is delay in seconds, and $f$ is frequency in hertz.
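As a quick check of the phase/delay relation (phase in degrees equals 360 times delay times frequency), the sketch below converts a few delays quoted in this paper at the 400-MHz system frequency. The function names are illustrative and not taken from the paper.

```python
def delay_to_phase_deg(delay_s, freq_hz):
    """Phase in degrees corresponding to a delay at a given frequency."""
    return 360.0 * delay_s * freq_hz


def phase_deg_to_delay(phase_deg, freq_hz):
    """Delay in seconds corresponding to a phase at a given frequency."""
    return phase_deg / (360.0 * freq_hz)


F_SYS = 400e6  # system clock frequency from the text

bit_time_phase = delay_to_phase_deg(1.25e-9, F_SYS)   # one 1.25-ns bit time
step_phase = delay_to_phase_deg(40e-12, F_SYS)        # one 40-ps phase step
half_cycle_delay = phase_deg_to_delay(180.0, F_SYS)   # 180 degrees as a delay

print(f"1.25 ns -> {bit_time_phase:.2f} degrees")
print(f"40 ps   -> {step_phase:.2f} degrees")
print(f"180 deg -> {half_cycle_delay * 1e9:.3f} ns")
```

At 400 MHz the period is 2.5 ns, so one bit time is half a cycle and a 40-ps step is a little under 6 degrees.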

    II. DIGITAL DELAY CIRCUIT TECHNIQUES

    A. Conventional Digital Delay Lines

    As mentioned above, the purpose of a DLL in a clock

    alignment application is to provide an output clock signal that is aligned in phase with a reference clock signal of the same

    frequency. To do this, the DLL must include a mechanism for

    providing a variable delay to an input signal. The DLL then

    adjusts this variable delay such that the input signal passes

    through the delay mechanism and emerges at the output of the

    DLL aligned in phase with the reference signal.

    Digital DLLs generally incorporate a tapped digital delay

    line as the variable-delay mechanism. The delay line receives

    an input clock signal (e.g., a buffered version of the reference

    signal) and passes it through a series of delay elements. The

    outputs of the delay elements are tapped and buffered to

    provide a series of phase-adjacent signals. The DLL then

    selects the delay-line tap that provides the signal that produces an output with a phase that most closely matches the desired

    phase.
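The tap-selection idea described above can be sketched as a nearest-phase search. This is a simplified behavioral model, not the control logic of any particular DLL: it assumes ideal, identical delay elements, a tap after every second inverter, and a hypothetical inverter delay of 100 ps at 400 MHz.

```python
def tap_phases_deg(n_taps, t_inv_s, freq_hz, inverters_per_tap=2):
    """Phase of each tap relative to the delay-line input, in degrees."""
    step = 360.0 * inverters_per_tap * t_inv_s * freq_hz
    return [(k * step) % 360.0 for k in range(n_taps)]


def phase_distance_deg(a, b):
    """Shortest angular distance between two phases."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def select_tap(phases, target_deg):
    """Index of the tap whose phase is closest to the target phase."""
    return min(range(len(phases)),
               key=lambda k: phase_distance_deg(phases[k], target_deg))


# Hypothetical operating point: 100-ps inverters, 400-MHz clock, 13 taps.
phases = tap_phases_deg(n_taps=13, t_inv_s=100e-12, freq_hz=400e6)
chosen = select_tap(phases, target_deg=100.0)
print(f"tap spacing: {phases[1]:.1f} degrees, chosen tap index: {chosen}")
```

With these numbers each tap is 28.8 degrees (200 ps) from the next, so the residual error after selection can be as large as half a tap spacing, which is the coarse-resolution limitation discussed next.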

    A conventional delay line suitable for a CMOS digital DLL

    is shown in Fig. 2. The delay elements could be implemented

    with almost any circuit block, but because the phase resolution

    of the delay line is determined by the delay through the delay

    elements, delay elements that provide minimal delay are gen-

    erally preferred. Thus, the delay line of Fig. 2 uses inverters,

    since they provide the shortest delay of any CMOS digital gate.

    Because of the inverting characteristic of all standard CMOS

    gates, the delay line is tapped only at every other inverter

    Fig. 2. Conventional digital delay line with inverter delay elements.

    Fig. 3. Complementary delay line with inverter delay elements for improved phase resolution.

    output to ensure that each successive tap provides a signal

    that is adjacent in phase to the signals at its adjacent taps.

    Although conventional delay lines are attractive for their

    simplicity, DLLs designed around such conventional delay

    lines suffer from several significant limitations. First, the delay

    line provides fairly coarse phase resolution. For example, the

    delay line in Fig. 2 provides a minimum phase step corre-

    sponding to two inverter delays. Such coarse phase resolution

    is not fine enough for our clock alignment application. Second,

    conventional delay lines deliver only a finite phase range.

    Typically, in order to cover at least one full cycle of phase, the

    delay-line length and element delays are adjusted to provide at least 360° of phase under the fastest process, voltage, and temperature (PVT) conditions and minimum operating frequency. More often, however, the delay line is designed with as much as 720° (i.e., two cycles)

    of phase under these conditions. This requires the use of a

    long delay line, occupying a large silicon area and dissipating

    additional power as the input signal propagates through the

    many delay elements. Additionally, because inverters offer

    poor PSRR, voltage supply noise-induced jitter can accumulate

    as the signal propagates down the delay line. This causes

    the signals available from the later taps in the delay line

    to be more jitter prone than the signals from the earlier

    taps. Last, even with an extended delay line, the DLL can nonetheless run out of phase range and lose lock in a system with slowly drifting phase (e.g., spread-spectrum clocking).

    These limitations prohibited the use of a conventional delay

    line in our DLL design.

    B. Delay-Line Improvements

    To overcome some of these limitations, we developed a

    complementary delay line as shown in Fig. 3 for our DLL.

    In this architecture, two parallel delay lines with weak cross

    coupling are driven by complementary input signals ClkIn and

    ClkInb. Because of the use of complementary inputs, the two

    delay lines are tapped after every inverter to provide phase-

    adjacent signals separated by only one inverter delay, thereby

    improving the phase resolution by a factor of two. An example

    of how this delay-line scheme provides single inverter delay

    resolution is shown by the shaded paths in Fig. 3. The signal

    that emerges from Tap 2 has passed through three inverter

    delays, while the signal that emerges from Tap 3 has passed

    through four inverter delays. However, ClkInb is exactly 180° out of phase with ClkIn, providing the additional inversion required to ensure that the signals emerging from Taps 2 and 3 are indeed separated in phase by exactly one inverter delay.

    This complementary delay-line architecture also allows the delay lines to be made shorter. The true taps from the delay line can provide the first 180° of phase, while the complement taps can provide the second 180° of phase. Thus, each of the two delay lines can be tuned for only 180° of phase under the fastest PVT conditions and minimum operating frequency. Shorter delay lines provide the additional benefits of reduced maximum jitter accumulation, smaller silicon area, and lower power consumption. The problem that this design creates is a need to determine when to switch from the true taps to the complement taps and vice versa to ensure full and even coverage of the entire 360° phase plane. This is particularly important because the number of delay elements (and output taps) needed to cover 180° changes with PVT conditions and operating frequency.
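To see how the required line length depends on the phase span, the sketch below counts the inverter stages needed to cover a given span at the fastest corner. It is a back-of-the-envelope model under stated assumptions: the 100-ps fastest inverter delay is the figure quoted later in the paper, while the 2500-ps minimum-frequency period (400 MHz) is a hypothetical choice, since the minimum operating frequency is not given in this section.

```python
import math


def stages_needed(phase_span_deg, period_ps, t_inv_fast_ps):
    """Inverter stages needed so the line spans phase_span_deg.

    period_ps      clock period at the minimum operating frequency
    t_inv_fast_ps  inverter delay under the fastest PVT conditions
    """
    span_ps = phase_span_deg * period_ps / 360.0
    return math.ceil(span_ps / t_inv_fast_ps)


PERIOD_PS = 2500      # assumed: 400 MHz taken as the minimum frequency
T_INV_FAST_PS = 100   # fastest inverter delay quoted in the paper

conventional_360 = stages_needed(360, PERIOD_PS, T_INV_FAST_PS)
conventional_720 = stages_needed(720, PERIOD_PS, T_INV_FAST_PS)
complementary_each = stages_needed(180, PERIOD_PS, T_INV_FAST_PS)

print("conventional line, 360 degrees:", conventional_360)
print("conventional line, 720 degrees:", conventional_720)
print("each complementary line, 180 degrees:", complementary_each)
```

Under these assumptions each complementary line needs roughly half as many stages in series as a 360° conventional line, so a signal passes through fewer noisy elements before reaching the last tap.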

    C. Infinite Phase Generation

    To solve the problem of determining when to switch be-

    tween the true and complement taps of the complementary

    delay line, we developed an end-of-cycle (EOC) detector, as

    shown in Fig. 4, for use with the complementary delay line. An

    EOC detector is essentially a bank of data flip-flops arranged

    as a time-to-digital converter for measuring the delay through

    the delay line. The EOC detector produces a thermometer code

    Fig. 4. EOC detector circuit (180°).

    Fig. 5. Phasor diagram with phasors of signals from the taps of a complementary delay line with one inverter delay equal to 50°.

    indicating the first 180° of delay in the delay lines. The first state transition in the EOC code indicates the first true tap from the delay line that provides a signal with phase that lags the phase of the signal from Tap 1 by more than 180°. With this information, the DLL logic knows when to switch between the true and complement taps of the delay line to ensure full coverage of all 360° of phase space, with phase

    steps of at most one inverter delay. Use of the EOC code also

    prevents negative phase steps in the phase-transfer function as

    taps are successively selected from the delay line. This allows

    the complementary delay lines to provide infinite, monotonic

    phase range for the DLL. The clocking signal for the EOC

    detector, SampClk, is synchronized to the signal from Tap 1

    by a replica timing network (not shown).

    To illustrate the principle of infinite phase generation using the EOC code with this delay-line scheme, refer to Fig. 5, which shows a phasor diagram of the signals from the first five true and complement taps of a complementary delay line like the one shown in Fig. 3. The figure assumes that the PVT conditions and operating frequency are such that the propagation delay of each inverter stage is equal to 50° of phase. In the figure, the solid lines correspond to signals from the true taps, while dashed lines correspond to signals from the complement taps. Because Tap 5 delivers a signal that is delayed by 200° from the signal at Tap 1, the EOC detector's thermometer code would indicate that Tap 5 is the first true tap to provide a signal with phase beyond 180° relative to the signal from Tap 1. With this information, the DLL knows to switch between the true and complement taps after four stages. In other words, to travel counterclockwise around the phase plane, the DLL would successively select Taps 1–4, then Taps 1b–4b, then Taps 1–4, etc., to provide infinite phase range. In this manner, all phase steps are equivalent to at most one inverter delay (i.e., 50°), except for the Tap 4 to Tap 1b and the Tap 4b to Tap 1 transitions, which are less (30°).
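The 50° example can be reproduced with a small behavioral model. This is a sketch of the idea only: the thermometer code is computed from ideal tap phases rather than sampled by flip-flops, and the function names are illustrative.

```python
def eoc_thermometer(stage_delay_deg, n_taps):
    """1 for each true tap that lags Tap 1 by more than 180 degrees, else 0."""
    return [1 if k * stage_delay_deg > 180.0 else 0 for k in range(n_taps)]


def taps_per_half_cycle(code):
    """Number of taps usable on one line before switching to the other line."""
    return code.index(1) if 1 in code else len(code)


def tap_sequence(stage_delay_deg, n_taps, n_steps):
    """Successive (tap name, phase) selections when stepping around the phase plane."""
    usable = taps_per_half_cycle(eoc_thermometer(stage_delay_deg, n_taps))
    sequence = []
    for step in range(n_steps):
        half_cycle, k = divmod(step, usable)
        complement = half_cycle % 2 == 1
        # Complement taps carry an extra 180 degrees from the inverted input.
        phase = (k * stage_delay_deg + (180.0 if complement else 0.0)) % 360.0
        sequence.append((f"Tap {k + 1}{'b' if complement else ''}", phase))
    return sequence


code = eoc_thermometer(stage_delay_deg=50.0, n_taps=5)
seq = tap_sequence(stage_delay_deg=50.0, n_taps=5, n_steps=9)
steps = [(b[1] - a[1]) % 360.0 for a, b in zip(seq, seq[1:])]

print("EOC code:", code)
print("selection order:", [name for name, _ in seq])
print("phase steps (degrees):", steps)
```

The model yields the order Tap 1 to Tap 4, Tap 1b to Tap 4b, then Tap 1 again, with 50° steps everywhere except the two 30° line-switching transitions, and no negative steps.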

    III. RESOLUTION-ENHANCING CIRCUIT TECHNIQUES

    A. Phase Blending

    Although the delay-line improvements discussed above reduced the required power and area of the delay line, improved its jitter accumulation performance, enabled infinite phase range, and improved the available phase resolution by a factor of two, this phase resolution was still not good enough to meet the requirements of our memory system. In the 0.4-µm process we used, the propagation delay of one inverter over all anticipated PVT conditions varied from 100 to 300 ps. This is much larger than the worst case phase step specification of 50 ps. Therefore, to ensure compliance with this specification, the DLL's phase resolution needed to be improved by at least six times over what the delay line provided.

    To solve this problem, we used inverter phase blending. A simple, single-stage phase-blender circuit is shown in Fig. 6(a). This circuit receives two phase-adjacent input signals, which are separated in phase by one inverter delay. The phase blender directly passes these two signals with a simple delay to produce two output signals. However, it also uses a pair of phase-blending inverters to interpolate between these two input signals to produce a third output signal, having a phase between that of the two directly passed outputs. This effectively doubles the available phase resolution.

    However, it is not sufficient to use equal-sized inverters for the phase blending. Fig. 6(b) illustrates a simple model [12] used for determining the ideal relative sizes of the two phase-blending inverters to ensure that the phase of the blended output lies directly between that of the two directly passed outputs. The model approximates the two inverters with two simple switched current sources sharing a common resistance-capacitance (RC) load. For two rising edge input signals separated in time by $\Delta t$, the model yields the equation

    $$V_{out}(t) = V_{DD} + w I R \left[e^{-t/RC} - 1\right] u(t) + (1 - w) I R \left[e^{-(t - \Delta t)/RC} - 1\right] u(t - \Delta t) \qquad (2)$$

    Fig. 6. Phase blending for phase-resolution improvement. (a) Single-stage phase-blender circuit, (b) simple model of phase-blending inverters, (c) plot of signal voltages in the simple model, (d) and (e) phase-blender output signal edges for two different inverter size ratios.

    where $R$ is the total resistive load, $C$ is the output capacitance, $I$ is the total pulldown current of the two phase-blending inverters, $u(t)$ is the unit step function, and $w$ is the phase-blending inverter relative size ratio [refer to Fig. 6(a), where $w$ is the ratio of the device widths in one inverter to the total device widths in both inverters]. Equation (2) is the sum of two decaying exponential terms, and Fig. 6(c) shows a plot of the resulting waveform according to this equation. Because the second exponential term is delayed in time relative to the first, it only begins to affect the slope of the decay after this delay has elapsed. Therefore, without explicitly solving the equation for each case of size ratio and input time separation, it is not obvious when the output will cross the switching threshold.

    For input signals separated in phase by one inverter delay, the model specifies that in order to ensure that the phase of the blended output lies directly in between that of the two directly passed outputs, the phase-blending inverters must be sized asymmetrically, such that the leading phase is coupled to an inverter that is bigger than the one that receives the lagging phase. This sizing was also confirmed empirically with simulations. The effect of the relative sizing of the phase-blending inverters is illustrated in Fig. 6(d) and (e), which show the resulting output signal edges for two different size ratios. Clearly, the phase of the blended output signal is closer to that of one directly passed output than to that of the other when the phase-blending inverter size ratio is not chosen correctly. Although asymmetrical inverter sizing ensures good, evenly spaced edge placement of the three output signals, it requires that the input coupled to the larger inverter lead the other input. Reversing the phase of these two input signals would result in a severely misplaced blended output, since the effective sizing ratio would then be reversed.

    Another design constraint of the phase-blender circuit is that all paths through the circuit must provide precisely the same loading and delay to ensure that the phase relationship between the two input signals is maintained by the two directly passed output signals.
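The sizing argument can be explored numerically with the two-current-source model. The sketch below is an illustration under several assumptions that are not taken from the paper: the full swing I·R equals the supply, the switching threshold is half the supply, time is measured in units of RC, and the input edges are one RC apart. The edges for size ratios of 1 and 0 stand in for the leading-only and lagging-only cases.

```python
import math


def v_out(t, w, dt, rc=1.0, swing=1.0, vdd=1.0):
    """Output voltage of the two-current-source model at time t.

    w      fraction of the total pulldown current switched at t = 0
    dt     time separation between the two input edges
    swing  I*R, the full output swing (assumed equal to vdd)
    """
    def pulldown(fraction, t_on):
        if t < t_on:
            return 0.0  # unit step: this source has not switched yet
        return fraction * swing * (math.exp(-(t - t_on) / rc) - 1.0)

    return vdd + pulldown(w, 0.0) + pulldown(1.0 - w, dt)


def crossing_time(w, dt, threshold=0.5, rc=1.0):
    """Time at which the falling output crosses the threshold (bisection)."""
    lo, hi = 0.0, dt + 20.0 * rc
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if v_out(mid, w, dt, rc) > threshold:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def centered_ratio(dt, rc=1.0):
    """Size ratio that places the blended edge midway between the two extremes."""
    target = 0.5 * (crossing_time(1.0, dt, rc=rc) + crossing_time(0.0, dt, rc=rc))
    lo, hi = 0.5, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if crossing_time(mid, dt, rc=rc) > target:
            lo = mid  # edge too late: strengthen the leading inverter
        else:
            hi = mid
    return 0.5 * (lo + hi)


DT = 1.0  # assumed edge separation, in units of RC
midpoint = 0.5 * (crossing_time(1.0, DT) + crossing_time(0.0, DT))
print(f"ideal midpoint edge:       {midpoint:.3f} RC")
print(f"equal-size blended edge:   {crossing_time(0.5, DT):.3f} RC")
print(f"ratio that centers the edge: {centered_ratio(DT):.3f}")
```

In this simplified setting, equal-sized inverters place the blended edge later than the midpoint, and centering it requires giving the leading input more than half of the total width, in line with the asymmetric sizing described above. The specific ratio depends on the assumed edge separation and threshold.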

    The phase-blender idea can be extended to multiple cas-

    caded stages for further phase-resolution improvement, with

    each additional stage improving the resolution by a factor of

    two. Fig. 7 shows a two-stage cascaded phase-blender circuit

    that provides a 4x improvement in phase resolution from input

    to output. Although it is theoretically possible to increase phase

    resolution indefinitely by adding more and more phase-blender

    stages, there is a practical limit. The number of inverters in

    each signal path increases by two with each additional phase-

    blending stage, making the circuit increasingly susceptible

    to voltage supply noise-induced jitter due to the additional

    delay in the signal path. Therefore, it is prudent to increase

    the number of blending stages to improve phase resolution

    only until the output phase step size from the phase blender

    is approximately equivalent to the anticipated voltage supply

    noise-induced jitter.
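The trade-off between resolution and added path delay can be tabulated directly. The sketch below assumes ideal halving of the step per stage and uses the 300-ps slowest inverter delay and 50-ps step specification quoted in the text; the two-inverters-per-stage figure is also from the text.

```python
def blender_stages_needed(coarse_step_ps, target_step_ps):
    """Smallest number of blending stages whose output step meets the target.

    Each stage is assumed to halve the phase step exactly.
    Returns (number of stages, resulting step in ps).
    """
    stages = 0
    step = float(coarse_step_ps)
    while step > target_step_ps:
        step /= 2.0
        stages += 1
    return stages, step


stages, fine_step = blender_stages_needed(coarse_step_ps=300, target_step_ps=50)
added_inverters = 2 * stages  # two more inverters in the path per stage

print(f"stages needed: {stages}")
print(f"ideal fine step at the slow corner: {fine_step} ps")
print(f"inverters added to each signal path: {added_inverters}")
```

With the slowest inverter delay, two stages would still leave a 75-ps step, so three stages (an 8x improvement) are the minimum that meets the 50-ps requirement in this idealized model.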

    There are several design limitations that must be considered when designing a cascaded phase blender. First, the importance of proper (asymmetrical) sizing of the phase-blending

    inverters grows with the number of cascaded blending stages

    because edge misplacement has a compounding effect as the

    signals travel through the multiple stages. Additionally, close

    attention must be paid to ensuring equal loading for equal

    delay through all paths, requiring the use of dummy devices

    on otherwise unbalanced paths. Finally, like a single-stage phase blender, a cascaded phase blender also requires that a particular one of its two inputs lead the other in phase to ensure even output phase spacing.

    Fig. 7. Two-stage, cascaded phase-blender circuit for 4x phase-resolution improvement.

    Fig. 8. Three-stage, symmetrical phase-blender circuit.

    To overcome these design limitations of the cascaded phase blender, we developed a symmetrical phase blender. A block diagram of a three-stage symmetrical phase blender is shown in Fig. 8. This circuit is essentially two parallel cascaded phase-blender circuits, sharing some common paths. When the first input leads the second, one set of outputs provides equal output phase spacing. When the second input leads the first, the other set of outputs provides equal output phase spacing. Therefore, the circuit provides phase blending with an 8x improvement in phase resolution and equally spaced output signals regardless of which input signal leads in phase.

    Additionally, the symmetrical blender allows for seamless input switching for continuous phase blending over multiple input delays. For example, assume that one input leads the other in phase. Beginning with the output aligned to the leading input, successive outputs can be selected to evenly span the phase range between the two inputs. Once the output aligned to the lagging input is selected, the formerly leading input can be changed to another signal that lags the current one. This switching is possible without affecting the selected output signal because that output has no dependence on or coupling from the input being switched. Then outputs can again be successively selected to evenly span the phase range between the two inputs, now moving across the blender in the opposite direction. Once the output aligned to the newly switched input is selected, the other input can be changed to yet another signal that lags it. Again, this is possible without any change in the selected output signal because that output has no dependence on or coupling from the input being switched. This process can continue indefinitely. Also, because all paths through the symmetrical phase blender are inherently balanced, no dummy devices are needed.

    Fig. 9. (a) A 16:1 duty-cycle correcting multiplexer circuit. (b) Duty-cycle correction control circuit.

    B. Signal Selection and Duty-Cycle Correction

    Since the digital DLL was to be placed into a memory

    system that exchanges data on both edges of the clock, good

    duty cycle (i.e., close to 50%) is required to ensure that the

    data exchanged on either edge of the clock have equal bit

    times. Duty-cycle distortion is usually addressed in PLLs by simply running the PLL's voltage-controlled oscillator (VCO) at twice the system frequency and using a postdivider triggered on one edge of the VCO output to produce the output clock from the PLL [13]–[15]. This ensures good, 50% duty cycle. In

    a DLL, however, no frequency multiplication is possible. The

    duty cycle of the output signal must be directly corrected to

    50%, for example, by using a duty-cycle correcting amplifier

    in the signal path as in Fig. 1 and in [4].
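The link between duty cycle and bit time is simple arithmetic, sketched below for the 400-MHz clock of this memory system. The 55% duty cycle is a hypothetical distortion chosen only for illustration.

```python
def bit_times_ps(freq_hz, duty_cycle):
    """Bit times for data launched on the rising and falling clock edges.

    Returns (high-phase bit time, low-phase bit time) in picoseconds.
    """
    period_ps = 1e12 / freq_hz
    high_ps = period_ps * duty_cycle
    return high_ps, period_ps - high_ps


ideal = bit_times_ps(400e6, 0.50)
distorted = bit_times_ps(400e6, 0.55)  # hypothetical 55% duty cycle

print(f"50% duty cycle: {ideal[0]:.0f} ps / {ideal[1]:.0f} ps")
print(f"55% duty cycle: {distorted[0]:.0f} ps / {distorted[1]:.0f} ps")
```

A five-point duty-cycle error shortens one of the two bit times by 125 ps out of 1250 ps, which is why the output clock has to be held close to 50%.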

    Although duty-cycle correction can be addressed by placing

    a duty-cycle corrector at the output of the DLL, this approach

    has several limitations. First, since duty cycle is corrected only

    at the output of the DLL, internal DLL signals may have

    poor duty cycle. It is good practice, however, to maintain

    50% duty cycle throughout the signal path to maximize signal propagation as frequency is increased. Second, performing all

    the duty-cycle correction in one stage at the output of the

    DLL places a great deal of strain on the duty-cycle correcting

    circuit; it must have a large duty-cycle correction range to

    compensate for all the duty-cycle distortion that can accumu-

    late in the signal path. Finally, adding a duty-cycle corrector

    directly into the signal path increases signal path delay, and

    thus susceptibility to voltage supply noise-induced jitter.

    To address the issue of duty cycle, we developed the

    idea of duty-cycle correcting multiplexers. Since multiplexers

    would be needed in our DLL regardless, by adding duty-

    cycle correcting functionality to the multiplexing circuitry, we

    implemented duty-cycle correction while requiring minimal

    additional power, area, and delay.

    A 16:1 duty-cycle correcting multiplexer is shown in Fig. 9(a) with a corresponding control circuit in Fig. 9(b). To facilitate understanding of this circuit's operation, consider an example. Assume that one input signal is selected and has duty-cycle distortion such that the multiplexer output signal has a high duty cycle. Assume also that this duty cycle is sensed by a duty-cycle error detector, which produces a differential output error signal proportional to the difference in duty cycle between the sensed signal and the ideal 50%. Thus, in our example, one side of the differential error signal will be greater than the other, causing more current to be steered through the right branch of the control circuit in Fig. 9(b) than through the left branch. This in turn increases the strength of one pair of correcting transistors compared to the opposing pair in the duty-cycle correcting multiplexer of Fig. 9(a). These transistors alter the duty cycle of the signal as it passes from the multiplexer input to its output, driving the output to the ideal 50% duty cycle. The use of both PMOS and NMOS devices to perform the duty-cycle correction ensures a symmetrical duty-cycle correction range. Furthermore, because duty-cycle correction has been distributed through two stages, the requirements on each individual duty-cycle correcting stage are reduced. By combining both necessary functions of signal selection and duty-cycle correction, this circuit minimizes signal path delay, jitter accumulation, circuit area, and power compared to performing both functions separately.

    IV. DLL ARCHITECTURE

    Fig. 10 is a block diagram of the entire digital DLL, with

    shading indicating the circuit blocks that were described in

    Fig. 10. Complete block diagram of the new digital DLL.

    greater detail above. The DLL receives an input clock ExtClk

    and passes it through a clock amplifier and splitter to provide

    the two complementary input signals (ClkIn and ClkInb) to a

    16-stage, 32-tap complementary delay line with EOC detector.

    The delay line provides 32 signals at its output taps, which then

    feed into two 32 : 1 duty-cycle correcting multiplexers. Each

    multiplexer selects one of a pair of phase-adjacent signals

    from the delay line. The two selected signals then pass to

    a three-stage, 2 : 16 symmetrical phase-blender circuit, which

    improves the phase resolution by a factor of eight. A final 16 : 1

    duty-cycle correcting multiplexer selects one of the phase-blender output signals and passes it through a clock tree to

    provide the DLL's output signal ClkOut. The digital DLL also

    includes two independent duty-cycle correction loops as shown

    in the figure. By using two separated duty-cycle correcting

    loops, duty-cycle correction is distributed throughout the signal

    path. This ensures a good duty cycle throughout the signal path

    and reduces the duty-cycle correcting requirements of any one

    stage.

    The DLL uses bang-bang-type, all-digital feedback to lock

    the phase of its output signal ClkOut to that of a reference

    signal RefClk. A phase detector compares the phase of ClkOut

    to RefClk and produces a binary error signal, which passes through an optional digital filter to a control logic circuit. The

    digital filter is a simple majority detector, which has no effect

    when the loop is acquiring lock but reduces dithering once

    lock is acquired. The control logic is composed of simple

    combinational logic and counters that drive the multiplexers

    to select the two phase-adjacent coarse phase signals from the

    delay line and the fine phase signal from the phase blender

    that minimize the phase error between ClkOut and RefClk.

    Because the phase information is stored in this DLL as a

    digital state, the DLL can quickly recover from low-power

    modes, requiring only enough time for the signals to propagate

    Fig. 11. Test-chip micrograph showing on the left side (a) the analog DLL of [6] and on the right side (b) the new digital DLL integrated into identical interface cells.

    through the signal path of the circuit from ExtClk to ClkOut

    to provide a phase-locked output signal.
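
    The feedback loop described above can be sketched in a few lines of Python. This is a behavioral toy model under stated assumptions, not the authors' control logic: the phase state is reduced to a single integer code, the number of codes (128) is arbitrary, the optional majority filter is omitted, and the names phase_detector and run_loop are hypothetical.

```python
def phase_detector(code: int, target: int, n_codes: int) -> int:
    """Binary early/late decision: +1 requests more delay, -1 requests less."""
    diff = (target - code) % n_codes
    # A bang-bang detector never reports "zero error"; at diff == 0 it still
    # returns -1, so the locked loop dithers by one code.
    return 1 if 0 < diff <= n_codes // 2 else -1


def run_loop(start_code: int, target: int, n_codes: int = 128, cycles: int = 100) -> list:
    """Step an up/down phase counter once per cycle and record its value."""
    code = start_code
    history = []
    for _ in range(cycles):
        # The counter wraps modulo n_codes, so the phase range is unbounded.
        code = (code + phase_detector(code, target, n_codes)) % n_codes
        history.append(code)
    return history


history = run_loop(start_code=0, target=10)
print(history[:12], sorted(set(history[20:])))
```

    Because the loop state in this model is just the integer code, restarting from a stored code (for example run_loop(history[-1], 10)) begins within one step of lock, which mirrors the fast recovery from low-power modes noted above.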

    It is important to recognize the role of the EOC detector and code in this architecture. Because the delay line and blender

    are uncontrolled, open-loop circuits, the architecture relies on

    the control circuit's use of the EOC code to ensure proper

    coarse phase selection, small maximum phase step size, and

    phase transfer function monotonicity. The EOC code enables

    the control logic to determine when to switch between the true

    and complement taps of the delay line to ensure that phase-

    adjacent taps are always selected by the coarse multiplexers

    for the phase blender. The EOC code also enables the control

    logic to determine which set of blender taps provides evenly

    spaced output signals.
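
    One way to picture the switch between true and complement taps is the mapping below. It is a simplified sketch, assuming that the EOC code reports how many tap intervals span 180° and that those intervals are equal; the real delay line is uncontrolled, so its spacing varies with PVT. The function names are hypothetical.

```python
def select_tap(position: int, taps_per_half_cycle: int) -> tuple:
    """Map a coarse phase position to (tap_index, use_complement)."""
    # Two half cycles (true taps, then complement taps) cover 360 degrees,
    # after which the position wraps around.
    position %= 2 * taps_per_half_cycle
    use_complement = position >= taps_per_half_cycle
    tap_index = position % taps_per_half_cycle
    return tap_index, use_complement


def coarse_phase_deg(position: int, taps_per_half_cycle: int) -> float:
    """Idealized coarse phase, assuming equal tap intervals."""
    tap_index, use_complement = select_tap(position, taps_per_half_cycle)
    # A complement tap is the same edge shifted by half a cycle.
    offset_deg = 180.0 if use_complement else 0.0
    return tap_index * 180.0 / taps_per_half_cycle + offset_deg


# Walk once around the phase circle with nine intervals per half cycle.
for position in range(19):
    print(position, select_tap(position, 9), coarse_phase_deg(position, 9))
```

    In this model the phase increases monotonically across the true-to-complement boundary and returns to the starting tap after a full cycle, which is the property the EOC-based selection is meant to guarantee.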

    Fig. 12. Measured transmit eye diagrams at 3.3 V and 400 MHz of the high-speed interface cells with (a) the analog DLL of [6] and (b) the new digital DLL.

    V. MEASURED PERFORMANCE

    A. Test Chip

    Both the digital DLL presented here and an implementation

    of the analog DLL of Donnelly et al. [6] were integrated into

    identical high-speed CMOS interface cells on opposite sides

    of a single test chip. A micrograph of this test chip is shown in

    Fig. 11. The test chip I/O was laid out symmetrically so that

    either interface cell could be tested on the same hardware by

    simply removing the test chip from the test socket, rotating

    it 180° and reinserting it into the socket. This allowed a

    true side-by-side comparison of the two DLLs operating in a

    system. The test-chip circuits were fabricated using a standard

    0.4-µm, 3.3-V CMOS process with 0.65-V threshold voltages.

    B. Test Results

    Unless indicated otherwise, all test results described in this

    section were measured with the analog and digital DLLs

    operating in their respective high-speed interface cells at 3.3 V

    and 400 MHz (800 Mb/s/pin) using the same test vectors.

    Additionally, the test chip included noise-generator circuits, which produced digital switching noise during the testing of

    both interfaces.

    Fig. 12(a) and (b) shows eye diagrams of the two interfaces

    with the analog and digital DLLs, respectively. The diagrams

    indicate the output timing performance of the interface cells

    in the test system. Although the interface with the analog

    DLL provided slightly better timing performance, 320 ps peak-to-peak

    versus 380 ps peak-to-peak for the interface with the digital DLL, the

    performances of both interfaces (and therefore, both DLLs)

    were comparable. This is surprisingly good considering the

    extensive use of poor PSRR elements, such as inverters, in

    the signal path of the digital DLL. (Note: I/O circuit duty-

    cycle distortion produced the unequal eyes in both diagrams.

    This is unrelated to the DLLs.)

    Fig. 13(a) and (b) shows receive shmoo diagrams for the

    two interfaces with the analog and digital DLLs, respectively.

    The diagrams indicate the CMOS interfaces' valid timing windows

    for receiving data. On the diagrams, one axis is supply

    voltage (2.5 V to 4.0 V) while the other axis indicates input

    data positioning along a bit period (1/(800 Mb/s) = 1.25 ns).

    The normal data position is in the center of the bit period. A

    black dot in the diagram indicates incorrectly received data for

    that combination of bit position and supply voltage. Ideally, the window

    should be entirely white, but realistically, it is limited by jitter

    from the DLL and other sources. Therefore, this test measures

    the amount of tolerable skew on the input timing over a range

    of supply voltages. Although the interface with the analog DLL

    delivers better timing performance than the interface with the

    digital DLL (1.02 versus 0.92 ns), both meet the component

    specification of 0.85 ns.
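
    The receive-window figures quoted above can be put in context with a short calculation. The sketch below only restates numbers from the text (800 Mb/s, 1.02 ns, 0.92 ns, 0.85 ns); the variable and function names are illustrative.

```python
BIT_RATE_BPS = 800e6                 # 800 Mb/s/pin at 400 MHz, as stated in the text
BIT_PERIOD_NS = 1e9 / BIT_RATE_BPS   # one bit period in nanoseconds

SPEC_WINDOW_NS = 0.85
MEASURED_WINDOW_NS = {"analog": 1.02, "digital": 0.92}


def window_summary(window_ns: float) -> tuple:
    """Return (margin over the specification in ns, fraction of the bit period)."""
    return window_ns - SPEC_WINDOW_NS, window_ns / BIT_PERIOD_NS


for name, window_ns in MEASURED_WINDOW_NS.items():
    margin_ns, fraction = window_summary(window_ns)
    print(f"{name}: margin {margin_ns:.2f} ns, {fraction:.1%} of the bit period")
```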

    Fig. 14 is a circle plot of the measured phase of the DLL's

    output signal ClkOut, illustrating the DLL's ability to provide infinite phase range. One axis indicates delay [or phase, as in

    (1)] of the ClkOut signal relative to a fixed 400-MHz signal.

    The other axis indicates cycle count. These data were measured by

    probing the on-chip DLL output signal (ClkOut) and forcing

    the DLL's phase-detector output low. This caused the DLL's

    output phase to continually advance over time. The term "circle

    plot" is used because this diagram is equivalent to sweeping a

    phasor that represents the phase of ClkOut around the phase

    plane, thereby drawing a circle in the phase plane. Because

    the phase of ClkOut is measured relative to a fixed 400-MHz

    signal, the plotted delay appears modulo 2.5 ns, the clock period

    Fig. 13. Measured shmoo diagrams showing the 400-MHz receive timing windows of the high-speed interface cells with (a) the analog DLL of [6] and (b) the new digital DLL.

    Fig. 14. Measured circle plot illustrating the infinite phase transfer characteristic of the digital DLL.

    at 400 MHz. The absolute value of delay (i.e., from 3.4

    to 5.9 ns) is irrelevant since it includes some test-system setup

    time. The data were measured and plotted using a time-interval

    analyzer.

    The circle plot illustrates the DLL's phase transfer function,

    showing its reasonably good linearity, monotonicity, and lack

    of discontinuities. The small bumps in the transfer function

    indicate a change in coarse reference phase selected from

    the delay line. The slope of the transfer function depends on

    PVT conditions and system frequency, since these conditions

    determine how many delay-line taps are required to provide

    180° of phase. In this case, nine taps were required, resulting

    in an average phase step size of 20 ps or 2.9°.
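
    The conversion between delay and phase used in this paragraph is simple enough to check directly. The sketch below assumes only the 400-MHz clock stated in the text; the helper names are hypothetical.

```python
def period_ps(freq_hz: float) -> float:
    """Clock period in picoseconds."""
    return 1e12 / freq_hz


def delay_to_degrees(delay_ps: float, freq_hz: float) -> float:
    """Express a delay as a phase angle at the given clock frequency."""
    return 360.0 * delay_ps / period_ps(freq_hz)


def wrap_delay_ps(delay_ps: float, freq_hz: float) -> float:
    """Delay as it appears on a circle plot: modulo one clock period."""
    return delay_ps % period_ps(freq_hz)


print(period_ps(400e6))                         # 2500.0 ps, i.e., 2.5 ns
print(round(delay_to_degrees(20.0, 400e6), 2))  # 2.88 degrees, quoted as 2.9
```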

    Table I presents a summary of many of the measured and

    simulated results of the analog and digital DLLs operating in

    their respective CMOS interfaces. Although the analog DLL

    Fig. 15. Measured DLL power consumption (a) as a function of frequency at a fixed supply voltage and (b) as a function of supply voltage at a fixed frequency.

    TABLE I. ANALOG AND DIGITAL DLL PERFORMANCE SUMMARY AT 3.3 V AND 400 MHz

    uses less power and area, and provides better timing performance (smaller long-term jitter) and phase resolution (smaller

    maximum phase step), both DLLs enable the interface cells to

    meet the component requirements when operating in the test

    system. Additionally, the digital DLL has a higher maximum

    operating frequency, works at lower supply voltages, and

    requires much less effort to port to other processes (one versus

    four man-months).

    Fig. 15(a) and (b) shows plots of measured DLL power

    versus frequency at a fixed supply voltage and measured DLL power

    versus supply voltage at a fixed frequency, respectively. Although

    both plots show that the digital DLL dissipated more power

    than the analog DLL for all measured conditions, the plots

    illustrate the different characteristics of the power consumed by the two DLLs. As mentioned earlier, the power of both DLLs

    is distributed between IV power in the constant-current stages

    and CV²f power in the CMOS stages. The curves in Fig. 15(a)

    show that the digital DLL's power dissipation has a greater

    dependence on frequency than does the analog DLL's power.

    The curves in Fig. 15(b) show that the digital DLL's power

    dissipation has a predominantly square-law dependence on

    supply voltage, whereas the analog DLL's power dissipation

    has a mixed square-law and linear dependence. These trends

    confirm that the power of the analog DLL has a relatively

    higher IV term, whereas the power of the digital DLL has a

    relatively higher CV²f term. This indicates that digital DLLs

    have the potential for providing better power scaling than

    analog DLLs as supply voltages decrease in the future.

    Finally, we have shown in Table I and in Fig. 15(b) that

    the digital DLL operates at lower supply voltages than the

    analog DLL. Although the operation of the digital DLL was

    limited to 1.7 V, this limitation was due to our use of several

    analog elements in the digital DLL (i.e., it was a mostly digital

    DLL). The digital DLL used an analog clock amplifier, two

    analog duty-cycle error detectors (see Fig. 10), and an analog

    quadrature phase detector (in a second loop, not shown). Using

    an analog design for these circuit blocks in the digital DLL

    was faster to implement without preventing evaluation of the

    key digital blocks in the DLL, but their use determined the

    minimum supply voltage of the digital DLL.
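
    The power trends discussed with Fig. 15 follow from a two-term model, P = I·V + C·V²·f. The Python sketch below is illustrative only: the current and capacitance values are made-up placeholders, not measurements from Table I, the static current is assumed independent of supply voltage, and the function names are hypothetical. It shows why a design dominated by the CV²f term gains more from a lower supply voltage than one dominated by the IV term.

```python
def dll_power_w(vdd_v: float, freq_hz: float,
                static_current_a: float, switched_cap_f: float) -> float:
    """Two-term power model: constant-current (I*V) plus CMOS switching (C*V^2*f)."""
    return static_current_a * vdd_v + switched_cap_f * vdd_v ** 2 * freq_hz


def scaling_ratio(v_low: float, v_high: float, freq_hz: float,
                  static_current_a: float, switched_cap_f: float) -> float:
    """Power at v_low divided by power at v_high, at the same frequency."""
    low = dll_power_w(v_low, freq_hz, static_current_a, switched_cap_f)
    high = dll_power_w(v_high, freq_hz, static_current_a, switched_cap_f)
    return low / high


# Placeholder coefficients (hypothetical), chosen only to contrast the two terms
# when the supply drops from 3.3 V to 1.7 V at 400 MHz.
iv_only = scaling_ratio(1.7, 3.3, 400e6, static_current_a=30e-3, switched_cap_f=0.0)
cv2f_only = scaling_ratio(1.7, 3.3, 400e6, static_current_a=0.0, switched_cap_f=50e-12)
print(f"IV-only power ratio:   {iv_only:.2f}")
print(f"CV2f-only power ratio: {cv2f_only:.2f}")
```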

    VI. CONCLUSION

    We have described the architecture of a portable digital

    DLL and demonstrated that it provides jitter performance

    comparable to an analog DLL when fabricated in the same

    3.3-V, 0.4-µm standard CMOS process. Several circuits were

    developed to enable the DLL to provide very fine phase

    resolution, infinite phase range, and good duty-cycle performance

    throughout the signal path. Despite its relatively simple

    architecture, the digital DLL meets all system specifications,

    and it operates down to lower supply voltages than its analog

    counterpart. Utilizing essentially only simple digital CMOS

    gates, the DLL can be ported to new processes in minimal time. For these reasons, this digital DLL provides an

    alternative to analog DLLs for clock alignment applications.

    ACKNOWLEDGMENT

    The authors thank J. McBride and P. Gordon for layout

    support and S. Sidiropoulos for helpful insights.

    REFERENCES

    [1] A. Efendovich, Y. Afek, C. Sella, and Z. Bikowsky, "Multifrequency zero-jitter delay-locked loop," IEEE J. Solid-State Circuits, vol. 29, pp. 67–70, Jan. 1994.

    [2] J.-M. Han, J. Lee, S. Yoon, S. Jeong, C. Park, I. Cho, S. Lee, and D. Seo, "Skew minimization techniques for 256 Mb synchronous DRAM and beyond," in VLSI Circuits Dig. Tech. Papers, June 1996, pp. 192–193.

    [3] A. Hatakeyama, H. Mochizuki, T. Aikawa, M. Takita, Y. Ishii, H. Tsuboi, S. Fujioka, S. Yamaguchi, M. Koga, Y. Serizawa, K. Nishimura, K. Kawabata, Y. Okajima, M. Kawano, H. Kojima, K. Mizutani, T. Anezaki, M. Hasegawa, and M. Taguchi, "A 256 Mb SDRAM using register-controlled digital DLL," in ISSCC 1997 Dig. Tech. Papers, Feb. 1997, pp. 72–73.

    [4] T. Lee, K. Donnelly, J. Ho, J. Zerbe, M. Johnson, and T. Ishikawa, "A 2.5 V CMOS delay-locked loop for 18 Mbit, 500 megabyte/s DRAM," IEEE J. Solid-State Circuits, vol. 29, pp. 1491–1496, Dec. 1994.

    [5] S. Sidiropoulos and M. Horowitz, "A semidigital dual delay-locked loop," IEEE J. Solid-State Circuits, vol. 32, pp. 1683–1692, Nov. 1997.

    [6] K. Donnelly, Y. Chan, J. Ho, C. Tran, S. Patel, B. Lau, J. Kim, P. Chau, C. Huang, J. Wei, L. Yu, R. Tarver, R. Kulkarni, D. Stark, and M. Johnson, "A 660 MB/s interface megacell portable circuit in 0.3 µm–0.7 µm CMOS ASIC," IEEE J. Solid-State Circuits, vol. 31, pp. 1995–2003, Dec. 1996.

    [7] N. Kushiyama, S. Ohshima, D. Stark, H. Noji, K. Sakurai, S. Takase, T. Furuyama, R. Barth, A. Chan, J. Dillon, J. Gasbarro, M. Griffin, M. Horowitz, T. Lee, and V. Lee, "A 500-Megabyte/s data-rate 4.5M DRAM," IEEE J. Solid-State Circuits, vol. 28, pp. 490–508, Apr. 1993.

    [8] M. Hasegawa, M. Nakamura, S. Narui, S. Ohkuma, Y. Kawase, H. Endoh, S. Miyatake, T. Akiba, K. Kawakita, M. Yoshida, S. Yamada, T. Sekigguchi, I. Asano, Y. Tadaki, R. Nagai, S. Miyaoka, K. Kajigaya, M. Horiguchi, and Y. Nakagome, "A 256 Mb SDRAM with subthreshold leakage current suppression," in ISSCC 1998 Dig. Tech. Papers, Feb. 1998, pp. 80–81.

    [9] T. Saeki, Y. Nakaoka, M. Fujita, A. Tanaka, K. Nagata, K. Sakakibara, T. Matano, Y. Hoshino, K. Miyano, S. Isa, E. Kakehashi, J. Drynan, M. Komuro, T. Fukase, H. Iwasaki, J. Sekine, M. Igeta, N. Nakanishi, T. Itani, K. Yoshida, H. Yoshino, S. Hashimoto, T. Yoshii, M. Ichinose, T. Imura, M. Uziie, K. Koyama, Y. Fukuzo, and T. Okuda, "A 2.5 ns clock access 250 MHz 256 Mb SDRAM with synchronous mirror delay," in ISSCC 1996 Dig. Tech. Papers, Feb. 1996, pp. 374–375.

    [10] B. Garlepp, K. Donnelly, J. Kim, P. Chau, J. Zerbe, C. Huang, C. Tran, C. Portmann, D. Stark, Y. Chan, T. Lee, and M. Horowitz, "A portable digital DLL architecture for CMOS interface circuits," in VLSI Circuits Dig. Tech. Papers, June 1998, pp. 214–215.

    [11] M. Griffin, J. Zerbe, A. Chan, Y. Jun, Y. Tanaka, W. Richardson, G. Tsang, M. Ching, C. Portmann, Y. Li, B. Stonecypher, L. Lai, K. Lee, V. Lee, D. Stark, H. Modarres, P. Batra, J. Louis-Chandran, J. Privitera, T. Thrush, B. Nickell, J. Yang, V. Hennon, and R. Sauve, "A process independent 800 MB/s DRAM bytewide interface featuring command interleaving and concurrent memory operation," in ISSCC 1998 Dig. Tech. Papers, Feb. 1998, pp. 156–157.

    [12] S. Sidiropoulos, "High-performance interchip signalling," Ph.D. dissertation, Computer Systems Laboratory, Stanford University, Stanford, CA, Apr. 1998. Available as Tech. Rep. CSL-TR-98-760 from http://elib.stanford.edu/.

    [13] I. Young, M. Mar, and B. Bhushan, "A 0.35 µm CMOS 3-880 MHz PLL N/2 multiplier and distribution network with low jitter for microprocessors," in ISSCC 1997 Dig. Tech. Papers, Feb. 1997, pp. 330–331.

    [14] V. von Kaenel, D. Aebischer, C. Piguet, and E. Dijkstra, "A 320 MHz, 1.5 mW at 1.35 V CMOS PLL for microprocessor clock generation," in ISSCC 1996 Dig. Tech. Papers, Feb. 1996, pp. 132–133.

    [15] V. von Kaenel, D. Aebischer, R. van Dongen, and C. Piguet, "A 600 MHz CMOS PLL microprocessor clock generator with a 1.2 GHz VCO," in ISSCC 1998 Dig. Tech. Papers, Feb. 1998, pp. 396–397.

    Bruno W. Garlepp was born in Bahia, Brazil, on October 29, 1970. He received the B.S.E.E. degree from the University of California, Los Angeles, in 1993 and the M.S.E.E. degree from Stanford University, Stanford, CA, in 1995.

    In 1993, he joined the Hughes Aircraft Advanced Circuits Technology Center, Torrance, CA. There, he designed high-precision analog integrated circuits for A/D applications, as well as CMOS, bipolar, and SiGe RF circuits for wide-band communications applications. In 1996, he joined Rambus, Inc., Mountain View, CA, where he designs and develops high-speed CMOS clocking and I/O circuits for synchronous chip-to-chip communication.

    Kevin S. Donnelly (A'93) was born in Los Angeles, CA, in 1961. He received the B.S. degree in electrical engineering and computer science from the University of California, Berkeley, in 1985 and the M.S. degree in electrical engineering from San Jose State University, San Jose, CA, in 1992.

    He was with Memorex, Sipex, and National Semiconductor, specializing in bipolar and BiCMOS analog circuits for disk-drive read/write and servo channels. In 1992, he joined Rambus, Inc., Mountain View, CA, where he has designed high-speed CMOS PLL circuits for clock recovery and data synchronization, and high-speed I/O circuits. He currently manages a group developing I/O circuits and PLLs. His interests include PLLs and DLLs, I/O circuits, and data converters. He is a Member of the ISSCC Digital Subcommittee. He has received several circuit design patents.

    Mr. Donnelly is a coauthor of the paper that won the Best Paper Award at the 1994 ISSCC.

    Jun Kim was born in Tokyo, Japan, on November 14, 1966. He received the B.S.E.E. degree from the University of California, Berkeley, in 1989.

    From 1989 to 1991, he was with Vitelic, Inc., where he worked on SRAM and DRAM development. Between 1991 and 1994, he was with Sun Microsystems, where he was involved in microprocessor and digital circuit design. Since 1994, he has been with Rambus, Inc., Mountain View, CA, as a Designer of high-speed CMOS I/O and DLL circuits.

    Pak S. Chau was born in Hong Kong in 1966. He received the B.S. degree in computer system engineering from the University of Massachusetts, Amherst, in 1989 and the M.S. degree in electrical engineering from the University of California, Davis, in 1991.

    He was with National Semiconductor and Chrontel, Inc., where he worked as an Analog Circuit Designer. In 1994, he joined Rambus, Inc., Mountain View, CA, where he has engaged in designing high-speed I/O and DLL circuits.

    Jared L. Zerbe was born in New York, NY, in 1965. He received the B.S. degree in electrical engineering from Stanford University, Stanford, CA, in 1987.

    He joined VLSI Technology, Inc., in 1987, where he worked on semicustom ASIC design. In 1989, he joined MIPS Computer Systems, where he designed high-performance floating-point blocks. Since 1992, he has been with Rambus Inc., Mountain View, CA, where he has specialized in the design of high-speed I/O and PLL/DLL clock recovery and data synchronization circuits.

    Charles Huang received the B.S. degree in electrical engineering from the University of Fuzhou, China, in 1982 and the M.S. degree in electrical engineering from the University of Arkansas, Fayetteville, in 1990.

    He was with ULSI and SGI, working in the area of PLL and cache circuit design. He joined Rambus, Inc., Mountain View, CA, in 1994, where he has been engaged in high-speed CMOS DLL and I/O circuit design.

    Chanh V. Tran was born in Vietnam in 1964. He received the B.S. degree in electrical engineering and computer science from the University of California, Berkeley, in 1989.

    From 1989 to 1992, he was with National Semiconductor Corp., Santa Clara, CA, where he worked on CMOS mixed-signal IC design in the Data Acquisition Group. In 1992, he joined Rambus Inc., Mountain View, CA, where he has been involved in DLL and high-speed I/O design.

    Clemenz L. Portmann (S'92–M'95) received the B.S.E.E. degree from the University of Washington, Seattle, in 1986, the M.S.E.E. degree from the University of Hawaii at Manoa, Honolulu, in 1988, and the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, in 1995.

    From 1988 to 1989, he was a Visiting Researcher at Nagoya University, Nagoya, Japan, and the Toyohashi University of Technology, Toyohashi, Japan, under the Monbusho (Ministry of Education) scholarship program. From 1989 to 1990, he was a Design Engineer for VLSI Technology, Inc., San Jose, CA, where he designed standard cell libraries and SRAMs for ASIC designs. In 1995, he joined Rambus, Inc., Mountain View, CA, where he is engaged in the design of high-speed I/O circuits and DLLs for DRAM interfaces.

    Donald Stark received the B.S. degree from the Massachusetts Institute of Technology, Cambridge, in 1985 and the M.S. and Ph.D. degrees from Stanford University, Stanford, CA, in 1987 and 1991, respectively, all in electrical engineering.

    His research interests at Stanford included circuit design and CAD tools for analysis of voltage and current distributions in VLSI circuits. From 1987 to 1991, he was also a Member of the Western Research Laboratory, Digital Equipment Corp., Palo Alto, CA, working on CAD development and ECL circuit design. From 1991 to 1993, he was with the Semiconductor Device Engineering Laboratory, Toshiba Corp., Kawasaki, Japan, working on DRAM design. In 1993, he joined Rambus, Inc., Mountain View, CA, where he currently works on DRAM, high-speed I/O design, and CAD.

    Yiu-Fai Chan (S'76–M'78) received the B.S. and M.S. degrees in electrical engineering and computer science (with highest honors) from the University of California (UC), Berkeley, in 1972 and 1973, respectively.

    He joined Rambus, Inc., Mountain View, CA, in 1992, where he is Director of Engineering, responsible for the development, application engineering, and customer support of high-speed mixed-signal circuits, device packaging, signal integrity, and system engineering. Prior to that, he was with Tera Microsystems in charge of developing chips for workstations based on the Sparc architecture. He was with Altera Corp. from 1983 to 1990, where he led a team of engineers to develop the industry's first CMOS programmable logic devices. From 1976 to 1983, he held various technical and management positions at Intersil, Inc. (later a division of General Electric), where he was engaged in the development of various CMOS memories, microprocessors, and peripheral devices. It was there that he developed the first EPROM devices in CMOS technology. From 1974 to 1976, he designed calculator and TV game integrated circuits at National Semiconductor. He has received several patents in circuits and systems technologies.

    Mr. Chan is a member of Tau Beta Pi, Phi Beta Kappa, and Eta Kappa Nu. He received the University Science Fellowship from UC Berkeley and conducted research on solid-state devices and microwave acoustics. He has published in various IEEE technical publications and presented papers at IEEE technical conferences.

    Thomas H. Lee (S'87–M'87), for a photograph and biography, see this issue, p. 585.

    Mark A. Horowitz, for a photograph and biography, see p. 528 of the April 1999 issue of this JOURNAL.