Design and Optimization of Low-power Level-crossing ADCs


Colin Weltin-Wu

Submitted in partial fulfillment of the

requirements for the degree of

Doctor of Philosophy

in the Graduate School of Arts and Sciences

COLUMBIA UNIVERSITY

2012

©2012

Colin Weltin-Wu

All Rights Reserved

Abstract

Mixed-Signal Circuit Techniques for Low Power and High Speed Level Crossing ADCs

Colin Weltin-Wu

This thesis investigates some of the practical issues related to the implementation of level-crossing ADCs in nanometer CMOS. A level-crossing ADC targeting minimum power is designed and measured. Three techniques to circumvent performance limitations due to the zero-crossing detector at the heart of the ADC are proposed and demonstrated: an adaptive resolution algorithm, an adaptive bias current algorithm, and automatic offset cancelation. The ADC, fabricated in 130 nm CMOS, is designed to operate over a 20 kHz bandwidth while consuming a maximum of 8.5 µW. A peak SNDR of 54 dB for this 8-bit ADC demonstrates a key advantage of level-crossing sampling, namely SNDR higher than the classic Nyquist limit.

Contents

List of Figures
List of Tables
1 Level-Crossing Sampling Systems
  1.1 Thesis Outline and Goals
2 Considerations for Low Power
  2.1 Topology and Specification
  2.2 Feedback DAC
  2.3 Zero Crossing Detectors
  2.4 Digital Logic
    2.4.1 Interconnect Capacitance
3 Delay Distortion and Noise Analysis
  3.1 Input-Dependent ZCD Delay
    3.1.1 A Model For ZCD Delay
    3.1.2 Harmonic Distortion of a Level Crossing Quantizer
    3.1.3 Delay-Dispersion-Induced Harmonic Distortion
  3.2 Noise Analysis
    3.2.1 ZCD Noise
4 Design of a µW Programmable LCADC
  4.1 Adaptive Resolution Algorithm
  4.2 Programmable Asynchronous Timing Generation
  4.3 Zero Crossing Detectors with Dynamic Current Bias
    4.3.1 Three-Stage Zero Crossing Detector
    4.3.2 Current Bias DACs
  4.4 Segmented Capacitor Feedback DAC
  4.5 Automatic Offset Calibration with Oscillator
    4.5.1 Relaxation Oscillator
  4.6 Other Circuits
    4.6.1 Bootstrapped Input Switch
    4.6.2 Current-Steering Reconstruction DAC
    4.6.3 Asynchronous Controller
    4.6.4 SPI and Calibration Logic
5 LCADC Measurements
  5.1 Setup
  5.2 Results
  5.3 LCADC System Performance
  5.4 Calibration Algorithm
  5.5 Delay Elements
  5.6 Oscillator
6 Conclusions and Suggestions for Future Work
  6.1 Performance Comparison
    6.1.1 Level-Crossing ADCs
    6.1.2 Synchronous ADCs
  6.2 Suggestions for Future Work
A Derivation of Distortion Equations
  A.1 The Fourier Components c_{m,n}
  A.2 Non-Uniform Delay Impact on Harmonic Distortion
  A.3 Input Slope at TR_m(t) Transitions
B Silicon Errata

List of Figures

1.1 A 2-bit (4-level) level-crossing sampler, with uniform quantization levels.

2.1 A simple LCADC implemented as a flash.

2.2 Transfer characteristic of an (a) mid-rise and (b) mid-tread quantizer around 0.

2.3 The digital output x_q(t) of the LCADC contains the information of the samples, shown as the circles in the graph. The continuous-time DAC generates a zero-order-held-like reconstruction of the analog input from the samples.

2.4 Feedback LCADC topology.

2.5 Feedback LCADC topology with a capacitive DAC. The capacitive DAC reconstructs v_xq(t) as before, and subtracts it from v_x(t) capacitively so that the ZCD input range is reduced.

2.6 Schematic of the capacitive DAC, using a classic charge-redistribution array. The operation of the dacRST and dacEN signals will be explained shortly.

2.7 Timing diagram of the control signals for the capacitive DAC. The ZCD signal refers to the varying ZCD input, either V+ for the upper ZCD or V− for the lower ZCD.

2.8 Different configurations of a binary bus.

3.1 A ZCD stage modeled with a saturating transconductor and single output pole.

3.2 On the left, the ZCD input can be approximated as a series of voltage ramps. The dashed horizontal grey lines indicate the input range of the ZCD where the output is not saturated. On the right, the actual ZCD response is superimposed over an ideal response in grey. The actual ZCD zero crossing at time t_c is delayed from the ideal.

3.3 Reducing bandwidth and increasing gain proportionally cause the first-order system to behave more and more like an ideal integrator.

3.4 Delay vs. input slope, and the two approximations in (3.5) and (3.12).

3.5 Delay of a three stage cascaded amplifier, by stage. Also shown is the delay of a single stage that has the same overall gain and bandwidth as the cascade.

3.6 With symmetric quantization levels, it is possible to express the quantized signal with 2^(N−1) signals TR_m(t). For an odd-symmetric input, each of the TR_m(t) signals is also odd symmetric.

3.7 The Fourier series approximation of v_xq(t) for an almost full-scale time-normalized sinusoid input, for 150, 800, and 4000 Fourier terms.

3.8 Wide variation in SFDR for small changes in input amplitude. The solid vertical line represents a level of the reconstructed quantized signal, and the dotted lines are the quantizer thresholds. The figure spans two LSB of the quantizer input range.

3.9 The first harmonics of the LCADC output with ZCD delay variation. The input is a 20 kHz 1 V peak sine wave.

3.10 Impact of dispersion on SDR, 500 harmonics considered. The input is a full scale sinusoid of the specified frequency.

3.11 Dispersion as a function of stage bandwidth for a 3-stage cascade of identical single-time-constant stages.

3.12 Visualization of the function p_0(∆, t) defined for all real ∆.

3.13 Within one period of a full-scale sine wave there are 2^(N+1) level crossings, which occur at times s_i relative to the beginning of each period.

3.14 Decomposition of a noisy TR_m(t) signal (a) into the ideal noiseless TR_m(t) signal (b) and a pulse sequence representing jitter due to noise (c).

3.15 Noise corrupts the path of the ZCD output voltage, causing the zero crossing to be disturbed randomly.

3.16 Simulated LCADC noise with ZCD jitter, over a 100 MHz span with a 10 kHz input. Resolution bandwidth is 100 Hz.

3.17 Detail of Fig. 3.16 over a 2 MHz span with the same RBW = 100 Hz. At low frequencies, the impulse approximation is still valid.

3.18 Simulated LCADC noise with ZCD jitter, over a 100 MHz span with a 1 kHz input. Resolution bandwidth is 10 Hz.

3.19 Detail of Fig. 3.18 over a 2 MHz span, with RBW = 10 Hz. Note the LCADC distortion components have moved closer in, but other than that the impulse approximation still predicts the noise level.

4.1 Top level LCADC block diagram.

4.2 Two methods for reducing the sampling rate.

4.3 The behavior of an ideal adaptive resolution scheme (a) and the proposed (b).

4.4 If the input slope changes, the trailing boundary can catch the input leading to oscillations, shown in (a). The alternative trailing boundary behavior shown in (b) does not suffer this problem.

4.5 If the input crosses a threshold during a resolution change, there is a quantization error.

4.6 Asynchronous timing generator formed with an 11-tap delay line, each with 4-bit control.

4.7 The timing edges from the delay line are launched by either an UP trigger or DN trigger. If the line has not finished propagating when the next trigger arrives, it restarts from the beginning.

4.8 Schematic of a delay element, with the devices of interest highlighted in gray. Devices without sizes given can be assumed to be operating as switches and non-critical.

4.9 The ZCD comprised of a three-stage preamplifier and a latch.

4.10 ZCD preamplifier stage. Thick devices are LVT.

4.11 ZCD uni-directional latch.

4.12 Total ZCD delay with respect to input slope for bias currents between 230 nA and 4.3 µA. The higher bias currents are lower lines (indicating faster response).

4.13 A delay-line based slope measurement quantizes the bias current to five discrete levels.

4.14 Adaptive current bias keeps delay within dotted lines (60 ns dispersion) until very slow slopes.

4.15 Binary-weighted arrays of PMOS transistors forming the current bias DACs for the ZCDs. The bus of 8 bias voltages is multiplexed onto the BIASH and BIASL wires.

4.16 Adaptive resolution implemented with two capacitor DACs (detail of one shown in inset, both are identical), to give a 24×LSB thermometric range.

4.17 Structure of a unit capacitance cell. Projected views from both sides are shown as well.

4.18 Schematic of the R-string calibration DAC with two output taps. The inset shows one of the 256 identical unit elements.

4.19 Two-step SAR calibration loop using the existing ZCDs as comparators. The capacitor DACs are modeled as voltage sources, and the offset voltages are shown as voltage sources in series with the ZCD inputs. First the ZCD offsets are calibrated, then the V_LSB step.

4.20 Relaxation oscillator with tunable 4-bit MOS capacitor load.

4.21 Traditional switch drivers (a) bring the gate to 0 when off. The proposed driver (b) clamps the switch gate to the source when off.

4.22 Gate drive bootstrapping circuit. The input node is connected to the analog input, and the output is connected to the capacitive feedback DACs.

4.23 Differential segmented current steering DAC used for testing the LCADC. Top layout shown in (a) is comprised of 256 identical unit cells, whose schematic is shown in (b). The notation 4× means a group of four unit steering cells.

5.1 Die microphotograph. The output DAC is not counted toward the active area in comparison in Table 6.1.

5.2 Schematic of the LCADC portion of the testing board used to produce the presented results.

5.3 Test board and IC interface board.

5.4 Test DAC reconstructing the LCADC output for a 1 kHz, small-amplitude input.

5.5 DAC reconstruction of LCADC output processing a cardiac pulse signal. At the signal peaks and valleys the quantization step is 2.8 mV. During the high-slew portions of the signal the quantization step increases, to a maximum of 14 mV (5× minimum).

5.6 Spectrum of a single tone, 300 Hz -3 dBFS input.

5.7 Spectrum of a single tone, 1 kHz -3 dBFS input.

5.8 Signal fidelity as a function of input amplitude for a 1 kHz tone input.

5.9 SNDR as a function of input frequency for a -3 dBFS input.

5.10 Power consumption as a function of frequency for a -3 dBFS input tone.

5.11 Average sampling rate as a function of input frequency.

5.12 SNDR when the AR is disabled, and enabled.

5.13 Energy per conversion as defined in (5.1) with respect to frequency.

5.14 Signal fidelity with varying source resistance.

5.15 Calibrated SNDR of a dozen dies.

5.16 Delay tuning range for a delay cell, vs. the 4-bit TIME_i[3:0] control value. The multiple lines at each color are the delay for VDD = 0.8, 1.0, 1.2 V.

5.17 Delay jitter histogram, N = 2000, µ = 1.033 µs, σ = 4.1 ns.

5.18 Calibration oscillator frequency vs. the oscillator tuning code OSCTUNE[3:0] for 12 dies and VDD = 0.8, 1.0, 1.2 V.

List of Tables

2.1 Comparison of published low-power LCADCs. In the computation of energy/conversion, the most optimistic numbers are used. A † indicates that offline post-processing was used to achieve the SNDR numbers reported.

2.2 Representative standard cell power consumption breakdown. Leakage is reported at VDD+10% and 125 °C (worst case).

2.3 Approximate coupling capacitances in units of aF/µm, for 130 nm CMOS.

4.1 ZCD preamplifier device sizes.

5.1 LCADC system performance summary. For the full-scale input definition, 0.54 V is the voltage required to traverse 190 digital output codes. This range avoids the large-input-amplitude distortion problem.

5.2 ZCD offsets in mV. σ = 5.9 mV.

5.3 Low-speed delay element comparison; results are from measurement except for simulated mismatch. The tuning range of this cell is reduced due to the digital 4-bit control. Its area is also much larger due to the 4-bit bias DAC.

6.1 Comparison of this work to published LCADCs. A † indicates that offline post-processing was used to achieve the SNDR numbers reported.

Acknowledgments

I would like to thank my advisor, Professor Yannis Tsividis for guiding my research. In spite of our differing opinions (for which I am also grateful, for keeping me on my toes) I am constantly inspired by his insight into an unbelievably broad array of topics, engineering and otherwise.

I owe a debt of gratitude to my thesis committee: Profs. Peter Kinget and Charles Zukowski, and Drs. Mihai Banu and Tod Dickson, for their careful reading and honest criticisms that ultimately helped shape this thesis for the better.

I would like to thank my childhood mentors Andrew Chow and Serge Lang, who taught me the value of safety and rigor, both in work and in life. Although I had always played with wires and magnets as a kid, my real exposure to electronics came when Bob and Ellen Lasher hired me to work at their store Al Lasher’s Electronics in Berkeley. They gave me free rein to use the back shop area as my little playpen and the experience fostered my interest in electricity greatly. At MIT, I will always remember the lessons of Profs. James Roberge and Rahul Sarpeshkar, who can manipulate transistors in ways that still seem magical. In San Diego, I am fortunate to have found a role model and friend in Prof. Ian Galton, whose honesty and candor keep me in line, and whose creativity never fails to amaze. I will never forget my time in Italy, and am grateful for the opportunity given to me by Prof. Francesco Svelto and Dr. Enrico Temporiti, who took a chance on me with no supporting evidence.

Colleagues at STMicroelectronics continued to provide support without obligation, which made several tape-outs much easier, and Agilent Technologies’ generosity with measurement equipment and application support eased the typically painful process of chip verification.

I thank all my friends, too many to name here, for all their encouragement and diversions, and for enriching my life experience. I am glad to have met all my colleagues in CISL for the blackboard discussions, happy hours, and shared enthusiasm for electronics. In particular, I am lucky to have had the opportunity to mentor Tao Mai, who reminds me very much of myself many years ago.

I thank my family, all over the world, for their support and inspiration: my father to this day still works harder than I do. Finally, I am forever grateful to Lynn Bos for being my source of unwavering support and encouragement throughout the thesis writing process.

Chapter 1

Level-Crossing Sampling Systems

Level crossing sampling (LCS) is a signal processing technique where an analog quantity (voltage/current/temperature/etc.) whose value is continuous and time-varying is mapped to a finite set of quantized values in a manner that preserves more timing information in the signal than with conventional Nyquist-rate sampling. To see this, Fig. 1.1 shows an analog signal with range [0,1) and an LCS block with a 2-bit output, meaning the analog signal is quantized to four possible values. In this simple case, the interval [0,1) is divided into four equal sub-intervals: [0, 1/4), [1/4, 1/2), [1/2, 3/4), and [3/4, 1). The output of the quantizer is then an indication of the interval in which the analog signal currently resides, and in this uniform quantization example a logical choice for this indication is the midpoint of each sub-interval. Therefore, whenever the analog signal is greater than 0.25 but smaller than 0.5, the LCS output is 0.375.
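The 2-bit mapping described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the function name `lcs_quantize` and its `n_bits` parameter are hypothetical, and the input is assumed to be confined to [0, 1) already.

```python
import math


def lcs_quantize(x: float, n_bits: int = 2) -> float:
    """Map x in [0, 1) to the midpoint of its uniform sub-interval.

    Hypothetical helper for illustration; n_bits = 2 gives the four
    sub-intervals [0, 1/4), [1/4, 1/2), [1/2, 3/4), [3/4, 1).
    """
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    levels = 2 ** n_bits                # number of sub-intervals
    index = math.floor(x * levels)      # which sub-interval x falls in
    return (index + 0.5) / levels       # midpoint of that sub-interval


print(lcs_quantize(0.3))   # input between 0.25 and 0.5
print(lcs_quantize(0.6))   # input between 0.5 and 0.75
```

With the default of 2 bits, an input of 0.3 maps to the midpoint 0.375 and an input of 0.6 maps to 0.625, matching the example in the text.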

Figure 1.1: A 2-bit (4-level) level-crossing sampler, with uniform quantization levels.

Note that, in contrast with conventional uniform sampling, the LCS block does not have an understanding of “sampling time.” Its output is a memoryless and instantaneous function of the input: for example, as soon as the input crosses from below 0.5 to above 0.5, the LCS quantizer changes output from 0.375 to 0.625. In the conventional case, the quantizer output changes only on integer multiples of a sampling period. This re-timing of a signal to a clock leads to the well-known phenomenon of frequency aliasing, which is absent in the LCS system.
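To illustrate the contrast with clocked sampling, the following sketch emits an output event only when the quantized value changes. It is an approximation under stated assumptions: a true LCS operates in continuous time, whereas this model scans a fine time grid (`dt`), so event times are only resolved to that grid. The names `level_crossing_events`, `levels`, and `dt` are hypothetical.

```python
import math


def level_crossing_events(signal, t_end, levels=4, dt=1e-4):
    """Return (time, output) pairs, one per change of the quantized value.

    Assumption: continuous time is approximated by a grid of step dt.
    The first event only records the initial output; it is not a crossing.
    """
    events = []
    previous = None
    steps = int(round(t_end / dt))
    for k in range(steps + 1):
        t = k * dt
        # Clamp into [0, 1) so the top code is never exceeded.
        x = min(max(signal(t), 0.0), 1.0 - 1e-12)
        out = (math.floor(x * levels) + 0.5) / levels
        if out != previous:
            events.append((t, out))
            previous = out
    return events


# A constant input produces no events after the initial one ...
quiet = level_crossing_events(lambda t: 0.3, t_end=1.0)
# ... while a 5 Hz sinusoid produces events at every level crossing.
active = level_crossing_events(
    lambda t: 0.5 + 0.45 * math.sin(2 * math.pi * 5 * t + 0.3), t_end=1.0
)
print(len(quiet), len(active))
```

A clocked sampler would instead produce a fixed number of samples over the same interval for both inputs, regardless of whether the input moved.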

Level-crossing sampling is not a new technique; this scheme has existed for several decades in the context of control systems and coding theory [1–4] but has only recently been considered for VLSI implementation [5–8]. From the perspective of information flow, this method of sampling is very natural: there is no reason to generate a sample if the input has not changed (no new information), and conversely generating a sample the instant the input changes gives the system the fastest possible response time. In a uniformly sampled system there is up to a 1/f_S delay (f_S is the synchronous sampling frequency) between when the signal crosses a (or several) quantization level(s) and when it is actually acknowledged by the system. This instantaneous-response property is what was initially attractive from a control standpoint, and is still explored in the context of switching regulators [9] that have complex dynamics sensitive to loop delay. Another promising application for LCS systems is in embedded biomedical devices. Biological signals such as nerve action potentials, signals in the brain, and ECG waveforms are all pulse-like. The high peak-to-average ratio of these signals invites circuit and signal-processing topologies which adaptively scale activity and power with the signal itself, a property fundamental to level-crossing sampling. Furthermore, when things go wrong (e.g. cardiac arrhythmia) the signals are unpredictable, so the fast-response property of LCS can catch these signals while maintaining low power consumption during normal periods of low activity.

The most striking property of LCS systems is their ability to perform complex signal processing functions typically reserved for a dedicated DSP without a synchronous clock [5,6,10]. Of course analog signal conditioning is a mature technology and also clock-less, but in the case of continuous-time DSP (CTDSP) systems with LCS inputs, the overall system retains the wide programmability of conventional digital systems. The lack of a system clock is a paradigm shift especially at very high speeds, where more than half of the power budget is dedicated simply to distributing the clock to all portions of the IC [11]. The problem of distributing the clock is compounded by the fact that the die size of complex digital ICs is not, in spite of process scaling, generally decreasing, which means tight clock skew at ever higher frequencies is a major challenge addressed by having local re-timing PLLs, which again add to the power budget. The lack of a global clock not only reduces the system power consumption but also eliminates the dominant source of spurious tones in mixed-signal systems. The huge power draw from the supply at each transition of the clock, in addition to the radiated power in the IC substrate, means system designers must go to great lengths to mitigate clock injection: at the system level, techniques such as spread-spectrum clocking reduce the spot intensity of the injected signal, and at the physical level triple wells, guard rings and split domains are all used to isolate sensitive nodes. Unfortunately all these techniques incur large costs in design, verification, and silicon area.

Without a sampling clock, the number of samples produced by an LCS system is proportional to the activity of the input signal. Therefore, LCS implicitly provides input-activity-dependent power consumption, and when these digital streams are processed by continuous-time DSP (CTDSP) filters, the filters too exhibit activity-dependent power consumption [5, 6, 12]. Traditional analog signal processing has also benefitted from bias power scaling with respect to the system requirements [13–15], but these techniques require additional analog circuitry to implement the power scaling, incurring a power overhead. In the digital domain, the system clock frequency imposed by the Nyquist sampling constraint puts a large lower bound on a DSP’s power consumption, even when no signal is being processed. Thanks to the ubiquitous nature of digital circuits, a universal effort to improve speed, reduce power, and increase integration density has been applied at the circuit level [16], architecture level [17], and even software level [18]. As numerous and effective as these solutions are, like the analog solutions they too require dedicated hardware and/or software to take advantage of the power savings; moreover, the automated synthesis (one of the most attractive features of digital circuits) of circuits with multiple VDD domains, clock gating, and mixed static/dynamic logic is much more difficult to verify. The digital methods do not address the power consumption in the ADC itself, and thus the ADC still runs at the full Nyquist rate. For applications where the input signal has long periods of inactivity, continually sampling wastes power in the ADC since it spends most of its time sampling a DC input. Based on the above argument, LCS systems have possible applications ranging from biomedical sensing [19], intelligent sensor networks [20], and ultrasonic measurement [21] to speech processing [22], to name a few.

As previously mentioned, a property unique to LCS systems is the lack of aliased quantization error: the spectrum of a zero-order held reconstruction of the digital stream contains only harmonics of the input [5, 6]. The quantization noise floor (which in the Nyquist case is aliased quantization error [23] and not true noise) is not present since the quantizer does not sample the signal in time and thus does not alias out-of-band distortion components. This means an N-bit LCS output has a quantization-contributed SDR better than or equal to that of a conventional N-bit Nyquist ADC over any finite bandwidth, since in the level-crossing ADC (LCADC) case only a fraction of the quantization error falls in band, whereas in the Nyquist case all the quantization error folds in band. A corollary to this feature is that the signal conditioning requirements on the LCADC input are greatly relaxed, as noise and signals out-of-band remain out-of-band at the LCADC output, rather than getting aliased in band. As low-noise, sharp-cutoff filters are expensive analog blocks, this saves money and allows more architectural flexibility in mixed-signal systems.
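For context, the quantization-limited signal-to-noise ratio of a conventional N-bit Nyquist converter is commonly estimated with the textbook expression 10·log10(1.5·2^(2N)) ≈ 6.02N + 1.76 dB. That estimate assumes a full-scale sinusoidal input and quantization error that is uniformly distributed and falls entirely in band. The sketch below simply evaluates it; the function name is hypothetical, and the formula is the standard approximation rather than a result derived in this thesis.

```python
import math


def ideal_nyquist_sqnr_db(n_bits: int) -> float:
    """Textbook quantization-limited SNR of an ideal N-bit Nyquist ADC.

    Assumes a full-scale sine input and uniformly distributed
    quantization error that lands entirely in band.
    """
    return 10.0 * math.log10(1.5 * 4 ** n_bits)


print(round(ideal_nyquist_sqnr_db(8), 2))  # about 49.9 dB for N = 8
```

For N = 8 this gives roughly 49.9 dB, which is presumably the "classic Nyquist limit" against which the 54 dB peak SNDR quoted in the abstract is compared.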

In spite of these attractive qualities and promising applications, LCS has thus far taken a back seat to conventional sampling methods. Without speculating on all the reasons why, there are a few standout disadvantages of LCS which must be addressed before it will have mainstream appeal. First, there is no easy way to store LCS data (at least without losing some of the properties which make LCS attractive in the first place) as there is with RAMs and ROMs for uniformly-sampled data, since information is partly encoded in the timing of the samples themselves. This restricts LCS to real-time systems where the data is processed and used as it arrives. For some applications such as power supply controllers, wireline transceivers, or always-on systems (e.g. fault monitors) this is not a problem. On the other hand, for extremely low power systems such as remote sensors which do not have a constant reliable connection to offload the data, an LCS solution may face implementation hurdles. In order to store sampled data for later transmission, it would first have to be resampled and then stored along with the corresponding timestamps (thus incurring aliased quantization noise and requiring a high-speed clock, which LCS is supposed to avoid).

Another disadvantage of LCS systems thus far reported is that they are several performance generations behind equivalent synchronous systems which perform the same task. Although some of this disparity can be attributed to the vast difference in human effort applied to asynchronous versus synchronous systems, as will be explored in this thesis there are issues surrounding the zero-crossing detectors used in LCS-based systems, absent from conventional sampled systems, that hamper performance.

Along the same lines as the two preceding points, there is no efficient way to convert LCS data into something that can be processed with conventional DSPs, so if an LCADC existed which was superior to a conventional ADC for a particular application, the overhead in the interface with a conventional system would add complexity and cut into the performance margin. Several applications have been proposed for LCS, but often the simulations or models in these high-level theoretical papers fail to consider the actual silicon implementation issues that could make the solution infeasible. In light of these complications, it is clear that LCS-based circuits cannot be used as drop-in replacements for their conventional counterparts and be expected to give superior overall system performance. Rather, new architectures [9, 10, 24] need to be developed from the ground up with LCS in mind, and only then can the possible benefits be realized.

1.1 Thesis Outline and Goals

This thesis explores the trade-offs and limitations in the ultra-low-power design space for LCADCs. The design specifications are based on the LCADC in [10] (a complete LCADC-CTDSP system for low frequencies), but the design leverages mixed-signal techniques to optimize power consumption. The LCADC is almost fully integrated, lacking only four external resistors. A treatment of performance bottlenecks in existing low-frequency LCADCs justifies the design choices made, resulting in an LCADC that improves over existing designs with respect to classical FOM.

The objective of this thesis is to understand the practical issues facing LCS-based systems by

considering the circuit-level implementation of as many features as possible. By considering only

integrable or implementatable topologies (not relying on post-processing or massive digital correc-

tion) the realizable performance of LCADCs in real-world applications is better understood. The

thesis is organized as follows. Chapter 2 reviews previous work on LCS and possible architectures

with the goal of very low power consumption in mind. Chapter 3 gives a general treatment of

nonideal effects which face all LCS systems published so far. Chap. 4 then presents the implementation

details of an LCADC mindful of the nonidealities, and Chap. 5 presents measurements. The

thesis is concluded in Chap. 6.

Chapter 2

Considerations for Low Power

2.1 Topology and Specification

We begin with the simplest possible implementation of an LCADC, by replicating the standard

flash ADC structure but replacing clocked comparators with zero-crossing detectors (ZCD), shown

in Fig. 2.1. A ZCD performs the comparison operation between two inputs, in continuous time,

i.e.

$$ \mathrm{out}(t) = \begin{cases} 0 & \text{if } V_+(t) < V_-(t), \\ 1 & \text{otherwise.} \end{cases} \tag{2.1} $$

Keep in mind that the voltages at the input to the ZCD are a pair of continuous-time analog signals,

and the output out(t) is a continuous-time binary signal. Such circuits have existed since the

beginning of monolithic ICs, for example the LM106/LM710 released in the 1970’s [25]. The

Figure 2.1: A simple LCADC implemented as a flash.

outputs LEVj(t) are a set of continuous-time, binary signals that together form a thermometric

representation of the input vx(t). Just as in the case of a synchronous flash ADC, the reference

voltages $TH_j$ for the ZCDs evenly divide the full-scale analog input range into $2^N$ equal parts (for

an N-bit LCADC). One subtlety is whether a half-LSB offset with respect to 0 is introduced in the

$TH_j$ voltages: this differentiates between a mid-rise and mid-tread quantizer characteristic. The

terms “rise” and “tread” refer to the vertical and horizontal dimensions of a step in a staircase,

respectively. For example, if $TH_j$ were defined as

$$ TH_j = -1 + \frac{j}{2^{N-1}}, \quad \text{for integer } j \in [1, 2^N - 1], \tag{2.2} $$

Figure 2.2: Transfer characteristic of an (a) mid-rise and (b) mid-tread quantizer around 0. [Plot: digital output versus analog input; the dead zone is marked in (b).]

then 0 is a quantization threshold, at $j = 2^{N-1}$. This defines a mid-rise quantizer shown in Fig.

2.2(a), because the output code changes when the analog input transitions through 0. On the other

hand, if T H j were defined as

$$ TH_j = -1 + \frac{1}{2^N} + \frac{j}{2^{N-1}}, \quad \text{for integer } j \in [1, 2^N - 1], \tag{2.3} $$

then the two quantization thresholds closest to 0 are $\pm 1/2^N$; this means there is a “dead-zone”

around 0, a region of analog input containing 0 where the ADC’s output code does not change. This

defines a mid-tread quantizer, as shown in Fig. 2.2(b). We note that if we want symmetric characteristics

from both curves, then the mid-tread quantizer has one fewer quantization level than the mid-rise

quantizer (just as a two’s complement number extends to $2^N$ negatively but only to $2^N - 1$

positively), but in practice the input signal to the quantizer does not exercise the full-scale input,

Figure 2.3: The digital output xq(t) of the LCADC contains the information of the samples, shown as the circles in the graph. The continuous-time DAC generates a zero-order-held-like reconstruction of the analog input from the samples.

so this difference is not seen. Furthermore, N is usually large enough that other factors such as DC

offsets in other parts of the circuit make this distinction of the behavior around 0 irrelevant. The

“dead-zone” phenomenon of mid-tread quantizers is only an issue for low-resolution applications,

such as ??.
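The two threshold definitions in (2.2) and (2.3) can be compared numerically. The following is a minimal sketch on the normalized (−1, 1) input range; the function name `thresholds` and the 3-bit example are illustrative choices, not part of the design.

```python
def thresholds(n_bits, mid_tread=False):
    """Return the ZCD reference levels TH_j on a normalized (-1, 1) range.

    Mid-rise follows Eq. (2.2); mid-tread adds the half-LSB offset of Eq. (2.3).
    """
    lsb = 1.0 / 2 ** (n_bits - 1)            # full scale of 2 split into 2^N parts
    offset = lsb / 2 if mid_tread else 0.0   # half-LSB shift, equal to 1/2^N
    return [-1.0 + offset + j * lsb for j in range(1, 2 ** n_bits)]


# Illustrative 3-bit example (an assumption, chosen to keep the lists short).
mid_rise = thresholds(3)
mid_tread = thresholds(3, mid_tread=True)

# The two mid-tread thresholds that flank 0 bound the dead zone.
dead_zone = sorted(sorted(mid_tread, key=abs)[:2])

print("mid-rise :", mid_rise)
print("mid-tread:", mid_tread)
print("dead zone:", dead_zone)
```

In the mid-rise list 0 appears as a threshold, while in the mid-tread list the nearest thresholds sit at ±1/2^N, which is the dead zone described above.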

In Fig. 2.3, the outputs of the ZCDs LEVj(t) are summed digitally to get the quantized rep-

resentation of the input, xq(t). Since it is the sum of binary signals, xq(t) is a continuous-time,

discrete-leveled signal which changes each time the input to the quantizer vx(t) crosses a $TH_j$

quantization level. In essence, the information contained in xq(t) is the collection of pairs of

timestamps and quantized values corresponding to each sample from the quantizer. The digital

signal xq(t) can be reconstructed by a continuous-time DAC to get an analog representation of the

quantized signal, as shown in Fig. 2.3.

In an LCADC, the location of the input signal is always known with respect to the quantization

levels–samples are always spaced in amplitude one LSB apart but their spacing in time is unknown.

This is in contrast with a Nyquist rate ADC, where the amplitude difference between consecutive

samples is unknown, but in time they are always spaced $1/f_S$ apart. For this reason, although the flash

topology is structurally identical between an LCADC and synchronous ADC, the requirements are

rather different. In a Nyquist rate ADC, the comparators are powered off for most of the sampling

period, since we know deterministically when the next sample will arrive. On the other hand, when

a sample is taken, all comparators must be enabled to capture the unknown vx(t) within the whole $V_{FS}$ full-scale

input range. In an LCADC, the ZCDs must constantly be enabled, but only two need to be on at

any given time–one comparing vx(t) against the next quantization level above, and one comparing

against the quantization level below.

To save power, it would be possible to individually power-gate each ZCD, so that only the

signal-flanking ZCDs are enabled. This is impractical as $2^N$ gating signals would need to be

generated, and even if an efficient implementation were possible (for example, the thermometric

digital output could be shifted left and right and fed back to the gating inputs) there is always the

power and area overhead of implementing the control logic and adding a gate input to all $2^N$ ZCDs.

Instead, a more elegant solution introduced in [7] wraps two ZCDs (which would be the only two

active ones in the flash topology) within a feedback loop such that the ZCDs are level-shifted to

Figure 2.4: Feedback LCADC topology.

always contain vx(t) within their comparison interval. This topology is shown in Fig. 2.4. The CT

quantized output xq(t) is reconstructed with a DAC as in Fig. 2.3, and two tracking thresholds are

created, one half an LSB above and one half an LSB below. A crossing of either ZCD updates the

output xq(t), which in turn shifts the tracking interval up or down; this way, the input vx(t) is always

compared against two thresholds VLSB apart, identical to the flash case. Within the bounds of the

loop delay and the fed-back tracking interval’s maximum tracking speed, the two architectures are

functionally identical. The feedback topology even offers the added benefit of requiring only two

ZCD offsets to compensate, since the same two ZCDs are used for every level comparison.
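The tracking behavior of the feedback topology can be sketched behaviorally. The model below is only an illustration under stated assumptions: a fine time grid stands in for continuous time, the loop delay is zero, and the names `feedback_lcadc`, `events`, and the 4-bit, 0.9-amplitude test input are hypothetical.

```python
import math


def feedback_lcadc(signal, times, n_bits):
    """Track `signal` with two comparators centered on the DAC output.

    Returns a list of (time, code) events, one per level crossing.
    """
    lsb = 2.0 / 2 ** n_bits               # normalized full scale is (-1, 1)
    code = round(signal[0] / lsb)         # start the counter at the nearest level
    events = []
    for t, v in zip(times, signal):
        # The while loops let the counter catch up if the input moved by more
        # than one LSB within a single grid step.
        while v > code * lsb + lsb / 2:   # upper tracking threshold crossed
            code += 1
            events.append((t, code))
        while v < code * lsb - lsb / 2:   # lower tracking threshold crossed
            code -= 1
            events.append((t, code))
    return events


n_bits = 4
times = [k / 20000 for k in range(20001)]                  # one period
signal = [0.9 * math.sin(2 * math.pi * t) for t in times]
events = feedback_lcadc(signal, times, n_bits)

# Consecutive samples are always one LSB apart in amplitude.
steps = [b[1] - a[1] for a, b in zip(events, events[1:])]
codes = [code for _, code in events]
print(len(events), "events, codes from", min(codes), "to", max(codes))
```

Each event is a (timestamp, code) pair, and successive codes differ by exactly one, which is the property that distinguishes level-crossing samples from uniformly timed ones.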

Within the general topology in Fig. 2.4, there is much room for innovation. We begin with a comparison

between published low-power LCADCs in Table 2.1, which all use the feedback topology.

Aside from [24] which uses a flash (but for GHz frequencies), only one other publication was

found that used a flash [26], and this did not have silicon so it was unsuitable for comparison. Out

Work         [27]†         [8]           [10]           [19]†
Year         2005          2006          2008           2011
Tech         130 nm        0.25 µm       90 nm          0.18 µm
Area         0.1 mm²       0.25 mm²      0.06 mm²       0.96 mm²
Resolution   4 bit         6 bit         8 bit          8 bit
Adaptive     no yes
DAC Type     Cap           Res           Res            Hybrid (cap MSB/res LSB)
Bandwidth    160 kHz       55 kHz        10 kHz         1 kHz
SNDR         60 dB         53 dB         58 dB          52 dB
Power        180 µW        17 mW         50 µW          25 µW
J/conv       560 fJ/conv   345 pJ/conv   3.14 pJ/conv   31 pJ/conv

Table 2.1: Comparison of published low-power LCADCs. In the computation of energy/conversion, the most optimistic numbers are used. A † indicates that offline post-processing was used to achieve the SNDR numbers reported.

of fairness in comparison, we should note that refs. [27] and [19] both use high speed re-sampling

of the output and digital post processing, and [19] also implements the digital controller off chip,

not included in the power number.

The design target for the LCADC presented in this thesis was 8 bits, 20 kHz bandwidth; such

specifications make the LCADC suitable for both voice and biomedical applications. The goal

was to optimize the power consumption as much as possible, ideally below 10 µW. In addition,

the SNDR performance should be achievable with simple zero-order hold reconstruction, without

high-speed re-sampling or post processing. Finally, all the performance-affecting circuitry needed

to be integrated on-chip.

Figure 2.5: Feedback LCADC topology with a capacitive DAC. The capacitive DAC reconstructs vxq(t) as before, and subtracts it from vx(t) capacitively so that the ZCD input range is reduced.

2.2 Feedback DAC

In Fig. 2.4, the feedback DAC can be implemented in several ways. In the original imple-

mentation [27], a capacitive feedback DAC was used in a structure similar to a classic charge-

redistribution SAR ADC [28]. The topology of an LCADC with a capacitive feedback DAC is

shown in Fig. 2.5. The advantage of a capacitive DAC is that it does not draw static power. On the

other hand, capacitors do not pass DC, and need to be periodically refreshed. In clocked systems

there are regular occasions to reset the capacitors, but in an asynchronous system the refresh needs

to be carefully placed in order to maximize the “listening” time of the circuit and preserve its fast

response to level crossings. During these refresh periods the ZCDs can be disabled to save power

with the zcdEN signal.

The alternative DAC structure uses resistive unit elements, either as an R-2R ladder or

R-string (as used by [8, 10]). One advantage of resistive DACs is that they tend to be smaller than

capacitive DACs, and therefore the inherently monotonic R-string structure is possible; the analogous

thermometric capacitive DAC would typically be area-prohibitive. The main disadvantage of

resistive DACs is constant current dissipation. Also, with R-string DACs the ZCD would need to

see the full input voltage range across its common mode, since the subtraction occurs implicitly in

the ZCD itself. At low supply voltages, special wide-common-mode-range input stages [10] are

necessary to maximize the input signal range. The added complexity causes the ZCD performance

to suffer.

Figure 2.6: Schematic of the capacitive DAC, using a classic charge-redistribution array. The operation of the dacRST and dacEN signals will be explained shortly.

For a low power design the capacitive DAC was chosen, with the DAC schematic shown in

Fig. 2.6. The core charge redistribution array is identical to that of a conventional SAR introduced

in [28]. However the input is brought in through a series capacitor whose size is nominally equal to

the sum of the DAC array capacitance, rather than sampled directly on the SAR array. If the code

driving the DAC[7:0] bus is equivalent to −xq(t) (a bipolar output is possible because the DAC

has positive and negative references) then the voltage into the ZCDs is equal to (vx(t)− vxq(t))/2.

Since the maximum difference between these two voltages is ±VLSB/2, the capacitive DAC level

shifts the input so the comparison interval of the ZCDs is now fixed to a (−VLSB/4,+VLSB/4)

interval around VCM, the desired ZCD common mode voltage. Although in Fig. 2.5 the thresholds

are shown as $\pm V_{LSB}/2$, the true voltages are indeed $(-V_{LSB}/4, +V_{LSB}/4)$ due to capacitive division.

The reason that figure (and all subsequent figures, unless noted) show ±VLSB/2 is that VLSB is

referred to the input full-scale, and it simplifies the discussion.
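As a rough check on the factor of two, the charge sharing can be written out under idealized assumptions that are not stated in the text: the series capacitor is called $C_S$, the array total is $C_A = \sum_i C_i$, each bottom plate sits at $V_i \in \{REF+, REF-\}$, all voltages are measured relative to $V_{CM}$, the shared top plate carries no net charge after the reset, and parasitics are ignored.

```latex
\begin{aligned}
0 &= C_S\,(v_{ZCD} - v_x) + \sum_i C_i\,(v_{ZCD} - V_i), \\
v_{ZCD} &= \frac{C_S\,v_x + \sum_i C_i V_i}{C_S + C_A}
         = \frac{v_x + v_{DAC}}{2}
         \quad \text{for } C_S = C_A,\;
         v_{DAC} \equiv \frac{1}{C_A}\sum_i C_i V_i .
\end{aligned}
```

With the DAC code chosen so that $v_{DAC} = -v_{xq}(t)$, this reduces to $(v_x(t) - v_{xq}(t))/2$, consistent with the halved thresholds discussed above.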

Regarding the problem of when to refresh the capacitors, consider that for an N-bit LCADC

with a maximum sinusoidal input frequency fMAX , the minimum inter-sample spacing (granularity

time) is given by [10]

$$ T_{GRAN} = \frac{1}{2^N \pi f_{MAX}}. \tag{2.4} $$

For our proposed 8-bit, 20 kHz system this gives TGRAN = 62 ns. This means that once a sample

occurs, the system has at least 62 ns of dead time before the next sample occurs, during which

it can reset the capacitors. For a back of the envelope calculation, we assume Cunit for the 8-bit

DAC is 10 fF; then the total DAC capacitance is 2.56 pF. Even if we want 10 time constants for the

voltages to settle, the time constant is 6 ns so the aggregate DAC switch resistance Ron is 2.4 kΩ,

easily achievable.
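The back-of-the-envelope numbers above can be reproduced directly from (2.4). This is a small sketch using the values quoted in the text (8 bits, 20 kHz, 10 fF unit capacitor, 10 time constants); the variable names are illustrative.

```python
import math


def granularity_time(n_bits, f_max):
    """Minimum spacing between level crossings for a full-scale sine, Eq. (2.4)."""
    return 1.0 / (2 ** n_bits * math.pi * f_max)


n_bits, f_max = 8, 20e3
c_unit = 10e-15          # unit capacitor assumed in the text
settle_taus = 10         # time constants allowed for settling

t_gran = granularity_time(n_bits, f_max)
c_total = 2 ** n_bits * c_unit      # total DAC array capacitance
tau = t_gran / settle_taus          # time constant that fits in the dead time
r_on = tau / c_total                # largest switch resistance that still settles

print(f"T_GRAN  = {t_gran * 1e9:.1f} ns")
print(f"C_total = {c_total * 1e12:.2f} pF")
print(f"R_on   <= {r_on / 1e3:.2f} kOhm")
```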

Figure 2.7 shows the REFRESH and TRACK timing sequence for the capacitive DAC. Once

a ZCD crossing occurs, the system enters the REFRESH state, where the input is disconnected

from the capacitor array by dacEN going low, and the whole capacitor array is reset by dacRST.

During the discharge of the array, a new code is loaded into DAC[7:0]. This sets the positions of

Figure 2.7: Timing diagram of the control signals for the capacitive DAC. The ZCD signal refers to the varying ZCD input, either V+ for the upper ZCD or V− for the lower ZCD.

the switches to REF+ or REF− for the bottom plates; the capacitors are not actually connected

to the references until dacRST goes low and dacEN goes high. When the DAC is re-enabled, the

dacRST clamp does not overlap dacEN to ensure that proper charge sharing occurs between the

input voltage and the DAC voltage. The charge redistribution is allowed to settle for a few τ time

constants, after which point the circuit is in TRACK mode. In reference to Fig. 2.5, during the

REFRESH period the ZCDs are disabled to save power with the zcdEN signal, shown in the inset

of Fig. 2.7. One shortcoming of this approach is that the differential voltage at the input of the

ZCD is half that of the input; this reduces the input derivative and thus increases the response time

of the ZCD, as will be explained in the following chapter. This reduction in ZCD

differential input is a small price to pay for having a ZCD common mode input range limited to

$V_{CM} \pm V_{LSB}/2$.

2.3 Zero Crossing Detectors

By using a capacitive feedback DAC, the only remaining block which dissipates static power is

the ZCD. For example, in [10] the R-string DAC dissipates 10 µA, and the remaining 40 µA are

burned in the ZCDs. Without regeneration, the choice of topology for the ZCD is fairly limited.

In monolithic ZCDs, the original LM106 [25] was two differential pair stages driving a TTL level

shifter. The more recent LMV7219 uses a folded-cascode open-loop amplifier followed by a dif-

ferential pair and a differential-to-single-ended converter [29]. Many other examples exist, but

they are without fail high-gain amplifiers optimized for open-loop application. Most monolithic

designs are bipolar, so they can get away with one or two stages, but in CMOS realizations often

three or more stages are needed (an inverter counts as a stage, since with insufficient preceding

gain, it will be biased in class A and dissipate static power) [8, 10, 19, 27, 30, 31]. While a few

of these designs [30, 31] use timing signals to disable the ZCD after a crossing occurs (thereby

implementing a quasi-latch), the core high-gain cascaded amplifier is almost identical among all

designs.

In order to minimize the power consumed by the ZCD, there are two approaches. One is

to simply reduce the power to the ZCD, as far as is possible while performance remains unaffected, as

analyzed in Chap. 3. The second approach uses the fact that after each sample, the ZCD can be

Cell                  Leakage Power (nW)   Energy/Transition (fJ)
2-NAND                2                    9
2 2-3 OR-AND-INVERT   4                    24
D FLIP FLOP           13                   32
FULL ADDER            8                    37

Table 2.2: Representative standard cell power consumption breakdown. Leakage is reported at VDD+10% and 125 °C (worst case).

shut off for at least a period of $T_{GRAN}$ (the minimum inter-sample time spacing) without fear of

missing a sample, as shown using the zcdEN signal in Fig. 2.7. A solution to increase the TGRAN

and thus the ZCD off time is implemented in Chap. 4.

2.4 Digital Logic

So far the focus has been on the analog blocks as the dominant power consumers, as has been the

case in previously reported designs. Toward a low power solution, it is necessary to know how

much can be done digitally with less than 10 µW. Table 2.2 gives a sample of digital cell

power in 130 nm CMOS, broken down into static leakage power and dynamic power (the numbers

have been slightly adjusted to avoid directly reporting standard cell data).

For a full-amplitude 20 kHz input to the 8-bit LCADC, the digital circuits must process

$2 \times 2^8 \times 20$ kHz ≈ 10 MS/s. In the feedback topology LCADC in Fig. 2.4 there is an up-down counter,

which can be implemented as 8 concatenated full adders with an output 8-wide register to prevent

glitches in the output code. For a full-scale input, the counter continuously ramps up and down

over its full scale. In a binary-coded ramp (i.e. 001, 010, 011, ... etc.), there are on average two bit

changes per step. If we assume that only the adders whose outputs change consume power (not

true) then the dynamic power consumption is

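$$
\begin{aligned}
P_D &= (2 \times 37\,\text{fJ/transition} + 8 \times 32\,\text{fJ/transition}) \times 10\,\text{MS/s} && (2.5) \\
    &= 3.3\,\mu\text{W} && (2.6)
\end{aligned}
$$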

This is an optimistic estimate, since this number only counts internal switching power and not

loading effects nor wiring capacitance.
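The estimate in (2.5) and the "two bit changes per step" figure can be checked with a few lines. This sketch uses the cell energies from Table 2.2; using the unrounded event rate of 2 × 2^8 × 20 kHz (rather than 10 MS/s) is a choice made here, so the result is slightly above 3.3 µW.

```python
def average_bit_flips(n_bits):
    """Average number of bits that toggle per step of a binary up-count."""
    steps = 2 ** n_bits - 1
    flips = sum(bin(k ^ (k + 1)).count("1") for k in range(steps))
    return flips / steps


n_bits, f_in = 8, 20e3
event_rate = 2 * 2 ** n_bits * f_in      # level crossings per second, full-scale sine
e_adder, e_flop = 37e-15, 32e-15         # J per transition, from Table 2.2

avg_flips = average_bit_flips(n_bits)
# Two adders switching per step plus all eight register flip-flops, as in (2.5).
p_dynamic = (2 * e_adder + n_bits * e_flop) * event_rate

print(f"average bit flips per step: {avg_flips:.2f}")
print(f"dynamic power: {p_dynamic * 1e6:.2f} uW")
```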

2.4.1 Interconnect Capacitance

Digital logic is often placed as far from the sensitive analog circuitry as possible to prevent switch-

ing noise from propagating through the substrate into high-impedance nodes, which exist in a

charge-redistribution topology. However, the distance between the feedback DAC and digital logic

adds large parasitics to the digital bus driving the feedback DAC, which must switch at the full

sample rate of the LCADC. Let us consider a simple example with an 8-bit binary bus, shown in

Fig. 2.8(a). For a full-scale sinusoidal input, the DAC code will ramp from 0 to 255 and back to 0

in one period (modulo 256). For a binary encoded bus, there are a total of 254 0→1 transitions

when counting from 0→255→0. Furthermore, there are 127 differential transitions per period: a

differential transition is defined as the situation where two adjacent wires transition simultaneously

in opposite directions. These matter because for the wire being charged, the coupling capacitance

to the adjacent wire counts double. Table 2.3 shows some rounded capacitances between a high

Figure 2.8: Different configurations of a binary bus. [Panel (a): sequential order b7 b6 b5 b4 b3 b2 b1 b0; panel (b): reordered bus.]

Min Space   0.2 µm   0.5 µm   1 µm   To Substrate
100         60       30       20     50

Table 2.3: Approximate coupling capacitances in units of aF/µm, for 130 nm CMOS.

metal layer to the substrate, and between parallel wires in the same metal with different spacing.

Using the table, we can compute the power dissipated just in driving the bus capacitance. For an

isolated 0→1 transition, the capacitance which needs to be charged is $C_{sub} + 2 \times C_C$, where $C_{sub}$

is the capacitance from the line to substrate, and $C_C$ is the coupling capacitance to each of the two adjacent

lines, assuming the line is in the middle. If an adjacent line is switching 1→0, we add another

$C_C$ term to double-count that capacitance (and the 1→0 transition takes no power itself since it is

being discharged). Adding up all the transitions for a full period, the frequency- and length-normalized

energy dissipated in a minimum-spaced bus comes out to be 180 fJ/µm·Hz. For example, a

300 µm long bus requires 1.1 µW just to drive its capacitance when processing a full-scale 20 kHz

input. There is an additional unaccounted-for increase in power since the line capacitance slows

down the transition times, increasing the cross-conduction currents in the bus receiver.

Bus Reordering

For the regular binary bus 50% of the total transitions were differential. It makes sense that if the

most active lines (LSB) were isolated from one another by interleaving them with more slowly

moving lines (MSB) the differential transitions could be reduced. By simply reordering the bus

lines from being sequential to b0,b7,b2,b5,b4,b3,b6,b1 as shown in Fig. 2.8(b) the number of

differential transitions reduces to 8%, and the normalized energy to drive such a bus reduces to

135 fJ/µm·Hz. For the same 300 µm long bus the driving power is now 810 nW. This is just a

simple example; the optimal order depends on the encoding method of the bus and the particular

switching patterns (even more advanced techniques exist if some bus latency is allowed [32]).
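The effect of reordering can be explored with a short counting script. The counting convention here is an assumption and may differ from the one behind the percentages quoted above: only the up-ramp 0→255 is traversed, and every adjacent wire pair that toggles in opposite directions on the same step counts once. The function name `differential_transitions` is illustrative.

```python
def differential_transitions(order, n_bits=8):
    """Count opposite-direction toggles on adjacent wires during an up-count.

    `order` lists the bit index carried by each wire, from one edge of the
    bus to the other.
    """
    count = 0
    for k in range(2 ** n_bits - 1):
        before, after = k, k + 1
        # Direction per bit: +1 rising, -1 falling, 0 unchanged.
        direction = [((after >> b) & 1) - ((before >> b) & 1)
                     for b in range(n_bits)]
        for left, right in zip(order, order[1:]):
            if direction[left] * direction[right] == -1:
                count += 1
    return count


sequential = [7, 6, 5, 4, 3, 2, 1, 0]
reordered = [0, 7, 2, 5, 4, 3, 6, 1]      # ordering quoted in the text

n_seq = differential_transitions(sequential)
n_reo = differential_transitions(reordered)
print("sequential:", n_seq, " reordered:", n_reo)
```

Under this convention the sequential bus has one differential transition on every odd-to-even increment, and interleaving fast and slow bits removes most of them. The same function can be used to try other orderings or encodings.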

Selective Spacing

The inter-wire coupling capacitance can go to zero if we space the lines infinitely far apart, and

since the coupling capacitance is 80% of the total capacitance on a wire, this would be desirable;

unfortunately this would require infinite area. Instead, only the lines with the most activity are

isolated. Starting with a minimum-spacing bus, increasing all the line spacing to 0.5 µm from 0.2

µm reduces the energy from 180 fJ/µm·Hz to 80 fJ/µm·Hz, but triples the width of the bus, a costly

sacrifice. However, if only the three MSBs are spaced at 0.5 µm, the energy still drops to 105

fJ/µm·Hz, a 40% savings, and the bus widens by 50%. Finally, if we combine the two ideas and

reorder the bus as well as space the outer two signals (b0 and b1) 0.5 µm from the other lines, the

energy is 95 fJ/µm·Hz. To compare, driving this bus at full speed consumes 570 nW, as opposed

to 1.1 µW for the baseline case.
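The bus power figures follow from multiplying the normalized energy by the bus length and the input frequency. A minimal sketch, using the three normalized energies quoted above (the dictionary labels and the helper name `bus_power` are illustrative):

```python
def bus_power(energy_per_um_hz, length_um, f_in):
    """Power to drive the bus from the length- and frequency-normalized energy."""
    return energy_per_um_hz * length_um * f_in


length_um, f_in = 300.0, 20e3
cases = {
    "sequential, minimum spacing": 180e-15,   # J/(um*Hz)
    "reordered": 135e-15,
    "reordered, b0 and b1 spaced": 95e-15,
}
powers = {name: bus_power(e, length_um, f_in) for name, e in cases.items()}

for name, p in powers.items():
    print(f"{name}: {p * 1e9:.0f} nW")
```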

Although the numbers above were just approximations, it is clear that toward an optimal

minimum-power implementation the total power consumed becomes the sum of many small con-

tributions rather than being dominated by a single block, and attention must be paid equally to all

parts of the system.

Chapter 3

Delay Distortion and Noise Analysis

In this chapter, the impact of ZCD nonidealities on LCADC performance is analyzed. Although

the math is tedious and at times opaque, ultimately simple results are achieved which help direct

the design of the ZCD.

3.1 Input-Dependent ZCD Delay

We first consider how the slope of the input signal at the time of zero crossing affects the delay

through the ZCD, a phenomenon called delay dispersion. Then, a mathematical model of an ideal

LCADC output is derived based on Fourier series, and these two results are combined to achieve

an estimate of the dispersion-induced distortion.

Figure 3.1: A ZCD stage modeled with a saturating transconductor and single output pole.

3.1.1 A Model For ZCD Delay

In the context of the LCADC with a capacitive DAC, the ZCD has a very well-defined input to

process. From Fig. 2.7, we see that the ZCD input is comprised of a chopped signal, and to

simplify we can approximate each segment of the signal as a voltage ramp. For an 8-bit converter

with a $V_{FS}$ of 1 V, the LSB step is 4 mV so a small-signal approximation (at least of the first stage

of the ZCD) is valid [33]. A simple model for a ZCD stage is shown in Fig. 3.1. It models

a transconductor with a load capacitance, and a hard saturation that limits the output voltage to

±VSW . Writing AV = gmro and τ = roCL, we know the step response of this circuit is

$$ v_{out,s}(t) = A_V \left(1 - e^{-t/\tau}\right) \cdot (1\,\text{V}). \tag{3.1} $$

We are more interested in the ramp response. Again in the context of an LCADC application, the

ZCD input can be approximated as a linear ramp, whose total span is larger than the ZCD input

linear range. Let us imagine the input looks as in Fig. 3.2, where at initial time t = 0 the transconductor

leaves its saturated state (meaning changes in the input voltage cause the transconductor

current to change). The ramp rises with a slope a, so that it will cross 0 at $t = V_{SW}/(aA_V)$ seconds. If

the ZCD were ideal, the ZCD’s zero crossing would coincide exactly with the input’s zero crossing

at $t = V_{SW}/(aA_V)$ seconds. However, due to the time constant $\tau = r_oC_L$, the ZCD’s response will be

delayed. The actual response is given by the integral of the step response (3.1), scaled by the slope a, with the initial

condition that $v_{out,r}(0) = -V_{SW}$:

$$ v_{out,r}(t) = -V_{SW} - aA_V\tau + aA_V t + aA_V\tau e^{-t/\tau}. \tag{3.2} $$

This equation is plotted in the right side of Fig. 3.2. This can be algebraically manipulated to arrive

at a closed form solution for the delay between the output zero crossing and input zero crossing

given by

$$ t_d = \tau\, W\!\left(-e^{-\left(1 + \frac{V_{SW}}{aA_V\tau}\right)}\right) + \tau \tag{3.3} $$

where W(·) is Lambert’s function [34], defined as a solution x = W(y) to the equation

$y = xe^x$. Unfortunately this is not too helpful since W itself does not have a closed form.
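The step from (3.2) to (3.3) can be sketched as follows. The shorthand $u = t_c/\tau$, $c = 1 + V_{SW}/(aA_V\tau)$ and $w = u - c$ is introduced here only for the derivation, and taking the principal branch of W is our reading of the result (it is the branch that gives $t_c > 0$).

```latex
\begin{aligned}
0 &= -V_{SW} - aA_V\tau + aA_V t_c + aA_V\tau e^{-t_c/\tau}
  && \text{(set } v_{out,r}(t_c) = 0 \text{ in (3.2))} \\
u + e^{-u} &= c
  && \text{(divide by } aA_V\tau\text{)} \\
w\,e^{w} &= -e^{-c}
  && \text{(substitute } u = w + c\text{)} \\
w &= W\!\left(-e^{-c}\right) \\
t_d &= t_c - \frac{V_{SW}}{aA_V}
     = \tau\,(w + c) - \tau\,(c - 1)
     = \tau\,W\!\left(-e^{-c}\right) + \tau .
\end{aligned}
```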

What is more interesting is if we analyze a few cases, for a fixed AV and τ. First, consider when

the input slope is very small. We are interested in the delay between the actual zero crossing, tc

in Fig. 3.2, and the ideal zero crossing at $V_{SW}/(aA_V)$ seconds. To solve for $t_c$, we begin with the

equation for the ramp response in (3.2) and solve the equation $v_{out,r}(t_c) = 0$, since we want the

time where the response crosses zero.

$$ 0 = -V_{SW} - aA_V\tau + aA_V t_c + aA_V\tau e^{-t_c/\tau}. \tag{3.4} $$

Figure 3.2: On the left, the ZCD input can be approximated as a series of voltage ramps. The dashed horizontal grey lines indicate the input range of the ZCD where the output is not saturated. On the right, the actual ZCD response is superimposed over an ideal response in grey. The actual ZCD zero crossing at time tc is delayed from the ideal.

Since the slope is very low, in absolute terms the zero crossing time tc is large; therefore the

exponential term is negligible so we are left with

$$
\begin{aligned}
0 &= -V_{SW} - aA_V\tau + aA_V t_c && (3.5) \\
t_c &= \frac{V_{SW}}{aA_V} + \tau && (3.6)
\end{aligned}
$$

We can thus express the difference between tc and the ideal zero crossing as td , the ZCD delay:

$$ t_d = \tau. \tag{3.7} $$

What this says is that a first order system responding to a linear ramp will eventually settle to a constant

τ time shift between the input and output. To quantify what a small input slope means, recall

that the ideal zero crossing does not occur until $V_{SW}/(aA_V)$, so if $V_{SW}/(aA_V) \gg \tau$, this approximation

(the exponential is negligible) is valid.

If we consider the alternative scenario where the input slope a is very high, then the ideal zero

crossing time $V_{SW}/(aA_V) \ll \tau$. To solve (3.4) for $t_c$, we again set $v_{out,r}(t_c) = 0$, but this

time we need to expand the exponential term into its Taylor series

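$$
\begin{aligned}
0 &= -V_{SW} - aA_V\tau + aA_V t_c + aA_V\tau\left(1 - \frac{t_c}{\tau} + \frac{t_c^2}{2\tau^2} - \frac{t_c^3}{6\tau^3} \ldots\right) && (3.8) \\
0 &= -\frac{V_{SW}}{aA_V\tau} - 1 + \frac{t_c}{\tau} + \left(1 - \frac{t_c}{\tau} + \frac{t_c^2}{2\tau^2} - \frac{t_c^3}{6\tau^3} \ldots\right) && (3.9) \\
\frac{V_{SW}\tau}{aA_V} &= \frac{t_c^2}{2} + \left(-\frac{t_c^3}{6\tau} + \frac{t_c^4}{24\tau^2} \ldots\right) && (3.10) \\
t_c &\approx \sqrt{\frac{2V_{SW}\tau}{aA_V}} && (3.11)
\end{aligned}
$$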

Again we are finally interested in td , the delay between the actual ZCD zero crossing time tc and

the ideal crossing time, expressed as

$$ t_d \approx \sqrt{\frac{2V_{SW}\tau}{aA_V}} - \frac{V_{SW}}{aA_V}. \tag{3.12} $$

Note that $A_V/\tau$ is the unity gain bandwidth of this single-pole amplifier, so if we kept that constant

and let $A_V \to \infty$ and $1/\tau \to 0$ then the system behaves more and more like an ideal integrator,

as seen from the frequency response in Fig. 3.3. When this happens, the high order terms of the Taylor

expansion in (3.10) go to 0, and the response of the system to a ramp is quadratic, as expected.

Then the approximation (3.12) becomes exact.

Figure 3.3: Reducing bandwidth and increasing gain proportionally cause the first-order system to behave more and more like an ideal integrator. [Plot: |A(jω)| versus ω, with arrows marking increasing gain and decreasing bandwidth.]

From the above approximations we can characterize the ZCD’s response by two simplifications,

depending on the relation between the time the ZCD input is not saturating the transconductor

($V_{SW}/(aA_V)$, shown on the left of Fig. 3.2) and the time constant τ of the ZCD. If $V_{SW}/(aA_V) \gg \tau$, then

the ZCD delay between ideal zero crossing and actual zero crossing is roughly constant, τ. We

call this the constant delay approximation. On the other hand, if $V_{SW}/(aA_V) \ll \tau$, then the ZCD

behaves like an ideal integrator, and the ZCD delay decreases with increasing slope according to (3.12). We call this the integrator

approximation. The two approximations for the delay are shown in Fig. 3.4, overlaid atop the

exact delay expression from (3.3). For an example single stage with $V_{SW}$ = 0.4 V, a bandwidth of

5 MHz, and $A_V$ = 10, the input slope that divides the output response between the constant

delay approximation and the integrator approximation is $a = V_{SW}/(\tau A_V)$ = 1.2 V/µs. Note that for

steady-state LTI system analysis, all that is needed is the frequency of the input to characterize the

Figure 3.4: Delay vs. input slope, and the two approximations in (3.5) and (3.12). [Plot: delay (ns) versus input slope (V/s) on logarithmic axes, showing the exact delay, the constant delay approximation, and the integrator approximation.]

system response. Since we do not have a steady-state input, both the derivative and the amplitude

of the input affect the ZCD’s response.
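The exact delay and its two approximations can be compared numerically without evaluating the Lambert function, by solving $v_{out,r}(t_c) = 0$ in (3.2) with bisection. This is a sketch only: it assumes $\tau = 1/(2\pi f_{-3dB})$ for the 5 MHz example stage, and the function names and the two test slopes are illustrative. With this assumption the dividing slope evaluates to roughly 1.26 V/µs, close to the value quoted above.

```python
import math


def exact_delay(a, v_sw, a_v, tau):
    """Solve v_out,r(t_c) = 0 from Eq. (3.2); return t_c - V_SW/(a*A_V)."""
    t_ideal = v_sw / (a * a_v)

    def v_out(t):
        return (-v_sw - a * a_v * tau + a * a_v * t
                + a * a_v * tau * math.exp(-t / tau))

    lo, hi = t_ideal, t_ideal + tau      # v_out(lo) < 0 < v_out(hi)
    for _ in range(200):                 # bisection on a monotonic function
        mid = 0.5 * (lo + hi)
        if v_out(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) - t_ideal


def integrator_delay(a, v_sw, a_v, tau):
    """Fast-slope approximation, Eq. (3.12)."""
    return math.sqrt(2 * v_sw * tau / (a * a_v)) - v_sw / (a * a_v)


v_sw, a_v = 0.4, 10.0
tau = 1.0 / (2 * math.pi * 5e6)          # assumed relation to the 5 MHz bandwidth
a_boundary = v_sw / (tau * a_v)          # slope separating the two regimes

slow = exact_delay(1e4, v_sw, a_v, tau)  # well below the boundary slope
fast = exact_delay(1e9, v_sw, a_v, tau)  # well above the boundary slope
fast_approx = integrator_delay(1e9, v_sw, a_v, tau)

print(f"boundary slope: {a_boundary / 1e6:.2f} V/us")
print(f"slow ramp: exact {slow * 1e9:.2f} ns, constant-delay {tau * 1e9:.2f} ns")
print(f"fast ramp: exact {fast * 1e9:.2f} ns, integrator {fast_approx * 1e9:.2f} ns")
```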

This figure suggests that a modest bandwidth of 5 MHz can support a 1 V peak 20 kHz sine

wave with no delay variation since dv/dt|MAX = 125 kV/s. However, there is the practical issue

of gain. With an ADC LSB of 4 mV, the ZCD needs at least 48 dB of gain just to amplify an LSB

signal to logic levels, and in reality much more. So then the ZCD delay becomes the sum of several

stage delays, and the slope at the input of each stage increases down the chain. Figure 3.5 shows

Figure 3.5: Delay of a three stage cascaded amplifier, by stage. Also shown is the delay of a single stage that has the same overall gain and bandwidth as the cascade. [Plot: delay (ns) versus input slope (V/s) on logarithmic axes, for the first, second, and third stages, the total delay, and the single stage approximation.]

the delay through the same amplifier, cascaded 3 times. The total delay is shown, as well as what

a single stage delay would be, had it the same -3 dB bandwidth and gain of the three cascaded

stages. The single stage approximation severely underestimates the overall delay since in the real

amplifier the signal must physically travel through three times as many devices [21]. Therefore the

simple model of a ZCD stage shown in Fig. 3.1 is useful for estimating a single stage’s delay as a

function of small-signal parameters gm, ro, etc. but cannot be directly applied to a cascade. As a

result, in the design process of the ZCD presented in sec. 4.3, the final device sizes were achieved

through optimization scripts.
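The slope and gain figures used earlier in this section (125 kV/s and 48 dB) can be checked with a short calculation. The 1 V logic swing is an assumption made here to match the quoted gain; the variable names are illustrative.

```python
import math

v_peak, f_in = 1.0, 20e3                  # 1 V peak, 20 kHz sine
max_slope = 2 * math.pi * f_in * v_peak   # largest dv/dt of the sine, in V/s

lsb = 4e-3                                # LSB quoted in the text
v_logic = 1.0                             # assumed logic swing
gain_db = 20 * math.log10(v_logic / lsb)  # gain to bring one LSB to logic level

print(f"max slope: {max_slope / 1e3:.1f} kV/s")
print(f"required gain: {gain_db:.1f} dB")
```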

3.1.2 Harmonic Distortion of a Level Crossing Quantizer

We now consider how the slope-varying ZCD delay affects the spurious performance of the LCADC.

Although in section 2.1 we chose an LCADC topology which only uses two ZCDs in a feedback

loop, for analysis we can still use a flash ADC as shown in Fig. 2.1–during normal operation the

level shifting in the feedback structure (shown in Fig. 2.4) is transparent to the ZCD and we can

think of the flash LCADC as the feedback LCADC “unrolled.” In Sec. 2.1, we defined

the $LEV_j$ binary signal as the output of the jth ZCD. While this is the most straightforward

decomposition, by using our assumption that the quantization thresholds are uniformly spaced and

symmetric around 0, we can conveniently group pairs of positive and negative thresholds together,

as demonstrated in Fig. 3.6 for 8 quantization levels. This results in half the number of signals,

and odd-symmetric waveforms for pure sinusoidal inputs. Defining a set of positive thresholds as

$$ T_m = \frac{1}{2^N} + \frac{m}{2^{N-1}}, \quad \text{for integer } m \in [0, 2^{N-1} - 1], \tag{3.13} $$

then the input signal is decomposed into simple signals

$$ TR_m(t) = \begin{cases} -1 & \text{if } v_x(t) < -T_m, \\ +1 & \text{if } v_x(t) > T_m, \\ 0 & \text{otherwise.} \end{cases} \tag{3.14} $$

The index m indicates the distance of the threshold from 0; m = 0 are the thresholds closest to

0, and $m = 2^{N-1} - 1$ corresponds to the thresholds closest to ±1. Using this decomposition, the

continuous time reconstruction of the quantized signal is just the scaled sum of all the $TR_m$ levels,

vxq(t) =1

2N−1

2N−1−1

∑m=0

T Rm(t). (3.15)
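As a quick numerical illustration of (3.13) to (3.15), the following sketch builds the thresholds $T_m$, forms the three-level signals $TR_m(t)$, and sums them into the reconstructed output. It is only an illustrative model and not code from this work; the function names (`thresholds`, `tr_signals`, `reconstruct`) and the use of NumPy are assumptions made for the example.

```python
import numpy as np


def thresholds(n_bits):
    """Positive thresholds T_m of (3.13) for an n_bits quantizer."""
    m = np.arange(2 ** (n_bits - 1))
    return 1.0 / 2 ** n_bits + m / 2 ** (n_bits - 1)


def tr_signals(vx, n_bits):
    """Three-level signals TR_m of (3.14); one row per threshold index m."""
    t_m = thresholds(n_bits)[:, None]
    above = (vx[None, :] > t_m).astype(int)
    below = (vx[None, :] < -t_m).astype(int)
    return above - below


def reconstruct(vx, n_bits):
    """Scaled sum of all TR_m signals, as in (3.15)."""
    return tr_signals(vx, n_bits).sum(axis=0) / 2 ** (n_bits - 1)
```

For a full-scale sinusoid the reconstruction stays within half a quantization step, $1/2^N$, of the input, and negating the input negates every $TR_m$ signal, which is the odd symmetry used in the rest of this section.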

If we assume we have a periodic input $v_x(t)$, then each of the $TR_m(t)$ signals is periodic, and by linearity we can write the Fourier series of the reconstructed quantized output $v_{xq}(t)$ as the summation of the Fourier series for each $TR_m(t)$, precisely

$$\mathcal{F}\{v_{xq}(t)\} = \sum_{n=-\infty}^{\infty} f_n e^{-2\pi j n t} \tag{3.16}$$

Figure 3.6: With symmetric quantization levels, it is possible to express the quantized signal with $2^{N-1}$ signals $TR_m(t)$. For an odd-symmetric input, each of the $TR_m(t)$ signals is also odd symmetric.

where $f_n$ is the nth Fourier component of $v_{xq}$, expressible as

$$f_n = \sum_{m=0}^{2^{N-1}-1} c_{m,n}, \tag{3.17}$$

$$c_{m,n} = \mathcal{F}_n\{TR_m(t)\}. \tag{3.18}$$

The notation $\mathcal{F}_n\{TR_m\}$ denotes the nth Fourier coefficient of $TR_m(t)$. This is a useful decomposition, because it allows us to analyze the distortion of the level-crossing quantization with simple functions. If we have a pure sinusoidal input then ideally the Fourier series of the output $\mathcal{F}\{v_{xq}(t)\}$ will have only one non-zero coefficient, $f_1$. However, since the quantization introduces some distortion, we will have harmonics of the input at $f_3, f_5, f_7, \ldots$ (with a symmetric quantizer we should ideally have only odd-order distortion). The above decomposition allows us to express the Fourier coefficients, and therefore the distortion components, as the sum of the $c_{m,n}$ Fourier coefficients of the $TR_m$ signals, which are easy to compute.

Appendix A.1 derives the Fourier coefficients $c_{m,n}$ for a full-scale, frequency-normalized sinusoidal input $v_x(t) = \sin(2\pi t)$ as

$$c_{m,n} = \begin{cases} \dfrac{2\cos(n \arcsin(T_m))}{\pi j n 2^{N-1}}, & n \in [1, 3, 5, \cdots) \\ 0, & \text{otherwise.} \end{cases} \tag{3.19}$$

This is intuitively satisfying, because (a) there are only odd harmonics, which is what we would expect from a signal with odd symmetry, and (b) the coefficients $c_{m,n}$ are purely imaginary, which means the Fourier series is a real sine series (signals with odd symmetry have sine series, signals with even symmetry have cosine series). We can simplify the Fourier expansion of $v_{xq}(t)$ to:

$$v_{xq}(t) = \sum_{n=1}^{\infty} \sum_{m=0}^{2^{N-1}-1} b_{m,n} \sin(2\pi n t) \tag{3.20}$$

$$b_{m,n} = j(c_{m,n} - c_{m,-n}) = \begin{cases} \dfrac{4\cos(n \arcsin(T_m))}{\pi n 2^{N-1}}, & n \in [1, 3, 5, \cdots) \\ 0, & \text{otherwise.} \end{cases} \tag{3.21}$$

Figure 3.7: The Fourier series approximation of $v_{xq}(t)$ for an almost full-scale time-normalized sinusoid input, for 150, 800, and 4000 Fourier terms. (Axes: Time (s) versus Voltage (V); traces: Input, 150 Terms, 800 Terms, 4000 Terms.)

As a sanity check, if we let $N = 1$ and $T_m = 0$, then this equation boils down to $b_n = 4/(n\pi)$, which are the Fourier coefficients of a square wave. The partial Fourier sums of this series are shown in Fig. 3.7. Observe that quite a few terms need to be included before the step discontinuities of the quantizer levels are accurately modeled. This is because the many small steps (512 per period for an 8-bit quantizer) of the LCADC waveform have a lot of high-frequency content beyond the fundamental.
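The coefficients in (3.21) and the partial sums of (3.20) are simple to evaluate numerically. The sketch below is a minimal illustration under the same full-scale, frequency-normalized assumption; `b_coeff` and `partial_sum` are hypothetical helper names, and `n_terms` is taken here to mean the highest harmonic number included, which may differ from how the terms are counted in Fig. 3.7.

```python
import numpy as np


def b_coeff(n, t_m, n_bits):
    """Sine-series coefficient b_{m,n} of (3.21) for one threshold t_m."""
    if n % 2 == 0:
        return 0.0  # even harmonics vanish for odd-symmetric TR_m
    return 4.0 * np.cos(n * np.arcsin(t_m)) / (np.pi * n * 2 ** (n_bits - 1))


def partial_sum(t, n_terms, n_bits):
    """Partial Fourier sum of (3.20), using harmonics 1..n_terms."""
    m = np.arange(2 ** (n_bits - 1))
    t_m = 1.0 / 2 ** n_bits + m / 2 ** (n_bits - 1)  # thresholds of (3.13)
    out = np.zeros_like(t, dtype=float)
    for n in range(1, n_terms + 1, 2):
        b_n = sum(b_coeff(n, tm, n_bits) for tm in t_m)
        out += b_n * np.sin(2 * np.pi * n * t)
    return out
```

Calling `b_coeff(n, 0.0, 1)` reproduces the square-wave check above, since the threshold is passed in directly rather than taken from (3.13).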

Another aspect of the LCADC waveform which can be explained by the Fourier series is the extreme sensitivity of the harmonic distortion components to input amplitude. In Fig. 3.8, the SFDR (defined as the difference in dB between the fundamental component and the largest distortion component) is plotted as a function of the input amplitude. The sharp peaks and valleys of this curve, over an extremely narrow input amplitude window of 2 LSB from full scale, can be explained as follows. In [35] the LCADC quantization error waveform was viewed as the sum of two components: a high-frequency sawtooth wave, corresponding to the intervals in which the input is moving quickly, and a low-frequency bell-shaped wave, corresponding to the local peaks of the wave. Varying the input amplitude slightly leaves the high-frequency sawtooth mostly unaffected but strongly affects the bell wave, which modulates the low-frequency distortion components. In practical situations, because these peaks and valleys are so narrow, the measured spurious performance of the LCADC is some sort of average of the extrema.

Figure 3.8: Wide variation in SFDR for small changes in input amplitude. The solid vertical line represents a level of the reconstructed quantized signal, and the dotted lines are the quantizer thresholds. The figure spans two LSB of the quantizer input range. (Axes: Input Amplitude (V) versus SFDR (dB).)

3.1.3 Delay-Dispersion-Induced Harmonic Distortion

As derived in Sec. 3.1.1, a real ZCD has a delay which varies depending on the slope of the input

signal. The variation of ZCD delay is called delay dispersion. We can incorporate the appropriate

signal-dependent delay into each of the Fourier components of (3.18), written generally as

$$c_{m,n} = \mathcal{F}_n\{TR_m(t - f(v_x'(t)))\}. \tag{3.22}$$

If we return to a pure sinusoid, as shown in Fig. 3.6 each $TR_m(t)$ signal has two rising transitions and two falling transitions per period. At each of these transitions the magnitude of the input slope is the same, by symmetry of the sine wave. Therefore each $TR_m(t)$ signal is delayed by a constant amount of time, $d_m$. Combining equations (3.16), (3.17), and (3.18) we can write

$$\mathcal{F}\{v_{xq}(t)\} = \sum_{n=-\infty}^{\infty} \sum_{m=0}^{2^{N-1}-1} \mathcal{F}_n\{TR_m(t - d_m)\} e^{-2\pi j n t}. \tag{3.23}$$

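We note that a constant shift in time $d_m$ of a periodic function results in a phase rotation by $e^{-2\pi j n d_m}$ of the nth Fourier coefficient, so we simplify to get

\begin{align}
\mathcal{F}\{x_q(t)\} &= \sum_{n=-\infty}^{\infty} \sum_{m=0}^{2^{N-1}-1} \mathcal{F}_n\{TR_m(t - d_m)\} e^{-2\pi j n t} \tag{3.24} \\
&= \sum_{n=-\infty}^{\infty} \sum_{m=0}^{2^{N-1}-1} e^{-2\pi j n d_m} \mathcal{F}_n\{TR_m(t)\} e^{-2\pi j n t} \tag{3.25} \\
&= \sum_{n=-\infty}^{\infty} \sum_{m=0}^{2^{N-1}-1} e^{-2\pi j n d_m} c_{m,n} e^{-2\pi j n t} \tag{3.26}
\end{align}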
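The phase-rotation property used between (3.24) and (3.25) can be checked numerically on a sampled $TR_m(t)$ waveform. The sketch below is an illustration only: the helper names are hypothetical, the coefficient is approximated with a DFT over one period, and the delay is restricted to a whole number of samples so that the circular shift is exact.

```python
import numpy as np


def tr_waveform(t_m, n_samples):
    """Samples of TR_m(t) over one period for v_x(t) = sin(2*pi*t)."""
    t = np.arange(n_samples) / n_samples
    vx = np.sin(2 * np.pi * t)
    return (vx > t_m).astype(float) - (vx < -t_m).astype(float)


def fourier_coeff(x, n):
    """nth Fourier-series coefficient of one uniformly sampled period."""
    return np.fft.fft(x)[n] / len(x)


def delay_by_samples(x, shift):
    """Return x(t - d) with d = shift / len(x), as a circular shift."""
    return np.roll(x, shift)
```

With `x = tr_waveform(0.3, 4096)` and `d = 37 / 4096`, the coefficient of `delay_by_samples(x, 37)` equals `np.exp(-2j * np.pi * n * d) * fourier_coeff(x, n)` for each harmonic n, which is the rotation applied per threshold in (3.26).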

From this point onward, to reduce clutter we omit explicit bounds of summation, with the understanding that summation over the n index is on $(-\infty, \infty)$ and that summation on the m index is on $[0, 2^{N-1}-1]$. In Appendix A.2 we derive that the power of the nth harmonic distortion component of $\mathcal{F}\{x_q(t)\}$ is equal to the power of $f_n$, the nth distortion component of the ideal quantizer, plus a term $e_n$ equal to the extra distortion due to delay dispersion,

$$e_n = 2\pi n \sum_{r \neq s} d_{r,s}\, \mathrm{Im}\{c_{r,n} c_{s,n}\} - 2\pi^2 n^2 \sum_{r \neq s} d_{r,s}^2\, \mathrm{Re}\{c_{r,n} c_{s,n}\}, \tag{3.27}$$

where $d_{r,s}$ is defined as the difference between the ZCD delay for the $TR_r$ level and the $TR_s$ level. We can use this to predict the increase in distortion power as a function of $d_{r,s}$ in order to determine the maximum $d_{r,s}$ allowed for a given distortion specification. To do so, we first have to define the $d_m$ delays. The delay variation is modeled as a linear slope-dependent variable,

$$d(t) = T_{DEL} - \frac{T_{SPREAD}}{v'_{x,MAX}} v'_x(t), \tag{3.28}$$

where $T_{DEL}$ is the absolute delay, $T_{SPREAD}$ is the difference between maximum and minimum delays, and $v'_{x,MAX}$ is the maximum derivative of the input, used to normalize the delay dispersion. As derived in Appendix A.3, for a full-scale frequency-normalized input, the magnitude of the slope at each of the $TR_m(t)$ transitions is $2\pi\cos(\arcsin(T_m))$. We can therefore write

$$d_{r,s} = \frac{2\pi T_{SPREAD}}{v'_{x,MAX}} \left(\cos(\arcsin(T_r)) - \cos(\arcsin(T_s))\right). \tag{3.29}$$

If we plug (3.29) into (3.27), we can plot the effect of delay dispersion on harmonic distortion,

as shown in Fig. 3.9. We see that the close-in terms are most affected.
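The effect of dispersion on a given harmonic can also be evaluated directly from (3.26), without the expansion in (3.27), by applying the per-threshold delay of (3.28) to the coefficients of (3.19). The following is a rough sketch under stated assumptions: the input is full scale and frequency normalized (so $v'_{x,MAX} = 2\pi$), delays are expressed as a fraction of the input period, only odd n is meaningful, and the function name is hypothetical.

```python
import numpy as np


def harmonic_with_dispersion(n, n_bits, t_del, t_spread):
    """Complex nth harmonic (odd n) of the delayed quantizer output, per (3.26).

    t_del and t_spread are in units of the input period (assumption).
    """
    m = np.arange(2 ** (n_bits - 1))
    t_m = 1.0 / 2 ** n_bits + m / 2 ** (n_bits - 1)   # thresholds, (3.13)
    slope_ratio = np.cos(np.arcsin(t_m))              # crossing slope / max slope
    d_m = t_del - t_spread * slope_ratio              # delay model, (3.28)
    c_mn = 2 * np.cos(n * np.arcsin(t_m)) / (np.pi * 1j * n * 2 ** (n_bits - 1))
    return np.sum(np.exp(-2j * np.pi * n * d_m) * c_mn)
```

As an example of the units, a 100 ns spread with a 20 kHz input corresponds to `t_spread = 100e-9 * 20e3 = 0.002`. A pure absolute delay (`t_spread = 0`) only rotates the phase of each harmonic and leaves its magnitude unchanged, whereas a non-zero spread changes the magnitudes, which is the source of the extra distortion.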

For a fixed dispersion, the effect is more pronounced at higher input frequencies, since a fixed

time delay results in a larger phase error at higher frequencies. Figure 3.10 shows the variation

in SDR for three different input frequencies on a log scale, necessary to see all plots clearly. For

this example, SDR is defined as the power of the signal divided by the sum power of the first 500

harmonics. To keep the delay variation from affecting SDR by more than 1 dB at all frequencies

the ZCD dispersion must be kept below 60 ns.

To use this information to specify the ZCD, recall from (3.5) and (3.12) that the maximum

delay for low-slope inputs is τ, the time constant of the single-stage ZCD, but goes by the square

root of the input slope for high-slope inputs. Unfortunately with a cascade of stages there is no

simple solution for overall delay. The delay dispersion of a chain of 3 stages with a total gain of

80 dB, with a worst-case 20kHz 1 V peak input is plotted using (3.3) in Fig. 3.11. From this plot

it is seen that each stage needs a bandwidth greater than 3.5 MHz for a dispersion less than 60 ns.

Figure 3.9: The first harmonics of the LCADC output with ZCD delay variation. The input is a 20 kHz 1 V peak sine wave. (Axes: Harmonic Number (× $f_{in}$) versus Relative Power (dBc); traces: No Delay Dispersion, 100 ns Delay Dispersion, 200 ns Delay Dispersion.)

At this point, it is worth noting that a cascade of three 3.5 MHz amplifiers can have an absolute

delay up to 140 ns, which is larger than the required TGRAN discussed in section 2. This means that

in the generic LCADC topology shown in Fig. 2.4, the requirement on absolute ZCD delay given

by Eq. (2.4) is a more stringent specification than the delay dispersion requirement. It is not until

TGRAN is increased thanks to the adaptive resolution algorithm introduced in section 4.1 that the

dispersion becomes a limitation and must be compensated.

Figure 3.10: Impact of dispersion on SDR, 500 harmonics considered. The input is a full scale sinusoid of the specified frequency. (Axes: Delay Dispersion (ns) versus SDR Degradation (dB); traces: 200 Hz Input, 2 kHz Input, 20 kHz Input.)

3.2 Noise Analysis

3.2.1 ZCD Noise

We have just considered the effect of a deterministic, signal-dependent delay in the ZCD output

on the fidelity of the reconstructed quantizer output vxq(t). Now, we analyze the effect of random

variations in delay time, i.e. jitter, on vxq(t). Before proceeding, we define the notation of the pulse

shown in Fig. 3.12:

$$p_0(\Delta, t) = \begin{cases} -1 & \text{when } \Delta < 0,\ \Delta \le t < 0 \\ +1 & \text{when } \Delta \ge 0,\ 0 \le t < \Delta \\ 0 & \text{otherwise.} \end{cases} \tag{3.30}$$

Figure 3.11: Dispersion as a function of stage bandwidth for a 3-stage cascade of identical single-time-constant stages. (Axes: Stage Bandwidth (MHz) versus Delay Dispersion (ns).)

Figure 3.12: Visualization of the function $p_0(\Delta, t)$ defined for all real $\Delta$.

Figure 3.13: Within one period of a full-scale sine wave there are $2^{N+1}$ level crossings, which occur at times $s_i$ relative to the beginning of each period.

By time-shifting and scaling a square pulse we get the Fourier transform of p0(∆, t) as [36]:

$$P_0(\Delta, f) = e^{-j\pi\Delta f}\, \Delta\, \mathrm{sinc}(\pi \Delta f). \tag{3.31}$$
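The closed form (3.31) can be compared against a direct numerical integration of the pulse in (3.30). The sketch below assumes that sinc in (3.31) is the unnormalized $\sin(x)/x$; since NumPy's `np.sinc(y)` computes $\sin(\pi y)/(\pi y)$, the argument passed is $\Delta f$. The function names are hypothetical.

```python
import numpy as np


def p0_transform(delta, f):
    """Closed-form transform of p0, following (3.31)."""
    # np.sinc(y) = sin(pi*y)/(pi*y), so np.sinc(delta*f) = sin(x)/x with x = pi*delta*f.
    return np.exp(-1j * np.pi * delta * f) * delta * np.sinc(delta * f)


def p0_transform_numeric(delta, f, n_points=20001):
    """Trapezoidal integration of p0(delta, t) * exp(-2j*pi*f*t) over its support."""
    lo, hi = min(delta, 0.0), max(delta, 0.0)
    t = np.linspace(lo, hi, n_points)
    y = np.sign(delta) * np.exp(-2j * np.pi * f * t)  # pulse is +1 or -1 on its support
    dt = t[1] - t[0]
    return dt * (y.sum() - 0.5 * (y[0] + y[-1]))
```

The two functions agree for both signs of $\Delta$, which reflects the remark that the same expression covers early and late crossings.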

If we feed a noiseless N-bit quantizer with a full-scale input of frequency $f_{in}$, then the sample times themselves form a periodic sequence. The input is periodic with period $1/f_{in}$, so over one period, without loss of generality $0 \le t < 1/f_{in}$, we have a length-$2^{N+1}$ sequence of crossing times $s_0, s_1, s_2, \ldots$ such that the crossing of the first positive level closest to 0 is at $s_0$, the next level from 0 at time $s_1$, etc., and this sequence repeats at multiples of $1/f_{in}$. This is depicted in Fig. 3.13.

Figure 3.14: Decomposition of a noisy $TR_m(t)$ signal (a) into the ideal noiseless $TR_m(t)$ signal (b) and a pulse sequence representing jitter due to noise (c).

Now let us suppose that when resolving the ith zero crossing of the first period from 0 (at time $s_i$), there is a disturbance in the ZCD which delays its decision time by a time $\Delta_i$. Then we can write the resulting quantizer output (with disturbance) $v_{xqn}(t)$ using the $p_0(\Delta, t)$ notation:

$$v_{xqn}(t) = v_{xq}(t) + \frac{1}{2^{N-1}} p_0(\Delta_i, t - s_i) \tag{3.32}$$

The ith level is delayed, which is equivalent to appending a small pulse of duration $\Delta_i$ whose height is equal to the width of each level. Note this is valid for negative $\Delta_i$ as well, since in this case the negative pulse subtracts from the level, causing the crossing to occur earlier. From Eq. (3.15) we know the input can be decomposed into a sum of three-level signals $TR_m(t)$. Each of these signals is derived by comparing the crossings of the input signal with different thresholds $T_m$, in the topology of Fig. 2.5. Therefore, each of the transitions of each $TR_m(t)$ is corrupted by noise. We can then express the actual $TR_m(t)$ with noise as the sum of an ideal noiseless $TR_m(t)$ signal plus a random pulse sequence, as shown in Fig. 3.14.

Let us define the sequence $\Delta_{i,k}$, a Gaussian sequence of independent noise samples, which we assume to be cyclostationary on the i index: the suffix i,k refers to the jitter on the ith crossing of the kth period $[k/f_{in}, (k+1)/f_{in})$ of $v_x(t)$. In the context of the LCADC, cyclostationarity is reasonable since after each crossing the ZCD resets itself, so each sample is independent of the previous one, and the only difference from sample to sample is the slope of the ZCD input, which is a periodic function. From [37] we can decompose the cyclostationary process $\Delta_{i,k}$ into a stationary process and its envelope, namely

$$\Delta_{i,k} = \alpha_i(f_{in})\, \Delta\tau_{i,k}. \tag{3.33}$$

From the previous section we know the delay is a function of input slope. In our ZCD model from Fig. 3.1, let us assume that when the input is saturating the transconductor the voltage at the output is clamped to $-V_{SW}$. As the output voltage leaves the clamped state, random noise causes the output voltage to cross 0 at a different time, thus disturbing the zero crossing randomly. This is shown in Fig. 3.15. The exact calculation of the statistics of a noisy signal $t_c$ seconds after an initial condition is the solution to a stochastic differential equation [38] and is too complicated to lend insight. Instead, we can approximate the circuit in Fig. 3.1 as a white current noise source (modeling drain current and resistor current noises) being integrated onto the capacitor. We know that for a white noise current charging a capacitor, jitter $\propto \sqrt{\text{delay}}$ [33, 39, 40]. Thus we can evaluate the jitter of the ZCD for the maximum input slope, and the samples $\Delta\tau_{i,k}$ are the samples of jitter at that slope. Then $\alpha_i(f_{in})$ is the scaling factor for each of the i level crossings, which decreases as $f_{in}$ increases since the ZCD delay decreases, and vice versa. Then we can rewrite the noisy quantizer output as

$$v_{xqn}(t) = v_{xq}(t) + \frac{1}{2^{N-1}} v_n(t). \tag{3.34}$$

Figure 3.15: Noise corrupts the path of the ZCD output voltage, causing the zero crossing to be disturbed randomly.

From the above discussion and Fig. 3.14, the noise term vn(t) is the sum of scaled shifted

pulses, the statistics of which are cyclostationary with period 1/ fin. Expanded fully, we can write

$$v_n(t) = \sum_{k=-\infty}^{\infty} \sum_{i=0}^{2^{N+1}-1} p_0\left(\alpha_i(f_{in})\Delta\tau_{i,k},\; t - s_i - \frac{k}{f_{in}}\right). \tag{3.35}$$

The total added noise is the linear sum of pulse trains, repeating at $f_{in}$, with pulses of random width $\alpha_i(f_{in})\Delta\tau_{i,k}$. To analyze such pulse trains, typically the approximation is made that the pulse width $\alpha_i(f_{in})\Delta\tau_{i,k}$ is much smaller than the period, $1/f_{in}$ [41]. The Fourier transform of this pulse is given by (3.31), substituting $\Delta = \alpha_i(f_{in})\Delta\tau_{i,k}$. The transform is a sinc function with a main lobe width of $1/(\alpha_i(f_{in})\Delta\tau_{i,k})$, so saying the pulse width is much narrower than the period is equivalent to saying the pulse energy is broadband compared to the modulation frequency $f_{in}$, and hence it appears white within the band of interest. Thus the pulse train is approximated as a Dirac delta train, whose amplitudes are modulated by the random sequence $\Delta\tau_{i,k}$:

$$x_n(t) = \sum_{i=0}^{2^{N+1}-1} \alpha_i(f_{in}) \sum_{k=-\infty}^{\infty} \Delta\tau_{i,k}\, \delta\left(t - s_i - \frac{k}{f_{in}}\right). \tag{3.36}$$

The sequence of points can be thought of as the sum of $2^{N+1}$ independent, identically distributed sub-sequences at rate $f_{in}$. Each of these sub-sequences is scaled by $\alpha_i(f_{in})$, and the integrated noise $N^2_{n,i}$ of each sequence over a band $[0, f_{BW})$ is given by

$$N^2_{n,i} = f_{BW} f_{in} \sigma_n^2, \tag{3.37}$$

where $\sigma_n^2$ is the jitter variance of the ZCD at maximum input slope. The appearance of $f_{in}$ in this equation represents that within a 1 second interval, $f_{in}$ pulses occur; thus in this case it is a unitless multiplier. So the total noise power due to ZCD jitter over a $[0, f_{BW})$ bandwidth is

\begin{align}
S_{nn}(f) &= \frac{1}{2^{2N-2}} \sum_{i=0}^{2^{N+1}-1} \alpha_i^2(f_{in}) f_{BW} \sigma_n^2 \tag{3.38} \\
&= \frac{1}{2^{2N-2}} \alpha_{RMS}^2(f_{in}) f_{in} f_{BW} \sigma_n^2. \tag{3.39}
\end{align}

Note that this analysis follows closely the noise analysis for oscillators presented in [40]. Since the fundamental has amplitude 1 and therefore power 1/2, the SNR is given by

$$\mathrm{SNR} = 10\log_{10}\left[\frac{2^{N-3}}{\alpha_{RMS}^2(f_{in}) f_{in} f_{BW} \sigma_n^2}\right] \tag{3.40}$$

The SNR improves by 3 dB for every bit of resolution. This is because for every bit, the number of

uncorrelated noise sources doubles (which add in variance) yet the amplitude of each noise source

has halved since the LSB step halves, which results in a quarter the power, yielding a net 3 dB

improvement in SNR.
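The scaling behavior of (3.40) is easy to explore with a one-line function. The sketch below simply evaluates (3.40) as written; the function and parameter names are assumptions, and `alpha_rms` must be supplied by the user since its value depends on the ZCD delay model.

```python
import math


def jitter_snr_db(n_bits, alpha_rms, f_in, f_bw, sigma_n):
    """SNR in dB due to ZCD jitter, evaluating (3.40) directly.

    sigma_n is the RMS jitter (seconds) at maximum input slope;
    alpha_rms is the RMS jitter scaling factor at f_in (user-supplied).
    """
    noise_term = alpha_rms ** 2 * f_in * f_bw * sigma_n ** 2
    return 10.0 * math.log10(2 ** (n_bits - 3) / noise_term)
```

Holding `alpha_rms` fixed, adding one bit raises the result by about 3 dB, and lowering `f_in` by a factor of 10 raises it by 10 dB. In practice `alpha_rms` itself changes with `f_in`, so the second improvement is not fully realized.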

The validity of this approximation depends on the jitter magnitude with respect to the whole period, $1/f_{in}$. A model in Matlab was built using the precise noise model in (3.35), for an 8-bit LCADC with 35 ns RMS ZCD jitter (much larger than anticipated, to exacerbate the difference between approximation and actual). The ZCD input-dependent delay was modeled using the approximation in (3.28). The system was fed a full-scale 10 kHz tone, and the bandwidth of the SNR calculation was 20 kHz. The precise zero crossings were calculated analytically, then re-quantized to a 2 GHz sampling clock to enable an FFT operation. As seen in Fig. 3.16, looking over a very wide bandwidth the approximation is invalid. On the other hand, it is unlikely that a system designed to process signals up to 20 kHz would have a bandwidth of 100 MHz. Looking at the same spectrum over a 2 MHz bandwidth in Fig. 3.17, we see the two curves coincide. The precise SNR is 71.9 dB, and the SNR with the impulse approximation is 71.6 dB (over a 20 kHz bandwidth).

With the same system, when the input frequency is reduced to 1 kHz, the same plots in Figs. 3.18 and 3.19 show similar results: now both the real model and the approximation model give 75.8 dB SNR with the same 35 ns jitter in the ZCD (again, over a 20 kHz bandwidth). The reason the SNR does not improve by 10 dB as suggested by (3.40) is that the value of $\alpha_{RMS}(f_{in})$ increases as $f_{in}$ decreases, due to the longer delays through the ZCDs, and hence greater crossing jitter variation.

It should be noted that although the developed model concerns ZCD jitter-induced output noise, other noise sources in the circuit such as jitter in the asynchronous logic and switch noise in the capacitor DAC can be integrated into this model with little effort.

Figure 3.16: Simulated LCADC noise with ZCD jitter, over a 100 MHz span with a 10 kHz input. Resolution bandwidth is 100 Hz. (Axes: Frequency (MHz) versus Spectrum (dB); traces: Precise Jitter Model, Impulse Jitter Approximation.)

Figure 3.17: Detail of Fig. 3.16 over a 2 MHz span with the same RBW = 100 Hz. At low frequencies, the impulse approximation is still valid. (Axes: Frequency (MHz) versus Spectrum (dB).)

Figure 3.18: Simulated LCADC noise with ZCD jitter, over a 100 MHz span with a 1 kHz input. Resolution bandwidth is 10 Hz. (Axes: Frequency (MHz) versus Spectrum (dB).)

Figure 3.19: Detail of Fig. 3.18 over a 2 MHz span, with RBW = 10 Hz. Note the LCADC distortion components have moved closer in, but other than that the impulse approximation still predicts the noise level. (Axes: Frequency (MHz) versus Spectrum (dB).)

Chapter 4

Design of a µW Programmable LCADC

This chapter describes the implementation of a 20 kHz, 8-bit LCADC in 0.13 µm CMOS. In the

following design, due to incompatibility between the technology kit and available CAD tools, no

post-layout extraction was possible. Not knowing the parasitic capacitances meant many circuit

blocks had to be designed conservatively, and thus consume more power than necessary.

The top level block diagram is shown in Fig. 4.1. What follows is a high-level overview of

the system operation, with an in-depth block-by-block discussion in the following sections. While

Fig. 4.1 bears a resemblance to the prototypical capacitor feedback DAC LCADC structure in Fig.

2.5, there are a few additional blocks. The underlying operation is the same: during operation, a

pair of capacitive DACs perform the subtraction of the input signal with the analog reconstructed

digital output. This difference is nominally contained within a comparison window by the ZCDs.

When the difference grows large enough, indicating that the analog input has diverged from the

reconstructed output, one of the ZCDs triggers, causing the ASYNC TIMING block to update the

digital output value, as well as the value to be fed back to the capacitive DAC.

The improvements upon the basic architecture are briefly outlined. When a ZCD crossing

occurs, the ASYNC TIMING block triggers computations in the ADAPTIVE RES CONTROL

block which dynamically varies the LCADC resolution on a sample-by-sample basis, in a manner

that reduces the average number of samples taken (versus a non-adaptive LCADC [8,10,27]) while

having minimal impact on the fidelity of the digitization. The need to support a varying resolution

is why the capacitive DAC is split into binary and thermometric banks. The ASYNC TIMING

block also causes the ZCD BIAS CONTROL block to adjust the ZCD bias current in order to

partially cancel the effect of input-slope-dependent delay dispersion through the ZCD, which as

was discussed in Sec. 3.1.3 is a source of distortion in an LCADC. The digital bias values from the

ZCD BIAS CONTROL block are converted through a pair of DACs to bias currents for the ZCDs.

In order to support all the digital computations performed by the sub-blocks in the ASYNCHRONOUS CONTROLLER without resorting to an always-on clock, a delay-line-based timing generator is used. The edges generated by this block are synchronized by the ZCD triggers, and are therefore signal-dependent and asynchronous with respect to absolute time; an input which produces samples sparingly produces correspondingly fewer edges to clock the digital logic, and therefore input-activity-dependent digital power consumption is achieved. The delay elements have individually controllable delays through an array of bias DACs, such that the adaptive algorithms can be tailored to the expected input characteristics.

Rounding out the IC is a separate digital block which is clocked from an integrated oscillator.

The oscillator is required to drive the calibration algorithm that cancels the ZCD offsets; because

the calibration does not need to be run continuously, it is enabled only to perform the calibration

cycle and disabled immediately afterward. Also contained in the block is a standard SPI interface

used to program the IC and read off various internal states during operation.

In the sections that follow, the detailed operation of each of these blocks will be discussed.

Figure 4.1: Top level LCADC block diagram. (Labels recoverable from the figure: INPUT SWITCH, CAPACITOR FEEDBACK DAC, BINARY DACS (MSB), THERMOMETRIC DACS (LSB), ZCD, ZCD BIAS DACS, CAL DAC, DELAY BIAS DACS, TIMING GENERATION, ASYNCHRONOUS CONTROLLER, ASYNC TIMING, ADAPTIVE RES CONTROL, ZCD BIAS CONTROL, SYNCHRONOUS DIGITAL, CALIBRATION, SPI, OSCILLATOR, PROGRAM, IN, OUT; signals: dacRST, dacEN, zcdEN, BIASH, BIASL, BINARY[4:0], THERMH[23:0], THERML[23:0], TIME0[3:0], TIME10[3:0], TIMING EDGES, UP, DN.)

4.1 Adaptive Resolution Algorithm

Since the minimum inter-sample timing $T_{GRAN}$ is a function of input derivative, the issue of narrow sample spacing only occurs for large-amplitude, high-frequency inputs. For a full-scale 100 Hz input, $T_{GRAN}$ is 12.6 µs, and likewise for a 20 kHz, −40 dBFS input, $T_{GRAN}$ is 6 µs; both of these are far from the minimum $T_{GRAN}$ of 62 ns, obtained for a full-scale, 20 kHz input. Therefore, the full speed (and power) of the ZCD is only needed for a limited set of inputs, and when the input activity is reduced so are the ZCD requirements. Fortunately, as demonstrated in [42], it is possible to skip samples during high-slope inputs without a significant impact on in-band fidelity, and therefore get away with a lower-power ZCD. Intuitively, the quantization error from high-slope regions of the curve contributes to distortion at very high frequencies; thus reducing the “effective” sampling rate in these portions will only increase the out-of-band error. By skipping samples during high-slope portions, the ZCD requirements can be relaxed exactly where they are most stringent, so the average ZCD performance requirements are also relaxed (and therefore the ZCDs consume less power).

Two implementations of a variable-resolution quantizer for CTDSP applications suggested in [42] are shown in Fig. 4.2. In Fig. 4.2(a), a continuous-time derivative of the input is digitized and controls the resolution of the main ADC. In the more digitally oriented implementation of Fig. 4.2(b), the slope is estimated by comparing the inter-sample time with known time delays, and if the inter-sample time is lower than a certain threshold the resolution is reduced. The first implementation unfortunately does not solve our problem of ZCD response time; it just moves it to a different ZCD. The derivative is estimated by a simple CR filter, and the output is quantized by a coarse flash LCADC, whose value is used to determine the resolution of the main LCADC. However, the ZCDs in the flash LCADC require full speed at all times, and because of the added delays in this controlling path, the ZCD in the controller needs to be faster than the main ZCD would otherwise need to be. In addition, unlike most other analog blocks whose power consumption scales with resolution, even though the controller path ZCD only needs to quantize a few levels, its power consumption is still dominated by the static bias current required to get the speed (response time).

In the second proposal, a digital block (RES CTL) computes an estimate of the derivative of the output signal. In order for the controller to determine that the input slope has increased, the LCADC itself must be able to produce samples at the higher rate, which in turn requires a fast ZCD. This returns us once again to the problem of high power consumption in the LCADC itself. The fact that neither of the proposed techniques directly solves the ZCD power problem in LCADCs is understandable, since the focus of [42] was reducing the power consumption in the subsequent CTDSP through reduced sample throughput, for which both techniques are ideally suited. The main finding that reducing sampling rates during high-slope inputs minimally affects in-band fidelity is still valid, but a different method needs to be sought in order to reduce the power consumption of the LCADC itself.

Figure 4.2: Two methods for reducing the sampling rate.

Figure 4.3: The behavior of an ideal adaptive resolution scheme (a) and the proposed (b).

Figure 4.3 shows the basic idea of the proposed adaptive resolution (AR) algorithm. We begin

with Fig. 4.3(a), which shows an ideal adaptive resolution controller. We define the “leading

boundary” as the comparison voltage the input is expected to cross should its derivative not change

sign during the current sample, and the “trailing boundary” as the other voltage. In the ideal case,

the leading boundary increments on each sample by an amount proportional to the slope of the

input. Likewise, the LCADC’s output values predict the trajectory of the analog input based on

information at the current sample, visible in Fig. 4.3(a), each time the dotted line (reconstructed

quantized output) leads the solid black line (analog input). The problem with this implementation is

that a lot of information is required about the signal. A first derivative allows the system to precisely

predict level crossings for linear (ramp) inputs. In order to accurately predict level crossings for

more complex inputs, more derivatives are needed. Obviously, in this energy-constrained system

it is not feasible to compute and process multiple signal derivatives on a per-sample basis.

Moving toward an algorithm with low implementation complexity, the approach adopted for

this work is shown in Fig. 4.3(b). Instead of predicting when the next sample will occur (and

hence the resolution required), immediately after each sample the (VL,VH) comparison interval is

widest. A level crossing which occurs while the comparison window is wide is quantized with low

resolution. If TD0 seconds after the previous level crossing a new level crossing has not occurred,

the comparison window is narrowed one LSB step. This is equivalent to incrementally increasing

the ADC resolution. Until a crossing occurs, the quantizer resolution continues to increase over

time until the quantizer is in its highest resolution state. In this manner, high-slope inputs result

in boundary crossings soon after the previous sample, when the quantizer is in a low-resolution

state, while slow-moving inputs produce crossings when the quantizer resolution is higher. This

behavior is consistent with the proposed variable-resolution algorithms presented in [42], although

the implementation is different. The quantizer output is always updated with a step equal to twice

the instantaneous resolution step; as seen in Fig. 4.3(b), the dotted line is always symmetrically

spaced above and below each sample. The reason for this is that the adaptive resolution controller

can be simply implemented without need for deep memory of multiple previous samples (only the

directly preceding one is needed) or complex, power-consuming prediction logic.
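The time-driven narrowing of the comparison window can be summarized as a small behavioral function. This is only a sketch of the idea described above, not the controller implemented in this work: it assumes that the delays TD0, TD1, ... run back to back starting at the previous crossing, and the name `window_step_lsb` and the `narrow_lsb` parameter (how many LSB the step shrinks per expired delay) are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def window_step_lsb(elapsed, delays, max_step_lsb, narrow_lsb=1):
    """Quantizer step, in LSB, `elapsed` seconds after the previous crossing.

    delays       -- [TD0, TD1, ...] in seconds, assumed to run consecutively
    max_step_lsb -- step immediately after a sample (lowest resolution)
    narrow_lsb   -- LSBs removed from the step each time a delay expires
    """
    expiry = list(accumulate(delays))           # absolute expiry time of each delay
    n_expired = bisect_right(expiry, elapsed)   # how many delays have run out so far
    return max(1, max_step_lsb - narrow_lsb * n_expired)
```

A fast input that crosses a boundary before TD0 expires is quantized with the full `max_step_lsb` step, while an input that has not crossed after all delays have expired is quantized with a 1 LSB step, matching the qualitative behavior of Fig. 4.3(b).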

While the behavior of the leading interval boundary (the boundary which the signal is expected

63

(a) (b)

CROSSING

DURING

RESOLUTION

SWITCH

Figure 4.4: If the input slope changes, the trailing boundary can catch the input leading to oscil-lations, shown in (a). The alternative trailing boundary behavior shown in (b) does not suffer thisproblem.

to cross, assuming its derivative does not change sign) is well defined, there are two options for

the behavior of the trailing boundary, as shown in Fig. 4.4. In 4.4(a), the trailing boundary is

increased in accordance with the TDx time intervals until it is 0.5VLSB below the reconstructed

output value, which means the total comparison interval is now VLSB wide at the highest resolution

setting, similar to the LCADC behavior without AR. In this implementation both the leading and

trailing edges have the same convergent behavior. The problem with this implementation is the

situation where the input derivative decreases enough that the trailing boundary catches the input

signal, thereby triggering a false downward increment. In Fig. 4.4(a), the crossings are hard to

see because the crossings occur when a boundary is switching resolutions: thus the crossing is

not due to the input crossing a quantization threshold, but due to the AR algorithm itself. This is

discussed shortly. Since the signal is still increasing, on the next comparison interval an upward

increment occurs, and this process can repeat, so that the output oscillates around the input value.

From behavioral simulations, while this effect did not degrade distortion performance (it increased

high frequency “noise,” similar to a first-order ∆Σ converter) it increased the average sample rate,

wasting power.

The alternative solution shown in Fig. 4.4(b) has the trailing boundary start VLSB below the actual

crossing voltage of the previous sample, then increment by one VLSB and stop. That the boundary

initially starts below the actual crossing and increments to the previous crossing threshold is a form

of hysteresis that reduces the chance of noise in slow-moving inputs from triggering false samples.

Compared with the first option, stopping the trailing boundary at the previous crossing voltage

ensures that a negative transition is only possible if the input has an inflection point.

In this system, the AR controller can be programmed to different minimum resolutions (the

quantizer resolution immediately after a sample), from 1×VLSB (effectively fixed at maximum

resolution) to 15×VLSB. Since the feedback DAC hardware is fixed at 8 bits, if each quantizer step

skips 15×VLSB the quantizer resolution is reduced to 17 levels, or about 4 bits. To compute the

new TGRAN at this minimum resolution, we can compute the amount of time it takes a full-scale 20

kHz input to cross the wider quantization step:

\[ T_{GRAN} = \frac{\text{Quantization Step}}{\text{Input Slope}} = \frac{15\,V_{LSB}}{2\pi\,2^{N-1}\,V_{LSB}\,f_{MAX}} = 930~\text{ns}. \tag{4.1} \]

We have therefore relaxed the minimum inter-sample time from 62 ns to almost 1 µs.
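As a quick numerical check of eq. (4.1), the sketch below evaluates the granularity time for a full-scale sinusoid at both ends of the programmable AR range (1×VLSB and 15×VLSB). The function name `t_gran` and its parameter names are illustrative only and are not taken from the design.

```python
import math

def t_gran(step_lsb: int, n_bits: int = 8, f_max: float = 20e3) -> float:
    """Time (s) for a full-scale sine at f_max to traverse step_lsb LSBs
    at its steepest point, following eq. (4.1)."""
    # Peak slope of a full-scale sine in LSB/s: 2*pi*f_max times the
    # amplitude, where the amplitude is 2**(n_bits - 1) LSBs.
    peak_slope_lsb_per_s = 2 * math.pi * f_max * 2 ** (n_bits - 1)
    return step_lsb / peak_slope_lsb_per_s

if __name__ == "__main__":
    for step in (1, 15):
        print(f"step = {step:2d} LSB -> T_GRAN = {t_gran(step) * 1e9:.0f} ns")
```

With the default arguments the two results are roughly 62 ns and 0.93 µs, matching the figures quoted in the text.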

Figure 4.5: If the input crosses a threshold during a resolution change, there is a quantization error.

From a hardware perspective this method is simpler than using an analog derivative estimator,

because like the scheme in Fig. 4.2(b), it requires no measurement on the input aside from

the LCADC sampling itself. It also requires no additional analog circuitry, just a modification of

the feedback DAC to support the variable comparison interval. The timing intervals TDi are provided

by a delay line which is already present to implement other timing functions, and the entire

algorithm is implemented in the digital controller. Its downside is that because the algorithm is

open-loop, the discrete steps in the resolution change can introduce quantization error if the AR is

in between two resolutions when a crossing occurs, as already shown in Fig. 4.4(a) and shown again

more clearly in Fig. 4.5. While this could be a performance limitation in high resolution LCADCs, for

this 8-bit converter simulations showed > 50 dB SNDR, so alternative solutions were not considered.

Figure 4.6: Asynchronous timing generator formed with an 11-tap delay line, each with 4-bit control.

4.2 Programmable Asynchronous Timing Generation

The AR algorithm in the preceding section as well as the adaptive ZCD bias algorithm (to be

explained in the following section) both require a time base with which to compare the inter-

sample timing, in order to estimate input activity. In synchronous systems a timer is realized with

a high-speed clock and counter, but in the absence of any clock a delay line must be used. Shown

in Fig. 4.6 is the 11-tap delay line with 4-bit bias DACs, one for each delay cell. The delay values

TIMEi[3:0] are provided externally through the serial peripheral interface (SPI).

The timing of the first few elements of the delay line is shown in Fig. 4.7. When a ZCD

crossing occurs (UP or DN goes high) the combinational logic resets each delay element simultaneously.

The τ delay element at the output of the XOR allows adequate time for the elements to reset,

before the pulse generator launches the delay line. In this manner, the delay line always generates

the same set of edges (whose relative spacings TDi are dictated by the TIMEi[3:0] control words)

relative to a ZCD crossing event, which the AR algorithm uses to control the leading and trailing

comparison boundaries to produce the resolution transitions in Fig. 4.3.

Figure 4.7: The timing edges from the delay line are launched by either an UP trigger or DN trigger. If the line has not finished propagating when the next trigger arrives, it restarts from the beginning.
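The reset-and-relaunch behavior of the delay line can be illustrated with a small behavioral model. This is only a sketch: the tap delays are hypothetical, the τ reset delay is ignored, and the function name `edges_fired` is not part of the design.

```python
from itertools import accumulate

def edges_fired(tap_delays, trigger_times, t_end):
    """Return (time, tap_index) pairs for every delay-line edge that fires.

    tap_delays    -- per-cell delays T_Di (hypothetical values, any time unit)
    trigger_times -- sorted times of UP/DN crossing events
    t_end         -- end of the observation window
    """
    # Edge k fires at the cumulative delay of cells 0..k after a trigger.
    offsets = list(accumulate(tap_delays))
    fired = []
    for i, t0 in enumerate(trigger_times):
        # A new trigger resets every cell, so edges scheduled at or after
        # the next trigger never appear (the truncated case).
        cutoff = trigger_times[i + 1] if i + 1 < len(trigger_times) else t_end
        fired += [(t0 + off, k) for k, off in enumerate(offsets) if t0 + off < cutoff]
    return fired

if __name__ == "__main__":
    # Four equal taps; the second trigger arrives before edges 2 and 3 fire.
    for t, k in edges_fired([1, 1, 1, 1], [0, 2.5], t_end=10):
        print(f"t = {t:4.1f}: EDGE {k}")
```

In the usage example the first launch only produces EDGE 0 and EDGE 1 before being restarted, while the second launch runs to completion.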

During operation it will often happen that a new sample arrives while the AR is still converging

the comparison boundaries; this is equivalent to saying that the delay line has not finished

propagating through from the previous sample. As explained above, an UP or DN rising edge

asynchronously resets each delay cell simultaneously (shown as the truncated pulse on EDGE 3 in

Fig. 4.7), and the added logic needed to implement a fast asynchronous reset complicates the design

of this cell with respect to those previously reported [8, 43, 44].

Figure 4.8: Schematic of a delay element, with the devices of interest highlighted in gray. Devices without sizes given can be assumed to be operating as switches and non-critical. (Figure annotation: devices with no size are minimum size, 0.15/0.13.)

The full schematic of each delay cell is shown in Fig. 4.8. The core devices are those in the dotted box; all other devices behave

as switches. Devices M1a,b are current sources whose gate voltage is set by one of two master

references (set by an off-chip resistor). Before the cell triggers, IN and OUT are both high, and

RST is low. This means M1a is not sourcing current, and the switch M3 is closed thus discharging

the MOS capacitor. When IN goes low, the M1a current begins charging the MOS cap through M3,

until the output of the actively loaded common source amplifier formed by M2 and M1b transitions

from high to low. The following switches and logic implement a regenerative latch with fast reset.

The propagation of the falling M2 drain to the falling OUT is so fast in comparison to the voltage

ramp on the MOS cap that when OUT goes low (and opens M3) the voltage on the MOS capacitor

is not much higher than VT of M2. The total charge taken from the supply¹ is therefore Ctotal·VT,

where Ctotal is the sum of the MOS capacitance and all parasitics on the shared M1a drain/M2 gate

node. This reduced charging contrasts with the delay cell presented in [43], where, when the

equivalent threshold is reached, a positive feedback path quickly charges the capacitor to VDD,

drawing a total charge of Ctotal·VDD from the supply in the process.

¹ Not counting the switching currents in the logic, which is admittedly a large portion of the overall power consumption.

4.3 Zero Crossing Detectors with Dynamic Current Bias

In a previous LCADC without adaptive resolution [10], TGRAN was 62 ns for a full-scale 20 kHz

input. Since the sum of the propagation delays around that loop (ZCD delay, digital logic

delays, feedback DAC delay) was smaller than TGRAN = 62 ns, the maximum ZCD delay was by

necessity smaller than 62 ns. The upper bound on ZCD delay dispersion is the absolute delay

through the ZCD, so in that system the input-dependent delay dispersion was not a dominant

effect: in Sec. 3.1.3 it was found that the delay dispersion needed to be less than 60 ns to not affect

SDR. In this system, from eq. (4.1) the loop is allowed to have almost 1 µs total delay, which means

the ZCD delay can also be relaxed. We expect that the ZCD delay dispersion scales with absolute

delay, so with a longer allowable ZCD absolute delay the delay dispersion becomes an issue. To

minimize delay dispersion while maintaining low power, an adaptive current bias is implemented.

4.3.1 Three-Stage Zero Crossing Detector

The ZCD follows closely the topology of a standard comparator, with a preamplifier followed by

a latch as shown in Fig. 4.9.

Figure 4.9: The ZCD comprised of a three-stage preamplifier and a latch.

Transistor   Stage 1   Stage 2   Stage 3
M0           10/0.5    6/0.5     8/0.5
M1           4/0.4     0.4/0.2   0.3/0.2
M2           1.6/0.4   0.8/0.3   0.8/0.13

Table 4.1: ZCD preamplifier device sizes.

Preamplifier

The preamplifier is the cascade of three scaled but otherwise identical stages. Each stage, shown

in Fig. 4.10, is a basic differential pair with local common-mode feedback (CMFB). The ratio

of main tail to CMFB current is 4:1, which puts a 25% overhead on overall power. The choice

of local CMFB guarantees stability over any bias current, which is necessary due to the adaptive

current bias. The CMFB circuit used is very simple and has a limited differential range, but this is

acceptable since the preamplifier only needs to amplify the relative difference between two signals;

normal amplifier parameters, such as linearity and swing, are not important for this application.

Although a single loop around all three stages (and almost identical poles) would have saved a

small amount of power, aside from the complication of tracking the location of stabilizing zeros

with bias current, the CMFB loop would have been too slow to be power-cycled at each sample.

The sizes of the important ZCD transistors for each stage are given in Table 4.1.

Figure 4.10: ZCD preamplifier stage. Thick devices are LVT.

The gain and bandwidth of each stage were initially specified considering the results in Sec. 3.1, but the final transistor sizes

were determined by a Matlab/Spectre script to minimize absolute delay with respect to power.

Simulated ZCD performance shows, for example, that with a power consumption of 490 nW, the

preamplifier provides 71 dB of gain and has a bandwidth of 1.2 MHz.

Uni-Directional Latch

Because the ZCD must track a continuous input, a true regenerative latch is not an option. However,

based on our chosen topology in Fig. 2.4 we do know that during normal operation, the ZCDs

bound the input within an interval: this means that until a crossing occurs, both ZCDs have VIN+ <

VIN−, and their outputs are low. Referring to the timing diagram in Fig. 2.7, once a ZCD crossing

occurs, the circuit enters REFRESH and when it returns to TRACK, the condition VIN+ < VIN− is

again true. Based on this observation, we can design a circuit which latches in only one direction,

i.e. while VIN+ < VIN− the output is low and can go high as VIN+ ≥ VIN−, but once VIN+ > VIN−

the output is stuck high and cannot go low again without a reset.

Figure 4.11: ZCD uni-directional latch.

The circuit shown in Fig. 4.11 exhibits a uni-directional latching behavior. The input four-transistor

cell comprising M1a,b and M2a,b forms a high-gain complementary self-biased

differential amplifier cell [45]; this stage can be visualized as two differential pairs connected drain-to-drain,

with N and P tail current sources M3 and M4, respectively. The remaining transistors in

this stack are switches used to disable the bias current sink, and add to the bias current source when

a crossing occurs, thereby speeding up the transition and eliminating the static bias current once

the cell latches. In a sense, the input stack can be thought of as an analog amplifier stuffed into

a NAND gate. The zcdRST signal is used to reset the latch while in REFRESH.
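The latching behavior described above can be captured in a few lines of behavioral code. This is a logic-level sketch only, with no modeling of gain, delay or bias current, and the class and method names are hypothetical.

```python
class UniDirectionalLatch:
    """Behavioral model: the output can only go high, until explicitly reset."""

    def __init__(self) -> None:
        self.out = False

    def update(self, vin_p: float, vin_n: float) -> bool:
        # The output rises once VIN+ reaches or exceeds VIN-; after that it
        # stays high regardless of the inputs.
        if vin_p >= vin_n:
            self.out = True
        return self.out

    def reset(self) -> None:
        # Corresponds to asserting zcdRST during REFRESH.
        self.out = False

if __name__ == "__main__":
    latch = UniDirectionalLatch()
    for vp, vn in [(0.1, 0.2), (0.25, 0.2), (0.1, 0.2)]:
        print(vp, vn, latch.update(vp, vn))
    latch.reset()
    print("after reset:", latch.out)
```

The third input pair in the usage example returns to VIN+ < VIN−, yet the output remains high until `reset()` is called.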

4.3.2 Current Bias DACs

Figure 4.12 shows the simulated delay of the complete ZCD (preamplifier and latch) vs. input

slope, for 20 logarithmically spaced bias currents ranging from 230 nA to 4.3 µA. Due to the

voltage division in the capacitive feedback DAC (Fig. 2.6), the swing at the ZCD input is half that

of the input itself, so the maximum input slope expected for a 20 kHz full-scale (where full-scale

is 0.8V peak-peak, equal to VDD) input is around 25 V/ms. As shown in the plot, most of the bias

currents cannot achieve a 62 ns delay with a 25 V/ms input slope, which is what would have been

required had this ZCD been used in an LCADC without AR.
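The 25 V/ms figure can be reproduced from the numbers quoted above, assuming a sinusoidal input whose peak slope is 2πfA for amplitude A. A 0.8 V peak-to-peak full scale corresponds to an amplitude of 0.4 V at the input, which the capacitive divider halves to 0.2 V at the ZCD input:

```latex
\[
\left.\frac{dv}{dt}\right|_{\max}
  = 2\pi f_{MAX}\cdot\frac{1}{2}\cdot\frac{V_{FS}}{2}
  = 2\pi\,(20~\text{kHz})\,(0.2~\text{V})
  \approx 25~\text{V/ms}
\]
```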

From the calculation in (4.1) the system has a minimum TGRAN of almost 1 µs. Of this, 200 ns is

budgeted for the ZCD delay; the remaining time is given to the asynchronous logic, capacitor DAC

reset and settling periods. From Sec. 3.1, we know the ZCD is allowed 60 ns dispersion before the

SDR is affected. This narrow dispersion can be maintained down to low slopes by biasing the ZCD

with five different currents depending on input slope. The circuit shown in Fig. 4.13 measures the

time between consecutive samples by counting edges from the delay line shown in Fig. 4.6 and

uses the count value to select a bias current for the ZCD for the next sample. Immediately after a

sample has occurred, the TRIG edge has not propagated down the delay line and the edge counter

output is reset, thus selecting a low bias current. If another level crossing occurs right away, it

would mean the input is fast-moving, or high-slope, which makes the low bias current appropriate.

As time passes the delay line propagates, incrementing the edge count. Should a level crossing

occur at a later time, the bias current latched in for the next sample is correspondingly higher. In

this manner, the ZCD bias current varies inversely with the approximate slope of the input; the

longer the interval between level crossings, the lower the slope of the input signal.

Figure 4.12: Total ZCD delay with respect to input slope for bias currents between 230 nA and 4.3 µA. The higher bias currents are lower lines (indicating faster response). Axes: input slope (V/s) vs. delay (ns).

The ZCD delay thus traverses different bias current lines depending on input slope such that

the overall ZCD delay dispersion is reduced, as seen in Fig. 4.14.
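The edge-count-based bias selection can be sketched as a lookup on the time elapsed since the previous sample. The tap delays below are hypothetical placeholders (the real values are programmed over SPI), the five bias currents are the ones listed in the legend of Fig. 4.14, and the function name `select_bias` is illustrative.

```python
from bisect import bisect_right
from itertools import accumulate

def select_bias(elapsed, tap_delays, bias_levels):
    """Pick the ZCD bias for the next sample from the time since the last one.

    elapsed     -- time since the previous level crossing
    tap_delays  -- delays of the delay-line cells used as thresholds
    bias_levels -- bias currents, lowest first; len(tap_delays) + 1 entries
    """
    if len(bias_levels) != len(tap_delays) + 1:
        raise ValueError("need one more bias level than delay taps")
    # Edge k fires once the first k + 1 cells have propagated.
    edge_times = list(accumulate(tap_delays))
    # Edge count = number of edges that have already fired at `elapsed`.
    count = bisect_right(edge_times, elapsed)
    return bias_levels[count]

if __name__ == "__main__":
    taps_ns = [1000, 2000, 4000, 8000]          # hypothetical tap delays (ns)
    levels_ua = [0.24, 0.3, 0.45, 0.61, 0.85]   # legend values of Fig. 4.14 (uA)
    for elapsed_ns in (500, 2500, 20000):
        print(elapsed_ns, "ns ->", select_bias(elapsed_ns, taps_ns, levels_ua), "uA")
```

A crossing that arrives before the first edge keeps the lowest bias current, and each additional edge that has fired moves the selection one level higher, as described in the text.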

Figure 4.13: A delay-line based slope measurement quantizes the bias current to five discrete levels.

Figure 4.15 shows the bias current DACs for the ZCD preamplifiers. Eight identical 6-bit

PMOS current DACs all drive identical NMOS diodes, and these voltages are selected by an 8:1

multiplexer for each ZCD. The two multiplexers are controlled by SELH[2:0] and SELL[2:0], which are generated

according to the scheme shown in Fig. 4.13. The multiplexer outputs BIASH and BIASL are

the bias voltages for the upper and lower ZCDs, respectively. The eight DACs all share the same

current reference, the second of two master references brought from off-chip (the first was used

in the timing generation in Sec. 4.2). The eight bias control words BIASi[5:0] for the DACs are

programmed externally through the SPI. Five levels were used in the initial system planning,

but since five levels and eight levels both encode to 3 bits in hardware, eight DACs were implemented,

gaining flexibility with little overhead. Although it is wasteful to have eight current DACs

active if only one is used at any moment, the ZCDs need to switch bias currents within a few hundred

nanoseconds, which would normally require a high bandwidth single current DAC. Instead,

each DAC is decoupled with large capacitors (not shown in schematic) across the diode connected

NMOS devices to minimize the voltage step due to instantaneous charge sharing when the multiplexer

connection changes and a new DAC is connected to the ZCD; the DAC itself is, however, low

bandwidth.

Figure 4.14: Adaptive current bias keeps delay within dotted lines (60 ns dispersion) until very slow slopes. Axes: input slope (V/s) vs. delay (ns). Legend: bias currents of 0.85, 0.61, 0.45, 0.3 and 0.24 µA, and the adaptive bias current.

4.4 Segmented Capacitor Feedback DAC

In Fig. 2.5, the feedback DAC is implemented as a single capacitor array, which level-shifts the

input voltage to within the (−VLSB/2, VLSB/2) interval compared by the ZCDs.

Figure 4.15: Binary-weighted arrays of PMOS transistors forming the current bias DACs for the ZCDs. The bus of 8 bias voltages is multiplexed onto the BIASH and BIASL wires.

In the case of adaptive resolution, the comparison interval is no longer fixed to (−VLSB/2, VLSB/2). Rather than

adjusting the VL,VH boundaries, this variable interval is implemented by using a pair of feedback

DACs whose digital codes are offset by the width of the variable interval. Since the capacitor DAC

is comprised of unit capacitance elements, the variable resolution is easily implemented by adding

more unit elements. The drawback of this scheme is that two capacitor DACs are required, which

are a significant portion of the total area. The schematic of the dual capacitor DAC architecture

is shown in Fig. 4.16. The signals dacEN and dacRST are controlled as before in Fig. 2.7, and

VCM, REF+ and REF− are the DAC reference voltages, identical to the references in a normal

SAR ADC. Both DACs share the same binary code, while the thermometric code differs between

them. In the AR scheme, the comparison interval converges over time to the highest resolution in

8 LSB steps, where the leading boundary of the interval will have decremented toward the signal

until it is 0.5VLSB from the reconstructed output as shown in Fig. 4.3. Thus each capacitor DAC

needs a thermometric range of ±8 LSB from the final value. Therefore, the thermometric LSB

DAC is given a range of 3×8 LSB, so that regardless of the LCADC output value there is an 8 LSB

thermometric range above and below it. For example, if the LCADC output is 8M + 0 (meaning

the output is divisible by 8) then the binary code BINARY equals M, and either THERMH or

THERML (depending on whether the previous crossing was upward or downward) counts from ±7 to 0.

Similarly, if the LCADC output is 8M + 7, then BINARY = M as before, but now THERMH

or THERML will count from 14 to 7 (if the previous crossing was upward) or from 0 to 7 (if

the previous crossing was downward). Thus the total continuous range of the thermometric DAC

needs to be from −7 to 14, or 21 LSBs. 24 LSBs are used since it makes the layout of the DAC more

symmetric.

Figure 4.16: Adaptive resolution implemented with two capacitor DACs (detail of one shown in inset, both are identical), to give a 24×LSB thermometric range.
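The range argument can be checked by enumerating every output code. The sketch below assumes, following the 8M + 7 example, that an upward crossing starts the thermometric count 7 above the final value and a downward crossing starts it 7 below; the function name `therm_sweep` is hypothetical.

```python
def therm_sweep(code: int, previous_up: bool):
    """Split an LCADC output code into (binary, therm_start, therm_end).

    With code = 8M + r, the binary DAC holds M and the thermometric DAC
    converges from r + 7 (after an upward crossing) or r - 7 (after a
    downward crossing) to r.
    """
    binary, r = divmod(code, 8)
    start = r + 7 if previous_up else r - 7
    return binary, start, r

# Extremes of the thermometric value over all codes and both directions.
starts = [therm_sweep(c, up)[1] for c in range(256) for up in (True, False)]
print("thermometric range:", min(starts), "to", max(starts))
```

The extremes come out as −7 and 14, in agreement with the range stated in the text.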

The capacitor DACs are comprised of 20 fF unit cells arranged in a common centroid layout

with the thermometric LSB DACs located in the center. The structure of the unit cell shown in

Fig. 4.17 uses the vertical parallel plate structure [46].

Figure 4.17: Structure of a unit capacitance cell. Projected views from both sides are shown as well.

A vertical plate is formed of wires stacked one atop another in successive metal layers and joined by a dense array of vias, to simulate a vertical sheet of metal in

the IC. In this capacitor, the top plate is formed by vertical plates of metal3-metal5 interleaved with

plates of metal2-metal5, and is entirely enclosed within a basket of metal2-metal5 forming the bottom

plate (imagine a car battery, with the casing as part of the bottom plate). Minimizing the top

plate parasitic to ground is critical because the lines driving the bottom plates are routed within

the capacitor array, and the single-ended switching of these lines can couple onto the top plate,

causing code-dependent offsets. The bottom M1 layer is reserved for routing, as each thermometric

capacitor has its associated switch underneath. In the binary DAC the switches are outside the array

and under the unit capacitors are dummy switches. Due to the aforementioned lack of parasitic

extraction, estimating the impact of line coupling was difficult so a unit size of 20 fF was chosen

as a conservative compromise between minimizing the impact of coupling and low power.

To estimate the worst case power consumption, we know that a full-scale input exercises the

full range of the DAC sequentially. That means, ignoring the time-stamps of the level crossings, if

we just inspected the sequence of codes sent to the DAC we would see a ramp from 0 to 2^N and then

back to 0, i.e. two code ramps per period of the full-scale input sine wave. For a DAC with 2^N unit

capacitance elements Cu, suppose the input code is 0 ≤ n ≤ 2^N − 1. Then the DAC is configured

as n×Cu capacitors to VDD and (2^N − n)×Cu capacitors to ground. Since the DAC is always reset

to VDD/2 before each new output value, this takes a total of n·Cu·VDD/2 charge from the supply.

When this DAC is reset to VDD/2, the VDD/2 reference generator must source or sink energy; the

total charge taken from the supply during this reset phase is (2^N − 2n)·Cu·VDD/2. Combining the

charging and reset phases to get the total charge used for DAC code n, we get (2^N − n)·Cu·VDD/2.

Since there are two code ramps per period of full-scale input (in one period there are 2×2^N level

crossings) the total power consumption is

\[ \text{Power} = 2\,V_{DD} \sum_{n=1}^{2^N} \left( n\,\frac{V_{DD}}{2}\,C_u \right) \times f_{in} \tag{4.2} \]

For a full scale 20 kHz input and N = 8, Cu = 20 fF, VDD = 1 V, this means 13 µW of power.

Fortunately the AR algorithm reduces the maximum sampling rate at 20 kHz by almost 10×, so

the power consumed by charging the capacitors is a few µW, within the 10 µW budget. It should

be noted that based on published SAR ADCs in these processes, for 8-bit matching the capacitor

could have been made smaller than 4 fF [47–51], but without an extraction tool 20 fF was chosen

for margin.
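The 13 µW estimate follows directly from eq. (4.2) with the stated values. The sketch below evaluates the sum numerically; the function name `dac_power` and its argument names are illustrative.

```python
def dac_power(n_bits=8, c_unit=20e-15, vdd=1.0, f_in=20e3):
    """Worst-case capacitor DAC charging power (W) following eq. (4.2)."""
    # Charge drawn over one code ramp: sum of n * (VDD/2) * Cu for
    # n = 1 .. 2**n_bits.
    charge_per_ramp = sum(n * (vdd / 2) * c_unit for n in range(1, 2 ** n_bits + 1))
    # Two ramps per input period, energy = VDD * charge, f_in periods per second.
    return 2 * vdd * charge_per_ramp * f_in

print(f"worst-case DAC power: {dac_power() * 1e6:.1f} uW")
```

With the default arguments (N = 8, Cu = 20 fF, VDD = 1 V, 20 kHz input) the result is about 13 µW. Dividing by the roughly 10× reduction in sampling rate from the AR algorithm gives on the order of 1 µW, consistent with "a few µW" in the text.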

4.5 Automatic Offset Calibration with Oscillator

So far the generation of the (−VLSB/2,VLSB/2) interval has been assumed, as has been the matching

of the two ZCD offsets. In reality, for the device sizes given in Table 4.1 the simulated 1-sigma

input-referred offset of the ZCD is 6 mV, which is about 2VLSB. In order to generate the VLSB/2

voltages and compensate the ZCD offsets, an 8-bit R-string DAC is used. The calibration DAC

is a dual-tap (one tap each for VL and VH, which go to the lower and upper ZCD, respectively), 8-bit

version of [52] but is otherwise identical. The structure of the DAC and its tap decoder are shown

in Fig. 4.18. The tradeoff between dynamic range and precision is chosen so that the calibration

DAC has LSB/16 precision, and can compensate up to 2σ of ZCD offset. This precision/range

tradeoff is not hardwired into the DAC, but set by the choice of external bias resistors connected to

VHIGH and VLOW.

In order to automatically calibrate the ZCD offsets, the existing ZCDs are used in a pair of

successive-approximation loops, as shown in Fig. 4.19. In the figure, the capacitor DACs are

abstracted as voltage sources since their voltages do not change during a conversion cycle. The

ZCDs are modeled as ideal ZCDs with offset voltage sources VOH , VOL in series with their inputs.

During offset calibration, the capacitor DACs are held in reset at voltage VCM (“0” with respect

to the input range), which is equivalent to binary code “01111111”. The digital logic then performs

two parallel SAR conversions (one for each ZCD) using the calibration DAC, and the digital

conversion results DH0 and DL0 represent the ZCD offset voltages themselves, VOH and VOL, quantized

to 8 bits (since the calibration DAC has 8-bit resolution, not because the LCADC uses 8-bit

capacitor DACs).

Figure 4.18: Schematic of the R-string calibration DAC with two output taps. The inset shows one of the 256 identical unit elements.

Once the ZCD offsets are found, the process repeats but with the capacitor DAC

outputs set to 2VLSB and −2VLSB (codes “10000001” and “01111101”). This produces at the end of

conversion DH1 and DL1, the digitized versions of VOH + 2VLSB and VOL − 2VLSB. Then the correct

DAC codes to generate the (−VLSB/2, VLSB/2) thresholds (accounting for the ZCD offsets) are:

\[ D_H = D_{H0} + \frac{D_{H1} - D_{H0}}{4} \tag{4.3} \]

\[ D_L = D_{L0} - \frac{D_{L0} - D_{L1}}{4} \tag{4.4} \]

Since these are binary bytes, the divide-by-4 is just a right shift by two places. The choice of using

a two-LSB shift to calculate the VLSB/2 value, where a single-LSB shift would have sufficed, is to

reduce the rounding error. This unfortunately limits the dynamic range of the offset calibration,

since the calibration DAC needs a range of 4VLSB + VOH + VOL. For this reason the direction of the

2VLSB shift can be chosen, i.e. if VOH + 2VLSB is too positive for the conversion range and the SAR

code DH1 is “11111111”, then the conversion repeats but with VOH − 2VLSB, which is in range of

the calibration DAC (the math to produce the calibrated value adjusts accordingly in the logic).

The same is true for the lower SAR loop.

Figure 4.19: Two-step SAR calibration loop using the existing ZCDs as comparators. The capacitor DACs are modeled as voltage sources, and the offset voltages are shown as voltage sources in series with the ZCD inputs. First the ZCD offsets are calibrated, then the VLSB step.
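The two-step calibration arithmetic can be sketched in software as follows. The `comparator` callable is a hypothetical stand-in for the ZCD decision, and the sketch covers only the nominal shift direction of eqs. (4.3) and (4.4), not the fallback case in which the shift direction is reversed.

```python
def sar_convert(comparator, n_bits=8):
    """Successive approximation: largest code for which comparator(code) is True.

    comparator(code) is assumed to be True while the calibration DAC output
    for `code` is at or below the voltage being digitized.
    """
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)   # tentatively set the next bit
        if comparator(trial):
            code = trial            # keep the bit if still at or below target
    return code

def threshold_codes(dh0, dh1, dl0, dl1):
    """Calibration DAC codes per eqs. (4.3) and (4.4); divide-by-4 as a shift."""
    dh = dh0 + ((dh1 - dh0) >> 2)
    dl = dl0 - ((dl0 - dl1) >> 2)
    return dh, dl

if __name__ == "__main__":
    # Hypothetical offsets expressed in calibration-DAC codes. With LSB/16
    # precision, a 2*VLSB shift corresponds to 32 calibration codes.
    dh0 = sar_convert(lambda c: c <= 140)
    dh1 = sar_convert(lambda c: c <= 140 + 32)
    dl0 = sar_convert(lambda c: c <= 120)
    dl1 = sar_convert(lambda c: c <= 120 - 32)
    print(threshold_codes(dh0, dh1, dl0, dl1))
```

In this example each threshold ends up 8 calibration codes (VLSB/2 at LSB/16 precision) away from the measured offset code.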

4.5.1 Relaxation Oscillator

The SAR calibration loops require a clock to operate. Although each calibration loop is completed

in a finite number of cycles and therefore could be driven with edges from a delay line, the nodes

VL and VH driven by the calibration DAC outputs are heavily decoupled and therefore have very

low bandwidth, so the delay elements would need to be very slow. Instead, a relaxation oscillator

and 8-bit clock scaler are integrated to drive the calibration loop.

Figure 4.20: Relaxation oscillator with tunable 4-bit MOS capacitor load.

The oscillator shown in Fig. 4.20 is based on a current source charging a tunable MOS capacitor bank,

generating voltage ramps that are periodically reset by the NOR gate in the middle. The switching

transition points of the stack with M1 and the stack with M2 are different, due to the diode below

M2. The difference between these two switching points is the amplitude of the sawtooth wave on

the MOS capacitor load. The narrow pulses that result from the NOR periodic reset are fed to a

pass-gate divide-by-2 to create a 50% duty cycle clock. This clock is further divided by an 8-bit

clock scaler to drive the calibration routine. This bizarre implementation of a standard relaxation

oscillator is due to the fact that the layout of the oscillator is basically a delay cell from Sec. 4.2

with changes in metal1 and metal2, and so the oscillator was achieved with little effort (necessary

to meet the tapeout deadline).

Figure 4.21: Traditional switch drivers (a) bring the gate to 0 when off. The proposed driver (b) clamps the switch gate to the source when off.

4.6 Other Circuits

4.6.1 Bootstrapped Input Switch

Because the input full-scale voltage is equal to VDD, were the input sampling switch in Fig. 4.1 implemented

with a simple pass transistor, it would have an overdrive voltage that varies significantly

with the input. Not only would this cause an input-dependent Ron (this also plagues differential

sampling networks), but signal-dependent charge injection as well. The solution is a boosted sampling

switch. Most differential boosted switches drive the switch gate to vIN + VDD when on, and

down to 0 when off [53, 54], as shown in Fig. 4.21(a). If these switches were used in this implementation,

while the on-resistance nonlinearity would be eliminated, the input-varying change in

the switch vGS during the off-on transition would still cause input-dependent charge injection. The

solution is a bootstrap circuit shown in Fig. 4.21(b) whose “on” voltage is vIN + VDD, and whose

“off” voltage is vIN (such that the VGS of the switch transistor is 0 when off). The schematic of

this circuit is shown in Fig. 4.22, and follows closely the classic design of [54]. The only meaningful

difference is that the gate of the pass device is not shorted to ground but clamped to its own

source when the switch is turned off. Unfortunately, with the modified “off” voltage no way was

found to use only low-voltage devices. Aside from the added cost of using high-voltage transistors,

this method has the disadvantage that it requires the source impedance to be much lower (AC

impedance) than the impedance of the switched capacitor DAC following it (refer to Fig. 4.1),

otherwise the charge injection of clamping switch M2 will flow to the right (toward the DAC) [55]. With a 0.8 V

overdrive voltage, the nominal Ron was simulated to be 1 kΩ.

Figure 4.22: Gate drive bootstrapping circuit. The input node is connected to the analog input, and the output is connected to the capacitive feedback DACs.

Figure 4.23: Differential segmented current steering DAC used for testing the LCADC. Top layout shown in (a) is comprised of 256 identical unit cells, whose schematic is shown in (b). The notation 4× means a group of four unit steering cells.

4.6.2 Current-Steering Reconstruction DAC

To test the LCADC, the digital bitstream needs to be reconstructed. From the stipulation that no

high-speed re-sampling be used, the only recourse is a zero-order hold reconstruction using a DAC.

In order to clearly see the performance of the LCADC and not the DAC, the DAC is purposely

designed for 10-bit linearity even though it has 8 bits. The DAC is a differential current-steering

design with open drain outputs. In order to minimize glitching a fully thermometric structure is

preferred, but this requires a large routing area. As a compromise, a segmented 4/4 architecture

was chosen. Each of the thermometric MSB banks is comprised of 16 replicas of a unit cell, all

driven by the common thermometric signal. The 16 unit cells are grouped as four banks of four,

as shown in Fig. 4.23(a). The groups of four cells are arranged in a common centroid layout, with

the middle row between the two halves occupied by decoupling capacitance, the binary-weighted

LSB bank, and the thermometric MSB decoder. Each unit cell shown in Fig. 4.23(b) is a simple

current steering open-drain switch. The CLK signal is not necessarily a synchronous clock, but a

signal which indicates when the data arriving at the DAC is valid, to prevent glitching in the DAC

output. This signal is produced by the LCADC to interface with other asynchronous circuits that

use handshaking protocols. In the DAC the CLK signal is delayed to account for the propagation

delay through the thermometric decoder, and driven to every cell in an H-tree (not shown in the

layout diagram) to ensure identical switching times of all cells, which also reduces glitches in the

output. The steering pair also has charge-injection cancelation devices, although their effectiveness

is questionable since no extraction was performed. A PMOS current source device was chosen as

PMOS devices had better matching for small sizes, and were sized to give a 1-σ mismatch of

LSB/10, to allow the nonlinearity of the LCADC to dominate. The integrated noise power over a

20 kHz bandwidth is simulated to be more than 20 dB lower than that expected from the LCADC,

and the simulated maximum frequency of operation is > 200 MHz, although this number was

achieved without parasitics. Since the minimum TGRAN for an LCADC without AR is 62 ns (16

MHz instantaneous clock speed), the bandwidth should be sufficient even when accounting for

parasitics.
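The segmented 4/4 split described above can be illustrated with a short sketch. It is not taken from the IC's decoder: the function names are hypothetical, and it assumes 15 thermometer lines for the four MSBs, with each enabled MSB bank contributing 16 unit currents and the four LSBs contributing their binary value.

```python
def segment_code(code: int) -> tuple[list[int], int]:
    """Split an 8-bit code into a thermometer-coded MSB part and a binary LSB part.

    Assumption for illustration: 15 thermometer lines, one per MSB bank.
    """
    if not 0 <= code <= 255:
        raise ValueError("code must fit in 8 bits")
    msb = code >> 4      # upper 4 bits: number of MSB banks switched on
    lsb = code & 0x0F    # lower 4 bits: binary-weighted LSB bank value
    thermometer = [1 if i < msb else 0 for i in range(15)]
    return thermometer, lsb


def output_units(code: int, units_per_msb_bank: int = 16) -> int:
    """Total number of unit currents steered to the output for a given code."""
    thermometer, lsb = segment_code(code)
    return units_per_msb_bank * sum(thermometer) + lsb
```

Under these assumptions every code maps to exactly that many unit currents, and an MSB transition changes only one thermometer line, which is the motivation for segmenting in the first place.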

4.6.3 Asynchronous Controller

The AR controller and ZCD bias controller are written in Verilog, synthesized with Design

Compiler, and verified with PrimeTime. In order to address the concerns of dynamic dissipation

discussed in Sec. 2.4, entirely hand-written RTL was attempted but it was found that the automated

synthesized logic was superior. The logic was still optimized by hand to minimize large code-

dependent delays: in order to minimize power all minimum-sized logic gates were used, which

results in a propagation delay of several ns.² It was possible to use standard digital flows even

though this block deals with asynchronous signals because the combinational logic in the timing

block (Fig. 4.6) ensures that minimum bounds on pulse widths and pulse frequency are met. The

asynchronous digital logic contains 461 standard cells and occupies 9,000 µm².

4.6.4 SPI and Calibration Logic

The IC has a total of 25 control bytes, which are interfaced through a standard SPI 1,0 bus. The

calibration logic and SPI logic as well as various testing functions are merged into one synthesized

module, separate from the asynchronous controller, which occupies the lower portion of the IC.

Since both the SPI and calibration logic are powered down during normal use, their supply is

separated from the rest of the IC (shared with the ESD supply) and not counted toward the power

budget. The combined logic block contains approximately 1,400 standard cells, and occupies

45,000 µm².

² All such optimizations are possible (and done better) with Design Compiler, but this was done by hand due to lack of familiarity with the tool.

Chapter 5

LCADC Measurements

5.1 Setup

The IC, whose die photo is shown in Fig. 5.1, contains both the LCADC and its testing DAC.

Since the IC is bond pad limited, in order to save area the bias currents and 8-bit PCM data bus

of the LCADC and DAC are multiplexed onto the same pads. The IC powers up in DAC mode so

that the 8-bit data bus is an input, and through the SPI interface it is put into LCADC mode. Since

the bias pins are also shared it is impossible to use both the LCADC and DAC on one IC, so the

testing board contains two copies of the IC. Although the DAC was purposely over-designed so

that it is not the performance-limiting block in the measurement chain, it should be noted that as

all measurements were performed with the test DAC, the numbers are conservative representations

of the actual LCADC performance. Aside from a few non-critical bugs detailed in Appendix B,

the IC functioned as expected.

Figure 5.1: Die microphotograph. The output DAC is not counted toward the active area in the comparison in Table 6.1.

Because the bias currents needed by the LCADC were in the range of tens of nanoamperes

while the ESD diode leakage of the foundry-provided pads was several times larger, the bias cur-

rents from off chip were fed into a 300:1 current divider at the pad before propagating to the

LCADC core. Therefore, the bias current through the external resistor is not counted in the total

power consumption since it would normally be generated on-chip.

The schematic for the circuit used to test the IC and obtain all the measurements is shown in Fig.

5.2. The testing board contains many other features for audio processing, regulated supplies for the

VREF+ and VREF− capacitor DAC supplies, etc., but those were not used when the measurements

were taken. Although two potentiometers are shown in the schematic, only the potentiometer

for the ZCD bias current was adjusted to obtain the best performance per-die, then fixed for all

the following comprehensive measurements. In the calibration performance measurement none of

the external components was adjusted, to get an accurate assessment of the calibration algorithm

itself. The testing DAC current outputs drive an LT6204-based differential-to-single-ended buffer

with 60 dB of gain (1 kΩ transresistance) and a simulated bandwidth of 20 MHz, to show clearly

the steps in the LCADC output.

The entire testing procedure, save for swapping dies, is automated through Matlab. The two

ICs reside on a common SPI bus that communicates with a custom IC testing interface. This

USB-powered interface provides two variable supplies, fixed high voltage supplies and data storage, and

supports multiple I/O standards. The interface is controlled through a Matlab program. All the

measurement instruments communicate also with Matlab through a National Instruments USB-

GPIB cable. The two boards are shown in Fig. 5.3.

To capture spectral data, an Agilent E4446A spectrum analyzer was set to capture a 30 kHz

bandwidth with a resolution bandwidth of 10 Hz. The instrument was set to take two averages,

and the raw data was processed in Matlab with a peak-search routine to filter the tones from the

noise (since in an LCADC the spurious tones come in easily predictable locations) to provide both

SNR and SDR numbers. The power in each spectrum analyzer bin was normalized by the spectrum

analyzer’s RBW filter shape, to accurately estimate the total energy in the band of interest. It is still

[Schematic: LCADC IC with separate supply pins (VDDZCD, VDDODAC, VDDCDAC, VDDCAL, VDDTIME, VDDSPI, VDDASYNC, VSSANA, VSSDIG) fed from AVDD, DVDD and VSS; bias inputs BIASZCD and BIASTIME; references VREF+, VREF− and VCM; analog input VIN driven by an LT1880 buffer; digital output D[7:0].]

Figure 5.2: Schematic of the LCADC portion of the testing board used to produce the presented results.

possible that a low-amplitude tone is confused for noise and vice versa, but ultimately the SNDR

data is correct since it is the ratio of the fundamental to everything else in the spectrum within the

band of interest. For power measurements, Agilent E34401A multimeters were spliced in-line with

the individual block supplies and for each data point the average of 4 sequential measurements was

taken.
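The separation of tones from noise described above can be sketched as follows. This is not the Matlab routine used for the measurements: the function name is hypothetical, the bin powers are assumed to be linear and already normalized for the RBW filter shape, and the harmonic bins are assumed to have been identified by a prior peak search.

```python
import math


def fidelity_db(bin_power, fundamental, harmonics):
    """Return (SNDR, SNR, SDR) in dB from linear per-bin powers.

    bin_power   : linear power of each in-band bin
    fundamental : index of the bin holding the input tone
    harmonics   : indices of the bins classified as spurious tones
    """
    signal = bin_power[fundamental]
    distortion = sum(bin_power[i] for i in harmonics)
    noise = sum(p for i, p in enumerate(bin_power)
                if i != fundamental and i not in harmonics)

    def to_db(ratio):
        return 10.0 * math.log10(ratio)

    return (to_db(signal / (noise + distortion)),
            to_db(signal / noise),
            to_db(signal / distortion))
```

Moving a bin between the noise and distortion sets changes SNR and SDR but leaves SNDR untouched, which is why the SNDR figure is robust to misclassified low-amplitude tones.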

Figure 5.3: Test board and IC interface board.

5.2 Results

The following results are all obtained after the calibration routine was run on the IC at startup.

Since the calibration routine is fully self contained, running the routine is just part of the chip

programming sequence through the SPI interface. Both the adaptive resolution and ZCD biasing

schemes were enabled (unless otherwise specified). The noise and distortion of the LCADC were

integrated over a 20 Hz–20 kHz bandwidth. All measurements were performed at room tempera-

ture, with the IC running from a 0.8 V regulated supply.

Figure 5.4: Test DAC reconstructing the LCADC output for a 1 kHz, small-amplitude input.

5.3 LCADC System Performance

A typical waveform plot of the LCADC output is shown in Fig. 5.4. A small amplitude input was

chosen to clearly show the quantization steps. Also visible is a small amount of glitching around

a few transitions. As explained in [35], a larger hysteresis window would give greater glitch

immunity at the cost of poorer SNDR. The superposition of output over input is coincidental:

the gain of the buffer-LCADC-DAC-buffer path happened to be approximately

1/2. The sharp staircase-like output waveform shows that the output DAC bandwidth is

significantly higher than the 20 kHz signals it processes (as designed).

As motivated in the introduction of this work, this data converter is less suited for processing

continuous-energy signals such as sinusoids than it is for converting bursty, high peak-to-average

ratio signals. Figure 5.5 shows the time-domain output of the LCADC with a cardiac pulse signal

Figure 5.5: DAC reconstruction of LCADC output processing a cardiac pulse signal. At the signal peaks and valleys the quantization step is 2.8 mV. During the high-slew portions of the signal the quantization step increases, to a maximum of 14 mV (5× minimum).

from an Agilent 33220A arbitrary waveform generator. As designed, the LCADC takes large

quantization steps during fast input transients, and takes smaller steps when the input is moving

slowly.

Two unprocessed spectra of -3 dBFS inputs are shown in Figs. 5.6 and 5.7, with the peak of

the next highest tone (the second harmonic, which is present due to the single-ended architecture)

indicated by the waveform marker. In both cases, the locations of the spurious tones are at integer

multiples of the fundamental, indicating that such tones are harmonics, as expected. A -3 dBFS

input amplitude means that only 190 of the 256 possible digital codes are exercised by the input; the

reduced amplitude was chosen because for amplitudes greater than -3 dBFS the distortion

Figure 5.6: Spectrum of a single tone, 300 Hz -3 dBFS input.

drastically increases, as shown in the plot of SNDR versus amplitude in Fig. 5.8 for a fixed 1

kHz input. It is important to point out that in spite of this circuit impairment the SNDR of the

converter exceeds the theoretical limit for a conventional 8-bit ADC. Many tests were performed

to isolate the cause of the impairment but the results were inconclusive. The first test was to disable

both the adaptive resolution and ZCD biasing schemes, but the problem persisted. The digital data

from the LCADC was captured with a logic analyzer at 500 MS/s and examined for glitches or

saturation in the digital logic, but none were found. This suggests the problem is not in the digital

algorithms. The next theory was that one of the clamping switches, or perhaps one of the junction

diodes in the bootstrapped switch driver transistors (schematic in Fig. 4.22) begins to conduct at

high input amplitudes. To test this, the supply voltage was increased to 2 V, the hypothesis being

that if the problem is indeed a diode conducting, then that occurs at a fixed voltage relative to the

Figure 5.7: Spectrum of a single tone, 1 kHz -3 dBFS input.

supply rail whereas the input full-scale is proportional to the supply, so with a larger input

full-scale the problem should occur at a higher (proportional) amplitude. Unfortunately, the amplitude

where distortion worsened tracked the supply (full-scale input range), thus invalidating

that hypothesis. In a second test of the same theory, just the capacitor DAC reference voltages

were adjusted, but then distortion due to the switches not being fully on or off was encountered

(verified by simulations). The ZCDs were ruled out since by the time the signal arrives at the ZCD,

it has been level-shifted by the capacitor DAC so regardless of input magnitude the input remains

around VCM. The remaining culprit was the input bootstrap circuit (an intrinsic problem with the

design). A possible theory was that somehow the gate bootstrap voltage is not allowed to increase

far beyond the supply, so for inputs nearing the positive supply the switch on resistance increases.

This idea is partially supported by the observation that the large increase in distortion is caused by

[Plot: SNDR, SNR and SDR (dB) versus input amplitude (dBFS, -30 to 0), with the SNDR of a conventional 8b ADC shown for reference.]

Figure 5.8: Signal fidelity as a function of input amplitude for a 1 kHz tone input.

a rise in the even harmonics. Unfortunately, this was not reproducible in simulation nor could a

suitable test be devised to confirm this theory. However, as the bootstrap circuit is one of the only

circuits in this design which is not a common or proven architecture¹, it seems plausible. Since

the maximum usable input is 3 dB below the digital full-scale output, it would make sense to

re-define the full-scale input amplitude as that which exercises 190 out of 256 digital codes. It was

¹ Although it was based on [54], bootstrapping circuits are difficult to guarantee (and to test for reliability), so typically a handful of proven designs are ported unmodified from one design to another, especially in industry.

[Plot: SNDR, SNR and SDR (dB) versus input frequency (Hz, 10^2 to 10^4), with the SNDR of a conventional 8b ADC shown for reference.]

Figure 5.9: SNDR as a function of input frequency for a -3 dBFS input.

decided to leave the 0 dBFS reference as absolute with respect to the digital code output, so as to

be mindful of this performance loss.

The SNDR performance over frequency is shown in Fig. 5.9. The sharp jumps in the SDR

are due to transitions between resolutions, but once again the SNDR is generally higher than that

possible with a conventional 8-bit ADC. It was experimentally found that using only 3 different

resolutions (1×LSB, 3×LSB, and 5×LSB) gave the best performance/power ratio. While the loss

of the lower resolution settings required a faster loop response, the increase in power in the ZCDs

[Plot: Power (µW) versus input frequency (10^2 to 10^4 Hz), with curves for Total Power, Digital, Timing, ZCD, Cap DAC and Cal DAC.]

Figure 5.10: Power consumption as a function of frequency for a -3 dBFS input tone.

was countered by a large savings in power in the digital controller. Increasing the ZCD power

slightly allowed its response time plus the digital logic delays to be under 300 ns. The delay line

was set to accommodate the most stringent TGRAN condition: for a full-scale, 20 kHz input when

the LCADC resolution is 5×LSB, TGRAN = 310 ns.
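The quoted granularity figures can be checked with a first-order estimate. The sketch below is an assumption-laden reconstruction rather than the formula used in the design: it takes a sine input, treats its slope as constant at the zero-crossing maximum, and asks how long the input needs to traverse one quantization step. The function name and arguments are hypothetical.

```python
import math


def t_gran(f_in_hz, step_lsb=1, n_bits=8, amplitude_fs=1.0):
    """First-order minimum time for a sine to traverse `step_lsb` LSBs.

    amplitude_fs is the peak-to-peak swing as a fraction of full scale.
    """
    full_scale = 1.0                           # normalized peak-to-peak range
    lsb = full_scale / 2 ** n_bits
    peak = amplitude_fs * full_scale / 2       # sine amplitude
    max_slope = 2 * math.pi * f_in_hz * peak   # steepest slope, at the zero crossing
    return step_lsb * lsb / max_slope


print(f"1xLSB: {t_gran(20e3, 1) * 1e9:.0f} ns")
print(f"5xLSB: {t_gran(20e3, 5) * 1e9:.0f} ns")
```

For a full-scale 20 kHz input this estimate gives roughly 62 ns at 1×LSB and roughly 310 ns at 5×LSB, in line with the values quoted in the text.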

As seen in Fig. 5.10, the digital logic is the dominant power consumer. The fact that more

adaptive resolution steps were implemented than actually used resulted in a slight efficiency loss,

due to having unused delay elements in the delay line and digital logic overhead. Without adaptive

resolution, the IC power consumption is 11 µW at a 4 kHz input frequency; this is compared to 6.5

µW with the adaptive algorithm enabled. Unfortunately, comparison at higher input frequencies,

where the benefit of adaptive resolution would be even more pronounced, is impossible since the

IC is not able to sample signals above 4 kHz without the adaptive algorithm.

The power plot shows that the ZCD power barely changes. There are a few reasons for this.

One, the static component (the ZCD bias current) is already very low, as seen in Fig. 4.14. More-

over the variation in bias current needed is less than 1 µA. Also, the reduced static consumption at

higher frequencies is canceled by the increased dynamic component, with the bias currents switch-

ing more frequently as sample rate increases, which accounts for the dip then rise in ZCD power

as frequency increases.

A plot of the average sampling rate vs. input frequency for the LCADC in Fig. 5.11 shows

the operation of the AR algorithm. When the LCADC is set to fixed 1× LSB resolution, it hits

the maximum sampling rate (set by the delay line) within the band of interest, and at 3× LSB

resolution it is almost to the point of saturating right at the band edge. When the AR is enabled,

the sampling rate varies without ever saturating. At a maximum input frequency of 20 kHz the

average adaptive sample rate is a little over 1 MS/s. Without adaptive resolution the sample rate

for the same-amplitude input would be 8 MS/s, which means the adaptive resolution algorithm

achieves a maximum of 8× reduction in samples produced. The SNDR for the same setup in

Fig. 5.12 shows that once the sampling rate of the fixed resolutions saturates the distortion kills

performance, while with the AR enabled SNDR is maintained.
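The fixed-resolution sample rate quoted above follows from counting level crossings. As a rough sketch, assuming a sine input that crosses every level in its span once on the way up and once on the way down (function names are hypothetical):

```python
import math


def average_sample_rate(f_in_hz, codes_spanned, step_lsb=1):
    """Average level crossings per second for a sine spanning `codes_spanned` codes."""
    crossings_per_period = 2 * codes_spanned / step_lsb  # rising plus falling half
    return f_in_hz * crossings_per_period


def effective_bits(step_lsb, n_bits=8):
    """Resolution in bits when the quantization step is `step_lsb` LSBs."""
    return n_bits - math.log2(step_lsb)


print(average_sample_rate(20e3, 190) / 1e6, "MS/s at 1xLSB")
print(round(effective_bits(3), 1), round(effective_bits(5), 1), "bits at 3x and 5x")
```

With 190 codes exercised at 20 kHz, the 1×LSB case comes out at 7.6 MS/s, consistent with the roughly 8 MS/s stated in the text, and the step sizes of 3× and 5× correspond to about 6.4 and 5.7 bits.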

[Plot: Average Sample Rate (kHz, 10^1 to 10^3) versus input frequency (10^2 to 10^4 Hz), with curves for LSB=1x (8b), LSB=3x (6.4b), LSB=5x (5.7b) and Adaptive Resolution.]

Figure 5.11: Average sampling rate as a function of input frequency.

The energy per conversion defined by the classical formula

$$\text{energy/conversion} = \frac{\text{Power}}{2^{\text{ENOB}}\, f_s} \tag{5.1}$$

is used², where we define fs = 2 fBW as with oversampled converters. The ENOB is computed with

$$\text{ENOB} = \frac{\text{SNDR} - 1.76}{6.02} \tag{5.2}$$

² Arguably a different metric should be devised to highlight the unique strengths of the LCADC, but for a direct comparison the standard equation was used.

[Plot: SNDR (dB) versus input frequency (10^2 to 10^4 Hz), with curves for LSB=1x (8b), LSB=3x (6.4b), LSB=5x (5.7b) and Adaptive Resolution.]

Figure 5.12: SNDR when the AR is disabled, and enabled.

and fBW = 20 kHz. The metric is plotted over frequency in Fig. 5.13, achieving a best case of 210

fJ/conv with a 200 Hz input.
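As a worked illustration of (5.1) and (5.2), the sketch below evaluates the metric for one set of inputs. The pairing of 3.0 µW with 54 dB SNDR is an assumption chosen for illustration, not a measured operating point, and the function names are hypothetical.

```python
def enob(sndr_db):
    """Effective number of bits, per Eq. (5.2)."""
    return (sndr_db - 1.76) / 6.02


def energy_per_conversion(power_w, sndr_db, f_bw_hz=20e3):
    """Energy per conversion in joules, per Eq. (5.1) with fs = 2 * fBW."""
    f_s = 2 * f_bw_hz
    return power_w / (2 ** enob(sndr_db) * f_s)


# Illustrative inputs only: 3.0 uW and 54 dB SNDR over a 20 kHz bandwidth.
e_conv = energy_per_conversion(3.0e-6, 54.0)
print(f"ENOB = {enob(54.0):.2f} bits, energy = {e_conv * 1e15:.0f} fJ/conv")
```

These inputs land in the neighbourhood of 200 fJ/conv, the same order as the best case reported in the text.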

As discussed in Sec. 4.6, the input bootstrap switch injects extra charge back into the input

each time the clamp switch M2 turns on to disable the pass switch M1 (referring to Fig. 4.22). It

was assumed that the driving resistance of the source was low enough that the source could absorb

this charge, but it is interesting to see what happens when this is not true (and how much source

resistance is needed). Figure 5.14 plots SNR, SDR and SNDR vs. an additional resistance placed

[Plot: Energy per conversion (fJ/conv, 100 to 900) versus input frequency (10^2 to 10^4 Hz).]

Figure 5.13: Energy per conversion as defined in (5.1) with respect to frequency.

at the output of the LT1880 input buffer driving the LCADC (board schematic shown in Fig. 5.2).

For a series resistance less than 1 kΩ, the performance is unaffected.

All the measurements provided in this section were performed on three dies, with practically

identical performance. SNDR measurements (identical to the measurement presented in Fig. 5.9)

were taken on 12 dies, and all were found to have identical performance except one, which suffered

a 2-3 dB performance deficit. A summary of the LCADC performance is given in Table 5.1. As

mentioned, the actual current in the bias resistors was not counted since the only reason large

[Plot: SNDR, SNR and SDR (dB) versus input source resistance (Ω, 10^2 to 10^4).]

Figure 5.14: Signal fidelity with varying source resistance.

currents were used was to avoid measurement uncertainty due to unknown leakage through the

ESD diodes. Also, the power dissipated through the voltage divider setting the VCM node was

not counted. The reason is that after the measurements were taken a 2 MΩ potentiometer with a

bypass capacitor was used to set the VCM node and was found to have identical performance to the

20 kΩ resistor divider shown in Fig. 5.2. This is because the VCM node does not draw or source

static current (up to mismatched leakage currents) so even a large resistor divider with enough

decoupling will work properly.

Parameter                        Value
Process                          ST 130 nm 1P6M. Options: LVT, high res. poly
Die Area                         1.7 mm², active area 0.36 mm²
Supply Voltage                   0.8 V analog, 0.8 V digital
Quiescent Power                  Leakage: 0.6 µW
                                 Bias: 2.0 µW
Total Power (full scale input)   100 Hz: 3.0 µW
                                 1 kHz: 4.9 µW
                                 10 kHz: 7.4 µW
Sample Rate                      Minimum: < 10 S/s
                                 Maximum: 2 MS/s
Resolution                       Minimum: 4 bit
                                 Maximum: 8 bit
Tested Bandwidth                 20 Hz to 20 kHz
Full-Scale Input                 0.54 V single-ended peak-peak
SNDR                             47 dB to 54 dB (over frequency)
SNR                              54 dB to 58 dB (over frequency)
Intermodulation                  > 48 dB, two tones spaced 1 kHz swept across BW
J/conv                           210–880 fJ/conv, 210 fJ/conv at 200 Hz

Table 5.1: LCADC system performance summary. For the full-scale input definition, 0.54 V is the voltage required to traverse 190 digital output codes. This range avoids the large-input-amplitude distortion problem.
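The full-scale definition in Table 5.1 can be cross-checked against the quantization steps quoted in the caption of Fig. 5.5. A minimal sketch, assuming the 190 codes are uniformly spaced across the 0.54 V range (the function name is hypothetical):

```python
def lsb_step_volts(full_scale_v=0.54, codes=190):
    """Size of one quantization step, assuming uniformly spaced codes."""
    return full_scale_v / codes


lsb = lsb_step_volts()
print(f"1xLSB = {lsb * 1e3:.1f} mV, 5xLSB = {5 * lsb * 1e3:.1f} mV")
```

The result is about 2.8 mV per step and about 14 mV at 5×LSB, matching the step sizes given for the cardiac pulse reconstruction.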

5.4 Calibration Algorithm

To quantify the performance of the calibration algorithm, the two bias potentiometers on the test

PCB were fixed, and 12 dies were calibrated. The post-calibration SNDR is plotted in Fig. 5.15.

The calibration routine provides four values to the LCADC: the offset and VLSB/2 values for the

upper and lower ZCDs. Perturbations around each calibrated point were generated and manually

provided to the LCADC. As shown, the calibration algorithm achieves in almost all cases the peak

SNDR, with sample 3 being the die which exhibited inferior performance overall. The SNDR is

[Plot: SNDR (dB, 30 to 50) versus sample number (1 to 12), showing Self-Calibration Values and Perturbed Values.]

Figure 5.15: Calibrated SNDR of a dozen dies.

slightly lower than that shown in Figs. 5.8 and 5.9 because the ZCD bias current was fixed between

dies. If the ZCD bias current is tuned per die, an extra 2-3 dB is gained.

Because the calibration algorithm provides the ZCD offset, the offsets for 24 samples (2 ZCDs

per die) are given in Table 5.2. The measured σ = 5.9 mV agrees with the simulated mismatch.

Die #   Upper ZCD   Lower ZCD
1       -9.16       0.104
2       4.712       -8.43
3       10.49       -4.29
4       -1.95       -7.20
5       5.70        -3.24
6       -5.35       3.42
7       3.76        1.39
8       6.25        7.22
9       -2.07       -11.38
10      3.76        4.61
11      5.28        -3.61
12      2.25        6.02

Table 5.2: ZCD offsets in mV. σ = 5.9 mV.
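The spread quoted in Table 5.2 can be reproduced from the tabulated offsets. The sketch below assumes σ is the sample standard deviation over all 24 ZCDs taken as one population; the text does not state which estimator was used.

```python
import statistics

# Offsets in mV, transcribed from Table 5.2 (dies 1 to 12).
upper = [-9.16, 4.712, 10.49, -1.95, 5.70, -5.35,
         3.76, 6.25, -2.07, 3.76, 5.28, 2.25]
lower = [0.104, -8.43, -4.29, -7.20, -3.24, 3.42,
         1.39, 7.22, -11.38, 4.61, -3.61, 6.02]

offsets_mv = upper + lower
mean_mv = statistics.mean(offsets_mv)
sigma_mv = statistics.stdev(offsets_mv)  # sample standard deviation (n - 1)

print(f"n = {len(offsets_mv)}, mean = {mean_mv:.2f} mV, sigma = {sigma_mv:.1f} mV")
```

Under that assumption the spread comes out at about 5.9 mV with a mean offset well under 1 mV.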

5.5 Delay Elements

A replica delay chain was placed on the die for testing purposes. The 11 delay elements are each

controlled by a 4-bit DAC. Each delay cell was found to have practically identical performance,

and its delay over the digital tuning range, for a range of analog bias currents and supply voltages,

is plotted in Fig. 5.16. Since the delay is determined by VT , the delay shows insensitivity toward

power supply voltage (but would vary with process and temperature). The periodic kinks in the

tuning range were present in all the delays, and are traced back to metal 2 routing over the bias

DACs, which is replicated in each DAC.

The jitter of the delay cell was measured by using the measure functions of an Agilent 54832D

sampling oscilloscope capturing at 2 GS/s to measure the time between the input and output edges

for one delay cell. The bias current was set to roughly 24 nA (due to the current dividers driving the

[Plot: Delay (ns, log scale) versus TIMEi[3:0] (1 to 15), with curves for bias currents of 16 nA, 33 nA, 50 nA, 67 nA, 83 nA and 100 nA.]

Figure 5.16: Delay tuning range for a delay cell, vs. the 4-bit TIMEi[3:0] control value. The multiple lines at each color are the delay for VDD = 0.8, 1.0, 1.2 V.

internal bias nodes, it is not possible to know the exact bias current to the cell) and 2000 samples

were taken. The jitter histogram is shown in Fig. 5.17.

Since the current into the timing VDD supply pin includes the power of the delay element DACs

as well as the delay elements and the asynchronous combinational logic, the energy of a single delay

was calculated by measuring the change in the timing supply power versus the average sample rate

for a varying low input frequency (to ensure that all delay elements are triggered each sample) and

was found to be 43 fJ/delay. This number also includes the energy of the combinational logic shown

[Plot: Histogram of occurrence (0 to 140) versus delay (µs, 1.01 to 1.055).]

Figure 5.17: Delay jitter histogram, N = 2000, µ = 1.033 µs, σ = 4.1 ns.

in Fig. 4.6, but separating that is impossible. The delay element performance is summarized in

Table 5.3.

5.6 Oscillator

The calibration oscillator has a 4-bit tuning word OSCTUNE[3:0] to account for PVT variations.

The tuning curves for 12 dies and three supply voltages are plotted in Fig. 5.18.

Parameter       This Work                [43]
Technology      130 nm                   90 nm
Area            130 µm²                  36 µm²
Energy          43 fJ/delay              50 fJ/delay
Range           700 ns – 4 µs            5 ns – 1 µs
Mismatch (1σ)   0.14 ns                  -
Jitter          4.1 ns @ 1 µs (0.41%)    6.1 ns @ 1.82 µs (0.3%)

Table 5.3: Low-speed delay element comparison, results are from measurement except for simulated mismatch. The tuning range of this cell is reduced due to the digital 4-bit control. Its area is also much larger due to the 4-bit bias DAC.

[Plot: Oscillation Frequency (MHz, 1 to 3) versus OSCTUNE[3:0] (0 to 15).]

Figure 5.18: Calibration oscillator frequency vs. the oscillator tuning code OSCTUNE[3:0] for 12 dies and VDD = 0.8, 1.0, 1.2 V.

Chapter 6

Conclusions and Suggestions for Future Work

In this thesis, performance limitations and constraints on ultra-low power LCADCs were

investigated. An 8-bit, 20 kHz bandwidth LCADC implemented in 130 nm CMOS was used to

demonstrate ultra-low power operation of an LCADC. We now look at how this design compares

against existing state-of-the-art designs in both level-crossing (asynchronous) and conventional

(synchronous) ADC categories.

6.1 Performance Comparison

6.1.1 Level-Crossing ADCs

The measured performance in Table 5.1 surpasses previous state-of-the-art LCADC designs in

Table 6.1, even without the use of digital post-processing, and features the highest level of system

integration. Calibration as well as most of the bias generation are contained in the IC, and unlike

?? the asynchronous digital controller is also on chip.

To be fair, the LCADC in [10] was just a small part of a much larger design, so direct compar-

ison of the LCADCs only does not do justice to the huge effort in producing the whole working

system. Works [27] and [19] both use high-order polynomial reconstruction with non-realtime

post-processing of the data. Because this IC was designed to achieve equivalent SNDR without

post-processing, the system was designed such that each block contributes equally to the overall

SNDR degradation. Using post-processing on the raw PCM data from this ADC only results in an

improvement of 4 dB in SNDR on average, as both the noise of the delay elements and distortion

from the input sampling switch are comparable to the quantization distortion.

6.1.2 Synchronous ADCs

In spite of the relatively good performance against its peers, this LCADC is still far surpassed

by a state-of-the-art SAR ADC design from two years prior [49] by a factor of 50 in overall en-

ergy/conversion. If we were to impose the condition on the input signal that it is only active

Work          [27]†         [8]           [10]           [19]†         [This work]
Year          2005          2006          2008           2011          2011
Tech          130 nm        0.25 µm       90 nm          0.18 µm       130 nm
Area          0.1 mm²       0.25 mm²      0.06 mm²       0.96 mm²      0.36 mm²
Resolution    4 bit         6 bit         8 bit          8 bit         8 bit
Adaptive      no            no            no             no            yes
Postprocess   yes           no            no             yes           no
Bandwidth     160 kHz       55 kHz        10 kHz         1 kHz         20 kHz
SNDR          60 dB         53 dB         58 dB          52 dB         47-54 dB
Power         180 µW        17 mW         50 µW          25 µW         2-8 µW
Energy/conv   560 fJ/conv   345 pJ/conv   3.14 pJ/conv   31 pJ/conv    200-850 fJ/conv

Table 6.1: Comparison of this work to published LCADCs. A † indicates that offline post-processing was used to achieve the SNDR numbers reported.

(non-zero) 1% of the time, to model a “bursty” signal, then the average power consumption of

the LCADC would drop to about 2.6 µW. The synchronous SAR ADC on the other hand would

continue to consume the same amount of power regardless of input, and so the energy/conversion

advantage of [49] in this scenario drops to a factor of 40. Of course this comparison is not en-

tirely fair to the SAR ADC since many techniques to power cycle ADCs for signals with high

peak-to-average ratio have been developed as well.
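The bursty-signal estimate above can be illustrated with a simple duty-cycle model. This is a sketch under stated assumptions rather than the calculation used in the thesis: power is taken to be the quiescent value (0.6 µW leakage plus 2.0 µW bias from Table 5.1) while the input is idle, and the 10 kHz full-scale figure of 7.4 µW is used as a stand-in for the active power. The function name is hypothetical.

```python
def average_power_uw(duty, active_uw, quiescent_uw):
    """Average power when the input is active for a fraction `duty` of the time."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must be between 0 and 1")
    return quiescent_uw + duty * (active_uw - quiescent_uw)


quiescent = 0.6 + 2.0   # leakage + bias, in uW (Table 5.1)
active = 7.4            # assumed active power, in uW (10 kHz full-scale entry)
print(f"{average_power_uw(0.01, active, quiescent):.2f} uW at 1% activity")
```

At 1% activity the average stays within a few percent of the quiescent power, about 2.6 µW, which is why the activity-dependent advantage is bounded by the leakage and bias consumption.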

Continuing the comparison between this work and the SAR ADC in [49], the SAR ADC is im-

plemented in 65 nm technology, two nodes more advanced. Were the LCADC also implemented in

such technology we can expect the digital power consumption to decrease by almost 4× (compar-

ing simulations of the asynchronous logic synthesized in both 130 nm and 65 nm CMOS) thanks

not only to the device scaling but also the availability of more advanced logic cells (separate fam-

ilies of HVT, low leakage, and low power). The size of the SAR array would also decrease by

around 40%, not only because of the reduced metal pitch but because a re-design with a layout

parasitic extractor would allow a more aggressive unit capacitor size. With a 4 fF unit cell instead

of the 20 fF cell used in this design, the power consumption of the capacitor DAC would drop

by half (the switch drivers of the capacitor bottom plates consume power which does not scale

equally with the capacitor size). With just the energy saved in the digital logic and capacitor DAC,

the overall power consumption of the IC would drop by almost 40%. Ultimately however these are

unfair speculations, as hindsight is 20/20 and if the designers of [49] were given the opportunity to

redesign the IC it too would perform better.

More generally, even though this LCADC offers favorable performance against other LCADCs,

for this circuit to be profitable in a complete system would require a system entirely without a

clock. Given the availability of a clock, either [49] or [51] would offer superior performance in

any practical situation. For example, when sampling at 100 kS/s [51] consumes 300 nW (half of

just the leakage power of this design), yet provides similar SNDR and offers 2.5× the effective

bandwidth. In the face of this comparison, it ultimately rests on the system designer to determine

the optimal balance between power consumption in the ADC/LCADC and the necessity of clock

generation circuitry, while being mindful of the expected types of inputs the system must process.

6.2 Suggestions for Future Work

In Chaps. 2 and 3 the zero-crossing detector (ZCD) was identified to be the main performance

bottleneck in achieving high-performance LCADCs. The architecture in Chap. 4 was built around this

premise, and as is visible from the power consumption breakdown in Fig. 5.10, the ZCD is no

longer the dominant power consumer. If anything, the presented architecture takes the opposite

extreme, in the sense that the ZCD is allowed significantly lower performance (and power con-

sumption) at the expense of digital correction algorithms, which dominate the power budget. This

architecture is optimized for a finer technology, where digital power consumption scales down

while analog bias currents remain roughly constant. It would therefore be interesting to see what

performance would be possible from this design in 65 nm CMOS.

It is clear from the rate of improvement among the designs presented in Table 6.1 that still

much is left to be discovered about LCADCs. Unlike SAR ADCs (the current FOM leaders

for Nyquist ADCs) where the state-of-the-art improves only incrementally year over year, this

LCADC is a substantial improvement over ones before, and the LCADC which follows this one

will surely leapfrog the presented performance. Nevertheless, the limitations on LCADCs (as a

subset of ZCD-based circuits) raise the question of whether or not, for any given application, an

LCADC-based system will outperform a synchronous one. The answer to this question is appli-

cation specific; based on raw performance metrics developed for synchronous ADCs, the LCADC

is not yet competitive. However, the classic FOM metric in Eq. (5.1) does not take any signal

properties into account, while the presented LCADC does. As discussed, the relative performance

disparity decreases when high peak-to-average ratio signals are considered. Therefore, perhaps most

of the work to be done is in finding the right application for this type of data converter. Possible

applications may be found in areas such as biomedical electronics and sensor networks.

Bibliography

[1] P. Ellis, “Extension of phase plane analysis to quantized systems,” Automatic Control, IRETransactions on, vol. 4, no. 2, pp. 43 – 54, November 1959.

[2] R. Dorf, M. Farren, and C. Phillips, “Adaptive sampling frequency for sampled-data controlsystems,” Automatic Control, IRE Transactions on, vol. 7, no. 1, pp. 38 – 47, January 1962.

[3] R. Tomovic and G. Bekey, “Adaptive sampling based on amplitude sensitivity,” AutomaticControl, IEEE Transactions on, vol. 11, no. 2, pp. 282 – 284, April 1966.

[4] J. Mark and T. Todd, “A Nonuniform Sampling Approach to Data Compression,” Communi-cations, IEEE Transactions on, vol. 29, no. 1, pp. 24 – 32, January 1981.

[5] Y. Tsividis, “Continuous-time digital signal processing,” Electronics Letters, vol. 39, no. 21,pp. 1551 – 1552, October 2003.

[6] ——, “Digital signal processing in continuous time: a possibility for avoiding aliasing and re-ducing quantization error,” in Acoustics, Speech, and Signal Processing, 2004. Proceedings.(ICASSP ’04). IEEE International Conference on, vol. 2, May 2004, pp. 589–592.

[7] E. Allier, G. Sicard, L. Fesquet, and M. Renaudin, “A new class of asynchronous A/D con-verters based on time quantization,” in Asynchronous Circuits and Systems, 2003. Proceed-ings of the Ninth International Symposium on, May 2003, pp. 196 – 205.

[8] Y. Li, K. Shepard, and Y. Tsividis, “A Continuous-Time Programmable Digital FIR Filter,”Solid-State Circuits, IEEE Journal of, vol. 41, no. 11, pp. 2512 –2520, November 2006.

[9] Z. Zhao and A. Prodic, “Continuous-Time Digital Controller for High-Frequency DC-DCConverters,” Power Electronics, IEEE Transactions on, vol. 23, no. 2, pp. 564 –573, March2008.

[10] B. Schell and Y. Tsividis, “A Continuous-Time ADC/DSP/DAC System With No Clock andWith Activity-Dependent Power Dissipation,” Solid-State Circuits, IEEE Journal of, vol. 43,no. 11, pp. 2472 –2481, November 2008.

118

119

[11] J. Pangjun and S. Sapatnekar, “Low-power clock distribution using multiple voltages and re-duced swings,” Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 10,no. 3, pp. 309 –318, June 2002.

[12] Y. Tsividis, “Event-Driven Data Acquisition and Digital Signal Processing–A Tutorial,” Cir-cuits and Systems II: Express Briefs, IEEE Transactions on, vol. 57, no. 8, pp. 577 –581,August 2010.

[13] Y. Palaskas, Y. Tsividis, V. Prodanov, and V. Boccuzzi, “A ”divide and conquer” techniquefor implementing wide dynamic range continuous-time filters,” Solid-State Circuits, IEEEJournal of, vol. 39, no. 2, pp. 297 – 307, February 2004.

[14] M. Ozgun, Y. Tsividis, and G. Burra, “Dynamic power optimization of active filters withapplication to zero-IF receivers,” Solid-State Circuits, IEEE Journal of, vol. 41, no. 6, pp.1344 – 1352, June 2006.

[15] A. Yoshizawa and Y. Tsividis, “A Channel-Select Filter With Agile Blocker Detection andAdaptive Power Dissipation,” Solid-State Circuits, IEEE Journal of, vol. 42, no. 5, pp. 1090–1099, May 2007.

[16] A. Wang and A. Chandrakasan, “A 180-mV subthreshold FFT processor using a minimumenergy design methodology,” Solid-State Circuits, IEEE Journal of, vol. 40, no. 1, pp. 310 –319, January 2005.

[17] J. Howard, S. Dighe, S. Vangal, G. Ruhl, N. Borkar, S. Jain, V. Erraguntla, M. Konow,M. Riepen, M. Gries, G. Droege, T. Lund-Larsen, S. Steibl, S. Borkar, V. De, and R. VanDer Wijngaart, “A 48-Core IA-32 Processor in 45 nm CMOS Using On-Die Message-Passing and DVFS for Performance and Power Scaling,” Solid-State Circuits, IEEE Journalof, vol. 46, no. 1, pp. 173 –183, January 2011.

[18] Z. Yuhua, Q. Longhua, L. Qiang, Q. Peide, and Z. Lei, “A dynamic frequency scaling so-lution to dpm in embedded linux systems,” in Information Reuse Integration, 2009. IEEEInternational Conference on, August 2009, pp. 256 –261.

[19] M. Trakimas and S. Sonkusale, “An Adaptive Resolution Asynchronous ADC Architecturefor Data Compression in Energy Constrained Sensing Applications,” Circuits and Systems I:Regular Papers, IEEE Transactions on, vol. 58, no. 5, pp. 921 –934, May 2011.

[20] M. Neugebauer and K. Kabitzsch, “A new protocol for a low power sensor network,” inPerformance, Computing, and Communications, 2004 IEEE International Conference on,2004, pp. 393 – 399.

[21] K. Kozmin, “Data acquisition and signal conditioning for low power measurement systems,”Ph.D. dissertation, Lulea University of Technology, 2008.

120

[22] K. Guan and A. Singer, “Opportunistic sampling by level-crossing,” in Acoustics, Speech andSignal Processing, 2007. IEEE International Conference on, vol. 3, April 2007, pp. III–1513–III–1516.

[23] R. Plassche, CMOS Integrated Analog-to-Digital and Digital-to-Analog Converters. KluwerAcademic Publishers, 2003.

[24] M. Kurchuk, C. Weltin-Wu, D. Morche, and Y. Tsividis, “GHz-range continuous-time pro-grammable digital FIR with power dissipation that automatically adapts to signal activity,”in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2011 IEEE Interna-tional, February 2011, pp. 232 –234.

[25] Interface Integrated Circuits 1974, National Semiconductor, 1974.

[26] F. Akopyan, R. Manohar, and A. Apsel, “A level-crossing flash asynchronous analog-to-digital converter,” in Asynchronous Circuits and Systems, 2006. 12th IEEE InternationalSymposium on, March 2006, pp. 11 pp. –22.

[27] E. Allier, J. Goulier, G. Sicard, A. Dezzani, E. Andre, and M. Renaudin, “A 120nm lowpower asynchronous ADC,” in Low Power Electronics and Design, 2005. Proceedings of theInternational Symposium on, August 2005, pp. 60 – 65.

[28] J. McCreary and P. Gray, “All-MOS charge redistribution analog-to-digital conversion tech-niques. I,” Solid-State Circuits, IEEE Journal of, vol. 10, no. 6, pp. 371 – 379, December1975.

[29] LMV7219 Datasheet, National Semiconductor, 2004. [Online]. Available: http://www.national.com/ds/LM/LMV7219.pdf

[30] S.-K. Shin, Y.-S. You, S.-H. Lee, K.-H. Moon, J.-W. Kim, L. Brooks, and H.-S. Lee, “Afully-differential zero-crossing-based 1.2V 10b 26MS/s pipelined ADC in 65nm CMOS,” inVLSI Circuits. 2008 IEEE Symposium on, June 2008, pp. 218 –219.

[31] B. Hershberg, S. Weaver, and U.-K. Moon, “Design of a Split-CLS Pipelined ADC With FullSignal Swing Using an Accurate But Fractional Signal Swing Opamp,” Solid-State Circuits,IEEE Journal of, vol. 45, no. 12, pp. 2623 –2633, December 2010.

[32] P. Sotiriadis and A. Chandrakasan, “Low power bus coding techniques considering inter-wirecapacitances ,” in Custom Integrated Circuits Conference. Proceedings of the IEEE 2000,2000, pp. 507 –510.

[33] T. C. Sepke, “Comparator Design and Analysis for Comparator-Based Switched-CapacitorCircuits,” Ph.D. dissertation, Massachusetts Institute of Technology, September 2006.

121

[34] R. Corless, G. Gonnet, D. Hare, D. Jeffrey, and D. Knuth, “On the Lambert W Function,”Advances in Computational Mathematics, vol. 5, pp. 329 –359, 1996.

[35] B. Schell, “Continuous-Time Digital Signal Processors: Analysis and Implementation,”Ph.D. dissertation, Columbia University, May 2008.

[36] B. P. Lathi, Signal Processing and Linear Systems. University of Waterloo, 1998.

[37] W. Gardner, Cyclostationarity in Communications and Signal Processing. IEEE Press, 1994.

[38] P. Kloeden and E. Platen, Numerical Solution of Stochastic Differential Equations. Springer,1991.

[39] M. Kurchuk, “Signal Encoding and Digital Signal Processing in Continuous Time,” Ph.D.dissertation, Columbia University, May 2011.

[40] A. Hajimiri and T. Lee, “A general theory of phase noise in electrical oscillators,” Solid-StateCircuits, IEEE Journal of, vol. 33, no. 2, pp. 179 –194, February 1998.

[41] M. Perrott, M. Trott, and C. Sodini, “A modeling approach for Σ∆ fractional-N frequencysynthesizers allowing straightforward noise analysis,” Solid-State Circuits, IEEE Journal of,vol. 37, no. 8, pp. 1028 – 1038, August 2002.

[42] M. Kurchuk and Y. Tsividis, “Signal-Dependent Variable-Resolution Clockless A/DConversion With Application to Continuous-Time Digital Signal Processing,” Circuits andSystems I: Regular Papers, IEEE Transactions on, vol. 57, no. 5, pp. 982 –991, May 2010.

[43] B. Schell and Y. Tsividis, “A Low Power Tunable Delay Element Suitable for AsynchronousDelays of Burst Information,” Solid-State Circuits, IEEE Journal of, vol. 43, no. 5, pp. 1227–1234, May 2008.

[44] M. Kurchuk and Y. Tsividis, “Energy-efficient asynchronous delay element with wide con-trollability,” in Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Sym-posium on, June 2010, pp. 3837 –3840.

[45] M. Bazes, “Two novel fully complementary self-biased CMOS differential amplifiers,” Solid-State Circuits, IEEE Journal of, vol. 26, no. 2, pp. 165 –168, February 1991.

[46] R. Aparicio and A. Hajimiri, “Capacity limits and matching properties of integrated capaci-tors,” Solid-State Circuits, IEEE Journal of, vol. 37, no. 3, pp. 384 –393, March 2002.

[47] Y.-Z. Lin, C.-C. Liu, G.-Y. Huang, Y.-T. Shyu, and S.-J. Chang, “A 9-bit 150-MS/s 1.53-mWsubranged SAR ADC in 90-nm CMOS,” in VLSI Circuits. 2010 IEEE Symposium on, June2010, pp. 243 –244.

122

[48] C.-C. Liu, S.-J. Chang, G.-Y. Huang, and Y.-Z. Lin, “A 10-bit 50-MS/s SAR ADC With aMonotonic Capacitor Switching Procedure,” Solid-State Circuits, IEEE Journal of, vol. 45,no. 4, pp. 731 –740, April 2010.

[49] M. van Elzakker, E. van Tuijl, P. Geraedts, D. Schinkel, E. Klumperink, and B. Nauta, “A10-bit Charge-Redistribution ADC Consuming 1.9 µW at 1 MS/s,” Solid-State Circuits, IEEEJournal of, vol. 45, no. 5, pp. 1007 –1015, May 2010.

[50] G. Promitzer, “12-bit low-power fully differential switched capacitor noncalibrating succes-sive approximation ADC with 1 MS/s,” Solid-State Circuits, IEEE Journal of, vol. 36, no. 7,pp. 1138 –1143, July 2001.

[51] P. Harpe, C. Zhou, Y. Bi, N. van der Meijs, X. Wang, K. Philips, G. Dolmans, and H. de Groot,“A 26 muW 8 bit 10 MS/s Asynchronous SAR ADC for Low Energy Radios,” Solid-StateCircuits, IEEE Journal of, vol. 46, no. 7, pp. 1585 –1595, July 2011.

[52] M. Pelgrom, “A 10-b 50-MHz CMOS D/A converter with 75-Ω buffer,” Solid-State Circuits,IEEE Journal of, vol. 25, no. 6, pp. 1347 –1352, December 1990.

[53] M. Dessouky and A. Kaiser, “Input switch configuration suitable for rail-to-rail operation ofswitched op amp circuits,” Electronics Letters, vol. 35, no. 1, pp. 8 –10, January 1999.

[54] A. Abo and P. Gray, “A 1.5-V, 10-bit, 14.3-MS/s CMOS pipeline analog-to-digital converter,”Solid-State Circuits, IEEE Journal of, vol. 34, no. 5, pp. 599 –606, May 1999.

[55] G. Wegmann, E. Vittoz, and F. Rahali, “Charge injection in analog MOS switches,” Solid-State Circuits, IEEE Journal of, vol. 22, no. 6, pp. 1091 – 1097, December 1987.

Appendix A

Derivation of Distortion Equations

This appendix provides the supporting derivations for quantities used in Chap. 3.

A.1 The Fourier Components $c_{m,n}$

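We first compute the Fourier coefficients $c_{m,n}$, where the indices denote the $n$th Fourier coefficient of the $TR_m(t)$ signal. If the input is a full-scale, frequency-normalized sine of the form $x(t) = \sin(2\pi t)$, then the output of the $m$th level, $TR_m(t)$, within the $[0,1)$ interval is

\[
TR_m(t) =
\begin{cases}
\dfrac{1}{2^{N-1}}, & t_1 < t \bmod 1 < t_2 \\[1ex]
-\dfrac{1}{2^{N-1}}, & t_3 < t \bmod 1 < t_4 \\[1ex]
0, & \text{otherwise,}
\end{cases}
\tag{A.1}
\]

where we have used the shorthand

\[
\begin{aligned}
t_1 &= \frac{\arcsin(T_m)}{2\pi}, &
t_2 &= \frac{1}{2} - \frac{\arcsin(T_m)}{2\pi}, \\
t_3 &= \frac{1}{2} + \frac{\arcsin(T_m)}{2\pi}, &
t_4 &= 1 - \frac{\arcsin(T_m)}{2\pi}.
\end{aligned}
\tag{A.2}
\]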

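From this, we can compute the Fourier coefficients $c_{m,n}$ as

\[
c_{m,n} = \int_0^1 TR_m(t)\, e^{-2\pi j n t}\, dt \tag{A.3}
\]

Since $TR_m(t)$ is non-zero over two intervals, we can decompose the integral:

\begin{align}
\int_0^1 TR_m(t)\, e^{-2\pi j n t}\, dt
&= \frac{1}{2^{N-1}} \left[ \int_{t_1}^{t_2} e^{-2\pi j n t}\, dt - \int_{t_3}^{t_4} e^{-2\pi j n t}\, dt \right] \tag{A.4} \\
&= \frac{1}{2\pi j n\, 2^{N-1}} \left[ e^{-2\pi j n t} \Big|_{t_3}^{t_4} - e^{-2\pi j n t} \Big|_{t_1}^{t_2} \right] \tag{A.5} \\
&= \frac{1}{2\pi j n\, 2^{N-1}} \left[ e^{-2\pi j n} e^{j n A_m} - e^{-\pi j n} e^{-j n A_m} - e^{-\pi j n} e^{j n A_m} + e^{-j n A_m} \right] \tag{A.6} \\
&= \left(1 - (-1)^n\right) \frac{e^{j n A_m} + e^{-j n A_m}}{2\pi j n\, 2^{N-1}} \tag{A.7}
\end{align}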

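where $A_m = \arcsin(T_m)$. This can be further reduced to

\[
c_{m,n} =
\begin{cases}
\dfrac{2\cos\left(n \arcsin(T_m)\right)}{\pi j n\, 2^{N-1}}, & n \in \{1, 3, 5, \dots\} \\[1ex]
0, & \text{otherwise.}
\end{cases}
\tag{A.8}
\]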
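As a cross-check of Eq. (A.8), the closed form can be compared against a direct numerical evaluation of the integral in Eq. (A.3). The following is a minimal sketch rather than code from the thesis: the function names, the midpoint-rule integration, and the sample count are illustrative assumptions, and a threshold in the range $0 \le T_m < 1$ is assumed.

```python
import cmath
import math


def c_closed_form(T_m, n, N):
    """Closed-form coefficient of Eq. (A.8); zero for even n (including n = 0)."""
    if n % 2 == 0:
        return 0j
    return 2 * math.cos(n * math.asin(T_m)) / (math.pi * 1j * n * 2 ** (N - 1))


def tr_m(t, T_m, N):
    """Three-level signal of Eq. (A.1), using the transition times of Eq. (A.2)."""
    a = math.asin(T_m) / (2 * math.pi)
    u = t % 1.0
    if a < u < 0.5 - a:
        return 1 / 2 ** (N - 1)
    if 0.5 + a < u < 1 - a:
        return -1 / 2 ** (N - 1)
    return 0.0


def c_numeric(T_m, n, N, samples=50_000):
    """Midpoint-rule approximation of the integral in Eq. (A.3) (assumed method)."""
    total = 0j
    for k in range(samples):
        t = (k + 0.5) / samples
        total += tr_m(t, T_m, N) * cmath.exp(-2j * math.pi * n * t)
    return total / samples


if __name__ == "__main__":
    # Illustrative values only: threshold 0.3 and N = 8 bits.
    for n in (1, 2, 3, 5):
        print(n, c_closed_form(0.3, n, 8), c_numeric(0.3, n, 8))
```

The two results should agree to within the discretization error of the midpoint rule, which is dominated by the four discontinuities of $TR_m(t)$.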

A.2 Non-Uniform Delay Impact on Harmonic Distortion

Let $f_n$ be the $n$th Fourier coefficient of $v_{xq}(t)$ without ZCD delay dispersion (see Eq. (3.16)), and let $g_n$ be the $n$th Fourier coefficient of $v_{xq}(t)$ with ZCD delay dispersion. We can then express $g_n$ as

\[
g_n = \sum_m e^{-2\pi j n d_m} c_{m,n}, \tag{A.9}
\]

where $d_m$ are the ZCD delays corresponding to the $TR_m$ levels. This equation follows from Eq. (3.23). The power of the $n$th harmonic with delay dispersion distortion is equal to the squared norm of $g_n$:

\begin{align}
|g_n|^2 &= \sum_m e^{-2\pi j n d_m} c_{m,n}\; \overline{\sum_m e^{-2\pi j n d_m} c_{m,n}} \tag{A.10} \\
&= \sum_m e^{-2\pi j n d_m} c_{m,n} \sum_m e^{2\pi j n d_m}\, \overline{c_{m,n}} \tag{A.11} \\
&= \sum_m |c_{m,n}|^2 + \sum_{r \neq s} e^{-2\pi j n (d_r - d_s)}\, c_{r,n}\, \overline{c_{s,n}} \tag{A.12}
\end{align}

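For shorthand we define $d_{r,s} \equiv d_r - d_s$. We observe that the terms of the second summation can be grouped into pairs of the form $e^{-2\pi j n d_{r,s}}\, c_{r,n}\, \overline{c_{s,n}} + e^{2\pi j n d_{r,s}}\, \overline{c_{r,n}}\, c_{s,n}$, which is equivalent to $z + \overline{z} = 2\operatorname{Re}\{z\}$, so

\[
|g_n|^2 = \sum_m |c_{m,n}|^2 + \sum_{r \neq s} \operatorname{Re}\left\{ e^{-2\pi j n d_{r,s}}\, c_{r,n}\, \overline{c_{s,n}} \right\} \tag{A.13}
\]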

As a sanity check, consider the case with zero (or constant) delay through the ZCD. All the terms $d_{r,s}$ are then zero, so the expression reduces to

\[
|g_n|^2 = \sum_m |c_{m,n}|^2 + \sum_{r \neq s} \operatorname{Re}\left\{ c_{r,n}\, \overline{c_{s,n}} \right\} = |f_n|^2 \tag{A.14}
\]

The last equality can be seen by squaring Eq. (3.17).
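The expansion from Eq. (A.9) to Eq. (A.13) can be checked numerically by computing $|g_n|^2$ both ways. The sketch below is illustrative only: the function names are hypothetical, and the coefficients and delays used in the usage example are arbitrary complex test values, not measured or simulated LCADC data.

```python
import cmath
import math


def g_n(delays, coeffs, n):
    """Direct evaluation of Eq. (A.9): sum of delayed per-level coefficients."""
    return sum(cmath.exp(-2j * math.pi * n * d) * c
               for d, c in zip(delays, coeffs))


def g_n_power_expanded(delays, coeffs, n):
    """Evaluation of Eq. (A.13): diagonal terms plus real parts of cross terms."""
    diagonal = sum(abs(c) ** 2 for c in coeffs)
    cross = 0.0
    for r, (d_r, c_r) in enumerate(zip(delays, coeffs)):
        for s, (d_s, c_s) in enumerate(zip(delays, coeffs)):
            if r != s:  # ordered pairs, matching the sum over r != s
                phase = cmath.exp(-2j * math.pi * n * (d_r - d_s))
                cross += (phase * c_r * c_s.conjugate()).real
    return diagonal + cross


if __name__ == "__main__":
    # Arbitrary test values (assumptions), delays in units of the input period.
    coeffs = [1 + 0.5j, 0.8 - 0.2j, 0.3 + 0.9j]
    delays = [0.0, 0.013, -0.021]
    print(abs(g_n(delays, coeffs, 3)) ** 2, g_n_power_expanded(delays, coeffs, 3))
```

When all delays are equal, the cross-term phases vanish and the expanded form collapses to the squared magnitude of the plain sum of the coefficients, which is the zero-dispersion case of Eq. (A.14).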

To put this in context, the spectrum of an LCADC has dominant distortion components close to the fundamental (3rd, 5th, etc.) and very far out in frequency. Since the far ones are easily filtered, we focus on the close-in spurs, which have a longer periodicity. Therefore, let us make the assumption that the ZCD delay dispersion is much smaller than the full period of the input¹, i.e. $\max |d_{r,s}| \ll 1$, where 1 is the period of the frequency-normalized input $x(t) = \sin(2\pi t)$. If we consider the $n$th harmonic for small $n$, it carries over that $n \max |d_{r,s}| \ll 1$, so we can then approximate

\begin{align}
e^{-2\pi j n d_{r,s}} &= \cos(2\pi n d_{r,s}) - j \sin(2\pi n d_{r,s}) \tag{A.15} \\
&\approx 1 - 2\pi^2 n^2 d_{r,s}^2 - j\, 2\pi n d_{r,s} \tag{A.16}
\end{align}

¹ For LCADCs based on feedback DACs (i.e., not flash LCADCs), the maximum delay around the entire loop, which is also an upper bound on the absolute ZCD delay, must be a fraction of the input period, and usually even a fraction of the time granularity. Therefore, the assumption is valid.

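When we substitute this back into Eq. (A.13), we get

\begin{align}
|g_n|^2 &= \sum_m |c_{m,n}|^2 + \sum_{r \neq s} \operatorname{Re}\left\{ e^{-2\pi j n d_{r,s}}\, c_{r,n}\, \overline{c_{s,n}} \right\} \tag{A.17} \\
&= \sum_m |c_{m,n}|^2 + \sum_{r \neq s} \left[ \left(1 - 2\pi^2 n^2 d_{r,s}^2\right) \operatorname{Re}\left\{ c_{r,n}\, \overline{c_{s,n}} \right\} + 2\pi n d_{r,s} \operatorname{Im}\left\{ c_{r,n}\, \overline{c_{s,n}} \right\} \right] \tag{A.18} \\
&= \sum_m |c_{m,n}|^2 + \sum_{r \neq s} \operatorname{Re}\left\{ c_{r,n}\, \overline{c_{s,n}} \right\} - 2\pi^2 n^2 \sum_{r \neq s} d_{r,s}^2 \operatorname{Re}\left\{ c_{r,n}\, \overline{c_{s,n}} \right\} + 2\pi n \sum_{r \neq s} d_{r,s} \operatorname{Im}\left\{ c_{r,n}\, \overline{c_{s,n}} \right\} \tag{A.19} \\
&= |f_n|^2 - 2\pi^2 n^2 \sum_{r \neq s} d_{r,s}^2 \operatorname{Re}\left\{ c_{r,n}\, \overline{c_{s,n}} \right\} + 2\pi n \sum_{r \neq s} d_{r,s} \operatorname{Im}\left\{ c_{r,n}\, \overline{c_{s,n}} \right\} \tag{A.20}
\end{align}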

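Therefore, we can write

\[
|g_n|^2 = |f_n|^2 + e_n, \tag{A.21}
\]

where

\[
e_n = 2\pi n \sum_{r \neq s} d_{r,s} \operatorname{Im}\left\{ c_{r,n}\, \overline{c_{s,n}} \right\} - 2\pi^2 n^2 \sum_{r \neq s} d_{r,s}^2 \operatorname{Re}\left\{ c_{r,n}\, \overline{c_{s,n}} \right\}. \tag{A.22}
\]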
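To get a feel for how well the second-order approximation of Eq. (A.22) tracks the exact change in harmonic power, the two can be compared for a small delay spread. This is a hedged sketch: the function names are hypothetical, and the coefficients and delays are arbitrary test values chosen only so that $n \max |d_{r,s}|$ is small. Delays are expressed in units of the input period, and $f_n$ is taken as the plain sum of the coefficients, i.e. Eq. (A.9) with all delays set to zero.

```python
import cmath
import math


def g_n(delays, coeffs, n):
    """Direct evaluation of Eq. (A.9)."""
    return sum(cmath.exp(-2j * math.pi * n * d) * c
               for d, c in zip(delays, coeffs))


def e_n(delays, coeffs, n):
    """Small-delay error term of Eq. (A.22), summed over ordered pairs r != s."""
    first_order = 0.0
    second_order = 0.0
    for r, (d_r, c_r) in enumerate(zip(delays, coeffs)):
        for s, (d_s, c_s) in enumerate(zip(delays, coeffs)):
            if r != s:
                d_rs = d_r - d_s
                product = c_r * c_s.conjugate()
                first_order += d_rs * product.imag
                second_order += d_rs ** 2 * product.real
    return (2 * math.pi * n * first_order
            - 2 * math.pi ** 2 * n ** 2 * second_order)


if __name__ == "__main__":
    # Arbitrary test values (assumptions), with a delay spread of 0.3% of a period.
    coeffs = [1 + 0.5j, 0.8 - 0.2j, 0.3 + 0.9j]
    delays = [0.0, 1e-3, -2e-3]
    n = 3
    exact = abs(g_n(delays, coeffs, n)) ** 2 - abs(sum(coeffs)) ** 2
    print("exact change:", exact, "approximation:", e_n(delays, coeffs, n))
```

The residual between the two is of third order in $2\pi n d_{r,s}$, so it grows quickly once the delay spread or the harmonic index is no longer small.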

A.3 Input Slope at $TR_m(t)$ Transitions

At a transition of the $TR_m(t)$ signal, we know that the input voltage to the quantizer is $\pm T_m$. There are four transitions per period of the $TR_m(t)$ signal, whose transition times are given in Eq. (A.2). Since the slope of a full-scale frequency-normalized sine input is

\[
v'_x(t) = 2\pi \cos(2\pi t), \tag{A.23}
\]

if we plug in $t_1, \dots, t_4$ from Eq. (A.2) we get

\[
|v'_m| = 2\pi \cos(\arcsin(T_m)), \tag{A.24}
\]

where $v'_m$ is the derivative of the input at the transitions of the $TR_m(t)$ signal. It is assumed that the ZCD has a response time that varies only with the magnitude of the input derivative, not with input polarity.
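Eq. (A.24) can be confirmed by evaluating the slope of Eq. (A.23) at each of the four transition times of Eq. (A.2). The snippet below is a small illustrative check with hypothetical function names, assuming a threshold in the range $0 \le T_m < 1$.

```python
import math


def transition_times(T_m):
    """Transition times t1..t4 of Eq. (A.2) within one normalized period."""
    a = math.asin(T_m) / (2 * math.pi)
    return (a, 0.5 - a, 0.5 + a, 1.0 - a)


def input_slope(t):
    """Slope of the full-scale, frequency-normalized sine input, Eq. (A.23)."""
    return 2 * math.pi * math.cos(2 * math.pi * t)


def slope_magnitude(T_m):
    """Slope magnitude at the TR_m(t) transitions, Eq. (A.24)."""
    return 2 * math.pi * math.cos(math.asin(T_m))


if __name__ == "__main__":
    T_m = 0.4  # arbitrary example threshold (assumption)
    for t in transition_times(T_m):
        print(t, math.sin(2 * math.pi * t), input_slope(t), slope_magnitude(T_m))
```

The slope changes sign between rising and falling crossings, but its magnitude is the same at all four transitions, which is what the polarity-independence assumption on the ZCD relies on.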

Appendix B

Silicon Errata

The following is a list of the bugs so far discovered in the LCADC.

1. All power domains are isolated from each other with only one back-to-back diode, and therefore cannot be powered off independently (without drawing several mA).

2. Layout mismatch due to stress effects of metal routing over active areas in the current DACs of the timing generator causes non-monotonicity in the delay tuning curves. The routing was necessary to save layout area.

3. Even when SPI is inactive (chip select CS high), serial data out (SDO) is still low-impedance. In the situation where the IC is on a shared SPI bus, an external tristate buffer is needed on SDO to share the output line. This is not a problem when the IC communicates directly with the SPI master.


4. Under certain circumstances, when the analog supply is ramped slowly (such as in the presence of large external decoupling and/or high supply series resistance), the input switch bootstrap driver may not initialize properly. In normal operation, this disables the chip, since the input sampling capacitor is discharged and the ZCD never triggers a sample, so the input bootstrap driver is never initialized. One solution is to run the calibration algorithm immediately after power-on/reset, since the calibration routine force-initializes the high-voltage generator. The other solution is to put an RC timeout on the reset pin, as was done on the testing board.

5. An undetermined source of nonlinearity (suspected to be in the bootstrapped input switch) limits the usable range of DAC output codes to around 190/256. This is discussed in detail in Chap. 5.

6. Under rare conditions (if the calibration mode is enabled precisely when the asynchronous logic is in a certain state), the bootstrapped input switch may not disconnect the input from the ADC during calibration, resulting in erroneous calibration values. This event has been found to occur so rarely that simply running the calibration three times and taking the two most similar results is sufficient.

7. A ground-referenced bias voltage is sent between power domains in the timing replica test circuit. Therefore, any disturbance between the two grounds is coupled directly into the test circuit. Under normal conditions this should not prevent proper operation, although it will increase timing jitter in the test outputs. The main function of the test circuits is to collect data for this thesis; this error is not present in the main timing generator and therefore does not degrade ADC performance.

8. The SPI input pads do not have hysteresis and are thus prone to registering a ringing line as erroneous clock edges. With proper termination (a shunt of 100 Ω and 220 pF was used for a 1-m 100-Ω ribbon cable) the SPI bus will operate reliably at 12 Mb/s.
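The workaround described in item 6 (run the calibration three times and keep the two most similar results) can be sketched in host-side software as follows. This is only an illustration under stated assumptions: `run_calibration` is a hypothetical callable standing in for whatever routine triggers one calibration and reads back a single numeric calibration value, and the way the two retained results are combined afterwards is left to the caller, since the text does not specify it.

```python
from itertools import combinations


def two_most_similar(results):
    """Return the pair of results with the smallest absolute difference."""
    if len(results) < 2:
        raise ValueError("need at least two calibration results")
    return min(combinations(results, 2),
               key=lambda pair: abs(pair[0] - pair[1]))


def calibrate_with_vote(run_calibration, runs=3):
    """Call the (hypothetical) calibration routine `runs` times and keep
    the two results that agree most closely, discarding the outlier."""
    results = [run_calibration() for _ in range(runs)]
    return two_most_similar(results)


if __name__ == "__main__":
    # Stand-in for hardware: the second run is corrupted (made-up numbers).
    fake_runs = iter([101, 37, 100])
    print(calibrate_with_vote(lambda: next(fake_runs)))
```

Because the failure is rare, a single corrupted run is expected to be the outlier, so the two retained values bracket the correct calibration result.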
