A Column-Row-Parallel ASIC Architecture for 3D

Wearable / Portable Medical Ultrasonic Imaging

by

Kailiang Chen

B.E., Tsinghua University (2007)

S.M., Massachusetts Institute of Technology (2009)

Submitted to the Department of Electrical Engineering and Computer Science

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

February 2014

© Massachusetts Institute of Technology 2014. All rights reserved.

Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Department of Electrical Engineering and Computer Science

January 31, 2014

Certified by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Charles G. Sodini

LeBel Professor of Electrical Engineering

Thesis Supervisor

Certified by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Anantha P. Chandrakasan

Joseph F. and Nancy P. Keithley Professor of Electrical Engineering

Thesis Supervisor

Accepted by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Leslie A. Kolodziejski

Chair, Department Committee on Graduate Students

A Column-Row-Parallel ASIC Architecture for 3D Wearable

/ Portable Medical Ultrasonic Imaging

by

Kailiang Chen

Submitted to the Department of Electrical Engineering and Computer Science

on January 31, 2014, in partial fulfillment of the

requirements for the degree of

Doctor of Philosophy

Abstract

This work presents a scalable Column-Row-Parallel ASIC architecture for 3D wearable / portable medical ultrasound. It leverages programmable electronic addressing to achieve linear scaling for both hardware interconnection and software data acquisition. A 16x16 transceiver ASIC is fabricated and flip-chip bonded to a 16x16 capacitive micromachined ultrasonic transducer (CMUT) to demonstrate the compact, low-power front-end assembly. A 3D plane-wave coherent compounding algorithm is designed for fast volume rate (62.5 volumes/s), high quality 3D ultrasonic imaging. An interleaved checker board pattern with I&Q excitations is also proposed for ultrasonic harmonic imaging, reducing transmitted second harmonic distortion by over 20dB, applicable to nonlinear transducers and circuits with arbitrary pulse shapes.

Each transceiver circuit is element-matched to its CMUT element. The high-voltage transmitter employs a 3-level pulse-shaping technique with charge recycling to enhance the power efficiency, requiring minimal off-chip components. Compared to traditional 2-level pulsers, 50% more acoustic power delivery is obtained with the same total power dissipation. The receiver is implemented with a transimpedance amplifier topology and achieves the lowest noise efficiency factor in the literature (2.1, compared to a previously reported lowest of 3.6, in units of mPa·√(mW/Hz)). A source follower stage is specially designed to combine the analog outputs of receivers in parallel, improving output SNR as parallelization increases and offering flexibility for imaging algorithm design. Lastly, fault-tolerance is incorporated into the transceiver to deal with faulty elements within the 2D MEMS transducer array, increasing yield for the system assembly.

Thesis Supervisor: Charles G. Sodini

Title: LeBel Professor of Electrical Engineering

Thesis Supervisor: Anantha P. Chandrakasan

Title: Joseph F. and Nancy P. Keithley Professor of Electrical Engineering

Acknowledgments

Finishing my Ph.D. would not have been possible without the enduring love of my parents and wife. I would like to thank them for all their support. Recently we have been through difficult moments together, but I look forward to the good days to come.

I feel extremely fortunate to have worked under the joint supervision of Prof. Charlie Sodini and Prof. Anantha Chandrakasan. I am grateful to Charlie, who has been a great teacher to me inside and outside of school. I learned from him to always seek the insight and intuition behind a problem. I also learned from him to be down-to-earth, yet persistent, both in research and in life. I enjoyed our conversations, the softball games we played together for MTL, Red Sox games, and of course, the Hong Kong trip. All of them are unforgettable.

I would like to express my gratitude to Anantha. Even as the Department Head with an incredibly busy schedule, he gave me ample guidance. He is always resourceful and creative, which sets a standard for me of what a good researcher should be.

I would like to thank Prof. Greg Wornell for serving on my thesis committee and providing insights about imaging system trade-offs; Prof. Harry Lee for providing many clever circuit design ideas; Dr. Kai Thomenius for teaching me a lot of ultrasonics know-how; Dr. Brian Brandt for continued support for my test setup and career development; Prof. Thomas Heldt, Tom O’Dwyer, Dr. Dennis Buss, Dr. Peter Holloway, and Mr. Haiyang Zhu for many useful technical discussions. I am thankful for all their help with my project.

I am grateful to the people who helped me with the hardware system assembly, which was the key to the successful project demonstration. The ASIC fabrication was generously made possible through the TSMC University Shuttle Program. The CMUT samples were obtained from Prof. Butrus (Pierre) Khuri-Yakub’s research group at Stanford University; students Byung Chul Lee, Anshuman Bhuyan, and Jung Woo Choe offered me many handy tips for working with the device. The CMUT-PCB-ASIC flip-chip bonding assembly was done with the help of Dr. Helen Kim and MIT Lincoln Laboratory. The acrylic oil tank and the 3D translation stage were designed and built with the assistance of the MIT Central Machine Shop.

It has been a pleasant journey because of my colleagues in the Sodini/Lee lab and the Anantha group. In particular, I would like to thank Bonnie Lam, Sabino Pietrangelo, Joohyun Seo, and Katherine Smyth for many intriguing discussions about ultrasonics. Also, I would like to thank Sunghyuk Lee, SungWon Chung, Wei Li, and Marcus Yip for their tremendous help during my tape-outs. Daniel Piedra, Allen Hsu, Bin Lu, and Jerome Lin taught me how to operate a probe station to take accurate measurements on a bare silicon die. Moreover, I would like to thank David He, Amanda Gaudreau, Philip Godoy, Jack Chu, Grant Anderson, Doyeon Yoon, Xi Yang, Eric Winokur, Maggie Delano, Daniel Kumar, Bruno Do Valle, and many more for being great labmates with whom I could hang out and have fun. Last but not least, Coleen Milley and Margaret Flaherty have been very supportive with logistics and always make sure everything in the lab runs smoothly.

This project is funded by the C2S2 Focus Center, one of six research centers funded under the Focus Center Research Program (FCRP), a Semiconductor Research Corporation entity; Texas Instruments; and the MIT Center for Integrated Circuits and Systems (CICS).

Contents

1 Introduction 23

1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

1.2 The Challenge for Implementing a 3D Wearable / Portable Ultrasonic

Imaging Device . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

1.3 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

1.4 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2 Background Information 29

2.1 Ultrasonic Imaging Modes . . . . . . . . . . . . . . . . . . . . . . . . 29

2.2 The Beam-formation Principle . . . . . . . . . . . . . . . . . . . . . . 32

2.3 Ultrasonic Transducers . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2.4 Field II Simulation Program . . . . . . . . . . . . . . . . . . . . . . . 36

3 The Column-Row-Parallel Architecture for 3D Ultrasonic Imaging 39

3.1 The Prior Art of Architectures for 3D Ultrasonic Imaging . . . . . . . 39

3.2 The Motivation of the Column-Row-Parallel ASIC Architecture . . . 42

3.3 The Column-Row-Parallel ASIC Architecture . . . . . . . . . . . . . 44

3.4 The Functionality of the Column-Row-Parallel Architecture . . . . . 49

3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

4 3D Ultrasonic Imaging System Experiments 55

4.1 The Hardware System Assembly . . . . . . . . . . . . . . . . . . . . . 55

4.1.1 The PCB-CMUT Connection . . . . . . . . . . . . . . . . . . 58

4.1.2 The PCB-ASIC Connection . . . . . . . . . . . . . . . . . . . 60

4.1.3 The Flip-Chip Bonding Assembly Process . . . . . . . . . . . 62

4.1.4 Mounting onto the Oil Tank . . . . . . . . . . . . . . . . . . . 66

4.2 Plane-wave Coherent Compounding for Fast Volume Rate 3D Ultra-

sonic Imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

4.2.1 PWCC for 2D Imaging . . . . . . . . . . . . . . . . . . . . . . 69

4.2.2 Extending PWCC to 3D Imaging on the Column-Row-Parallel

Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

4.2.3 PWCC3D Results: Simulations and Measurements . . . . . . 77

4.2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

4.3 Interleaved Checker Board Tx Apertures with I&Q Excitations for HD2

Reduction in Ultrasonic Harmonic Imaging . . . . . . . . . . . . . . . 88

4.3.1 THI Principle and Previous Methods . . . . . . . . . . . . . . 89

4.3.2 Tx HD2 Suppression on the Column-Row-Parallel Architecture 91

4.3.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . 93

4.4 Annular Ring Apertures for Forward-looking Imaging Applications . . 96

4.4.1 Annular Ring Apertures on Column-Row-Parallel Architecture 96

4.4.2 Annular Ring Imaging Results . . . . . . . . . . . . . . . . . . 99

4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

5 Design of the 16x16 Ultrasonic Transceiver Array ASIC with Column-

Row-Parallel Architecture 103

5.1 High-Level Description of the Ultrasonic Imaging Transceiver Circuits

and the Architecture Logic Implementation . . . . . . . . . . . . . . . 103

5.2 Tx Circuit Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

5.2.1 Multi-Level Pulsing for Efficient CMUT Driver . . . . . . . . 108

5.2.2 3-Level Pulser Circuit Design . . . . . . . . . . . . . . . . . . 111

5.2.3 Tx Path Design for 2D Ultrasonic Transducer Arrays . . . . . 114

5.3 Rx Circuit Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

5.3.1 LNA Optimization Methodology for CMUT . . . . . . . . . . 116

5.3.2 LNA Transistor-Level Implementation . . . . . . . . . . . . . 120

5.3.3 Rx Path Design for 2D Ultrasonic Transducer Arrays . . . . . 122

5.4 Biasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

5.5 The Fault-Tolerant ASIC Design for Faulty MEMS Devices . . . . . . 131

6 ASIC Characterization 137

6.1 Tx Ultrasonic Power and Efficiency Measurement . . . . . . . . . . . 137

6.1.1 Measuring Acoustic Output Power . . . . . . . . . . . . . . . 138

6.1.2 Measuring Tx Efficiency . . . . . . . . . . . . . . . . . . . . . 141

6.2 LNA Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . 144

6.3 The Tx Beam-Steering Experiment . . . . . . . . . . . . . . . . . . . 149

6.4 The Pulse-Echo Experiment . . . . . . . . . . . . . . . . . . . . . . . 151

7 Conclusion 155

7.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . 155

7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

List of Figures

2-1 The typical signals and the operation for B-mode ultrasound. . . . . . 30

2-2 Simplified block diagram of an ultrasound BF system, figure courtesy

of [27]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2-3 A typical Field II flow diagram for ultrasonic system behavioral simu-

lation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3-1 Column-parallel architecture implementations in the literature: (a) a

1D transducer array mechanically translated to scan the 3D space, ele-

vation beam-formation is done by a synthetic virtual source technique,

figure courtesy of [3]; (b) a 2D array operated to receive row-by-row,

elevation beam-formation is done by sub-array delay-and-sum across

the column using analog delay lines, figure courtesy of [55]. . . . . . . 41

3-2 The column-row addressing scheme implemented on a 256x256 2D

transducer array: (a) row-by-row transmit addressing; (b) column-

by-column receive addressing; (c) the “Maltese cross” beam-pattern.

Figure courtesy of [38]. . . . . . . . . . . . . . . . . . . . . . . . . . . 43

3-3 A column-row addressing architecture implemented at the circuit-level,

with column and row interconnections that reduce the system channel

count and provide maximum flexibility for algorithms. . . . . . . . . . 44

3-4 Column-Row-Parallel architecture block diagram, the CMUT and ASIC

chips are stacked vertically. . . . . . . . . . . . . . . . . . . . . . . . . 45

3-5 (a) The block-level implementation of one transceiver channel and (b)

the per-element logic implementation. Column and row select logic

is implemented with shift registers that can be reprogrammed in “N”

time (implementation detail will be shown in Figure 5-2). . . . . . . . 47

3-6 (a) Tx input port multiplexing, implemented with digital logic; (b) Rx

output port multiplexing, implemented with analog pass-gates. . . . . 49

3-7 The architecture configured in a column-parallel mode for the Tx aper-

ture. The configuration is broken down and illustrated in steps (a)

through (d) to help understanding. Two rows are activated as the Tx

aperture and beam-formation along azimuth (X) direction is achieved. 51

3-8 The architecture configured in a row-parallel mode for the Rx aperture.

Five columns are activated as the Rx aperture and beam-formation

along elevation (Y) direction is achieved. . . . . . . . . . . . . . . . . 52

3-9 More use examples of the proposed architecture: (a) a diagonal Rx

aperture; (b) a checker board Tx aperture for ultrasonic harmonic

imaging; (c) & (d) annular ring Tx and Rx apertures for forward-

looking ultrasonic imaging applications. . . . . . . . . . . . . . . . . . 53

4-1 System integration diagram showing the flip-chip bonding connection

between CMUT and ASIC through a PCB interposer. The figure also

shows the mechanical setup for imaging experiments, including an oil

tank and a 3D translation stage. . . . . . . . . . . . . . . . . . . . . . 56

4-2 The picture of the hardware system setup. . . . . . . . . . . . . . . . 57

4-3 The block diagram of the hardware system setup. . . . . . . . . . . . 57

4-4 The 16x16 CMUT die drawings: (a) the footprint of the CMUT; (b)

the CMUT flip-chip bonding pad metal structure drawing, courtesy

of [40]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

4-5 The two different PCB designs made to fit CMUT footprints: (a) the

PCB version A’s footprint for CMUT with a gap distance of 250µm;

(b) the PCB version B’s footprint for CMUT with a gap distance of

373.75µm, only 1x16 pads are made on the PCB side due to space

limitations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

4-6 The drawing of a PCB pad defined with a solder mask, and bumped

with a solder ball. The PCB pad is used to do flip-chip bonding to the

CMUT die. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

4-7 The ASIC die drawings: (a) the footprint of the ASIC, containing the

center 18x16 pads to be element-matched and connected to CMUT

through the PCB interposer, and the surrounding I/O pads; (b) the

PCB interposer layout design that allows the ASIC I/O pads to be

routed out to the PCB edges. . . . . . . . . . . . . . . . . . . . . . . 61

4-8 The ASIC flip-chip bonding pad metal structure drawings: (a) the hor-

izontal view of a flip-chip bonding pad in ASIC; (b) the cross-sectional

view of the ASIC flip-chip bonding pad. . . . . . . . . . . . . . . . . 62

4-9 The CMUT-PCB-ASIC two-step flip-chip bonding process: (a) first

step, the bonding between PCB and ASIC; (b) second step, the bond-

ing between PCB and CMUT, with ASIC already bonded to PCB. . . 63

4-10 The CMUT-ASIC connection result pictures: (a) the bonded PCB-

ASIC assembly shows good connectivity; (b) the solder bumps at the

PCB’s CMUT side are reflowed after PCB-ASIC bonding, so any deforma-

tion would be restored. . . . . . . . . . . . . . . . . . . . . . . . . . . 64

4-11 The PCB-CMUT bonding connection is verified by pulling off the test

CMUT die from the PCB after bonding and reflow. (a) & (b) show the

CMUT connection posts remain on the PCB after the pull, indicating

good connectivity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

4-12 The finished CMUT-PCB-ASIC assembly: (a) cross-sectional view of

the sandwich stack; (b) CMUT side assembly picture; (c) ASIC side

assembly picture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

4-13 The acrylic tank drawings: (a) the tank dimension drawing; (b) the

mounting between the oil tank and the CMUT-PCB-ASIC assembly. 66

4-14 The illustration of how PWCC works for 2D ultrasonic imaging, cour-

tesy of [68]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

4-15 The principle of coherent compounding used in PWCC, courtesy of [68]:

(a) the imaging space; (b) the beam-formation delay calculation when

the transmitted plane-wave is normal to the transducer surface (α =

0o); (c) the beam-formation delay calculation when the transmitted

plane-wave is steered to an angle of α. . . . . . . . . . . . . . . . . . 70

4-16 The signal processing flow for PWCC3D on the Column-Row-Parallel

architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

4-17 The PWCC3D implementation on the Column-Row-Parallel architec-

ture: (a) Tx beam-steering along azimuth (X) direction using column-

parallel mode; (b) Tx beam-steering along elevation (Y) direction using

row-parallel mode; (c)-(e) Rx signal acquisition, sweeping through 16

rows for each transmit angle. . . . . . . . . . . . . . . . . . . . . . . . 76

4-18 The sequence of operation to implement PWCC3D on the Column-

Row-Parallel architecture. . . . . . . . . . . . . . . . . . . . . . . . . 77

4-19 The setup of the wire phantom imaging experiment using PWCC3D

algorithm: (a) a single plane-wave is transmitted to image the wire

phantom; (b) five different Tx angles are used along the azimuth di-

rection for PWCC3D. . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

4-20 Simulation results of a wire phantom: (a) vertical cross-sectional im-

age produced from single angle plane-wave insonification; (b) verti-

cal cross-sectional image produced from 5-angle coherent compounded

plane-wave insonification; (c) lateral resolution plot from single plane-

wave; (d) lateral resolution plot from 5-angle plane-waves; (e) horizon-

tal cross-sectional image from single plane-wave; (f) horizontal cross-

sectional image from 5-angle plane-waves. . . . . . . . . . . . . . . . . 80

4-21 Measurement results of a wire phantom: (a) vertical cross-sectional

image produced from single angle plane-wave insonification; (b) verti-

cal cross-sectional image produced from 5-angle coherent compounded

plane-wave insonification; (c) lateral resolution plot from single plane-

wave; (d) lateral resolution plot from 5-angle plane-waves; (e) horizon-

tal cross-sectional image from single plane-wave; (f) horizontal cross-

sectional image from 5-angle plane-waves. . . . . . . . . . . . . . . . . 81

4-22 The setup of the ring phantom imaging experiment using PWCC3D

algorithm: (a) a single plane-wave is transmitted to image the phan-

tom; (b) five different Tx angles are used along the azimuth direction

and another five Tx angles along the elevation direction to image the

phantom with PWCC3D. . . . . . . . . . . . . . . . . . . . . . . . . . 82

4-23 Measured horizontal cross-sectional images of a ring phantom: (a)

single-angle Tx plane-wave; (b) 5-angle Tx plane-wave compounding

along azimuth direction; (c) 5-angle Tx plane-wave compounding along

elevation direction; (d) compounding across all 5-angle azimuth and 5-

angle elevation directions. . . . . . . . . . . . . . . . . . . . . . . . . 83

4-24 Measured vertical cross-sectional images of a ring phantom: (a) single-

angle Tx plane-wave; (b) compounding across all 5-angle azimuth and

5-angle elevation directions; (c) lateral resolution plot of ring image

from single-angle Tx plane-wave; (d) lateral resolution plot of ring

image from 5-angle X and 5-angle Y plane-waves. . . . . . . . . . . . 84

4-25 Simulated XZ cross-sectional images showing the three cysts in one

slice image: (a) image generated from single-angle plane-wave; (b)

image generated from 5 azimuth-angle and 5 elevation-angle plane-

waves compounded; (c) the cross-sectional image location in 3D space. 85

4-26 Simulated YZ cross-sectional images showing the cyst at (−3, 0, 25)mm:

(a) image generated from single-angle plane-wave; (b) image generated

from 5 azimuth-angle and 5 elevation-angle plane-waves compounded;

(c) the cross-sectional image location in 3D space. . . . . . . . . . . . 86

4-27 Simulated YZ cross-sectional images showing the cyst at (0, 0, 35)mm:




4-28 Simulated YZ cross-sectional images showing the cyst at (3, 0, 45)mm:




4-29 Implementation of checker board Tx aperture on the proposed archi-

tecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

4-30 Simulation comparison between the conventional and I&Q methods:

(a) fundamental component spatial intensity for conventional; (b) fun-

damental component spatial intensity for I&Q; (c) HD2 spatial inten-

sity for conventional; (d) HD2 spatial intensity for I&Q. . . . . . . . . 94

4-31 Annular ring mode imaging implemented in Column-Row-Parallel ar-

chitecture: (a) Tx and Rx aperture setup; (b) Tx aperture imple-

mented in the proposed architecture, all active elements are driven

in-phase; (c) Rx aperture with the biggest ring shape, all active el-

ements’ analog outputs are combined; (d) Rx aperture with the 2nd

ring shape; (e) Rx aperture with the 3rd ring shape; (f) Rx aperture

with the smallest ring shape. . . . . . . . . . . . . . . . . . . . . . . . 97

4-32 Annular ring mode dynamic beam-formation scheme. . . . . . . . . . 98

4-33 Annular ring configuration example, off-center: (a) Tx and Rx aperture

setup; (b) Tx aperture implemented in the proposed architecture; (c)

Rx aperture with the biggest ring shape; (d) Rx aperture with the 2nd

ring shape; (e) Rx aperture with the 3rd ring shape; (f) Rx aperture

with the smallest ring shape. . . . . . . . . . . . . . . . . . . . . . . . 100

4-34 Cross-section slices of the wire phantom 3D images from simulation

and measurement: (a) simulated XZ slice; (b) measured XZ slice; (c)

simulated YZ slice; (d) measured YZ slice; (e) simulated XY slice; (f)

measured XY slice. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

5-1 A re-plot of Figure 3-5 in Section 3.3. (a) The block-level implementa-

tion of one transceiver channel and (b) the per-element logic implemen-

tation. Column and row select logic is implemented with shift registers

that can be reprogrammed in “N” time (implementation detail will be

shown in Figure 5-2). . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

5-2 Circuit implementation for the logic control: (a) multiplexing for per-

element enable bits; (b) Tx row / column selection logic; (c) Rx row /

column selection logic. . . . . . . . . . . . . . . . . . . . . . . . . . . 106

5-3 (a) The transmitter load model of a CMUT element used in this work.

(b) An exemplary 2-level square wave pulse applied onto CMUT. (c)

An exemplary 3-level pulse applied onto CMUT. . . . . . . . . . . . . 109

5-4 Circuit schematic of the four-channel 3-level pulsers with the middle-

voltage generation (all transistors are high voltage devices). . . . . . . 111

5-5 The digital control circuits for the pulser: (a) the signal flow and block

diagrams; (b) the non-overlapping signal generator; (c) the level shifter

implementation; (d) the control signal timing diagram. . . . . . . . . 113

5-6 Tx design for the 2D array: (a) 2D pulser schematic; (b) MUX imple-

mentation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

5-7 Small signal model and noise sources of the CMUT element and the

LNA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

5-8 Transfer functions when the LNA optimality condition is reached. . . 118

5-9 Transfer function examples when the LNA optimality condition of fi ≈

fp is not reached: (a) fi < fp, (b) fp < fi. . . . . . . . . . . . . . . . . 118

5-10 Transfer function examples: (a) fi < fp, (b) fi ≈ fp, (c) fi > fp. . . . 120

5-11 The LNA schematic, implemented in the TIA topology. All transistors

are low voltage devices except the HV Rx Switch M10. . . . . . . . . 121

5-12 Design optimization for input stage transistors: (a) transistors are sized

at the boundary of strong and weak inversion; (b) transistor width is

optimized for the lowest noise figure. . . . . . . . . . . . . . . . . . . 122

5-13 The signal and noise combining with two Rx channels in parallel: (a)

two channels on the same line, shown in Thevenin’s equivalent circuit

at LNA outputs; (b) two channels on the same line, shown in Norton’s

equivalent circuit at LNA outputs (c) two channels combined, showing

the resultant signal and noise amplitudes. . . . . . . . . . . . . . . . . 124

5-14 The LNA schematic, implemented in the TIA topology. All transistors

are low voltage devices except the HV Rx Switch M10. “vip” node is

also buffered with a source follower to output (not shown). . . . . . . 127

5-15 Parallelism with even more Rx channels by utilizing intermediate line

buffers to preserve the circuit performance. . . . . . . . . . . . . . . . 129

5-16 The biasing circuit for the 2D array. . . . . . . . . . . . . . . . . . . . 130

5-17 The technique used for detecting and isolating the short CMUT el-

ements: (a) front-end transistors in each channel and their control

voltages; (b) the effective circuit connection of all 256 channels with

CMUT elements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

5-18 Two successful 16x16 CMUT-ASIC assemblies with short CMUT ele-

ments (marked in red) isolated by the ASIC. The rest of the elements

are functional and their sensitivity performance is expressed by the

brightness of the elements, which will be described in detail in Section

6.4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

6-1 The photo of the lab setup for measuring the acoustic output power

and the Tx efficiency. . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

6-2 Acoustic output power and Tx efficiency measurement setup. . . . . . 140

6-3 Normalized RMS pressure along the transducer axial axis, measure-

ment vs. simulation. The measurement deviates from the simulation

in the near field because the hydrophone tip is too close to the trans-

ducer surface, distorting the pressure field. . . . . . . . . . . . . . . . 140

6-4 (a) Tx efficiency measurement setup and pulse shape definition. (b)

Measured time-domain waveform of the optimal 3-level 3.3MHz pulses,

∆=20ns, ∆/T=0.067 . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

6-5 Tx efficiency measurement results using different 3-level pulse shapes

by varying the ∆/T ratio and at different frequencies. . . . . . . . . . 143

6-6 The die photo of the four-channel ultrasonic imaging transceiver test

chip. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

6-7 The die photo of the 256-channel 16x16 2D ultrasonic imaging transceiver

test chip. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

6-8 (a) Measured ultrasonic lateral beam profile, steered to the center

(broadside). (b) Measured beam profile, with 30ns delay between chan-

nels. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

6-9 The setup of the pulse-echo experiment for characterizing the complete

ultrasound channel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

6-10 The key waveforms from the pulse-echo experiment, showing the ul-

trasound channel characteristics. (a) The transmitted pulse waveform.

(b) The received echo waveform. (c) The spectrum of the received echo

waveform. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

6-11 A re-plot of Figure 5-18 in Section 5.5. Two successful 16x16 CMUT-

ASIC assemblies with short CMUT elements (marked in red) isolated

by the ASIC. The rest of the elements are functional and their sensi-

tivity performance is expressed by the brightness of the elements. . . 153

7-1 Four 16x16 ASICs tiled together for a 32x32 imaging front-end. . . . 159

7-2 CMUT-ASIC assembly alternatives to eliminate the interposer PCB:

(a) TSV technology for interconnecting ASIC I/Os to the main test-

ing PCB; (b) Applying flip-chip bonding technology for CMUT-ASIC

interconnection and wire-bonding for ASIC I/Os. . . . . . . . . . . . 159

List of Tables

4.1 Simulated HD2 improvement of the I&Q method. . . . . . . . . . . . 95

4.2 Measured HD2 improvement of the I&Q method. . . . . . . . . . . . 95

5.1 SNR improvement from Rx channel parallelism, theory prediction and

measurement. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

6.1 Measured Power and Efficiency Comparison at 3.3MHz for the 1D

ASIC and CMUT (40pF capacitance per element) . . . . . . . . . . . 143

6.2 Measured Optimal 3-level Pulser Performance Summary for the 1D

ASIC and CMUT (40pF capacitance per element) . . . . . . . . . . . 144

6.3 Measured Optimal 3-level Pulser Performance Summary for the 2D

ASIC and CMUT (2pF capacitance per element) . . . . . . . . . . . . 144

6.4 CMUT Pulser Performance Comparison . . . . . . . . . . . . . . . . 145

6.5 Measured LNA Performance Summary for the 1D ASIC [5] . . . . . . 145

6.6 Measured LNA Performance Summary for the 2D ASIC . . . . . . . . 146

6.7 CMUT LNA Performance Comparison . . . . . . . . . . . . . . . . . 147

Chapter 1

Introduction

1.1 Motivation

Ultrasonic imaging is an important modality for medical diagnosis. Compared to other imaging modalities, ultrasound is relatively low-cost and harmless to human health, and it offers decent resolution. Modern ultrasonic imaging systems are becoming increasingly complex and powerful, yet compact, benefiting from Moore’s law [1]. Laptop-size ultrasound systems have gained comparable performance to the traditional cart-size machines; hand-held devices, such as the GE Vscan [2], indicate the trend toward highly integrated ultrasonic imaging solutions to enable portable or even wearable ultrasound applications in hospitals and at home.

Traditional 2D medical ultrasonic imaging systems have been in wide use for decades. A 2D imaging system uses a 1D ultrasonic transducer probe and generates rectangular or sector-shape 2D cross-sectional images of human tissue or organs. These systems exist predominantly in hospital settings where professional sonographers are available to operate the system. The sonographers carefully angle and position the probe against the human body, so as to produce satisfactory 2D medical images for diagnosis. This process is manual and requires extensive training for the operators, adding complexity and extra cost to the diagnostic procedure.

On the other hand, 3D medical ultrasonic imaging systems provide a full view

of human tissue or organs in space, rather than cross-sectional views in 2D imaging


systems. The 3D volumetric image data represent a more comprehensive set of data

which could be more easily interpreted to help locate targets of medical interest. As a

result, the manual search for the “best” 2D slice image, performed by a sonographer

holding a 1D probe, could be replaced by an automated search algorithm

in a 3D imaging system. Furthermore, by leveraging advanced microelectronics tech-

nology, a compact and low-power ultrasonic hardware system can be built to enable

wearable / portable self-monitoring ultrasonic imaging devices at home. Therefore,

one could imagine an automated imaging system that continuously tracks human tis-

sue or organs of interest and produces long-term medical information with minimum

reliance on experienced sonographers.

1.2 The Challenge for Implementing a 3D Wear-

able / Portable Ultrasonic Imaging Device

A typical 1D array for a 2D imaging system can have as many as one thousand

elements. The interconnections from the transducer elements to the interfacing

electronics are co-axial cables. When it comes to 3D imaging systems, 1D ultrasonic

transducer arrays have historically been used to acquire the 3D volumetric data, by

being mechanically translated [3] or rotated [4] to cover the whole 3D space. A slice

of 2D image is formed at each physical position of the 1D array. Multiple 2D slice

images are stitched together to form the 3D volumetric image. These mechanical

approaches have many disadvantages. For example, the image resolution tends to be

poor due to the relatively large incremental step size of the mechanical movement;

the image frame rate or volume rate could be limited by the mechanical movement

speed; the system integration tends to be bulky and system power consumption is

high because a mechanical motor is needed.

More recently, 2D ultrasonic transducer arrays made from a micromachining pro-

cess have become more available and proven to be more suitable for 3D ultrasonic

imaging. As a result, the mechanical movement is replaced by electrical addressing;


the coarse motor stepping is replaced by the much finer element-to-element spacing;

the image frame rate or volume rate is no longer limited by the speed of mechanical

movement; and system size and power are reduced to allow long-term wearable /

portable hardware solutions.

However, an electronic system working with a 2D array is much harder to

build. Most notably, the interconnection between a 2D transducer array and its

supporting electronics is a bottleneck. Because an NxN 2D transducer array contains $N^2$

transducer elements, if a dedicated electronic channel is provided for each transducer

element to control the transmit and receive operation, the active channel count of

the electronic integrated circuits is also $N^2$. Therefore, as the transducer array size

grows, it is very difficult to keep up with the $N^2$ growth of active channels. The

hardware complexity, instantaneous power dissipation, and interconnect count would

quickly become unmanageable.

1.3 Contribution

To overcome the interconnect problem in interfacing to a 2D ultrasonic transducer

array for 3D ultrasonic imaging, this thesis proposes new solutions at the circuit,

architecture and algorithm levels.

At the circuit-level, the analog front-end (AFE) transmitter (Tx) and receiver (Rx)

circuits need to be optimized for power efficiency, performance and size, in order to

work optimally with the ultrasonic transducer elements [5, 6]. For the transmitter, a

3-level pulse-shaping high voltage pulser is designed to drive the transducer elements

with improved power efficiency and minimum off-chip components. For the receiver,

a low-noise amplifier (LNA) is implemented with a transimpedance amplifier (TIA)

topology to achieve excellent noise, power and bandwidth trade-offs, offering a low

power, high efficiency receiver solution. The transceiver front-end circuit is designed

to be element-matched to the transducer, replacing traditional cable connections with

flip-chip bonding assembly between the 2D transducer die and the 2D electronics

ASIC die. The compact, cable-less assembly avoids excessive parasitic capacitance


from the cable and leads to an integrated, low-power solution for wearable / portable

applications.

At the architecture-level, the addressing and control mechanism for the 2D array

of elements needs to be designed carefully to not only reduce hardware and inter-

connect complexity, but also to maintain enough support for software flexibility. A

Column-Row-Parallel architecture is proposed to reduce the AFE interconnect

requirement from $N^2$ to $N$. At the same time, the highly programmable architecture

design guarantees strong support for system-level algorithm needs. It is compatible

with existing widely used beam-formation algorithms, and provides possibilities of using

the 2D array differently for new applications.

At the algorithm-level, beam-formation algorithms are also indispensable to com-

press and generate beamformed ultrasonic data to form the 3D volumetric images.

The algorithm design is tightly connected with architecture design and we propose

new ways of using the 2D array to achieve fast volume rate imaging with adequate

image quality, as well as a new way of reducing transmitter second harmonic distor-

tion (HD2). Extensive in-vitro experiments have been carried out to validate and

evaluate the beam-formation algorithms and hardware system performance, includ-

ing various 3D imaging algorithms, ultrasonic harmonic imaging mode, Tx efficiency

characterization, and pulse-echo characterization [5–7].

1.4 Thesis Organization

This thesis is organized into the following chapters:

Chapter 2 introduces the needed background information for the discussion of 3D

ultrasonic imaging systems in this thesis. This includes a brief description of various

ultrasonic imaging modes, the beam-formation principle, and the transducer types.

Chapter 3 first lists previous solutions to 3D ultrasonic imaging. A different ar-

chitecture that offers better system trade-offs is motivated. The overview of the

proposed Column-Row-Parallel architecture is then described, which shows the po-

tential to reduce hardware interconnection complexity while maintaining software


flexibility. Several examples of operation illustrate the architecture functionality to

perform column-parallel addressing, row-parallel addressing, or special patterns.

Chapter 4 presents ultrasonic imaging applications that show what the Column-

Row-Parallel architecture is capable of, without going into circuit details yet. It starts

with the hardware system assembly description. The CMUT-PCB-ASIC flip-chip

bonding assembly process is discussed in detail and the whole electrical + mechanical

test setup is shown. Three Column-Row-Parallel application examples are given

afterwards. A 3D plane-wave coherent compounding (PWCC3D) algorithm is proposed

and demonstrated as a fast volume rate, high quality 3D imaging solution. An annular

ring aperture mode is presented for forward-looking intravascular ultrasound (IVUS)

and intracardiac echocardiography (ICE) applications. Finally, a checker board pattern

is used for second harmonic suppression in the ultrasonic harmonic imaging mode.

Chapter 5 provides circuit design details for a 16x16 Column-Row-Parallel test

chip working with a 16x16 CMUT. The implementation of the architecture control logic,

transmitter, receiver, and biasing circuits is described. The transmitter and

receiver circuit design reflects the optimization considerations for the specific target

transducer, in which the sensory interface for capacitive source / load is used. On the

other hand, the control logic and the biasing circuits reflect the architecture imple-

mentation, which is general to different transducer types. The last section explains

the fault-tolerance against transducer defects incorporated by the transceiver circuit

implementation, which is critical for front-end electronics working with MEMS de-

vices with large element count.

Chapter 6 shows various circuit characterizations, which are complementary to

the system experiments described in Chapter 4. The transmitter and the receiver are

characterized as individual blocks; their circuit performance is summarized. Several

acoustic / electrical characterizations are also carried out, including the Tx beam-

steering demonstration, and pulse-echo experiment.

Finally, Chapter 7 concludes the work with a summary of contributions and lists

directions for future work.


Chapter 2

Background Information

This chapter provides the needed background information about ultrasonics, in prepa-

ration for the discussion of 3D ultrasonic imaging systems.

2.1 Ultrasonic Imaging Modes

Ultrasonic imaging systems are generally active imaging systems. The system stim-

ulates the transducers to transmit ultrasonic waves into the medium (human body);

the reflected ultrasonic echoes are then received and processed to generate images,

which visualize the medium [8–10] or provide flow information through Doppler pro-

cessing [11–15].

Medical ultrasound systems use different “imaging modes” to assist various diag-

noses [8,9]. For visualization of the tissue anatomy, the most common imaging modes

include A, B, C and M modes [9,10]. The B-mode is the most common mode and its

typical operation is shown in Figure 2-1. The imaging system uses a 1D transducer

array and pulsed ultrasonic waves to probe the tissue medium, in order to acquire

a 2D grayscale image of the tissue. At time 0, the transmitter circuit drives the

transducer to emit the ultrasonic pulse as shown by the red pulse. The pulse travels

through the tissue at the sound speed c, typically 1540m/s in human soft tissue [16].

When it hits some medium interfaces, the mechanical impedance mismatch at each

interface generates reflected ultrasonic waves. An interface at depth Z leads to a


[Figure 2-1 content: a time axis marked 0, td, T and T+td; the burst cycle period Tp; a medium interface (mechanical impedance mismatch) at depth Z; and the resulting B-mode image.]

Figure 2-1: The typical signals and the operation for B-mode ultrasound.

received ultrasonic echo at time td = 2Z/c, as shown by the blue pulse. Because

the echo amplitude is proportional to how large the mechanical impedance mismatch

is, the amplitude information is translated to the grayscale intensity of pixels in the

image. Meanwhile, the time delay between the transmit instant and the received echo (td)

translates to the depth, indicating the interface location in the image. A simplified

grayscale image is also shown in the figure.

The transmit-receive action is repeated after time T , such that the B-mode image

can be continuously updated in time. The period T is called the pulse repetition

period (PRP), and it needs to be long enough to ensure that all ultrasonic echoes

from the previous transmission have returned. Given that the ultrasonic wave travels at a

sound speed of about 1540m/s and that a typical image depth is 7.5cm, one transmit-receive

repetition will take approximately 100µs (2 × 7.5cm ÷ 1540m/s = 97µs).

The reciprocal of PRP is called the pulse repetition frequency (PRF), which is the

number of pulses per second. It is a term frequently used in active imaging systems

such as ultrasound, sonar, or radar systems. A typical PRF in ultrasound is 10kHz,

corresponding to the 100µs PRP. Depending on the application, commonly used PRFs

range from 5 to 20kHz.
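The timing relations above can be checked numerically. The following is a minimal sketch that uses only the values quoted in the text (1540m/s sound speed, 7.5cm imaging depth); the function names are illustrative and are not taken from the thesis.

```python
SOUND_SPEED_M_S = 1540.0  # typical speed of sound in human soft tissue


def round_trip_time_s(depth_m, c=SOUND_SPEED_M_S):
    """Echo arrival time td = 2Z/c for an interface at depth Z."""
    return 2.0 * depth_m / c


def max_prf_hz(max_depth_m, c=SOUND_SPEED_M_S):
    """Highest PRF that lets echoes from max_depth_m return before the next pulse."""
    return 1.0 / round_trip_time_s(max_depth_m, c)


prp = round_trip_time_s(0.075)  # minimum PRP for a 7.5 cm imaging depth
prf = max_prf_hz(0.075)         # corresponding maximum PRF
print(f"minimum PRP = {prp * 1e6:.1f} us, maximum PRF = {prf / 1e3:.2f} kHz")
```

For the 7.5cm depth this gives a minimum PRP of about 97µs, which is why a 10kHz PRF sits close to the limit for that depth.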


The red transmit pulse shown in Figure 2-1 is composed of 2 bursts of sinusoids

with a cycle period of Tp. While it shows a typical case, the sinusoidal pulse shape

can be replaced by other pulse shapes, such as discrete level pulses, which will be

discussed in this thesis. The number of bursts in one transmission can also be variable

depending on applications. Generally speaking, more bursts lead to stronger reflected

echoes, while fewer bursts lead to better image axial resolution because of the shorter

pulse duration. B-mode imaging commonly employs 2-5 bursts per transmission; and

PW Doppler imaging (see next paragraph) employs as many as 20 bursts to improve

signal strength in the received echoes.

Besides direct visualization of tissue anatomy, the Doppler effect is used to

obtain blood flow velocity information inside the human body [17]. There are mainly three

Doppler modes: Continuous Wave (CW), Pulsed Wave (PW) and Color Flow Mode

(CFM) Doppler [11–15]. The CW Doppler is the earliest mode, which transmits

continuous ultrasonic waves into the human body and detects the Doppler frequency shift from

the echo waves [13]. It is simple and reliable, but lacks range information. The PW

Doppler improves upon the CW mode by repeatedly sending pulsed ultrasonic waves

into the medium [14]. The time of flight of the received echoes contains the range

information, and the slight timing difference between consecutive echo pulses reflects

the object movement (see footnote 1). Sub-sampling at the PRF is usually carried out before the

spectrum analysis for the PW Doppler frequency shift [11]. The CFM Doppler is used

to present velocity information as a color-coded image, which is often overlaid on top

of a B-mode image. Time-domain autocorrelation based signal processing techniques

are often used to speed up the CFM processing [15]. The velocity estimation accuracy

is good enough for color-coded visualization.
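To give a feel for the magnitudes involved in PW Doppler, the sketch below evaluates two standard pulse-echo relations: the change in round-trip delay between consecutive pulses, 2·v·PRP/c, and the classical Doppler shift, 2·v·f0/c. The flow velocity (0.5m/s) and centre frequency (3.3MHz) are assumed values chosen for illustration only, and the function names are hypothetical.

```python
def inter_pulse_delay_shift_s(v_axial_m_s, prp_s, c=1540.0):
    """Change in round-trip echo delay between two consecutive pulses.

    A scatterer moving at v along the beam axis travels v * PRP between
    pulses, so the two-way path length changes by 2 * v * PRP.
    """
    return 2.0 * v_axial_m_s * prp_s / c


def doppler_shift_hz(v_axial_m_s, f0_hz, c=1540.0):
    """Classical pulse-echo Doppler shift f_d = 2 * v * f0 / c."""
    return 2.0 * v_axial_m_s * f0_hz / c


dt = inter_pulse_delay_shift_s(0.5, 100e-6)  # assumed 0.5 m/s flow, 100 us PRP
fd = doppler_shift_hz(0.5, 3.3e6)            # assumed 3.3 MHz centre frequency
print(f"delay shift per pulse = {dt * 1e9:.1f} ns, Doppler shift = {fd:.0f} Hz")
```

Under these assumptions the delay shift per pulse is tens of nanoseconds and the Doppler shift is a few kHz, small compared with the bandwidth of a short pulse.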

Many more imaging modes exist. For example, the Harmonic Imaging mode uses

the second harmonic of the pulse to provide high resolution images [18–22]; the Power

Mode Doppler visualizes the magnitude of Doppler signal, rather than the frequency

Footnote 1: It is important to point out that in the PW mode, the Doppler effect does not come from the frequency shift of a single received echo pulse, since a short pulse is broadband, and therefore it is difficult to detect the small Doppler frequency shift (typically less than 100kHz). Besides, the frequency-dependent attenuation through the tissue complicates the task even more. Instead, it is the velocity-dependent time delay across several pulses that carries the velocity information.


Figure 2-2: Simplified block diagram of an ultrasound BF system, figure courtesy of [27].

shift, to help identify the existence of low flows and velocities [23]. Furthermore,

many imaging modes are used together as Duplex or Triplex modes for the best

visualization [24,25].

2.2 The Beam-formation Principle

Beam-formation (BF) is heavily involved in ultrasonic imaging, to increase the signal-

to-noise ratio (SNR), to focus the ultrasound beam to deliver more power, and to steer

the beam to scan the imaging space [8,9,12,26,27]. The beamforming algorithms are

based on the delay-and-sum principle, which is shown in Figure 2-2. When a focus is

specified, delays are calculated for each ultrasound channel, so that the pulses from

different channels, which travel different distances between the corresponding transducer

elements and the focus, arrive at the focus at the same time.
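The delay calculation for a specified focus can be sketched as follows. This is a minimal illustration for transmit focusing with a linear array on the z = 0 plane; the 16-element, 250µm-pitch geometry and the function name are assumptions made for the example, not parameters from the thesis.

```python
import numpy as np


def focus_delays_s(element_x_m, focus_xz_m, c=1540.0):
    """Per-element firing delays so that all pulses reach the focus together.

    element_x_m: lateral element positions on the z = 0 plane (metres).
    focus_xz_m:  (x, z) position of the focal point (metres).
    """
    fx, fz = focus_xz_m
    path = np.hypot(element_x_m - fx, fz)  # element-to-focus distance
    tof = path / c                         # time of flight per element
    return tof.max() - tof                 # farthest element fires first (delay 0)


# Hypothetical 16-element array, 250 um pitch, focused 20 mm straight ahead.
pitch = 250e-6
x = (np.arange(16) - 7.5) * pitch
delays = focus_delays_s(x, (0.0, 20e-3))
print(np.round(delays * 1e9, 1))  # delays in ns, largest at the array centre
```

The outer elements have the longest path to an on-axis focus, so they fire first and the centre elements are delayed the most. The same delays, applied to the received signals before summation, give the receive-side delay-and-sum.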

The implementation of beam-formation can be either analog or digital, and the

beam-formation can be applied in both the transmit and receive paths.

Because of its denser integration, higher flexibility, and lower power consumption,

digital beamforming is favored in modern systems.

Ultrasonic imaging systems often operate in both the near field (or Fresnel

zone) and the far field (or Fraunhofer zone) regions [28–30]. For a round-shape, non-

focused, single element transducer, the boundary between the near field and the far


field regions is usually defined at (see footnote 2):

$$L = \frac{D^2}{4\lambda}, \tag{2.1}$$

in which D is the diameter of the transducer surface and λ is the ultrasound

wavelength.

In the near field, the pressure amplitude varies drastically, with many local

maxima and minima. This complex characteristic is caused by the constructive

and destructive interference patterns of the ultrasound beam. In the far field, the

pressure amplitude decreases monotonically with distance and the ultrasound beam

diverges at the angle θ defined as: $\sin(\theta) = 1.22\,\lambda / D$.

At the boundary of the near and far field, where the distance is roughly given

by Equation (2.1), the maximum pressure amplitude, or equivalently the maximum

ultrasound intensity, is reached; and the beamwidth is minimized at the same time.

According to [28–30], the effective beamwidth is approximately equal to half of the

transducer diameter D; the pressure amplitude is therefore about twice the

pressure amplitude at the transducer surface.

Because of this unique property, it is advantageous for ultrasonic imaging to op-

erate close to the near and far field interface, for best SNR and lateral resolution. As

a simple numerical example, a typical single element transducer for an intracranial

pressure (ICP) measurement has a diameter of about 1.5cm [32,33]. The typical op-

erating frequency is 2MHz and the typical ultrasound speed in human soft tissue is

1540m/s [16], giving a wavelength of 0.77mm. The interface distance calculated from

Equation (2.1) is therefore 7.3cm, which is about the same as the distance from the target

brain blood vessel to the transducer (see footnote 3).
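The numerical example above can be reproduced with a few lines. This sketch evaluates Equation (2.1) and the far-field divergence relation sin(θ) = 1.22 λ/D for the quoted ICP transducer values; the function names are illustrative only.

```python
import math


def near_field_boundary_m(diameter_m, wavelength_m):
    """Near/far field transition distance L = D^2 / (4 * lambda), Equation (2.1)."""
    return diameter_m ** 2 / (4.0 * wavelength_m)


def divergence_angle_rad(diameter_m, wavelength_m):
    """Far-field divergence angle from sin(theta) = 1.22 * lambda / D."""
    return math.asin(1.22 * wavelength_m / diameter_m)


c = 1540.0   # m/s, speed of sound in soft tissue
f0 = 2e6     # Hz, operating frequency from the example
D = 1.5e-2   # m, transducer diameter from the example

lam = c / f0
L = near_field_boundary_m(D, lam)
theta = divergence_angle_rad(D, lam)
print(f"lambda = {lam * 1e3:.2f} mm, L = {L * 1e2:.1f} cm, "
      f"theta = {math.degrees(theta):.1f} deg")
```

The wavelength evaluates to 0.77mm and the boundary to about 7.3cm, matching the figures in the text.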

Because the system operates heavily in the near field region, time-domain techniques

for beamforming and processing are common in ultrasonic imaging. Consequently,

Footnote 2: Depending on applications, there are many different definitions [31]. The one used in this article is the most widely used in the medical ultrasound area.

Footnote 3: For transducers with more complex shapes and structures, the equations presented above will be slightly different by some factors. But the effective aperture size D can be used to approximate the element diameter, and the conclusions about near field and far field more or less stay the same.


the ultrasound pulses are short-duration, wideband signals to facilitate time based

algorithms.

In addition to the basic delay-and-sum beam-formation principle, several

techniques are often used to improve the visualization, creating a more homogeneous

image quality throughout the full depth [8,9,12]. They have been applied in the imaging

experiments of this work.

• Dynamic focusing: Instead of a fixed array delay pattern for a fixed focal

point in space, the dynamic focusing technique implements a continuously

moving focal point across different imaging depths. The array elements are

controlled to focus signals at a shallow depth at the beginning; as time progresses

(corresponding to increasing depth), the array delay pattern is gradually modified

to move the focal point deeper until the end of the imaging depth is reached.

Compared to a single focal point, dynamic focusing generates high detail res-

olution and high contrast resolution for all depths. It can be relatively easily

implemented by a digital beamformer at the receive side.

• Constant F-number imaging: F-number (F#) is the ratio of focal length

(f) to the imaging aperture diameter (D), as in (2.2).

$$F\# = \frac{f}{D}. \tag{2.2}$$

It is an important concept in optics, photography, and ultrasound. In ultra-

sound, the constant F-number imaging technique keeps a constant F# by grad-

ually enlarging the active aperture (D) as the focused imaging depth (f) grows

larger. The result of this technique is a constant lateral resolution and it is

often used in conjunction with the dynamic focusing technique.
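The constant F-number technique can be sketched as an aperture that grows with the focal depth until the whole array is in use. The example below is an illustration only: the 16-element array, 250µm pitch, F# of 2 and the helper name are assumptions, and rounding to the nearest whole element is one of several reasonable choices.

```python
import numpy as np


def active_element_count(depth_m, f_number, pitch_m, n_elements):
    """Number of elements to enable at each focal depth for a target F-number.

    The ideal aperture is D = f / F#. It is rounded to a whole number of
    elements and limited to the physical array size.
    """
    aperture = np.asarray(depth_m, dtype=float) / f_number
    count = np.rint(aperture / pitch_m).astype(int)
    return np.clip(count, 1, n_elements)


depths = np.array([2e-3, 5e-3, 10e-3, 20e-3])  # focal depths in metres
counts = active_element_count(depths, f_number=2.0, pitch_m=250e-6, n_elements=16)
print(counts)
```

With these assumed values the active aperture is 4 elements at 2mm and 10 elements at 5mm, and it saturates at all 16 elements from 10mm onward. Beyond that depth the F-number can no longer be held constant, so the lateral resolution degrades with depth.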

2.3 Ultrasonic Transducers

Currently, using 1D ultrasonic transducer arrays to produce 2D medical ultrasound images is the

common practice [8, 12, 34–36]. The transducer arrays are usually built with

piezoelectric materials. The element count of an array can be as high as one thousand. The

interconnections to the electronics are co-axial cables.

3D ultrasonic imaging can be achieved by translating or rotating a 1D transducer

array over the space [3, 4], but the accuracy and speed are limited by the mechanical

movements. As a result, 2D transducer arrays and the supporting 2D electronics are

more desirable for 3D ultrasonic imaging. There are commercial 3D imaging systems

utilizing 2D transducer arrays. For example, Philips Matrix X6-1 is a 2D array that

contains 9,212 elements [37]. However, cables are still needed for the interconnections

between the transducer probe and the data acquisition system, which might not be

the best solution for 3D imaging, due to the high channel count. Additionally, the 2D

transducers have been built from piezoelectric materials [37,38], where manual dicing

is often needed to separate individual array elements. The interconnection and yield

problems are challenging as the array gets larger and the element size gets smaller.

The capacitive micromachined ultrasonic transducer (CMUT) [39–41] is an alter-

native to the traditional piezoelectric transducers (PZTs). The CMUT technology of-

fers advantages such as improved bandwidth, ease of fabricating large arrays, and po-

tential for integration with electronics with the through-silicon vias (TSVs) [40,42,43]

or monolithic CMUT-CMOS integration [44–46].

But there are also challenges for CMUT. Most importantly, the output power and

efficiency are still relatively low, partly due to the large parasitic device capacitance.

The primary reason for the large parasitic capacitance is the physical structure of

the CMUT element, which forms a parallel-plate capacitor [41]. As a result, the

transmitter and receiver circuitry that interfaces to a CMUT is different from that

for a PZT. It needs to be designed appropriately to prevent excessive performance

degradation caused by a load that is much more capacitive and has a higher impedance.

The piezoelectric micromachined ultrasonic transducer (PMUT) has also emerged as

another possible 2D transducer solution for 3D imaging [47–51]. It combines the

piezoelectric material with micromachining techniques, trying to exploit the benefits

from both worlds. The piezoelectric material tends to provide transduction with

relatively high efficiency and good linearity, while the micromachining process helps


create fine-pitched 2D arrays with higher yield and reliability. As a technology in its

early research phase, it has shown initial success with a working 5x5 array [47]. More

work is being done to address problems with this technology, including how to

enhance the device bandwidth to generate images with better axial resolution; and

how to reduce the intrinsic device parasitic capacitance from the high permittivity of

the piezoelectric material [48,49,51].

In this thesis, we design block-level circuits for CMUT, but our architecture and

system innovations are not limited to a particular transducer type, as will be discussed

in succeeding chapters.

2.4 Field II Simulation Program

In our work, we make heavy use of the Field II Simulation Program [52,53] to model

the complete hardware and software setup. Field II is a behavioral simulation package

running in the MATLAB (The MathWorks, Natick, MA) environment. Figure 2-3

shows a typical Field II simulation flow diagram. Users are free to

define the ultrasonic phantom (i.e., the medium being imaged by the system),

transducer property, pulsing / receiving methods, beam-formation algorithms, and

image processing / display methods. Based on the user definition, Field II simulates

the ultrasound transducer fields and ultrasonic imaging using linear acoustics.

The phantom definition is realized by specifying point scatterers in space with

different reflecting amplitudes. It can be a simple single scatterer phantom that

characterizes the point spread function of an imaging system; or complex shapes

defined by a set of scatterers. Moving structures can also be instantiated by a sequence

of phantoms with slight position changes over time, which is useful in simulations for

ultrasonic Doppler systems.
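A point-scatterer phantom of the kind described above can be represented as an array of positions plus an array of reflecting amplitudes, and a moving structure as a sequence of such phantoms. The sketch below is a Python illustration of that data layout under assumed values (one scatterer at 20mm depth moving at 0.5m/s, 100µs between transmits); it is not Field II code and does not use the Field II interface.

```python
import numpy as np


def moving_phantom(positions_m, amplitudes, velocity_m_s, prp_s, n_frames):
    """Return one (positions, amplitudes) pair per transmit event.

    positions_m:  (n, 3) scatterer coordinates (x, y, z) in metres.
    amplitudes:   (n,) reflecting amplitude of each scatterer.
    velocity_m_s: (3,) velocity shared by all scatterers.
    """
    positions_m = np.asarray(positions_m, dtype=float)
    amplitudes = np.asarray(amplitudes, dtype=float)
    velocity_m_s = np.asarray(velocity_m_s, dtype=float)
    return [(positions_m + k * prp_s * velocity_m_s, amplitudes)
            for k in range(n_frames)]


# Single point target at 20 mm depth, moving away from the array at 0.5 m/s.
point_target = np.array([[0.0, 0.0, 20e-3]])
frames = moving_phantom(point_target, [1.0], velocity_m_s=[0.0, 0.0, 0.5],
                        prp_s=100e-6, n_frames=4)
for k, (pos, amp) in enumerate(frames):
    print(k, pos[0, 2], amp[0])
```

A single stationary scatterer (n_frames=1) corresponds to the point spread function phantom mentioned in the text.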

The transducers are defined with the type, frequency response and active aperture.

The transducer types include 1D, 1.5D, 2D arrays, as well as curved arrays with

concave or convex shapes. The transducer element dimensions can be freely specified

and the element frequency response is described by its impulse response. Transmit


Figure 2-3: A typical Field II flow diagram for ultrasonic system behavioral simulation.

and receive apertures are defined separately, while the active elements are selectable

within the array. Two other properties associated with the active apertures are the

focus and apodization. Through the focus specification, the beam-formation delays

can be automatically calculated for each element in an aperture. The apodization

gives amplitude weights for signals at different transducer elements. Both focus and

apodization can be functions of time, which realizes dynamic focusing / apodization.

The pulsing excitation for the transducer is supplied to the array by a time-domain

pulse waveform. Based on the excitation, phantom definition and transducer property,

the received echo waveforms from every element in the Rx aperture are produced by

the Field II simulator. Beam-formation is performed on the collected echo waveforms;

and the beamformed waveforms can then be used to construct a 2D or 3D image, or

further processed for Doppler information.
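The flow from excitation to beamformed data can be illustrated with a toy model. The sketch below is a simplified stand-in written in Python, not Field II: it assumes ideal point-like elements, an unfocused transmit from the array centre, one point scatterer, and no attenuation or noise. The sampling rate, pulse frequency, array geometry and function names are all assumptions made for the example.

```python
import numpy as np

C = 1540.0    # speed of sound, m/s
FS = 40e6     # sampling rate, Hz (assumed)
F0 = 3e6      # pulse centre frequency, Hz (assumed)
PEAK_OFFSET = 1.25 / F0  # time of a positive carrier peak inside the pulse


def pulse(t):
    """Two-cycle sinusoid with a sin^2 envelope, non-zero for 0 <= t <= 2/F0."""
    dur = 2.0 / F0
    inside = (t >= 0) & (t <= dur)
    wave = np.sin(2 * np.pi * F0 * t) * np.sin(np.pi * t / dur) ** 2
    return np.where(inside, wave, 0.0)


def simulate_echoes(elem_x, scatterers, n_samples):
    """Echo traces per element for a transmit from the origin.

    scatterers: rows of (x, z, amplitude). Linear acoustics: echoes simply add.
    """
    t = np.arange(n_samples) / FS
    rf = np.zeros((elem_x.size, n_samples))
    for sx, sz, amp in scatterers:
        tx_path = np.hypot(sx, sz)            # origin to scatterer
        rx_path = np.hypot(elem_x - sx, sz)   # scatterer to each element
        delay = (tx_path + rx_path) / C
        rf += amp * pulse(t[None, :] - delay[:, None])
    return rf


def delay_and_sum(rf, elem_x, point):
    """Beamformed value at one image point (x, z) using nearest-sample delays."""
    px, pz = point
    delay = (np.hypot(px, pz) + np.hypot(elem_x - px, pz)) / C
    idx = np.rint((delay + PEAK_OFFSET) * FS).astype(int)
    idx = np.clip(idx, 0, rf.shape[1] - 1)
    return rf[np.arange(elem_x.size), idx].sum()


elem_x = (np.arange(16) - 7.5) * 250e-6            # 16 elements, 250 um pitch
rf = simulate_echoes(elem_x, [(0.0, 20e-3, 1.0)], n_samples=2000)
on_target = delay_and_sum(rf, elem_x, (0.0, 20e-3))
off_target = delay_and_sum(rf, elem_x, (0.0, 22e-3))
print(on_target, off_target)
```

At the true scatterer position the per-element echoes line up and add coherently; 2mm deeper the delayed samples fall outside the short pulse and the sum vanishes. Evaluating `delay_and_sum` over a grid of points would give a simple image of the phantom.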

With the ultrasonic field simulation, Field II helps verify the acoustical physics

and visually show the ultrasonic pressure field generated by the transducer. With


the capability of incorporating different beam-formation algorithms, it allows the

development and validation of new architecture-level and system-level ideas. It could

also be used to model non-idealities of circuits and transducers, so that a practical

understanding of the real imaging system can be achieved. As will be seen in the

following chapters, Field II simulation plays an important role in the thesis work.


Chapter 3

The Column-Row-Parallel

Architecture for 3D Ultrasonic

Imaging

This chapter describes our approach to solving the challenges in realizing a 3D medical

ultrasonic imaging system. The analog front-end architectural trade-offs are first dis-

cussed and the design process of the Column-Row-Parallel architecture is presented.

The implementation of the proposed architecture is then shown, which is both scalable

for hardware realization and flexible for software algorithm support. The functionality

of the implemented architecture is then described.

3.1 The Prior Art of Architectures for 3D Ultra-

sonic Imaging

A 2D NxN transducer array is often used to acquire 3D volumetric data, where the

architecture of the front-end circuit interfacing to the transducer array is an important

design consideration.

The most straightforward way to interconnect to a 2D transducer array is to use

a fully-parallel architecture, but it is not very scalable for hardware implementation.


A fully-parallel architecture requires $N^2$ active transceivers that operate at

the same time. As a result, it requires $N^2$ independent input control lines for the

transmitter array and $N^2$ output data lines for the receiver array. As the array size

grows, the required channel count grows correspondingly, which is

difficult to scale up economically.

On the other extreme, a serialized system could be used to save channel count,

but it is usually too slow for data acquisition. One could serialize the input control

lines and/or the output data lines of the aforementioned fully-parallel system, so

that the number of interconnect lines needed is reduced. However, due to the large number

of channels to be serialized, the data rate requirement would become too high to be

practical, following a similar $N^2$ scaling trend. Alternatively, one could use a

single-channel transceiver to sweep the 2D array, one element at a time. The transceiver is

connected to each element by multiplexing and it repeatedly transmits and receives

ultrasound with different elements in the array to acquire a full data set [40]. Given

that one transmit-receive repetition could take as long as 100µs (Section 2.1), and

that the total time consumed to gather one full data set grows as $N^2$, the

image frame rate would suffer greatly as the array size continues to grow.
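The cost of the serialized single-channel approach can be made concrete with the 100µs repetition time quoted above. The following sketch assumes exactly one transmit-receive event per element and no other overhead; the function name is illustrative.

```python
def serialized_acquisition(n, prp_s=100e-6):
    """Acquisition cost when one transceiver visits all n*n elements in turn.

    Returns (number of transmit-receive events, seconds per full data set,
    full data sets per second).
    """
    events = n * n               # one transmit-receive event per element
    total_s = events * prp_s
    return events, total_s, 1.0 / total_s


for n in (16, 32, 64, 128):
    events, total_s, rate = serialized_acquisition(n)
    print(f"{n}x{n}: {events} events, {total_s * 1e3:.1f} ms per data set, "
          f"{rate:.2f} data sets/s")
```

Under these assumptions a 16x16 array already needs 25.6ms per data set (about 39 data sets per second), and each doubling of N cuts the achievable rate by a factor of four.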

Therefore, to alleviate the conflict between hardware complexity and data

acquisition speed in 3D ultrasonic imaging systems, much research has explored various

sub-array architectures that lie in between the fully-parallel architecture and the se-

rialized single-channel architecture. In [43], the diagonal elements in a full 2D array

are used to form the receive aperture, while the rest of the 2D elements are used

to form the transmit aperture. At the transmitter side, it is close to a fully-parallel

architecture because almost all elements are being used. To provide the transmit

beam-formation delay pattern for all transmitters, the digital delay values are

serially streamed in to program each transmitter. This saves interconnections but slows

down the programming speed. At the receive side, the output channel count is

reduced from $N^2$ to $N$ because only the diagonal sub-array elements are used. This

diagonal sub-array approach leads to an elevated side-lobe level that degrades the

image contrast. Similarly, [54] investigated possibilities of various sparsely sampled


Figure 3-1: Column-parallel architecture implementations in the literature: (a) a 1D transducer array mechanically translated to scan the 3D space, elevation beam-formation is done by a synthetic virtual source technique, figure courtesy of [3]; (b) a 2D array operated to receive row-by-row, elevation beam-formation is done by sub-array delay-and-sum across the column using analog delay lines, figure courtesy of [55].

2D aperture patterns. But because the sub-array is fixed once the pattern is chosen,

the reduction of active elements generally leads to higher side-lobes and worse image

resolution performance.

To avoid a fixed sub-array pattern selection, another sub-array idea of using either

3x3 or 5x5 elements is described in [37]. The sub-arrays are programmable and each

sub-array performs beam-formation to compress the received data into one channel,

reducing the overall channel count by a factor of 9 or 25. To maintain the image

quality and avoid introducing artifacts, programmable delay patterns for the sub-

array are required. This requirement directly translates into analog delay lines in a

hardware implementation, which tend to be bulky and power hungry.

In [3, 4], a conventional 1D transducer array is used as a sub-array and is me-

chanically translated or rotated to achieve synthetic 3D imaging, as shown in Figure

3-1(a). The active channel count is reduced to N and the synthetic beam-formation

technique could produce good image quality, as long as the object being imaged is

static or moving at a much slower speed than the image frame rate, to avoid mo-


tion artifact. The major drawback in this solution is the mechanical implementation,

which is both a bottleneck for frame rate due to the slow movement speed, and a bot-

tleneck for power saving due to the large amount of power needed to drive a motor.

More recently, to replace the mechanical translation, an electrical scanning front-end

architecture is implemented as shown in Figure 3-1(b) [55–57]. The receiver channels

are turned on row-by-row to collect reflected ultrasound echoes. By activating differ-

ent rows of transducer elements over consecutive ultrasound transmits, it effectively

mimics the translation of a 1D transducer array, but is much faster and consumes less power.

3.2 The Motivation of the Column-Row-Parallel

ASIC Architecture

The works in [3] and [56,57] both employ row-by-row (i.e., column-parallel) operation

to reduce the number of active channels from N² to N. The 3D image quality from

the column-parallel architecture is very good in the azimuth (X) direction because

each row can perform full beam-formation along the azimuth direction. However,

the beam-formation along the elevation (Y) direction is poor. In Figure 3-1(a), techniques

such as the synthetic virtual source [3] are used to enhance the focusing in elevation, with

limited success. In Figure 3-1(b), analog delay lines are used to realize elevational

beam-focusing and achieve good imaging performance [55]. But for the

same reason mentioned in the previous section, the analog delay lines lead to large

power and silicon area overhead, making system integration difficult.

To cover both azimuth and elevation directions for 3D volumetric imaging, a

column-row addressing scheme has been implemented for a 2D transducer design as

shown in Figure 3-2 [38, 58–60]. By dicing the transducer top plate row-by-row and

dicing the bottom plate column-by-column, the transducer can be driven row-by-row

in transmit (Figure 3-2(a)) and column-by-column in receive (Figure 3-2(b)). The

combined “Maltese cross” shaped beam-pattern (Figure 3-2(c)) makes it suitable to

carry out beam-formation both in azimuth and elevation directions. At the same

Figure 3-2: The column-row addressing scheme implemented on a 256x256 2D transducer array: (a) row-by-row transmit addressing; (b) column-by-column receive addressing; (c) the “Maltese cross” beam-pattern. Figure courtesy of [38].

time, the interconnection complexity for the array is still kept at a linear growth

(2*N).

The column-row addressing implemented on the transducer-level has shown po-

tential to be a balanced architecture solution for both good image performance and

hardware scalability. However it still suffers from a lack of flexibility, because the

transducer array is hard-wired to be divided into rows and columns. The limitation

of only addressing the elements by one row or one column at a time provides limited

freedom for the supporting algorithm design. On the other hand, if one could im-

plement a similar column-row addressing architecture at the circuit-level instead of

at the transducer-level, as depicted in Figure 3-3, the element addressing mechanism

could be much more flexible. With the highly programmable control support from the

electronics, various sub-array patterns could be possible on the same system, allowing

more versatile functionality and more design freedom at the system-level.

Figure 3-3: A column-row addressing architecture implemented at the circuit-level, with column and row interconnections that reduce the system channel count and provide maximum flexibility for algorithms.

3.3 The Column-Row-Parallel ASIC Architecture

In our work, a Column-Row-Parallel architecture is implemented at the circuit-level

with much more diverse functionality and a better trade-off between complexity and

speed. Figure 3-3 in the previous section is a conceptual drawing of the proposed

architecture, while Figure 3-4 shows a detailed picture. 2D CMUTs are chosen as

the target transducer arrays for this work, because of their ease of integration and

scalability [39,40,43]. But the same architecture design can be applied to other types

of 2D ultrasonic transducers easily.

As shown in Figure 3-4, a 2D CMUT (16x16 transducer arrays are used in this

work) is DC biased at 30-50V from the common top membrane and each CMUT

element’s bottom pad is connected to its corresponding ASIC channel. The DC bias

network is provided off-chip with the resistor and the capacitor being shared across

all CMUT elements in the array [40,41]. As indicated by both Figure 3-3 and Figure

3-4, there is a transmitter (Tx) pulser, a receiver (Rx) low noise amplifier (LNA),

and a receiver high voltage (HV) protection switch per electronic channel, under each

Figure 3-4: Column-Row-Parallel architecture block diagram; the CMUT and ASIC chips are stacked vertically. Labeled blocks: Shared External Biasing, CMUT, ASIC, Column Select Logic, Column Circuitry, Gate Dr, Delay, BUF, Rx.

CMUT element. The total silicon layout area of a transceiver is designed to be the

same as a CMUT element’s area, which is 250µm× 250µm in this work, so that the

ASIC channels can be element-matched to the CMUT pitch. The Tx pulser gate

drivers and Rx buffer amplifiers are placed at the ASIC perimeter to interface to the

transceiver array. There are 16 copies of Tx drivers and Rx buffers at the column

side and another 16 copies at the row side, reducing the ASIC I/Os down to “N”¹.

Zooming into one transceiver channel located at the ith column and the jth row, Figure

3-5 shows that Tx and Rx operations are independent and time-multiplexed. The

control inputs of the transceiver channel include: the ith column select signals (Tc[i],

Rc[i]) supplied from the column side, the jth row select signals (Tr[j], Rr[j]) from the

row side, and the local per-element enable bits (T_en, R_en). The column and row

select signals are designed to be active on only one side at a time; the column-side and

row-side selects are never asserted together. The signals are input to the per-element logic unit, shown in Figure

3-5(b), to generate corresponding internal switch controls including: Tr, Tc, Rr, Rc,

and RxSw.

Tr and Tc determine whether the Tx pulser is driven by the column side or the

row side, or neither, in which case the pulser is turned off. When the Tx element [i, j]

is enabled (T_en = 1) and the jth Tx row is selected (Tr[j] = 1), the internal switch

control signal Tr becomes high and the Tx pulser gate drive signals are supplied

from the Column Gate Driver[i]. The array’s Tx path is in column-parallel mode.

When the Tx element [i, j] is enabled (T_en = 1) and the ith Tx column is selected

(Tc[i] = 1), the internal switch control signal Tc becomes high and the Tx pulser

gate drive signals are supplied from the Row Gate Driver[j]. The array’s Tx path

is in row-parallel mode. When the Tx element [i, j] is disabled (T_en = 0), or when

neither a Tx row nor a Tx column is selected (Tr[j] = Tc[i] = 0), both Tr and Tc are low

and the Tx pulser is turned off, ignoring gate drive signals from both column and row

gate drivers.

Similarly, Rr and Rc determine whether the Rx LNA outputs its analog signal

to the column side or the row side, or neither, in which case the LNA is turned off.

¹ Figure 3-6 shows that the I/O count is N instead of 2N.

Figure 3-5: (a) The block-level implementation of one transceiver channel and (b) the per-element logic implementation. Column and row select logic is implemented with shift registers that can be reprogrammed in “N” time (implementation detail will be shown in Figure 5-2). Labeled blocks and signals: Transceiver[i, j], Column Gate Driver[i], Row Gate Driver[j], Column BUF[i], Row BUF[j], Tr[j], Rr[j], Tc[i], Rc[i], T_en, R_en.

When the Rx element [i, j] is enabled (R_en = 1) and the jth Rx row is selected

(Rr[j] = 1), the internal switch control signal Rr becomes high and the Rx LNA

output is connected to the Column Buffer[i]. The array’s Rx path is in column-parallel

mode. When the Rx element [i, j] is enabled (R_en = 1) and the ith Rx

column is selected (Rc[i] = 1), the internal switch control signal Rc becomes high

and the Rx LNA output is connected to the Row Buffer[j]. The array’s Rx path

is in row-parallel mode. When the Rx element [i, j] is disabled (R_en = 0), or when

neither an Rx row nor an Rx column is selected (Rr[j] = Rc[i] = 0), both Rr and Rc are low

and the Rx LNA is turned off, presenting a high output impedance to both column

and row buffers.

The Rx HV protection switch protects low voltage Rx electronics from high voltage

Tx transients. An additional internal control signal, RxSw, is generated to control

the gate of the protection switch. Whenever the Rx LNA is activated and connected

to either the column or the row buffer, the HV switch is turned on (RxSw = 1) to allow

the CMUT signal to reach the LNA for amplification. The HV switch is off when the LNA

is not activated, and it also remains off during Tx pulsing to isolate the high voltage

pulsing transients.
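The per-element logic described above can be summarized in a short behavioral sketch. This is a minimal Python model of the text, not the circuit of Figure 3-5(b): the function name `element_logic`, the `SwitchControls` container, and the `tx_pulsing` input are hypothetical. In particular, the text states only that the HV switch stays off during Tx pulsing; gating RxSw with an explicit `tx_pulsing` flag is an assumption about how that behavior could be expressed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwitchControls:
    """Internal switch controls of one transceiver channel [i, j]."""
    Tr: bool    # Tx pulser driven from Column Gate Driver[i] (column-parallel Tx)
    Tc: bool    # Tx pulser driven from Row Gate Driver[j] (row-parallel Tx)
    Rr: bool    # Rx LNA output connected to Column Buffer[i] (column-parallel Rx)
    Rc: bool    # Rx LNA output connected to Row Buffer[j] (row-parallel Rx)
    RxSw: bool  # HV protection switch closed


def element_logic(*, t_en, r_en, tr_j, tc_i, rr_j, rc_i, tx_pulsing=False):
    """Behavioral model of the per-element logic unit.

    t_en, r_en : per-element Tx / Rx enable bits (T_en, R_en)
    tr_j, rr_j : row select signals Tr[j], Rr[j]
    tc_i, rc_i : column select signals Tc[i], Rc[i]
    tx_pulsing : hypothetical flag, True while the Tx pulser is firing
    """
    Tr = bool(t_en and tr_j)
    Tc = bool(t_en and tc_i)
    Rr = bool(r_en and rr_j)
    Rc = bool(r_en and rc_i)
    # The HV switch closes only when the LNA is connected to a buffer,
    # and (assumed here) never while the transmitter is pulsing.
    RxSw = bool((Rr or Rc) and not tx_pulsing)
    return SwitchControls(Tr, Tc, Rr, Rc, RxSw)
```

For example, `element_logic(t_en=1, r_en=1, tr_j=1, tc_i=0, rr_j=0, rc_i=1)` models an element that transmits in column-parallel mode and receives in row-parallel mode, which is possible because the Tx and Rx paths are independent.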

The detailed circuit implementation for generating column / row select signals as

well as the per-element enable bits will be the topic of Chapter 5. But as a high-level

description, these selection and enable bits are stored in shift registers (SR’s) which

can be programmed serially. The column and row select signals are 16 bits long for the

16 columns and rows, while the per-element enable bits are 512 bits long, accounting

for 1-bit Tx enabling and 1-bit Rx enabling for each CMUT element in the 16x16

array. Furthermore, two multiplexed banks for each control set are implemented.

For example, there are two multiplexed 512-bit SR banks for per-element enable bit

programming. One SR bank can be used in normal operation while the other bank

is being reprogrammed. Alternatively, both SR banks can be pre-programmed so that

one could quickly alternate between the two banks to achieve fast aperture switching

between two pre-defined aperture patterns.
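The two-bank arrangement can be illustrated with a small software model. The sketch below is only a behavioral analogy under stated assumptions: the class name `BankedShiftRegister`, its methods, and the bit ordering (the first bit shifted in ends up at the far end of the register) are hypothetical, and the actual circuit is the subject of Chapter 5.

```python
class BankedShiftRegister:
    """Two serially programmed banks, one of which drives the outputs."""

    def __init__(self, length):
        self.length = length
        self.banks = [[0] * length, [0] * length]
        self.active = 0  # index of the bank currently driving the array

    def shift_in(self, bank, bit):
        """Clock one bit into `bank`; the oldest bit falls off the far end."""
        reg = self.banks[bank]
        reg.insert(0, bit & 1)
        reg.pop()

    def program(self, bank, bits):
        """Serially load a full pattern and return the clock cycles used."""
        if len(bits) != self.length:
            raise ValueError("pattern length must match register length")
        for bit in bits:
            self.shift_in(bank, bit)
        return len(bits)

    def swap(self):
        """Switch the outputs to the other bank (fast aperture switching)."""
        self.active ^= 1

    def outputs(self):
        return list(self.banks[self.active])


# A 16-bit row select register: bank 1 is loaded while bank 0 stays in use.
row_select = BankedShiftRegister(16)
cycles = row_select.program(1, [1, 1] + [0] * 14)
row_select.swap()
```

In this model, loading a 16-bit select pattern takes 16 clock cycles, while a per-element enable register for the 16x16 array would be created as `BankedShiftRegister(2 * 16 * 16)` and takes 512 cycles to load, which is why keeping a second pre-programmed bank is attractive.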

Lastly, because only the column side or the row side is activated at any one time, the

Figure 3-6: (a) Tx input port multiplexing, implemented with digital logic; (b) Rx output port multiplexing, implemented with analog pass-gates. Labeled ports and blocks: Tx_IN[0] to Tx_IN[15], Rx_OUT[0] to Rx_OUT[15], Column and Row Gate Driver[0] and [15], Column and Row BUF[0] and [15].

column and row circuits share I/O ports by multiplexing, as shown in Figure 3-6².

For Tx, the multiplexing switches are implemented with digital logic gates; for Rx,

the multiplexing switches are implemented with analog pass-gates for analog signal

outputs. In this way, the input ports for Tx beamforming control and output ports

for Rx received waveforms each number 16 instead of 32 for a 16x16 array, considerably reducing the chip

I/O count. The chip’s interface scaling trend thus becomes N (rather

than 2N), which is the same trend as a 1D array for 2D imaging.

3.4 The Functionality of the Column-Row-Parallel

Architecture

In this section, a few examples will be utilized to help understand how the proposed

Column-Row-Parallel ASIC architecture could be used for 3D ultrasonic imaging.

Figure 3-7 shows an exemplary configuration of a column-parallel mode Tx aper-

ture on the 16x16 CMUT-ASIC system. Note that the exemplary configuration is

broken down and illustrated in steps to help understanding, but the actual ASIC

² This implementation detail is not shown in most other block diagram figures to avoid complication.

configuration is carried out as a whole in one step. In this example, two of the 16

row select signals are turned on so that the two rows of Tx elements are activated, as

shown by the red squares in Figure 3-7(a). Because the array is operating in column-

parallel mode, all elements along the same column are in parallel as shown by the

red column connection lines in Figure 3-7(b). The elements on the same column are

driven by a shared Tx column gate driver as in Figure 3-7(c). Because the 16 column

gate drivers can be controlled independently, by supplying the driver signals with

different delay timings, the 16 Tx columns emit ultrasonic waves at slightly different

timing with respect to each other. This delay pattern could be configured to perform

ultrasonic beam-focusing or beam-steering along the azimuth direction, as shown in

Figure 3-7(d).
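As an illustration of how such a delay pattern could be computed, the sketch below evaluates the standard linear-array steering law, in which column n is delayed by n·p·sin(θ)/c for pitch p and steering angle θ. It is a hedged example rather than the delay generation used in this work: the function name `steering_delays` is hypothetical, and the sound speed of 1480 m/s is an assumed value, not one taken from the text. Only the 250µm pitch and the 16 columns come from the architecture description.

```python
import math

PITCH_M = 250e-6  # element pitch stated in the text


def steering_delays(theta_deg, n_cols=16, pitch_m=PITCH_M, c_m_per_s=1480.0):
    """Per-column Tx delays (seconds) that steer a plane wave in azimuth.

    theta_deg  : steering angle from the array normal, in degrees
    c_m_per_s  : assumed speed of sound in the medium (hypothetical value)
    The delays are shifted so that the earliest column fires at t = 0.
    """
    s = math.sin(math.radians(theta_deg))
    raw = [n * pitch_m * s / c_m_per_s for n in range(n_cols)]
    t0 = min(raw)
    return [t - t0 for t in raw]


delays = steering_delays(30.0)  # one delay per column gate driver, D0..D15
```

Under these assumptions, steering to 30 degrees requires a delay step of roughly 84 ns between adjacent columns, and a negative angle simply reverses the firing order. Beam-focusing would use a curved rather than linear delay profile, which is not shown here.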

Figure 3-8 shows another exemplary configuration, in which a row-parallel mode

Rx aperture is programmed on the 16x16 CMUT-ASIC system. Five columns are

activated in this example by the column select signals, and the five active Rx elements

on each row are in parallel. Their outputs are combined in the analog domain,

and the combined signal is buffered by a shared Rx row buffer. The 16 analog outputs are digitized by

off-chip ADCs. Afterwards, the digitized channel data can be processed digitally to

perform beam-formation along the elevation direction.
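The row-parallel receive combination can be mimicked numerically to show how 256 element signals collapse into 16 channels. This is a sketch under assumptions: the text says only that the outputs are combined in the analog domain, so modeling the combination as a plain sum is an assumption, and the function name `row_parallel_rx` and the data layout `samples[j][i][k]` (row j, column i, time sample k) are hypothetical.

```python
def row_parallel_rx(samples, col_select, r_en):
    """Combine element waveforms into one waveform per row buffer.

    samples[j][i] : list of time samples from element (column i, row j)
    col_select[i] : Rx column select bit Rc[i]
    r_en[j][i]    : per-element Rx enable bit R_en
    Returns one combined waveform per row (modeled here as a plain sum).
    """
    n_rows = len(samples)
    n_cols = len(col_select)
    n_t = len(samples[0][0])
    outputs = []
    for j in range(n_rows):
        acc = [0.0] * n_t
        for i in range(n_cols):
            if col_select[i] and r_en[j][i]:
                for k in range(n_t):
                    acc[k] += samples[j][i][k]
        outputs.append(acc)
    return outputs
```

With five columns selected and all elements enabled, each of the 16 returned waveforms is the sum of five element signals; these 16 waveforms are what the off-chip ADCs would digitize before elevation beam-formation is carried out digitally.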

As mentioned in the previous section, the Tx and Rx paths are completely

independent. Therefore, the Tx path can be configured into a row-parallel mode

and the Rx path can be in column-parallel mode too. The number of active rows or

columns is also programmable depending on the need. The reprogramming time

for the active rows and columns is fast, because the row and column select signals are

generated at the side of the array and it only takes N clock cycles, making it scalable

as the array size grows.

When multiple rows (the case for columns is similar) are activated, they operate

in parallel and effectively behave as a “thicker” row compared to when only one row

is activated. The azimuth beam-focusing is the same while the additional elevation

thickness could provide larger signal strength. This feature offers freedom at the

system-level. As will be seen in Chapter 4, different numbers of rows or columns

Figure 3-7: The architecture configured in a column-parallel mode for the Tx aperture. The configuration is broken down and illustrated in steps (a) through (d) to help understanding. Two rows are activated as the Tx aperture and beam-formation along the azimuth (X) direction is achieved. Labeled blocks: Row Select Logic, Column Select Logic, Column Tx Drivers with beamform delays D0, D1, ..., D15.

Figure 3-8: The architecture configured in a row-parallel mode for the Rx aperture. Five columns are activated as the Rx aperture and beam-formation along the elevation (Y) direction is achieved.

can be selected for transmit or receive, to achieve the desired imaging requirements

(volume rate, resolution, etc.). The innovative circuit structures realizing the feature

are discussed in Chapter 5.

In addition to row-by-row or column-by-column operations, the array can also

be programmed into more complex aperture patterns for specific ultrasonic imaging

applications. This programming is accomplished through the proper use of the per-

element enable bits under each element for both Tx and Rx paths.

For example, in Figure 3-9(a), only the diagonal elements are configured with a

Rx per-element enable bit of 1 (R_en = 1) while all other Rx elements’ enable bits

are 0. The system is in row-parallel mode and all 16 column select signals are on,

so that the 16 diagonal Rx elements receive ultrasound echoes and output to the

16 row buffers. In this way, a diagonal Rx aperture is formed, achieving the same

functionality as described in [43].
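The diagonal configuration can be written down as an enable-bit matrix. The following is a minimal sketch with hypothetical helper names (`diagonal_rx_enable`, `active_elements_per_row`); it shows only the pattern described above and checks that each row buffer then receives the output of exactly one element.

```python
def diagonal_rx_enable(n=16):
    """R_en matrix with ones on the diagonal; r_en[j][i] is row j, column i."""
    return [[1 if i == j else 0 for i in range(n)] for j in range(n)]


def active_elements_per_row(r_en, col_select):
    """Count elements feeding each row buffer in row-parallel Rx mode."""
    return [sum(e & c for e, c in zip(row, col_select)) for row in r_en]


r_en = diagonal_rx_enable()
per_row = active_elements_per_row(r_en, [1] * 16)  # all 16 columns selected
```

Because every entry of `per_row` is 1, the 16 row buffers carry 16 distinct single-element waveforms, which is what makes the diagonal aperture usable with the row-side outputs.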

Figure 3-9(b) shows another example, where a checker board pattern is activated

Figure 3-9: More use examples of the proposed architecture: (a) a diagonal Rx aperture; (b) a checker board Tx aperture for ultrasonic harmonic imaging; (c) & (d) annular ring Tx and Rx apertures for forward-looking ultrasonic imaging applications. Labeled blocks: Row Select Logic, Column Select Logic, delays D, outputs to external ADCs.

for the Tx path. All 16 Tx row select signals are activated while the Tx per-element enable

bits define the checker board pattern inside the array. The column Tx gate drivers

supply the same delay profile for the 16 columns so that effectively all activated Tx

elements emit ultrasound pulses in-phase. This checker board Tx aperture could help

reduce second harmonic generation for the emitted ultrasound pressure field, which

is useful in ultrasonic harmonic imaging applications. It will be discussed in more

detail in Chapter 4.
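A checker board enable pattern is straightforward to generate. In the sketch below, the function name `checkerboard_tx_enable` and the `phase` parameter are hypothetical, and which parity of elements is actually enabled in Figure 3-9(b) is not stated in the text, so the choice of `phase=0` is an assumption.

```python
def checkerboard_tx_enable(n=16, phase=0):
    """T_en matrix in a checker board pattern; t_en[j][i] is row j, column i.

    phase selects which of the two interleaved sets of elements is enabled.
    """
    return [[1 if (i + j) % 2 == phase else 0 for i in range(n)]
            for j in range(n)]


t_en_a = checkerboard_tx_enable(phase=0)
t_en_b = checkerboard_tx_enable(phase=1)  # the complementary set of elements
```

Each pattern enables half of the 256 elements, and the two phases together cover the whole array without overlap, which is the property an interleaved scheme would rely on.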

The annular ring apertures are shown in Figure 3-9(c) and (d) for Tx and Rx

paths respectively. The ring shapes are adjustable in either column-parallel or row-parallel

mode. The Tx elements activated for the annular ring are driven in-phase

as indicated by the same delay values supplied by the row gate drivers in Figure 3-

9(c). Similarly, the Rx annular ring outputs the received ultrasound echoes through

the column buffers. The digitized waveforms from different channels can be summed

in-phase to form a single annular ring Rx waveform. The application of annular ring

apertures for forward-looking 3D ultrasonic imaging will be presented in Chapter 4.
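An annular ring aperture and the in-phase summation of its channel waveforms can be sketched as follows. The function names and the ring radii (given in units of element pitch) are hypothetical, chosen only for illustration; the ring dimensions used in the experiments are not specified in this section.

```python
def annular_ring_enable(n=16, r_inner=4.0, r_outer=6.0):
    """Enable matrix for a ring centered on the array; en[j][i] is row j, column i.

    r_inner, r_outer : assumed ring radii, in units of element pitch
    """
    c = (n - 1) / 2.0  # array center in element coordinates
    mask = []
    for j in range(n):
        row = []
        for i in range(n):
            r2 = (i - c) ** 2 + (j - c) ** 2
            row.append(1 if r_inner ** 2 <= r2 <= r_outer ** 2 else 0)
        mask.append(row)
    return mask


def sum_in_phase(waveforms):
    """Sum digitized channel waveforms sample by sample (no relative delays)."""
    return [sum(samples) for samples in zip(*waveforms)]


ring = annular_ring_enable()
```

Changing `r_inner` and `r_outer` reshapes the ring purely through the per-element enable bits, and `sum_in_phase` applied to the digitized channel outputs yields the single annular ring Rx waveform described above.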

3.5 Summary

The Column-Row-Parallel architecture provides both scalability and flexibility. First,

column and row select signals can be reprogrammed quickly, in a time that scales linearly

as the 2D array size grows (“N” scaling trend). They can activate rows or

columns for beam-formation in azimuth (X) or elevation (Y) directions. Second,

per-element enable bits offer fine granularity to form application-specific patterns,

such as the diagonal, checker board and the annular ring apertures. Moreover, each

control set has two multiplexed SR banks, which allow normal operation based on

one bank while reprogramming the other, or fast aperture switching between two

pre-programmed banks. Lastly, the architecture is compatible with many existing

beam-formation schemes [38, 40, 43, 55, 58], while offering new possibilities as will be

shown later.

Chapter 4

3D Ultrasonic Imaging System

Experiments

In this chapter, the system-level 3D ultrasonic imaging experiments are described.

The imaging system is assembled based on our custom designed prototype analog

front-end chip implementing the proposed Column-Row-Parallel architecture, inter-

facing to a 16x16 2D CMUT transducer array. The detailed design, implementation

and characterization of the AFE chip will be described in Chapters 5 and 6, but here

we will focus on the system-level capability of the proposed architecture and various

beam-formation algorithms suitable for the architecture.

4.1 The Hardware System Assembly

The experiments are conducted based on the real integrated hardware system, in

which a 16x16 CMUT chip and a 16x16 AFE custom chip are integrated as a complete

3D ultrasonic imaging front-end. As mentioned in Chapter 3, the layout area of

each AFE transceiver channel is element-matched to each CMUT element with a

size of 250µm × 250µm, so that the 16x16 array area of the CMUT and the ASIC

is matched and can be vertically integrated. For integration, each CMUT element

provides the electrical interconnection using a through silicon via (TSV) to a bonding

pad at the bottom side of the die, as described in many papers in the CMUT

Figure 4-1: System integration diagram showing the flip-chip bonding connection between CMUT and ASIC through a PCB interposer. The figure also shows the mechanical setup for imaging experiments, including an oil tank and a 3D translation stage.

literature [39–43,61]. Each AFE channel of the ASIC also provides a flip-chip bonding

pad. Solder balls have been placed onto all ASIC pads with a solder bumping process

as one of the final steps in ASIC fabrication by the foundry. Figure 4-1 shows how the

CMUT and ASIC are integrated together to form a 3D ultrasonic imaging front-end

system. To interconnect to both the CMUT die and the ASIC die while providing

footprint flexibility, a PCB interposer is fabricated and flip-chip bonded to the

CMUT on one side and the ASIC on the other [62, 63]. The PCB vias directly connect

an individual CMUT element to its ASIC transceiver channel. The CMUT-PCB-

ASIC assembly is then plugged into the main testing PCB for measurements. The

oil tank contains vegetable oil as an in-vitro approximation to human fat and a 3D

translation stage is made to help hold various measurement tools or imaging phantoms

for experiments.

The actual test setup picture is shown in Figure 4-2, and the corresponding block

Figure 4-2: The picture of the hardware system setup. Labeled items: tank with vegetable oil, 3D translation stage, 16-channel data acquisition system, main testing PCB, holder, and a metal ring phantom on top of the CMUT.

Figure 4-3: The block diagram of the hardware system setup. Blocks: FPGA control (ASIC initialization, DC-DC converter control, Tx / Rx switching, Tx beamforming, Rx gain control, column / row mode select, column / row select); PC (Rx beamforming, 3D image display); 16-channel data acquisition; power supplies (HV, analog, digital, etc.); main testing PCB; phantom and measurement setup.

Figure 4-4: The 16x16 CMUT die drawings: (a) the footprint of the CMUT; (b) the CMUT flip-chip bonding pad metal structure drawing, courtesy of [40]. Drawing annotations: die size 4.5mm x 4mm; 16x16 pads to the individual CMUT elements’ bottom plates at a pitch of 250µm; 2x16 pads to the CMUT common top membrane; gap of 250µm or 373.75µm; height of approximately 0.5µm.

diagram is shown in Figure 4-3 to give an abstract view.

4.1.1 The PCB-CMUT Connection

At the PCB-CMUT connection side, the footprint of the CMUT samples comes with

two different possible configurations. As shown by Figure 4-4(a), the CMUT main

elements’ pad array is 16x16 with the pitch of 250µm, and there is an additional 2x16

pad array used for connection to the common top membrane to provide the DC bias

voltage, or the CMUT’s “ground”. The gap between the “ground” and the main array

can be either 250µm or 373.75µm, depending on the specific CMUT batches made

at the supplier. This variation necessitates the PCB interposer, so that different

PCBs can be designed to fit different footprints. The CMUT flip-chip bonding pad

metal structure is also shown in Figure 4-4(b). The pad metal stack is composed of

Ti-Cu-Au and the pad diameter is 50µm at a pitch of 250µm.

Two PCB designs are correspondingly made to adapt to the two different CMUT

footprints, as shown in Figure 4-5(a) and (b). In version A, the gap between the

“ground” and the main array is 250µm, and all 18x16 pads are made in a pitch of

Figure 4-5: The two different PCB designs made to fit CMUT footprints: (a) the PCB version A’s footprint for CMUT with a gap distance of 250µm; (b) the PCB version B’s footprint for CMUT with a gap distance of 373.75µm, where only 1x16 pads are made on the PCB side due to space limitations.

250µm. In version B, because the gap is 373.75µm, only 1x16 pads for “ground”,

instead of 2x16 pads, are laid out on the PCB due to space limitations. But because

the 2x16 pads are redundant, the omission still allows correct electrical connection

between PCB and CMUT. All PCB pads’ pitch is also 250µm.

Because the CMUT pads are not solder bumped during initial fabrication, and

it is difficult to do solder bumping on an individual die, we need to perform solder

bumping for the pads on the PCB side, so that the flip-chip bonding can still be made

between PCB and CMUT. To accommodate PCB solder bumping, the PCB pads are

made with electroless nickel immersion gold (ENIG) with a metal stack of Ni-Cu-Au.

The pad diameter is 190µm. Because the pad pitch is 250µm, it leaves a clearance

of 60µm between two pads. The pads are drilled into vias of 150µm diameter with

a mechanical drill¹. The vias are filled and plated with ENIG at both sides of the

PCB. The solder mask is then applied over the pad with laser direct imaging (LDI)

technology, to define a solder mask thickness of roughly 13µm and a pad opening

¹ A laser drill could produce even smaller holes, but the smaller holes cannot be epoxy filled and plated over. As a result, 150µm mechanical drills are used.

Figure 4-6: The drawing of a PCB pad defined with a solder mask, and bumped with a solder ball. The PCB pad is used to do flip-chip bonding to the CMUT die. Drawing annotations: CMUT die 4.5mm x 4mm, all pitch 250µm; PCB size approximately 2 inch x 2 inch, thickness approximately 30 mil, FR4; pad opening 4 mil (100µm); solder mask thickness 0.5 mil; pad finish ENIG (Ni-Cu-Au); pad size 7.5 mil (190µm).

diameter of 100µm. The solder mask thickness and the pad opening size are defined

such that a solder ball with a diameter of 100µm can be placed onto the PCB pad. The

drawing of a PCB pad bumped with a solder ball is shown in Figure 4-6. The PCB

interposer is fabricated with FR4 material with a thickness of 0.76mm. The solder

balls have a commonly used composition of 63% Sn and 37% Pb, with a diameter of

100µm. Both versions of the PCB are fabricated by Sierra Circuits, Inc., Sunnyvale,

CA; and the PCB solder bumping is performed by Pac Tech - Packaging Technologies,

Santa Clara, CA.
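The pad geometry quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below uses the metric values from the text and the mil values annotated on the Figure 4-6 drawing; the helper name `mil_to_um` and the derived via annular width are illustrative additions, not figures stated in the text.

```python
UM_PER_MIL = 25.4  # 1 mil = 0.001 inch = 25.4 µm


def mil_to_um(mil):
    return mil * UM_PER_MIL


pitch_um = 250.0         # pad pitch
pad_um = 190.0           # PCB pad diameter
via_um = 150.0           # mechanically drilled via diameter
pad_opening_um = 100.0   # solder mask opening
ball_um = 100.0          # solder ball diameter

clearance_um = pitch_um - pad_um            # copper-to-copper gap between pads
annular_width_um = (pad_um - via_um) / 2.0  # derived: pad copper around the via

# Mil values annotated on the Figure 4-6 drawing, converted for comparison.
pad_from_mil = mil_to_um(7.5)        # compare with the 190 µm pad diameter
opening_from_mil = mil_to_um(4.0)    # compare with the 100 µm pad opening
mask_from_mil = mil_to_um(0.5)       # compare with the ~13 µm mask thickness
board_from_mil_mm = mil_to_um(30.0) / 1000.0  # compare with 0.76 mm thickness
```

The 60µm clearance stated in the text follows directly from the pitch and pad diameter, and the converted mil values agree with the quoted metric dimensions to within rounding.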

4.1.2 The PCB-ASIC Connection

At the PCB-ASIC connection side, the ASIC die is already solder bumped. Therefore,

the PCB pads at the ASIC side are without solder bumps and are used to do flip-chip

bonding directly. The ASIC footprint is shown in Figure 4-7(a). The center area of

the ASIC is occupied by a grid of 18x16 pads, which are 16x16 AFE channels and the

2x16 CMUT biasing pads. They are element-matched to the CMUT die’s connecting

pads through the PCB interposer. The perimeter of the ASIC is a ring of pads,

two pads wide, which are used as the I/O pads, providing the ASIC’s power supplies,

ground, input controls and output signals. These ASIC I/O pads are also flip-chip

bonded to the PCB interposer, and are further routed to the four edges of the PCB,

Figure 4-7: The ASIC die drawings: (a) the footprint of the ASIC, containing the center 18x16 pads to be element-matched and connected to CMUT through the PCB interposer, and the surrounding I/O pads; (b) the PCB interposer layout design that allows the ASIC I/O pads to be routed out to the PCB edges. Drawing annotations: die size 6mm x 5.5mm; pad pitch 250µm; 16x16 AFE channel pads plus 2x16 pads providing the CMUT bias; surrounding two-pad-wide rings for ASIC I/Os.

for interconnection to the main testing PCB, as shown in Figure 4-7(b). The fact that

the I/O pad ring is of 2-pad width ensures that only a 2-layer PCB design is needed.

Since the PCB interposer has a fine pitch of 250µm and a wire spacing as

tight as 60µm, keeping the PCB layer count to a minimum helps reduce

the manufacturing cost greatly.

In Figure 4-8, the ASIC flip-chip bonding pad’s metal structure is depicted. The

structure is made using the dedicated metal layers for flip-chip bonding pads provided

by the silicon process, in which MD is the redistribution metal layer for routing

from the ASIC’s top metal (M6) to the flip-chip bonding pads, and the Under

Bump Metallurgy (UBM) is the material forming the pad structure under the solder

bump.

Figure 4-8: The ASIC flip-chip bonding pad metal structure drawings: (a) the horizontal view of a flip-chip bonding pad in ASIC; (b) the cross-sectional view of the ASIC flip-chip bonding pad. Drawing annotations: pad size (UBM) “C” = 80µm; pad opening size “A” = 50µm; solder ball diameter approximately 100µm; solder ball height after bumping approximately 80µm.

4.1.3 The Flip-Chip Bonding Assembly Process

The bonding process is performed on an FC150 flip-chip bonder (SET Corporation SA,

Smart Equipment Technology, France). The process contains the flip-chip bonding

steps between PCB-CMUT and PCB-ASIC respectively. Each side’s assembly is first

tested and verified to be working with spare CMUT and ASIC chips, in which different

process parameters, such as the tacky flux, the bonding force, reflow temperature

profile, etc., are tweaked for an optimal result. Afterwards, a two-step bonding process

is performed to obtain the full assembly.

First Step: Bonding between PCB-ASIC

The first step is the flip-chip bonding between PCB and ASIC. As shown in Figure

4-9(a), the PCB is picked up by the arm (chip holder) of the flip-chip bonder, with

ASIC-side PCB pads facing down; and the ASIC is fixed horizontally by the chuck

(substrate holder) of the flip-chip bonder, with the solder-bumped ASIC pads facing

up. Tacky flux is applied onto the ASIC chip. 3000 grams of bonding force is applied

Figure 4-9: The CMUT-PCB-ASIC two-step flip-chip bonding process: (a) first step, the bonding between PCB and ASIC; (b) second step, the bonding between PCB and CMUT, with ASIC already bonded to PCB. Labeled parts: arm (chip holder), chuck (substrate holder), PCB, ASIC, CMUT.

and the bonded assembly is reflowed in a Centrotherm Reflow Oven, with a peak

temperature of 215°C and a dwell time of 12 seconds. The reflow is done in an N₂

atmosphere. After that, the half-finished assembly is cleaned in propanol overnight.

The PCB-ASIC connection shows a success rate of close to 100%, and a picture

of the connection is shown in Figure 4-10(a). Optionally, PCB-ASIC connection can

be verified by doing electrical tests on ASIC through the PCB interconnections. If

the ASIC operates as expected, the connections are very likely to be good since all

perimeter I/O pad connections are verified to be normal. During the flip-chip bonding,

because the arm vacuum holder holds the PCB by its CMUT-side, the solder bumps

at the PCB’s CMUT side are slightly deformed. However, since the bonded assembly

goes through a reflow process, any solder ball deformation is restored after the reflow.

Figure 4-10(b) shows the solder balls at PCB’s CMUT side after the reflow, and it

shows good uniformity in shape.

Second Step: Bonding between PCB-CMUT

The second step is the flip-chip bonding between PCB and CMUT. As shown in

Figure 4-9(b), the CMUT die is picked up by the arm, with its pads facing down

Figure 4-10: The CMUT-ASIC connection result pictures: (a) the bonded PCB-ASIC assembly shows good connectivity; (b) the solder bumps at the PCB’s CMUT side are reflowed after PCB-ASIC bonding, so any deformation is restored.

Figure 4-11: The PCB-CMUT bonding connection is verified by pulling off the test CMUT die from the PCB after bonding and reflow. (a) & (b) show the CMUT connection posts remain on the PCB after the pull, indicating good connectivity.

Figure 4-12: The finished CMUT-PCB-ASIC assembly: (a) cross-sectional view of the sandwich stack; (b) CMUT side assembly picture; (c) ASIC side assembly picture.

and its membrane surface touching the vacuum holder. The PCB-ASIC assembly

is fixed horizontally by the chuck, with the solder-bumped PCB’s CMUT-side pads

facing up. Tacky flux is applied onto the PCB’s CMUT side. 2000 grams of bonding

force is applied and the bonded assembly is reflowed in an N₂ atmosphere with a peak

temperature of 215°C and a dwell time of 12 seconds. The complete assembly is

thus finished. Underfill is not applied in either step, and no significant mechanical

degradation has been observed over the testing period.
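For reference, the two bonding steps can be captured as a small parameter record. The structure below is a hypothetical way to organize the values stated in this section (the field names and the `force_per_bump` helper are not from the original work), and the per-bump force is only a rough estimate that assumes the load is shared evenly across all bumps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BondStep:
    name: str
    held_by_arm: str       # part picked up by the arm (chip holder)
    fixed_on_chuck: str    # part fixed on the chuck (substrate holder)
    flux_applied_to: str
    force_grams: int
    peak_temp_c: int
    dwell_s: int
    atmosphere: str


RECIPE = (
    BondStep("PCB-ASIC", "PCB", "ASIC", "ASIC",
             force_grams=3000, peak_temp_c=215, dwell_s=12, atmosphere="N2"),
    BondStep("PCB-CMUT", "CMUT", "PCB-ASIC assembly", "PCB CMUT side",
             force_grams=2000, peak_temp_c=215, dwell_s=12, atmosphere="N2"),
)


def force_per_bump(step, n_bumps):
    """Rough average force per bump, assuming an evenly shared load."""
    return step.force_grams / n_bumps


# PCB version A carries 18x16 CMUT-side pads (16x16 elements + 2x16 bias pads).
cmut_force_per_bump = force_per_bump(RECIPE[1], 18 * 16)
```

Under the even-load assumption, the 2000 gram PCB-CMUT step corresponds to roughly 7 grams per bump on the version A footprint. The same estimate is not given for the PCB-ASIC step because the number of perimeter I/O pads is not stated here.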

The PCB-CMUT connection took us a few trials before reaching a fully-functional

16x16 array. Electrical characterization and fault-tolerant design techniques are also

key factors leading to the fully-functional array, which will be discussed in more detail

in Section 5.5. Mechanically, flip-chip bonding trials had been performed on spare

dummy CMUT chips to obtain a correct bonding force. When bonding forces of

2000-3000 grams were applied, the PCB-CMUT bonding produced the best results. This

was verified by pulling a test CMUT die off the PCB after bonding and reflow,

Figure 4-13: The acrylic tank drawings: (a) the tank dimension drawing; (b) the mounting between the oil tank and the CMUT-PCB-ASIC assembly.

as shown in Figure 4-11(a) and (b). The CMUT was removed with great force, and after its removal the majority of the CMUT TSV posts remained attached to the PCB pads, indicating a strong bond.

Figure 4-12(a) shows a complete sandwich stack of the CMUT-PCB-ASIC assembly, and Figure 4-12(b) and (c) show pictures of its CMUT side and ASIC side. It has also been confirmed over time that, although the arm's vacuum holder holds the CMUT by its membrane surface during the PCB-CMUT bonding process, this neither breaks the CMUT device nor affects its operation afterwards.

4.1.4 Mounting onto the Oil Tank

The CMUT-PCB-ASIC assembly is closely mounted onto an acrylic oil tank, so that

the assembly can be directly used to perform in-vitro imaging experiments. As shown


in Figure 4-13(a), the tank is a cube with a side length of approximately 3 inches.

The tank is designed to be mounted on top of the CMUT-PCB-ASIC assembly and

its bottom has a hole in the center to expose the CMUT chip. Threaded screw holes in the tank's bottom plane allow the PCB to be screw-mounted under the tank. To improve sealing, a rubber gasket is inserted between the tank and the PCB,

and silicone industrial sealant (General Electric RTV 110 Series) is applied to both

sides of the gasket.

To help hold imaging phantoms or measurement tools, a 3D translation stage is also added to the hardware setup, as already shown in Figure 4-1 and Figure 4-2. The translation stage is fixed with respect to the oil tank to avoid any relative

movement.

As a final comment for this section, the 16x16 CMUT device still contains a few

defective, non-functional elements. The ASIC is designed to be fault-tolerant to the

CMUT defects, as will be discussed in Section 5.5, so that the defective elements

are disabled while the remaining functional elements operate normally. This fault-

tolerant design strategy has been a key factor ensuring the successful assemblies.

Meanwhile, because the number of non-functional elements is limited (fewer than 10 in some of the best assemblies), their effect on the imaging quality is not severe. The imaging experiments presented in Sections 4.2 and 4.4 are carried out on the full 16x16 assemblies with defects. To mitigate the loss of elements, digital interpolation is applied to the received signals, although the transmitter side does not have the interpolation capability (see footnote 2). When the loss of elements is not acceptable, as is the case in Section 4.3, a 10x10 sub-array in which all elements are functional is used for the experimental demonstration.
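The text does not say which interpolation scheme is applied to the received signals. As a minimal sketch only, assuming the simplest possible choice (each defective element's waveform is replaced by the sample-wise mean of its functional 4-neighbours), receive-side interpolation could look as follows. The function name, the dictionary layout and the neighbour rule are all hypothetical and are not taken from the thesis.

```python
def interpolate_defective(waveforms, defective):
    """Fill waveforms of defective elements from functional 4-neighbours.

    waveforms: dict mapping (row, col) -> list of samples (hypothetical layout)
    defective: set of (row, col) positions flagged as non-functional
    Returns a new dict; functional elements are passed through unchanged.
    """
    filled = dict(waveforms)
    for (r, c) in defective:
        neighbours = [
            waveforms[p]
            for p in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if p in waveforms and p not in defective
        ]
        if not neighbours:
            continue  # no functional neighbour: leave the element untouched
        n_samples = len(neighbours[0])
        # Sample-wise mean over the functional neighbours (assumed scheme).
        filled[(r, c)] = [
            sum(w[k] for w in neighbours) / len(neighbours)
            for k in range(n_samples)
        ]
    return filled
```

Because the interpolation only touches stored receive data, it can be applied or skipped per experiment without changing the acquisition itself.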

Footnote 2: If transmitter-side interpolation is also desired, a pulser implemented as a linear amplifier can be used. The pulse amplitude and phase in the channels neighboring the missing one can then be adjusted to implement the interpolation. More discussion is in Section 7.2.

4.2 Plane-wave Coherent Compounding for Fast Volume Rate 3D Ultrasonic Imaging

For 3D ultrasonic imaging, we face not only the challenge of massive channel count as a hardware limitation (as already discussed in Chapter 3), but also the challenge of

a proper imaging scheme so that the 3D space can be imaged with satisfactory quality

and volume rate. A real-time imaging system needs good image quality (resolution,

contrast, etc.) for visualization and high volume rate to avoid severe motion blurring,

but these two considerations are conflicting requirements that are especially hard

to reconcile in 3D imaging. Compared with a 2D imaging system with a 1D array, a 3D imaging system with a 2D array has significantly more transceiver channels and a significantly larger image spatial span. More channels translate to more data to collect and process, and a larger spatial span requires transmitting and receiving more ultrasonic beams to cover the whole volumetric space.

Previously, efforts have been made to achieve fast volume rate in 3D imaging

systems by transmitting a “fat” ultrasonic beam at one time and doing parallel pro-

cessing of 8 ultrasonic beams in the receive mode [64,65]. This parallel beamforming

technique is called Explososcan and it achieves a volume rate of 8 volume/s, with 32

transmit channels and 32 receive channels on a 289-element 2D array (17x17). More

recent studies have pushed this concept to the extreme, by transmitting a plane-

wave ultrasonic beam and doing massively parallel beam-formation at the receive

end. The method is called plane-wave coherent compounding (PWCC) [66–69]. The

plane-wave emission illuminates a large space with one transmission, decreasing the

data acquisition time and greatly increasing the volume rate. The plane-wave can

also be steered to multiple different angles, and the received data from different angles

can be coherently compounded to yield ultrasonic images with better contrast and

less speckle. Moreover, because the data processing is done after all channel data is

collected, with synthetic beam-formation techniques, PWCC is in essence a software

beam-formation process. It is highly flexible and scalable, with its computational

Figure 4-14: The illustration of how PWCC works for 2D ultrasonic imaging, courtesy of [68].

complexity proportional to the number of pixels / voxels to be displayed in the final image.

4.2.1 PWCC for 2D Imaging

The PWCC method was demonstrated on a 1D array for 2D imaging in previous literature [66–69]. An intuitive illustration is shown in Figure 4-14. The 1D array emits plane-waves with different wavefront angles. For each transmitted plane-wave angle, the received waveforms from all channels are collected and stored. Normal delay-and-sum beam-formation is then carried out on each angle's data set to obtain a coarse 2D image of lower contrast and resolution. Finally, coherent

compounding is performed across images obtained from different transmit angles. As

a result, a higher quality image is produced.

The principle of coherent compounding is illustrated in Figure 4-15. The receive

side beamforming delays are calculated as in focused imaging. The echo path runs from the center of the transducer array (0, 0) to a spatial point in the 2D image with the coordinates (x, z), then back to the receiving element at (x1, 0); the receive side delay is the time-of-flight of the return leg,

Figure 4-15: The principle of coherent compounding used in PWCC, courtesy of [68]: (a) the imaging space; (b) the beam-formation delay calculation when the transmitted plane-wave is normal to the transducer surface (α = 0°); (c) the beam-formation delay calculation when the transmitted plane-wave is steered to an angle of α.

as in Equation (4.1) (c is sound speed):

$$\tau_{RX}(x_1, x, z) = \frac{\sqrt{z^2 + (x - x_1)^2}}{c}. \tag{4.1}$$

However, the transmit side beamforming delays need to take into account the

propagation of the plane-wave angle. It is done by adding a constant time offset

to the original delays used to generate the plane-wave, which effectively rotates the

plane wavefronts about a point “behind” the transducer by an angle of α. The delay

for a spatial point at (x, z) in the 2D image is in Equation (4.2):

$$\tau_{TX}(\alpha, x, z) = \frac{z\cos\alpha + x\sin\alpha}{c}. \tag{4.2}$$

Combining both the transmit and the receive side delays, the propagation time

from the center of the transducer array to (x, z) is expressed in Equation (4.3):

$$\tau(\alpha, x_1, x, z) = \tau_{TX} + \tau_{RX}. \tag{4.3}$$
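Equations (4.1) to (4.3) translate almost directly into code. The following is a minimal sketch rather than the implementation used in this work; the function names are illustrative, angles are assumed to be in radians, and the default sound speed of 1540 m/s is an assumed nominal soft-tissue value that should be replaced by the value of the actual medium.

```python
import math

C_SOUND = 1540.0  # m/s, assumed nominal value; not taken from the thesis


def tau_rx(x1, x, z, c=C_SOUND):
    """Eq. (4.1): return leg from image point (x, z) to the element at (x1, 0)."""
    return math.sqrt(z**2 + (x - x1)**2) / c


def tau_tx(alpha, x, z, c=C_SOUND):
    """Eq. (4.2): arrival time at (x, z) of a plane wave steered by alpha."""
    return (z * math.cos(alpha) + x * math.sin(alpha)) / c


def tau_total(alpha, x1, x, z, c=C_SOUND):
    """Eq. (4.3): total transmit-plus-receive delay for one channel."""
    return tau_tx(alpha, x, z, c) + tau_rx(x1, x, z, c)
```

For an unsteered wave (alpha = 0) and a point directly above the receiving element, `tau_total` reduces to the familiar round-trip time 2z/c.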

Additional techniques such as the constant F-number aperture scaling and apodiza-

tion, as mentioned in Section 2.2, can also be applied. Investigations in [68, 70]

have shown that approximately 7 to 9 plane-wave acquisitions are both adequate


and practical for coherent compounding. Plane-wave acquisition therefore needs roughly 10x fewer transmissions than traditional focused emission, while producing images of comparable quality. Extensive image quality measurement

metrics have been used to reach the conclusion, including: -10dB lateral resolution,

contrast, side-lobe amplitude, and image SNR. The reduction in number of transmis-

sions could translate to less system power consumption, or higher image frame rate,

with similar image quality as conventional methods.

4.2.2 Extending PWCC to 3D Imaging on the Column-Row-Parallel Architecture

The previous PWCC implements plane-wave steering along the azimuth (X) direction

only, so that the 2D images can be coherently compounded. It is quite natural to

extend the plane-wave insonification to be steered in both azimuth (X) and elevation

(Y) directions, so that the whole 3D space can be illuminated and the compounding

can be performed over the volumetric images. This possibility has been briefly men-

tioned in [71], in which a 32x32 2D transducer array is built and a 3D imaging system

is proposed. However, no detailed algorithm explanations or hardware measurement

results are exhibited.

In contrast, our proposed 3D imaging architecture could be a suitable hardware platform to support the plane-wave coherent compounding in 3D (PWCC3D).

The algorithm and the hardware realization will be described in this section.

PWCC3D Signal Processing

The beam-formation and coherent compounding procedure can be easily extended to

3D imaging, as shown in Figure 4-16. On our 16x16 imaging system assembly, each

data set of 256 received echo waveforms is associated with one transmit angle. In total, p transmit angles α_X1 to α_Xp are steered along the azimuth direction and q transmit angles β_Y1 to β_Yq are steered along the elevation direction. Delay-and-sum beam-formation is applied to each data set, yielding a 3D volumetric image for each transmit angle. Finally, the volumetric images for all angles can

be coherently compounded to produce a high quality 3D image.

Each voxel in the volumetric image is beam-formed from the 256-channel data. The equations for calculating the delay values for each channel can be adapted from Equations (4.1)-(4.3) for 3D imaging.

The receive side beamforming delays are again based on the time-of-flight, with the coordinates extended to 3D. The echo path runs from the center of the transducer array (0, 0, 0) to a spatial point (i.e. the voxel) in the 3D image with the coordinates (x, y, z), then back to the receiving element at (x1, y1, 0); the receive side delay is the return leg, as in Equation (4.4):

$$\tau_{RX}(x_1, y_1, x, y, z) = \frac{\sqrt{z^2 + (x - x_1)^2 + (y - y_1)^2}}{c}. \tag{4.4}$$

The transmit side beamforming delays are used to account for the propagation

of the plane-wave angle. Depending on whether the plane-wave is steered across the

azimuth or elevation direction, the delays are calculated differently. Equation (4.5) is

used when the column-parallel mode is active and the plane-waves are steered along

the azimuth direction, with a transmit angle of α. The delay for a voxel at (x, y, z)

in the 3D image is:

$$\tau_{TX\_azimuth}(\alpha, x, y, z) = \frac{z\cos\alpha + x\sin\alpha}{c}. \tag{4.5}$$

Equation (4.6) is used when the row-parallel mode is active and the plane-waves

are steered along the elevation direction, with a transmit angle of β. The delay for

a voxel at (x, y, z) in the 3D image is:

$$\tau_{TX\_elevation}(\beta, x, y, z) = \frac{z\cos\beta + y\sin\beta}{c}. \tag{4.6}$$

Combining both the transmit and the receive side delays, the delay value from

the center of the transducer array (0, 0, 0) to voxel (x, y, z) can be summarized with

Figure 4-16: The signal processing flow for PWCC3D on the Column-Row-Parallel architecture. [Block diagram: for each Tx angle (α_X1 ... α_Xp, β_Y1 ... β_Yq), the 16x16 Rx waveforms pass through a Hilbert transform and delay-and-sum beam-formation to give a 3D image in the complex domain; the per-angle 3D images are coherently compounded, and envelope detection (absolute value) yields the final 3D image.]

Equation (4.7) for azimuth and elevation steering:

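$$\begin{aligned}
\tau_{azimuth}(\alpha, x_1, y_1, x, y, z) &= \left(z\cos\alpha + x\sin\alpha + \sqrt{z^2 + (x - x_1)^2 + (y - y_1)^2}\right)/c, \\
\tau_{elevation}(\beta, x_1, y_1, x, y, z) &= \left(z\cos\beta + y\sin\beta + \sqrt{z^2 + (x - x_1)^2 + (y - y_1)^2}\right)/c.
\end{aligned} \tag{4.7}$$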
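To make the use of Equation (4.7) concrete, the sketch below beam-forms a single voxel from one transmit angle's data set. It is an illustration under stated assumptions, not the beamformer used for the results in this chapter: the function names and the dictionary layout are hypothetical, sample 0 is assumed to coincide with the transmit instant, and nearest-sample lookup is used where a practical beamformer would normally interpolate between samples.

```python
import math


def delay_3d(angle, steer, elem, voxel, c):
    """Eq. (4.7). steer is 'azimuth' (angle = alpha) or 'elevation' (angle = beta)."""
    x1, y1 = elem
    x, y, z = voxel
    lateral = x if steer == "azimuth" else y
    tx = z * math.cos(angle) + lateral * math.sin(angle)
    rx = math.sqrt(z**2 + (x - x1)**2 + (y - y1)**2)
    return (tx + rx) / c


def das_voxel(channels, angle, steer, voxel, c, fs):
    """Delay-and-sum one voxel from one Tx angle's data set.

    channels: dict mapping element position (x1, y1) in metres -> sample list
    c: sound speed in m/s, fs: sampling rate in Hz (both supplied by the caller)
    """
    total = 0.0
    for elem, samples in channels.items():
        idx = round(delay_3d(angle, steer, elem, voxel, c) * fs)
        if 0 <= idx < len(samples):  # ignore delays outside the record
            total += samples[idx]
    return total
```

Calling `das_voxel` once per voxel and per transmit angle gives the coarse per-angle volumes that are later compounded; since each voxel is computed independently, the loop over voxels can be restricted to any region of interest.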

For each transmit angle, a coarse 3D image is formed by applying the delay-and-sum beam-formation algorithm to the 256-channel data, with the delay values calculated

from Equation (4.7). Figure 4-16 indicates that the Hilbert transformation is first

performed to convert the original channel data into the “in-phase” signal I(t) and

“quadrature” signal Q(t) to preserve the phase information. When compounding is

performed across different transmit angles, the voxel values are added in both I(t) and

Q(t), which maintains the data coherency, hence the name coherent compounding.

The final compounded 3D image is obtained by envelope detection, i.e. by taking the amplitude $\sqrt{I(t)^2 + Q(t)^2}$ of each voxel.
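The difference between coherent and incoherent compounding can be seen on a single voxel. In the sketch below, which uses hypothetical function names and treats each per-angle voxel value as a Python complex number I + jQ, the coherent version sums I and Q across angles before envelope detection, while the incoherent version (shown only for contrast, it is not used in PWCC3D) takes the envelope first and discards the phase.

```python
def compound_coherent(voxel_iq_per_angle):
    """Sum complex (I + jQ) voxel values across Tx angles, then take the envelope."""
    total = sum(voxel_iq_per_angle, 0j)
    return abs(total)  # sqrt(I^2 + Q^2) of the compounded value


def compound_incoherent(voxel_iq_per_angle):
    """For contrast only: envelope per angle first, then sum (phase discarded)."""
    return sum(abs(v) for v in voxel_iq_per_angle)


# Contributions that agree in phase across angles add up in both schemes ...
print(compound_coherent([1 + 0j, 1 + 0j]), compound_incoherent([1 + 0j, 1 + 0j]))
# ... but contributions whose phase flips between angles cancel only coherently.
print(compound_coherent([1 + 0j, -1 + 0j]), compound_incoherent([1 + 0j, -1 + 0j]))
```

This is, in simplified form, why off-axis artifacts whose phase varies with the transmit angle tend to be suppressed, while a true scatterer at the voxel adds constructively.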

Because the beam-formation is performed on each voxel while utilizing the same

set of data, the beamformer is a software beamformer and the processing is very

scalable and flexible. The data acquisition is only done once so that the data under

every Tx angle is stored. The beam-formation can be done independently over the

voxels of interest in the space. One could first perform beam-formation and image

display over a large space with large voxel spacing for a coarse volumetric image; after

spotting a feature of interest, one could perform a second-pass processing using the same

collected data, over a smaller space with finer voxel spacing, which would generate

higher definition volumetric images. In this way, a flexible, low-power, software beam-

former can be designed to adapt to different user scenarios for optimal trade-offs

between power consumption, processing speed and image quality.

In addition, the constant F-number technique is applied during delay-and-sum beam-formation (see Section 2.2). Voxels closer to the transducer surface have a smaller active aperture contributing to their beam-formation, while voxels farther away use a larger active aperture.
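As a rough sketch of the constant F-number rule, the snippet below selects the elements that contribute to a given voxel. It uses the standard definition F = depth / aperture width, so the active aperture width at depth z is z / F. The square aperture centred under the voxel, the function name and the 250µm pitch in the usage lines are assumptions made for illustration.

```python
def active_elements(elems, voxel, f_number):
    """Return the element positions inside the active aperture for one voxel.

    elems: iterable of (x1, y1) element positions in metres
    voxel: (x, y, z) in metres; the aperture width at depth z is z / f_number
    A square aperture centred under the voxel is assumed.
    """
    x, y, z = voxel
    half = z / f_number / 2.0
    return [
        (x1, y1)
        for (x1, y1) in elems
        if abs(x - x1) <= half and abs(y - y1) <= half
    ]


# Usage with a 16x16 grid and an assumed 250 um pitch, centred on the origin.
pitch = 250e-6
grid = [((i - 7.5) * pitch, (j - 7.5) * pitch) for i in range(16) for j in range(16)]
shallow = active_elements(grid, (0.0, 0.0, 1.75e-3), 1.75)  # 1 mm wide aperture
deep = active_elements(grid, (0.0, 0.0, 7.5e-3), 1.75)      # wider than the array
print(len(shallow), len(deep))
```

With these assumed numbers, the shallow voxel uses only the central 4x4 elements, whereas the voxel at 7.5mm already uses the full array.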


Implementing PWCC3D on the Column-Row-Parallel Architecture

The implementation of PWCC3D on the proposed Column-Row-Parallel architecture

is shown in Figure 4-17. All elements are turned on during the transmit phase, so

that a steered plane-wave can be emitted. In Figure 4-17(a), the array is configured

in the column-parallel mode for its Tx path. Since each of the 16 Tx pulser drivers at

the column side is supplied with an independent delay to drive the 16 elements along

the same column, the 16 columns can be delayed with respect to each other, thus

implementing beam-steering along the azimuth (X) direction. Similarly, to achieve

beam-steering along the elevation (Y) direction, as shown in Figure 4-17(b), the

array’s Tx path is arranged in the row-parallel mode, and 16 elements along the same

row are driven by the shared Tx pulser driver at the row side.
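The per-column (or per-row) delays that steer the plane wave follow from the geometry: a wavefront tilted by an angle θ reaches a line at lateral position x later by x·sin θ / c. The sketch below computes the 16 delays for one steering angle and shifts them so that the earliest pulser fires at time zero. It is a sketch only; the function name, the element pitch and the sound speed are assumptions supplied by the caller, and any quantization to the pulser's delay resolution is left out.

```python
import math


def steering_delays(n, pitch, angle, c):
    """Tx delays in seconds for n columns (or rows) and a steering angle in radians.

    pitch: element pitch in metres, c: sound speed in m/s (both assumed inputs).
    The result is shifted so that the smallest delay is exactly zero.
    """
    raw = [i * pitch * math.sin(angle) / c for i in range(n)]
    offset = min(raw)
    return [t - offset for t in raw]
```

In column-parallel mode the list would be loaded into the 16 column pulser drivers, and in row-parallel mode into the 16 row pulser drivers; a negative angle simply reverses the firing order.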

During the receive phase, the receive channels are turned on row-by-row, as can

be seen from Figures 4-17(c)-(e). For each row, 16 ultrasonic echo waveforms are

sensed by the activated CMUT elements and amplified by the receiver AFE. The

waveforms are then buffered on-chip by the column buffers and digitized by external ADCs, before being stored digitally in a PC. To collect all 256 elements' echo waveforms, 16

consecutive ultrasonic insonifications of the same transmit angle are generated, while

the 16 rows are activated serially, such that the whole 16x16 aperture is swept.

This operation sequence is also illustrated in Figure 4-18. There are p angles along

azimuth and q angles along elevation used to generate the final compounded 3D image.

Each angle is transmitted and the echo waveforms are collected for all 256 channels.

Under each angle, 16 transmit-receive repetitions are needed to acquire all channel

data as shown in the inset of Figure 4-18. As a result, a total of 16 × (p + q) transmit-receive repetitions are needed to produce one final compounded image.
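The operation sequence described above can be written as a short generator, one item per transmit-receive repetition. This is an illustrative sketch: the function name and the tuple layout are hypothetical, and the angle values in the usage lines are the ones used later in the ring phantom experiment.

```python
def acquisition_schedule(n_rows, azimuth_angles, elevation_angles):
    """Yield (tx_mode, angle, rx_row) for every transmit-receive repetition.

    Azimuth angles use the column-parallel Tx mode, elevation angles the
    row-parallel Tx mode; each angle is repeated once per active Rx row.
    """
    for mode, angles in (("column-parallel", azimuth_angles),
                         ("row-parallel", elevation_angles)):
        for angle in angles:
            for row in range(1, n_rows + 1):
                yield (mode, angle, row)


angles = [-6.7, -3.3, 0.0, 3.3, 6.7]  # degrees
schedule = list(acquisition_schedule(16, angles, angles))
print(len(schedule))  # 16 x (p + q) repetitions
```

Counting the items reproduces the 16 × (p + q) total, here 160 repetitions for p = q = 5.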

For the general case of an NxN array, the number of transmit-receive repetitions needed to acquire one volumetric image becomes N × (p + q). For an imaging system running

at a certain PRF, the time for one transmit-receive repetition is the PRP (PRF and

PRP are defined in Section 2.1). Therefore, as shown in (4.8) and (4.9), the acquisition

time increases linearly with respect to array size growth (“N” scaling trend); and the

Figure 4-17: The PWCC3D implementation on the Column-Row-Parallel architecture: (a) Tx beam-steering along azimuth (X) direction using column-parallel mode; (b) Tx beam-steering along elevation (Y) direction using row-parallel mode; (c)-(e) Rx signal acquisition, sweeping through 16 rows for each transmit angle.

Figure 4-18: The sequence of operation to implement PWCC3D on the Column-Row-Parallel architecture.

volume rate of a PWCC3D imaging system is inversely proportional to N . This is a

benign scaling trend for 3D imaging systems, because of the row-by-row or column-

by-column data reception capability provided by the architecture.

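$$\text{Acquisition Time} = N \times (p+q) \times PRP = \frac{N \times (p+q)}{PRF} \Leftrightarrow O(N), \tag{4.8}$$

$$\text{Volume Rate} = \frac{1}{\text{Acquisition Time}} = \frac{PRF}{N \times (p+q)} \propto N^{-1}. \tag{4.9}$$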
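Equations (4.8) and (4.9) are simple enough to evaluate directly. The sketch below (function names are illustrative) does so for the two configurations quoted in this chapter, a 16x16 and a 64x64 array with 10 plane-wave angles at a 10kHz PRF.

```python
def acquisition_time(n, p, q, prf_hz):
    """Eq. (4.8): seconds needed to acquire one compounded volume."""
    return n * (p + q) / prf_hz


def volume_rate(n, p, q, prf_hz):
    """Eq. (4.9): compounded volumes per second."""
    return prf_hz / (n * (p + q))


# 5 azimuth + 5 elevation angles at a 10 kHz PRF
print(acquisition_time(16, 5, 5, 10e3), volume_rate(16, 5, 5, 10e3))
print(acquisition_time(64, 5, 5, 10e3), volume_rate(64, 5, 5, 10e3))
```

Going from N = 16 to N = 64 multiplies the acquisition time by 4 and divides the volume rate by 4, which is the O(N) trend stated above.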

4.2.3 PWCC3D Results: Simulations and Measurements

To evaluate the performance of PWCC3D, both Field II simulations and real mea-

surements are carried out. Simulations are compared against the measurements, and

various Tx angles are used to demonstrate the PWCC3D algorithm.

Figure 4-19: The setup of the wire phantom imaging experiment using PWCC3D algorithm: (a) a single plane-wave is transmitted to image the wire phantom; (b) five different Tx angles are used along the azimuth direction for PWCC3D. [Panel labels: (a) single plane-wave, averaged 5x; (b) 5 plane-wave angles (−6.7°, −3.3°, 0°, 3.3°, 6.7°). Axes: X / azimuth, Y / elevation, Z / depth.]

Wire Phantom

A wire phantom is first imaged by the 16x16 array setup in simulation and mea-

surements, so that the spatial impulse response can be recorded for the imaging

system. The physical setup is shown in Figure 4-19. The wire phantom is placed

7.5mm away from the transducer surface and parallel to it. The transmit pulsation is 2 bursts of 8.33MHz pulses (see footnote 3). A constant F-number of 1.75 is used for beam-formation and the rectangular window is used for both Tx and Rx apodization. In this experiment, single-angle Tx plane-wave insonification is compared against five Tx angles (−6.7°, −3.3°, 0°, 3.3°, 6.7°) compounded along the azimuth direction. Compounding along the elevation direction is not performed for the wire phantom, because its benefit would not be evident for a wire spanning the elevation direction; compounding along the azimuth direction, however, yields a large improvement, as the simulated and measured images show. The volumetric images are displayed with a 20dB dynamic range.

The simulation results are shown in Figure 4-20. It is done by simulating a line

of ideal point scatterers in space to mimic the metal wire in real experiment. The

vertical cross-sectional images of the wire phantom (point spread function) imaged

by single plane-wave and compounded are visually compared in Figure 4-20(a) and

(b). It can be seen that the 5-angle compounded image is of higher contrast and

Footnote 3: The choice of 2 bursts of pulses is for good axial image resolution, as discussed in Section 2.1.

better resolution. This is confirmed by the quantitative comparison in Figure 4-20(c)

and (d), where the lateral point scatterer’s amplitudes are plotted. The compounded

image has a finer -10dB lateral resolution (0.50mm compared to 0.58mm) and a

lower side-lobe amplitude (less than -30dB compared to -12dB) than single plane-

wave. The side-lobes can be more readily seen in Figure 4-20(e) and (f), where the

horizontal cross-sectional images of the wire are shown. While single plane-wave

transmit generates visible “fake” wires (i.e. the side-lobes) at the two sides of the

main wire location, the compounded image has no side wires visible.

Real imaging experiments on a metal wire phantom are also performed. The metal

wire has a diameter of 0.48mm and is placed 7.5mm away from the transducer. The

same pulsation and PWCC3D beam-formation are used to form the images. Similarly,

single-angle vs. 5-angle compounding results are compared in Figure 4-21. The mea-

sured wire images show quality degradations due to the wire thickness, array element

and circuit mismatches. But PWCC3D still demonstrates significant improvement

for image resolution, where the -10dB resolution is improved by over 46% in this case

(from 1.32mm to 0.71mm). The axial resolution is determined by the pulse frequency and the number of bursts, so the measured -10dB axial resolution is similar in the two cases (0.39mm for single-angle vs. 0.36mm for 5-angle).
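As a quick check of the quoted lateral figure, using only the two measured -10dB resolutions given above, the relative improvement is

$$\frac{1.32\,\mathrm{mm} - 0.71\,\mathrm{mm}}{1.32\,\mathrm{mm}} = \frac{0.61}{1.32} \approx 0.462,$$

i.e. about 46.2%, consistent with "over 46%". The same calculation on the simulated values (0.58mm to 0.50mm) gives $0.08 / 0.58 \approx 13.8\%$, so the relative gain from compounding is considerably larger in the measurement than in the idealized simulation.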

Ring Phantom

The wire phantom displays the benefit of PWCC3D only from the azimuth direction.

Here a metal ring phantom is used to fully demonstrate the benefit of PWCC3D for

a volumetric image. As shown in Figure 4-22, the ring is placed horizontally above

the transducer surface with a vertical distance of 7.5mm. The transmit pulsation is 2

bursts of 8.33MHz pulses, the constant F-number is 1.75, and the rectangular window

is used for both Tx and Rx apodization. The compounding employs 5 different Tx plane-wave steering angles (−6.7°, −3.3°, 0°, 3.3°, 6.7°) in each of the azimuth and elevation directions, so that the ring image can be enhanced in all directions.

In order to investigate closely the effect of coherent compounding in both az-

imuth and elevation directions, Figure 4-23 shows a comparison between different

Figure 4-20: Simulation results of a wire phantom: (a) vertical cross-sectional image produced from single angle plane-wave insonification; (b) vertical cross-sectional image produced from 5-angle coherent compounded plane-wave insonification; (c) lateral resolution plot from single plane-wave; (d) lateral resolution plot from 5-angle plane-waves; (e) horizontal cross-sectional image from single plane-wave; (f) horizontal cross-sectional image from 5-angle plane-waves. [Surviving annotations: single-angle lateral -10dB resolution 0.58mm and side-lobe -12dB; 5-angle side-lobe < -30dB. Axes: X(mm), Y(mm), Z(mm).]

Figure 4-21: Measurement results of a wire phantom: (a) vertical cross-sectional image produced from single angle plane-wave insonification; (b) vertical cross-sectional image produced from 5-angle coherent compounded plane-wave insonification; (c) lateral resolution plot from single plane-wave; (d) lateral resolution plot from 5-angle plane-waves; (e) horizontal cross-sectional image from single plane-wave; (f) horizontal cross-sectional image from 5-angle plane-waves. [Axes: X(mm), Y(mm), Z(mm).]

Figure 4-22: The setup of the ring phantom imaging experiment using PWCC3D algorithm: (a) a single plane-wave is transmitted to image the phantom; (b) five different Tx angles are used along the azimuth direction and another five Tx angles along the elevation direction to image the phantom with PWCC3D. [Panel labels: (a) single plane-wave, averaged 10x; (b) 10 Tx angles in X & Y (−6.7°, −3.3°, 0°, 3.3°, 6.7°). Axes: X / azimuth, Y / elevation, Z / depth.]

compounding schemes. Comparing Figure 4-23(a) and (b), the 5-angle compounding in the X direction suppresses the side-lobes along the azimuth much more than the single-angle plane-wave. The most noticeable difference is that the artifact in the blue-circled region in Figure 4-23(b) is much less evident than in Figure 4-23(a). However, the side-lobes along the elevation are not suppressed, as can be seen from the red-circled region in Figure 4-23(b), which looks almost the same as in Figure 4-23(a).

Similarly, comparing Figure 4-23(a) and (c), the 5-angle compounding in the Y direction suppresses the side-lobes along the elevation much more than the single-angle plane-wave. The artifact along elevation in the blue-circled region in Figure 4-23(c) is suppressed, but the side-lobes along the azimuth in the red-circled region remain and look almost the same as in Figure 4-23(a). When compounding along both the azimuth and elevation directions is combined, as in Figure 4-23(d), the artifacts along both directions are suppressed, and the image quality is enhanced the most relative to Figure 4-23(a).

Figure 4-24 quantifies the performance improvement of PWCC3D for the ring

images. The vertical cross-sectional images are used to show the side-lobe amplitudes

of the ring images from the single-angle plane-wave insonification and the 10-angle

X & Y steered plane-waves. As can be seen, the side-lobe level at the center of the ring improves from -7.3dB to -13.3dB, a 6dB improvement with 10-angle

Figure 4-23: Measured horizontal cross-sectional images of a ring phantom: (a) single-angle Tx plane-wave; (b) 5-angle Tx plane-wave compounding along azimuth direction; (c) 5-angle Tx plane-wave compounding along elevation direction; (d) compounding across all 5-angle azimuth and 5-angle elevation directions. [Annotations in (b) and (c): "side-lobe suppressed" and "side-lobe remains". Axes: X(mm), Y(mm).]

coherent compounding. A 10kHz PRF is used for the 10-angle compounding scheme

in our experiments. According to (4.8) and (4.9), where (p + q) = 10, N = 16, the

acquisition time for one volumetric image is 16ms and the volume rate reaches 62.5

volume/s. As mentioned in Section 4.2.2, the volume rate decreases in inverse proportion to the array size N and to the number of plane-wave angles, which can be traded off for better image quality.

Cyst Phantom Simulation on a 64x64 Array

As an extrapolation of our current hardware setup, a more complex setup is simulated

in Field II to investigate how technology scaling can push PWCC3D performance

further.

A hypothetical 64x64 2D array with an element pitch of 250µm is used in the

Figure 4-24: Measured vertical cross-sectional images of a ring phantom: (a) single-angle Tx plane-wave; (b) compounding across all 5-angle azimuth and 5-angle elevation directions; (c) lateral resolution plot of ring image from single-angle Tx plane-wave; (d) lateral resolution plot of ring image from 5-angle X and 5-angle Y plane-waves. [Annotations: side-lobe at center -7.3dB (single-angle) and -13.3dB (5-angle X + 5-angle Y). Axes: X(mm), Z(mm).]

simulation to provide a bigger aperture. A cyst phantom spanning depths from 20mm to 50mm is set up as the imaging target, which serves as a benchmark for evaluating the speckle reduction performance of PWCC3D. Three cysts are located at (−3, 0, 25)mm, (0, 0, 35)mm and (3, 0, 45)mm, respectively. Each cyst is 6mm in diameter. The cysts are surrounded by randomly spaced point scatterers mimicking normal tissue. The transmit pulsation is 2 bursts of 5MHz pulses, the constant F-number is 1.75, and the rectangular window is used for both Tx and Rx apodization.

The XZ cross-sectional images are shown. Figure 4-25(a) shows the single-angle

plane-wave image while Figure 4-25(b) shows a compounded one, in which 5 angles

along azimuth (−4°, −2°, 0°, 2°, 4°) and 5 angles along elevation (−4°, −2°, 0°, 2°, 4°)

Figure 4-25: Simulated XZ cross-sectional images showing the three cysts in one slice image: (a) image generated from single-angle plane-wave; (b) image generated from 5 azimuth-angle and 5 elevation-angle plane-waves compounded; (c) the cross-sectional image location in 3D space.

are used. The associated simulation setup is illustrated in Figure 4-25(c). The com-

parison shows a much improved image contrast by utilizing PWCC3D.

The YZ cross-sectional images show the individual cysts at different depths. Figures 4-26, 4-27 and 4-28 show slice image comparisons of the three cysts.

Finally, the volume rate of 10-angle compounding PWCC3D implemented on a 64x64 array with the Column-Row-Parallel architecture would be 15.6 volume/s, assuming a 10kHz PRF, according to (4.9). Compared to the 10-angle compounding on the 16x16 array earlier in Section 4.2.3, the 64x64 system's volume rate is lower by exactly 4x, but the image resolution and contrast improve with the bigger array.

4.2.4 Discussion

The proposed PWCC3D algorithm on the Column-Row-Parallel architecture is a

suitable solution for high volume rate 3D ultrasonic imaging applications. The volume

rate can be traded off with image quality easily. More Tx angles lead to better image

Figure 4-26: Simulated YZ cross-sectional images showing the cyst at (−3, 0, 25)mm: (a) image generated from single-angle plane-wave; (b) image generated from 5 azimuth-angle and 5 elevation-angle plane-waves compounded; (c) the cross-sectional image location in 3D space.

Figure 4-27: Simulated YZ cross-sectional images showing the cyst at (0, 0, 35)mm: (a) image generated from single-angle plane-wave; (b) image generated from 5 azimuth-angle and 5 elevation-angle plane-waves compounded; (c) the cross-sectional image location in 3D space.

Figure 4-28: Simulated YZ cross-sectional images showing the cyst at (3, 0, 45)mm: (a) image generated from single-angle plane-wave; (b) image generated from 5 azimuth-angle and 5 elevation-angle plane-waves compounded; (c) the cross-sectional image location in 3D space.

resolution and contrast, while the acquisition time would increase linearly and the

volume rate would be reduced. This is a flexible feature that allows PWCC3D to be

adaptive to a wide range of ultrasonic applications.

PWCC3D is also flexible for data processing as a software beamformer, where

volumetric images of different spatial resolution and/or at different regions can be

generated with the same acquired data. The software beamformer capability, together with the flexibility of choosing different plane-wave angles, provides rich knobs that can enable autonomous ultrasonic imaging devices. Such an imaging device could dynamically reconfigure the AFE and the beamformer, so that data acquisition and processing are performed with a complexity suited to the target scene. System-level power saving and performance improvement can be optimized under this framework [7].

The cyst phantom simulation in Section 4.2.3 has shown how scaling brings im-

proved image resolution and contrast performance. Thanks to the Column-Row-

Parallel architecture, the 3D imaging system's volume rate decreases only in inverse proportion to N rather than N^2, and the interconnection complexity of the front-end is only as high as that of a 1D array in a 2D imaging front-end.

The Column-Row-Parallel architecture and PWCC algorithm can also be applied

onto a 2D array with the size of NxM, in which M is smaller than N (for example,

64x4). This type of “narrow” 2D array is sometimes called a 1.5D array, in that the

size usually scales at the N (azimuth) dimension while the M (elevation) dimension

is somewhat fixed. By operating the array row-by-row (each row is N elements), it

only takes M transmit-receive repetitions to collect data for one plane-wave angle.

Equations (4.8) and (4.9) can be revised to (4.10) and (4.11).

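\[ \text{Acquisition Time} = \frac{M \times (p + q)}{\text{PRF}} \propto \text{Constant}, \tag{4.10} \]

\[ \text{Volume Rate} = \frac{\text{PRF}}{M \times (p + q)} \propto \text{Constant}. \tag{4.11} \]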

Because M is a relatively fixed number in size, the volume rate scaling becomes a

constant as the array size increases in the N dimension. This is the same scaling

trend for a 2D imaging system with a 1D array. The added array elements along

the N (azimuth) dimension contribute to the improved lateral resolution without

degrading the frame rate. Furthermore, the plane-wave coherent compounding along

the M (elevation) dimension effectively realizes the elevational beam-focusing, which

is traditionally implemented with a physical acoustic lens or electrical analog delay

lines on a 1D array.
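The scaling in Equations (4.10) and (4.11) can be checked numerically. The following is a minimal sketch, assuming that p and q are the numbers of azimuth and elevation plane-wave angles (5 and 5, as in the compounded images of Figures 4-26 to 4-28) and assuming a 10kHz PRF; the function names are hypothetical.

```python
def acquisition_time(repetitions_per_angle, p, q, prf_hz):
    """Time to acquire one volume, following the form of Equation (4.10)."""
    return repetitions_per_angle * (p + q) / prf_hz


def volume_rate(repetitions_per_angle, p, q, prf_hz):
    """Volumes per second, following the form of Equation (4.11)."""
    return prf_hz / (repetitions_per_angle * (p + q))


PRF_HZ = 10_000  # assumed pulse repetition frequency
P, Q = 5, 5      # assumed numbers of azimuth and elevation plane-wave angles

# Grow the azimuth dimension N while the elevation dimension M stays at 4.
for n, m in [(16, 4), (64, 4), (256, 4)]:
    rate = volume_rate(m, P, Q, PRF_HZ)
    print(f"{n}x{m} array: {rate:.1f} volume/s")
```

Under these assumptions the rate depends only on M, so every array size in the loop gives the same value, whereas a square 16x16 array operated row-by-row would need 16 repetitions per angle.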

4.3 Interleaved Checker Board Tx Apertures with I&Q Excitations for HD2 Reduction in Ultrasonic Harmonic Imaging

Using the Column-Row-Parallel architecture, a new way to reduce Tx second harmonic distortion (HD2) for ultrasonic tissue harmonic imaging (THI) is proposed. It applies simultaneous I&Q excitations to two interleaved checker board Tx apertures, in order to mitigate HD2 from both transducers and circuits for arbitrary pulse shapes. In particular, the CMUT nonlinearity due to its electrostatic mechanism is suppressed.

4.3.1 THI Principle and Previous Methods

Tissue harmonic imaging is a widely used imaging mode [18–22]. The ultrasound system sends out bursts of ultrasound at the fundamental frequency. Human tissue or contrast agents (micro-bubbles injected into the human body) can react nonlinearly to the ultrasonic wave. Specifically, when a sinusoidal pressure wave propagates through the medium, the tissue or bubble contracts at the positive pressure (the first half of the sine wave) and expands at the negative pressure (the second half of the sine wave). The contraction and expansion cause slightly different propagation speeds for the ultrasonic wave, distorting the wave asymmetrically and generating a weak second harmonic component.

Instead of tuning to the fundamental reflected ultrasonic echoes as in conventional ultrasound, THI mode looks for that weak second harmonic echo signal, while filtering out the fundamental component. The benefit is that ultrasonic beam-formation using the harmonic signal has a narrower beamwidth and lower side-lobes. THI therefore gains improved spatial resolution for better visualization, and improved contrast resolution for better demonstration of subtle differences.

However, the harmonic signal also tends to be weaker, mainly for two reasons. First, the nonlinear generation of the second harmonic from the tissue is not strong to begin with. Contrast agents such as micro-bubbles can be injected into the human body to increase the harmonic generation, but it is still weak compared to the fundamental component. Second, the tissue medium presents a frequency-dependent attenuation for ultrasound propagation: an ultrasonic wave at a higher frequency sees more attenuation during propagation [12]. Empirically the attenuation coefficient is about 1-2dB/MHz/cm. If a 5MHz fundamental signal is used and a 10MHz second harmonic component is generated, the propagation attenuation per centimeter for the second harmonic is 5-10dB more than for the fundamental. As a result, the fundamental signal needs to be filtered or suppressed at the receive side, while at the transmit side, the second harmonic generation from the transducer needs to be kept at a minimum (< −30dBc), so that only the harmonic signal produced by the human body is received in the end.
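The 5-10dB figure follows directly from the quoted attenuation coefficient. A minimal sketch of the arithmetic, assuming attenuation that is linear in frequency (as the dB/MHz/cm unit implies); the function name is hypothetical:

```python
def extra_attenuation_db_per_cm(f0_mhz, alpha_db_per_mhz_cm):
    """Extra attenuation per cm of the second harmonic (2*f0) over the fundamental (f0)."""
    return alpha_db_per_mhz_cm * (2 * f0_mhz - f0_mhz)


# Lower and upper ends of the empirical 1-2 dB/MHz/cm range, 5 MHz fundamental.
for alpha in (1.0, 2.0):
    extra = extra_attenuation_db_per_cm(5.0, alpha)
    print(f"alpha = {alpha} dB/MHz/cm: second harmonic loses {extra} dB/cm more")
```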

Compared to traditional PZT transducers, the CMUT is at a disadvantage in THI mode because of the nonlinear transmit property of its electrostatic actuation mechanism [19,20], where the actuation force (hence the generated acoustic pressure) is proportional to the square of the electrical pulse excitation V(t). Excessive HD2 is generated during transmit, making the CMUT difficult to use for harmonic imaging.

Previously, methods to reduce the transmit HD2 generation in CMUT have been

explored. For example, work in [20] focused on pre-distorting the electrical excitation

signal’s pulse shape, such that the frequency content of the actual transmitted acoustic

pulse is HD2 free. The method is heavily dependent on detailed CMUT transmit

properties and its bias voltage, requiring complicated and frequent calibration. Sub-

harmonic driving is also tried in [19], but because CMUT has a DC bias voltage,

the emitted acoustic pulse still contains the sub-harmonic frequency content which

becomes an additional interference.

Instead of working on individual elements, [21,22,72] try to cancel the harmonics at the transducer level. In [21, 22], a technique called second harmonic inversion is used. A pulse of shape I(t) is first transmitted. On the next repetition, a delayed pulse shape Q(t) is transmitted; Q(t) is delayed by a quarter cycle with respect to I(t). At the fundamental frequency, I(t) and Q(t) are out of phase by π/2, while at the second harmonic frequency, the components from I(t) and Q(t) have a phase difference of π. As a result, the HD2 from the transmitter can be cancelled by synthetically adding two consecutive received echoes. The scheme is clever, but its drawbacks are that the synthetic combining halves the effective PRF of the system, and that motion artifacts in the system could lead to leakage in the cancellation.

The work in [72] tries to cancel the Tx HD2 in one shot on a 1D array for 2D

imaging. Simulation has been performed, but not real measurements. The elements in

Figure 4-29: Implementation of checker board Tx aperture on the proposed architecture. [Panel labels: Per-element Bit: Bank1 (columns marked I); Per-element Bit: Bank2 (columns marked Q); Interleaved Checker Board Tx Aperture (I/Q).]

a 1D array are arranged in two groups. Each group contains every other element of the array, so the elements of the two groups interleave with each other. The two groups are driven by I(t) and Q(t) pulses respectively. Because the I(t) and Q(t) pulse emissions happen at the same time, the resulting acoustic pressure field is a linear superposition of the fields of the two groups, in which the second harmonic component is suppressed. This method is not subject to motion from either the transducer or the scene. However, care needs to be taken with the grating lobes: because two neighboring elements have to be driven with the correlated I(t) and Q(t) pulses, the effective pitch of the 1D array becomes twice its physical element pitch.

4.3.2 Tx HD2 Suppression on the Column-Row-Parallel Architecture

Extending the interleaved configuration into a 2D array, the Tx HD2 cancellation can

be done for 3D imaging. In Figure 4-29, two banks of Tx per-element enable bits (see footnote 4),

Bank1 in red and Bank2 in yellow, are pre-programmed into checker board patterns.

The elements of the two banks interleave with each other. The pulser gate drivers at

the column side are time-multiplexed to drive both Bank1 and Bank2 with I(t) and

Footnote 4: The functionality of the per-element enable bits is mentioned in Section 3.3 and will be described in detail in Section 5.1.

Q(t) simultaneously, which are out of phase by a quarter pulse cycle (see Equation

(4.12)). In the mid- to far-field region, the ultrasound pressure from the two banks

can cancel in second harmonic using the I(t) and Q(t) driving scheme.

\[ Q(t) = I\left(t - \frac{T}{4}\right). \tag{4.12} \]
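The two interleaved checker board enable patterns can be sketched as a pair of complementary bit maps. This is only an illustration: which parity belongs to Bank1 is an assumption, and the function name is hypothetical.

```python
def checker_board_banks(n_rows, n_cols):
    """Return two complementary per-element enable maps (1 = element enabled)."""
    bank1 = [[1 if (r + c) % 2 == 0 else 0 for c in range(n_cols)]
             for r in range(n_rows)]
    bank2 = [[1 - bit for bit in row] for row in bank1]
    return bank1, bank2


bank1, bank2 = checker_board_banks(4, 4)
for row in bank1:
    # 'I' marks a Bank1 element (driven by I(t)), 'Q' a Bank2 element (driven by Q(t)).
    print(" ".join("I" if bit else "Q" for bit in row))
```

Every element belongs to exactly one bank, and each element's horizontal and vertical neighbors belong to the other bank.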

This I&Q combination on the two interleaved checker board Tx apertures for

HD2 reduction is a broadband technique that works for any arbitrary pulse shape. A

brief mathematical explanation can show the reason. The arbitrary pulse shape I(t)

with a period of T can be represented by its Fourier series in Equation (4.13), where

V0, V1, V2, ... are its Fourier coefficients and w = 2π/T :

\[ I(t) = V_0 + V_1 e^{jwt} + V_2 e^{j2wt} + V_3 e^{j3wt} + \dots \tag{4.13} \]

The delayed version pulse shape Q(t) is represented by:

\[ Q(t) = I\left(t - \frac{T}{4}\right) = V_0 + V_1 e^{jwt - j\pi/2} + V_2 e^{j2wt - j\pi} + V_3 e^{j3wt - j3\pi/2} + \dots \tag{4.14} \]

The pulse shape is provided electrically, and goes through an electrical to me-

chanical transduction. The process is modelled as a combination of both linear and

nonlinear processes in Equation (4.15). Because only second harmonic is of concern in

ultrasound systems, up to second-order nonlinearity is modelled for the investigation.

\[ \begin{aligned} p_I(t) &= a + b \cdot I(t) + c \cdot I(t)^2 \\ p_Q(t) &= a + b \cdot Q(t) + c \cdot Q(t)^2 \end{aligned} \tag{4.15} \]

Looking at the emitted pressure signals pI(t) and pQ(t), Equation (4.16) shows

only their second harmonic component:

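\[ \begin{aligned} p_I(t)\big|_{HD2} &= \left(b \cdot V_2 + c \cdot V_1^2 + 2c \cdot V_0 V_2\right) \cdot e^{j2wt} \\ p_Q(t)\big|_{HD2} &= \left(b \cdot V_2 + c \cdot V_1^2 + 2c \cdot V_0 V_2\right) \cdot e^{j2wt - j\pi} = -p_I(t)\big|_{HD2} \end{aligned} \tag{4.16} \]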

Equation (4.16) indicates that the second harmonic components generated from the I(t) and Q(t) excitations are out of phase by π, and this holds for any pulse shape (see footnote 5). The fundamental components of pI(t) and pQ(t) are out of phase by π/2, so two equal-amplitude halves of the aperture add to 1/√2 of the in-phase amplitude; the combined fundamental intensity is therefore 3dB lower than for a single full-aperture excitation. Furthermore, because the nonlinear model is general, not only CMUT nonlinearity but also other sources of nonlinearity can be cancelled using this method. For example, circuit mismatches tend to introduce asymmetry in pulse shape between the rising and falling edges. The simultaneous I&Q excitations on the interleaved checker board apertures can still be effective in reducing the HD2 caused by such circuit non-ideality.
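The cancellation argument can be checked numerically for a non-sinusoidal pulse. The sketch below is an idealized illustration, not the simulation used in this work: the pulse shape, the coefficients a, b, c of Equation (4.15), and the assumption that the two half-apertures add with equal weight (on-axis far field) are all chosen for illustration, and the helper names are hypothetical.

```python
import cmath
import math

N = 64  # samples per pulse period T (illustrative choice)


def harmonic(x, k):
    """Complex amplitude of the k-th harmonic of one sampled period."""
    n_samples = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * n / n_samples)
               for n, v in enumerate(x)) / n_samples


def transduce(v, a=0.0, b=1.0, c=0.3):
    """Second-order model of Equation (4.15); coefficient values are arbitrary."""
    return [a + b * s + c * s * s for s in v]


# Hypothetical asymmetric 3-level pulse shape I(t), one period.
i_pulse = [1.0 if n < 20 else (-0.6 if 32 <= n < 50 else 0.0) for n in range(N)]
# Q(t) = I(t - T/4), realized as a circular shift by a quarter period.
q_pulse = [i_pulse[(n - N // 4) % N] for n in range(N)]

p_i = transduce(i_pulse)
p_q = transduce(q_pulse)

# Conventional: the whole aperture emits p_I. I&Q: half emits p_I, half emits p_Q.
conventional = p_i
iq = [(x + y) / 2 for x, y in zip(p_i, p_q)]

fund_ratio = abs(harmonic(iq, 1)) / abs(harmonic(conventional, 1))
hd2_ratio = abs(harmonic(iq, 2)) / abs(harmonic(conventional, 2))

print(f"fundamental: {20 * math.log10(fund_ratio):.2f} dB relative to conventional")
print(f"HD2 amplitude ratio (I&Q / conventional): {hd2_ratio:.1e}")
```

In this idealized sum the second harmonic cancels down to floating-point error, while the fundamental drops by the factor 1/√2. A real array achieves only a finite reduction, because the spatial superposition of the two banks is not perfect at every field point.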

Finally, the checker board patterns require that the element pitch be smaller than or approximately equal to the ultrasound wavelength (λ = c/f), so that the grating lobes are kept to a minimum and the HD2 cancellation in space is close to perfect.
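As a quick check of this pitch condition, the sketch below compares the 250µm pitch and 4.2MHz pulse frequency used in the simulation of Section 4.3.3 against the wavelength. The sound speed of 1540m/s is an assumed typical soft-tissue value and is not stated in the text.

```python
SPEED_OF_SOUND_M_S = 1540.0  # assumed typical soft-tissue value


def wavelength_um(freq_hz, c_m_s=SPEED_OF_SOUND_M_S):
    """Wavelength in micrometers, lambda = c / f."""
    return c_m_s / freq_hz * 1e6


pitch_um = 250.0
lam_um = wavelength_um(4.2e6)
print(f"wavelength = {lam_um:.1f} um, pitch / wavelength = {pitch_um / lam_um:.2f}")
```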

4.3.3 Experimental Results

Both simulation and measurement are carried out to verify that the combination of I&Q excitations cancels acoustic HD2 while the “useful” fundamental intensity is only 3dB less than with conventional full-aperture excitation. The simulation assumes a 10x10 array with a pitch of 250µm. 20 cycles of 4.2MHz pulses are used as the stimulating pulse shape, which goes through a nonlinear transform modelled by Equation (4.15). The pulse shape is 3-level with a peak-to-peak amplitude of 30Vpp, in order to mimic the real measurement. Other pulse shapes, such as 2-level pulses or a sinusoid with a Gaussian envelope, and different numbers of pulse cycles (between 2 and 20), are also tried to verify that the cancellation works for arbitrary pulse shapes. For conventional excitation, all elements are driven with the same pulse shape I(t). For the I&Q method, the two interleaved banks of elements are driven by I(t) and the delayed Q(t), respectively.

Figure 4-30 shows the Field II simulation of the I&Q method compared to con-

Footnote 5: It is interesting to mention that not only the second harmonic, but also the 6th, 10th, 14th, etc. ((2 + 4·k)th, k = 0, 1, 2, ...) harmonics are out of phase by odd multiples of π between the I(t) and Q(t) excitations.

Figure 4-30: Simulation comparison between the conventional and I&Q methods (spatial pressure field over X, Y, Z in mm): (a) fundamental component spatial intensity for conventional; (b) fundamental component spatial intensity for I&Q; (c) HD2 spatial intensity for conventional; (d) HD2 spatial intensity for I&Q.

ventional excitation. The top two sub-figures (a) and (b) compare the fundamental

component of the emitted pressure field, in which the conventional method produces a field with 3dB higher intensity than I&Q. The bottom two sub-figures (c) and (d) compare the HD2 component of the emitted pressure field, and clearly show a large suppression from the I&Q method. The results at two spatial locations are listed in Table 4.1, indicating a reduction in HD2 of about 20dB (19.7dB) from the I&Q method.

Acoustic measurements are also performed to verify the proposed method. Due

to the fact that there are a few non-functional CMUT elements in the 16x16 array, a

I&Q vs. Conventional (Simulation)    HD2 Reduction    Fundamental Loss
“A” (0, 0, 30.3)mm                   -19.7dB          -3.0dB (the whole space)
“B” (0, 0, 10.2)mm                   -19.7dB          -3.0dB (the whole space)

Table 4.1: Simulated HD2 improvement of the I&Q method.

I&Q vs. Conventional (Measurement)   HD2 Reduction    Fundamental Loss
“A” (0, 0, 30.3)mm                   -21.7dB          -3.4dB
“B” (0, 0, 10.2)mm                   -22.1dB          -3.2dB

Table 4.2: Measured HD2 improvement of the I&Q method.

10x10 sub-array is chosen to carry out the comparison (see footnote 6). The ASIC Tx channels are programmed to excite the CMUT with either the I&Q or the conventional full-aperture scheme, using 3-level 30Vpp pulses (see footnote 7). With a hydrophone mounted on the 3D translation stage, the emitted ultrasonic pressure wave is detected and displayed on an oscilloscope, and an FFT shows its frequency content. At a given far-field spatial location (i.e. where the hydrophone tip is located), the pressure intensities generated by the I&Q and the conventional excitations are compared. The measured results at the same spatial locations as in simulation (Table 4.1) are summarized in Table 4.2, which confirms that the I&Q method has about 3dB less fundamental component and over 20dB less second harmonic component, similar to the simulation results shown in Figure 4-30 and Table 4.1. Moreover, theory predicts cancellation of all (2 + 4·k)th, k = 0, 1, 2, ... harmonics. In our measurement, the reductions in the 6th and 10th components are observed on the oscilloscope, while the 14th harmonic is too weak to see.

To sum up, the I&Q method could be used to reduce the second harmonic gener-

ation in Tx for a 3D imaging system. The method works for arbitrary pulse shapes

and works equally well for nonlinearity generated from both transducer and circuit.

In particular, it mitigates the nonlinear problem in CMUT with its electrostatic ac-

tuation, and it could suppress the harmonic from the pulser’s rising and falling edge

asymmetry.

Footnote 6: More details about the non-functional elements and the fault-tolerant circuit design can be found in Section 5.5.

Footnote 7: Different pulse shapes are also tried to verify that the method works for arbitrary pulse shapes.

4.4 Annular Ring Apertures for Forward-looking Imaging Applications

Forward-looking ultrasonic imaging systems can be used for intravascular (within the

blood vessel) and intracardiac (within the heart) visualizations. The miniaturized

imaging system is mounted onto the tip of a catheter, which provides minimally invasive diagnosis, interventions or treatments in medical procedures [73–76]. Currently the more commonly used ultrasound systems for intravascular ultrasound (IVUS) and intracardiac echocardiography (ICE) are side-looking ones, while forward-looking ones are gaining popularity because they offer complementary information.

Annular ring apertures are suitable to realize forward-looking imaging. Although

dedicated annular ring arrays are available by custom fabrication [75, 76], a general-

purpose 2D array with the proposed Column-Row-Parallel architecture can achieve

similar results [77, 78]. The full 2D array provides even more flexibility, since more

rings can be formed within the regular 2D aperture.

4.4.1 Annular Ring Apertures on Column-Row-Parallel Architecture

As already shown in Chapter 3, the 2D array with the Column-Row-Parallel

architecture can form a circular aperture or an annular ring aperture by programming

the per-element bits under each element. For annular ring imaging, a circular Tx

aperture is used for transmit and four concentric annular rings with different diameters

can be activated as Rx apertures, shown in Figure 4-31(a). The Tx elements are

supplied with the same delay value “D” as in Figure 4-31(b), so that the whole

circular aperture is driven in-phase and emits a broad ultrasound beam. The Rx

elements’ analog outputs are also combined in parallel along the column, and by

digitally summing the weighted waveforms from all column buffers, one echo waveform

will be collected for each annular ring (Figure 4-31(c)-(f)). The weight for each

column is the number of active elements along the column. Equation (4.17) describes

Figure 4-31: Annular ring mode imaging implemented in Column-Row-Parallel architecture: (a) Tx and Rx aperture setup; (b) Tx aperture implemented in the proposed architecture, all active elements are driven in-phase; (c) Rx aperture with the biggest ring shape, all active elements’ analog outputs are combined; (d) Rx aperture with the 2nd ring shape; (e) Rx aperture with the 3rd ring shape; (f) Rx aperture with the smallest ring shape. [Diagram labels: Tx Path, Rx Path; delay value D; digital column waveforms s1,0(t) to s1,15(t), s2,2(t) to s2,13(t), s3,4(t) to s3,11(t), s4,6(t) to s4,9(t); Weighted Digital Summation outputs S1(t) to S4(t).]

Figure 4-32: Annular ring mode dynamic beam-formation scheme. [Diagram labels: Tx and Rx for each of the four rings with radii R1 to R4; ring outputs S1(t) to S4(t); delays τ1(z) to τ4(z); depth z; Dynamic Beamforming; Beamformed Axial Line.]

the function of the weighted digital summation block:

\[ S_m(t) = \sum_{k=0}^{15} n_k \cdot s_{m,k}(t), \quad m = 1, 2, 3, 4. \tag{4.17} \]

Taking the smallest ring in Figure 4-31(f) as an example, the numbers of active elements in the columns with outputs s4,6(t) to s4,9(t) are 4, 2, 2, 4, respectively. Therefore the weighted summation should be:

\[ S_4(t) = 4 \times s_{4,6}(t) + 2 \times s_{4,7}(t) + 2 \times s_{4,8}(t) + 4 \times s_{4,9}(t). \tag{4.18} \]
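The column weights and the weighted summation of Equations (4.17) and (4.18) can be sketched as follows. The ring mask is a hypothetical layout chosen only so that columns 6 to 9 contain 4, 2, 2, 4 active elements, as in the example above; the function names and toy waveforms are also assumptions.

```python
def column_weights(rx_mask):
    """n_k: number of active Rx elements in each column of mask[row][col]."""
    return [sum(col) for col in zip(*rx_mask)]


def weighted_sum(column_waveforms, weights):
    """S_m(t) = sum over k of n_k * s_{m,k}(t), per Equation (4.17)."""
    return [sum(w * s for w, s in zip(weights, samples))
            for samples in zip(*column_waveforms)]


# Hypothetical 16x16 mask for the smallest ring: outline of rows/columns 6..9.
N = 16
ring4 = [[1 if (6 <= r <= 9 and 6 <= c <= 9 and (r in (6, 9) or c in (6, 9))) else 0
          for c in range(N)] for r in range(N)]

n_k = column_weights(ring4)
print(n_k[6:10])

# Toy column waveforms: column k outputs the constant value k for three samples.
s4 = [[float(k)] * 3 for k in range(N)]
print(weighted_sum(s4, n_k))
```

Columns without active elements receive a weight of zero, so they drop out of the sum without special handling.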

The four Rx annular rings are activated over four consecutive Tx transmits as

shown in Figure 4-32. The digital waveforms from the four Rx rings can then be

dynamically beamformed to generate a synthetic A-scan line along the axial axis of

the rings. Because all elements on the same ring have the same time-of-flight to a

point on the axial axis, each ring has a natural focus effect along the axial axis. The

delay value for a spatial point located at depth z away from the transducer surface,

for the ring with a radius of Rm, is calculated as:

\[ \tau_m(z) = \frac{\sqrt{z^2 + R_m^2}}{c}, \quad m = 1, 2, 3, 4. \tag{4.19} \]

The beamformed image line along the axial axis thus becomes:

\[ S_{BF}(z, t) = \sum_{m=1}^{4} S_m\left(t - \tau_m(z)\right). \tag{4.20} \]
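One possible reading of Equations (4.19) and (4.20) is a nearest-sample delay-and-sum, in which each ring waveform is read at its own delay τm(z) and the four values are added. The sketch below follows that reading only: it uses just the ring delay of Equation (4.19), without a separate transmit-path term or interpolation, and the ring radii, sampling rate, sound speed and function names are all assumptions.

```python
import math


def ring_delay_s(z_m, radius_m, c_m_s):
    """Equation (4.19): time of flight between an on-axis point at depth z and a ring."""
    return math.sqrt(z_m ** 2 + radius_m ** 2) / c_m_s


def beamform_axial_line(ring_waveforms, radii_m, depths_m, fs_hz, c_m_s):
    """For each depth, add the ring waveforms read at their own delays."""
    line = []
    for z in depths_m:
        total = 0.0
        for wave, radius in zip(ring_waveforms, radii_m):
            idx = round(ring_delay_s(z, radius, c_m_s) * fs_hz)
            if 0 <= idx < len(wave):
                total += wave[idx]
        line.append(total)
    return line


# Hypothetical setup: these values are assumptions for illustration.
C_M_S = 1540.0
FS_HZ = 40e6
RADII_M = [2.0e-3, 1.5e-3, 1.0e-3, 0.5e-3]
DEPTHS_M = [5.0e-3 + 0.5e-3 * i for i in range(21)]
TARGET_Z = DEPTHS_M[11]  # about 10.5 mm

# Synthetic echoes: each ring sees one impulse at its own delay to the target.
waves = []
for radius in RADII_M:
    wave = [0.0] * 512
    wave[round(ring_delay_s(TARGET_Z, radius, C_M_S) * FS_HZ)] = 1.0
    waves.append(wave)

line = beamform_axial_line(waves, RADII_M, DEPTHS_M, FS_HZ, C_M_S)
peak_index = max(range(len(line)), key=line.__getitem__)
print(f"peak at {DEPTHS_M[peak_index] * 1e3:.1f} mm with value {line[peak_index]}")
```

With these synthetic echoes the four ring contributions align only at the target depth, which is the natural focusing effect described above.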

The circular and ring apertures are translated horizontally, so that different axial A-scan lines can be collected to form volumetric images. Examples of the translated Tx and Rx apertures are shown in Figure 4-33. Some edge effect will affect the scan line intensity slightly, but not significantly.

4.4.2 Annular Ring Imaging Results

The forward-looking programmable annular ring array can form volumetric images

by moving the circular Tx and annular Rx apertures in the 2D array, so that multiple

axial lines can be acquired. Both simulation and measurement of a wire phantom

are performed, similar to the PWCC3D experiments. The wire phantom is 0.48mm

in diameter and is placed at 10.5mm away from the transducer surface, horizontal

to the surface. Transmit pulsation is 2 bursts of 8.33MHz pulses. The volumetric images are displayed at 20dB dynamic range. In total, 81 circular Tx apertures are swept through the 2D array, acquiring data for 81 axial lines. With 4 beamforming annular rings at each Tx aperture location, a total of 324 transmit-receive repetitions are needed to acquire a full set of volumetric data. Similar to the PWCC3D equations (4.8) and (4.9), the acquisition time and volume rate for the annular ring imaging system can be calculated with (4.21) and (4.22).

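\[ \text{Acquisition Time} = \frac{(\#\,\text{Axial Lines}) \times (\#\,\text{Annular Rings})}{\text{PRF}}, \tag{4.21} \]

\[ \text{Volume Rate} = \frac{\text{PRF}}{(\#\,\text{Axial Lines}) \times (\#\,\text{Annular Rings})}. \tag{4.22} \]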

The acquisition time again scales linearly with respect to the number of axial lines in

the volumetric image, or number of annular rings used for beam-formation.
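A short numerical check of Equations (4.21) and (4.22), using the 81 axial lines and 4 annular rings of this experiment together with the 10kHz PRF quoted for the results; the function names are hypothetical.

```python
def annular_acquisition_time(prf_hz, n_axial_lines, n_rings):
    """Equation (4.21): seconds needed to acquire one volume."""
    return n_axial_lines * n_rings / prf_hz


def annular_volume_rate(prf_hz, n_axial_lines, n_rings):
    """Equation (4.22): volumes per second."""
    return prf_hz / (n_axial_lines * n_rings)


repetitions = 81 * 4
rate = annular_volume_rate(10_000, 81, 4)
print(f"{repetitions} transmit-receive repetitions, {rate:.1f} volume/s")
```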

The volumetric images from simulation and measurement are shown in Figure

4-34. The cross-sectional images display a clear wire in the space and at the same

Figure 4-33: Annular ring configuration example, off-center: (a) Tx and Rx aperture setup; (b) Tx aperture implemented in the proposed architecture; (c) Rx aperture with the biggest ring shape; (d) Rx aperture with the 2nd ring shape; (e) Rx aperture with the 3rd ring shape; (f) Rx aperture with the smallest ring shape. [Diagram labels: Tx Path, Rx Path; delay value D; digital column waveforms s1,0(t) to s1,11(t), s2,0(t) to s2,9(t), s3,0(t) to s3,7(t), s4,2(t) to s4,5(t); summation outputs S1(t) to S4(t).]

time they provide the evaluation for the performance. The measured -10dB lateral

resolution (from XZ slice, Figure 4-34(b)) is 1.19mm and the -10dB axial resolution

(from YZ slice, Figure 4-34(d)) is 0.32mm. Both numbers are close to the perfor-

mance measured from PWCC3D images with single-angle plane-wave insonification

in Section 4.2. Using a 10kHz PRF, the volume rate is 30.9 volume/s, according to

(4.22).

Lastly, this section aims to demonstrate the capability of the Column-Row-Parallel

architecture in forward-looking ultrasonic imaging applications. The 16x16 array size

limits the range covered in the space. As the manufacturing technology improves, a

bigger array could lead to a much better image quality. Furthermore, the number of

annular rings can also be increased, but at the cost of a proportionally lower volume

rate (the volume rate is linearly reduced with respect to the increase of the number

of annular rings used).

4.5 Summary

This chapter has presented several 3D medical ultrasonic imaging applications for

the Column-Row-Parallel ASIC architecture. The 16x16 CMUT-PCB-ASIC imag-

ing front-end is assembled to demonstrate the 3D imaging algorithms such as the

plane-wave coherent compounding and the annular ring aperture imaging. The same

architecture can be programmed to implement different imaging algorithms. Both

schemes are suitable for high volume rate imaging with decent quality. Moreover,

the architecture enables an interleaved checker board pattern with I&Q excitations

for Tx HD2 reduction. The scheme is promising to improve the intrinsic nonlinear

property of CMUT, facilitating the ultrasonic harmonic imaging mode.

Figure 4-34: Cross-section slices of the wire phantom 3D images from simulation and measurement: (a) simulated XZ slice; (b) measured XZ slice; (c) simulated YZ slice; (d) measured YZ slice; (e) simulated XY slice; (f) measured XY slice.

Chapter 5

Design of the 16x16 Ultrasonic Transceiver Array ASIC with Column-Row-Parallel Architecture

The transistor-level design of the 16x16 ultrasonic transceiver ASIC is described in

this chapter. It follows the high-level description in Chapter 3, and adds implemen-

tation details to Chapter 4.

The block-level circuit design [5, 6] is optimized to interface to CMUT transduc-

ers. However, the architecture-level design is a flexible and scalable analog front-end

solution for 2D ultrasonic arrays in general, applicable to different technologies, such

as CMUTs, PMUTs, and bulk PZTs. This chapter will cover both architecture-level

and block-level circuit designs.

5.1 High-Level Description of the Ultrasonic Imaging Transceiver Circuits and the Architecture Logic Implementation

This section describes the digital implementation of the Column-Row-Parallel architecture. The design aims to realize the rich functionality presented in Chapters 3 and 4, while achieving linear scaling for the programming time. The control logic

attains a proper separation of functionality, such that the control from the sides is

more often used to take advantage of its fast programming time, while the control

within each element provides more diverse system functionality.

The overview of the proposed Column-Row-Parallel architecture has been given

in Section 3.3. Figure 3-4 shows the array structure and Figure 3-5 shows the per-

element circuit block diagram. In addition to these main blocks, more circuit details

will be discussed here. For convenience, the block diagram of one transceiver channel

in the 2D array presented in Figure 3-5 is shown again in Figure 5-1.

Each CMUT element in the 2D array is DC-biased with the shared RC network

provided off-chip. Resistor Rb and capacitor Cb filter out noise from the high voltage

supply and provide an AC ground for the transducer. The DC bias voltage V_BIAS

applied on the CMUT is between 20-50V. The transceiver channel includes a 30Vpp

high voltage pulser in the transmit (Tx) path, which drives the ultrasonic transducer

to emit acoustic energy. The emitted ultrasonic wave travels through the medium

and is reflected whenever it hits medium boundaries with mechanical impedance

mismatch. The reflected echoes are transformed by the CMUT element into a weak

electrical signal. A low noise amplifier (LNA) in the receive (Rx) path amplifies

the weak signal to the output. During transmission, the Rx switch is turned off

to prevent the high voltage transients from breaking the LNA implemented with low

voltage transistors. The CMUT device used in this work has a pass-band of 3-10MHz.

The frequency range, power consumption, noise, and linearity performance of the Tx

and Rx circuits are designed and optimized for this CMUT device’s parameters.

After collecting multiple channels’ outputs from one or several transmissions, ultrasonic images can be generated, as shown in Chapter 4. Medical ultrasound systems use beam-formation to improve the image quality. Tx beam-formation is realized by controlling and applying different delays across the Tx channels. Similarly, the received signals are digitized and processed by an off-chip Rx beamformer.

The digital control inside each transceiver channel has been described in Section 3.3. The combination of Row Select Signals, Column Select Signals and Per-element Enable Signals determines whether the channel is connected to the column side, the row side, or turned off. The per-element logic implementation has been shown in Figure 5-1(b). The transmitter and receiver are controlled independently and they are time-multiplexed during normal operations.

Figure 5-1: A re-plot of Figure 3-5 in Section 3.3. (a) The block-level implementation of one transceiver channel and (b) the per-element logic implementation. Column and row select logic is implemented with shift registers that can be reprogrammed in “N” time (implementation detail will be shown in Figure 5-2). [Diagram labels: Transceiver[i, j]; Column Gate Driver[i]; Column BUF[i]; Row Gate Driver[j]; Row BUF[j]; select signals Tc[i], Rc[i], Tr[j], Rr[j]; per-element enables T_en, R_en.]

Figure 5-2: Circuit implementation for the logic control: (a) multiplexing for per-element enable bits; (b) Tx row / column selection logic; (c) Rx row / column selection logic. [Diagram labels: P_bankSel selecting T_en from T_en1 / T_en2 and R_en from R_en1 / R_en2; T_bankSel selecting T[0:15] from T_bank1 / T_bank2, with Tmode (0: row-para, 1: col-para) gating Tr[0:15] and Tc[0:15]; R_bankSel selecting R[0:15] from R_bank1 / R_bank2, with Rmode (0: row-para, 1: col-para) gating Rr[0:15] and Rc[0:15].]

Both the control signals on the sides and the per-element controls are implemented with shift registers (SRs), which can be programmed serially. Each set of controls is realized with two multiplexed SR banks, as shown in Figure 5-2. The P_bankSel (Figure 5-2(a)), T_bankSel (Figure 5-2(b)), and R_bankSel (Figure 5-2(c)) signals control the SR bank selection for the per-element enable bits (T_en and R_en), Tx row / column selection (T[0:15]) and Rx row / column selection (R[0:15]), respectively. There are several benefits associated with this implementation. First, while one bank is being programmed, the other bank is in use, which avoids interrupting operation. Second, the two banks can be pre-programmed, and by quickly switching between the two banks using the bankSel signal, they can take turns to control the ASIC.
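The two-bank arrangement can be illustrated with a small behavioral model. This is a sketch only, not the circuit: the class and method names are hypothetical, and the shift direction is an assumption.

```python
class DualBankShiftRegister:
    """Behavioral model of two serially programmed banks behind a bank-select mux."""

    def __init__(self, length):
        self.banks = [[0] * length, [0] * length]
        self.bank_sel = 0  # which bank currently drives the outputs

    def shift_in(self, bank, bit):
        """Clock one bit into the chosen bank; older bits move one position along."""
        self.banks[bank] = [bit] + self.banks[bank][:-1]

    def program(self, bank, bits):
        """Serially load a bank; returns the number of clock cycles used."""
        for bit in bits:
            self.shift_in(bank, bit)
        return len(bits)

    @property
    def outputs(self):
        return list(self.banks[self.bank_sel])


sr = DualBankShiftRegister(4)
cycles = sr.program(1, [1, 0, 0, 0])   # load the idle bank in the background
print(sr.outputs, "after", cycles, "clock cycles on the idle bank")
sr.bank_sel = 1                        # swap banks in a single step
print(sr.outputs)
```

Loading the idle bank leaves the active outputs untouched, and flipping the select signal swaps the whole pattern at once.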

Additionally, the Tx and Rx side controls in Figure 5-2(b) and (c) are further divided into Row Select Signals (Tr[0:15] and Rr[0:15]) and Column Select Signals (Tc[0:15] and Rc[0:15]). Only one side can be active at any given time. Taking the Tx control as an example, the multiplexed SR outputs T[0:15] are forked and gated by a pair of multiplexers controlled by Tmode to generate the Tx Row Select Signals (Tr[0:15]) and Tx Column Select Signals (Tc[0:15]). The logic ensures that while one side is activated, the other remains all “0”.
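The forking and gating of T[0:15] can be sketched behaviorally as below. The polarity is an assumption: here Tmode = 0 passes the bits to the row side and Tmode = 1 to the column side, which may not match the actual circuit, and the function name is hypothetical.

```python
def side_select(t_bits, tmode):
    """Fork T[0:15] into (Tr, Tc); the inactive side is forced to all zeros."""
    zeros = [0] * len(t_bits)
    if tmode == 0:
        return list(t_bits), zeros   # assumed: Tmode = 0 activates the row side
    return zeros, list(t_bits)       # assumed: Tmode = 1 activates the column side


t = [1, 0, 1, 1] + [0] * 12
tr, tc = side_select(t, 0)
print("Tr:", tr)
print("Tc:", tc)
```

Whatever the polarity, the two outputs are never active together, matching the requirement that only one side is active at any given time.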

As briefly mentioned previously, this partition of controls provides flexibility and scalability. The side controls are programmed within “N” time, so they can easily be reprogrammed between consecutive ultrasound transmits. An example use of this control style is the row-by-row receive implemented in the PWCC3D algorithm in Section 4.2. The per-element controls provide maximum flexibility despite the fact that they take longer to program (“N^2” time), as they are snake-chained through all 2D array elements. They can be changed less often to provide mode switching. Examples include the annular ring imaging experiments in Section 4.4, and the selective disabling of non-functional elements as will be described in Section 5.5, which is a critical fault-tolerant feature for analog front-end circuits working with MEMS devices. Lastly, alternating between two pre-programmed SR banks realizes fast swapping between two ultrasound aperture patterns. The simultaneous I&Q excitation of the interleaved checker board patterns in Section 4.3, implemented by switching between two per-element SR banks, is a perfect example.

5.2 Tx Circuit Design

This section describes the design of the transmitter path. The block-level transmit-

ter circuit design will be introduced first [5, 6], which is optimized to drive CMUT

elements. A multi-level pulse shaping technique with charge recycling is proposed to

boost the efficiency of the transmitter with a CMUT element load. The design is

highly scalable and compact, requiring minimum off-chip components. Next, design

issues for making a 2D array of transmitters will be described, which is more general

and applies to many other types of ultrasonic transducers. High voltage pass-gate transistors implement multiplexers, which realize the programmable column / row addressing and handle the parallelism of multiple transmit elements.

5.2.1 Multi-Level Pulsing for Efficient CMUT Driver

For the transmitter, high voltage linear amplifiers are commonly used to drive the

PZT loads to achieve good linearity and acceptable efficiency [79, 80]. To drive a

CMUT load, however, linear amplifiers are not optimum. In addition to the ampli-

fier power consumption, a considerable power loss becomes associated with charging

and discharging the parasitic capacitance of the CMUT element [41], degrading the

overall power efficiency of the transmitter stage. Furthermore, the linearity of the

amplifier does not translate to good linearity performance of the transmitter stage,

because the CMUT element distorts the amplifier’s output waveform through the

nonlinear relationship between the electrical input signal and electrostatic force act-

ing on the element’s membrane [20]. Resonant transmitters with inductors to cancel

out the loading capacitance could boost the power efficiency [27]. However, bulky

off-chip inductors of several micro-Henries are needed for every transmitting chan-

nel, to work with typical loads of 10-200pF per channel at the ultrasound operating

frequency range of 1-20MHz [81–83], which is undesirable for compact integration.

Alternatively, the multi-level pulsing technique, which was initially introduced for

chip-to-chip interconnects [84], can be applied to reduce the power consumption on

the capacitive load. Multi-level techniques have been used in PZT ultrasound drivers

for pulse-shaping and harmonic suppression [81–83,85]. However, the power efficiency

was not improved because charge recycling was not implemented between the multi-

ple voltage levels. This section presents the advantage of the multi-level pulsing with

charge recycling to improve the combined power efficiency of the CMUT transducer

and transmitter. It also requires the fewest off-chip components, as will be seen in

Section 5.2.2.

The transmitter load model of a CMUT element is represented by a capacitor

and resistor in parallel, as shown in Figure 5-3(a). The capacitor C is the parallel-

plate capacitance between the CMUT element’s membrane and the common node.

Figure 5-3: (a) The transmitter load model of a CMUT element used in this work. (b) An exemplary 2-level square wave pulse applied onto CMUT. (c) An exemplary 3-level pulse applied onto CMUT.

The resistor R is the medium’s mechanical load at the CMUT surface, transformed

to the electrical port [41]. The power dissipated by R, due to the electrical pulse’s

fundamental frequency component, models the useful acoustic power delivered into

the medium. The power dissipated while charging and discharging C (dynamic power)

does not contribute to the acoustic output and thus is wasted.

The CMUT transducer used in this work is a 16x16 2D array. Each CMUT element

has a size of 250µm × 250µm and is modelled as 2pF || 1MΩ [41]. The Tx efficiency is

defined as the ratio between the useful acoustic power and the total power dissipated.

It models the combined efficiency of CMUT and the ultrasonic pulser together, by

capturing both the power loss in the pulser circuitry and the dynamic power dissipated

by charging and discharging the CMUT parasitic capacitance.

To show how multi-level pulse-shaping increases Tx efficiency, first assume the

conventional 2-level square wave pulses are used to drive a 2pF || 1MΩ load, as shown

in Figure 5-3(b). The pulse magnitude is 30Vpp at a frequency of 3.3MHz. The


amplitude of the fundamental frequency component is given by the Fourier series

coefficient of the periodic pulse shape, as described in (5.1):

$$V_1 = \frac{2}{T_p} \int_0^{T_p} v(t) \cdot \sin\left(\frac{2\pi}{T_p} \cdot t\right) dt. \tag{5.1}$$

The amplitude V1 is calculated to be 19.1V, or 13.5Vrms. Therefore, the power dis-

sipated on the 1MΩ resistor, i.e., the transmitted ultrasonic power at fundamental

frequency, is 0.182mW. Meanwhile, the dynamic power wasted on charging and

discharging the capacitor C is calculated to be CV²f = 6mW.

An N-level pulser, using (N−1) regulated voltage sources to charge and discharge

the capacitor in a stepwise fashion, reduces the wasted dynamic power to CV²f/(N−1)

[84]. The power saving comes from the charge recycling mechanism during the

discharge operation, which is enabled by the regulated voltage supplies¹. Instead of

discarding all the capacitor charge CV to ground as in the square wave case, a charge

packet of CV/(N − 1) is recycled back to the power supply when the capacitor is

switched from one voltage level to the next lower one. As many as (N − 2) charge

packets of CV/(N − 1) are recycled until the last packet is dumped to ground. As a

result, the dynamic power is reduced by a factor of (N − 1). At the same time, the

magnitude of the fundamental component is only decreased slightly following (5.1),

leading to overall efficiency improvement. For example, Figure 5-3(c) shows 3-level

pulses with 20ns middle voltage level steps, out of a 300ns period. Its fundamental

frequency component amplitude is 18.7V, or 13.2Vrms. The useful power delivered

is 0.174mW and the dynamic power is CV²f/2 = 3mW. A comparison to the square

wave example reveals theoretically a 49% total power saving with only a 4.4% acoustic

power reduction, or equivalently 88% more acoustic output power given the same total

power dissipation.
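The 2-level and 3-level figures above can be reproduced with a short numerical sketch. It evaluates (5.1) with the midpoint rule for the 2pF || 1MΩ load, a 30Vpp swing and a 300ns period, and applies the CV²f/(N−1) expression for the dynamic power. The placement of the middle-level steps (centred on the two transitions) and the function names are assumptions made for illustration.

```python
import math

C_LOAD = 2e-12   # CMUT capacitance C, in farads
R_LOAD = 1e6     # CMUT equivalent resistance R, in ohms
V_PP = 30.0      # pulse swing, in volts
T_P = 300e-9     # pulse period, in seconds

def pulse(t, mid_step):
    """Pulse voltage at time t within one period.

    mid_step is the duration of each middle-level step; 0 gives the
    2-level square wave. Steps are centred on t = 0 and t = T_P / 2.
    """
    h = mid_step / 2
    if h <= t < T_P / 2 - h:
        return V_PP
    if T_P / 2 + h <= t < T_P - h:
        return 0.0
    return V_PP / 2

def fundamental_amplitude(mid_step, steps=30000):
    """Evaluate (5.1) numerically with the midpoint rule."""
    dt = T_P / steps
    acc = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        acc += pulse(t, mid_step) * math.sin(2 * math.pi * t / T_P) * dt
    return 2 / T_P * acc

def powers(mid_step, levels):
    """Return (V1, acoustic power, dynamic power) for an N-level pulse."""
    v1 = fundamental_amplitude(mid_step)
    p_acoustic = v1 ** 2 / (2 * R_LOAD)
    p_dynamic = C_LOAD * V_PP ** 2 / T_P / (levels - 1)
    return v1, p_acoustic, p_dynamic

square = powers(0.0, 2)
three_level = powers(20e-9, 3)
for name, (v1, p_ac, p_dyn) in (("2-level", square), ("3-level", three_level)):
    print(f"{name}: V1 = {v1:.1f} V, acoustic = {p_ac * 1e3:.3f} mW, "
          f"dynamic = {p_dyn * 1e3:.1f} mW")

saving = 1 - sum(three_level[1:]) / sum(square[1:])
print(f"total power saving: {saving * 100:.0f}%")
```

The sketch only models the load; pulser circuit losses, which also enter the Tx efficiency defined earlier, are not included.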

¹ Without regulated supplies which recycle charge, the dynamic power cannot be reduced even with multi-level pulsing, as is the case in [81–83].

Figure 5-4: Circuit schematic of the four-channel 3-level pulsers with the middle-voltage generation (all transistors are high voltage devices).

5.2.2 3-Level Pulser Circuit Design

The 3-level pulser is implemented as shown in Figure 5-4. The three pulse voltage

levels are 30V (HVDD), 15V and 0V (GND). The 15V middle voltage is generated

from a 2:1 parallel-series switched-capacitor DC-DC converter (M5-M8), which is

shared between channels. The only off-chip components are two 0.1µF capacitors.

Because of the charge recycling nature of the proposed 3-level pulser, and because the

CMUT load (roughly 2pF per channel) is much smaller than 0.1µF, the converter

can operate at a very low frequency (10-100Hz) to save power, consuming less than

1% of the total 256-channel pulsing power.

3-level pulse-shaping is implemented with four high voltage switches (M1-M4) in

each channel. NMOS M1 and M2 are used for the transitions of 15→0V and 0→15V

respectively, while PMOS M3 and M4 are used for the transitions of 30→15V and

15→30V respectively. The on-resistance of each transistor and the CMUT capacitance

form an RC time constant that determines pulse voltage level settling. The transistors

are sized wide enough to keep the RC time constant at around 3ns, so that the 10%


to 90% rise / fall time is 6.6ns. This is close to 1/20 of the pulse cycle typically

used (3-10MHz pulses with pulse cycles of 100-333ns) to make sure the 3-level pulse

shape is not excessively compromised by the settling edges. The relative timing

differences between the channels’ gate control signals are digitally adjustable and

effectively implement the Tx beamforming.
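The relation between the 3ns time constant and the 6.6ns edge can be checked with the standard single-pole result, t(10%-90%) = RC·ln 9. The sketch below assumes a single-pole step response and, for the implied on-resistance, that the 2pF CMUT is the only capacitance at the output node (switch drain capacitance is ignored).

```python
import math

def rise_time_10_90(tau):
    """10%-90% rise time of a single-pole RC step response: tau * ln(9)."""
    return tau * math.log(9)

TAU = 3e-9       # RC time constant from the text, in seconds
C_CMUT = 2e-12   # CMUT load per channel, in farads

t_rise = rise_time_10_90(TAU)
r_on = TAU / C_CMUT  # implied on-resistance under the single-capacitor assumption

print(f"rise time: {t_rise * 1e9:.1f} ns, implied on-resistance: {r_on:.0f} ohm")
for f_pulse in (3e6, 10e6):
    fraction = 1 / (f_pulse * t_rise)
    print(f"{f_pulse / 1e6:.0f} MHz pulse: rise time is 1/{fraction:.0f} of a cycle")
```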

To reduce the number of I/O ports needed for pulser gate control, a non-overlapping

2-to-4 line decoder is used for each channel with 2 lines of low voltage control inputs

(Ain and Bin) supplied off-chip from an FPGA running at 100MHz. As shown

in Figure 5-5(a), the inputs first go through non-overlapping signal generation blocks

(implementation shown in Figure 5-5(b)) before being fed into the 2-to-4 decoder.

The non-overlapping block ensures that the generated low-voltage gate control signals

(ϕ1(LV) to ϕ4(LV)) have dead time between each other, so that the pulser

transistors (M1-M4 in Figure 5-4) are never on at the same time, which would dissipate

unnecessary crowbar current. The non-overlapping dead time is 2-bit adjustable through the

variable length delay lines controlled by Delay[0 : 1] to provide enough adjustment

margin.
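The decoding step can be sketched in a few lines. The logic equations follow the labels in Figure 5-5(a) (ϕ1 = Ab & Bb, ϕ2 = Ab & B, ϕ3 = A & Bb, ϕ4 = A & B). The mapping from each ϕk to an output level assumes that ϕk drives Mk of Figure 5-4, which is an assumption made here for illustration; the dead-time insertion is not modelled.

```python
def decode(a: int, b: int) -> dict:
    """2-to-4 line decoder: exactly one output is high for each (a, b)."""
    na, nb = 1 - a, 1 - b
    return {"phi1": na & nb, "phi2": na & b, "phi3": a & nb, "phi4": a & b}

# Output level reached when each switch conducts, assuming phi_k drives M_k:
# M1: 15->0V, M2: 0->15V, M3: 30->15V, M4: 15->30V.
LEVEL = {"phi1": 0, "phi2": 15, "phi3": 15, "phi4": 30}

def output_levels(codes):
    """Map a sequence of (Ain, Bin) codes to the pulser output levels in volts."""
    levels = []
    for a, b in codes:
        active = [name for name, on in decode(a, b).items() if on]
        levels.append(LEVEL[active[0]])
    return levels

# One pulse cycle; consecutive codes differ in a single bit (Gray code).
cycle = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
print(output_levels(cycle))
```

Because only one input bit changes per step in this sequence, the two control lines never need to switch simultaneously, which relaxes the timing between Ain and Bin.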

The low-voltage gate controls are further level-shifted by the cross-coupled level

shifters in Figure 5-5(c), which translate the low-swing signals into high-swing signals

that drive gates of the high voltage transistors in the pulser and the DC-DC converter.

The threshold voltage of M1 and M2 is low enough such that they can be completely

turned on by the 3.3V inverters. The level-shifted gate drive signals have a 30V voltage

swing, which is under the rated operation conditions of high voltage transistors in

this process. The typical set of 3-level pulser control signals and the resultant 3-level

pulse shape at the output V o, are shown in Figure 5-5(d). Because the low-voltage

signal swing is small, the digital control power is negligible compared to the pulser

power.

This design of multi-channel pulsers with a shared voltage converter can be ex-

tended easily to more Tx channels, without additional off-chip components. It could

also be revised to implement more voltage levels to achieve more dynamic power re-

duction. However, this requires the addition of more switches connected between the

Figure 5-5: The digital control circuits for the pulser: (a) the signal flow and block diagrams; (b) the non-overlapping signal generator; (c) the level shifter implementation; (d) the control signal timing diagram. [Decoder outputs labelled in panel (a): φ1(LV) = Ab & Bb, φ2(LV) = Ab & B, φ4(LV) = A & B, φ3(LV) = A & Bb.]

CMUT and the voltage levels. Due to the large drain capacitance of high voltage

switches, the self-loading effect takes away much of the power savings from introduc-

ing additional voltage levels. According to simulation results of the 0.18µm CMOS

process used in this work, a 3-level pulser dissipates 16% of total power to drive the

gate and drain capacitance of M5-M8 in Figure 5-4. For a 4-level pulser, the dynamic

power reduction is counteracted by the power increase to drive more and bigger tran-

sistors, leaving the overall efficiency roughly the same as a 3-level pulser. A 5-level

pulser incurs even more power penalty on driving the high voltage transistors and the

efficiency is lower than a 3-level pulser. Therefore, a 3-level pulser design is used in

this work.

5.2.3 Tx Path Design for 2D Ultrasonic Transducer Arrays

For the 2D ASIC implementation, a 2D grid of per-element 3-level 30Vpp pulse-shaping

pulsers is connected by column and row lines, and additional circuitry is added

to support column-parallel and row-parallel modes.

Figure 5-6(a) shows the complete schematic of a pulser at the jth row and ith

column and its corresponding row and column gate drivers. Except for M2 and M3, the

bulk of each transistor is connected to its source. The bulks of M2 and M3 are connected to

0V and 30V respectively. The pass-gate multiplexers² implemented in high voltage

transistors are added into the per-element pulser as shown in Figure 5-6(b). This is

to implement the functionality of Tr and Tc switches in Figure 5-1(a), so that the

pulser gates can be driven by the row driver, by the column driver, or by neither, in

which case the gate is held at 0V for M1-M2 and at 30V for M3-M4.

An important issue in the 2D array design is the line parasitics. To account for the

line parasitics accurately, the line metal layout is extracted to obtain the estimated

lumped circuit model (Rp, Cp) as shown by the red circle in Figure 5-6(a). The pulser

is placed under each element to avoid the parasitics affecting the pulsing performance

as much as possible. This is because the line parasitics are only present as a load for

² All four pulser gates, M1-M4, have their own MUX, but only M3 is shown in Figure 5-6(a) as an example.

Figure 5-6: Tx design for the 2D array: (a) 2D pulser schematic; (b) MUX implementation.

the gate drivers, rather than the pulser itself. In this way, when different

numbers of elements are active along a column or a row line, the gate driver sees different

loads while each pulser always sees a constant, local CMUT load. Therefore, the

design makes sure that the pulse’s shape and amplitude are preserved and invariant to

the number of active elements. Meanwhile, the gate driver transistor sizing is optimized

to drive pulsers on the same column or row line with the presence of parasitic line

capacitance.

In the current design, the gate drivers are sized for the heaviest driving load, which

corresponds to all 16 pulser gates being active (about 90-100fF of gate capacitance each), plus

the line parasitics across the length of 4mm (16 × 250µm). The column / row line

layout is implemented with minimum width metal wire layout, giving an estimated

per-element (250µm length) line parasitic model: Rp = 62Ω, Cp = 25fF. The gate

driver power consumption takes up about 35% of the total power consumption in Tx.

However, in the future, the gate drivers can also be made programmable in driving

strength, so that they can adapt to the number of active elements to save power. With

the adaptive driving strength, the self-loading of the gate driver at the light load can

be reduced, and a constant pulser efficiency can be maintained.
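A first-order estimate of the gate driver load illustrates why adaptive drive strength could help. The sketch below adds the gate capacitance of the active pulsers to the line capacitance of all 16 segments. It assumes 95fF per gate (the midpoint of the 90-100fF range), assumes the full line capacitance is present regardless of how many elements are active, and ignores the off-state capacitance of the MUX pass-gates.

```python
def gate_line_load(n_active, c_gate=95e-15, n_elements=16, c_line_per_element=25e-15):
    """Lumped capacitive load seen by one row or column gate driver, in farads."""
    return n_active * c_gate + n_elements * c_line_per_element

heavy = gate_line_load(16)  # all 16 pulser gates on the line are active
light = gate_line_load(1)   # a single active pulser gate

print(f"heaviest load: {heavy * 1e12:.2f} pF")
print(f"lightest load: {light * 1e12:.2f} pF")
print(f"ratio: {heavy / light:.1f}x")
```

Under these assumptions a driver sized for the heaviest load is several times oversized for a single active element, which is the self-loading that a programmable driving strength would reduce.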

5.3 Rx Circuit Design

This section describes the design of the receiver path. The block-level circuit opti-

mization to interface to CMUT elements is presented first [5, 6]. A transimpedance

amplifier (TIA) topology is utilized to improve the trade-offs between noise, band-

width, and power dissipation. Design optimizations for the 2D array will be described

next, which can be applied to general 2D ultrasonic transducers. A specially sized

source follower output stage is added to the LNA to implement receiver parallelization

for improved SNR.

5.3.1 LNA Optimization Methodology for CMUT

For the receiver, large input capacitance limits the bandwidth and tends to increase

the noise contribution from the input stage transistors, degrading the noise figure

(NF). Bulky off-chip inductors are needed to impedance match the source to a tra-

ditional PZT pre-amplifier that assumes a low-impedance source [27]. Charge-based

amplifiers were attempted for CMUTs. The continuous-time charge amplifier achieved

low noise and low power performance for CMUT working at kHz range [86], but the

large impedance from the DC-setting network limits the bandwidth for a CMUT ar-

ray working at MHz range for medical imaging applications. The switched-capacitor

charge integrating amplifier in [87] could provide enough bandwidth, but issues such

as clock feed-through and charge injection are difficult to mitigate for the inherently


Figure 5-7: Small signal model and noise sources of the CMUT element and the LNA.

single-ended CMUT signal path. Moreover, because the sampling clock switches at a

higher frequency than input signal bandwidth, the settling requirement for the op-amp

demands a higher bandwidth than what is needed in op-amps used as continuous-time

buffers, leading to much more power consumption. In this section, the TIA topology

is described to improve the trade-off between gain, bandwidth and noise, with an

inductor-less design in the presence of high input capacitance [40,45,88].

Figure 5-7 shows the small signal model of the CMUT and LNA. Figures 5-8 and

5-9 plot various circuit transfer functions to help analyze the optimization process for

the LNA. The closed-loop TIA gain is expressed as:

$$Z_{CL} = R_f \cdot \left(\frac{1}{1 + sR_fC_f}\right) \cdot \frac{F \cdot A_{OL}}{1 + F \cdot A_{OL}}, \tag{5.2}$$

where F = Zi/(Zi+Zf ) is the feedback factor, and AOL is the op-amp open-loop gain.

From (5.2), the LNA DC transimpedance gain is Rf and its bandwidth is determined

by the smaller of the following two poles:

$$f_p = \frac{1}{2\pi R_f C_f}, \tag{5.3}$$

Figure 5-8: Transfer functions when the LNA optimality condition is reached. [Plot of Gain (dB) versus Freq (Hz), showing the AOL, 1/F, ZCL and Gn curves.]

Figure 5-9: Transfer function examples when the LNA optimality condition of fi ≈ fp is not reached: (a) fi < fp, (b) fp < fi.

$$f_i \approx \sqrt{f_c \cdot f_z} = \sqrt{f_c \cdot \frac{1}{2\pi (R_f \| R_i)(C_i + C_f)}}. \tag{5.4}$$

fp in (5.3) is due to the RC time constant of the second multiplying term in (5.2).

fi in (5.4) comes from the third multiplying term in (5.2), which reaches -3dB when

F ·AOL = 1. Graphically, as can be seen in Figures 5-8 and 5-9, fi is the intersection

between 1/F and AOL curves, which is approximately the geometric mean of 1/F ’s

zero (fz) and the op-amp’s unity-gain frequency (fc), assuming a 20dB/dec slope in

both 1/F and AOL curves.

When fi < fp, as shown in Figure 5-9(a), an increase in Rf always improves

LNA’s gain-bandwidth product (GBP). This is because gain = Rf , while bandwidth

= fi, which is approximately proportional to 1/√Rf as indicated by (5.4). GBP

improves roughly in proportion to √Rf. However, because fp is proportional to

1/Rf as indicated by (5.3), an increase in Rf decreases fp faster

than fi. When fi ≈ fp, the LNA achieves the maximum GBP available from the op-amp.

The phase margin is roughly 45°. A further increase in Rf no longer improves GBP,

because the bandwidth becomes limited by fp and is proportional to 1/Rf (Figure

5-9(b)), holding the GBP constant. But as Rf increases, the phase margin continues

to improve at the expense of a reduced bandwidth [88].
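The Rf trade-off can be illustrated by sweeping Rf and evaluating (5.3) and (5.4) directly. In the sketch below, Ri = 1MΩ and Ci = 2pF come from the CMUT model of Section 5.2.1, while the fixed Cf = 120fF and the op-amp unity-gain frequency fc = 100MHz are illustrative assumptions rather than design values.

```python
import math

R_I, C_I = 1e6, 2e-12   # CMUT model values from Section 5.2.1
C_F = 120e-15           # assumed feedback capacitance, held fixed in the sweep
F_C = 100e6             # assumed op-amp unity-gain frequency

def pole_fp(rf):
    """f_p from (5.3)."""
    return 1 / (2 * math.pi * rf * C_F)

def pole_fi(rf):
    """f_i from (5.4), the geometric mean of f_c and f_z."""
    r_par = rf * R_I / (rf + R_I)
    f_z = 1 / (2 * math.pi * r_par * (C_I + C_F))
    return math.sqrt(F_C * f_z)

def gain_bandwidth(rf):
    """Gain (R_f) times bandwidth (the smaller of f_p and f_i)."""
    return rf * min(pole_fp(rf), pole_fi(rf))

for rf in (20e3, 50e3, 100e3, 200e3, 400e3, 800e3):
    print(f"Rf = {rf / 1e3:5.0f} kOhm: fp = {pole_fp(rf) / 1e6:6.2f} MHz, "
          f"fi = {pole_fi(rf) / 1e6:6.2f} MHz, GBP = {gain_bandwidth(rf):.3e} Ohm*Hz")
```

With these values the GBP grows while fi < fp and then saturates at 1/(2πCf) once fp becomes the limiting pole, matching the behaviour described above.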

The optimality condition, fi ≈ fp, also minimizes noise contribution from the

op-amp input-referred voltage noise. Figure 5-7 shows all noise sources in the circuit.

The noise figure is expressed as:

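$$NF = 1 + \frac{R_i}{R_f} + \frac{V_{op}^2}{I_{in}^2 \cdot |Z_i \| Z_f|^2} + \frac{I_{op}^2}{I_{in}^2} + \frac{2 \cdot \left|V_{op} \cdot I_{op}\right|}{I_{in}^2 \cdot |Z_i \| Z_f|}. \tag{5.5}$$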

From (5.5), a large Rf is desired to reduce its thermal noise contribution. More-

over, the op-amp’s input-referred voltage noise (Vop) has a peaking effect due to the

impedance drop in |Zi| at higher frequencies. It can be mathematically seen from the

following noise gain expression (Gn), defined as the transfer function from LNA input

Figure 5-10: Transfer function examples: (a) fi < fp, (b) fi ≈ fp, (c) fi > fp. [Each panel plots Gain (dB) versus Freq (Hz) with the AOL, 1/F and Gn curves.]

(Vop) to the output (Vout):

$$G_n = \left|\frac{A_{OL}}{1 + F \cdot A_{OL}}\right| = \left|\frac{1}{F} \,\Big\|\, A_{OL}\right| \approx \min\left(\left|\frac{1}{F}\right|, |A_{OL}|\right). \tag{5.6}$$

The dashed red curves in Figure 5-8 and Figure 5-10 show the graphical interpretation

of (5.6): Gn follows the lower of the 1/F and AOL curves, which has a considerable

peaking effect within the LNA bandwidth. By comparing the optimal and

non-optimal conditions in Figure 5-10, one can see that the condition fi ≈ fp mini-

mizes the noise peaking effect while exploiting the maximum possible GBP from the

op-amp design.
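The approximation in (5.6) can be checked numerically. The sketch below models AOL as a single-pole response and builds 1/F from Zi = Ri || 1/(sCi) and Zf = Rf || 1/(sCf). The CMUT values follow Section 5.2.1; Rf, Cf, the DC gain and the unity-gain frequency are illustrative assumptions. Near the intersection of the two curves the exact gain can differ from the min() approximation because of the phase of the two terms, so the comparison is most meaningful well away from that point.

```python
import math

R_F, C_F = 175e3, 120e-15   # assumed feedback network values
R_I, C_I = 1e6, 2e-12       # CMUT model values from Section 5.2.1
A0, F_C = 1e4, 100e6        # assumed op-amp DC gain and unity-gain frequency

def parallel(z1, z2):
    return z1 * z2 / (z1 + z2)

def inv_feedback(f):
    """1/F = (Zi + Zf) / Zi at frequency f (f > 0)."""
    s = 2j * math.pi * f
    z_i = parallel(R_I, 1 / (s * C_I))
    z_f = parallel(R_F, 1 / (s * C_F))
    return (z_i + z_f) / z_i

def a_ol(f):
    """Single-pole open-loop gain with unity-gain frequency F_C."""
    return A0 / (1 + 1j * f / (F_C / A0))

def noise_gain(f):
    """Exact |AOL / (1 + F * AOL)| from (5.6)."""
    a = a_ol(f)
    return abs(a / (1 + a / inv_feedback(f)))

def noise_gain_approx(f):
    """min(|1/F|, |AOL|) from (5.6)."""
    return min(abs(inv_feedback(f)), abs(a_ol(f)))

for f in (1e3, 1e5, 1e7, 1e9):
    print(f"{f:8.0e} Hz: exact = {noise_gain(f):8.3f}, approx = {noise_gain_approx(f):8.3f}")
```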

5.3.2 LNA Transistor-Level Implementation

Following the guidelines discussed in Section 5.3.1, the LNA optimization starts with

a 10MHz bandwidth target and the optimal condition: fi ≈ fp ≈ BW . Rf is maxi-

mized while keeping the corresponding Cf , estimated from (5.3), larger than parasitic

capacitances to maintain control over circuit stability. The unity-gain frequency fc

is estimated from (5.4) to set the op-amp design target. Further design adjustments

keep phase margin above 60°.

Figure 5-11 shows the LNA schematic. The input stage devices (M1, M2) are

biased at the boundary of strong and weak inversion, as shown in Figure 5-12(a),

to achieve high transconductance per unit current and low noise while minimizing

Figure 5-11: The LNA schematic, implemented in the TIA topology. All transistors are low voltage devices except the HV Rx Switch M10.

size and parasitic capacitance. The differential pair suppresses interference from the

power supplies, which is not possible with single-ended topologies [40, 45]. The circuit

simulation result in Figure 5-12(b) shows that the sizing of M1 and M2 is optimized

for the target CMUT parameter and that the noise figure is minimized to be below

10dB. The Miller compensation leg (M9, Cc) keeps the op-amp second pole well

beyond the closed-loop bandwidth for good phase margin. The source follower (M7,

M8) lowers the op-amp output impedance to enforce accurate feedback.

During high voltage transmissions, the high voltage Rx switch (M10) is opened

and the low voltage switches (s1-s6) are closed. The on-resistance of M10 directly

impacts LNA noise performance. Its size is chosen such that its noise contribution

is only a small portion of the input stage, and its parasitic drain and source capaci-

tance do not degrade phase margin and bandwidth. Switches s1-s6 put the op-amp

into sleep mode when they are closed, during which only the reference current re-

mains conducting for fast wake-up within 1µs. The sleep mode enables system-level

power saving opportunities. In addition, 4-step programmable transimpedance gain

Figure 5-12: Design optimization for input stage transistors: (a) transistors are sized at the boundary of strong and weak inversion; (b) transistor width is optimized for the lowest noise figure. [Panel (a) plots gm*ro against current density ID/W in A/μm; panel (b) plots NF in dB against input stage width in μm.]

is implemented to provide system-level flexibility.

5.3.3 Rx Path Design for 2D Ultrasonic Transducer Arrays

For the Rx path in a 2D array, we want to achieve the same parallelism effect as in

the Tx path. Therefore, the LNA is modified such that when multiple Rx channels

are activated on the same column or row line, their analog outputs combine for an

increased SNR, where signals are averaged and noise is reduced. In this way, CMUT

elements are effectively parallelized to receive acoustic echoes to satisfy system-level

requirements. One example of its use is already presented in Section 4.4, where the

active Rx elements in the annular ring aperture are in parallel and the analog signals

are added along the columns.

To illustrate the principle of analog signal combining, Figure 5-13 shows the pro-

cess of combining two Rx channel outputs. In Figure 5-13(a), the input current signals

(is1, is2) and the input-referred current noise (in1, in2) from two CMUT elements are

amplified by the two TIAs. The outputs of the LNAs (implemented with TIA) are

modelled as the Thevenin’s equivalent circuits. Both the current signal and the cur-

rent noise are amplified by the transimpedance gain Z into voltage sources, in series

with an output resistance Ro. The output configuration is then converted to Norton’s

equivalent circuits as shown in Figure 5-13(b), to indicate the combination is done

in the current domain³. The current gain from the input to the output is expressed

as K = Z/Ro. Assuming the two channel parameters are perfectly matched and

ignoring the line parasitics for now, the combined LNA circuits are equivalent to the

circuit shown in Figure 5-13(c). The two output resistors are in parallel to form an

output resistance of Ro/2. The two current signals add up directly as in Equation

(5.7), while the two noise sources add up in power in Equation (5.8), since they are

uncorrelated noise sources.

$$i_{s,\mathrm{output}} = K \cdot (i_{s1} + i_{s2}). \tag{5.7}$$

$$i_{n,\mathrm{output}} = K \cdot \sqrt{i_{n1}^2 + i_{n2}^2}. \tag{5.8}$$

Because the CMUT element size is the same and the LNAs are designed to be

matched, the input-referred noise power should be roughly equal (in1² = in2²). Moreover,

if the two receiving CMUT elements are close to each other in space, the two

CMUTs would see ultrasound echoes similar in amplitude and phase, leading to sim-

ilar input signals (is1 ≈ is2). The above assumptions lead to the output signal and

noise expressions in Equation (5.9) and (5.10). This translates to a 3dB improvement

in the output SNR when two Rx channels are in parallel compared to a single channel

output, as indicated in Equation (5.11). Naturally, more parallelism would lead to

further SNR improvement, which follows the trend of 10 log(N) dB,

in which N is the number of channels in parallel. It is also summarized in the “Theory”

row in Table 5.1.

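$$i_{s,\mathrm{output}} = 2K \cdot i_{s1}. \tag{5.9}$$

$$i_{n,\mathrm{output}} = \sqrt{2} K \cdot i_{n1}. \tag{5.10}$$

$$SNR_{2x} = 20\log\left(\frac{i_{s,\mathrm{output}}}{i_{n,\mathrm{output}}}\right) = 20\log\left(\frac{2K \cdot i_{s1}}{\sqrt{2}K \cdot i_{n1}}\right) = 3 + 20\log\left(\frac{i_{s1}}{i_{n1}}\right) = 3\,\mathrm{dB} + SNR_{1x}. \tag{5.11}$$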
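The same reasoning extends to N channels: signals add linearly as in (5.7), while uncorrelated noise adds in power as in (5.8). The sketch below applies this to equal signal and noise amplitudes per channel. The numeric amplitudes are arbitrary placeholders, and the current gain K is omitted because it cancels in the ratio.

```python
import math

def combined_snr_db(signals, noises):
    """SNR after current-domain combining: signals add in amplitude,
    uncorrelated noise sources add in power."""
    signal = sum(signals)
    noise = math.sqrt(sum(n * n for n in noises))
    return 20 * math.log10(signal / noise)

def snr_gain_db(n_channels, i_s=1.0, i_n=0.1):
    """SNR improvement of n matched channels over a single channel."""
    single = combined_snr_db([i_s], [i_n])
    combined = combined_snr_db([i_s] * n_channels, [i_n] * n_channels)
    return combined - single

for n in (2, 4, 8, 16):
    print(f"{n:2d}x: {snr_gain_db(n):.2f} dB (10 log N = {10 * math.log10(n):.2f} dB)")
```

If the channels saw different echo amplitudes or phases, or if part of the noise were correlated, the improvement would fall below 10 log(N) dB.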

³ Thevenin’s equivalent circuit in the voltage domain will yield the same conclusion, but Norton’s equivalent is easier for explanation.

Figure 5-13: The signal and noise combining with two Rx channels in parallel: (a) two channels on the same line, shown in Thevenin’s equivalent circuit at LNA outputs; (b) two channels on the same line, shown in Norton’s equivalent circuit at LNA outputs; (c) two channels combined, showing the resultant signal and noise amplitudes.

In the implementation, line parasitics and component mismatches need to be taken

into consideration. The RC model of the line parasitics is shown in Figure 5-13(b),

and the LNA output stage must be specially designed to achieve the proper analog

signal combination. First of all, current mode combining should be used because it is

intrinsically robust against the parasitics and mismatch. The LNA output impedance

(Ro) must maintain a relatively large value compared to line parasitic resistance (Rp),

i.e. Rp << Ro, so that the circuit DC condition is less susceptible to mismatch and

parasitics, and the signal combining has less distortion. On the other hand, Ro must

not be too high either, because the line capacitance would limit bandwidth, due to

the time constant formed by Ro and the line capacitor (Cp).

As a result, the output resistance and the line parasitics need to be co-designed

to work together optimally. The line parasitics can be adjusted during design by

changing the metal layout wire width, and a source follower stage (M11-M12 in Figure

5-14) is proposed to provide a constant output impedance. First, because the source

follower stage is the last stage of the LNA, the linearity requirement determines its

biasing current. An estimated biasing current of 34µA is calculated based on the

needed worst case slew rate for a full-swing 10MHz output signal as in (5.12).

$$I_D \approx I_{slew} = C_{load} \cdot V_{linear} \cdot (2\pi f) = 0.9\,\mathrm{pF} \times 0.6\,\mathrm{V} \times (2\pi \times 10\,\mathrm{MHz}) = 34\,\mu\mathrm{A}. \tag{5.12}$$

The loading capacitance is estimated from the input stage capacitance of the suc-

ceeding row / column BUF amplifier (0.5pF) plus the 4mm line capacitance assuming

minimum layout width (0.4pF); the linear range of the output signal swing ampli-

tude is estimated based on the maximum possible voltage headroom; and the signal

frequency is the maximum 10MHz supported in the ASIC design. The initial bias-

ing current leads to roughly an output resistance of 2.2kΩ as in (5.13), assuming an

estimated 0.15V transistor over-drive voltage.

$$R_o \approx \frac{1}{g_m} = \frac{V_{GS} - V_{TH}}{2 I_D} = \frac{0.15\,\mathrm{V}}{2 \times 34\,\mu\mathrm{A}} = 2.2\,\mathrm{k\Omega}. \tag{5.13}$$
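The arithmetic in (5.12) and (5.13) can be reproduced with the short sketch below. The function names are illustrative. The last line applies the same 0.15V over-drive to a 45µA bias as a rough cross-check; holding the over-drive voltage constant there is an assumption, since the transistor sizing may also differ.

```python
import math

def slew_current(c_load, v_linear, f_max):
    """Bias current needed to slew a full-swing sine wave, as in (5.12)."""
    return c_load * v_linear * 2 * math.pi * f_max

def source_follower_ro(v_overdrive, i_d):
    """Ro ~ 1/gm = (VGS - VTH) / (2 * ID), as in (5.13)."""
    return v_overdrive / (2 * i_d)

i_slew = slew_current(0.9e-12, 0.6, 10e6)
ro_initial = source_follower_ro(0.15, 34e-6)
ro_at_45ua = source_follower_ro(0.15, 45e-6)  # assumes unchanged over-drive

print(f"I_slew = {i_slew * 1e6:.1f} uA")
print(f"Ro at 34 uA = {ro_initial / 1e3:.2f} kOhm")
print(f"Ro at 45 uA = {ro_at_45ua / 1e3:.2f} kOhm")
```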

Starting with this initial design, the row / column line width is swept to find a


solution that not only maintains the SNR improvement with Rx channel parallelism,

but also preserves a 10MHz bandwidth. At the same time, the output stage linearity

performance numbers, such as HD2, IMD3, and Po1dB, are re-examined as the par-

asitic loading from the line changes. If the linearity specs are not met, the transistor

sizing or the biasing current of the source follower stage are tweaked to satisfy the

design target. After several iterations of changes in the line width and output stage

design, the final optimal design has a 45µA biasing current and a 1.7kΩ LNA Ro. The

corresponding line width is chosen to be 10x minimum metal wire width, as shown

in Figure 5-14. The estimated per-element (250µm length) line parasitic model is:

Rp = 6.2Ω, Cp = 250fF. According to circuit simulation, the circuit maintains a

worst case 9.2MHz bandwidth when only the channel at the end of the line is acti-

vated, driving the whole 4mm line to reach the BUF amplifier. Except for the worst

case, most other configurations⁴ provide a bandwidth over 10MHz. Meanwhile, the

SNR improvement with 16x channel parallelism is 11.97dB, which is very close to the

ideal target of 12dB when there are no parasitics.

As a sanity check, the row / column line width is modified to see its effect on circuit

performance. When the line width is decreased by 5x (Rp = 31Ω and Cp = 50fF

per element), the larger line resistance reduces noise averaging of 16x parallelism to

11dB. When the line width is increased by 3x (Rp = 2.07Ω and Cp = 750fF per

element), the worst case channel bandwidth drops to 6.2MHz due to the increased

line capacitance.
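The per-element parasitic values quoted for the different line widths follow a simple scaling from the minimum-width model (Rp = 62Ω, Cp = 25fF): resistance is inversely proportional to width and capacitance is proportional to it. The sketch below applies that scaling and compares the total line resistance with the 1.7kΩ Ro. The linear scaling of Cp (no fringe term) is inferred from the quoted numbers, and the totals are first-order indicators only; they are not expected to reproduce the simulated bandwidth or SNR figures.

```python
RP_MIN = 62.0      # per-element line resistance at minimum width, in ohms
CP_MIN = 25e-15    # per-element line capacitance at minimum width, in farads
R_O = 1.7e3        # LNA output resistance of the final design, in ohms
N_ELEMENTS = 16

def line_segment(width_multiple):
    """Per-element (Rp, Cp) for a line drawn at width_multiple x minimum width."""
    return RP_MIN / width_multiple, CP_MIN * width_multiple

for w in (1, 2, 10, 30):
    rp, cp = line_segment(w)
    r_total = N_ELEMENTS * rp
    c_total = N_ELEMENTS * cp
    print(f"{w:2d}x width: Rp = {rp:5.2f} ohm, Cp = {cp * 1e15:4.0f} fF, "
          f"line R / Ro = {r_total / R_O:.3f}, line C = {c_total * 1e12:.1f} pF")
```

The 2x and 30x rows correspond to the two sanity-check cases above (10x width decreased by 5x and increased by 3x).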

The last step in design is to carry out the Monte Carlo simulation for verification

in the presence of device mismatches. Less than 2% DC disturbance is observed in

Monte Carlo simulations that include line parasitics, global process variations, and

local transistor mismatches. Finally, the ASIC measurement (more details are in

Chapter 6) verified the design functionality. The measured SNR improvement with

parallel channels is close to theory, as listed by Table 5.1. The discrepancy between

the measurement and the ideal case most likely comes from the fact that there exist

⁴ Other configurations include: a single channel closer to the BUF, driving a shorter line with less parasitics; several channels parallelized; etc.

Figure 5-14: The LNA schematic, implemented in the TIA topology. All transistors are low voltage devices except the HV Rx Switch M10. “vip” node is also buffered with a source follower to output (not shown).

SNR improvement with parallelism    2x      4x      8x      16x
Theory (dB)                         3       6       9       12
Measured (dB)                       2.41    5.41    8.20    10.86

Table 5.1: SNR improvement from Rx channel parallelism, theory prediction and measurement.

correlated noise sources, which prevent the noise power from being averaged out.

Discussion on Scaling

As can be seen from Table 5.1, the measured channel SNR improvement deviates

from the theoretical expectation more as the channel parallelism increases. The

performance degradation is the result of the line parasitics and indicates that the

parallelism cannot be scaled up to an arbitrarily large number of channels. In particular, it is

impossible to maintain a satisfactory bandwidth performance for the channel located

at the farthest end of the line, when the line length is excessively long.

However, several techniques can be proposed to mitigate the negative effect from


the line parasitics and improve the scaling to an even larger array, as described below.

• Increasing the source follower stage bias current and transistor sizing further

could lead to more than 16x parallel channels with the same performance. The

corresponding line width needs to be increased approximately proportionally

to keep Rp << Ro for current summing. The channel count can be increased in this

way only until the self-loading condition for circuit bandwidth is reached. At that

point, Cp becomes the dominant load at the output, and the increase of Cp

completely offsets the reduction of Ro. Circuit simulation shows that self-loading is

reached at around 64x parallelism with a 40x minimum metal line width; beyond that,

increasing the output stage sizing and power consumption no longer extends the

number of parallel channels.

• The metal wire layout in the current design uses only one metal layer. Several

metal layers can be connected in parallel to obtain better line parasitics.

For example, by using two metal layers in parallel to implement the

interconnecting column and row lines, Rp is reduced by 2x while Cp is increased

by a factor that is much less than 2x, because there is no coupling capacitance

between the two metal layers at the same potential. As a result, the channel

parallelism can be extended by close to 2x.

• The column or row lines can be interconnected from both ends to the column

or row buffers, effectively reducing the line parasitics. The worst case channel

in this scenario becomes the one at the center of a line, rather than the ones

at the two ends. Therefore, approximately another 2x more channels can be

placed in parallel with the same performance.

• Lastly, inserting intermediate buffering stages in the middle of interconnection

lines could extend the number of parallel channels even further, as shown in

Figure 5-15. Within each intermediate block, 16-64x channel outputs can be

combined in parallel by each channel’s source follower stage. The additional

line buffers inserted could attain parallelism with even more channels without

excessive bandwidth / linearity performance degradation.

Figure 5-15: Parallelism with even more Rx channels by utilizing intermediate line buffers to preserve the circuit performance.

5.4 Biasing

The current biasing for the 2D ASIC is carefully designed to provide good matching

for channels across the array. Figure 5-16 shows the biasing scheme. An 8-bit DAC

is used to generate a gate voltage, which is applied onto a tunable PMOS transistor.

The PMOS Md0 is implemented by binary weighted PMOS transistors in parallel

to provide 8-bit tunable widths. The 8-bit DAC produces a nominal seed current of

25µA and the 8-bit tunable PMOS width provides 0.2µA steps over the adjustable

range of 0 − 50µA. The seed current generated by Md0 is fed into a current mirror

with 16 branches implemented by NMOS transistors Md1 and Mn0−Mn15. These

16 branches provide seed currents for the 16 rows in the 2D array. Transistors

Mn0−Mn15 are physically placed next to each other in the layout for good matching.

Each of the 16 row currents is then routed and distributed to its corresponding row,

where it goes through another set of current mirrors with 16 branches. For example,

row current generated by Mn0 is mirrored by PMOS transistors Mp0 and M0−M15.

Figure 5-16: The biasing circuit for the 2D array.

Similarly, for matching purposes, transistors M0−M15 are placed next to each other,

before their generated biasing currents are routed into corresponding circuit channels.

The current into each channel is nominally 25µA.
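As a quick consistency check on the numbers above, the 0.2µA step is what results from dividing the 0 to 50µA range into 2^8 − 1 equal increments. The Python sketch below works this out under the assumption of a linear code-to-current mapping, which the text does not state explicitly. All names are illustrative.

```python
SEED_FULL_SCALE_UA = 50.0   # adjustable seed current range, 0-50 uA (from the text)
N_BITS = 8                  # 8-bit tunable PMOS width (from the text)
N_ROWS, N_COLS = 16, 16     # two-level current mirror tree: 16 rows x 16 branches

def seed_current_ua(code):
    """Seed current for a width code, ASSUMING a linear code-to-current map."""
    if not 0 <= code < 2 ** N_BITS:
        raise ValueError("code out of range")
    return SEED_FULL_SCALE_UA * code / (2 ** N_BITS - 1)

step_ua = seed_current_ua(1)            # size of one tuning step
n_channel_branches = N_ROWS * N_COLS    # number of per-channel bias branches

print(f"step = {step_ua:.3f} uA, channel branches = {n_channel_branches}")
```

The computed step is slightly below 0.2µA, and the mirror tree fans out to 256 branches, one per channel.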

It is important to design the current mirror to be robust against mismatches across

the array. The transistor mismatch model in strong inversion is expressed in (5.14).

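$$\frac{\Delta I}{I} = \sqrt{\left[\frac{\Delta (W/L)}{(W/L)}\right]^2 + \left(\frac{2\Delta V_{TH}}{V_{GS} - V_{TH}}\right)^2} = \sqrt{\left(\frac{\Delta W}{W}\right)^2 + \left(\frac{\Delta L}{L}\right)^2 + \left(\frac{2\Delta V_{TH}}{V_{GS} - V_{TH}}\right)^2}. \tag{5.14}$$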

The transistor L is chosen to be long to provide both large output impedance and small

sensitivity to channel length mismatches. At the same time, the transistor W is chosen

to keep the transistor well in the saturation region, with Vdsat = |VGS − VTH| ≈ 0.3V.

The large over-drive voltage keeps the VTH mismatch term in (5.14) relatively small.
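To make the role of the over-drive voltage in (5.14) concrete, the following Python sketch evaluates the mismatch expression for two over-drive voltages. The geometry and threshold mismatch values are hypothetical numbers chosen only for illustration; they are not measured or simulated values from this work.

```python
import math

def current_mismatch(dw_over_w, dl_over_l, dvth, vov):
    """Relative current mismatch per (5.14); vov = VGS - VTH in volts."""
    return math.sqrt(dw_over_w ** 2
                     + dl_over_l ** 2
                     + (2.0 * dvth / vov) ** 2)

# Hypothetical mismatch inputs: 0.2% width, 0.2% length, 2 mV threshold.
design_vov = current_mismatch(0.002, 0.002, dvth=0.002, vov=0.3)
low_vov = current_mismatch(0.002, 0.002, dvth=0.002, vov=0.1)

print(f"Vov = 0.3 V: {100 * design_vov:.2f}%   Vov = 0.1 V: {100 * low_vov:.2f}%")
```

With these assumed inputs, reducing the over-drive from 0.3V to 0.1V roughly triples the current error, because the threshold term scales as 1/(VGS − VTH).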

To reduce the noise contribution from current mirror transistors to LNA circuits,

MOS capacitors Cd1, Cp0 − Cp15 are instantiated as bypass capacitors. They take

up as much free layout area as possible, such that the noise generated by the current

mirrors is negligible according to circuit simulation.


5.5 The Fault-Tolerant ASIC Design for Faulty MEMS Devices

This section discusses the practical issues in the CMUT-ASIC assembly process. The

fault-tolerant transceiver front-end design in conjunction with the use of per-element

enable bits provides an elegant way to work around the defective transducer elements.

The method increases assembly yield and allows successful system demonstration.

A 2D CMUT array contains a large number of elements, so some defective elements

are likely to exist. Currently, we obtain 16x16 2D CMUT transducer samples

externally to work with our 2D ASICs for experiments. Some of

these MEMS research prototypes suffer from failure mechanisms including individual

shorted elements and individual open elements. The problematic elements are ran-

domly distributed in the array, and their positions vary from device to device. For

short elements, the short behavior is also observed to depend on the bias voltage. A

higher VBIAS tends to create more short elements; when VBIAS is reduced, some

elements that were shorted might return to normal behavior.

For the non-functional elements in the array, the open elements do not require

special treatment. The transceiver channel with an open element is not useful, since

no ultrasonic signal can be emitted or received. But that element does not affect the

transceiver circuit, nor prevent other elements from working properly. On the other

hand, the short elements cause more problems. Because the whole 2D array is biased

with a shared high voltage supply VBIAS, a short element could propagate the high

voltage to the side that is connected to the circuit, exposing the transceiver circuitry

to VBIAS and potentially damaging the circuit. Furthermore, if the transceiver

circuit provides a relatively low impedance path to ground, VBIAS could be pulled

down to close to 0V, sinking current through the low-impedance path from VBIAS

to ground. Since VBIAS is shared across the array, the whole array would barely be

biased in this situation and become useless.

While extensive research is ongoing to make the device more reliable with a lower

defective element percentage, it is worthwhile to investigate methods to cope with


the existing defects. In particular, given the fact that even one short element could

render the whole array useless, and that achieving 100% functional element percentage

is difficult for 2D arrays with ever-growing sizes, fault-tolerance is indispensable to

work with 2D CMUT arrays in the future.

Previously, a largely manual process was used to overcome the problem caused

by the short elements [40, 43]. The elements in a 2D CMUT array are first tested

with a probe station to identify all the short elements under a certain VBIAS. The

solder bumps at the positions corresponding to the short elements are then manually

removed, to prevent the electrical contact between the short CMUT element and the

interposer PCB. In this way, the short CMUT elements are physically isolated from

the transceiver circuitry and the rest of the array can operate normally.

There are several drawbacks with this “selective bumping” approach. First, using

a probe station to sweep through all 256 elements to find shorts is a very slow and

manual process which is prone to errors. Second, because each CMUT device has a

unique pattern of short elements, it is not an easily automated process to remove the

detected shorts. Lastly, this manual approach might not solve the problem completely.

It has been observed that new CMUT short elements might emerge when a different

VBIAS voltage is applied. Therefore, a fixed solder ball removal pattern might work

at the beginning, but as soon as a single new short element emerges, the

assembly becomes unusable.

In contrast, our 2D ASIC takes advantage of circuit techniques to implement

fault-tolerant transceivers, in order to eliminate the need for “PCB selective bumping”.

The ASIC and CMUT are flip-chip bonded together in the usual way without

selective solder ball removal, as already described in Section 4.1. Afterwards,

the ASIC performs a programmable “channel removal” process electrically, used both

as a scanner to detect short elements and as a selector to isolate the detected shorts.

Our solution does not require additional circuitry, but only small changes in controlling

the existing front-end HV transistors in the Tx pulser and the RxSw, as shown

in Figure 5-17. In each channel, a total of five front-end HV transistors are directly

connected to the CMUT element as shown in Figure 5-17(a). M1-M4 are pulser

Figure 5-17: The technique used for detecting and isolating the short CMUT elements: (a) front-end transistors in each channel and their control voltages; (b) the effective circuit connection of all 256 channels with CMUT elements. [Labels recovered from the figure include: 30V, 15V and 0V levels; a 0→30→0V gate sequence; 1MΩ and 0.1µF at VBIAS; a 10kΩ resistor used to monitor current; gates G1.[0] to G1.[255] driving M1.[0] to M1.[255].]

transistors and M10 is the Rx protection switch (RxSw). Their gate voltages can be

controlled independently. When all transistors are switched off, the CMUT element

is effectively disconnected and “selectively removed” from the array. To detect short

elements, M1 is used to provide a ground path to the CMUT, while the other four transistors

are kept off. Focusing on M1, the 256-channel electrical connections between ASIC

and CMUT are reduced to Figure 5-17(b). M1 from each channel is sequentially

turned on and off, applying a voltage sequence of 0→30→0V to M1’s gate. For example,

when M1 from channel [0] is on with its gate voltage G1.[0] at 30V, the CMUT

in channel [0] is connected between ground and VBIAS. Normally, the CMUT is

a capacitor at DC and the current monitored by the voltmeter is zero. But if the

CMUT is shorted, the 10kΩ probing resistor exposes a leakage current through

the abnormal CMUT, indicating a short element.

The per-element enable bits in the Column-Row-Parallel architecture are the key

factor in selectively enabling transceiver channels, so that electrical connections

are made only to normal elements. It is the independent control over each channel that

Figure 5-18: Two successful 16x16 CMUT-ASIC assemblies with short CMUT elements (marked in red) isolated by the ASIC. The rest of the elements are functional and their sensitivity performance is expressed by the brightness of the elements, which will be described in detail in Section 6.4. [Each panel shows elements numbered 0 to 255 in a 16x16 row-major grid; recovered panel title: board8-B-SOICMUT,VBIAS=30V.]

allows us to identify individual short elements. By iterating through each of the

256 channels, all short CMUT elements are identified. The ASIC is then programmed

such that only the channels with normal CMUTs are enabled and contribute to the

imaging operations. All transceiver channels facing shorted elements keep their

front-end HV transistors cut-off during all operations. If new short elements emerge

in the future, the ASIC can be programmed again to easily account for the changes.

Figure 5-18(a) and (b) show two example assemblies with the short elements marked

in red.
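The scan-then-isolate procedure described above can be sketched in a few lines of Python. This is only an illustration of the control flow: the measurement hook `measure_leakage_ua`, the threshold value, and the example shorted channels are all hypothetical, and the real sequence is carried out by programming the ASIC and reading the bench instrument.

```python
N_CHANNELS = 256
LEAKAGE_THRESHOLD_UA = 1.0   # hypothetical decision threshold

def scan_for_shorts(measure_leakage_ua, n_channels=N_CHANNELS,
                    threshold_ua=LEAKAGE_THRESHOLD_UA):
    """Visit one channel at a time and flag channels that leak DC current.

    measure_leakage_ua(ch) is a hypothetical bench hook. It is expected to
    drive the gate of M1 in channel ch through 0 -> 30 -> 0 V with all other
    front-end HV transistors off, and to return the DC current observed
    through the probing resistor while M1 is on.
    """
    return [ch for ch in range(n_channels)
            if measure_leakage_ua(ch) > threshold_ua]

def enable_mask(shorted, n_channels=N_CHANNELS):
    """Per-element enable bits: True only for channels facing normal elements."""
    bad = set(shorted)
    return [ch not in bad for ch in range(n_channels)]

# Stand-in for the bench: pretend elements 17 and 200 are shorted.
fake_leakage_ua = {17: 25.0, 200: 30.0}   # made-up leakage readings
shorted = scan_for_shorts(lambda ch: fake_leakage_ua.get(ch, 0.0))
mask = enable_mask(shorted)

print(shorted, sum(mask))
```

If new short elements appear later, rerunning the scan and reloading the mask is all that is needed, which is the adaptability the text refers to.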

However, the electrical isolation does have one limitation. The maximum acceptable

VBIAS is limited to the maximum rated voltage that the HV transistors can

withstand. This is because the short elements would conduct VBIAS to the circuit

side, and the HV transistors are therefore stressed by the voltage difference between

the drain and source. A VBIAS as high as 40V has been applied without breaking the

ASIC in our experiments. Normally a VBIAS of 30V is used since it already offers


enough acoustic pressure and sensitivity to perform the imaging experiments.

Overall, our approach has been successful. It leverages the powerful functionality

provided by the electronics implementing the Column-Row-Parallel architecture, and

makes the short detection problem a fast and automated electrical process. It does

not require repetitive manual device characterization, and it could easily adapt to el-

ement property changes over time. Imaging experiments in Chapter 4 are all carried

out with the short elements disabled in this way. As also discussed in Sections 4.1

and 7.2, with several non-functional elements inside the 16x16 2D aperture, imaging

experiment results in Chapter 4 are not severely affected. Interpolation is used to

make up for the missing elements’ signals in digital post-processing in receive. Trans-

mit interpolation is also possible with pulsers that provide programmable amplitude

and phase generation (the 3-level pulser design in this work has a fixed pulse ampli-

tude generation), such that the neighboring channels of a missing channel can adjust

their pulse shapes to compensate for the missing signal in transmit. Furthermore, as

can be seen from Figure 5-18, the channel responses of different channels across the

array have mismatches due to the device and circuit component mismatches, and from

the flip-chip bonding assembly process. This response difference in the receive path

is already corrected by digital post-processing, where a weight is applied onto each

channel’s waveform to account for the response amplitude difference. In the transmit

path, similar correction can be applied with pulse shape pre-distortion. Because the

assembly property does not change much, this correction / trimming process is static

and only needs to be performed infrequently.
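The receive-side corrections mentioned above (a per-channel weight plus interpolation for missing elements) could look roughly like the Python sketch below. The text does not specify the interpolation method, so the nearest-neighbour averaging used here is an assumption made for illustration, as are the function name and the data layout (one sample per channel, stored row-major).

```python
def equalize_and_interpolate(samples, gains, enabled, n_rows=16, n_cols=16):
    """Apply per-channel weights and fill in disabled channels.

    samples, gains and enabled are row-major lists of length n_rows * n_cols.
    Enabled channels are divided by their relative gain. Disabled channels
    are replaced by the mean of their enabled 4-neighbours (0.0 if none).
    The neighbour-averaging rule is an ASSUMPTION, not the thesis method.
    """
    out = [s / g if e else None for s, g, e in zip(samples, gains, enabled)]
    for idx, value in enumerate(out):
        if value is not None:
            continue
        r, c = divmod(idx, n_cols)
        neighbours = []
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < n_rows and 0 <= cc < n_cols:
                j = rr * n_cols + cc
                if enabled[j]:
                    neighbours.append(samples[j] / gains[j])
        out[idx] = sum(neighbours) / len(neighbours) if neighbours else 0.0
    return out

# Tiny 2x2 example with one disabled element (index 2).
demo = equalize_and_interpolate(
    samples=[2.0, 4.0, 0.0, 8.0],
    gains=[1.0, 2.0, 1.0, 2.0],
    enabled=[True, True, False, True],
    n_rows=2, n_cols=2,
)
print(demo)
```

Because the gain weights and the set of disabled channels change rarely, both can be computed once and reused across acquisitions.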

Lastly, the per-element addressing capability combined with the highly flexible

front-end circuit design has its application beyond implementing fault-tolerance. It

is a general testing and calibration infrastructure that enables programmable access

to individual ultrasonic channels. In fact, the block-level circuit performance char-

acterization is carried out by only enabling a single channel for measurement; the

channel performance mismatches are evaluated by turning on different channels; and

the LNA parallelism measurement is obtained by activating a different number of Rx

channels for each parallelism configuration. For an even larger 2D transducer array in


the future, the ability of random access to channels in the array is critical for device

characterization, performance evaluation, and calibration.


Chapter 6

ASIC Characterization

The ASIC circuit block characterization is presented in this chapter. We have taped

out two ultrasonic transceiver ASIC test chips. Before the 2D 16x16 ASIC was made,

a 1D 4-channel ASIC was first fabricated and tested [5, 6], which is designed for

a 1D CMUT array, with a pitch of 300µm and an element height of 3mm [41].

The 1D chip allowed us to become familiar with the CMOS high voltage process and the

CMUT device properties. The 2D chip re-uses circuit blocks from the 1D chip, with

innovation at the architecture level. While the 2D chip testing focuses heavily on

system-level demonstrations, which have been covered in Chapter 4, the 1D chip testing

focuses heavily on acoustic experiments and device characterization. In

this chapter, both chips’ test results will be presented, with different emphases.

6.1 Tx Ultrasonic Power and Efficiency Measurement

The most important performance specification for the ultrasonic transmitter is the

power efficiency. The Tx efficiency is defined as the ratio between the transmitter’s

acoustic output power and the total consumed electrical power. To obtain Tx effi-

ciency, the total pulsing power can be measured electrically. However, the ultrasonic

power transmitted into the medium requires acoustic measurements.


This section shows the way of characterizing the transmitter performance with a

combination of acoustic and electrical measurements. The 1D chip is used to show the

characterization process, but both chips’ results are listed at the end of the section.

6.1.1 Measuring Acoustic Output Power

From (6.1), acoustic power is the product of the acoustic intensity (I) at transducer

surface and the transducer surface area (A). I is calculated from the RMS funda-

mental frequency component of the acoustic pressure at the transducer surface (prms)

and the acoustic impedance of the medium (Zm).

$$P_{acoustic} = I \cdot A = \frac{p_{rms}^2}{Z_m} \cdot A \tag{6.1}$$

In practice, the acoustic pressure at the transducer surface cannot be directly

measured. Instead, it can be reliably back-calculated from a pressure measurement

at another location. According to [28–30], when the transducer aperture is close to a

square or a circle, the pressure magnitude profile along the axial direction reaches its

maximum at the boundary between the near and far field. The maximum magnitude

is roughly twice the pressure magnitude at the transducer surface.

For back-calculation, an acoustic pressure measurement system1 is established in

lab, as shown in Figure 6-1. The 1D CMUT array is submerged in vegetable oil at the

bottom of the oil tank. The test chip circuitry is connected to CMUT from under the

oil tank. A hydrophone (ONDA HNC0400) is mounted on the 3D translation stage

to probe the acoustic pressure magnitude generated by CMUT, in the oil medium.

Figure 6-2 shows the detailed configuration to measure the acoustic output power.

The four-channel pulser circuitry is parallelized and connected to eight CMUT ele-

ments in parallel, in order to form an aperture of 2.4mm x 3mm (roughly a square).

The solid curve in Figure 6-3 is the corresponding acoustic simulation of the pressure

field using the Field II software, verifying that surface pressure (z=0mm) is about

1 This measurement setup is for 1D chip testing, which is similar to the 2D chip test setup described in Chapter 4.

Figure 6-1: The photo of the lab setup for measuring the acoustic output power and the Tx efficiency.

Figure 6-2: Acoustic output power and Tx efficiency measurement setup.

Figure 6-3: Normalized RMS pressure along the transducer axial axis, measurement vs. simulation. The measurement deviates from the simulation in the near field because the hydrophone tip is too close to the transducer surface, distorting the pressure field.

half maximum pressure (z=5.9mm). Furthermore, the hydrophone is used to probe

the acoustic pressure magnitude along the axial direction (z-axis). The measured

result, plotted as dots in Figure 6-3, shows good agreement with both theory and simulation.

In near field, the measured data do not exhibit amplitude fluctuations as predicted

by simulation. This is likely caused by the hydrophone tip distorting the pressure

field as it approaches the transducer. However, it does not affect the accuracy of the

maximum pressure measurement and the surface pressure back-calculation.
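The back-calculation described in this subsection combines (6.1) with the factor-of-two relation between the axial pressure maximum and the surface pressure. A minimal Python sketch is given below. The hydrophone reading and the acoustic impedance of the oil are assumed values chosen for illustration only; only the 2.4mm x 3mm aperture is taken from the text.

```python
def acoustic_power_w(p_max_rms_pa, z_medium, area_m2):
    """Acoustic output power per (6.1), back-calculated from the axial maximum.

    p_max_rms_pa: RMS fundamental pressure at the near/far field boundary (Pa)
    z_medium:     acoustic impedance of the medium (Pa*s/m)
    area_m2:      transducer aperture area (m^2)
    """
    p_surface_rms = p_max_rms_pa / 2.0            # surface is roughly half the maximum
    intensity = p_surface_rms ** 2 / z_medium     # W/m^2
    return intensity * area_m2

area = 2.4e-3 * 3.0e-3   # 2.4 mm x 3 mm aperture (from the text)
z_oil = 1.4e6            # ASSUMED impedance for vegetable oil, Pa*s/m
p_max = 20e3             # ASSUMED hydrophone reading, Pa RMS

power_w = acoustic_power_w(p_max, z_oil, area)
print(f"{power_w * 1e3:.3f} mW")
```

With these assumed inputs the result is on the order of half a milliwatt, the same order of magnitude as the acoustic powers reported later in this section.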

6.1.2 Measuring Tx Efficiency

Fixing the hydrophone at the near and far field boundary (5.9mm away), acoustic

output power is obtained with the aforementioned method. Tx efficiency is thus

acquired after dividing the acoustic output power by the total power consumption.

Different pulse shapes are generated to evaluate the efficiency improvement. The

pulse shape is defined by the ∆/T ratio as shown in Figure 6-4(a), where ∆ is the

step duration of the middle voltage level and T is the pulse period. When ∆/T = 0,

2-level pulses are generated. As ∆/T increases, the pulses turn into 3-level, reducing

the dynamic power from CV²f to CV²f/2 and increasing the efficiency. But as

∆/T increases further, the acoustic power starts to decrease because less energy is

contained within the pulse shape. Since the dynamic power is kept at CV²f/2,

efficiency decreases. Therefore, there is an optimal pulse shape to maximize the Tx

efficiency. For example, Figure 6-4(b) is a time-domain waveform of optimal 3-level

pulses at 3.3MHz.
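The reduction from CV²f to CV²f/2 can be illustrated with a simple energy bookkeeping sketch in Python. It assumes a purely capacitive load and ideal supply rails, including a middle rail at half the pulse amplitude that can both source and sink charge. These are simplifying assumptions for illustration; the detailed analysis of the actual pulser is in Section 5.2.1.

```python
def supply_energy_per_cycle(c_load, levels):
    """Net energy drawn from ideal supply rails over one pulse cycle.

    levels is the sequence of voltages the load visits in one cycle,
    starting and ending at the same level. Moving the load from v_from to
    v_to by connecting it to the rail at v_to transfers charge
    C * (v_to - v_from) through that rail, so the rail supplies
    C * (v_to - v_from) * v_to (negative means energy is returned;
    the ground rail at 0 V neither supplies nor recovers energy).
    """
    energy = 0.0
    for v_from, v_to in zip(levels, levels[1:]):
        energy += c_load * (v_to - v_from) * v_to
    return energy

C = 40e-12   # 40 pF element capacitance (1D CMUT, from the text)
V = 30.0     # 30 Vpp pulse amplitude (from the text)

e_2level = supply_energy_per_cycle(C, [0.0, V, 0.0])
e_3level = supply_energy_per_cycle(C, [0.0, V / 2, V, V / 2, 0.0])

print(e_2level, e_3level, e_3level / e_2level)
```

Under these assumptions the charge delivered by the middle rail on the way up is returned to it on the way down, so the 3-level cycle draws half the energy of the 2-level cycle.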

Figure 6-5 shows the measurement results. As an example, Table 6.1 compares

the optimal 3-level pulser against the 2-level pulser operating at 3.3MHz: the optimal

3-level pulser dissipates 38% less total power at the cost of delivering 7% less acoustic

power. In other words, the 3-level pulser outputs 50% more acoustic power at the

same power dissipation. The measured improvement is not as big as the theoretical

calculation in Section 5.2.1 (50% rather than 88%), mainly for two reasons. First,

the RC settling transition distorts pulse shapes, with 3-level pulses being distorted

more severely than 2-level pulses, which leads to more acoustic power reduction in a

Figure 6-4: (a) Tx efficiency measurement setup and pulse shape definition. (b) Measured time-domain waveform of the optimal 3-level 3.3MHz pulses, ∆=20ns, ∆/T=0.067

real-world 3-level pulser (7% rather than 4.4%). Second, a 3-level pulser uses more

high voltage transistors than a 2-level pulser, dissipating more power for driving the

transistors’ gate and drain capacitance, which leads to less total power reduction

(38% rather than 49%).
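The percentages quoted in this paragraph follow directly from the Table 6.1 entries. The Python sketch below redoes the arithmetic; the dictionary layout is only an illustration, and the last line uses the rounded theoretical figures quoted in the text (49% and 4.4%), so it lands slightly below the quoted 88%.

```python
# Measured values at 3.3 MHz from Table 6.1 (all in mW).
acoustic_mw = {"2-level": 0.56, "3-level": 0.52}
total_mw = {"2-level": 84.5, "3-level": 52.4}

efficiency = {k: acoustic_mw[k] / total_mw[k] for k in acoustic_mw}

power_saving = 1.0 - total_mw["3-level"] / total_mw["2-level"]
acoustic_loss = 1.0 - acoustic_mw["3-level"] / acoustic_mw["2-level"]
improvement = efficiency["3-level"] / efficiency["2-level"] - 1.0

# Ideal case from the rounded figures quoted in the text.
ideal_improvement = (1.0 - 0.044) / (1.0 - 0.49) - 1.0

print(f"power saving {power_saving:.1%}, acoustic loss {acoustic_loss:.1%}, "
      f"efficiency gain {improvement:.1%}, ideal gain {ideal_improvement:.1%}")
```

The measured efficiency gain comes out at about 50%, and the ideal figure at about 87%, in line with the numbers in the text.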

The relative efficiency improvements of a 3-level pulser over a traditional 2-level

pulser at 2.5, 3.3 and 5.0MHz pulses are measured to be 56%, 50% and 43%, re-

spectively. Table 6.2 lists the optimal 30Vpp 3-level pulser power dissipation and

efficiency at all three measured frequencies. Efficiency improvement is less for pulses

with a shorter period, because the same RC settling transition distorts shorter pulse

shape more severely, reducing useful acoustic output power. Moreover, higher fre-

quency pulses dissipate proportionally more dynamic power while acoustic output

power is kept roughly the same, thus the overall efficiency curve shifts down. Lastly,

the optimal ∆ value for the three frequencies is approximately the same (20ns), which

is slightly more than the RC settling time. This is because the optimal pulses use

just enough time to settle to the middle level to achieve CV²f/2 dynamic power,


Figure 6-5: Tx efficiency measurement results using different 3-level pulse shapes by varying the ∆/T ratio and at different frequencies.

Table 6.1: Measured Power and Efficiency Comparison at 3.3MHz for the 1D ASIC and CMUT (40pF capacitance per element)

| | 2-level | Optimal 3-level | Change |
|---|---|---|---|
| Acoustic Power | 0.56mW | 0.52mW | -7% |
| Total Power | 84.5mW | 52.4mW | -38% |
| Efficiency | 0.66% | 1.0% | 50% |

while keeping the middle level as narrow as possible to maintain large fundamental

frequency pulse energy delivery. When normalized over pulse period T in Figure 6-5,

the optimal ∆/T ratios become different for different pulse frequencies.

Similarly, the 2D chip is designed to generate pulses at frequencies between 2-

10MHz for the CMUT element size of 250µm × 250µm. The capacitance is roughly

2pF per element. Its performance is summarized in Table 6.3.

By comparing the optimal 3-level pulser against the 2-level pulser, this work is

effectively compared against a range of traditional pulsers. The reason is that not

only for 2-level pulsers [40, 41, 89], but also for multi-level pulsers without charge


Table 6.2: Measured Optimal 3-level Pulser Performance Summary for the 1D ASIC and CMUT (40pF capacitance per element)

| | 2.5MHz | 3.3MHz | 5.0MHz |
|---|---|---|---|
| Total Power (mW) | 39.4 | 52.4 | 77.6 |
| Relative Efficiency Improvement Against a 2-level Pulser | 56% | 50% | 43% |

Table 6.3: Measured Optimal 3-level Pulser Performance Summary for the 2D ASIC and CMUT (2pF capacitance per element)

| | 4.2MHz | 5.6MHz | 8.3MHz |
|---|---|---|---|
| Total Power (mW) | 7.1 | 9.6 | 14.3 |
| Relative Efficiency Improvement Against a 2-level Pulser | 46% | 38% | 18% |

recycling [81–83] or pulsers implemented as linear amplifiers [79, 80], the dynamic

power dissipation is always CV²f. Therefore these traditional pulsers have similar

(if not worse, considering the quiescent power dissipation in linear amplifiers) Tx

efficiency performance compared to the 2-level pulser used in this work.

Table 6.4 gives a comparison between different types of pulsers for ultrasonic

imaging. Because the multi-level pulser in [82] (STHV748 datasheet) does not implement

charge recycling, it would consume the same amount of CV²f dynamic power as the

2-level pulser in [40] (2008) & [43] (2013), if the load is the same. The linear amplifier

approach in [79] (2012), on the other hand, is more suitable for resistive transducers.

Because 2D transducers typically have capacitive elements, its efficiency would be low

due to quiescent power dissipation. Lastly, the discrete-level pulsers tend to generate

harmonics. This work attempts to improve the pulser’s HD2 performance from the

system-level, employing the I&Q excitation method presented in Section 4.3.

6.2 LNA Characterization

The LNAs from the 1D and the 2D ASICs are tested as single amplifier blocks in this

section. Table 6.5 and Table 6.6 summarize the measured performance numbers from

the two ASICs respectively. In Table 6.7, selected performance specifications of the

LNAs are compared against other CMUT LNAs in the literature.


Table 6.4: CMUT Pulser Performance Comparison

| Pulser Specs | Our 2D ASIC | Our 1D ASIC [5] | [40] (2008) & [43] (2013) | [82] (STHV748 datasheet) | [79] (2012) |
|---|---|---|---|---|---|
| CMUT Element Size (µm × µm) | 250 x 250 (2pF) | 300 x 3000 (40pF) | 250 x 250 (2pF) | General Purpose (200Ω \|\| 50pF) | General Purpose (100Ω \|\| 150pF) |
| Pulser Type | Discrete 3-Levels | Discrete 3-Levels | Discrete 2-Levels | 3- / 5- Levels, without charge recycling | Linear Amplifier |
| Active Power | 7.1mW @4.2MHz | 77.6mW @5MHz | N/A | N/A | 20W |
| Quiescent Power | 0 | 0 | 0 | N/A | 37mW |
| Dominant Power Dissipation | “CV²f/2” Dynamic Power | “CV²f/2” Dynamic Power | “CV²f” Dynamic Power | “CV²f” Dynamic Power | “V²/R” Resistive Power |
| Pulse Amplitude | 30 Vpp | 30 Vpp | 25 Vpp | ± 90 V | 90 Vpp |
| Minimum Pulse Width/Bandwidth | 20 ns | 20 ns | 100 ns | 20 MHz | 6.5 MHz |
| Linearity | N/A | N/A | N/A | N/A | HD2<-43dBc |

Table 6.5: Measured LNA Performance Summary for the 1D ASIC [5]

| LNA Specs | Measured Result |
|---|---|
| Process | 0.18µm CMOS |
| Target CMUT Element Size | 300µm × 3000µm |
| Active Power Consumption | 14.3 mW |
| Sleep Power Consumption | 1.5 mW |
| Bandwidth | 5.2 MHz |
| Transimpedance Gain | 96.6 dBΩ |
| Receive Sensitivity | 1.2 Pa(rms) |
| Receive Responsivity | 162 mV/kPa |
| Input-referred Pressure Noise | 0.56 mPa/√Hz @3MHz |
| Output-referred Voltage Noise | 91 nV/√Hz @3MHz |
| Noise Figure | 10.3 dB @3MHz |
| Output P1dB | 618 mVpp |
| 4-Ch Gain Mismatch | <0.11 dBΩ |
| Crosstalk | <-47 dBc @3MHz; <-35 dBc @10MHz |
| Wake-up / Sleep Time | <1µs |


Table 6.6: Measured LNA Performance Summary for the 2D ASIC

| LNA Specs | Measured Result |
|---|---|
| Process | 0.18µm CMOS |
| Target CMUT Element Size | 250µm × 250µm |
| Active Power Consumption | 1.4 mW |
| Sleep Power Consumption | 0.054 mW |
| Bandwidth | 10.2 MHz |
| Transimpedance Gain | 116/113.5/110/104 dBΩ |
| Receive Sensitivity | 7.3 Pa(rms) |
| Receive Responsivity | 123 mV/kPa |
| Input-referred Pressure Noise | 2.3 mPa/√Hz @5MHz† |
| Input-referred Current Noise | 0.41 pA/√Hz @5MHz |
| Output-referred Voltage Noise | 289 nV/√Hz @5MHz |
| Noise Figure | 13 dB @5MHz |
| Output P1dB | 946 mVpp† |
| HD2 | −46dBc @330mVpp, 2MHz tone† |
| HD3 | −46dBc @330mVpp, 2MHz tone† |
| IMD3 | −72dBc @324mVpp, 2MHz & 2.01MHz (-25dBc) tones† |
| 256-Ch Gain Mismatch | <2.0 dBΩ |
| Crosstalk | <-50 dBc @3MHz; <-22 dBc @15MHz |
| Wake-up / Sleep Time | <1µs |

†: These results are measured at the maximum LNA gain setting.

Being used for different medical ultrasound applications, the CMUT arrays are

very different in size, impedance and operating frequency. Thus, the corresponding

LNA specs are also vastly different and difficult to compare. For example, the 1D

CMUT used in this work is designed as an alternative to 1D PZT linear arrays

operating up to 5MHz; the 2D CMUT in this work however, has a smaller per-

element size (thus smaller element capacitance) while its bandwidth is larger (up to

10MHz). To establish a figure of merit for fair comparison and to be able to apply the

data available in CMUT LNA literature, the noise efficiency factor (NEF) commonly

used for instrumentation amplifiers [90] is revised for use here. The original NEF and

the revised NEF’ are expressed in (6.2) and (6.3) respectively:

$$\mathrm{NEF} = V_{rms,in} \cdot \sqrt{\frac{2 \cdot I_{tot}}{\pi \cdot U_T \cdot 4kT \cdot BW}}, \tag{6.2}$$


Table 6.7: CMUT LNA Performance Comparison

| LNA Specs | Our 2D ASIC | Our 1D ASIC [5] | [40] (2008) | [43] (2013) | [73] (2010) | [45] (2011) |
|---|---|---|---|---|---|---|
| CMUT Element Size (µm × µm) | 250 x 250 | 300 x 3000 | 250 x 250 | 250 x 250 | 63 x 1037 | 70 x 70 |
| Active Power (mW) [Ptot] | 1.4 | 14.3 | 4.0 | 9.4 | 3.8 | 6.6 |
| Sleep Power (mW) | 0.054 | 1.5 | N/A | N/A | N/A | N/A |
| Bandwidth (MHz) | 10.2 | 5.2 | 10 | 25 | 20 | 10-20 |
| Transimpedance Gain (dBΩ) | 116/113.5/110/104 | 96.6 | 112.7 | 106.6 | 94.0 | 129.5 |
| Input-referred Pressure Noise Density (mPa/√Hz) [pn,in] | 2.3 @5MHz | 0.56 @3MHz | 1.8 @5MHz | N/A | 2.18 @10MHz | 3.0 @15MHz |
| Noise Figure (dB) | 13 @5MHz | 10.3 @3MHz | N/A | N/A | 10.5 @10MHz | 1.8 @10-20MHz |
| Output P1dB (mVpp) | 946 | 618 | N/A | N/A | 84.2 | N/A |
| NEF’ (mPa·√(mW/Hz)) [pn,in·√Ptot] | 2.7 | 2.1 | 3.6 | N/A | 4.2 | 7.7 |

$$\mathrm{NEF}' = \frac{p_{rms,in}}{\sqrt{BW}} \cdot \sqrt{P_{tot}} = p_{n,in} \cdot \sqrt{P_{tot}}. \tag{6.3}$$

The constant factors in the original NEF are ignored, and Vrms,in is replaced by

prms,in or pn,in. prms,in is the input-referred RMS noise amplitude in-band and pn,in

is the input-referred noise spectral density averaged inside the passband. Note that

both prms,in and pn,in are acoustic pressure noise, input-referred all the way to the

mechanical side at the CMUT element surface, in the unit of Pa and Pa/√Hz respec-

tively. This input-referred method normalizes the effect of CMUT receive sensitivity

and LNA gain. Moreover, the input-referred noise spectral density at the center fre-

quency of the passband is used to approximate pn,in (the input-referred noise spectral

density averaged inside the passband) for the actual NEF’ calculation, because it is

the more accessible measurement result in the literature.
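As a quick check of (6.3), the sketch below recomputes NEF’ from the noise density and active power entries of Table 6.7. The function name `nef_prime` and the dictionary layout are illustrative choices, not part of the original work; [43] is omitted because its noise density is listed as N/A.

```python
import math

def nef_prime(p_n_in, p_tot):
    """Revised noise efficiency factor of Eq. (6.3).

    p_n_in: input-referred pressure noise density in mPa/sqrt(Hz)
    p_tot:  active power in mW
    """
    return p_n_in * math.sqrt(p_tot)

# (p_n,in in mPa/sqrt(Hz), P_tot in mW), copied from Table 6.7
table_6_7 = {
    "Our 2D ASIC": (2.3, 1.4),
    "Our 1D ASIC [5]": (0.56, 14.3),
    "[40]": (1.8, 4.0),
    "[73]": (2.18, 3.8),
    "[45]": (3.0, 6.6),
}

nef_values = {name: nef_prime(p, pw) for name, (p, pw) in table_6_7.items()}
for name, value in nef_values.items():
    print(f"{name}: NEF' = {value:.1f} mPa*sqrt(mW/Hz)")
```

Rounded to one decimal place, the computed values agree with the NEF’ row of the table.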


Figure 6-6: The die photo of the four-channel ultrasonic imaging transceiver test chip.

The NEF’ in (6.3) handles CMUT element size scaling correctly. For example, a

CMUT element with 2x bigger surface area presents approximately 2x bigger input

capacitance to the LNA. If two of the same LNAs are parallelized to buffer the 2x

CMUT element, the same bandwidth and noise figure targets are achieved. Although

the parallelization reduces the input-referred noise amplitude by √2x and increases

the power consumption by 2x, the NEF’ is held unchanged. This is expected since

the same LNA design is used in both cases. Another example to show the usefulness

of NEF’ is [45] in Table 6.7. It achieves a very low noise performance as indicated

by the noise figure. On the other hand, excessive power is dissipated on a very small

CMUT element, which leads to a relatively high NEF’.
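The size-scaling argument can be sketched numerically. The snippet below assumes an idealized case in which m identical LNAs contribute uncorrelated noise, so the input-referred noise drops by √m while power grows by m; the starting values are hypothetical and chosen only for illustration.

```python
import math

def nef_prime(p_n_in, p_tot):
    """Eq. (6.3): input-referred noise density times sqrt(total power)."""
    return p_n_in * math.sqrt(p_tot)

def parallelize(p_n_in, p_tot, m):
    """Idealized effect of buffering an m-times larger element with m LNAs.

    Assumes uncorrelated noise between the LNAs: the input-referred
    noise falls by sqrt(m) and the power consumption rises by m.
    """
    return p_n_in / math.sqrt(m), p_tot * m

# Hypothetical single-LNA numbers (mPa/sqrt(Hz), mW), not measured values.
p_single, power_single = 2.0, 1.5
p_double, power_double = parallelize(p_single, power_single, 2)

nef_single = nef_prime(p_single, power_single)
nef_double = nef_prime(p_double, power_double)
print(nef_single, nef_double)
```

Under these assumptions the two NEF’ values coincide up to floating-point rounding, which is the invariance the text describes.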

Table 6.7 suggests that our LNA design for the 1D CMUT achieves the lowest
NEF’, indicating the best power efficiency for noise and bandwidth performance.

NEF’ in our 2D LNA is slightly worse than our 1D LNA due to the overhead needed

to drive extra line capacitance and to combine analog outputs. In addition, both

designs achieve good linearity performance as shown by results in P1dB, harmonics

and intermodulation numbers.

Finally, Figure 6-6 shows a die photo of the 1D test chip fabricated in a TSMC
0.18µm high-voltage CMOS process. The chip occupies a total area of 3mm × 3mm


Figure 6-7: The die photo of the 256-channel 16x16 2D ultrasonic imaging transceiver test chip.

and each channel occupies an area of 300µm × 1100µm. The shared middle voltage

generation circuit occupies an area of 300µm× 600µm. Figure 6-7 shows a die photo

of the 2D test chip fabricated in the same CMOS process. The chip occupies a total

area of 6mm × 5.5mm and each channel is element-matched to the CMUT element

area of 250µm× 250µm.

6.3 The Tx Beam-Steering Experiment

Although Tx beam-steering or beam-focusing is already used on the 2D CMUT-

ASIC system for real imaging experiments in Chapter 4, a tangible Tx beam-pattern

demonstration would help understanding. Therefore, a simple Tx beam-steering ex-

periment is conducted on the 1D ASIC, in which the ultrasonic lateral beam-pattern

is measured. In this experiment, each of the four-channel pulsers is connected to one


Figure 6-8: (a) Measured ultrasonic lateral beam profile at z=7.4mm, steered to the center (broadside). (b) Measured beam profile at z=7.4mm, with 30ns delay between channels.

of four consecutive CMUT elements in the experimental setup in Figure 6-1; each

pulser drives its CMUT with the 3.3MHz optimal 3-level pulses. The hydrophone

is placed at a fixed depth in the transducer’s far field (z=7.4mm). By moving the

hydrophone along the lateral direction (x-axis) and collecting the acoustic pressure

readings, the lateral beam profile can be plotted. Furthermore, ultrasonic Tx beam-

steering is demonstrated on the four-channel transmitter system when varying the

relative pulsing delays across four channels.

Figure 6-8(a) shows the measured beam profile in dots with zero delay between

channels. The beam is steered to the center, i.e., broadside. Figure 6-8(b) shows

the profile when 30ns delay is applied between channels. The figures also show the

Field II simulation results of the same experimental configurations for each case.

The simulation and measured data match well. Hand calculation based on classical

wave propagation provides another verification for Figure 6-8(b). The beam lateral

displacement ∆x and the channel delay, td = 30ns, are related to each other by (6.4):

$$\frac{\Delta x}{z} \approx \frac{t_d \cdot c}{d}, \tag{6.4}$$

where depth z = 7.4mm, sound speed c in vegetable oil is measured to be 1460m/s,

and CMUT element pitch d = 300µm. The calculated beam lateral displacement is
∆x = 1.08mm, which is consistent with the measured result.
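The hand calculation can be reproduced in a few lines. The function name and argument names below are illustrative; the numerical inputs are the ones quoted in the text.

```python
def lateral_displacement(z_m, t_d_s, c_m_per_s, pitch_m):
    """Small-angle estimate from Eq. (6.4): dx = z * t_d * c / d."""
    return z_m * t_d_s * c_m_per_s / pitch_m

dx = lateral_displacement(
    z_m=7.4e-3,        # hydrophone depth
    t_d_s=30e-9,       # delay between adjacent channels
    c_m_per_s=1460.0,  # measured sound speed in vegetable oil
    pitch_m=300e-6,    # CMUT element pitch
)
print(f"dx = {dx * 1e3:.2f} mm")
```

The result is about 1.08mm, matching the value stated above.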


Figure 6-9: The setup of the pulse-echo experiment for characterizing the complete ultrasound channel.

6.4 The Pulse-Echo Experiment

The pulse-echo experiment characterizes the complete ultrasound signal chain. The

1D ASIC test setup in Figure 6-1 is revised to perform the experiment. As shown

in Figure 6-9, the pulser drives a single CMUT element with a wideband pulse as an

approximation to the ideal impulse excitation. The narrowest pulse that can be gen-

erated from the pulser is a 2-level 30Vpp pulse with 20ns pulse width (Figure 6-10(a)).

The excited ultrasonic wave then propagates through the oil medium and is reflected

back at the oil-air boundary 26mm away from the transducer (the hydrophone is not

needed for this experiment). The reflected echo is received by the same CMUT ele-

ment and amplified by the LNA (Figure 6-10(b)). Because the CMUT blocks the DC

component and acts as a differentiator, the received echo looks similar to the deriva-

tive of the transmitted pulse, with a positive peak and a negative peak corresponding

to the rising edge and the falling edge of the transmitted pulse. The echo duration

is about 0.3µs, corresponding to the dominant frequencies (3-5MHz) that go through

the ultrasound signal chain. The echo’s FFT in Figure 6-10(c) confirms the intuition.

It shows the total channel impulse response, including CMUT, the oil medium and

LNA. It mainly reflects the band-pass characteristic and the wide bandwidth of the

Figure 6-10: The key waveforms from the pulse-echo experiment, showing the ultrasound channel characteristics. (a) The transmitted pulse waveform (voltage in V versus time in µs). (b) The received echo waveform (voltage in V versus time in µs). (c) The spectrum of the received echo waveform (amplitude in dB versus frequency in MHz), annotated with f0=4.5MHz, BW=5.2MHz, and band edges at 2.3MHz and 7.5MHz.

CMUT device, with a center frequency of 4.5MHz and a -6dB fractional bandwidth
of 116% (see footnote 2).
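Two numbers in this experiment can be cross-checked with simple arithmetic: the echo arrival time and the fractional bandwidth. The sketch below assumes the sound speed of 1460m/s measured for vegetable oil in Section 6.3 also applies here, and takes the 2.3MHz and 7.5MHz band edges from the annotations in Figure 6-10(c); the function names are illustrative.

```python
def round_trip_time(distance_m, c_m_per_s):
    """Time for a pulse to reach a reflector and return to the transducer."""
    return 2.0 * distance_m / c_m_per_s

def fractional_bandwidth(f_low_hz, f_high_hz, f_center_hz):
    """Bandwidth between the band edges, normalized to the center frequency."""
    return (f_high_hz - f_low_hz) / f_center_hz

t_echo = round_trip_time(distance_m=26e-3, c_m_per_s=1460.0)
fbw = fractional_bandwidth(f_low_hz=2.3e6, f_high_hz=7.5e6, f_center_hz=4.5e6)

print(f"echo arrival: {t_echo * 1e6:.1f} us")
print(f"fractional bandwidth: {fbw * 100:.0f}%")
```

The estimated arrival time of roughly 35.6µs falls inside the time window plotted in Figure 6-10(b), and the bandwidth ratio rounds to the quoted 116%.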

Similarly, the 2D ASIC performs the pulse-echo experiment on all of its 16x16

transceiver channels, which shows (on average) a center frequency of 6.25MHz and a

-6dB fractional bandwidth of 75% for the total CMUT-ASIC channel response. The

reflected echo amplitude also shows the channel sensitivity. By collecting all echoes’

amplitudes, the sensitivity map of the array can be obtained as shown in Figure 6-11

(a re-plot of Figure 5-18). Except for the short elements in red, the working elements

are drawn in grayscale. The brightness encodes the normalized sensitivity of each

Footnote 2: The -6dB bandwidth is used instead of the -3dB bandwidth because the spectrum shows the combined CMUT characteristic in both directions (transmit and receive).

Figure 6-11: A re-plot of Figure 5-18 in Section 5.5. Two successful 16x16 CMUT-ASIC assemblies with short CMUT elements (marked in red) isolated by the ASIC. The rest of the elements are functional and their sensitivity performance is expressed by the brightness of the elements. Elements are indexed 0 to 255, row by row.

CMUT element. The sensitivity map can be used for digital calibration of channel

gain mismatch for the 2D array.
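One possible form of such a digital calibration is sketched below. It is not the procedure used in this work: it simply assumes that each working channel is scaled so that all of them match the mean sensitivity, and that shorted elements are given zero gain. The function name, the choice of the mean as the target, and the four-element data are hypothetical.

```python
def gain_corrections(sensitivity, shorted):
    """Per-element digital gains that equalize the working elements.

    sensitivity: normalized echo amplitudes, one per element
    shorted:     set of indices of shorted elements (left at zero gain)
    """
    working = [s for i, s in enumerate(sensitivity) if i not in shorted]
    # Equalize to the mean of the working elements (an assumed target).
    reference = sum(working) / len(working)
    return [0.0 if i in shorted else reference / s
            for i, s in enumerate(sensitivity)]

# Hypothetical 4-element example; element 2 is shorted.
sens = [1.0, 0.8, 0.0, 0.5]
gains = gain_corrections(sens, shorted={2})
calibrated = [g * s for g, s in zip(gains, sens)]
print(calibrated)
```

Note that a digital gain scales the noise of a weak channel together with its signal, so this kind of correction equalizes amplitude without improving that channel's SNR.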


Chapter 7

Conclusion

7.1 Summary of Contributions

In summary, this thesis presents a Column-Row-Parallel ASIC architecture as a scal-

able and flexible hardware solution for 3D wearable / portable medical ultrasound

applications.

The architecture provides an “N” interconnection complexity and an “N” acquisition
time complexity for an NxN 2D ultrasonic transducer array, so it remains scalable as
the array size grows. The architecture offers column-parallel and row-parallel

operations, and fine-granularity per-element selection control, which makes the hard-

ware system flexible for different ultrasonic imaging algorithms.
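To make the scaling concrete, the sketch below compares a count that grows as N² (one connection per element) with one that grows as N (one shared line per column or row). The unit constant is an assumption made for illustration; the exact number of lines depends on the implementation. For N = 16 it gives 16 versus 256, the same figures quoted for the ADC count in Section 7.2.

```python
def channel_counts(n):
    """Return (per-element count, per-line count) for an n x n array.

    Assumes exactly one shared line per column (or row); the real
    constant factor depends on the implementation.
    """
    return n * n, n

for n in (16, 32, 64):
    full, shared = channel_counts(n)
    print(f"{n}x{n}: {full} per-element connections vs. {shared} shared lines")
```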

A 16x16 ASIC ultrasonic transceiver test chip interfacing to a 16x16 CMUT array

is designed and fabricated to demonstrate the proposed architecture. A plane-wave
coherent compounding algorithm in 3D (PWCC3D) is implemented on the system

assembly as a fast volume rate, high quality, volumetric imaging algorithm. The

architecture also enables a technique for HD2 reduction in the transmitters used in

ultrasonic harmonic imaging mode. The interleaved checker board patterns with

I&Q excitations achieve Tx HD2 reduction by over 20dB compared to conventional

methods. This technique is applicable to nonlinearity from both CMUT transducers

and circuits, and it is useful for arbitrary pulse shapes.

The circuit design of the 16x16 ASIC transceiver is optimized to the target CMUT


transducer. The high-voltage transmitter uses a 3-level pulse-shaping technique with

charge recycling to improve the power efficiency. The design requires minimal off-chip
components and is scalable to more channels. The receiver is implemented with

a transimpedance amplifier topology and is optimized for trade-offs between noise,

bandwidth, and power dissipation. The test chip is characterized with both acoustic

and electrical measurements. Comparing the 3-level pulser against traditional 2-level

pulsers, the measured Tx efficiency shows 50% more acoustic power delivery with the

same total power dissipation. The CMUT receiver achieves the lowest revised noise
efficiency factor (NEF’) among the designs compared in the literature (2.1, against a
previously reported lowest of 3.6, in units of mPa·√(mW/Hz)).

In addition, both transmitters and receivers can be parallelized to efficiently imple-

ment the Column-Row-Parallel architecture. Particularly for the receivers, a special

output stage is implemented for the receiver LNA, such that the analog outputs are

combined for a higher SNR, which scales with the number of LNAs as 10 log(N) dB.
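The 10 log(N) dB rule follows if the N combined channels carry the same signal and independent noise: signal power grows as N² while noise power grows as N. The sketch below tabulates the resulting gain under that idealized assumption; the function name is illustrative.

```python
import math

def snr_gain_db(n_lnas):
    """SNR improvement from combining n LNA outputs, 10*log10(n) dB.

    Assumes identical signals and uncorrelated, equal-power noise
    in every combined channel.
    """
    return 10.0 * math.log10(n_lnas)

for n in (1, 2, 4, 16):
    print(f"{n:2d} LNAs: +{snr_gain_db(n):.1f} dB")
```

For a full line of 16 LNAs this gives roughly 12dB of SNR improvement over a single LNA.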

The 2D transceiver chip is also designed to be fault-tolerant against defects exist-

ing in CMUT arrays. The transceiver channels can be used for detecting CMUT short

elements and then disconnecting the non-functional elements from the array. This

selective element disabling capability is realized electrically and can be automated.

This design strategy has proven to be critical for working with faulty MEMS devices.

It is especially beneficial for the 2D arrays with large element count, and it reflects

a highly desirable feature for front-end sensor interface circuit design. The random

access to the array elements serves as a flexible testing infrastructure in general. Not
only can faulty channels be detected and isolated, but performance characterizations of
functional channels can also be obtained with this programmable interface.

There are several low power circuit design techniques used in this work, to make

the system suitable for wearable / portable applications. They are summarized as

follows:

• The multi-level pulse shaping technique is combined with a charge-recycling,

regulated power supply to implement the transmit pulser in Section 5.2.1. The

dynamic power consumed by the capacitive load of the CMUT element is re-


duced by half with a 3-level pulsing scheme. The regulated power supply that

recycles the charge is implemented with a shared DC-DC power converter, which

requires only two off-chip capacitors. The circuit is scalable to more channels,

and is easily integrated.

• The transimpedance amplifier topology for the receiver LNA is optimized for

the best power efficiency, given the bandwidth, gain and noise performance

requirements. The optimization procedure is described in Section 5.3.1. The

revised noise efficiency factor (NEF’) shows that the amplifier design achieves

the lowest power consumption while meeting all design targets. In addition

to the low active power dissipation, the amplifier also implements a very low

power sleep mode. Both the main amplifier stages and the biasing currents are

turned off in sleep mode by auxiliary switches, which help attain less than 1µs

amplifier recovery time during wake-up. With some prior information about the

scene (e.g. from a coarse image of the space in the first pass), the receiver signal

chain can be put into sleep mode for power savings, when it is not needed to

perform imaging at certain regions. Therefore, the sleep mode offers flexibility

for system-level power scheduling.

• The receiver output stage is designed to facilitate the analog signal combin-

ing. A source follower stage is specially sized to provide proper signal current

summing, while overcoming the parasitics from the 2D interconnect lines. The

optimization procedure is shown in Section 5.3.3. Similar to the sleep mode,

the receiver output stage could benefit the system flexibility by providing programmable
receiver parallelism. More receivers can be parallelized for better
signal quality (i.e. SNR) when the imaging algorithm requires it.

• At the algorithm-level, flexible beam-formation schemes are proposed, such that

power consumption can be tuned according to the image quality requirement. In

Sections 4.2 and 4.4, the 3D plane-wave coherent compounding (PWCC3D) al-

gorithm and the annular ring aperture imaging method are presented as scalable

imaging algorithms that could adjust image volume rate, contrast and resolution
performance for variable system power dissipation. For example, with fewer
transmit plane-wave angles in PWCC3D, or fewer annular rings formed in the

ring apertures, the image contrast and resolution are degraded, but the energy

consumed for data acquisition of one volumetric image is decreased. Moreover,

after data acquisition, PWCC3D beamforming processing is also scalable. A

low-resolution volumetric image can first be computed with relatively low

power consumption; a higher resolution image can then be computed based on

the region of interest. The latter consumes more digital computation power,

but offers proportionally higher image resolution performance.
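The factor-of-two saving named in the first bullet above can be illustrated with an idealized stepwise-charging model. This is a sketch, not the circuit of Section 5.2.1: it assumes a purely capacitive load, a middle level at exactly half the pulse amplitude, and a mid-level supply that recovers on the falling step all the charge it delivered on the rising step. The 2pF load is a hypothetical value.

```python
def supply_energy_two_level(c_load, v_pp):
    """Energy drawn per pulse when C is charged from 0 to Vpp in one step."""
    return c_load * v_pp ** 2

def supply_energy_three_level(c_load, v_pp):
    """Energy drawn per pulse with an intermediate step at Vpp/2.

    The mid-level supply delivers charge C*Vpp/2 on the way up and
    receives the same charge back on the way down, so its net energy
    is zero (ideal charge recycling). Only the Vpp/2 -> Vpp step
    draws net energy, taken from the top supply at voltage Vpp.
    """
    charge_from_top = c_load * (v_pp / 2.0)
    return charge_from_top * v_pp

c_load = 2e-12  # hypothetical 2 pF load, not a value from this work
v_pp = 30.0     # 30 Vpp pulse amplitude, as used in the experiments
e2 = supply_energy_two_level(c_load, v_pp)
e3 = supply_energy_three_level(c_load, v_pp)
print(f"2-level: {e2 * 1e9:.2f} nJ, 3-level: {e3 * 1e9:.2f} nJ, ratio {e3 / e2:.2f}")
```

In this model the 3-level scheme draws half the energy of the 2-level scheme, independent of the load capacitance and pulse amplitude.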

7.2 Future Work

Several possible improvements can be made for the 16x16 Column-Row-Parallel ASIC:

• A more complete analog front-end design would require on-chip ADCs. Because

the scalable Column-Row-Parallel architecture offers “N” I/O complexity, only

16 (rather than 256) ADCs are required for the 16x16 ASIC, which is practical to

implement. In fact, there are many octal analog front-end ASICs commercially

available for conventional 1D ultrasonic arrays [91–93], where each ADC channel

occupies 2 LVDS output pins to output serialized digital data. The Column-

Row-Parallel ASIC with on-chip ADCs could take the same strategy to deal

with the massive amount of data and save pin count.

• When the 2D array size grows beyond 16x16, if a single ASIC with excessive

silicon area is to be avoided for yield and reliability reasons, multiple Column-

Row-Parallel ASICs could be tiled together for expansion. For example, four

16x16 ASICs make a 32x32 front-end system. To simplify the tiling assembly,

the ASIC layout could be re-arranged, such that only two sides, instead of all

four sides of the chip have extra area for peripheral I/Os. In this way, the 16x16

transceiver array is exposed to two sides, to which four of the same ASIC chips

can be simply abutted for a 32x32 transceiver array, as shown in Figure 7-1.


Figure 7-1: Four 16x16 ASICs tiled together for a 32x32 imaging front-end.

Figure 7-2: CMUT-ASIC assembly alternatives to eliminate the interposer PCB: (a) TSV technology for interconnecting ASIC I/Os to the main testing PCB; (b) Applying flip-chip bonding technology for CMUT-ASIC interconnection and wire-bonding for ASIC I/Os.

• The current system assembly is accomplished by using an interposer PCB as

shown in Section 4.1. It helps adapt to different CMUT footprints and it serves

as an intermediate substrate, such that the ASIC I/Os can be connected to the

main testing board. But it increases the assembly complexity and adds addi-

tional parasitic capacitance and resistance to the CMUT-ASIC interconnection.

In the future, the interposer PCB can be eliminated by adopting new process

and assembly technologies for the interconnection, such as the through silicon

via (TSV), or the co-assembly of wire-bonding and flip-chip bonding. Their cor-

responding ASIC I/O connection methods to the main testing PCB are shown

in Figure 7-2(a) and (b).

• As briefly mentioned in Section 5.2.3, a future ASIC could make the

pulser gate driver programmable in driving strength, to dynamically adapt to


different numbers of active pulsers on a column / row line, leading to power
savings. Similarly for the Rx path, if the ADCs are implemented, ADCs with
configurable accuracy (i.e. number of bits) can be designed to adapt to different
numbers of active LNAs (different SNR) along the line to save power.

• The programmable gain Rx LNAs in the array are currently controlled globally.
It would be more flexible to have control over individual LNAs, by adding
configuration bits per element. Moreover, programmable Tx pulse amplitude
control can be realized with a multi-level pulser design with per-element controls.
Different voltage levels can be used to realize pulses with different amplitudes.
This programmable functionality could enable fine-granularity, flexible

apodization for both Tx and Rx apertures. In particular, it can be used to per-

form signal strength compensation against the channel mismatches as seen in

Figure 5-18, and to compensate for the missing channels by adjusting neighboring
channels’ pulse shapes (amplitude and phase), as mentioned in Sections 4.1
and 5.5.

At the system-level, work is being done by Bonnie Lam, under the guidance of

Prof. Anantha Chandrakasan and Prof. Charles Sodini, to design a custom digital

test chip that performs 3D beam-formation based on the Column-Row-Parallel analog

front-end ASIC made in this thesis. This thesis aims to demonstrate the Column-

Row-Parallel architecture as a promising hardware system framework for efficient,

low-power, and scalable 3D ultrasonic imaging for wearable / portable applications.

The analog front-end circuit implementation is the focus, while the system-level digital

beam-formation processing is performed on a PC. To demonstrate a complete wear-

able / portable ultrasonic imaging device, a real-time low-power digital beam-former

that is optimized for the Column-Row-Parallel analog front-end is indispensable.

Furthermore, intelligence can be implanted in the beam-former chip, such that

the ultrasound device becomes adaptive and autonomous. The beam-former could

understand the scene based on its beam-formed data, and improve its imaging strategy

correspondingly. One example of intelligence has been described in Section 4.2. One


could control the PWCC3D algorithm to either obtain volumetric images of a large space

with coarse spatial resolution, or to “zoom into” a smaller region with finer resolution,

under certain volume rate or power constraints. The beam-former chip could exploit

such algorithm features and provide feedback controls to the analog front-end to

realize a closed-loop adaptive imaging system.

On another thread, PMUT is currently being investigated as an alternative trans-

ducer technology to CMUT by Katherine Smyth, under the guidance of Prof. Sang-

Gook Kim at MIT. Because Column-Row-Parallel architecture is independent of

transducer technology, implementation of a Column-Row-Parallel analog front-end for

PMUT would be interesting. Block-level circuit optimization is different for PMUT

due to its different device characteristics as compared to CMUT. To understand the

performance differences between PMUT and CMUT, and the impact on circuit topology,
a detailed comparison study is currently ongoing.


Bibliography

[1] G. E. Moore, “Cramming more components onto integrated circuits,” Proc.

IEEE, vol. 86, no. 1, pp. 82–85, Jan 1998.

[2] C. Prinz and J. Voigt, “Diagnostic accuracy of a hand-held ultrasound scanner in

routine patients referred for echocardiography,” Journal of the American Society

of Echocardiography, 2010.

[3] S. Nikolov and J. Jensen, “3d synthetic aperture imaging using a virtual source

element in the elevation plane,” in Ultrasonics Symposium, 2000 IEEE, vol. 2,

oct 2000, pp. 1743 –1747 vol.2.

[4] S. Nikolov, J. Jensen, R. Dufait, and A. Schoisswohl, “Three-dimensional real-

time synthetic aperture imaging using a rotating phased array transducer,” in

Ultrasonics Symposium, 2002. Proceedings. 2002 IEEE, vol. 2, oct. 2002, pp.

1585 – 1588 vol.2.

[5] K. Chen, H.-S. Lee, A. Chandrakasan, and C. Sodini, “Ultrasonic imaging

transceiver design for cmut: A three-level 30-vpp pulse-shaping pulser with im-

proved efficiency and a noise-optimized receiver,” Solid-State Circuits, IEEE

Journal of, vol. 48, no. 11, pp. 2734–2745, 2013.

[6] K. Chen, A. Chandrakasan, and C. Sodini, “Ultrasonic imaging front-end design

for cmut: A 3-level 30vpp pulse-shaping pulser with improved efficiency and a

noise-optimized receiver,” in Solid State Circuits Conference (A-SSCC), 2012

IEEE Asian, 2012, pp. 173–176.


[7] K. Chen, B. Lam, C. Sodini, and A. Chandrakasan, “System energy model for a

digital ultrasound beamformer with image quality control,” in Ultrasonics Sym-

posium (IUS), 2012 IEEE International, 2012, pp. 615–618.

[8] G. S. Kino, Acoustic Waves: Devices, Imaging, and Analog Signal Processing.

Prentice Hall, 1987.

[9] R. S. C. Cobbold, Foundations of Biomedical Ultrasound. Oxford University

Press, 2006.

[10] D. Olendorf, C. Jeryan, and K. Boyden, The Gale encyclopedia of medicine.

Gale Research (Detroit, MI), 1999.

[11] J. A. Jensen, Estimation of Blood Velocities Using Ultrasound, A Signal Process-

ing Approach. Cambridge University Press, 1996.

[12] T. Szabo, Diagnostic Ultrasound Imaging: Inside Out. Elsevier, 2004.

[13] S. Satomura, “Study of the flow patterns in peripheral arteries by ultrasonics,”

J. Acoust. Soc. Japan, vol. 15, pp. 151–158, 1959.

[14] D. Baker, “Pulsed ultrasonic doppler blood-flow sensing,” Sonics and Ultrason-

ics, IEEE Transactions on, vol. 17, no. 3, pp. 170 – 184, jul 1970.

[15] C. Kasai, K. Namekawa, A. Koyano, and R. Omoto, “Real-time two-dimensional

blood flow imaging using an autocorrelation technique,” IEEE Transactions on

Sonics and Ultrasonics, vol. SU-32, no. 3, pp. 458–463, May 1985.

[16] M. Anderson, M. McKeag, and G. Trahey, “The impact of sound speed errors on

medical ultrasound imaging,” The Journal of the Acoustical Society of America,

vol. 107, p. 3540, 2000.

[17] D. H. Evans and W. N. McDicken, Doppler Ultrasound (Second ed.). John

Wiley and Sons, 2000.


[18] F. Tranquart, N. Grenier, V. Eder, and L. Pourcelot, “Clinical use of ultrasound

tissue harmonic imaging,” Ultrasound in medicine & biology, vol. 25, no. 6, pp.

889–894, 1999.

[19] A. Novell, M. Legros, N. Felix, and A. Bouakaz, “Exploitation of capacitive

micromachined transducers for nonlinear ultrasound imaging,” Ultrasonics, Fer-

roelectrics and Frequency Control, IEEE Transactions on, vol. 56, no. 12, pp.

2733–2743, 2009.

[20] S. Satir and F. L. Degertekin, “Harmonic reduction in capacitive micromachined

ultrasonic transducers by gap feedback linearization,” IEEE Transactions on

Ultrasonics Ferroelectrics and Frequency Control, vol. 59, no. 1, pp. 50–59, Jan

2012.

[21] F. Lin, C. Cachard, R. Mori, J. Viti, F. Varray, F. Guidi, and O. Basset, “In-

fluences of bubble motion to second-harmonic inversion imaging,” in Ultrasonics

Symposium (IUS), 2012 IEEE International, 2012, pp. 675–678.

[22] M. Pasovic, M. Danilouchkine, T. Faez, P. L. van Neer, C. Cachard, A. F. van der

Steen, O. Basset, and N. de Jong, “Second harmonic inversion for ultrasound

contrast harmonic imaging,” Physics in Medicine and Biology, vol. 56, no. 11, p.

3163, 2011.

[23] J. Rubin, R. Bude, P. Carson, R. Bree, and R. Adler, “Power doppler us: a po-

tentially useful alternative to mean frequency-based color doppler us.” Radiology,

vol. 190, no. 3, p. 853, 1994.

[24] J. Platt, J. Rubin, J. Ellis, and M. DiPietro, “Duplex doppler us of the kidney:

differentiation of obstructive from nonobstructive dilatation.” Radiology, vol. 171,

no. 2, p. 515, 1989.

[25] A. Yuan, P. Yang, D. Chang, C. Yu, S. Kuo, and K. Luh, “Lung sequestration.

diagnosis with ultrasound and triplex doppler technique in an adult.” Chest, vol.

102, no. 6, p. 1880, 1992.


[26] K. Thomenius, “Evolution of ultrasound beamformers,” in Ultrasonics Sympo-

sium, 1996. Proceedings., 1996 IEEE, vol. 2, nov 1996, pp. 1615 –1622 vol.2.

[27] E. Brunner, “Ultrasound system considerations and their impact on front-end

components,” Analog Devices, 2002.

[28] J. Bushberg, The essential physics of medical imaging. Williams & Wilkins,

2002.

[29] H. Pettersson, The Encyclopaedia of Medical Imaging: Physics, Techniques and

Procedures vol. 1. Taylor & Francis Ltd, 1998.

[30] X. Zeng and R. J. McGough, “Evaluation of the angular spectrum approach

for simulations of near-field pressures,” The Journal of the Acoustical Society of

America, vol. 123, no. 1, p. 68, 2008.

[31] C. Capps, “Near field or far field,” EDN, August, vol. 16, pp. 95–102, 2001.

[32] L. Steiner and P. Andrews, “Monitoring the injured brain: Icp and cbf,” British

journal of anaesthesia, vol. 97, no. 1, p. 26, 2006.

[33] F. M. Kashif, T. Heldt, and G. C. Verghese, “Model-based estimation of intracranial
pressure and cerebrovascular autoregulation,” Comput Cardiol, vol. 35, pp. 369–372,
2008.

[34] W. Mason, Electromechanical transducers and wave filters. Van Nostrand Rein-

hold, 1946.

[35] F. V. Hunt and D. T. Blackstock, Electroacoustics: the analysis of transduction,

and its historical background. American Institute of Physics for the Acoustical

Society of America, 1982.

[36] C. H. Sherman and J. L. Butler, Transducers and arrays for underwater sound.

Springer, 2007.


[37] B. Savord and R. Solomon, “Fully sampled matrix transducer for real time 3d

ultrasonic imaging,” in Ultrasonics, 2003 IEEE Symposium on, vol. 1, 2003, pp.

945–953 Vol.1.

[38] C. H. Seo and J. Yen, “A 256 x 256 2-d array transducer with row-column

addressing for 3-d rectilinear imaging,” Ultrasonics, Ferroelectrics and Frequency

Control, IEEE Transactions on, vol. 56, no. 4, pp. 837–847, 2009.

[39] O. Oralkan, A. Ergun, J. Johnson, M. Karaman, U. Demirci, K. Kaviani, T. Lee,

and B. Khuri-Yakub, “Capacitive micromachined ultrasonic transducers: next-

generation arrays for acoustic imaging?” IEEE Transactions on Ultrasonics Fer-

roelectrics and Frequency Control, vol. 49, no. 11, pp. 1596–1610, Nov 2002.

[40] I. Wygant, X. Zhuang, D. Yeh, O. Oralkan, A. Ergun, M. Karaman, and

B. Khuri-Yakub, “Integration of 2d cmut arrays with front-end electronics for vol-

umetric ultrasound imaging,” IEEE Transactions on Ultrasonics Ferroelectrics

and Frequency Control, vol. 55, no. 2, pp. 327–342, Feb 2008.

[41] O. Oralkan, “Acoustic imaging using capacitive micromachined ultrasonic trans-

ducer arrays: devices, circuits, and systems,” Ph.D. dissertation, Stanford Uni-

versity, 2004.

[42] I. Wygant, N. Jamal, H. Lee, A. Nikoozadeh, O. Oralkan, M. Karaman, and

B. Khuri-yakub, “An integrated circuit with transmit beamforming flip-chip

bonded to a 2-d cmut array for 3-d ultrasound imaging,” IEEE Transactions

on Ultrasonics Ferroelectrics and Frequency Control, vol. 56, no. 10, pp. 2145–

2156, Oct 2009.

[43] A. Bhuyan, J. W. Choe, B. C. Lee, I. Wygant, A. Nikoozadeh, O. Oralkan, and

B. T. Khuri-Yakub, “3d volumetric ultrasound imaging with a 32x32 cmut array

integrated with front-end ICs using flip-chip bonding technology.” IEEE Inter-

national Solid-State Circuits Conference (ISSCC), Digest of Technical Papers,

Feb 2013, pp. 396–397.


[44] J. Zahorian, M. Hochman, T. Xu, S. Satir, G. Gurun, M. Karaman, and

F. Degertekin, “Monolithic cmut-on-cmos integration for intravascular ultra-

sound applications,” IEEE Transactions on Ultrasonics Ferroelectrics and Fre-

quency Control, vol. 58, no. 12, pp. 2659–2667, Dec 2011.

[45] G. Gurun, P. Hasler, and F. L. Degertekin, “Front-end receiver electronics for

high-frequency monolithic cmut-on-cmos imaging arrays,” IEEE Transactions on

Ultrasonics Ferroelectrics and Frequency Control, vol. 58, no. 8, pp. 1658–1668,

Aug 2011.

[46] P. Helin, P. Czarnecki, A. Verbist, G. Bryce, X. Rottenberg, and S. Severi,

“Poly-SiGe-based cmut array with high acoustical pressure.” IEEE International

Conference on Micro Electro Mechanical Systems (MEMS), Jan 2012, pp. 305–

308.

[47] D. Dausch, J. Castellucci, D. Chou, and O. Von Ramm, “Theory and operation

of 2-d array piezoelectric micromachined ultrasound transducers,” Ultrasonics,

Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 55, no. 11, pp.

2484–2492, 2008.

[48] A. Hajati, D. Latev, D. Gardner, A. Hajati, D. Imai, M. Torrey, and M. Schoep-

pler, “Three-dimensional micro electromechanical system piezoelectric ultra-

sound transducer,” Applied Physics Letters, vol. 101, no. 25, pp. 253 101–253 101–

5, 2012.

[49] K. Smyth, S. Bathurst, F. Sammoura, and S.-G. Kim, “Analytic solution for n-

electrode actuated piezoelectric disk with application to piezoelectric microma-

chined ultrasonic transducers,” Ultrasonics, Ferroelectrics and Frequency Con-

trol, IEEE Transactions on, vol. 60, no. 8, pp. 1756–1767, 2013.

[50] P. Muralt, N. Ledermann, J. Paborowski, A. Barzegar, S. Gentil, B. Belgacem,

S. Petitgrand, A. Bosseboeuf, and N. Setter, “Piezoelectric micromachined ul-

trasonic transducers based on pzt thin films,” Ultrasonics, Ferroelectrics and

Frequency Control, IEEE Transactions on, vol. 52, no. 12, pp. 2276–2288, 2005.

[51] I. Wygant, “A comparison of cmuts and piezoelectric transducer elements for

2d medical imaging based on conventional simulation models,” in Ultrasonics

Symposium (IUS), 2011 IEEE International, 2011, pp. 100–103.

[52] J. Jensen, “Field: A program for simulating ultrasound systems,” in Nordic-Baltic

Conference on Biomedical Imaging, 1996.

[53] J. A. Jensen and N. B. Svendsen, “Calculation of pressure fields from arbitrarily

shaped, apodized, and excited ultrasound transducers,” IEEE Transactions on

Ultrasonics Ferroelectrics and Frequency Control, vol. 39, no. 2, pp. 262–267,

Mar 1992.

[54] M. Karaman, I. Wygant, O. Oralkan, and B. Khuri-Yakub, “Minimally redun-

dant 2-d array designs for 3-d medical ultrasound imaging,” Medical Imaging,

IEEE Transactions on, vol. 28, no. 7, pp. 1051–1061, 2009.

[55] B.-H. Kim, T.-K. Song, Y. Yoo, J. H. Chang, S. Lee, Y. Kim, K. Cho, and

J. Song, “Hybrid volume beamforming for 3-d ultrasound imaging using 2-d

cmut arrays,” in Ultrasonics Symposium (IUS), 2012 IEEE International, 2012,

pp. 2246–2249.

[56] J. Song, S. Jung, Y. Kim, K. Cho, B. Kim, S. Lee, J. Na, I. Yang, O.-k. Kwon,

and D. Kim, “Reconfigurable 2d cmut-asic arrays for 3d ultrasound image,” in

SPIE Medical Imaging. International Society for Optics and Photonics, 2012,

pp. 83201A–83201A.

[57] B.-H. Kim, Y. Kim, S. Lee, K. Cho, and J. Song, “Design and test of a fully

controllable 64x128 2-d cmut array integrated with reconfigurable frontend asics

for volumetric ultrasound imaging,” in Ultrasonics Symposium (IUS), 2012 IEEE

International, 2012, pp. 77–80.

[58] M. Rasmussen and J. Jensen, “3-d ultrasound imaging performance of a row-

column addressed 2-d array transducer: A measurement study,” in Ultrasonics

Symposium (IUS), 2013 IEEE International, 2013.

[59] T. Christiansen, C. Dahl-Petersen, J. Jensen, and E. Thomsen, “2-d row-column

cmut arrays with an open-grid support structure,” in Ultrasonics Symposium

(IUS), 2013 IEEE International, 2013.

[60] M. Rasmussen and J. Jensen, “2-d row-column cmut arrays with an open-grid

support structure,” in Proceedings of SPIE, vol. 8675. SPIE - International

Society for Optical Engineering, 2013.

[61] X. Zhuang, D.-S. Lin, A. Ergun, O. Oralkan, and B. Khuri-Yakub, “P2p-6 trench-

isolated cmut arrays with a supporting frame,” in Ultrasonics Symposium, 2006.

IEEE, 2006, pp. 1955–1958.

[62] D.-S. Lin, R. Wodnicki, X. Zhuang, C. Woychik, K. Thomenius, R. Fisher,

D. Mills, A. Byun, W. Burdick, P. Khuri-Yakub, B. Bonitz, T. Davies,

G. Thomas, B. Otto, M. Topper, T. Fritzsch, and O. Ehrmann, “Packaging

and modular assembly of large-area and fine-pitch 2-d ultrasonic transducer ar-

rays,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on,

vol. 60, no. 7, pp. 1356–1375, 2013.

[63] D.-S. Lin, X. Zhuang, R. Wodnicki, C. Woychik, O. Omer, M. Kupnik, and

B. Khuri-Yakub, “Packaging of large and low-pitch size 2d ultrasonic transducer

arrays,” in Micro Electro Mechanical Systems (MEMS), 2010 IEEE 23rd Inter-

national Conference on, 2010, pp. 508–511.

[64] S. Smith, H. Pavy Jr, and O. von Ramm, “High-speed ultrasound volumetric

imaging system. i. transducer design and beam steering,” Ultrasonics, Ferro-

electrics and Frequency Control, IEEE Transactions on, vol. 38, no. 2, pp. 100–

108, 1991.

[65] O. von Ramm, S. Smith, and H. Pavy Jr, “High-speed ultrasound volumetric

imaging system. ii. parallel processing and image display,” Ultrasonics, Ferro-

electrics and Frequency Control, IEEE Transactions on, vol. 38, no. 2, pp. 109–

115, 1991.

[66] J. Bercoff, “Ultrafast ultrasound imaging,” Ultrasound Imaging -

Medical Applications, Prof. Oleg Minin (Ed.), InTech, Available from:

http://www.intechopen.com/books/ultrasoundimaging-medical-applications/ultrafast-ultrasound-imaging,

pp. 3–24, 2011.

[67] O. Couture, M. Fink, and M. Tanter, “Ultrasound contrast plane wave imag-

ing,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on,

vol. 59, no. 12, pp. –, 2012.

[68] G. Montaldo, M. Tanter, J. Bercoff, N. Benech, and M. Fink, “Coherent plane-

wave compounding for very high frame rate ultrasonography and transient elas-

tography,” Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions

on, vol. 56, no. 3, pp. 489–506, 2009.

[69] J. Bercoff, M. Tanter, and M. Fink, “Supersonic shear imaging: a new tech-

nique for soft tissue elasticity mapping,” Ultrasonics, Ferroelectrics and Fre-

quency Control, IEEE Transactions on, vol. 51, no. 4, pp. 396–409, 2004.

[70] S. Nikolov, J. Kortbek, and J. Jensen, “Practical applications of synthetic aper-

ture imaging,” in Ultrasonics Symposium (IUS), 2010 IEEE, 2010, pp. 350–358.

[71] M. Docter, R. Beurskens, G. Ferin, P. Brands, J. Bosch, and N. de Jong, “A ma-

trix phased array system for 3d high frame-rate imaging of the carotid arteries,”

in Ultrasonics Symposium (IUS), 2010 IEEE, 2010, pp. 318–321.

[72] S. Krishnan and M. O’Donnell, “Transmit aperture processing for nonlinear con-

trast agent imaging,” Ultrasonic imaging, vol. 18, no. 2, pp. 77–105, 1996.

[73] A. Nikoozadeh, “Intracardiac ultrasound imaging using capacitive microma-

chined ultrasonic transducer (cmut) arrays,” Ph.D. dissertation, Stanford Uni-

versity, 2010.

[74] A. Nikoozadeh, I. Wygant, D.-S. Lin, O. Oralkan, A. Ergun, D. Stephens,

K. Thomenius, A. Dentinger, D. Wildes, G. Akopyan, K. Shivkumar, A. Ma-

hajan, D. Sahn, and B. Khuri-Yakub, “Forward-looking intracardiac ultrasound

imaging using a 1-d cmut array integrated with custom front-end electronics,” Ul-

trasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, vol. 55,

no. 12, pp. 2651–2660, 2008.

[75] D. Yeh, O. Oralkan, I. Wygant, M. O’Donnell, and B. Khuri-Yakub, “3-d

ultrasound imaging using a forward-looking cmut ring array for intravascu-

lar/intracardiac applications,” Ultrasonics, Ferroelectrics and Frequency Control,

IEEE Transactions on, vol. 53, no. 6, pp. 1202–1211, 2006.

[76] C. Tekes, M. Karaman, and F. Degertekin, “Optimizing circular ring arrays

for forward-looking ivus imaging,” Ultrasonics, Ferroelectrics and Frequency

Control, IEEE Transactions on, vol. 58, no. 12, pp. –, 2011.

[77] R. Fisher, K. Thomenius, R. Wodnicki, R. Thomas, S. Cogan, C. Hazard, W. Lee,

D. Mills, B. Khuri-Yakub, A. Ergun, and G. Yaralioglu, “Reconfigurable arrays

for portable ultrasound,” in Ultrasonics Symposium, 2005 IEEE, vol. 1, Sept

2005, pp. 495–499.

[78] R. Fisher, R. Wodnicki, S. Cogan, R. Thomas, D. Mills, C. Woychik,

R. Lewandowski, and K. Thomenius, “Packaging and design of reconfigurable

arrays for volumetric imaging,” in Ultrasonics Symposium, 2007. IEEE, Oct

2007, pp. 407–410.

[79] D. Bianchi, F. Quaglia, A. Mazzanti, and F. Svelto, “A 90Vpp 720MHz GBW

linear power amplifier for ultrasound imaging transmitters in BCD6-SOI.” IEEE

International Solid-State Circuits Conference (ISSCC), Digest of Technical Pa-

pers, Feb 2012, pp. 370–372.

[80] B. Haider, “Power drive circuits for diagnostic medical ultrasound.” IEEE

International Symposium on Power Semiconductor Devices and IC’s, 2006, pp.

1–8.

[81] “MD1712 data sheet: High speed, integrated ultrasound driver IC,” Supertex,

Sunnyvale, CA, USA.

[82] “STHV748 data sheet: Quad +/-90V, +/-2A, 3/5 levels, high speed ultrasound

pulser,” STMicroelectronics, Geneva, Switzerland.

[83] “TX734 data sheet: Quad channel, 3-level RTZ, +/-75V, 2A integrated ultra-

sound pulser,” Texas Instruments, Dallas, TX, USA.

[84] L. Svensson and J. Koller, “Driving a capacitive load without dissipating fCV².”

IEEE Symposium on Low Power Electronics, Digest of Technical Papers, 1994,

pp. 100–101.

[85] K. Kristoffersen and H. Torp, “Method and apparatus for generating a multi-level

ultrasound pulse,” Apr. 4 2006, U.S. Patent 7,022,074.

[86] S.-Y. Peng, M. Qureshi, P. Hasler, A. Basu, and F. Degertekin, “A charge-based

low-power high-snr capacitive sensing interface circuit,” IEEE Transactions on

Circuits and Systems I: Regular Papers, vol. 55, no. 7, pp. 1863–1872, Aug 2008.

[87] S. Berg, T. Ytterdal, and A. Ronnekleiv, “Co-optimization of cmut and re-

ceive amplifiers to suppress effects of neighbor coupling between cmut elements.”

IEEE Ultrasonics Symposium, Nov 2008, pp. 2103–2106.

[88] J. Graeme, Photodiode Amplifiers: Op Amp Solutions. McGraw-Hill, 1995.

[89] I. Cicek, A. Bozkurt, and M. Karaman, “Design of a front-end integrated circuit

for 3d acoustic imaging using 2d cmut arrays,” IEEE Transactions on Ultrasonics

Ferroelectrics and Frequency Control, vol. 52, no. 12, pp. 2235–2241, Dec 2005.

[90] M. S. J. Steyaert, W. M. C. Sansen, and C. Zhongyuan, “A micropower low-

noise monolithic instrumentation amplifier for medical purposes,” IEEE Journal

of Solid-State Circuits, vol. SC-22, pp. 1163–1168, Dec 1987.

[91] “AFE5808 data sheet: Fully integrated, 8-channel ultrasound analog front end

with passive CW mixer,” Texas Instruments, Dallas, TX, USA.

[92] “AD9277 data sheet: Octal LNA/VGA/AAF/14-bit ADC and CW I/Q demod-

ulator,” Analog Devices, Inc., Norwood, MA, USA.

[93] “MAX2082 data sheet: Octal ultrasound transceiver with integrated AFE,

pulser, T/R switch, and coupling capacitors,” Maxim Integrated, San Jose, CA,

USA.
