Nano-Tera First Prize 2015


Single-Chip 3D Ultrasound Beamforming


  • Single-Chip 3D Ultrasound Beamforming

    Pascal Alexander Hager 1, Andrea Bartolini 1,2, Luca Benini 1,2

    1 Integrated Systems Laboratory (IIS), ETH Zürich

    2 Electrical, Electronic, and Information Engineering (DEI), Università di Bologna

    1. Introduction

    [Figure: imaging geometry. Labels: Focal Point, Scanline, Transducer Element, Scatterer, Virtual Source, t]

    Project Goal: Development of a high-performance, low-power signal processing platform for ultrasound imaging, targeting future portable 3D ultrasound systems.

    3D ultrasound systems can achieve ...
    - volumetric measurements
    - fast motion capture (e.g., of heart valves)
    - separation of acquisition and interpretation
    ... but currently still rely on massive analog preprocessing, which makes low-power, highly integrated portable devices impossible.

    A fully-digital architecture promises more flexibility and higher integration but needs to ...
    - employ massively parallel hardware (10'000 channels)
    - perform sophisticated signal processing (238 MFP/s, i.e. mega focal points per second)
    ... within the power budget of a portable device.

    2. The Problem

    Ultrasound imaging:

    [Figure panels: Geometry, Transmission, Reception]

    The beamforming operation:

    In conventional 2D ultrasound systems, the delays are precomputed and provided from an external memory. For 3D imaging this is infeasible, since ...
    - there are 159 G delays, which would require 198 GB of storage
    - and an access data rate of 23.8 Tbit/s.
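The role of the per-channel delays can be illustrated with a minimal delay-and-sum sketch. It is not taken from the poster: the speed of sound, the sampling rate, the element pitch, the nearest-sample rounding and all function names are assumptions chosen for illustration, and apodization and interpolation are left out.

```python
import math

SPEED_OF_SOUND = 1540.0  # m/s, typical soft-tissue value (assumption)


def two_way_delay(focal_point, tx_origin, element, c=SPEED_OF_SOUND):
    """Travel time from the transmit origin to the focal point and back to one element."""
    return (math.dist(tx_origin, focal_point) + math.dist(focal_point, element)) / c


def delay_and_sum(signals, elements, focal_point, tx_origin, fs, c=SPEED_OF_SOUND):
    """Pick the nearest delayed sample of every channel and sum them for one focal point."""
    total = 0.0
    for samples, element in zip(signals, elements):
        index = round(two_way_delay(focal_point, tx_origin, element, c) * fs)
        if 0 <= index < len(samples):
            total += samples[index]
    return total


# Toy setup (all values hypothetical): 2x2 elements, one ideal point scatterer.
fs = 40e6
elements = [(ix * 0.3e-3, iy * 0.3e-3, 0.0) for ix in range(2) for iy in range(2)]
origin = (0.0, 0.0, 0.0)
scatterer = (1e-3, 2e-3, 20e-3)

signals = []
for element in elements:
    echo = [0.0] * 2048
    # Each channel receives a unit impulse at its own two-way delay.
    echo[round(two_way_delay(scatterer, origin, element) * fs)] = 1.0
    signals.append(echo)

on_target = delay_and_sum(signals, elements, scatterer, origin, fs)
off_target = delay_and_sum(signals, elements, (1e-3, 2e-3, 25e-3), origin, fs)
```

Focusing on the scatterer aligns the echoes of all channels, while a focal point 5 mm deeper picks up none of them. Every focal point needs one delay per channel, which is why the number of delays grows so quickly with the channel count.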


    4. Results

    The challenges are to ...
    - compute focal points at a high rate (238 MFP/s)
    - process a massive amount of input data (160 GB/s)
    - provide the required delays (2.38 TD/s)
    ... on a single chip without external memory.
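The quoted rates can be cross-checked with a few lines of arithmetic. The sketch below assumes that one delay value is needed per focal point and per channel; the poster does not state this explicitly, but the assumption reproduces the 2.38 TD/s figure. The word width per delay is derived from the quoted numbers, not given in the source.

```python
# Figures quoted on the poster
FOCAL_POINT_RATE = 238e6   # focal points per second (238 MFP/s)
CHANNELS = 10_000          # transducer channels
TOTAL_DELAYS = 159e9       # number of precomputed delays (159 G)
TABLE_BYTES = 198e9        # storage for the delay table (198 GB)
TABLE_BITRATE = 23.8e12    # access data rate of the delay table in bit/s

# Assumption: one delay per focal point and channel.
delay_rate = FOCAL_POINT_RATE * CHANNELS

# Implied word width per delay, derived in two independent ways.
bits_per_delay_from_rate = TABLE_BITRATE / delay_rate
bits_per_delay_from_storage = TABLE_BYTES * 8 / TOTAL_DELAYS

print(f"delay rate: {delay_rate / 1e12:.2f} TD/s")
print(f"bits per delay (from data rate): {bits_per_delay_from_rate:.2f}")
print(f"bits per delay (from storage):   {bits_per_delay_from_storage:.2f}")
```

Both derivations land at roughly 10 bits per delay, so the storage, data-rate and delay-rate figures are mutually consistent under this assumption.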

    Biggest Challenge

    3. Our Solution

    B. Bandpass Processing:

    [Figure: signal spectrum at each processing stage. Frequency axis from -fs/2 to fs/2, centre frequency fc, bandwidth labels B and 2B. Panels:]
    a) Analog Input Signal
    b) Critically Sampled Baseband Signal
    c) Interpolated Analytic Signal
    d) Beamformed Undersampled Signal
    e) Beamformed Baseband Signal
    f) Envelope Signal
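Two of the stages named in the panels, demodulation to a complex baseband signal and envelope extraction, can be sketched as follows. This is an illustration under stated assumptions, not the poster's implementation: the sampling rate, the centre frequency, the test tone and the 4-tap moving average used as a crude low-pass are all hypothetical choices, and the function names are invented for this example.

```python
import cmath
import math


def demodulate(rf, fc, fs):
    """Mix a real RF sequence down by fc, giving complex samples before low-pass filtering."""
    return [s * cmath.exp(-2j * math.pi * fc * n / fs) for n, s in enumerate(rf)]


def moving_average(x, length):
    """Crude low-pass filter: average over `length` consecutive samples."""
    return [sum(x[i:i + length]) / length for i in range(len(x) - length + 1)]


def envelope(iq):
    """Envelope of the original RF signal from its low-pass filtered IQ samples."""
    return [2.0 * abs(v) for v in iq]


# Hypothetical test tone: fs = 40 MHz, fc = 5 MHz, constant amplitude 0.7.
fs, fc, amplitude = 40e6, 5e6, 0.7
rf = [amplitude * math.cos(2 * math.pi * fc * n / fs + 0.3) for n in range(64)]

# Mixing leaves a component at -2*fc, which has a period of 4 samples here,
# so a 4-sample average removes it.
iq = moving_average(demodulate(rf, fc, fs), 4)
env = envelope(iq)
```

For this tone the recovered envelope is constant and equal to the amplitude. Once the signal is at baseband, its bandwidth B rather than the carrier frequency determines the sample rate that is needed, which is the property the bandpass processing exploits.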

    [Figure: processing chain, shown together with the spectrum panels, with the stages of minimized data rate marked. Blocks: Transducers, AFE (incl. ADC), Pre-BF (demodulate and interpolate), Beamformer (Interpolation, Delay and Sum), Envelope Extraction, Post Processing. Signals: RF input, BF input, Buffered Signal, BF output, Image.]

    Exploit bandpass properties to process data at the information rate.

    - reduction of focal point and delay computation rate by at least 12x.

    C. Direct on-chip delay computation

    Compute delays from the underlying geometry, which can be parametrized with very few constants (36.6 kbit).

    A. Highly scalable beamforming architecture [1,2]

    [Figure: system overview. Stages: Signal Acquisition and Pre-Processing, Beamforming, Post Processing. Labels: Transducer Matrix, Imaging Volume, Analog Signals, Analog Frontend ASIC, digital IQ samples, BF ASIC, this work, Computed Focal Points, Multicore Processor System, Final Image.]

    [Figure: BF ASIC built from an array of Beamformer Channels (BFC). Labels per BFC: Interpolation Polyphase Filter, Buffer (1024x32 bit), Local Delay Index Calculation, Modulation, LUT.]

    - fully scalable with the number of channels and target throughput
    - no external memories for buffering and delays required

    [Figure: delay computation hardware. uC Engine labels: const, mult, prod, instr, RF, ALU, mem, double buffering. Memory sizes shown: 256x22 bit, 64x16 bit, 2x64x48 bit, const 512x64 bit, const 200x9 bit. Control labels: Global Control, Fast Control, System Control, shot index, prog. LUT, register. Multiplicities shown: 1x, 100x.]

    Shared Computations (with uC Engine)

    Local Delay Index Computation: computations per channel (2 additions, 1 square root)

    - sharing computations in combination with a programmable unit enables area- and power-efficient on-chip delay computation.
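One way to split a geometric delay computation into shared and per-channel work is sketched below. The decomposition is an assumption made for illustration: it happens to need 2 additions and 1 square root per channel, matching the operation count quoted above, but the poster does not specify how the actual design partitions the computation. The grid size, pitch, coordinates and function names are hypothetical.

```python
import math


def shared_terms(focal_point, xs, ys, tx_origin):
    """Terms computed once per focal point and reused by all channels."""
    x, y, z = focal_point
    col = [(x - xe) ** 2 for xe in xs]           # one value per element column
    row = [(y - ye) ** 2 + z * z for ye in ys]   # one value per element row
    tx = math.dist(tx_origin, focal_point)       # transmit path length
    return col, row, tx


def channel_path_length(col_term, row_term, tx):
    """Per-channel work: add, square root, add."""
    return math.sqrt(col_term + row_term) + tx


# Hypothetical 10x10 matrix (100 channels) with 0.3 mm pitch in the plane z = 0.
pitch = 0.3e-3
xs = [i * pitch for i in range(10)]
ys = [j * pitch for j in range(10)]
focal = (1e-3, -0.5e-3, 30e-3)
origin = (1.35e-3, 1.35e-3, 0.0)

col, row, tx = shared_terms(focal, xs, ys, origin)
paths = [[channel_path_length(col[i], row[j], tx) for i in range(10)] for j in range(10)]
```

In this split, 100 channels share 10 column terms, 10 row terms and one transmit term per focal point. Converting a path length into a delay index is a multiplication by the sampling rate divided by the speed of sound, a constant that could be folded into the shared terms.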

    Synthesis results (28 nm SOI) for 4-100 channels:

    - 100 channels: 1.68 mm², 3.4 M GE, 303.4 mW
    - linear scaling with the number of channels
    - the contribution of the shared computations and the uC Engine quickly becomes negligible
    - Estimation for 10'000 channels: 1.68 cm², 30.3 W
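The 10'000-channel estimate follows from the 100-channel synthesis result by linear extrapolation, as this short sketch shows. It assumes purely linear scaling with no fixed overhead, in line with the statement that the shared blocks become negligible; the function name is invented for the example.

```python
# Synthesis result for the reference design (from the poster)
REF_CHANNELS = 100
REF_AREA_MM2 = 1.68
REF_POWER_MW = 303.4


def extrapolate(channels):
    """Scale area and power linearly with the channel count (assumes no fixed overhead)."""
    scale = channels / REF_CHANNELS
    return REF_AREA_MM2 * scale, REF_POWER_MW * scale


area_mm2, power_mw = extrapolate(10_000)
print(f"{area_mm2 / 100:.2f} cm2, {power_mw / 1000:.1f} W")
```

The result reproduces the quoted 1.68 cm² and 30.3 W.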

    [Figure: Area/Power Distribution with BFC. Bars for Power and Area on a 0% to 100% axis, broken down into Interp., Modula., Buffer, LDIC, Apod.]

    5. Conclusions

    [1] P. Vogel et al., "Efficient Parallel Beamforming for 3D Ultrasound Imaging," GLSVLSI 2014, May 2014.

    [2] P. A. Hager et al., "Assessing the area/power/performance tradeoffs for an integrated fully-digital, large-scale 3D-ultrasound beamformer," in Biomedical Circuits and Systems Conference (BioCAS), 2014 IEEE, Oct 2014, pp. 228-231.

    Our approach:
    - Highly scalable beamforming architecture
    - Bandpass processing
    - Direct on-chip delay computation

    Design levels addressed: Architectural Level, Algorithmic Level, System Level

    - A fully-digital single-chip beamformer for 3D imaging is feasible.
    - The effort for delay computation can be minimized by sharing computations and using a programmable uC Engine.
    - The beamformer can be retrofitted for efficient high-performance 2D beamforming by reducing the channel count and updating the software.
    - Integration of the beamformer in the transducer head will enable portable 2D and 3D ultrasound imaging devices.

    We attack the problem on all design levels.