Sonic Millip3De: Massively Parallel 3D Stacked Accelerator for 3D Ultrasound


Page 1: Sonic Millip3De: Massively Parallel 3D Stacked Accelerator for 3D Ultrasound

Richard Sampson*, Ming Yang†, Siyuan Wei†, Chaitali Chakrabarti†, Thomas F. Wenisch*

*University of Michigan †Arizona State University

Page 2: Portable Medical Imaging Devices

• Medical imaging moving towards portability
  – MEDICS (X-Ray CT) [Dasika ‘10]
  – Handheld 2D Ultrasound [Fuller ‘09]
• Not just a matter of convenience
  – Improved patient health [Gunnarsson ‘00, Weinreb ‘08]
  – Access in developing countries
• Why ultrasound?
  – Low transmit power [Nelson ‘10]
  – No dangers or side-effects

Page 3: Handheld 3D Ultrasound

• 3D has numerous benefits over 2D
  – Easier to interpret images
  – Greater volumetric accuracy
• … as well as many challenges
  – 12k transducers, 10M image points
    • 10-20x beyond state of the art
  – High raw data bandwidth (6Tb/s)
    • Major bottleneck in state of the art
  – Tight handheld power budget (5W)

Page 4: Why a Custom Accelerator?

• Software algorithms load/store intensive
  – von Neumann designs inefficient
• Large system would require over 700 DSPs
  – General purpose CPUs even less efficient

Architecture         Energy/Scanline (1 fps)   Single-Core Time/Scanline
Intel Core i7-2670   25.08 J                   4.46 s
ARM Cortex-A8        33.04 J                   132.18 s
TI C6678 DSP         2.84 J                    2.27 s

Page 5: Contributions

• Iterative delay calculation algorithm
  – Reduces storage by over 400x
  – Enables streaming data flow
• Sonic Millip3De design
  – Leverages 3D die stacking technology
  – Transform-select-reduce accelerator framework
• Power and image analysis of Sonic Millip3De
  – Negligible change in image quality
  – Able to meet 5W power budget by 11nm node

Page 6: Outline

• Introduction
• Ultrasound background
• Algorithm design
• System design
  – Sonic Millip3De
  – Select Sub-Unit
• Results and analysis
• Conclusions

Page 7: Ultrasound: Transmit and Receive

[Figure: transmit transducer, receive transducer, focal points in the image space, and the received raw channel data; round-trip delay 𝜏]

Page 20: Ultrasound: Transmit and Receive

Each transducer stores an array of raw receive data

Page 21: Ultrasound: Image Reconstruction

Image reconstructed from raw receive data based on round-trip delay

Page 22: Ultrasound: Image Reconstruction

Images from each transducer combined to produce full frame
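The two reconstruction steps above amount to delay-and-sum beamforming: for every focal point, each transducer's raw data is sampled at that point's round-trip delay, and the per-transducer values are summed into one frame. A minimal NumPy sketch, assuming a single transmit source, point-like transducers at known 3D positions, nearest-sample selection at the 160 MHz interpolated rate, and no apodization or interpolation; `delay_and_sum` is a hypothetical name, not an interface from the talk.

```python
import numpy as np

C = 1540.0   # speed of sound in tissue (m/s)
FS = 160e6   # interpolated sampling frequency (Hz)

def delay_and_sum(rf, rx_pos, tx_pos, points):
    """Hypothetical delay-and-sum beamformer (nearest sample, no apodization).

    rf:     (num_rx, num_samples) raw receive data, one row per transducer
    rx_pos: (num_rx, 3) receive transducer positions (m)
    tx_pos: (3,) transmit source position (m)
    points: (num_points, 3) focal point positions (m)
    Returns one summed value per focal point.
    """
    num_samples = rf.shape[1]
    # Transmit leg: source to every focal point.
    d_tx = np.linalg.norm(points - tx_pos, axis=1)                          # (P,)
    # Receive leg: every focal point back to every transducer.
    d_rx = np.linalg.norm(points[None, :, :] - rx_pos[:, None, :], axis=2)  # (R, P)
    # Round-trip delay converted to the nearest sample index.
    idx = np.rint((d_tx[None, :] + d_rx) / C * FS).astype(int)              # (R, P)
    valid = idx < num_samples
    idx = np.where(valid, idx, 0)
    # Select one sample per (transducer, focal point); out-of-range delays add 0.
    selected = np.take_along_axis(rf, idx, axis=1) * valid
    # Combine the per-transducer images into a single frame.
    return selected.sum(axis=0)
```

Feeding it a synthetic point echo (one impulse per channel at that point's round-trip index) gives a peak equal to the channel count at the matching focal point, which makes a convenient sanity check.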

Page 23: Delay Index Calculation

• Iterate through all image points for each transducer and calculate delay index
• Often done with lookup tables (LUTs) instead
• 50 GB LUT required for target 3D system

[Figure: delay 𝜏 for image point 𝑃]
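For reference, the direct calculation described above (one distance computation per image point per transducer) can be sketched as follows. The geometry is an assumption made for illustration: transmit source at the origin, scanline leaving the origin at polar angle `theta` and azimuth `phi`, 4,096 focal points evenly spaced out to the 6 cm image depth, and nearest-sample rounding at the 160 MHz interpolated rate. `scanline_delay_indices` is a hypothetical name.

```python
import math

C = 1540.0        # speed of sound in tissue (m/s)
FS = 160e6        # interpolated sampling frequency (Hz)
DEPTH = 0.06      # image depth (m)
N_POINTS = 4096   # focal points per scanline

def scanline_delay_indices(rx, theta, phi, tx=(0.0, 0.0, 0.0)):
    """Direct (non-LUT) delay indices for one transducer along one scanline.

    Assumed geometry: the scanline starts at the origin with direction given
    by theta (from the z-axis) and phi (around it); tx is the transmit source.
    """
    ux = math.sin(theta) * math.cos(phi)
    uy = math.sin(theta) * math.sin(phi)
    uz = math.cos(theta)
    indices = []
    for n in range(N_POINTS):
        r = (n + 1) * DEPTH / N_POINTS     # radial distance of focal point n
        p = (r * ux, r * uy, r * uz)
        d_tx = math.dist(tx, p)            # transmit leg
        d_rx = math.dist(p, rx)            # receive leg
        indices.append(round((d_tx + d_rx) / C * FS))
    return indices
```

Repeating this square root per point, per transducer, per scanline is the cost that motivates either a precomputed LUT or a cheaper incremental scheme.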

Page 24: Challenges of Handheld 3D Ultrasound

• Delay index LUT requires too much storage
  – New iterative algorithm reduces necessary constant storage by 400x
• Peak raw data bandwidth (6Tb/s) infeasible
  – Sub-aperture multiplexing reduces peak data rate, but requires more transmits
• Handheld power budget very tight (5W)
  – 3D stacked, highly parallel data streaming design reconstructs images efficiently

Page 25: Iterative Delay Index Calculation

• Deltas between adjacent focal points on a scanline form a smooth curve
• Fit piecewise quadratic approximation to delta function
• Two sections sufficient for negligible error

[Figure: delta function approximated in two sections (Section 1, Section 2)]
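The piecewise-quadratic idea can be checked numerically with a least-squares fit. This is a sketch under stated assumptions: an on-axis scanline, transmit source at the origin, one receive transducer 5 mm off axis, the split point chosen by a coarse search, and `np.polyfit` standing in for whatever fitting procedure the authors used (the talk does not say). Function names are hypothetical.

```python
import numpy as np

C_SOUND = 1540.0   # speed of sound in tissue (m/s)
FS = 160e6         # interpolated sampling frequency (Hz)
DEPTH = 0.06       # image depth (m)
N_POINTS = 4096    # focal points per scanline

def exact_delays(rx_x):
    """Fractional round-trip delays (in samples) along an on-axis scanline.

    Assumes the transmit source at the origin, the scanline along +z, and
    the receive transducer at (rx_x, 0, 0).
    """
    r = np.arange(1, N_POINTS + 1) * DEPTH / N_POINTS
    return (r + np.sqrt(r**2 + rx_x**2)) / C_SOUND * FS

def fit_two_sections(delta, split):
    """Least-squares fit of A*n^2 + B*n + C on each side of `split`.

    Each coefficient array is ordered [A, B, C], as returned by np.polyfit.
    """
    n = np.arange(len(delta))
    coeffs = [np.polyfit(n[:split], delta[:split], 2),
              np.polyfit(n[split:], delta[split:], 2)]
    approx = np.concatenate([np.polyval(coeffs[0], n[:split]),
                             np.polyval(coeffs[1], n[split:])])
    return coeffs, approx

delays = exact_delays(rx_x=0.005)
delta = np.diff(delays)   # delay increment between adjacent focal points

# Choose the split (on a coarse grid) that minimizes the worst-case delta error.
best_split = min(range(64, len(delta) - 64, 64),
                 key=lambda s: np.max(np.abs(fit_two_sections(delta, s)[1] - delta)))
coeffs, approx = fit_two_sections(delta, best_split)
n_all = np.arange(len(delta))
single = np.polyval(np.polyfit(n_all, delta, 2), n_all)   # one quadratic, for comparison

print("split at focal point", best_split)
print("max delta error, two sections:", np.max(np.abs(approx - delta)))
print("max delta error, one section: ", np.max(np.abs(single - delta)))
```

Because the hardware accumulates these deltas, any fit error also accumulates along the scanline, so a careful evaluation would check the error in the summed delays as well as in the deltas themselves.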

Page 26: Sub-aperture Multiplexing

• Peak raw data bandwidth (6Tb/s) infeasible
• Solution: sub-aperture multiplexing
  – Transmit multiple times from same location
  – Receive with subset of transducers (sub-aperture)
  – Sum images together
• Prior work: reduce data rate
• Our design: also reduces HW and power requirements
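The 6Tb/s peak figure is consistent with every transducer streaming 12-bit samples at the 40 MHz ADC rate simultaneously, using the values in the Page 38 parameter table (that this is how the figure was derived is our assumption). Receiving on one 1,024-transducer sub-aperture at a time divides the peak rate by the number of sub-apertures, at the cost of one transmit per sub-aperture per transmit source:

```python
TOTAL_TRANSDUCERS = 12_288
PER_SUBAPERTURE = 1_024
SUBAPERTURES = 12
TRANSMIT_SOURCES = 16
FS_ADC = 40e6   # ADC sampling frequency before interpolation (Hz)
BITS = 12       # bits per sample

full_rate = TOTAL_TRANSDUCERS * FS_ADC * BITS   # all channels at once (b/s)
sub_rate = PER_SUBAPERTURE * FS_ADC * BITS      # one sub-aperture at a time (b/s)
transmits = SUBAPERTURES * TRANSMIT_SOURCES     # transmits needed per frame

print(f"full aperture:    {full_rate / 1e12:.2f} Tb/s")  # 5.90 Tb/s
print(f"one sub-aperture: {sub_rate / 1e9:.1f} Gb/s")    # 491.5 Gb/s
print(f"transmits per frame: {transmits}")               # 192
```

The same 12 x 16 product accounts for the 192 transmits per frame listed in that table.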

Page 27: System Design

Page 28: System Design

Sonic Millip3De comprises 1,024 parallel pipelines

Page 29: System Design: Transducers

Interchangeable CMOS transducer layer; can use older process

Page 30: System Design: ADC/Storage

Separate storage layer to reduce wire lengths

Page 31: System Design: Transform-Select-Reduce

Accelerator units in fast, low-power process

Page 32: Select Sub-Unit Design

Selects sample closest to each focal point using our algorithm

Page 33: Select Sub-Unit Design

All delays for a scanline estimated using 9 constants

[Figure: Select Sub-Unit, Section 1 and Section 2]

Page 34: Select Sub-Unit Design

Adders calculate next iteration of quadratic approximation

$A(n+1)^2 + B(n+1) + C = (An^2 + Bn + C) + 2An + (A + B)$

[Figure: Select Sub-Unit, Section 1 and Section 2]
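The identity above is forward differencing: once initialized, each new value of the quadratic needs only two additions and no multipliers. A minimal sketch in plain Python integers, which stand in for the fixed-point registers a hardware version would use; `quadratic_by_addition` is a hypothetical name.

```python
def quadratic_by_addition(a, b, c, count):
    """Evaluate q(n) = a*n^2 + b*n + c for n = 0..count-1 using only additions.

    Uses q(n+1) = q(n) + (2*a*n + a + b); the bracketed difference is kept in
    a running register that itself grows by 2*a each step.
    """
    value = c        # q(0)
    step = a + b     # q(1) - q(0)
    out = []
    for _ in range(count):
        out.append(value)
        value += step    # q(n+1) = q(n) + 2an + (a + b)
        step += 2 * a    # next difference: 2a(n+1) + (a + b)
    return out

print(quadratic_by_addition(1, 1, 0, 5))  # [0, 2, 6, 12, 20]
```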

Page 35: Select Sub-Unit Design

Decrementor selects sample for next image focal point

[Figure: Select Sub-Unit, Section 1 and Section 2]

Page 36: Select Sub-Unit Design

Section decrementor indicates when to change constants

[Figure: Select Sub-Unit, Section 1 and Section 2]
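The talk does not detail how the two decrementors interact, so the following is only a hypothetical behavioral model. It assumes that within each section the quadratic produces the delay increment (delta) to the next focal point, generated with the forward-difference identity from Page 34; that a sample decrementor counts incoming samples until the rounded target delay arrives; and that the outer section loop stands in for the section decrementor that swaps in the next set of constants. `select_stream` and its argument layout are our own inventions.

```python
import math

def select_stream(samples, start, sections):
    """Hypothetical behavioral model of the Select Sub-Unit for one transducer.

    samples:  raw samples in arrival order (assumed long enough for all points)
    start:    delay of the first focal point, in samples (may be fractional)
    sections: list of (num_points, a, b, c); inside a section the delay increment
              from focal point k to k+1 is a*k**2 + b*k + c, with k restarting at 0
    Returns the selected sample for each focal point.
    """
    stream = iter(samples)
    selected = []
    delay = start    # fractional delay of the current focal point
    pos = -1         # index of the last sample pulled from the stream
    current = None
    for num_points, a, b, c in sections:
        # Section decrementor stand-in: use these constants for num_points points.
        delta, step = c, a + b    # forward-difference state at k = 0
        for _ in range(num_points):
            # Sample decrementor: count down until the rounded target arrives.
            countdown = math.floor(delay + 0.5) - pos
            while countdown > 0:
                current = next(stream)
                pos += 1
                countdown -= 1
            selected.append(current)
            delay += delta    # advance to the next focal point
            delta += step     # next quadratic delta, additions only
            step += 2 * a
    return selected

print(select_stream(range(100), 2.0, [(5, 0.0, 0.0, 1.5)]))  # [2, 4, 5, 7, 8]
```

Rounding half up (`floor(delay + 0.5)`) picks the sample closest to each focal point's delay; a real datapath would carry `delay` in fixed point rather than floating point.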

Page 38: System Parameters

Parameter                              Value
Sub-apertures                          12
Transmit Sources                       16
Transmits per Frame                    192
Transducers per Sub-aperture           1,024
Total Transducers                      12,288
Storage per Transducer                 4,096 x 12 bits
Focal Points per Scanline              4,096
Image Depth                            6 cm
Image Angular Width                    π/4
Sampling Frequency                     40 MHz
Interpolation Factor                   4x
Interpolated Sampling Frequency (fs)   160 MHz
Speed of Sound (tissue)                1,540 m/s
Target Frame Rate                      1 fps
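A quick consistency check on these parameters (our own arithmetic, not a calculation from the talk), assuming the longest echo is a straight round trip to the full 6 cm depth and ignoring the extra path length from lateral transducer offsets:

```python
C = 1540.0            # speed of sound in tissue (m/s)
DEPTH = 0.06          # image depth (m)
FS_ADC = 40e6         # ADC sampling frequency (Hz)
INTERP = 4            # interpolation factor
STORAGE = 4096        # samples stored per transducer
FOCAL_POINTS = 4096   # focal points per scanline
SUBAPERTURES = 12
TRANSMIT_SOURCES = 16

round_trip_s = 2 * DEPTH / C                # echo time from the deepest point
samples_needed = round_trip_s * FS_ADC      # samples at the 40 MHz ADC rate
interp_samples = samples_needed * INTERP    # samples at fs = 160 MHz
spacing_m = DEPTH / FOCAL_POINTS            # radial focal-point spacing

print(f"round trip: {round_trip_s * 1e6:.2f} us")                   # 77.92 us
print(f"ADC samples needed: {samples_needed:.1f} of {STORAGE}")     # 3116.9 of 4096
print(f"interpolated samples: {interp_samples:.1f}")                # 12467.5
print(f"focal-point spacing: {spacing_m * 1e6:.2f} um")             # 14.65 um
print(f"transmits per frame: {SUBAPERTURES * TRANSMIT_SOURCES}")    # 192
```

Under this assumption, about 3,117 samples cover the 6 cm round trip, which fits in the 4,096-entry per-transducer buffer with some headroom for off-axis paths.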

Page 39: Image Quality Comparison

[Figure: reconstructed images for the ideal system, our design (12 bit), and an 11-bit design]

Our design has negligible difference from ideal system

Bits   Ideal   14      13      12      11      10
CNR    2.972   2.942   2.960   2.942   2.536   2.233

Simulations using Field II [Jensen ‘92, ‘95]

Page 40: Power Analysis and Scaling

[Figure: power (W, 0 to 20) vs. technology node (45, 32, 22, 16, 11 nm), broken down into DRAM, memory interface, network wires, accelerator, SRAM, ADC, and transducers]

Can meet 5W by 11nm node

Page 41: Conclusions

• 3D die-stacked Sonic Millip3De design is able to meet 5W power budget by 11nm node
• Algorithm/HW co-design enables order-of-magnitude gains
  – Power and output quality goals often in conflict
  – Need guidance from domain experts to balance
• Architects have much to offer for application-specific system designs

Page 42: Questions?

Special thanks to:

Brian Fowlkes, Oliver Kripfgans, Ron Dreslinski