Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar

Qiuling Zhu, Eric L. Turner, Christian R. Berger, Larry Pileggi, Franz Franchetti
September 22, 2011

Page 1: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar

Qiuling Zhu, Eric L. Turner, Christian R. Berger, Larry Pileggi, Franz Franchetti

September 22, 2011

Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar

Page 2: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Application-Specific Logic-in-Memory

Can we push memory-intensive computational logic into, or close to, the memory by constructing a smart and efficient "logic-in-memory" block?

[Figure: two block diagrams. Traditional: CPU (logic), Local Memory, Main Memory. Logic-in-memory: CPU (logic), Local Memory with embedded logic, Main Memory.]

Page 3: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Enabling Technology: Regular Patterns

[Figure labels: regular patterns; application-specific "magic" memory; SRAM bitcell; compatible logic cells]

Implementing sub-22nm designs using a limited set of pattern constructs can enable robust compilation of smart memories

D. Morris et al., "Design of Embedded Memory and Logic Based on Pattern Constructs," Symp. VLSI Technology, June 2011.

Page 4: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Tool Chain: Chip Generator and Memory Compiler

[Figure labels: chip generator; smart memory compiler; SRAM bitcell; compatible logic cells; logic; app-specific logic-in-memory]

Chip Generator
• Generates designs from high-level parameterization and specification
• Utilizes Stanford's chip generator platform (Genesis 2)

Smart Memory Compiler
• Maps memory and logic onto a set of pre-characterized pattern constructs
• Allows flexible synthesis of logic and memory functionality in place of hard IP


Page 5: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Big Question: Impact on Algorithms

Logic-in-memory
• Local data dependency
• Regular memory access pattern
• Simple computational logic
• Cost measure changes

Traditional
• Data storage and processing are logically and physically split
• Algorithms are optimized w.r.t. cost measures such as operation count, minimum number of memory accesses, reuse, …
• E.g., FFT: O(n log n), matrix multiplication: O(n^3)

Logic-in-memory changes the relative cost of operations, requiring new types of algorithms.

Page 6: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Case Study: Interpolation Memory

Ex 1: FFT Twiddle Factor
Ex 2: Image Pyramid Memory
Ex 3: Geometry Transformation
Ex 4: Tomography Backprojection

[Figure labels: ALU; original phantom image; x; level k, level k-1, level k-2]

Page 7: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Outline

SAR Polar Format Algorithms for Logic-in-Memory

Extension: Partial Reconstruction

Implementation and Design Automation

Experimental Results

Summary

Page 8: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Synthetic Aperture Radar (SAR)

Data acquisition

SAR image formation

Interpolation

2D FFT

Image formation

Page 9: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


FFT Upsampling Based Polar Reformatting

Grid Interpolation → Inverse 2D FFT

l is the number of segments per range line, m is the input segment size, and n is the size of the upsampled output segment.

Computational cost:
Interpolation: $10\, l\, m_1 \left(m \log_2 m + n \log_2 n\right)$
2D IFFT: $10\, n_2^2 \log_2 n_2$
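The two cost expressions on this slide can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, and m1 and n2 are read as the grid dimensions labeled in the slide's figure, which is an assumption because the transcript does not define them.

```python
import math

def interpolation_ops(l, m1, m, n):
    """Operation count for FFT-upsampling interpolation, per the slide's formula.

    l  : segments per range line
    m1 : grid dimension labeled m1 on the slide (assumed meaning)
    m  : input segment size
    n  : upsampled output segment size
    """
    return 10 * l * m1 * (m * math.log2(m) + n * math.log2(n))

def ifft2_ops(n2):
    """Operation count for an n2 x n2 inverse 2D FFT, per the slide's formula."""
    return 10 * n2 ** 2 * math.log2(n2)
```

For example, `ifft2_ops(1024)` evaluates the 2D IFFT cost for a 1024 × 1024 output grid.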

SAR image formation:
• Range interpolation (FFT upsampling based)
• Cross-range interpolation
• 2D inverse FFT

[Figure: data grids with dimensions labeled m1 and n2]

Logic-in-Memory Interpolation
• Needs new algorithm

Data transfer cost: Memory ↔ CPU (interpolation)

Page 10: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Local Interpolation Based Polar Reformatting

Grid points in Curvilinear grid (measurements)

Grid points in Cartesian space (outputs)

P(x,y)

Approach: direct local interpolation

sqrt, atan operations are expensive in Logic-in-memory

Finding neighbors is expensive
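To make the cost concrete, here is a minimal sketch of what direct local interpolation has to do for every output pixel. It assumes the measurements lie on a uniform polar grid described by hypothetical parameters `r0`, `dr`, `theta0`, and `dtheta`; the slides do not specify the grid layout, so these names are illustrative.

```python
import math

def cartesian_to_polar_index(x, y, r0, dr, theta0, dtheta):
    """Map a Cartesian output point to fractional indices in a polar grid.

    Each output point costs one sqrt and one atan2, the operations the
    slide flags as expensive for logic-in-memory.
    """
    r = math.sqrt(x * x + y * y)
    theta = math.atan2(y, x)
    return (r - r0) / dr, (theta - theta0) / dtheta

def neighbor_cell(x, y, r0, dr, theta0, dtheta):
    """Return the integer cell (i, j) and offsets (dy, dx) used for interpolation."""
    fi, fj = cartesian_to_polar_index(x, y, r0, dr, theta0, dtheta)
    i, j = math.floor(fi), math.floor(fj)
    return i, j, fi - i, fj - j
```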

Page 11: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Local Interpolation Based Polar Reformatting

Steps:

Coordinate transformation
• Four-corner image perspective geometric transformation
• Avoids sqrt and atan

2D surface interpolation
• Simple logic computation
• Bilinear, bicubic, …

[Figure: grid points in the curvilinear grid (measurements) and in Cartesian space (outputs); output point P(x,y) with offsets dx, dy; sqrt, atan, … replaced by (+, -, ×, …)]
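A four-corner perspective transformation can be sketched as a standard 3×3 projective mapping fitted to four corner correspondences. This is a generic formulation using NumPy, not necessarily the authors' hardware formulation; the function names and the corner values in the usage note are hypothetical.

```python
import numpy as np

def fit_perspective(src, dst):
    """Solve for the 3x3 perspective matrix mapping four src corners to four dst corners."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_perspective(H, x, y):
    """Map a Cartesian point into measurement-grid coordinates (no sqrt or atan)."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w
```

With `src` set to the corners of an output tile and `dst` to the corresponding positions in the measurement grid, each output point needs only multiplications, additions, and one division.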

Page 12: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


2D Interpolation

Nearest Neighbor: uses the single sample (i, j)

Bilinear Interpolation: uses the 2×2 neighborhood (i, j), (i, j+1), (i+1, j), (i+1, j+1) around P(x,y), with offsets dx, dy

Bicubic Interpolation: uses the 4×4 neighborhood from (i-1, j-1) to (i+2, j+2) around P(x,y), with offsets dx, dy

Separable 2D interpolation
• Bilinear: (2 horizontal + 1 vertical) 1D interpolations
• Bicubic: (4 horizontal + 1 vertical) 1D interpolations
• 1D interpolation: polynomial interpolation in Newton divided-difference form

Suitable for Logic in Memory
• Localized computation: outputs are determined only by their neighbors
• Regular memory access: contiguous or block data array access
• Simple computational logic: adders, subtractors, Boolean operations, …
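The decomposition into 1D passes can be sketched in a few lines. The code below is a software illustration under stated assumptions: samples sit at unit spacing, the 2×2 or 4×4 patch is passed as a list of rows, and the function names are hypothetical. The cubic case is the degree-3 polynomial through four samples, as the Newton divided-difference bullet suggests.

```python
def newton_interp(ys, t, first_node=0):
    """Evaluate the polynomial through unit-spaced samples ys at position t.

    Uses Newton's divided-difference form; nodes are first_node, first_node+1, ...
    """
    n = len(ys)
    xs = [first_node + k for k in range(n)]
    coef = list(ys)
    # Build divided differences in place, highest index first.
    for order in range(1, n):
        for k in range(n - 1, order - 1, -1):
            coef[k] = (coef[k] - coef[k - 1]) / (xs[k] - xs[k - order])
    # Horner-style evaluation of the Newton form.
    result = coef[-1]
    for k in range(n - 2, -1, -1):
        result = result * (t - xs[k]) + coef[k]
    return result

def interp2d(patch, dx, dy):
    """Separable 2D interpolation on a 2x2 (bilinear) or 4x4 (bicubic) patch.

    One horizontal 1D interpolation per row, then one vertical interpolation.
    """
    first = -(len(patch) // 2 - 1)  # 0 for 2x2, -1 for 4x4
    row_values = [newton_interp(row, dx, first) for row in patch]
    return newton_interp(row_values, dy, first)
```

A 2×2 patch triggers two horizontal passes and one vertical pass; a 4×4 patch triggers four horizontal passes and one vertical pass, matching the counts on the slide.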

Page 13: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Tiling: Accurate Geometry Approximation

Geometry approximation conditions:
• deltawidth is small enough
• RL is large enough

Solution: Image tiling
• Tile in the Cartesian grid
• Output-oriented tiling
• Easy to identify boundary and tile overlap

[Figure: image split into Tile1 to Tile4; geometry labels RL, deltawidth, K; approximation error]
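Output-oriented tiling can be illustrated with a small helper that partitions the Cartesian output grid and pads each tile with an overlap margin. This is only a sketch: the function name, the equal-size tiles, and the `overlap` parameter are assumptions, since the slides do not give tile dimensions or overlap widths.

```python
def output_tiles(height, width, tiles_per_side, overlap=0):
    """Split a height x width output grid into tiles_per_side^2 tiles.

    Returns (row_start, row_end, col_start, col_end) per tile, end-exclusive,
    each extended by `overlap` pixels and clipped to the image boundary.
    Assumes height and width are divisible by tiles_per_side.
    """
    tile_h = height // tiles_per_side
    tile_w = width // tiles_per_side
    tiles = []
    for ti in range(tiles_per_side):
        for tj in range(tiles_per_side):
            r0 = max(0, ti * tile_h - overlap)
            r1 = min(height, (ti + 1) * tile_h + overlap)
            c0 = max(0, tj * tile_w - overlap)
            c1 = min(width, (tj + 1) * tile_w + overlap)
            tiles.append((r0, r1, c0, c1))
    return tiles
```

Because the tiles are defined on the output grid, boundaries and overlaps are plain index ranges, and each tile can use its own four-corner geometry approximation.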

Page 14: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Outline

SAR Polar Format Algorithms for Logic-in-Memory

Extension: Partial Reconstruction

Implementation and Design Automation

Experimental Results

Summary

Page 15: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


SAR Partial Reconstruction

Scenario: big image, small screen, pan-and-zoom (e.g., handheld device)
Bad approach: reconstruct everything, display only the region of interest
Better: reconstruct only what will be displayed (requires sophisticated filtering before reconstruction)

Image data: 10,000 × 10,000
Display: 800 × 600

[Figure: partial image formation: Interpolation + Filtering, then 2D FFT]

Page 16: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Partial Reconstruction I

Reconstructs and displays a low-resolution full-size image
• Traditional: interpolate all, full-size large IFFT, then decimation
• Alternative: partial interpolation, then smaller-size IFFT
• Theory behind: multiplication in the frequency domain is equivalent to convolution in the spatial domain

Low-pass filtering in the spatial domain = cutting off high frequencies in Fourier space

Smaller-size interpolation, then smaller-size IFFT: only computes the pixels that are required!
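The equivalence behind this slide can be checked numerically in one dimension. The sketch below uses NumPy with made-up sizes (N = 64 samples, K = 16 retained frequencies) and an ideal low-pass cutoff, which is an idealization: it shows that a small IFFT over only the low frequencies equals the full-size low-pass reconstruction followed by decimation.

```python
import numpy as np

N, K = 64, 16          # full size and reduced size (hypothetical values)
D = N // K             # decimation factor
rng = np.random.default_rng(0)
X = rng.standard_normal(N) + 1j * rng.standard_normal(N)  # Fourier-domain data

# Indices of the K lowest frequencies: 0..K/2-1 and -K/2..-1
keep = np.r_[0:K // 2, N - K // 2:N]

# Alternative: small IFFT over the retained low frequencies only
y_small = np.fft.ifft(X[keep]) * (K / N)

# Traditional: zero the high frequencies, full-size IFFT, then decimate
X_lowpass = np.zeros_like(X)
X_lowpass[keep] = X[keep]
y_ref = np.fft.ifft(X_lowpass)[::D]

same = np.allclose(y_small, y_ref)
```

Here `same` is `True`: the size-K IFFT produces exactly the decimated low-resolution samples without ever forming the size-N image.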

Page 17: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Partial Reconstruction II

Reconstructs and displays a high-resolution image portion
• Traditional: full-size large IFFT, reconstruct all, then cut off the unnecessary region
• Alternative: decimation filtering, then smaller-size IFFT
• Theory behind: multiplication in the spatial domain is equivalent to convolution in the Fourier domain; a displacement in one domain is equivalent to a phase shift in the other

[Figure labels: FFT sample; interpolate; decimation filter; smaller IFFT; ROI; Logic in Memory]
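A one-dimensional NumPy sketch of this idea follows. It is idealized: the filter is the exact Fourier transform of a rectangular ROI window (a practical decimation filter would be much shorter and only approximate), and the sizes and ROI position are made-up values. It shows that filtering in the Fourier domain, evaluated only at the decimated sample positions, plus a phase shift and a small IFFT, recovers the ROI.

```python
import numpy as np

N, K, s = 64, 16, 20   # image size, ROI size, ROI start (hypothetical values)
D = N // K             # decimation factor in the Fourier domain
rng = np.random.default_rng(0)
image = rng.standard_normal(N) + 1j * rng.standard_normal(N)
X = np.fft.fft(image)  # the Fourier-domain samples we start from

# ROI window in space <-> filter kernel W in the Fourier domain
window = np.zeros(N)
window[s:s + K] = 1.0
W = np.fft.fft(window)

def filtered_sample(k):
    """Circular convolution of X with W, evaluated at one output frequency k."""
    j = np.arange(N)
    return np.sum(X * W[(k - j) % N]) / N

# Compute the filter output only at the decimated positions
k_dec = np.arange(0, N, D)
Y = np.array([filtered_sample(k) for k in k_dec])

# Phase shift moves the ROI to the origin; then a size-K IFFT
Y = Y * np.exp(2j * np.pi * k_dec * s / N)
roi = np.fft.ifft(Y)
```

The result `roi` matches `image[s:s + K]`, so only K filtered samples and a size-K IFFT are needed for the displayed portion.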

Page 18: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Decimation Filter Implementation

Frequency Response:

[Figure: Magnitude Response (dB), from -120 to 0 dB, vs. Frequency (Hz), from 0 to 15; curves: CIC, CIC compensator, cascade]

CIC Spec: decimation factor = 16; N = 4; M = 1
CIC Comp Spec: Fp = 0.45; Fst = 0.55; Ap = 0.1 dB; Ast = 35 dB; 45 stages; downsample = 2
Total decimation factor = 32

FIR polyphase filter is expensive at high decimation factors
Cascaded integrator-comb (CIC) filter is more economical
• Large decimation factors
• No multiplication
• CIC compensation is required

[Figure: CIC filter structure, N = 4, M = 1: inp, four z^-1 stages, rate change R, four z^-M stages, outp]
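A behavioral model of a CIC decimator with the parameters listed on this slide (R = 16, N = 4, M = 1) can be written with additions and subtractions only. This is a software sketch with a hypothetical function name, not the authors' RTL, and it omits the CIC compensation filter.

```python
def cic_decimate(samples, R=16, N=4, M=1):
    """Decimate integer samples with an N-stage CIC filter.

    N integrators run at the input rate, the signal is downsampled by R,
    then N comb stages (y[m] - y[m - M]) run at the output rate.
    """
    integrators = [0] * N
    comb_delays = [[0] * M for _ in range(N)]
    out = []
    for idx, x in enumerate(samples):
        acc = x
        for stage in range(N):
            integrators[stage] += acc   # integrator: running sum
            acc = integrators[stage]
        if (idx + 1) % R == 0:          # keep every R-th sample
            y = acc
            for stage in range(N):
                delayed = comb_delays[stage].pop(0)
                comb_delays[stage].append(y)
                y = y - delayed         # comb: difference with delayed sample
            out.append(y)
    return out
```

For a constant input of 1, the output settles at the DC gain (R·M)^N = 65536, which is one reason a gain correction and compensation stage typically follow a CIC filter.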

Page 19: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Outline

SAR Polar Format Algorithms for Logic-in-Memory

Extension: Partial Reconstruction

Implementation and Design Automation

Experimental Results

Summary

Page 20: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Design Automation and Optimization

Design Automation Flow:

[Figure labels: Customized Parameters; Target + Budget; Performance Model; Performance/Cost Report; Design Space Exploration; Code Generator; RTL Design (memory/logic mixed); Regular Pattern; Smart Memory Compiler; Hardware Structure]

Page 21: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Chip Generator

http://genesis.web.ece.cmu.edu/gui/scratch/mydesign-10545.php

Reference: O. Shacham, O. Azizi, M. Wachs, et al., "Rethinking Digital Design: Why Design Must Change," IEEE Micro, Dec. 2010.

Page 22: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Outline

SAR Polar Format Algorithms for Logic-in-Memory

Extension: Partial Reconstruction

Implementation and Design Automation

Experimental Results

Summary

Page 23: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Reconstruction Quality vs. FFT SAR

Is FFT-based SAR better than interpolation-based SAR?

Perfect reconstruction of point targets

[Figure: original image and reconstructions from the actual reconstruction algorithms; panel labels: original, linear, cubic, hermitian, FFT-based]

Page 24: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Can FFT and Interpolation Be Distinguished?

Answer: Hypothesis Testing
• Hypothesis testing for linear and FFT: P(Error) = 0.495
• Random guessing: P(Error) = 0.5

Results are statistically indistinguishable. Interpolation is as good as FFT.

[Figure panels: nearest neighbor interpolation; bilinear interpolation; bicubic interpolation; FFT interpolation]

Page 25: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Accuracy Improvement Through Tiling

[Figure: Mean Square Error relative to Gold Standard Method (axis from 0 to 0.02) vs. interpolation method (Nearest Neighbor, Bilinear, Bicubic) for different tile numbers; series: one tile, 4 tiles, 16 tiles]

MSE decreases with more tiling and higher interpolation order

Page 26: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Energy Saving for Logic-in-Memory

Energy savings increase with problem size

[Figure: Energy Saving for SAR PFA Grid Interpolation. Energy (nJ, log scale from 1.00E+00 to 1.00E+12) vs. SAR image size (32×32, 64×64, 128×128, 256×256, 512×512); series: CPU_centric, Logic_in_Memory]

Page 27: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Accurate Region-of-Interest by Sacrificing Border

[Figure: Decimation Filter Hardware Cost with ROI Factors. Area [1000 um2] (0 to 9) vs. Region of Interest (ROI) (0.5 to 1), decimation factor = 2; series: ast = 15 dB, 20 dB, 25 dB, 30 dB, 35 dB; annotation: error]

The imperfect image edge results from the filter's non-steep transition region

ast: decimation filter stopband attenuation (dB)

Page 28: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Partial Reconstruction: Operation saving vs. Cost

• IFFT operation count decreases exponentially with increasing decimation factor
• Logic hardware cost is negligible compared with memory cost
• Decimation filter cost increases slightly with increasing decimation factor

[Figure: Logic in Memory Hardware Cost. Logic area / memory area (0 to 4.00E-04) vs. decimation factor (0 to 140); series: Grid Interpolation + Decimation Filter (Beta=0.3, Ast=25dB); Grid Interpolation + Decimation Filter (Beta=0.3, Ast=35dB); Grid Interpolation + Decimation Filter (Beta=0.2, Ast=35dB); Grid Interpolation]

[Figure: 2D IFFT Computational Cost vs. Decimation Factor. Operation count (log scale, 1.00E+04 to 1.00E+10) vs. decimation factor (0 to 140), SAR image size = 4K×4K]

Beta: filter rolloff factors ; Ast: decimation filter stopband attenuation (dB)
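The trend in the operation-count plot can be approximated with the 2D IFFT cost expression from the earlier cost slide, 10·n²·log2(n). The sketch below assumes that a decimation factor D shrinks each image dimension from n to n/D; that reading, and the function name, are assumptions rather than something the slides state.

```python
import math

def ifft2_ops_after_decimation(n, decimation):
    """2D IFFT operation count for an n x n image decimated per dimension.

    Assumes the reduced image is (n / decimation) on each side and that
    n is divisible by the decimation factor.
    """
    reduced = n // decimation
    return 10 * reduced ** 2 * math.log2(reduced)

# Sweep power-of-two decimation factors for a 4K x 4K image
sweep = {d: ifft2_ops_after_decimation(4096, d) for d in (1, 2, 4, 8, 16, 32, 64, 128)}
```

Under this assumption the count falls from about 2.0E+09 operations at D = 1 to about 5.1E+04 at D = 128, which sits inside the axis range of the plot.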

Page 29: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Outline

SAR Polar Format Algorithms For Logic-in-Memory

Extension: Partial Reconstruction

Implementation and Design Automation

Experimental Results

Summary

Page 30: Application-Specific Logic-in-Memory for Polar Format Synthetic Aperture Radar


Summary

Logic in Memory and its applications for interpolation

Evaluation and integration with Genesis2

Logic in Memory for SAR PFA and partial reconstruction
