Leveraging OpenPOWER and FPGAs to Accelerate Machine ...

18
© 2016 OpenPOWER Foundation Leveraging OpenPOWER and FPGAs to Accelerate Machine Vision and Learning Lisa Liu, Michaela Blott, Giulio Gambardella, Ken O’Brien Xilinx Research

Transcript of Leveraging OpenPOWER and FPGAs to Accelerate Machine ...

Page 1: Leveraging OpenPOWER and FPGAs to Accelerate Machine ...

© 2016 OpenPOWER Foundation

Leveraging OpenPOWER and FPGAs to Accelerate Machine Vision and Learning

Lisa Liu, Michaela Blott, Giulio Gambardella, Ken O’BrienXilinx Research

Page 2: Leveraging OpenPOWER and FPGAs to Accelerate Machine ...

Agenda

• Introduction

• Performance characterization & estimation

• Machine vision & learning

• Summary

© 2016 OpenPOWER Foundation 2

Page 3: Leveraging OpenPOWER and FPGAs to Accelerate Machine ...

© 2016 OpenPOWER Foundation

Introduction

3

Page 4: Leveraging OpenPOWER and FPGAs to Accelerate Machine ...

Xilinx Research - Ireland

Page 4

- 7 researchers + students & visiting scholars

- 2 in university program

- Est. 10 years ago

Applications & Architectures:Through application-driven technology development with customers, partners, and engineering & marketing

Page 5: Leveraging OpenPOWER and FPGAs to Accelerate Machine ...

Architecture Trends: Heterogeneous Computing

• End of Moore’s law and Dennard scaling• Technology node scaling becomes increasingly difficult

• Power cost eclipses CAPEX and heat dissipation is problematic

• Resulting economics are questionable (performance/$)

• Heterogeneous computing• Provides further performance scaling

• Reduces power consumption

• CAPI• Provides a coherent way to integrate accelerators into POWER8

• Enables users to build POWER-based heterogeneous platforms

© 2016 OpenPOWER Foundation 5

Page 6: Leveraging OpenPOWER and FPGAs to Accelerate Machine ...

Ease of Use:New Design Environments

© 2016 OpenPOWER Foundation 6

• ISE, RTL-based design entry with IP library

Legacy

• Vivado HLS

• SDNet (DSL PX)

• Block stitching and manual integration in platform in RTL

Raised abstraction for accelerators

• SDx

• SNAP

• Predefined methods for data transfer & automated implementation

Simplified host integration & automated infrastructure creation

Time

Ab

straction

Page 7: Leveraging OpenPOWER and FPGAs to Accelerate Machine ...

© 2016 OpenPOWER Foundation

Performance Characterization & Estimation

7

Page 8: Leveraging OpenPOWER and FPGAs to Accelerate Machine ...

Rooflines for Machine Vision & Learning

© 2016 OpenPOWER Foundation 8

compute bound

Page 9: Leveraging OpenPOWER and FPGAs to Accelerate Machine ...

Rooflines for Machine Vision & Learning

© 2016 OpenPOWER Foundation 9

Page 10: Leveraging OpenPOWER and FPGAs to Accelerate Machine ...

© 2016 OpenPOWER Foundation

Machine Vision & Learning

10

Page 11: Leveraging OpenPOWER and FPGAs to Accelerate Machine ...

Video Scaler

• A functionality to resize video or image via interpolation

• A basic and popular functionality used in data centers

11

Page 12: Leveraging OpenPOWER and FPGAs to Accelerate Machine ...

CAPI Based Video Scaler for Power8

© 2016 OpenPOWER Foundation 12

• Acceleration of video scaling task with Xilinx IP

• Fully integrated with FFMpeg on Power8

• Prototype results• 1280x720 1920x720

• Video scaler implemented in SW: 3.8 frames/sec

• Video scaler implemented in FPGA: 59 frames/sec

• Speedup: ~15X

PCIEPOWER8 CAPP PSL SHIMXilinx Polyphase

Video Scaler

ADM-PCIE-7V3

Implemented in C/C++

scale up

Page 13: Leveraging OpenPOWER and FPGAs to Accelerate Machine ...

• Convolutional neural networks are the predominant algorithm for machine learning• Achieving superhuman accuracy

• CNNs have high compute and memory requirements• AlexNet requires 1.456 GOps/frame and 244MB of storage

• Reduced precision networks emerge• 50x reduction in modelsize for 6b at no loss in accuracy [1,2]

• Binary networks show a small loss for large networks [3]

Page 13

Convolutional Neural Networks

Source:

[1] Iandola et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 1MB model size." (2016).

[2] Forrest N. Iandola, Khalid Ashraf, Matthew W. Moskewicz, Kurt Keutzer “FireCaffe: near-linear acceleration of deep neural network training on compute

clusters”

[3] Sung et al., “Resiliency of Deep Neural Networks Under Quantization”, ICLR’16 (fully connected network layers for phoneme recognition)

Page 14: Leveraging OpenPOWER and FPGAs to Accelerate Machine ...

CAPI-Based Neural Network Accelerator

© 2016 OpenPOWER Foundation 14

ADM-PCIE-7V3

POWER8 SHIM

AFU

Streaming

Threshold Threshold Threshold

PSL

Implemented in C/C++

CAPP PCIE

Page 15: Leveraging OpenPOWER and FPGAs to Accelerate Machine ...

Prototype Results

© 2016 OpenPOWER Foundation 15

CIFAR10 BNN• Input images: 32x32 pixels, RGB image

• 2 (3x3) Conv + Max Pool + 2 (3x3) Conv + Max Pool + 2 Conv + Max Pool + 3 FC

MNIST BNN• Input images: 28x28 pixels, black and white

• Topology: 3 FC layers, 256 neurons each

First prototype results for BNN image classification• Demonstrate order of magnitude improvement at classification rates

Network # OPS / Image FPS - HW TOPS/sec

CIFAR10 BNN 1.2G 6.3k 7.8

MNIST BNN 668k 2.21M 1.4

Page 16: Leveraging OpenPOWER and FPGAs to Accelerate Machine ...

© 2016 OpenPOWER Foundation

Summary

16

Page 17: Leveraging OpenPOWER and FPGAs to Accelerate Machine ...

Summary

• Industry trends in computing platforms• Heterogeneous

• Tightly integrated accelerators

• CAPI provides an easy to integrate FPGAs with POWER8 to build heterogeneous computing platforms

• Platform characterization and benchmarking show• Heterogeneous computing platforms can bring significant acceleration for machine vision &

learning algorithms

• Prototype results demonstrate:• 15x speed-up for video scaler

• Orders of magnitude improvements in classification rates through the use of custom data types and architectures on FPGAs

© 2016 OpenPOWER Foundation 17

Page 18: Leveraging OpenPOWER and FPGAs to Accelerate Machine ...

Q&A

© 2016 OpenPOWER Foundation 18