Hardwareaspekte Deep Learning -FPGAs für Computer Vision ... · [email protected] +41 43 456 16...

16
1 Supercomputing Systems AG Phone +41 43 456 16 00 Technopark 1 Fax +41 43 456 16 10 8005 Zürich www.scs.ch Vision trifft Realität. Hardwareaspekte Deep Learning - FPGAs für Computer Vision im Auto Workshop at University of Applied Sciences Ulm 11.7.2017 Felix Eberli, Department Head Embeded & Automotive 12 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC But already in series as many driver assistant systems ADC (Distronic) Blind spot detection Break assist • Pedestrian detection Park pilot Stop & Go Pilot Highway Pilot (steering assist) • :. • Lets see

Transcript of Hardwareaspekte Deep Learning -FPGAs für Computer Vision ... · [email protected] +41 43 456 16...

Page 1: Hardwareaspekte Deep Learning -FPGAs für Computer Vision ... · felix.eberli@scs.ch +41 43 456 16 19. Created Date: 7/6/2017 12:31:00 PM ...

1

Supercomputing Systems AG Phone +41 43 456 16 00

Technopark 1 Fax +41 43 456 16 10

8005 Zürich www.scs.ch

Vision trifft Realität.

Hardwareaspekte Deep

Learning - FPGAs für

Computer Vision im Auto

Workshop at University of Applied Sciences Ulm 11.7.2017

Felix Eberli, Department Head Embeded & Automotive

12 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

But already in series as many driver assistant systems

• ADC (Distronic)

• Blind spot detection

• Break assist

• Pedestrian detection

• Park pilot

• Stop & Go Pilot

• Highway Pilot (steering assist)

• :.

• Lets see ☺

Page 2: Hardwareaspekte Deep Learning -FPGAs für Computer Vision ... · felix.eberli@scs.ch +41 43 456 16 19. Created Date: 7/6/2017 12:31:00 PM ...

2

13 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

Application “Traffic Jam Pilot”

• Steering assistant

allows autonomous

driving in traffic

jams up to 30km/h,

assisting above.

green: Radar-Objects Object-Position via 6D-Vision

Sensor view for

an Urban Drive

15 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

SCS company profile

• Founded 1993 and privately owned by Prof. Dr. Anton Gunzinger

• 100+ employees:

Electrical engineers

Software engineers

Physicists

Mathematicians

• Company offices at Technopark Zurich, Switzerland

Page 3: Hardwareaspekte Deep Learning -FPGAs für Computer Vision ... · felix.eberli@scs.ch +41 43 456 16 19. Created Date: 7/6/2017 12:31:00 PM ...

3

16 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

SCS Services

Departments

• Embedded & Automotive

• Life Science / Medical

• High Performance Safety

• Embedded

• High Performance Computing

• SW / Public Transport

• SW / Broadcast

• Measure & Decide

Embedded & Automotive

• Feasibility studies

• Hardware (Specification, Design, Schematics, Layout, Production)

• Firmware/IP (FPGA, DSP, GPU, CPU)

• Software (Drivers, Host SW – Windows/Linux)

• Optimizations (ARM , Neon, DSP, EVE, GPU, R-CAR, PC SSE)

17 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

SCS Embedded & Automotive Department

Page 4: Hardwareaspekte Deep Learning -FPGAs für Computer Vision ... · felix.eberli@scs.ch +41 43 456 16 19. Created Date: 7/6/2017 12:31:00 PM ...

4

19 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

The Principle of Stereo Vision

2008 world-wide first real-time

implementation of Semi-Global

Matching on an automotive compliant

FPGA

S.Gehrig, F.Eberli, T.Meyer, “A Real-time Low-Power Stereo Vision Engine Using Semi-Global

Matching”,

ICVS 2009 (Best Paper Award)

22 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

SCS Stereo Vision Evaluation Plattform

Page 5: Hardwareaspekte Deep Learning -FPGAs für Computer Vision ... · felix.eberli@scs.ch +41 43 456 16 19. Created Date: 7/6/2017 12:31:00 PM ...

5

24 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

Example measurement accuracy 3:

Distance measured = 10.549m +/- 0.022m (Baselength = 25cm)

25 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

SCS Video Injection System, Multi camera record, replay and HIL

Page 6: Hardwareaspekte Deep Learning -FPGAs für Computer Vision ... · felix.eberli@scs.ch +41 43 456 16 19. Created Date: 7/6/2017 12:31:00 PM ...

6

33 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

Current research focuses

on a deeper understanding of the scene

• https://www.cityscapes-dataset.com/examples/

35 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

Meine Diplomarbeit vor 20 Jahren –

Neuronales Netz auf DSP portieren

Page 7: Hardwareaspekte Deep Learning -FPGAs für Computer Vision ... · felix.eberli@scs.ch +41 43 456 16 19. Created Date: 7/6/2017 12:31:00 PM ...

7

36 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

Deep Learning -> CNN

• Convolutional Neural Network

Scene LabelingObject Detection

39 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

Rechenbeispiel – Wie lese ich Marketingfolien K

1x 32 Bit Multiplier + Adder @ 1 THz

⇒ 1 TMACC

⇒ 2 TOPS (32bit)

⇒ 8 DLTOPS (8bit)

⇒ 16 TOPS peak (4bit)

• Bandbreite / Partitionierung => Nur 30-70% benutzbar

• Auch NOPS sind OPS

• Typischer Stromverbrauch

• Batch mode? => Latenz:

Page 8: Hardwareaspekte Deep Learning -FPGAs für Computer Vision ... · felix.eberli@scs.ch +41 43 456 16 19. Created Date: 7/6/2017 12:31:00 PM ...

8

40 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

Berechnung von CNNs: Ablauf

Layer

1classifier

layer

Layer

2

Eingangsbild Scene Labeling

modernes Deep CNN: 5 – 152 Layer

Layer

N...

Source: Chen et al., “Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for CNNs”, MIT 2016

https://www.cityscapes-dataset.com/

41 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

Berechnung von CNNs: Ablauf

Layer

1classifier

layer

Layer

2

low-level

features

Layer

N

high-level

features

...

Eingangsbild Scene Labeling

Source: features: https://arxiv.org/pdf/1311.2901v3.pdf

Page 9: Hardwareaspekte Deep Learning -FPGAs für Computer Vision ... · felix.eberli@scs.ch +41 43 456 16 19. Created Date: 7/6/2017 12:31:00 PM ...

9

42 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

Layer

1classifier

layer

Layer

2

Layer

N...

Berechnung von CNNs: Ablauf

Eingangsbild Scene Labeling

Faltung Aktivierung Normalisierung Pooling

43 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

Berechnung von CNNs: Ablauf

Layer

1classifier

layer

Layer

2

Layer

N...

Eingangsbild

Aktivierung Normalisierung Pooling

Scene Labeling

Faltung

90 – 99% von Rechenaufwand + Laufzeit

Page 10: Hardwareaspekte Deep Learning -FPGAs für Computer Vision ... · felix.eberli@scs.ch +41 43 456 16 19. Created Date: 7/6/2017 12:31:00 PM ...

10

44 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

Komplexität von CNN-Architekturen

Beispiel Scene Labeling, GoogLeNet

1 Million

«Mega»

1 Milliarde

«Giga»1 Billion

«Tera»

1 Tausend

256x256

MACC/s

45 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

Komplexität von CNN-Architekturen

1 FPS

1920x1080256x256

1 Million

«Mega»

1 Milliarde

«Giga»1 Billion

«Tera»

1 Tausend

MACC/s

Beispiel Scene Labeling, GoogLeNet

Page 11: Hardwareaspekte Deep Learning -FPGAs für Computer Vision ... · felix.eberli@scs.ch +41 43 456 16 19. Created Date: 7/6/2017 12:31:00 PM ...

11

46 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

Komplexität von CNN-Architekturen

1 FPS

1920x1080256x25630 FPS

1920x1080

1 Million

«Mega»

1 Milliarde

«Giga»1 Billion

«Tera»

1 Tausend

«Kilo»

MACC/s

• minimale Anzahl Rechnungen + Memory Transfers

• bei perfekter Parallelisierung + Data Reuse (kein Tiling)

Beispiel Scene Labeling, GoogLeNet

47 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

Komplexität von CNN-Architekturen Image Classification 256x256

MACCs: 100 Millionen – 100 Milliarden

Memory: 10 MB – 1’000 MB

Source: D. Gschwend, “ZynqNet : An FPGA-accelerated Embedded Convolutional Neural Network”, ETHZ 2016

Page 12: Hardwareaspekte Deep Learning -FPGAs für Computer Vision ... · felix.eberli@scs.ch +41 43 456 16 19. Created Date: 7/6/2017 12:31:00 PM ...

12

48 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

Komplexität von CNN-Architekturen Scene Labeling 1920x1080

MACCs: 100 Mio. – 100 Mia. 10 Mia. – 1’000 Mia. = 1 TMACC

Memory: 10 MB – 1’000 MB 100 MB – 10’000 MB

Source: D. Gschwend, “ZynqNet : An FPGA-accelerated Embedded Convolutional Neural Network”, ETHZ 2016

x10

49 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

Hardware-Plattformen für Deep Learning

?

Page 13: Hardwareaspekte Deep Learning -FPGAs für Computer Vision ... · felix.eberli@scs.ch +41 43 456 16 19. Created Date: 7/6/2017 12:31:00 PM ...

13

50 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

Name Typ Peak MACC/s Mem BW Peak Power

GPU NVidida Drive PX2 float 4’000 Mia. 80 GB/s 80 W

GPU NVidida Titan X float 3’000 Mia. 300 GB/s 250 W

FPGA Kintex XCKU115 int16 3’000 Mia. 50 GB/s 50 ... 100 W

FPGA Arria GX660 float 2’500 Mia. 50 GB/s 50 ... 100 W

VPU Movidius Myriad 2 int16 1000 Mia. 400 GB/s 2 W

GPU NVidia Tegra X1 float 250 Mia. 25 GB/s 10 W

CPU Core i7-6700K float 250 Mia. 30 GB/s 90 W

ASIC Origami int12 150 Mia. 0.5 GB/s 0.7 W

DSP TI C6678 float 80 Mia. 20 GB/s 15 W

ASIC Eyeriss int16 40 Mia. 0.1 GB/s 0.3 W

Scene Labeling GoogLeNet,

1920 x 1080 Pixel, 30 FPS:

2’000 Milliarden = 2 Tera MACCs pro Sekunde

50 GB/s Memory Bandbreite

Hardware-Plattformen für Deep Learning: Rechenkapazitäten

http://wccftech.com/nvidia-drive-

px2-pascal-gtc-2016/

http://www.geforce.com/hardware/

desktop-gpus/geforce-gtx-titan-x/specifications

http://www.xilinx.com/products/technology/dsp.html

https://www.xilinx.com/products/technology/

memory-interfacing.html

https://www.altera.com/products/fpga/features/dsp/

arria10-dsp-block.html

http://goo.gl/xBdTrV

http://uploads.movidius.com/1441734401-

Myriad-2-product-brief.pdf

http://browser.primatelabs.com/geekbench3/7309149

http://international.download.nvidia.com/pdf/tegra/

Tegra-X1-whitepaper-v1.0.pdf

http://people.csail.mit.edu/emer/slides/

2016.02.isscc.eyeriss.slides.pdf

http://asic.ethz.ch/2014/Origami.html

http://www.ti.com/lit/an/sprabk5b/sprabk5b.pdf

52 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

FPGAfloat

FPGAint

GPUTitan X

GPUDrive PX2

ASICeyeriss

ASICorigami

VPU

DSP

CPU

GPUTegra X1

Hardware-Plattformen für Deep Learning: Rechenkapazitäten

power

speed

1W

10W

100W

1’000W

10 Mia. 100 Mia. 1’000 Mia.

«Tera»

10’000 Mia. MACC/s

Implementationsverlust

Page 14: Hardwareaspekte Deep Learning -FPGAs für Computer Vision ... · felix.eberli@scs.ch +41 43 456 16 19. Created Date: 7/6/2017 12:31:00 PM ...

14

53 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

FPGAfloat

FPGAint

GPUTitan X

GPUDrive PX2

ASICeyeriss

ASICorigami

VPU

DSP

CPU

GPUTegra X1

Hardware-Plattformen für Deep Learning: Rechenkapazitäten

power

speed

1W

10W

100W

1’000W

10 Mia. 100 Mia. 1’000 Mia.

«Tera»

10’000 Mia. MACC/s

256x25630 FPS

1920x1080

1 FPS

1920x1080

54 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

Entwicklungen in Zukunft

“wilder Westen”, noch ein paar Jahre

- sehr viel Bewegung

- grosse Player, monatlich neue Forschungsresultate

optimale Hardware-Plattform für CNNs noch unklar

- Konvergenz von GPU, DSP, VPU, FPGA

- spezialisierte Beschleuniger (fixed-point integer, direct convolution)

- ebenso wichtig: Frameworks, Hersteller-Support, Lizenzbedingungen

Page 15: Hardwareaspekte Deep Learning -FPGAs für Computer Vision ... · felix.eberli@scs.ch +41 43 456 16 19. Created Date: 7/6/2017 12:31:00 PM ...

15

55 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

Entwicklungen in Zukunft

Abstimmung der Netze auf Zielplattform

Neue Bausteine mit für CNN optimierten Beschleunigern

- noch viel mehr zu erzählen :

� NDA - mit üblichen Verdächtigen und/oder mit SCS Kontakt aufnehmen.

56 Zürich 06.07.2017 © by Supercomputing Systems AG PUBLIC

Komplexität von CNN-Architekturen: Mensch vs. Maschine

• Mehr als ½ des Hirns beschäftigt sich mit Sehen

• Hirn ist 1’000’000x Energie-effizienter

Source: Yu Wang, Tsinghua University, Feb 2016;

https://www.quora.com/How-much-of-the-brain-is-involved-with-vision

10’000 Tera Op/s 80 Tera Op/s* * IBM Watson, 2012

Page 16: Hardwareaspekte Deep Learning -FPGAs für Computer Vision ... · felix.eberli@scs.ch +41 43 456 16 19. Created Date: 7/6/2017 12:31:00 PM ...

16

Supercomputing Systems AG Phone +41 43 456 16 00

Technopark 1 Fax +41 43 456 16 10

8005 Zürich www.scs.ch

Vision meets reality.

Supercomputing Systems AG

[email protected] +41 43 456 16 19