Smoky Mountain Conference 2016

24
Bill Dally, Chief Scientist and SVP of Research September 1, 2016 The Synergy of Big Data and Exascale

Transcript of Smoky Mountain Conference 2016

Page 1: Smoky Mountain Conference 2016

Bill Dally, Chief Scientist and SVP of Research

September 1, 2016

The Synergy of Big Data and Exascale

Page 2: Smoky Mountain Conference 2016

2

A Decade of Scientific Computing with GPUs

2006 2008 2012 2016 2010 2014

Fermi: World’s First HPC GPU

Oak Ridge Deploys World’s Fastest Supercomputer w/ GPUs

World’s First Atomic Model of HIV Capsid

GPU-Trained AI Machine Beats World Champion in Go

Stanford Builds AI Machine using GPUs

World’s First 3-D Mapping of Human Genome

CUDA Launched

World’s First GPU Top500 System

Google Outperform Humans in ImageNet

Discovered How H1N1 Mutates to Resist Drugs

AlexNet beats expert code by huge margin using GPUs

Stream Processing @ Stanford

Page 3: Smoky Mountain Conference 2016

3

The Age of Big Data

>3 Exabytes of Web Data Created Daily

>350 Million Images Uploaded a Day >400 Hours Video Uploaded Every Minute

How can we organize, analyze, understand, benefit from such a trove of data?

Page 4: Smoky Mountain Conference 2016

4

Deep Learning Extracts Meaning from Big Data

Page 5: Smoky Mountain Conference 2016

5

Deep Learning Explodes at Google

Android apps Drug discovery

Gmail Image understanding

Maps Natural language understanding

Photos Robotics research

Speech Translation YouTube

Jeff Dean's talk at TiECon, May 7, 2016

Page 6: Smoky Mountain Conference 2016

6

Deep Learning Everywhere

INTERNET & CLOUD

Image Classification Speech Recognition

Language Translation Language Processing Sentiment Analysis Recommendation

MEDIA & ENTERTAINMENT

Video Captioning Video Search

Real Time Translation

AUTONOMOUS MACHINES

Pedestrian Detection Lane Tracking

Recognize Traffic Sign

SECURITY & DEFENSE

Face Detection Video Surveillance Satellite Imagery

MEDICINE & BIOLOGY

Cancer Cell Detection Diabetic Grading Drug Discovery

Page 7: Smoky Mountain Conference 2016

7

Now “Superhuman” at Many Tasks

Speech recognition

Image classification and detection

Face recognition

Playing Atari games

Playing Go

Page 8: Smoky Mountain Conference 2016

8

Deep learning fueling SCIENCE

Classify Satellite Images for Carbon Monitoring

Analyze Obituaries on the Web for Cancer-related Discoveries

Determine Drug Treatments to Increase Child’s Chance of Survival

NASA AMES

Page 9: Smoky Mountain Conference 2016

9

Using ML to Approximate Fluid Dynamics

“Data-driven Fluid Simulations using Regression Forests” http://people.inf.ethz.ch/ladickyl/fluid_sigasia15.pdf

“… Implementation led to a speed-up of one to three orders of magnitude compared to the state-of-the-art position-based fluid solver and runs in real-time for systems with up to 2 million particles”

Page 10: Smoky Mountain Conference 2016

10

Using ML to Approximate Schrodinger Equation

“Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning”, Rupp et al., Physical Letters

“For larger training sets, N >= 1000, the accuracy of the ML model becomes competitive with mean-field electronic structure theory—at a fraction of the computational cost.”

Page 11: Smoky Mountain Conference 2016

11

Big Data and Scientific Computing are Converging

and Need the Same Hardware Capabilities

Page 12: Smoky Mountain Conference 2016

12

Big Data and HPC Need the Same Hardware

•  Maximum arithmetic Perf/W (ops/J)

•  High memory bandwidth (B/s, B/J)

•  High memory capacity (B)

•  High bandwidth storage (B/s)

Page 13: Smoky Mountain Conference 2016

13

Slight Differences

HPC

Double precision arithmetic

Less memory per FLOPS

More demanding on network bandwidth

More demand for scalability

Big Data

Single or half precision arithmetic

More memory per FLOPS

Less demanding of network bandwidth

Scaling to a few thousand GPUs adequate

Page 14: Smoky Mountain Conference 2016

14

Slight Differences

HPC

Double precision arithmetic

Less memory per FLOPS

More demanding on network bandwidth

More demand for scalability

Big Data

Single or half precision arithmetic

More memory per FLOPS

Less demanding of network bandwidth

Scaling to a few thousand GPUs adequate

Can be addressed by provisioning of memory and network

Page 15: Smoky Mountain Conference 2016

15

System Sketch

Page 16: Smoky Mountain Conference 2016

16

System Defined by Key Components •  Processors (GPUs)

•  Arithmetic (FLOPS/W)

•  Memory hierarchy (B/s, B/J)

•  Memory Component (FG-DRAM) •  Provides Capacity (B, B/$)

•  Bandwidth (B/s, B/J)

•  Network Switch – •  Global bandwidth (B/s)

Page 17: Smoky Mountain Conference 2016

17

System Defined by Key Components •  Processors (GPUs)

•  Arithmetic (FLOPS/W)

•  Memory hierarchy (B/s, B/J)

•  Memory Component (FG-DRAM) •  Provides Capacity (B, B/$)

•  Bandwidth (B/s, B/J)

•  Network Switch – •  Global bandwidth (B/s)

To be economically viable, one component must serve multiple markets

Page 18: Smoky Mountain Conference 2016

18

Enabling Technologies

NVLINK

Target-Independent Programming

Page 19: Smoky Mountain Conference 2016

19

NVLINK – Enables Fast Interconnect, PGAS Memory

GPU

Memory

System Interconnect

GPU

Memory

NVLINK

Page 20: Smoky Mountain Conference 2016

20

Target-Independent Programming

Legion Program

Machine-Independent Specification Tasks: decouple control from machine Logical regions: decouple program data from machine Sequential semantics

Legion

Analysis! Why it matters Reduce programmer pain Extract ALL parallelism Easily transform and remap programs for new machines

Tasks + Data Model =

Powerful Programming

Analysis

Page 21: Smoky Mountain Conference 2016

21

Exascale Gaps Remaining

Page 22: Smoky Mountain Conference 2016

22

Exascale Gaps •  Energy Efficiency

•  Pascal 5.3TFLOPS at 300W ~ 18GF/W (before CPU and network overhead)

•  Need 50GF/W

•  Resilience

•  SEC and DUE rates must be improved for ExaScale

•  Programmability

•  Modern machines are have deep memory hierarchies, are highly parallel, and heterogeneous

•  Need tools to automate optimal mapping of computations

Page 23: Smoky Mountain Conference 2016

23

Conclusion

Page 24: Smoky Mountain Conference 2016

24

The Synergy of Exascale and Big Data •  Explosion of data – ExaBytes per day

•  Meaning extracted by Deep Learning •  Deep learning is everywhere •  Superhuman performance on many tasks

•  Big Data and Scientific Computing need the same things •  Arithmetic (Ops/J), Memory bandwidth (B/s, B/J), Memory Capacity (B), Storage •  GPUs provide all of these •  Differences addressed by provisioning memory and network

•  Enabling technologies for HPC and Deep Learning •  NVLINK – Nearly flat bandwidth between large groups of GPUs (PGAS) •  Target Independent Programming

•  Exascale gaps •  Energy efficiency •  Resilience •  Programmability

•  A GPU is an economically viable solution to ExaScale, Big Data, and Graphics