Page 1: Future Directions for CUDA

Mark Harris, Chief Technologist, GPU Computing Software, NVIDIA

© 2013 NVIDIA

Page 2: Platform for Parallel Computing

The CUDA Platform is a foundation that supports a diverse parallel computing ecosystem.

Page 3: Platform for Parallel Computing

[Figure: timeline of CUDA releases 1.0, 2.0, 3.0, 4.0, and 5.0, with features grouped into four areas: Compiler Tool Chain, Programming Languages, Libraries, and Developer Tools.]

Features shown: C, C++, Dynamic Parallelism, Device Code Linking, NVCC, Fortran (PGI), cuda-memcheck, Nsight Eclipse Ed., Detect Shared Memory Hazards, cuBLAS Device API, 1000+ new NVPP functions, cuBLAS, cuFFT, Thrust, cuRand, cuSparse, LLVM, New Visual Profiler, GPU-Aware MPI, C++ new/delete, Virtual functions, Templates, UVA, nvidia-smi, GPUDirect, Recursion, cuda-gdb, Visual Profiler, Command-Line Profiler, NVPP, Nsight IDE, OpenACC, Inheritance, Function pointers.

Page 4: Investing in the Future

Enabling More Programmers

Programming Model

Future Computing Platforms

Page 5: Unified Programming Language

Page 6: Unified Run-Time Interface

[Figure: diagram labeled CPU (main) and GPU (A, B, C, X, Y, Z), captioned CUDA Dynamic Parallelism.]

Page 9: CUDA UVM Demo

Page 10: Simpler, More Integrated Programming

[Figure: DP GFLOPS per Watt (y-axis, 2 to 16) by year (x-axis, 2008 to 2014) for the Tesla, Fermi, Kepler, and Maxwell GPU architectures, annotated with Unified Language, Unified Run-Time, and Unified Virtual Memory.]

Page 11: Diversity of Programming Languages

http://www.ohloh.net

Page 12: Enabling More Programming Languages

Developers want to build front-ends for Python, Java, R, DSLs …

Target other processors like ARM, FPGAs, GPUs, x86 …

[Figure: LLVM Compiler For CUDA. Inputs: CUDA C, C++, Fortran, and New Language Support. Targets: NVIDIA GPUs, x86 CPUs, and New Processor Support.]

Page 13: Enabling More Programming Languages

[Figure: LLVM Compiler For CUDA. Inputs: CUDA C, C++, Fortran, and New Language Support. Targets: NVIDIA GPUs, x86 CPUs, and New Processor Support.]

Halide (http://halide-lang.org/)

Mozilla Rust

Page 14:

Rapid Development

Powerful Libraries

Commercial Support

Large Community

Page 15: Is Python Fast Enough for HPC?

Python apps often implement performance-critical functions in C/C++.

Page 16: Compile Python for Parallel Architectures

Anaconda Accelerate from Continuum Analytics

NumbaPro array-oriented compiler for Python & NumPy

Compile for CPUs or GPUs (uses LLVM + NVIDIA Compiler SDK)

Fast Development + Fast Execution: Ideal Combination

http://continuum.io

Free Academic License

Page 17: CUDA Python

CUDA Programming, Python Syntax

1024² Mandelbrot       Time      Speedup v. Pure Python
Pure Python            4.85s     --
NumbaPro (CPU)         0.11s     44x
CUDA Python (K20)      0.004s    1221x
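
The benchmark on this slide times a 1024 x 1024 Mandelbrot render, but the slide does not show the code. What follows is only a minimal pure-Python sketch of the kind of escape-time loop such a benchmark typically uses. The function names mandel and create_fractal, the iteration cap, and the coordinate bounds are assumptions made for illustration and are not taken from the talk.

```python
def mandel(x, y, max_iters):
    """Return the iteration at which c = x + yi escapes, or max_iters if it never does."""
    c = complex(x, y)
    z = 0j
    for i in range(max_iters):
        z = z * z + c
        # |z|^2 >= 4 means the point has left the radius-2 disc and will diverge.
        if z.real * z.real + z.imag * z.imag >= 4.0:
            return i
    return max_iters


def create_fractal(min_x, max_x, min_y, max_y, width, height, max_iters):
    """Compute escape counts for a width x height grid (hypothetical helper)."""
    pixel_x = (max_x - min_x) / width
    pixel_y = (max_y - min_y) / height
    image = []
    for row in range(height):
        imag = min_y + row * pixel_y
        # Each pixel depends only on its own coordinates, never on its neighbours.
        image.append(
            [mandel(min_x + col * pixel_x, imag, max_iters) for col in range(width)]
        )
    return image


if __name__ == "__main__":
    # Assumed parameters: the slide states only the 1024 x 1024 image size.
    image = create_fractal(-2.0, 1.0, -1.0, 1.0, 1024, 1024, 20)
    print(len(image), len(image[0]))
```

Because every pixel is computed independently, the same per-pixel function can in principle be compiled for a CPU or mapped to one GPU thread per pixel, which is the kind of workload the timings on this slide compare.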

Page 19: KAYLA

Page 20: Kayla Development Platform

CUDA 5 | OpenGL 4.3

Kick-starts ARM + CUDA Ecosystem

NAMD Ported in 2 Days

Quad ARM + Kepler GPU

Quad ARM + Any CUDA GPU

Page 21: DEMO: KAYLA

Page 22: Platform for Parallel Computing

[Figure: the CUDA release timeline (1.0 through 5.0) shown again, with the same features as on Page 3 grouped into Compiler Tool Chain, Programming Languages, Libraries, and Developer Tools.]

Page 23: Platform for Parallel Computing

[Figure: platform diagram labeled 5.0, grouped into Compiler Tool Chain, Programming Languages, Libraries, and Developer Tools.]

Items shown: JIT Linking, JIT Compilation, Profiler, Step-by-Step Guidance, Single-GPU Debugging, Multi-GPU Support, ARM Support, C++11, Sparse Solvers.

Page 24: Future Challenges

[Figure labels: Today; Easier Parallel Programming; Ubiquitous parallel programming; Power Aware Programming; Hybrid operating system Enablement; Parallel Compiler Foundation Enablement; Optimizing locality and computation; Task, Thread & Data Parallelism.]

Page 25: GPUs Everywhere

[Figure: timeline marked 2012, 2015, and 2018; label: MPI.]