GPU Computing with Python and Anaconda: The Next Frontier
![Page 1: GPU Computing with Python and Anaconda: The Next Frontier](https://reader031.fdocuments.in/reader031/viewer/2022021923/5a6495da7f8b9a63568b4c33/html5/thumbnails/1.jpg)
© 2017 Anaconda, Inc. - Confidential & Proprietary
GPU Computing with Python and Anaconda: The Next Frontier
Accelerate. Connect. Empower.
Stan Seibert
Director of Community Innovation
![Page 2: GPU Computing with Python and Anaconda: The Next Frontier](https://reader031.fdocuments.in/reader031/viewer/2022021923/5a6495da7f8b9a63568b4c33/html5/thumbnails/2.jpg)
GPUs & Python: A Great Combination
• Python is becoming the glue that binds data science
• Rapid integration empowers data scientists to combine new technologies
• This is our goal for Anaconda:
• Free distribution of Python and R for Win/Mac/Linux
• Includes GPU-accelerated packages: Caffe, TensorFlow, PyTorch, Theano, Numba, Pyculib...
![Page 3: GPU Computing with Python and Anaconda: The Next Frontier](https://reader031.fdocuments.in/reader031/viewer/2022021923/5a6495da7f8b9a63568b4c33/html5/thumbnails/3.jpg)
[Figure: diagram with four blocks labeled ReLU]
Deep Learning: An Early Success
• Powerful machine learning technique
• Many great open source options
• Every major package has a Python interface
• Very compute intensive
➡ Perfect for GPU acceleration
![Page 4: GPU Computing with Python and Anaconda: The Next Frontier](https://reader031.fdocuments.in/reader031/viewer/2022021923/5a6495da7f8b9a63568b4c33/html5/thumbnails/4.jpg)
Numba: JIT Python Compilation
• Compile numerical Python functions for CPU or GPU
• Based on the LLVM compiler library
• Great for rapid, custom algorithm development
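As a minimal sketch of the idea on this slide, the snippet below compiles a plain numerical loop with Numba's `@njit` decorator. The function name `sum_of_squares` and its input are illustrative and do not come from the talk. The fallback decorator is an assumption added so the snippet still runs, uncompiled, where Numba is not installed.

```python
try:
    from numba import njit  # JIT-compiles the decorated function through LLVM on first call
except ImportError:
    # Assumption for portability: without Numba, run the function as plain Python.
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    """Sum i*i for i in [0, n) with an explicit loop, the kind of code Numba targets."""
    total = 0
    for i in range(n):
        total += i * i
    return total


result = sum_of_squares(1000)
print(result)
```

The same decorator-based style applies to GPU targets through `numba.cuda.jit`, which needs a CUDA-capable device and is not shown here.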
![Page 7: GPU Computing with Python and Anaconda: The Next Frontier](https://reader031.fdocuments.in/reader031/viewer/2022021923/5a6495da7f8b9a63568b4c33/html5/thumbnails/7.jpg)
Problem: An Ecosystem of Silos?
[Diagram: ETL/Data Prep, Database, Machine Learning, and Visualization applications on the GPU, each with its own Data, linked by CPU transfer]
Why do GPU applications share data through slow CPU memory?
![Page 8: GPU Computing with Python and Anaconda: The Next Frontier](https://reader031.fdocuments.in/reader031/viewer/2022021923/5a6495da7f8b9a63568b4c33/html5/thumbnails/8.jpg)
GPU Open Analytics Initiative
Goal: Standardize data exchange between GPU analytics applications
Current Members: MapD, Anaconda, H2O.ai, BlazingDB, Graphistry, Gunrock
http://gpuopenanalytics.com/
![Page 9: GPU Computing with Python and Anaconda: The Next Frontier](https://reader031.fdocuments.in/reader031/viewer/2022021923/5a6495da7f8b9a63568b4c33/html5/thumbnails/9.jpg)
Streamlining the Data Science Pipeline
[Diagram: pipeline of GPU Database, Python Data Transformation, and Generalized Linear Model stages; data exchange labeled GDF, Packed Array, and Apache Arrow]
All data stays on the GPU
![Page 10: GPU Computing with Python and Anaconda: The Next Frontier](https://reader031.fdocuments.in/reader031/viewer/2022021923/5a6495da7f8b9a63568b4c33/html5/thumbnails/10.jpg)
GPU Dataframe (GDF)
• A format for tabular data in GPU memory
• Exchange GDF between different libraries
• Move between processes using CUDA IPC
• Based on Apache Arrow
• Code lives in a separate library
• Work in progress to move functionality into the Arrow project
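Because GDF is based on Apache Arrow, each column can be pictured as a contiguous values buffer plus a validity bitmap that marks nulls. The sketch below illustrates that layout in ordinary host memory using only the standard library. It is not GDF code: GDF holds such buffers in GPU memory, and the helper names `build_column` and `is_valid` are hypothetical.

```python
from array import array


def build_column(values):
    """Pack floats (None = null) into an Arrow-style values buffer and validity bitmap."""
    # Null slots get a 0.0 placeholder here; readers must consult the bitmap.
    data = array("d", (0.0 if v is None else v for v in values))
    bitmap = bytearray((len(values) + 7) // 8)  # one bit per row, rounded up to bytes
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)  # least-significant bit first, as in Arrow
    return data, bitmap


def is_valid(bitmap, i):
    """Return True if row i is non-null according to the bitmap."""
    return bool((bitmap[i // 8] >> (i % 8)) & 1)


data, bitmap = build_column([1.5, None, 3.0])
null_count = sum(not is_valid(bitmap, i) for i in range(len(data)))
print(list(data), bin(bitmap[0]), null_count)
```

Keeping values and validity in flat buffers is what makes a column easy to hand between libraries without copying or reinterpreting it row by row.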
![Page 11: GPU Computing with Python and Anaconda: The Next Frontier](https://reader031.fdocuments.in/reader031/viewer/2022021923/5a6495da7f8b9a63568b4c33/html5/thumbnails/11.jpg)
PyGDF: Python GPU Dataframes
• A Python library for manipulating GPU Dataframes:
• Create from NumPy arrays and Pandas Dataframes
• Exchange between processes
• Math operations
• Sort, Filter, Join, Group By
• Ideal for data manipulation and feature engineering stages between data source and machine learning
• Not intended to replace dedicated database applications
• Interoperates with our Python compiler for GPU: Numba
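The slides do not show PyGDF's API, so the sketch below does not use it. It is a CPU-only, pure-Python illustration of what the filter and group-by operations listed above compute on column-oriented data. The table contents and the helper names `filter_rows` and `group_by_sum` are hypothetical.

```python
from collections import defaultdict

# A tiny column-oriented table: one list per column, as in a dataframe.
table = {
    "key": ["a", "b", "a", "c", "b", "a"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
}


def filter_rows(columns, mask):
    """Keep the rows where mask is True, applied to every column."""
    return {
        name: [x for x, keep in zip(col, mask) if keep]
        for name, col in columns.items()
    }


def group_by_sum(columns, key, value):
    """Sum the `value` column within each distinct entry of the `key` column."""
    totals = defaultdict(float)
    for k, v in zip(columns[key], columns[value]):
        totals[k] += v
    return dict(sorted(totals.items()))


mask = [v > 1.5 for v in table["value"]]  # boolean mask, one entry per row
filtered = filter_rows(table, mask)
sums = group_by_sum(filtered, "key", "value")
print(sums)
```

A GPU dataframe library performs the same logical steps, but on whole columns in device memory rather than in Python loops.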
![Page 12: GPU Computing with Python and Anaconda: The Next Frontier](https://reader031.fdocuments.in/reader031/viewer/2022021923/5a6495da7f8b9a63568b4c33/html5/thumbnails/12.jpg)
PyGDF: Group By Performance
GPU speedup becomes very large above 10 million elements
Aggregation functions are extremely efficient on the GPU
![Page 13: GPU Computing with Python and Anaconda: The Next Frontier](https://reader031.fdocuments.in/reader031/viewer/2022021923/5a6495da7f8b9a63568b4c33/html5/thumbnails/13.jpg)
Dask: Distributed Computing
• Scalable execution of task graphs from single computers to 1000+ node clusters
• Scheduler is "resource aware" and can direct GPU tasks to nodes with appropriate hardware. Great for heterogeneous clusters!
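"Resource aware" scheduling can be pictured as matching what each task requires against what each node advertises. The toy sketch below shows that matching in pure Python. It is not Dask's scheduler, and the node names, task names, and the `assign` helper are all hypothetical. In `dask.distributed`, the user-facing counterpart is declaring resources on workers and passing a `resources=` requirement when submitting a task.

```python
# Hypothetical cluster: the resources each node advertises.
nodes = {
    "cpu-node-1": {"CPU": 8},
    "gpu-node-1": {"CPU": 8, "GPU": 2},
}

# Hypothetical tasks: the resources each one requires.
tasks = [
    ("load_csv", {"CPU": 1}),
    ("train_model", {"GPU": 1}),
    ("groupby_gpu", {"GPU": 1}),
    ("plot", {"CPU": 1}),
]


def assign(tasks, nodes):
    """Greedily place each task on the first node with enough free resources."""
    free = {name: dict(res) for name, res in nodes.items()}  # copy, so nodes is untouched
    placement = {}
    for task, needs in tasks:
        for name, avail in free.items():
            if all(avail.get(r, 0) >= n for r, n in needs.items()):
                for r, n in needs.items():
                    avail[r] -= n  # reserve the resources for this task
                placement[task] = name
                break
        else:
            placement[task] = None  # no node can currently run this task
    return placement


placement = assign(tasks, nodes)
print(placement)
```

In this sketch the GPU tasks can only land on the node that advertises a GPU, which is the behavior the slide describes for heterogeneous clusters.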
![Page 14: GPU Computing with Python and Anaconda: The Next Frontier](https://reader031.fdocuments.in/reader031/viewer/2022021923/5a6495da7f8b9a63568b4c33/html5/thumbnails/14.jpg)
The Future
• In flight:
• Merger of common code into Apache Arrow GPU support
• Node.js interface to GDF (Graphistry)
• Dask GDF: Distributed GPU dataframe
• Other potential future projects:
• Tensor exchange between Python GPU libraries
• GPU shared memory service (Plasma for GPU)
• Can we improve the interaction of unified memory and IPC?
• What do you want to see?
![Page 15: GPU Computing with Python and Anaconda: The Next Frontier](https://reader031.fdocuments.in/reader031/viewer/2022021923/5a6495da7f8b9a63568b4c33/html5/thumbnails/15.jpg)
Learn More
GPU Open Analytics Website: http://gpuopenanalytics.com
GOAI Github Organization: https://github.com/gpuopenanalytics/
GOAI Google Group: https://groups.google.com/forum/#!forum/gpuopenanalytics