Mark Harris, Chief Technologist, GPU Computing
UNC Ph.D. 2003
A BRIEF HISTORY OF GPGPU
General-Purpose computation on Graphics Processing Units
THE FIRST GPGPU: IKONAS RDS-3000
1978: Nick England & Mary Whitton founded Ikonas Graphics Systems
Tim Van Hook wrote microcode for solid modeling, ray tracing (SIGGRAPH ’86)
From a 1985 Video: “All computation is taking place in the Adage 3000 Display”
http://www.virhistory.com/ikonas/ikonas.html
UNC PIXEL PLANES AND PIXELFLOW
Procedural textures on Pixel Planes 5 (Rhodes et al. 1992)
PixelFlow
100,000+ 100 MHz 8-bit processors
Early real-time programmable shading (Olano & Lastra ’98)
Kedem et al. (’98) used it for Unix password cracking
Figure: title page of the Proceedings of the 1992 Symposium on Interactive 3D Graphics, Cambridge, Massachusetts, 29 March - 1 April 1992 (held in cooperation with ACM SIGGRAPH)
GEFORCE 1-3: THE DAWN OF GPGPU (’99-’01)
GeForce 256: First “GPU”
GeForce 3: First programmable GPU
Vertex Shaders – programmable vertex transforms, 32-bit float
Data-dependent, configurable texturing + register combiners
Enabled early GPGPU results
Hoff (1999) -- Voronoi diagrams on NVIDIA TNT2
Larsen & McAllister (2001): first GPU matrix multiplication (8-bit)
Rumpf & Strzodka (2001): first GPU PDEs (diffusion, image segmentation)
NVIDIA SDK Game of Life, Shallow Water (Greg James, 2001)
PHYSICALLY BASED SIMULATION ON GEFORCE 3
Approximate simulation of natural phenomena
Boiling liquid, fluid convection, chemical reaction-diffusion
Inaccurate due to low GPU precision
“Physically-Based Visual Simulation on Graphics Hardware”. Harris, Coombe, Scheuermann, and Lastra. Graphics Hardware 2002
NAMING A TREND
Let’s name this thing that people are doing!
I coined “GPGPU” and created a home page in November 2002
A home on the web to collect research and resources
Interest grew quickly: launched GPGPU.org in August 2003
“Application of graphics hardware to non-graphics applications”
“General computations on graphics hardware”
“Exploiting special-purpose hardware for alternative purposes”
GEFORCE FX (2003): FLOATING POINT PIXELS
True programmability enabled broader simulation research
Ray Tracing (Purcell, 2002), Photon Maps (Purcell, 2003)
Radiosity (Carr et al., 2003 & Coombe et al., 2004)
PDE solvers
Red-black Gauss-Seidel (Harris et al., 2003)
Conjugate gradient (Bolz et al. 2003, Krueger et al. 2003)
Multigrid (Goodnight et al. 2003)
Physically-based simulation
Fluid and cloud simulation (Krueger et al. 2003, Harris et al. 2003)
Cloth simulation (Green, 2003)
Ice crystal formation (Kim and Lin, 2003)
FFT (Moreland and Angel, 2003)
High-level language: Brook for GPUs (Buck et al. 2004)
GPU CLOUD SIMULATION
My Ph.D. dissertation: visually realistic cloud simulation on GPUs
2D & 3D Incompressible Navier-Stokes fluid
Thermodynamics (latent heat, diffusion)
Water condensation / evaporation
Light scattering simulation for rendering
Programmed in OpenGL with pixel shaders
“Real-Time Cloud Simulation and Rendering”. Mark Harris, Ph.D. dissertation, U. of North Carolina, 2003
CUDA AND THE G80 GPU (2006)
First GPU architecture and software platform designed for computing
Dedicated computing mode – threads rather than pixels/vertices
General, byte-addressable memory architecture
First C/C++ language and compiler for GPUs
CUDA C++ defines a minimally extended subset of C++ with parallelism
2007 began a massive surge in GPGPU development
Not just graphics PhDs
ACCELERATING DISCOVERIES
Using a supercomputer powered by 3,000 Tesla processors, University of Illinois scientists performed the first all-atom simulation of the HIV virus and discovered the chemical structure of its capsid, “the perfect target for fighting the infection.” Without GPUs, the supercomputer would need to be 5x larger for similar performance.
FROM HPC TO ENTERPRISE DATACENTERS
Government, Supercomputing, Finance, Higher Ed, Oil & Gas, Consumer Web
Air Force Research Laboratory
Naval Research Laboratory
Tokyo Institute of Technology
MACHINE LEARNING USING DEEP NEURAL NETWORKS
Figure labels: Input, Result
Hinton et al., 2006; Bengio et al., 2007; Bengio & LeCun, 2007; Lee et al., 2008; 2009
Visual Object Recognition Using Deep Convolutional Neural Networks
Rob Fergus (New York University / Facebook)
http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php#2985
GPU-ACCELERATED DEEP LEARNING
START-UPS
Image Detection
Face Recognition
Gesture Recognition
Video Search & Analytics
Speech Recognition & Translation
Image and Video Understanding
Recommendation Engines
Indexing & Search
COMMON PROGRAMMING APPROACHES
Across Heterogeneous Architectures
x86
Libraries: AmgX, cuBLAS
Programming Languages
Compiler Directives
UNIFIED MEMORY
Dramatically lower developer effort
Past developer view: System Memory and GPU Memory as two separate spaces
Developer view with Unified Memory: a single Unified Memory
PARALLELISM IN MAINSTREAM LANGUAGES
Enable more programmers to write parallel software
Give programmers the choice of language to use
Parallel computing support in key languages
C
FUTURE C++: PARALLEL STL
Complete set of parallel primitives: for_each, sort, reduce, scan, etc.
The ISO C++ committee voted unanimously to accept it as an official technical specification working draft
N3960 Technical Specification Working Draft: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4352.html
Prototype: https://github.com/n3554/n3554
std::vector<int> vec = ...
// previous standard sequential loop
std::for_each(vec.begin(), vec.end(), f);
// explicitly sequential loop
std::for_each(std::seq, vec.begin(), vec.end(), f);
// permitting parallel execution
std::for_each(std::par, vec.begin(), vec.end(), f);
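Of the primitives listed on this slide, scan (prefix sum) is the one whose parallel form is least obvious, because each output depends on every earlier input. The following is a minimal sketch, not taken from the working draft or the prototype, of the usual blocked decomposition that makes a scan parallelizable. The function names (reduce_sum, inclusive_scan_seq, inclusive_scan_blocked) and the block parameter are hypothetical, and the blocks are visited in an ordinary loop here rather than by real threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// reduce: fold the whole range with an associative operation (here +).
int reduce_sum(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}

// scan (inclusive prefix sum): out[i] = v[0] + ... + v[i].
std::vector<int> inclusive_scan_seq(const std::vector<int>& v) {
    std::vector<int> out(v.size());
    std::partial_sum(v.begin(), v.end(), out.begin());
    return out;
}

// Blocked scan: produces the same result as inclusive_scan_seq, but is
// organized so that phases 1 and 3 could run one block per worker.
// (Hypothetical helper; the loops below are sequential.)
std::vector<int> inclusive_scan_blocked(const std::vector<int>& v,
                                        std::size_t block) {
    assert(block > 0);
    const std::size_t n = v.size();
    const std::size_t nblocks = (n + block - 1) / block;
    std::vector<int> out(n);
    std::vector<int> block_total(nblocks, 0);

    // Phase 1: scan inside each block; blocks do not depend on each other.
    for (std::size_t b = 0; b < nblocks; ++b) {
        const std::size_t lo = b * block;
        const std::size_t hi = std::min(n, lo + block);
        int running = 0;
        for (std::size_t i = lo; i < hi; ++i) {
            running += v[i];
            out[i] = running;
        }
        block_total[b] = running;
    }

    // Phase 2: turn block totals into per-block starting offsets.
    // This pass is serial, but it has only one element per block.
    std::vector<int> offset(nblocks, 0);
    for (std::size_t b = 1; b < nblocks; ++b) {
        offset[b] = offset[b - 1] + block_total[b - 1];
    }

    // Phase 3: add each block's offset; again independent per block.
    for (std::size_t b = 0; b < nblocks; ++b) {
        const std::size_t lo = b * block;
        const std::size_t hi = std::min(n, lo + block);
        for (std::size_t i = lo; i < hi; ++i) {
            out[i] += offset[b];
        }
    }
    return out;
}
```

Because phases 1 and 3 touch each block independently, an implementation is free to run them concurrently; only the short pass over the block totals is serial. In C++17, which later adopted these algorithms, the corresponding library call is std::inclusive_scan with an execution policy such as std::execution::par.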
PARTING WORDS OF WISDOM
Stand Up!
“Keep it narrow and doable” – Fred Brooks
“Write a little bit every day” – Fred Brooks
“If you measure it, you can improve it” – Jen-Hsun Huang
“But you have to measure the right thing!”
1999: ADVENT OF THE GPU
NVIDIA GeForce 256
Coined the term “Graphics Processing Unit”
“A single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines that is capable of processing a minimum of 10 million polygons per second.”
Register Combiners
configurable multipass shading
Beginning of GPU programmability
Figure: register combiner pipeline. Texture Fetch feeds a Register Set (Texture 0, Texture 1, Fog Color/Factor, Specular Color, Spare 0). General Combiner 0 and General Combiner 1 each take 4 RGB inputs and 4 alpha inputs and produce 3 RGB outputs and 3 alpha outputs. The Final Combiner takes 6 RGB inputs and 1 alpha input and produces the Fragment Color.
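To give a feel for what “configurable” meant at this stage, here is a minimal CPU sketch of the RGB arithmetic of one general combiner and of the final combiner. It assumes the behavior described in the NV_register_combiners OpenGL extension and is heavily simplified: input mappings, scale and bias, the dot-product and mux modes, the alpha portion, and the final combiner's E and F inputs are all left out. The type and function names (RGB, GeneralOutputs, general_combiner_rgb, final_combiner_rgb) are hypothetical.

```cpp
#include <algorithm>

struct RGB { float r, g, b; };

inline float clampf(float v, float lo, float hi) {
    return std::max(lo, std::min(v, hi));
}

inline RGB clamp_rgb(RGB c, float lo, float hi) {
    return { clampf(c.r, lo, hi), clampf(c.g, lo, hi), clampf(c.b, lo, hi) };
}

inline RGB mul(RGB a, RGB b) { return { a.r * b.r, a.g * b.g, a.b * b.b }; }
inline RGB add(RGB a, RGB b) { return { a.r + b.r, a.g + b.g, a.b + b.b }; }

// The three RGB outputs of one general combiner stage.
struct GeneralOutputs { RGB ab, cd, sum; };

// General combiner, RGB portion (simplified): four inputs A, B, C, D
// yield A*B, C*D and A*B + C*D, each clamped to [-1, 1].
GeneralOutputs general_combiner_rgb(RGB a, RGB b, RGB c, RGB d) {
    const RGB ab = mul(a, b);
    const RGB cd = mul(c, d);
    return { clamp_rgb(ab, -1.0f, 1.0f),
             clamp_rgb(cd, -1.0f, 1.0f),
             clamp_rgb(add(ab, cd), -1.0f, 1.0f) };
}

// Final combiner, RGB portion (simplified): A*B + (1 - A)*C + D,
// clamped to [0, 1] to form the fragment color.
RGB final_combiner_rgb(RGB a, RGB b, RGB c, RGB d) {
    const RGB out = { a.r * b.r + (1.0f - a.r) * c.r + d.r,
                      a.g * b.g + (1.0f - a.g) * c.g + d.g,
                      a.b * b.b + (1.0f - a.b) * c.b + d.b };
    return clamp_rgb(out, 0.0f, 1.0f);
}
```

The programmer did not write code for these stages; the "program" was the choice of which register feeds each input and where each output is written, which is why the slide calls this the beginning of GPU programmability rather than full programmability.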
EARLY PC & WORKSTATION GRAPHICS
Rasterizer, texture unit, z-buffer, frame buffer
Fixed-point math, fixed-function interpolation / texturing
Lengyel et al. (1990) – Robot motion planning
Use the rasterizer to fill Minkowski sum polygons
HP 9000 TurboSRX Workstation
Hoff (1999) -- Voronoi diagrams on NVIDIA TNT2
Render cones: the rasterizer and z-buffer compute the Voronoi diagram
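The cone trick can be emulated on the CPU to show why it works. Each site is drawn as a cone whose depth at a pixel equals that pixel's distance from the site, and the z-buffer's "keep the nearest" test leaves every pixel colored by its closest site. The following is a minimal sketch of that idea, not Hoff's implementation: it evaluates the cone depth directly per pixel instead of rasterizing a tessellated cone mesh, and the names Site and voronoi_by_depth_test are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Site { float x, y; };

// Returns one label per pixel (row-major): the index of the nearest site,
// or -1 if there are no sites. Pixel centers are at (x + 0.5, y + 0.5).
std::vector<int> voronoi_by_depth_test(const std::vector<Site>& sites,
                                       int width, int height) {
    assert(width > 0 && height > 0);
    const std::size_t n =
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height);

    // "Cleared" z-buffer and a color buffer that stores site IDs.
    std::vector<float> depth(n, std::numeric_limits<float>::infinity());
    std::vector<int> label(n, -1);

    for (std::size_t s = 0; s < sites.size(); ++s) {
        // "Render" one cone with its apex at the site: the depth written at
        // each pixel is the Euclidean distance from the pixel center to the site.
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                const float dx = (static_cast<float>(x) + 0.5f) - sites[s].x;
                const float dy = (static_cast<float>(y) + 0.5f) - sites[s].y;
                const float z = std::sqrt(dx * dx + dy * dy);
                const std::size_t i =
                    static_cast<std::size_t>(y) * static_cast<std::size_t>(width) +
                    static_cast<std::size_t>(x);
                // Depth test: keep the fragment only if it is nearer.
                if (z < depth[i]) {
                    depth[i] = z;
                    label[i] = static_cast<int>(s);
                }
            }
        }
    }
    return label;
}
```

On graphics hardware the two inner loops are what the rasterizer does for free, and the comparison is the fixed-function z-test, so the only per-site work issued by the application is drawing one cone.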