Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of...
Transcript of Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of...
![Page 1: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/1.jpg)
Introduction to and History of GPU Algorithms
Jeff M. Phillips
November 20, 2013
![Page 2: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/2.jpg)
Early Computer Graphics
X
wi
p
Draw each pixel on screen.For each pixel p:
I Determine if pixel could “see”triangle
I Determine which object “in front”
I If we can “see through” object,what is behind?
I Does light reach that object?
All done on CPU
![Page 3: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/3.jpg)
Early Computer Graphics
Draw each pixel on screen.For each pixel p:
I Determine if pixel could “see”triangle
I Determine which object “in front”
I If we can “see through” object,what is behind?
I Does light reach that object?
All done on CPU
![Page 4: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/4.jpg)
Early Computer Graphics
Draw each pixel on screen.For each pixel p:
I Determine if pixel could “see”triangle
I Determine which object “in front”
I If we can “see through” object,what is behind?
I Does light reach that object?
All done on CPU
![Page 5: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/5.jpg)
Early Computer Graphics
Draw each pixel on screen.For each pixel p:
I Determine if pixel could “see”triangle
I Determine which object “in front”
I If we can “see through” object,what is behind?
I Does light reach that object?
All done on CPU
![Page 6: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/6.jpg)
Blitters in Hardware
1980s.
I Commodore Amiga, IBM
I Block copying of memory;in parallel on CPU
I Copied image bitmapsquickly (for moving GUIs)
![Page 7: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/7.jpg)
3D Graphics
1990s.
3D Gaming!
I OpenGL and DirectX APIs
I GPU directly implementedthese APIsfixed functional pipeline
I nVidia vs. ATI vs. 3dfx
![Page 8: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/8.jpg)
Early GPUs
“Fixed Functional Pipeline”
I All games / 3D Graphics lookedall about the same
I Triangle Rasterization = veryefficient
I RayTracing looked, better, but tooslow, took much memory!
![Page 9: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/9.jpg)
OpenGL
Was OpenGL the first GPU language?
No.
I Just a specification!
I Hardware vendor implementedspecification (sometimes slightvariation).
I before 2.0, entirely fixed-function
I after 2.0, some different effectsadded
DirectX: a Windows library.
I Direct3D is the graphics component
![Page 10: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/10.jpg)
OpenGL
Was OpenGL the first GPU language?No.
I Just a specification!
I Hardware vendor implementedspecification (sometimes slightvariation).
I before 2.0, entirely fixed-function
I after 2.0, some different effectsadded
DirectX: a Windows library.
I Direct3D is the graphics component
![Page 11: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/11.jpg)
OpenGL
Was OpenGL the first GPU language?No.
I Just a specification!
I Hardware vendor implementedspecification (sometimes slightvariation).
I before 2.0, entirely fixed-function
I after 2.0, some different effectsadded
DirectX: a Windows library.
I Direct3D is the graphics component
![Page 12: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/12.jpg)
Early GPU programming
Direct3D 8.0 (2000) andOpenGL 2.0 (2004) addedsupport for assembly languageprogramming for shaders.
I nVidia GeForce 3I ATI Radeon 8000
Direct3D 9.0 added High Level Shader Language (HLSL)
I nVidia GeForce FX 5000I ATI Radeon 9000
More minor increments...
![Page 13: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/13.jpg)
Early GPU programming
Direct3D 8.0 (2000) andOpenGL 2.0 (2004) addedsupport for assembly languageprogramming for shaders.
I nVidia GeForce 3I ATI Radeon 8000
Direct3D 9.0 added High Level Shader Language (HLSL)
I nVidia GeForce FX 5000I ATI Radeon 9000
More minor increments...
![Page 14: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/14.jpg)
Early GPU programming
Direct3D 8.0 (2000) andOpenGL 2.0 (2004) addedsupport for assembly languageprogramming for shaders.
I nVidia GeForce 3I ATI Radeon 8000
Direct3D 9.0 added High Level Shader Language (HLSL)
I nVidia GeForce FX 5000I ATI Radeon 9000
More minor increments...
![Page 15: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/15.jpg)
Early GPU Pipeline
I Vertex data sent via graphics API (e.g. OpenGL, DirectX)
I vertex data processed by vertex shader
I vertex shader outputs pixels
I fragment shader processes pixels
![Page 16: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/16.jpg)
Early GPU Pipeline
Early-on (Direct3D 10, GeForce 8000, Radeon 2000): vertex /fragment shaders had different hardware.
I slightly different rules
I Direct3D 10 (Windows Vista) added geometry shader, unifiedhardware
GPUs now use same core to run all shaders
![Page 17: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/17.jpg)
Early GPU Pipeline
Early-on (Direct3D 10, GeForce 8000, Radeon 2000): vertex /fragment shaders had different hardware.
I slightly different rules
I Direct3D 10 (Windows Vista) added geometry shader, unifiedhardware
GPUs now use same core to run all shaders
![Page 18: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/18.jpg)
Shader Languages
No longer write in assembly!
I GLSL, HLSL, cG, offer C-style shader programming
I write two main() functions which are run on each vertex/pixel
I Auxiliary functions and local variables
I output by setting position and color (write to special variables)
![Page 19: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/19.jpg)
CUDACompute Unified Device Architecture
I created by nVidia
I came with GeForce 8000 line
I runs general C code (not restricted graphics APIs)
I Linear Memory Access (no buffer objects)
I runs thousands of separate scalar cores
![Page 20: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/20.jpg)
Other GPU patterns
ATI Stream SDK
I closer to assembly
Apple / Kronos Group (OpenGL) started OpenCL initiative (2008)
I released 2009
I supported by nVidia and ATI
I not specific to GPU (support for CPU SSE)
DirectX 11 added DirectCompute Shaders
I similar to OpenCL
I tied with Direct3D
I added hull and domain shaders to pipeline
I allows high-detail geometry created on GPU, not PCI-E bus
OpenGL 4 similar to Direct 11
I also added two stages to pipeline
![Page 21: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/21.jpg)
Other GPU patterns
ATI Stream SDK
I closer to assembly
Apple / Kronos Group (OpenGL) started OpenCL initiative (2008)
I released 2009
I supported by nVidia and ATI
I not specific to GPU (support for CPU SSE)
DirectX 11 added DirectCompute Shaders
I similar to OpenCL
I tied with Direct3D
I added hull and domain shaders to pipeline
I allows high-detail geometry created on GPU, not PCI-E bus
OpenGL 4 similar to Direct 11
I also added two stages to pipeline
![Page 22: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/22.jpg)
Other GPU patterns
ATI Stream SDK
I closer to assembly
Apple / Kronos Group (OpenGL) started OpenCL initiative (2008)
I released 2009
I supported by nVidia and ATI
I not specific to GPU (support for CPU SSE)
DirectX 11 added DirectCompute Shaders
I similar to OpenCL
I tied with Direct3D
I added hull and domain shaders to pipeline
I allows high-detail geometry created on GPU, not PCI-E bus
OpenGL 4 similar to Direct 11
I also added two stages to pipeline
![Page 23: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/23.jpg)
Other GPU patterns
ATI Stream SDK
I closer to assembly
Apple / Kronos Group (OpenGL) started OpenCL initiative (2008)
I released 2009
I supported by nVidia and ATI
I not specific to GPU (support for CPU SSE)
DirectX 11 added DirectCompute Shaders
I similar to OpenCL
I tied with Direct3D
I added hull and domain shaders to pipeline
I allows high-detail geometry created on GPU, not PCI-E bus
OpenGL 4 similar to Direct 11
I also added two stages to pipeline
![Page 24: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/24.jpg)
GPU Programming
Top of line:
I 3 Teraflops
I 100+ GB/s memory access bandwidth
I high-speed atomic operations
Now easier to program:
I nVidia’s Fermi architcture supports C++
I MATLAB integration
Many applications:
I Folding@Home
I Photoshop
I Mathematica 8
I large scale data mining
I physics fluid simulation
I computational ecology
![Page 25: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/25.jpg)
GPU Program Model
We will focus on computational properties and data analysis (notgraphics)
I Suited for highly parallel, fine-grain parallel programs
I Suited for regular number-crunching
I Need to model hierarchy of processors and memory
![Page 26: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/26.jpg)
GPU Program Model
We will focus on computational properties and data analysis (notgraphics)
I Suited for highly parallel, fine-grain parallel programs
I Suited for regular number-crunching
I Need to model hierarchy of processors and memory
![Page 27: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/27.jpg)
GPU HierarchyEach processor (SM) hasprivate L1 Cache
I 16-48 kB (small)I not coherent (CRCW
causes problems)I (256-512 kB on CPU)
Modern systems, L2 Cache
I 512 - 768 kBI (8-15 MB on CPU)
Memory bandwidth is fast!
I 100 - 200 GB/sI but ... separate from CPUI (24-32 GB/s on CPU)
Memory size is small!
I 768MB - 12GBI and ... separate from CPUI (6 - 128 GB on CPU)
![Page 28: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/28.jpg)
GPU HierarchyEach processor (SM) hasprivate L1 Cache
I 16-48 kB (small)I not coherent (CRCW
causes problems)I (256-512 kB on CPU)
Modern systems, L2 Cache
I 512 - 768 kBI (8-15 MB on CPU)
Memory bandwidth is fast!
I 100 - 200 GB/sI but ... separate from CPUI (24-32 GB/s on CPU)
Memory size is small!
I 768MB - 12GBI and ... separate from CPUI (6 - 128 GB on CPU)
![Page 29: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/29.jpg)
GPU HierarchyEach processor (SM) hasprivate L1 Cache
I 16-48 kB (small)I not coherent (CRCW
causes problems)I (256-512 kB on CPU)
Modern systems, L2 Cache
I 512 - 768 kBI (8-15 MB on CPU)
Memory bandwidth is fast!
I 100 - 200 GB/sI but ... separate from CPUI (24-32 GB/s on CPU)
Memory size is small!
I 768MB - 12GBI and ... separate from CPUI (6 - 128 GB on CPU)
![Page 30: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/30.jpg)
GPU HierarchyEach processor (SM) hasprivate L1 Cache
I 16-48 kB (small)I not coherent (CRCW
causes problems)I (256-512 kB on CPU)
Modern systems, L2 Cache
I 512 - 768 kBI (8-15 MB on CPU)
Memory bandwidth is fast!
I 100 - 200 GB/sI but ... separate from CPUI (24-32 GB/s on CPU)
Memory size is small!
I 768MB - 12GBI and ... separate from CPUI (6 - 128 GB on CPU)
![Page 31: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/31.jpg)
NVidia GeForce 8800 GTX
G80 series
I 128 stream processors:
I 16 multiprocessors
I a multiprocessor has 8 processor units
Higher in hierarchy, more shared memoryLower in hierarchy, less shared/private memory
Nvidia Quadro K6000: 2880 Parallel CUDA Cores
![Page 32: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/32.jpg)
NVidia GeForce 8800 GTX
G80 series
I 128 stream processors:
I 16 multiprocessors
I a multiprocessor has 8 processor units
Higher in hierarchy, more shared memoryLower in hierarchy, less shared/private memory
Nvidia Quadro K6000: 2880 Parallel CUDA Cores
![Page 33: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/33.jpg)
NVidia GeForce 8800 GTX - Fermi
G80 series
I 128 stream processors:
I 16 multiprocessors
I a multiprocessor has 8processor units
Higher in hierarchy, moreshared memoryLower in hierarchy, lessshared/private memory
![Page 34: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/34.jpg)
GPU Hype
Much hype of 100-200x speed-up on GPU!
I not always fair comparison: 128 GPU cores vs 1CPU core
I optimized GPU code vs. un-optimized CPU code
I work in single precision (double precision slow on GPU)
I not counting memory transfer time
I As CUDA functionality increased, so did its overhead!
But sometimes GPU is very useful.
Cheap, highly parallel computer!
![Page 35: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/35.jpg)
GPU in MatlabpMatlab: Parallel Matlab Toolbox v2.0.1
cpu_x = rand(1,100000000)*10*pi;
gpu_x = gpuArray(cpu_x);
gpu_y = sin(gpu_x);
cpu_y = gather(gpu_y);
![Page 36: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/36.jpg)
GPU in Matlab
pMatlab: Parallel Matlab Toolbox v2.0.1
cpu_x = rand(1,100000000)*10*pi;
gpu_x = gpuArray(cpu_x);
gpu_y = big-trig-function(gpu_x);
cpu_y = gather(gpu_y);
![Page 37: Introduction to and History of GPU Algorithmsjeffp/teaching/MCMD/GPU-intro.pdfGPU Hype Much hype of 100-200x speed-up on GPU! I not always fair comparison: 128 GPU cores vs 1CPU core](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f0387177e708231d4097f9f/html5/thumbnails/37.jpg)
Attribution
These slides borrow from material by
I Mathieu Desbrun
I Supercomputing Blog:http://supercomputingblog.com/cuda-tutorials/
I Walking Randomly:http://www.walkingrandomly.com/?p=3730