Training Program on GPU Programming with CUDA 31 st July, 7 th Aug, 14 th Aug 2011 CUDA Teaching...
Training Program on GPU Programming with CUDA
31st July, 7th Aug, 14th Aug 2011
CUDA Teaching Center @ UoM
Training Program on GPU Programming with CUDA
Sanath Jayasena
CUDA Teaching Center @ UoM
Day 1, Session 1
Introduction
Outline
• Training Program Description
• CUDA Teaching Center at UoM
Subject Matter
• Introduction to GPU Computing
• GPU Computing with CUDA
• CUDA Programming Basics
July-Aug 2011, CUDA Training Program
Overview of Training Program
• 3 Sundays, starting 31st July
• Schedule and program outline
• Main resource persons
  – Sanath Jayasena, Jayathu Samarawickrama, Kishan Wimalawarna, Lochandaka Ranathunga
  – Dept of Computer Science & Eng, Dept of Electronic & Telecom. Engineering (of Faculty of Engineering) and Faculty of IT
CUDA Teaching Center
• UoM was selected as a CTC
  – A group of people from multiple Depts
  – http://research.nvidia.com/content/cuda-teaching-centers
• Benefits
  – Donation of hardware by NVIDIA (GeForce GTX480s and Tesla C2070)
  – Access to other resources
• Expectations
  – Use of the resources for teaching/research, industry collaboration
GPU Computing: Introduction
• Graphics Processing Units (GPUs)
  – High-performance many-core processors that can be used to accelerate a wide range of applications
• GPGPU: General-Purpose computation on Graphics Processing Units
• GPUs have led the race for floating-point performance since the start of the 21st century
• GPUs are being used as parallel processors
GPU Computing: Introduction
• General computing, until the end of the 20th century
  – Relied on advances in hardware to increase the speed of software/apps
• Slowed down since then due to
  – Power consumption issues
  – Limited productivity within a single processor
• Switch to multi-core and many-core models
  – Multiple processing units (processor cores) used in each chip to increase the processing power
  – Impact on software developers?
GPU Computing: Introduction
• A sequential program will only run on one of the cores, which will not become any faster
• With each new generation of processors
  – Software that continues to enjoy performance improvement will be parallel programs
  – In these, multiple threads of execution cooperate to achieve the functionality faster
CPU-GPU Performance Gap
[Two slides of charts comparing CPU and GPU performance; figures not reproduced in this transcript]
Source: CUDA Prog. Guide 4.0
GPGPU & CUDA
• The GPU is designed as a numeric computing engine
  – Will not perform as well as CPUs on some tasks
  – Most applications will use both CPUs and GPUs
• CUDA
  – NVIDIA’s parallel computing architecture aimed at increasing computing performance by harnessing the power of the GPU
  – A programming model
More Details on GPUs
• A GPU is typically a computer card, installed into a PCI Express x16 slot
• Market leaders: NVIDIA, Intel, AMD (ATI)
  – Example NVIDIA GPUs (donated to UoM): GeForce GTX 480, Tesla C2070
Example Specifications
                                                   GTX 480          Tesla C2070
Peak double precision floating point performance   650 Gigaflops    515 Gigaflops
Peak single precision floating point performance   1300 Gigaflops   1030 Gigaflops
CUDA cores                                         480              448
Frequency of CUDA cores                            1.40 GHz         1.15 GHz
Memory size (GDDR5)                                1536 MB          6 GB
Memory bandwidth                                   177.4 GB/sec     150 GB/sec
ECC memory                                         No               Yes
CPU vs. GPU Architecture
The GPU devotes more transistors to computation
CPU-GPU Communication
CUDA Architecture
• CUDA is NVIDIA’s solution to access the GPU
• Can be seen as an extension to C/C++
CUDA Software Stack
CUDA Architecture
There are two main parts
1. Host (CPU part): Single Program, Single Data
2. Device (GPU part): Single Program, Multiple Data
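The host/device split can be illustrated with a minimal sketch. It is not taken from the slides: the kernel `square`, the helper `square_on_device`, and the block size of 64 are all hypothetical choices made for illustration. The host code runs once on the CPU, while every GPU thread runs the same kernel on a different array element (single program, multiple data).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Device part: every thread runs this same function (single program),
// but each thread works on a different element (multiple data).
__global__ void square(const int *in, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique index per thread
    if (i < n)                                      // guard the last, partly used block
        out[i] = in[i] * in[i];
}

// Host part (hypothetical helper): squares n ints (n > 0) on the GPU.
// Returns true only if every CUDA call succeeded.
bool square_on_device(const int *in, int *out, int n)
{
    int *dev_in = NULL, *dev_out = NULL;
    size_t bytes = n * sizeof(int);
    bool ok = cudaMalloc((void **)&dev_in, bytes) == cudaSuccess
           && cudaMalloc((void **)&dev_out, bytes) == cudaSuccess
           && cudaMemcpy(dev_in, in, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // Enough blocks of 64 threads to cover all n elements
        square<<<(n + 63) / 64, 64>>>(dev_in, dev_out, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(out, dev_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev_in);   // cudaFree(NULL) is a harmless no-op
    cudaFree(dev_out);
    return ok;
}

int main(void)
{
    int in[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    int out[8] = {0};
    if (!square_on_device(in, out, 8)) {
        printf("A CUDA call failed\n");
        return 1;
    }
    for (int i = 0; i < 8; ++i)
        printf("%d squared is %d\n", in[i], out[i]);
    return 0;
}
```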
CUDA Architecture
GRID Architecture
The Grid
1. A group of threads all running the same kernel
2. Can run multiple grids at once
The Block
1. Grids are composed of blocks
2. Each block is a logical unit containing a number of coordinating threads and some amount of shared memory
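To make "coordinating threads" and "shared memory" concrete, here is a hedged sketch of one common pattern, a per-block sum. It is an illustration and not code from the slides; `block_sum`, `sum_on_device`, and the block size of 256 are assumptions. The threads of each block cooperate through a `__shared__` array and use `__syncthreads()` to wait for one another, and the host then adds up the per-block totals.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 256   // must be a power of two for the halving loop

// Each block adds up its own slice of the input in shared memory.
__global__ void block_sum(const int *in, int *block_totals, int n)
{
    __shared__ int cache[THREADS_PER_BLOCK];        // visible to this block only
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    cache[threadIdx.x] = (i < n) ? in[i] : 0;       // pad the last block with zeros
    __syncthreads();                                // wait until the cache is filled

    // Tree reduction: halve the number of active threads each step
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            cache[threadIdx.x] += cache[threadIdx.x + stride];
        __syncthreads();                            // every thread reaches this barrier
    }
    if (threadIdx.x == 0)
        block_totals[blockIdx.x] = cache[0];        // one result per block
}

// Hypothetical host helper: sums n ints (n > 0). Returns false on a CUDA error.
bool sum_on_device(const int *in, int n, long long *total)
{
    int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
    std::vector<int> totals(blocks, 0);
    int *dev_in = NULL, *dev_totals = NULL;
    bool ok = cudaMalloc((void **)&dev_in, n * sizeof(int)) == cudaSuccess
           && cudaMalloc((void **)&dev_totals, blocks * sizeof(int)) == cudaSuccess
           && cudaMemcpy(dev_in, in, n * sizeof(int), cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        block_sum<<<blocks, THREADS_PER_BLOCK>>>(dev_in, dev_totals, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(&totals[0], dev_totals, blocks * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev_in);
    cudaFree(dev_totals);
    *total = 0;
    for (int b = 0; ok && b < blocks; ++b)          // finish the sum on the host
        *total += totals[b];
    return ok;
}

int main(void)
{
    std::vector<int> data(1000, 1);                 // 1000 ones
    long long total = 0;
    if (!sum_on_device(&data[0], 1000, &total)) {
        printf("A CUDA call failed\n");
        return 1;
    }
    printf("sum = %lld\n", total);
    return 0;
}
```

Threads in different blocks cannot see each other's shared memory, which is why this sketch leaves the final addition of block totals to the host.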
Some Applications of GPGPU
Computational Structural Mechanics
Bio-Informatics and Life Sciences
Computational Electromagnetics and Electrodynamics
Computational Finance
Some Applications…
Computational Fluid Dynamics
Data Mining, Analytics, and Databases
Imaging and Computer Vision
Medical Imaging
Some Applications…
Molecular Dynamics
Numerical Analytics
Weather, Atmospheric, Ocean Modeling and Space Sciences
CUDA Programming Basics
Accessing/Using the CUDA-GPUs
• You have been given access to our cluster
  – User accounts on 192.248.8.13x
  – It is a Linux system
• CUDA Toolkit and SDK for development
  – Includes CUDA C/C++ compiler for GPUs (“nvcc”)
  – Will need C/C++ compiler for CPU code
• NVIDIA device drivers needed to run programs
  – For programs to communicate with hardware
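One way to check that the driver and toolkit can see a GPU is a small device query. The sketch below is not from the slides; the helper name `count_cuda_devices` and the file name in the comment are hypothetical, while `cudaGetDeviceCount` and `cudaGetDeviceProperties` are standard CUDA runtime calls.

```cuda
// Build with the toolkit's compiler, for example:
//   nvcc devices.cu -o devices      (file name is only an example)
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of CUDA devices, or -1 if the runtime
// reports an error (for instance, when no driver is installed).
int count_cuda_devices(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return -1;
    return count;
}

int main(void)
{
    int count = count_cuda_devices();
    if (count < 0) {
        printf("Could not query CUDA devices; check the NVIDIA driver\n");
        return 1;
    }
    printf("Found %d CUDA device(s)\n", count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;                                // skip devices we cannot query
        printf("Device %d: %s\n", dev, prop.name);
        printf("  Compute capability: %d.%d\n", prop.major, prop.minor);
        printf("  Multiprocessors:    %d\n", prop.multiProcessorCount);
        printf("  Global memory:      %llu MB\n",
               (unsigned long long)(prop.totalGlobalMem >> 20));
    }
    return 0;
}
```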
Example Program 1
• “__global__” says the function is to be compiled to run on a “device” (GPU), not “host” (CPU)
• Angle brackets “<<<” and “>>>” pass parameters/arguments to the runtime
• A function executed on the GPU (device) is usually called a “kernel”

    #include <cuda.h>
    #include <stdio.h>

    /* Empty kernel: compiled for, and run on, the device (GPU) */
    __global__ void kernel (void) { }

    int main (void)
    {
        kernel <<< 1, 1 >>> ();      /* launch 1 block of 1 thread */
        printf("Hello World!\n");    /* printed by the host (CPU) */
        return 0;
    }
Example Program 2 – Part 1
As can be seen in the next slide:
• We can pass parameters to a kernel as we would with any C function
• We need to allocate memory to do anything useful on a device, such as return values to the host
Example Program 2 – Part 2

    #include <stdio.h>

    /* The kernel definition is missing from this transcript;
       this version is reconstructed from the call in main() */
    __global__ void add (int a, int b, int *c)
    {
        *c = a + b;
    }

    int main (void) {
        int c, *dev_c;
        cudaMalloc ((void **) &dev_c, sizeof (int));   /* one int on the device */
        add <<< 1, 1 >>> (2, 7, dev_c);
        cudaMemcpy(&c, dev_c, sizeof(int),
                   cudaMemcpyDeviceToHost);            /* copy the result back */
        printf("2 + 7 = %d\n", c);
        cudaFree(dev_c);
        return 0;
    }
Example Program 3
Within host (CPU) code, call the kernel using <<< and >>>, specifying the grid size (number of blocks) and the block size (number of threads per block); more details later
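The slide's own code for this example is not in the transcript. As a stand-in, the following sketch (with hypothetical names `add` and `add_on_device`) adds two small arrays and takes the grid size and block size as parameters, so the same kernel can be launched as N blocks of 1 thread or as 1 block of N threads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 16

// Works for any launch shape: the index combines block and thread IDs.
__global__ void add(const int *a, const int *b, int *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

// Hypothetical host helper: c = a + b on the GPU, launched with the given
// grid size (blocks) and block size (threads). Returns false on a CUDA error.
bool add_on_device(const int *a, const int *b, int *c, int n,
                   int blocks, int threads)
{
    int *dev_a = NULL, *dev_b = NULL, *dev_c = NULL;
    size_t bytes = n * sizeof(int);
    bool ok = cudaMalloc((void **)&dev_a, bytes) == cudaSuccess
           && cudaMalloc((void **)&dev_b, bytes) == cudaSuccess
           && cudaMalloc((void **)&dev_c, bytes) == cudaSuccess
           && cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        add<<<blocks, threads>>>(dev_a, dev_b, dev_c, n);  // <<<grid size, block size>>>
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(c, dev_c, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return ok;
}

int main(void)
{
    int a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) { a[i] = i; b[i] = 10 * i; }

    // N blocks of 1 thread each, then 1 block of N threads: same result
    if (!add_on_device(a, b, c, N, N, 1) || !add_on_device(a, b, c, N, 1, N)) {
        printf("A CUDA call failed\n");
        return 1;
    }
    for (int i = 0; i < N; ++i)
        printf("%d + %d = %d\n", a[i], b[i], c[i]);
    return 0;
}
```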
Example Program 3 …contd
Note: Details on threads and thread IDs will come later
Example Program 4
Grids, Blocks and Threads
• A grid of size 6 (3x2 blocks)
• Each block has 12 threads (4x3)
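The layout above (3x2 blocks of 4x3 threads, 6 x 12 = 72 threads in total) could be launched as in the sketch below. The kernel `record_block` and helper `fill_block_map` are hypothetical names used for illustration; `dim3`, `blockIdx`, `blockDim`, `gridDim` and `threadIdx` are standard CUDA built-ins. Each thread works out its own 2D position and records which block it belongs to.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define GRID_X  3                       // blocks across
#define GRID_Y  2                       // blocks down
#define BLOCK_X 4                       // threads across, per block
#define BLOCK_Y 3                       // threads down, per block
#define WIDTH  (GRID_X * BLOCK_X)       // 12 threads across the whole grid
#define HEIGHT (GRID_Y * BLOCK_Y)       // 6 threads down the whole grid

// Each thread writes the number of its block into its own cell of the map.
__global__ void record_block(int *map)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;   // column in the whole grid
    int y = blockIdx.y * blockDim.y + threadIdx.y;   // row in the whole grid
    map[y * WIDTH + x] = blockIdx.y * gridDim.x + blockIdx.x;
}

// Hypothetical host helper: fills map (WIDTH * HEIGHT ints).
// Returns false if a CUDA call fails.
bool fill_block_map(int *map)
{
    int *dev_map = NULL;
    size_t bytes = WIDTH * HEIGHT * sizeof(int);
    bool ok = cudaMalloc((void **)&dev_map, bytes) == cudaSuccess;
    if (ok) {
        dim3 grid(GRID_X, GRID_Y);      // a grid of 3x2 = 6 blocks
        dim3 block(BLOCK_X, BLOCK_Y);   // each block has 4x3 = 12 threads
        record_block<<<grid, block>>>(dev_map);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(map, dev_map, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev_map);
    return ok;
}

int main(void)
{
    int map[WIDTH * HEIGHT];
    if (!fill_block_map(map)) {
        printf("A CUDA call failed\n");
        return 1;
    }
    // Print one row of thread cells per line, showing each cell's block number
    for (int y = 0; y < HEIGHT; ++y) {
        for (int x = 0; x < WIDTH; ++x)
            printf("%d ", map[y * WIDTH + x]);
        printf("\n");
    }
    return 0;
}
```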
Conclusion
• In this session we discussed
  – Introduction to GPU Computing
  – GPU Computing with CUDA
  – CUDA Programming Basics
• Next session
  – Data Parallelism
  – CUDA Programming Model
  – CUDA Threads
References for this Session
• Chapters 1 and 2 of: D. Kirk and W. Hwu, Programming Massively Parallel Processors, Morgan Kaufmann, 2010
• Chapters 1-4 of: J. Sanders and E. Kandrot, CUDA by Example, Addison-Wesley, 2010
• Chapters 1-2 of: NVIDIA CUDA C Programming Guide, NVIDIA Corporation, 2006-2011 (Versions 3.2 and 4.0)