GPU - An Introduction


Graphics Processing Unit

DHAN V SAGAR, CB.EN.P2CSE13007

Introduction

It is a processor optimized for 2D/3D graphics, video, visual computing, and display.

It is a highly parallel, highly multithreaded multiprocessor optimized for visual computing.

It provides real-time visual interaction with computed objects via graphics, images, and video.

History

● Up to the late 90's
– No GPUs
– Much simpler VGA controllers

● Consisted of
– A memory controller
– Display generator + DRAM

● DRAM was either shared with the CPU or private

History

● By 1997
– More complex VGA controllers

● Incorporated 3D acceleration functions in hardware
– Triangle setup and rasterization
– Texture mapping and shading

● Rasterization: converting a combination of shapes (lines, polygons, letters, …) into an image consisting of individual pixels

History

● By 2000
– Single-chip graphics processors incorporated nearly all the functions of the graphics pipeline of high-end workstations

● Beginning of the end of the high-end workstation market
– The VGA controller was renamed the Graphics Processing Unit (GPU)

Current Trends

Well-defined APIs

OpenGL: open standard for 3D graphics programming

WebGL: OpenGL extension for the web

DirectX: set of Microsoft multimedia programming interfaces (Direct3D for 3D graphics)

Can implement novel graphics algorithms

Use GPUs for non-conventional applications

Current Trends

Combining the powers of the CPU and GPU: heterogeneous architectures

GPUs become scalable parallel processors

Moving from hardware-defined pipelining architectures to more flexible programmable architectures

Architecture Evolution

[Diagram: CPU connected to memory and to a graphics card driving the display]

Floating-point co-processors attached to microprocessors, together with the interest in providing hardware support for displays, led to graphics processing units (GPUs).

GPUs with dedicated pipelines

[Diagram: a dedicated pipeline (Input stage → Vertex shader stage → Geometry shader stage → Rasterizer stage → Pixel shading stage → Frame buffer), with each stage connected to graphics memory]

Graphics chips generally had a pipeline structure, with individual stages performing specialized operations, finally leading to loading the frame buffer for display.

Individual stages may have access to graphics memory for storing intermediate computed data.

PROGRAMMING GPUS

• Will focus on parallel computing applications

• Must decompose the problem into a set of parallel computations

• Ideally a two-level decomposition, to match the GPU organization (see the example and sketch below)

Example

[Diagram: the data start in one big array, which is decomposed into small arrays, each of which is further decomposed into tiny pieces]
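A minimal CUDA sketch of this two-level decomposition (the kernel name scale and the scaling operation are hypothetical): the grid of thread blocks covers the big array, each block handles one small slice, and each thread handles one tiny element.

// Hypothetical kernel: each thread scales one "tiny" element of the big array.
__global__ void scale(int n, float *data)
{
    // Two-level index: block (small array) plus thread within the block (tiny piece)
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = 2.0f * data[i];
}

// Host-side launch: enough 256-thread blocks to cover the whole big array.
// scale<<<(n + 255) / 256, 256>>>(n, data);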

GPGPU and CUDA

GPGPU

● General-Purpose computing on GPU

● Uses traditional graphics API and graphics pipeline

CUDA

● Compute Unified Device Architecture

● Parallel computing platform and programming model

● Invented by NVIDIA

● Single Program Multiple Data approach

CUDA

➢ CUDA programs are written in C

➢ Within C programs, call SIMT “kernel” routines that are executed on the GPU

➢ Provides three abstractions (illustrated in the sketch after this list)

➢ Hierarchy of thread groups
➢ Shared memory
➢ Barrier synchronization
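A minimal sketch of the three abstractions working together, as a hypothetical block-level sum: the thread-group hierarchy selects the data, shared memory holds a per-block buffer, and barrier synchronization keeps the reduction steps in order (assumes a launch with 256 threads per block).

// Hypothetical kernel: computes one partial sum per thread block.
__global__ void block_sum(const float *in, float *out)
{
    __shared__ float buf[256];                     // shared memory: one buffer per thread block
    int tid = threadIdx.x;                         // position within the thread-group hierarchy
    buf[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();                               // barrier: all loads complete before any reads

    // Tree reduction: each step halves the number of active threads
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            buf[tid] += buf[tid + stride];
        __syncthreads();                           // barrier between reduction steps
    }

    if (tid == 0)
        out[blockIdx.x] = buf[0];                  // thread 0 writes the block's partial sum
}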

Cont..

CUDA

● Lowest level of parallelism – CUDA Thread

● The compiler and hardware can gang thousands of CUDA threads together, yielding various levels of parallelism within the GPU

● MIMD, SIMD, and instruction-level parallelism

Single Instruction, Multiple Thread (SIMT)

Conventional C Code

// Invoke DAXPY

daxpy(n,2.0,x,y);

// DAXPY in C

void daxpy(int n,double a,double *x, double *y)

{

for (int i=0;i<n;++i)

y[i] = a*x[i] + y[i];

}

Corresponding CUDA Code

// Invoke DAXPY with 256 threads per Thread Block

__host__

int nblocks = (n+255)/256;

daxpy<<<nblocks,256>>>(n,2.0,x,y);

// DAXPY in CUDA

__global__

void daxpy(int n, double a, double *x, double *y)

{

int i = blockIdx.x*blockDim.x+threadIdx.x;

if(i<n) y[i]=a*x[i]+y[i];

}

Cont...

● __device__ (or) __global__ --- functions that run on the GPU

● __host__ --- functions that run on the system (host) processor

● CUDA variables declared with the __device__ qualifier are allocated in GPU memory, which is accessible by all the multithreaded SIMD processors

● The call syntax for a function that runs on the GPU (a host-side usage sketch follows this list) is

name<<<dimGrid,dimBlock>>>(... parameter list ...)

● The GPU hardware handles thread management

● Threads are grouped and executed in batches of 32 threads (a warp); warps in turn make up a Thread Block

● The hardware that executes a whole block of threads is called a multithreaded SIMD processor
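A minimal host-side sketch tying these pieces together, assuming the __global__ daxpy kernel shown earlier (error checking omitted): the host allocates GPU memory, copies the inputs over, launches the kernel with the <<<dimGrid,dimBlock>>> syntax, and copies the result back.

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

int main(void)
{
    int n = 1024;
    size_t bytes = n * sizeof(double);

    // Host arrays
    double *x = (double *)malloc(bytes);
    double *y = (double *)malloc(bytes);
    for (int i = 0; i < n; ++i) { x[i] = 1.0; y[i] = 2.0; }

    // Device (GPU) arrays, accessible by all the multithreaded SIMD processors
    double *d_x, *d_y;
    cudaMalloc((void **)&d_x, bytes);
    cudaMalloc((void **)&d_y, bytes);
    cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice);

    // Launch: enough 256-thread Thread Blocks to cover all n elements
    int nblocks = (n + 255) / 256;
    daxpy<<<nblocks, 256>>>(n, 2.0, d_x, d_y);

    // Copy the result back and check one element: 2.0*1.0 + 2.0 = 4.0
    cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", y[0]);

    cudaFree(d_x); cudaFree(d_y);
    free(x); free(y);
    return 0;
}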

References

http://en.wikipedia.org/wiki/Graphics_processing_unit

http://www.nvidia.com/object/cuda_home_new.html

http://computershopper.com/feature/200704_the_right_gpu_for_you

http://www.cs.virginia.edu/~gfx/papers/pdfs/59_HowThingsWork.pdf

http://en.wikipedia.org/wiki/Larrabee_(GPU)#cite_note-siggraph-9

http://www.nvidia.com/geforce

“Larrabee: A Many-Core x86 Architecture for Visual Computing”, Seiler et al., ACM SIGGRAPH (International Conference on Computer Graphics and Interactive Techniques), 2008

“An Analytical Model for a GPU Architecture with Memory-level and Thread-level Parallelism Awareness”, Sunpyo Hong and Hyesoon Kim, ISCA, 2009

Thank You..