An Introduction to GPU 3D Games to HPC Krishnaraj Rao Presented at Bangalore DV Club, 03/12/2010

Page 1: 3 d to _hpc

An Introduction to GPU

3D Games to HPC

Krishnaraj Rao

Presented at Bangalore DV Club, 03/12/2010

Page 2: 3 d to _hpc

Agenda

3D Graphics

The Big Picture

Quick Overview

Programming Model

Importance of 3D

High Performance Parallel Computing

Why GPUs for HPPC?

Available APIs

GPU Computing architecture

Q & A

Page 3: 3 d to _hpc

The Big Picture - Movies

Creation

Capture Models Scene API

Rendering Post Processing

Creation

Page 4: 3 d to _hpc

The Big Picture - Games

Creation

Capture Models Scene API

Rendering Post Processing

Creation

Drivers

HLSL, Cg

Page 5: 3 d to _hpc

Models end up in World Space

Y

X

Z

Light Source

Screen

View Point or Camera

World Coordinate Space

World space includes everything! Position and orientation for all items are needed to accurately calculate transformations into screen space.

Page 6: 3 d to _hpc

View Transformation - world ends up on Screen

Screen Coordinate Space
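A minimal sketch of the world-to-screen step, under simplifying assumptions that are not in the slides: the camera has no rotation and looks down the negative Z axis, projection is a plain pinhole model, and the screen is a hypothetical 640x480 pixels. The name `world_to_screen` and its parameters are illustrative only.

```python
def world_to_screen(point, camera_pos, focal_length=1.0, width=640, height=480):
    """Project a world-space (x, y, z) point to pixel coordinates.

    Assumes an unrotated camera at camera_pos looking down -Z.
    Returns None when the point is at or behind the camera plane.
    """
    # View transform: express the point relative to the camera.
    x = point[0] - camera_pos[0]
    y = point[1] - camera_pos[1]
    z = point[2] - camera_pos[2]
    if z >= 0:
        return None
    depth = -z

    # Perspective divide: farther points move toward the screen centre.
    ndc_x = focal_length * x / depth
    ndc_y = focal_length * y / depth

    # Viewport transform: map [-1, 1] to pixels; screen Y grows downward.
    screen_x = (ndc_x + 1.0) * 0.5 * width
    screen_y = (1.0 - ndc_y) * 0.5 * height
    return (screen_x, screen_y)
```

A point straight ahead of the camera lands in the middle of the screen; a real pipeline would use 4x4 matrices so that camera rotation is handled as well.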

Page 7: 3 d to _hpc

Simple Interactive 3D Graphics App

A simple example

Static scene geometry, moving viewer

Repeat this loop:

CPU takes user input from joystick or mouse

CPU re-calculates viewer position, view direction, and light positions in 3-D world space

GPU clears memory and draws the complete scene geometry with the new viewer and light positions

Repeat forever
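The loop above can be sketched as follows. This is an illustration only: the "GPU" step is a plain Python function, light positions are left out for brevity, and the names `update_viewer`, `draw_scene` and `run` are hypothetical.

```python
import math

def update_viewer(viewer, user_input, dt):
    """CPU step: recompute viewer position and view direction from input."""
    heading = viewer["heading"] + user_input["turn"] * dt
    speed = user_input["forward"]
    return {
        "heading": heading,
        "x": viewer["x"] + math.cos(heading) * speed * dt,
        "z": viewer["z"] + math.sin(heading) * speed * dt,
    }

def draw_scene(scene, viewer):
    """Stand-in for the GPU step: clear, then draw every scene object."""
    frame = []  # starting from an empty frame plays the role of the clear
    for obj in scene:
        # The scene is static; only its position relative to the viewer changes.
        frame.append((obj["name"], obj["x"] - viewer["x"], obj["z"] - viewer["z"]))
    return frame

def run(scene, inputs, dt):
    """Process one frame per input sample; a real application loops forever."""
    viewer = {"heading": 0.0, "x": 0.0, "z": 0.0}
    frames = []
    for user_input in inputs:
        viewer = update_viewer(viewer, user_input, dt)
        frames.append(draw_scene(scene, viewer))
    return viewer, frames
```

The scene list never changes between frames, which is what "static scene geometry, moving viewer" means in practice: all per-frame CPU work is in `update_viewer`.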

GPU stages: Vertex Engine, Setup, Raster, Z Cull, Fragment Engine, Texture, Raster Ops

CPU loop: Read Joystick Position, Update Viewer Position and Light Direction, Draw all Scene Objects

Page 8: 3 d to _hpc

Adding Programmability to the Graphics Pipeline

3D Application or Game

3D API: OpenGL or Direct3D

CPU - GPU Boundary

GPU Front End

Programmable Vertex Processor

Primitive Assembly

Rasterization & Interpolation

Programmable Fragment Processor

Raster Operations

Framebuffer

Data passed between stages: 3D API Commands; GPU Command & Data Stream; Vertex Index Stream; Pre-transformed Vertices; Transformed Vertices; Assembled Polygons, Lines, and Points; Pixel Location Stream; Rasterized Pre-transformed Fragments; Transformed Fragments; Pixel Updates
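A toy sketch of what "programmable" means for the vertex and fragment stages: the fixed parts of the pipeline stay the same while the application supplies two small functions. Everything here is an assumption made for illustration: only point primitives are handled, rasterization is reduced to picking the pixel under each point, and `run_pipeline`, `scale_by_two` and `solid_red` are hypothetical names.

```python
import math

def run_pipeline(vertices, vertex_program, fragment_program, width, height):
    """Point-only pipeline: vertex stage, rasterization, fragment stage, raster ops."""
    framebuffer = {}
    for vertex in vertices:
        # Programmable vertex processor: pre-transformed -> transformed vertex.
        x, y = vertex_program(vertex)
        # Fixed-function rasterization (toy): one point covers one pixel.
        px, py = math.floor(x), math.floor(y)
        if 0 <= px < width and 0 <= py < height:
            # Programmable fragment processor computes the colour;
            # the raster operation here is a plain overwrite.
            framebuffer[(px, py)] = fragment_program(px, py)
    return framebuffer

def scale_by_two(vertex):
    """Example vertex program: uniform scale in 2D."""
    return (vertex[0] * 2.0, vertex[1] * 2.0)

def solid_red(px, py):
    """Example fragment program: constant colour."""
    return (255, 0, 0)
```

Swapping in a different `vertex_program` or `fragment_program` changes the result without touching `run_pipeline`, which mirrors how shaders plug into an otherwise fixed pipeline.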

Page 9: 3 d to _hpc

A History of Innovation

1995: NV1, 1 Million Transistors

1999: GeForce 256, 22 Million Transistors

2002: GeForce4, 63 Million Transistors

2003: GeForce FX, 130 Million Transistors

2004: GeForce 6, 222 Million Transistors

2005: GeForce 7, 302 Million Transistors

2006-2007: GeForce 8, 754 Million Transistors

2008: GeForce GTX 200, 1.4 Billion Transistors

...but what do all these extra transistors do?

Page 10: 3 d to _hpc

GPU continues to offload CPU work

Stages: Scene Mgmt, Physics and AI, Geom Gather, Geom Proc, Triangle Proc, Pixel Proc, Z / Blend

CPU / GPU split shown for: 1996, 2000, 2004, 2008

Page 11: 3 d to _hpc

Programming Model

API: Set of functions, procedures or classes that an OS, library or service provides to support requests made by computer programs

DirectX: Collection of APIs for handling multimedia tasks, especially game programming and video, on Microsoft platforms.

OpenGL (Open Graphics Library) is a standard specification defining a cross-language, cross-platform API for writing applications that produce 2D and 3D computer graphics.

Page 12: 3 d to _hpc

Why is 3D Graphics important?

More than just Fun and Games....

Tokyo, Japan California Coastline

Page 13: 3 d to _hpc

3D Consumer Applications

Music

Vista

Photos Maps

PDFs Office

Page 14: 3 d to _hpc

GPUS IN HPC

Page 15: 3 d to _hpc

Massive Data Parallelism

Instruction Level Parallelism

Data Fits in Cache

Huge Data Sets

Page 16: 3 d to _hpc

GPU Processing Power

GPU

NVIDIA GTX 285

240 cores

1.04 TFLOPS

CPU

Intel Core i7 965

4 cores

102 GFLOPS

CPU, meet your new partner!

Page 17: 3 d to _hpc

With floating-point math and textures, graphics processors can be used for more than just graphics

GPGPU: "General Purpose Computing on GPUs"

Lots of ongoing research mapping algorithms and problems onto programmable GPUs

Solving Linear Equations

Black-Scholes Options Pricing

Rigid- and Soft-Body Dynamics

Middleware layers being developed to accelerate "eye candy" game physics on GPUs (HavokFX)

Beyond Graphics
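As an illustration of why a workload such as Black-Scholes options pricing maps well onto a GPU, here is a sketch of the standard closed-form price of a European call. It is not taken from the slides; the function names are hypothetical. The point is that each option is priced independently of every other, so a large portfolio is a natural one-thread-per-option problem.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(spot, strike, rate, volatility, years):
    """Closed-form Black-Scholes price of a European call option."""
    sigma_root_t = volatility * math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * volatility ** 2) * years) / sigma_root_t
    d2 = d1 - sigma_root_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)

def price_portfolio(options):
    """Price every option; no iteration depends on any other iteration."""
    return [black_scholes_call(*option) for option in options]
```

On a GPU the loop in `price_portfolio` would disappear: each thread would evaluate `black_scholes_call` for one element of the input arrays.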

Page 18: 3 d to _hpc

What is GPGPU ?

General Purpose computation using GPU in applications other than 3D graphics

GPU accelerates critical path of application

Data parallel algorithms leverage GPU attributes

Large data arrays, streaming throughput

Fine-grain SIMD parallelism

Floating point (FP) computation

Great for "embarrassingly parallel" algorithms

Applications - see //GPGPU.org

Game effects (FX) physics, image processing

Physical modeling, computational engineering, matrix algebra, convolution, correlation, sorting
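A small sketch of the fine-grain data-parallel style described above, using SAXPY (out = a*x + y) as the example. The kernel is written from the point of view of a single element, the way a GPU thread sees it; `launch` is a serial, hypothetical stand-in for a GPU launch and is not any real API.

```python
def saxpy_kernel(i, a, x, y, out):
    """Work done by one 'thread': exactly one element of the arrays."""
    out[i] = a * x[i] + y[i]

def launch(kernel, n, *args):
    """Serial stand-in for a parallel launch over n independent indices.

    No iteration reads another iteration's output, so the indices could
    run in any order or all at once; that independence is what a GPU exploits.
    """
    for i in range(n):
        kernel(i, *args)

x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, out)
```

With large arrays the same kernel body stays unchanged; only the number of launched indices grows, which is why large data sets suit this model.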

Page 19: 3 d to _hpc

A quiet buildup of potential

Calculation Throughput and Memory Bandwidth: 10X

Equivalent performance at fraction of power & cost

GPU in every PC - pervasive presence and massive impact

GPUs have always been parallel "multi-core"

Natively designed to handle massive threading

Every pixel is a thread

Increased precision (fp32), programmability, flexibility

GPUs are a mass-market parallel processor

Economies of scale

Peak floating point performance is much higher than comparable CPUs

Why Computation on the GPU?

ATI X1900XT: $400 (video card), 250 GFLOPS (SP float), 46 GB/s main memory BW

Intel Core 2 Duo E6600: $400 (processor only), 40 GFLOPS (SP float), 8.5 GB/s main memory BW
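The comparison can be reduced to a few ratios. The sketch below uses only the figures quoted on the slide; the dictionary keys and variable names are illustrative.

```python
# Figures as quoted on the slide (peak single-precision numbers).
gpu = {"price_usd": 400, "gflops": 250, "mem_bw_gb_s": 46}
cpu = {"price_usd": 400, "gflops": 40, "mem_bw_gb_s": 8.5}

# How many times more peak compute and memory bandwidth the GPU offers.
gflops_ratio = gpu["gflops"] / cpu["gflops"]
bandwidth_ratio = gpu["mem_bw_gb_s"] / cpu["mem_bw_gb_s"]

# Peak GFLOPS per dollar at the quoted prices.
gpu_gflops_per_dollar = gpu["gflops"] / gpu["price_usd"]
cpu_gflops_per_dollar = cpu["gflops"] / cpu["price_usd"]

print(f"compute: {gflops_ratio:.2f}x, bandwidth: {bandwidth_ratio:.2f}x")
```

At the same price the GPU offers roughly 6x the peak arithmetic rate and roughly 5x the memory bandwidth. These are peak figures, so real application speedups depend on how well the workload parallelizes.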

Page 20: 3 d to _hpc

Why Computation on the GPU?

Supercomputing Performance

Inherently Parallel Architecture

1000+ cores, massively parallel processing

250x the compute performance of a PC

Personal

"One Researcher, One Supercomputer"

Supercomputer in a desktop system

Plugs into standard power strip

Accessible

Program in C, C++, Fortran for Windows or Linux

Available from OEMs and resellers worldwide and priced like a workstation

Page 21: 3 d to _hpc

Compute Applications

Computational Fluid Dynamics

Computer Aided Engineering

Digital Content Creation

Electronic Design Automation

Finance

Game Physics

Graphics

Imaging and Computer Vision

Medical Imaging

Numerics

Bio-Informatics and Life Sciences

Computational Chemistry

Computational Electromagnetics & Electrodynamics

Data Mining, Analytics & Databases

MATLAB Acceleration

Molecular Dynamics

Weather, Atmospheric, Ocean Modeling, and Space Sciences

Libraries

Oil & Gas

Programming Tools

Ray Tracing

Signal Processing

Video & Audio

Page 22: 3 d to _hpc

Heterogeneous Computing

Multi-Core

CPU

Parallel-Core

GPU

Page 23: 3 d to _hpc

APIS FOR HETEROGENEOUS COMPUTING

Page 24: 3 d to _hpc

APIs for Heterogeneous Computing

CUDA (Compute Unified Device Architecture) is a parallel computing architecture developed by NVIDIA. Programmers use 'C for CUDA' (C with NVIDIA extensions), compiled through a PathScale Open64 C compiler, to code algorithms for execution on the GPU. Both low-level and high-level APIs are provided.

OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors.

Microsoft DirectCompute is an API that supports general-purpose computing on GPUs on Microsoft Windows Vista or Windows 7. DirectCompute is part of the Microsoft DirectX collection of APIs.

Page 26: 3 d to _hpc

One Host + one or more Compute Devices

Each Compute Device is composed of one or more Compute Units

Each Compute Unit is further divided into one or more Processing Elements

OpenCL: Platform Model & Program Structure
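The hierarchy above (Host, Compute Devices, Compute Units, Processing Elements) can be pictured as nested data. This is a plain Python model for illustration and not the OpenCL API; the class names mirror the slide's terms, and the device name and the 30 x 8 layout are made-up example values.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ComputeUnit:
    processing_elements: int  # each Compute Unit has one or more of these

@dataclass
class ComputeDevice:
    name: str
    compute_units: List[ComputeUnit] = field(default_factory=list)

    def total_processing_elements(self) -> int:
        return sum(cu.processing_elements for cu in self.compute_units)

@dataclass
class Host:
    devices: List[ComputeDevice] = field(default_factory=list)

# Hypothetical device: 30 Compute Units with 8 Processing Elements each.
example_device = ComputeDevice(
    name="example-gpu",
    compute_units=[ComputeUnit(processing_elements=8) for _ in range(30)],
)
host = Host(devices=[example_device])
```

In real OpenCL code the host discovers this structure at run time by querying the platform rather than building it by hand.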

Page 27: 3 d to _hpc

CUDA Parallel Computing Architecture

ISA and hardware compute engine

Includes a C-compiler plus support for OpenCL and DX11 Compute

Architected to natively support all computational interfaces (standard languages and APIs)

Page 28: 3 d to _hpc

Shared back-end compiler and optimization technology

OpenCL and C for CUDA

C for CUDA: entry point for developers who prefer high-level C

OpenCL: entry point for developers who want low-level API

PTX

GPU

Option 1

Page 29: 3 d to _hpc

CUDA Success - Science & Computation

Not 2x or 3x, but speedups are 20x to 150x

146X: Medical Imaging, U of Utah

36X: Molecular Dynamics, U of Illinois, Urbana

18X: Video Transcoding, Elemental Tech

50X: Matlab Computing, AccelerEyes

100X: Astrophysics, RIKEN

149X: Financial simulation, Oxford

47X: Linear Algebra, Universidad Jaime

20X: 3D Ultrasound, Techniscan

130X: Quantum Chemistry, U of Illinois, Urbana

30X: Gene Sequencing, U of Maryland

Page 30: 3 d to _hpc

Chart axes: Performance vs. Accessibility

Today's Workstations: 1x

Supercomputing Cluster: $100K - $1M

Tesla Personal Supercomputer: < $10K, 250x

250x Faster

100x more affordable

20x less power consumption

Page 31: 3 d to _hpc

Solving the World's Most Complex Challenges

Oil & Gas

Science

Medicine

Broadcast Space Exploration

Film

Auto Design

Page 32: 3 d to _hpc

Grand Computing Challenges

Renewable Energy

Personalized Medicine

Mathematics for Scientific Discovery

Information Data Mining

Machines That Think

Natural Human Machine Interaction

Predict Environmental Changes

Economic Analysis

Page 33: 3 d to _hpc

Final Thoughts

GPU and heterogeneous parallel architecture will revolutionize computing

Parallel computing needed to solve some of the most interesting and important human challenges ahead

Learning parallel programming is imperative for students in computing and sciences

Page 34: 3 d to _hpc

From Virtua Fighter to Tsubame

1995 - NV1            2008 - GT200
0.8M transistors      1,200M transistors
50MHz                 1.3GHz
1M Bytes              4G Bytes
0 GFLOPS              1 TFLOPS

Another 1000x in 15 years?
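A quick check of what "another 1000x in 15 years" would require, compared with the transistor growth quoted on this slide. Only the slide's own figures are used; `annual_growth` is an illustrative helper.

```python
def annual_growth(ratio, years):
    """Constant yearly factor that compounds to `ratio` over `years`."""
    return ratio ** (1.0 / years)

# Slide figures: 0.8M transistors at 50 MHz (1995) vs 1,200M at 1.3 GHz (2008).
transistor_ratio = 1_200_000_000 / 800_000
clock_ratio = 1_300_000_000 / 50_000_000

observed = annual_growth(transistor_ratio, 2008 - 1995)
needed = annual_growth(1000.0, 15)

print(f"transistors: {transistor_ratio:.0f}x, clock: {clock_ratio:.0f}x")
print(f"observed growth per year: {observed:.2f}x, needed: {needed:.2f}x")
```

Transistor count grew about 1500x in 13 years, or roughly 1.75x per year, while clock speed grew only 26x. Another 1000x in 15 years needs about 1.58x per year, which is below the historical transistor growth rate on these figures.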

Page 35: 3 d to _hpc

BACKUP

Page 36: 3 d to _hpc

Graphics API History

Page 37: 3 d to _hpc

OpenGL

1992: OpenGL 1.0

1996: OpenGL 1.1 (Vertex Arrays, Improved Texturing)

1998: OpenGL 1.2 (3D Textures, BGRA pixel format)

1998: OpenGL 1.2.1 (Multi-Texture)

2001: OpenGL 1.3 (Multi-sample AA, Cube/Compressed Textures)

2002: OpenGL 1.4 (Depth/Shadow mapping, Auto mipmap generation)

2003: OpenGL 1.5 (Vertex Attr from Vid Mem)

2005: OpenGL 2.0 (GLSL, Vertex/Pixel Shaders, MRT, Non P-of-2 Tex)

2006: OpenGL 2.1 (GLSL1.2, sRGB Textures)

2008: OpenGL 3.0 (GLSL1.3, 32b FP Textures)

2009: OpenGL 3.1 (March 2009, GLSL1.4, Perf, CopyBufferAPI)

2009: OpenGL 3.2 (Aug 2009, GLSL1.5, Geom Shaders)

Page 38: 3 d to _hpc

OpenGL ES

Designed for hand-held and embedded devices

Goal is smaller footprint to support OpenGL

PlayStation 3 and cell phone industry adopting ES

OpenGL ES 1.1

Strips out anything deemed extra in OpenGL

Keeps conventional fixed-function vertex and fragment processing

OpenGL ES 2.0

Adds programmable vertex and fragment shaders

Shaders specified in binary format

Drops support for fixed-function vertex and fragment processing

Page 39: 3 d to _hpc

OpenGL ES - Cont.

OpenGL ES 1.0 : Symbian OS, Android Platform

OpenGL ES 1.0+ : PlayStation 3

OpenGL ES 1.1 : iPhone SDK, BlackBerry (some models)

OpenGL ES 2.0 : iPhone 3GS, iPod touch

Page 40: 3 d to _hpc

DirectX

GDI: legacy Windows graphics API ~1985

DirectX 1.0 - 1995/6 (No 3D support, DirectDraw, DirectSound, DirectInput)

DirectX 3.0 - 1996 (Rasterization-only 3D support, awkward prog. model, not successful)

DirectX 5.0 - 1997 (Draw Primitives, DirectX vs OpenGL War)

DirectX 6.0 - 1998 (Multitexture, OGL/Glide features, Texture Compression)

DirectX 7.0 - 1999 (Geometry HW acceleration and Blending, Cube mapping)

DirectX 8.0 - 2000/1 (Programmable VS/PS Shaders, XBOX)

DirectX 9.0 - 2002-2003 (More programmability, Branching, FP pixel prog.)

DirectX 9.0c - 2004 (ShaderModel 3.0)

DirectX 10.0 - 2006 (SM4.0, WinVista, Geometry Shaders, Streaming Output)

DirectX 10.1 - 2008 (SM4.1, Better Image Quality)

DirectX 11.0 - 2009 (SM5.0, DirectCompute, Tessellation, WinVista SP2, Win7)