Evolution of the Graphics Process Units Dr. Zhijie Xu [email protected].


A few words about me

Research Interests: Simulation – VR – Game – (back to) VR and CG.

Fascinated by R&D in CG and by progress in rendering devices

Retina display and brain wave control

Outline

History of GPUs

Process Paradigm and Programming Model

Current Research Hotspots

Future Trends

Foreword from the IEEE Visualization 2005 Conference

Desktop computer architecture is at a turning point. In the last two years, CPU speeds have nearly stopped increasing and all major CPU manufacturers have announced multi-core, parallel processors.

Future performance improvements will predominantly come from parallelism rather than from an ever-increasing uni-processor clock speed.

Commodity graphics processors (GPUs), in contrast, already contain many parallel processing units and are capable of sustaining computation rates greater than ten times that of a modern CPU. The GPU programming model, however, is very different from traditional CPU models.

What is a GPU?

“A Graphics Processing Unit or GPU (also occasionally called Visual Processing Unit or VPU) is a dedicated graphics rendering device for a personal computer or game console. Modern GPUs are very efficient at manipulating and displaying computer graphics, and their highly-parallel structure makes them more effective than typical CPUs for a range of complex algorithms.”

- Definition from wikipedia.org

Radeon 9800 Pro

History of GPUs

The pre-GPU era: VGAs in the 1980s

4 (or even 5) generations of GPUs in the last decade

Fixed functions vs. programmability

API support: OpenGL, Direct3D (v6.0 to v9.0)

Shader models (v1.0 – v3.0)

History of GPUs – generations in terms of function

First-Generation GPUs: up to 1998; Nvidia’s TNT2, ATi’s Rage, and 3dfx’s Voodoo3; DX6 feature set.

Second-Generation GPUs: 1999–2000; Nvidia’s GeForce256 and GeForce2, ATi’s Radeon7500, and S3’s Savage3D; T&L; OpenGL and DX7; configurable.

Third-Generation GPUs: 2001; GeForce3/4Ti, Radeon8500, MS’s Xbox; OpenGL ARB, DX7/8; vertex programmability + ASM.

Fourth-Generation GPUs: 2002 onwards; GeForce FX family, Radeon 9700; OpenGL + extensions, DX9; vertex/pixel programmability + HLSL; 0.13μ process, 125M T/C, 200M T/S.

I saw the Radeon X1900 only last Thursday

History of GPUs – generations in terms of stream processing

Pre-NV2x: no explicit support for stream processing. Kernel operations are usually hidden in the API and provide too little flexibility for general use.

NV2x: kernel stream operations are now explicitly under the programmer's control but only for vertex processing (fragments are still using old paradigms). No branching support severely hampers flexibility but some algorithms can be run (notably, low-precision fluid simulation).

RD3xx: increased performance and precision with limited support for branching/looping in both vertex and fragment processing. The model is now flexible enough to cover many purposes.

NV4x: very flexible branching support, although some limitations still exist on the number of operations that can be executed and on recursion depth. Performance is estimated to be from 20 to 44 GFLOPS.

What are GPUs capable of?

Why shift from CPU to GPU?

Why not just keep increasing the CPU speed and leave the GPU to handle what it does best?

CPU speed is reaching a bottleneck (how many transistors can be integrated on a chip)

Solutions: in the future, nanotechnology; in the short term, dual-core machines (double CPUs), clustered CPUs, …, even grid computing and supercomputing

GPUs face the same problem, but still have room to press on thanks to their task-specific designs and parallelism paradigm

Hunger for More Computational Power – volume, speed, accuracy

Supercomputing (parallel computing)

Applications: particle dynamics, network analysis, finite element analysis, ocean tide analysis, virtual universe simulation, airplane design, other military simulation, etc.

Japanese Earth Simulator, champion of 2002 (5120 NEC CPUs)

IBM Blue Gene, winner in 2005 (65536 dual-core PowerPC CPUs)

What’s missing in the formula? - COST

Process Paradigm and Programming Model

Real-time computer graphics hardware is transitioning from supporting a few fixed algorithms to being fully programmable. At the same time, the performance of graphics processors (GPUs) is increasing at a rapid rate because GPUs can effectively exploit the enormous parallelism available in graphics computations.

These improvements in GPU flexibility and performance are likely to continue in the future, and will allow developers to write increasingly sophisticated and diverse programs that execute on the GPU.

From Sequential to Parallel Paradigm

Conventional, sequential paradigm

    for (int i = 0; i < 100 * 4; i++)
        result[i] = source0[i] + source1[i];

Parallel SIMD paradigm, packed registers

    for (int el = 0; el < 100; el++)
        vector_sum(result[el], source0[el], source1[el]);

Parallel Stream paradigm (SIMD/MIMD)

    streamElements 100
    streamElementFormat 4 numbers
    elementKernel "@arg0+@arg1"
    result = kernel(source0, source1)

Stream processing is a relatively new, yet very successful paradigm to allow parallel processing at never-seen-before efficiency with minimal effort. Compared to existing architectures, stream processors are able to provide up to 20X the performance at the same power dissipation and die size.
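The three paradigms on this slide can be compared side by side in a minimal sketch. Plain Python stands in for the hardware here, so nothing actually runs in parallel. The names `sequential_sum`, `simd_sum`, `pack` and the Python form of `kernel` are illustrative assumptions; only `vector_sum`, `kernel`, the 100 elements and the 4 numbers per element come from the slide.

```python
# Illustrative sketch only: plain Python standing in for the three paradigms.
N_ELEMENTS = 100   # number of stream elements, as on the slide
WIDTH = 4          # numbers per element, as on the slide


def sequential_sum(source0, source1):
    """Sequential paradigm: one scalar addition per iteration (100 * 4 iterations)."""
    result = [0.0] * (N_ELEMENTS * WIDTH)
    for i in range(N_ELEMENTS * WIDTH):
        result[i] = source0[i] + source1[i]
    return result


def vector_sum(a, b):
    """Stand-in for one packed-register instruction that adds 4 numbers at once."""
    return tuple(x + y for x, y in zip(a, b))


def simd_sum(source0, source1):
    """SIMD paradigm: one packed addition per iteration (100 iterations)."""
    return [vector_sum(source0[el], source1[el]) for el in range(N_ELEMENTS)]


def kernel(element_kernel, *streams):
    """Stream paradigm: the caller names the per-element kernel and the streams;
    the loop is implicit, so a runtime could process elements in any order."""
    return list(map(element_kernel, *streams))


def pack(flat):
    """Group a flat list into 4-number elements (hypothetical helper)."""
    return [tuple(flat[i:i + WIDTH]) for i in range(0, len(flat), WIDTH)]


flat0 = [float(i) for i in range(N_ELEMENTS * WIDTH)]
flat1 = [2.0 * i for i in range(N_ELEMENTS * WIDTH)]

by_loop = pack(sequential_sum(flat0, flat1))
by_simd = simd_sum(pack(flat0), pack(flat1))
by_stream = kernel(vector_sum, pack(flat0), pack(flat1))
```

All three produce the same numbers. What changes is how much of the iteration the programmer spells out, and therefore how much freedom the hardware has to run elements concurrently.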

GPU Rendering Pipeline

Source: nVidia – “Vertex Shader Introduction”

Data Flow in the Pipeline

A scene description: vertices, triangles, colors, lighting

Transformations that map the scene to a camera viewpoint

“Effects”: texturing, shadow mapping, lighting calculations

Rasterizing: converting geometry into pixels

Pixel processing: depth tests, stencil tests, and other per-pixel operations.
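The data flow above can be traced in a toy software renderer. This is a sketch under strong simplifying assumptions, not how any real GPU or API implements the pipeline. The transformation is a plain scale and offset, the "effects" stage is a flat colour per triangle, and the only per-pixel operation is a depth test. All function names are hypothetical.

```python
import math

BACKGROUND = "."


def transform(vertex, scale, offset):
    """Transformation stage: map a scene (x, y, z) vertex to screen space; z stays as depth."""
    x, y, z = vertex
    return (x * scale + offset[0], y * scale + offset[1], z)


def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p), used for coverage tests."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(tri, width, height):
    """Rasterizing stage: yield (x, y, depth) for each pixel centre inside the triangle."""
    a, b, c = tri
    area = edge(a, b, c)
    if area == 0:          # degenerate triangle covers nothing
        return
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)
            # Barycentric weights; dividing by area handles either winding order.
            w0 = edge(b, c, p) / area
            w1 = edge(c, a, p) / area
            w2 = edge(a, b, p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                yield x, y, w0 * a[2] + w1 * b[2] + w2 * c[2]


def render(triangles, width, height, scale=1.0, offset=(0.0, 0.0)):
    """Run every (vertices, colour) triangle through the stages into a framebuffer."""
    frame = [[BACKGROUND] * width for _ in range(height)]
    depth = [[math.inf] * width for _ in range(height)]
    for vertices, colour in triangles:
        screen_tri = [transform(v, scale, offset) for v in vertices]
        for x, y, z in rasterize(screen_tri, width, height):
            if z < depth[y][x]:        # pixel processing: depth test, nearer wins
                depth[y][x] = z
                frame[y][x] = colour   # "effects" reduced to a flat colour
    return frame


near = (((0, 0, 0.2), (4, 0, 0.2), (4, 4, 0.2)), "B")
far = (((0, 0, 0.5), (4, 0, 0.5), (0, 4, 0.5)), "R")
image = render([near, far], 4, 4)
print("\n".join("".join(row) for row in image))
```

Because of the depth test, the image does not depend on the order in which the two triangles are submitted.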

The Motivation for High Level Languages

Graphics hardware has become increasingly powerful

Programming powerful hardware with assembly code is hard

GeForce FX supports programs more than 1,000 assembly instructions long

Programmers need the benefits of a high-level language:

Easier programming

Easier code reuse

Easier debugging

Assembly

    …
    DP3 R0, c[11].xyzx, c[11].xyzx;
    RSQ R0, R0.x;
    MUL R0, R0.x, c[11].xyzx;
    MOV R1, c[3];
    MUL R1, R1.x, c[0].xyzx;
    DP3 R2, R1.xyzx, R1.xyzx;
    RSQ R2, R2.x;
    MUL R1, R2.x, R1.xyzx;
    ADD R2, R0.xyzx, R1.xyzx;
    DP3 R3, R2.xyzx, R2.xyzx;
    RSQ R3, R3.x;
    MUL R2, R3.x, R2.xyzx;
    DP3 R2, R1.xyzx, R2.xyzx;
    MAX R2, c[3].z, R2.x;
    MOV R2.z, c[3].y;
    MOV R2.w, c[3].y;
    LIT R2, R2;
    ...

HLL

    float4 cSpecular = pow(max(0, dot(Nf, H)), phongExp).xxx;
    float4 cPlastic = Cd * (cAmbient + cDiffuse) + Cs * cSpecular;
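To make the two HLL statements concrete, here is a rough Python sketch of the same plastic shading formula. It keeps the slide's names `Nf`, `H`, `phongExp`, `Cd`, `Cs`, `cAmbient`, `cDiffuse` and `cSpecular`. Everything else is an assumption made for illustration: the light direction `L`, the view direction `V`, computing `H` as the normalized half vector, and treating `cAmbient` and `cDiffuse` as scalars rather than colours.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


def plastic(Nf, L, V, Cd, Cs, cAmbient, phongExp):
    """Shade one surface point.

    Nf: surface normal, L: direction to the light, V: direction to the viewer
    (L and V are assumed inputs, not on the slide).
    Cd / Cs: diffuse and specular RGB colours of the material.
    """
    Nf, L, V = normalize(Nf), normalize(L), normalize(V)
    H = normalize(tuple(l + v for l, v in zip(L, V)))    # half vector (assumed)
    cDiffuse = max(0.0, dot(Nf, L))                      # scalar here (assumed)
    cSpecular = pow(max(0.0, dot(Nf, H)), phongExp)      # first HLL statement
    # Second HLL statement, applied per colour channel.
    return tuple(cd * (cAmbient + cDiffuse) + cs * cSpecular
                 for cd, cs in zip(Cd, Cs))


# Light and viewer both straight above the surface: full diffuse and specular.
head_on = plastic((0, 0, 1), (0, 0, 1), (0, 0, 1),
                  Cd=(0.5, 0.25, 0.0), Cs=(0.5, 0.5, 0.5),
                  cAmbient=0.0, phongExp=32)

# Light grazing the surface: no diffuse term and only a faint highlight.
grazing = plastic((0, 0, 1), (1, 0, 0), (0, 0, 1),
                  Cd=(0.5, 0.25, 0.0), Cs=(0.5, 0.5, 0.5),
                  cAmbient=0.0, phongExp=32)
```

The assembly listing computes similar normalizations and dot products one register at a time, which is the contrast the slide is drawing.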

GPU Programming

Game Applications:

Per-pixel lighting

Vertex displacement

Furs and Shines (ATi demos)

Various Shading Models (Treasure box and RenderMonkey)

Bump map creation and the virtual earth

One more reason to have a decent graphics card with a decent GPU: the Microsoft Windows Vista operating system

To be released at the end of this year

Aero Glass 3D interface

More than half of all PCs (more than 63% of 203 million PCs) won’t support it, because their integrated graphics adaptors only support the 2D interface of Windows 2000 and Windows XP

Aero Glass is part of Vista’s interface, Aero, which requires the graphics card to support DirectX 9.0c, for example the Nvidia GeForce 5900

In 2005, over 22.3 million standalone graphics cards (market value over 10 billion dollars) were sold globally, of which more than 72% (13.4 million) can support Aero Glass

Microsoft announced last week that the next big game title to be released, Ring II, will only run on Vista

Vista causes legal battles with PC manufacturers

Non-Game Applications: GPGPU

Recent advances in programmability and architectural design have enabled the use of GPU processors for general purpose computation.

Applications in:

Linear algebra

Geometric Computing

Database and Stream Mining

GPU Ray Tracing

Advanced Image Processing

Computational Fluid Dynamics (CFD) and Finite Element Analysis
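As a rough illustration of how a linear algebra task might be recast for the GPU, the sketch below models one rendering pass as "run a small kernel once per output element". Each kernel invocation may read any input (a gather) but writes only its own output slot. The names `run_pass`, `mat_vec` and `reduce_sum` are hypothetical, and plain Python lists stand in for textures, so this shows the shape of the mapping rather than real GPU code.

```python
def run_pass(kernel, size):
    """One pass: call `kernel` once per output index, as a GPU would per output pixel.
    The calls are independent, so hardware could run them in parallel."""
    return [kernel(i) for i in range(size)]


def mat_vec(matrix, vector):
    """Matrix-vector product: output element i gathers row i and the whole vector."""
    def row_dot(i):
        return sum(m * v for m, v in zip(matrix[i], vector))
    return run_pass(row_dot, len(matrix))


def reduce_sum(values):
    """Sum by repeated passes, each writing half as many outputs as it reads.
    The output of one pass becomes the input of the next."""
    data = list(values)
    while len(data) > 1:
        half = (len(data) + 1) // 2

        def add_pair(i, src=data, half=half):
            j = i + half
            return src[i] + (src[j] if j < len(src) else 0)

        data = run_pass(add_pair, half)
    return data[0] if data else 0


product = mat_vec([[1, 2], [3, 4]], [5, 6])
total = reduce_sum(range(1, 11))
```

The reduction needs about log2(n) passes instead of one loop over n items, which is the usual trade when a kernel is only allowed to write its own output element.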

Problems Need to be Solved

Significant barriers exist for the developer who wishes to use the inexpensive power of commodity graphics hardware, whether for in-game simulation of physics or for conventional computational science.

These chips are designed for and driven by video game development; the programming model is unusual, the programming environment is tightly constrained, and the underlying architectures are largely secret.

Potential Research Areas

GPGPU Building Blocks

Mapping computational concepts to the GPU

Linear algebra

Sorting and searching

Geometric Computing

High-level Languages and Debugging Tools

Computational Building Blocks

Math: Linear Algebra, Finite Difference, Finite Element

General Algorithms: Searching, Sorting, etc.
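Sorting is a useful example of a building block that has to be restructured for data-parallel hardware. One well-known candidate is a sorting network such as bitonic sort, where every pass applies the same compare-and-exchange rule to all elements independently. The sketch below is an illustrative Python version, not taken from the talk. It assumes the input length is a power of two, and `compare_exchange` is a hypothetical name for the per-element kernel.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two using a bitonic sorting network."""
    data = list(values)
    n = len(data)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2                      # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2             # distance between compared partners
        while j >= 1:
            def compare_exchange(i, src=data, j=j, k=k):
                """Kernel for output slot i: reads two inputs, writes one output."""
                partner = i ^ j
                ascending = (i & k) == 0
                smaller = min(src[i], src[partner])
                larger = max(src[i], src[partner])
                # The lower index keeps the smaller value in ascending blocks,
                # and the larger value in descending blocks.
                keep_smaller = (i < partner) == ascending
                return smaller if keep_smaller else larger

            # One pass: every output element is computed independently.
            data = [compare_exchange(i) for i in range(n)]
            j //= 2
        k *= 2
    return data


sorted_values = bitonic_sort([5, 3, 8, 1, 9, 2, 7, 4])
```

The sequence of comparisons depends only on the input length and never on the data, so there is no data-dependent branching between passes, which suits hardware with limited branching support.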

Progress on GPGPU

GPGPU programming libraries: GLIT, Accelerator

Increased pressure on manufacturers from "GPGPU users" to improve hardware design, usually focusing on adding more flexibility to the programming model.

Summary

The graphics processor (GPU) on today's commodity video cards has evolved into an extremely powerful and flexible processor.

The latest graphics architectures provide tremendous memory bandwidth and computational horsepower, with fully programmable vertex and pixel processing units that support vector operations up to full IEEE floating point precision.

High level languages have emerged for graphics hardware, making this computational power accessible. Architecturally, GPUs are highly parallel streaming processors optimized for vector operations, with both MIMD (vertex) and SIMD (pixel) pipelines.

GPUs are capable of general-purpose computation beyond the graphics applications for which they were designed, but the barriers to application programming still need to be taken down.

Questions
