2007.05.30 Jongwon Kim


Comparison of next generation graphic processors, Nvidia G80 and ATI R600


Agenda

• Old and new pipeline
• DirectX 10
• Two competitors in graphics cards
• Nvidia G80 architecture
• ATI R600 architecture
• Comparison of G80 and R600
• Q & A


Old pipeline

• The traditional graphics (transform & lighting) pipeline
  ▪ Geometry pipeline
    • Modeling transformation
    • Per-vertex lighting & shading
    • Viewing transformation
    • Projection transformation
    • Clipping
    • Triangle setup
  ▪ Scan-line conversion
  ▪ Rendering/rasterization
    • Triangle setup
    • Texturing, fragment shading
    • Alpha, stencil and depth testing
    • Frame buffer blending
    • Anti-aliasing (optional)


Pipeline with shaders

• Vertex information input
  ▪ Vertex data
  ▪ High-order primitive tessellation
• Geometry pipeline
  ▪ T&L engine
  ▪ Vertex shaders
  ▪ Viewports and clipping
• Pixel and texture blending
  ▪ Multi-texturing
  ▪ Pixel shaders
  ▪ Fog blending
• Rasterization
  ▪ Alpha, stencil and depth testing
  ▪ Frame buffer blending


The situation today

• No single graphics hardware target
• CPU-bound games and applications
  ▪ Bandwidth and CPU cycles are the bottleneck in multiple areas (physics, AI)
  ▪ A large amount of CPU resources is spent directing the GPU
• GPUs are overly specialized


DirectX 10

• Big changes from DirectX 9
  ▪ No more fixed function
  ▪ State grouping
  ▪ Reduced CPU load
    • Removes overhead from DirectX 9
  ▪ Unified pixel shader and vertex shader
  ▪ Shader model 4.0
  ▪ Geometry shader
    • e.g. generalized displacement mapping
  ▪ Texture arrays
    • Dynamically indexable in the shader
  ▪ Predicated draw
    • Occlusion queries are processed by the GPU alone
  ▪ Stream out
    • Generated geometry is easily redrawn
  ▪ But supported only on Windows Vista

[Figure: DirectX 10 features. Texture arrays; format reinterpretation; stream output; resource views; input assembler; immediate offset on memory access; integer/bitwise instructions; comparison filtering; constant buffers; state objects; shared-exponent HDR compression (RGBE); block-compressed formats for bump/normal maps; 128 texture slots; 8 render targets; more interstage communication; instance, vertex and primitive identifiers; per-primitive clip distance; predicated rendering; alpha-to-coverage; multisample readback; better cubemap filtering; …]


The DirectX 10 pipeline

• Input assembler
• Vertex shader 4.0
• Geometry shader
• Rasterizer (scan conversion)
• Pixel shader 4.0
• Output merger

[Diagram: the stages Input Assembler, Vertex Shader, Geometry Shader, Rasterizer/Interpolator, Pixel Shader and Output Merger, shown with the memory resources vertex buffer, index buffer, textures, stream output, depth/stencil and render target.]


Shader programming

• What is a shader?
  ▪ A part of the graphics renderer that is responsible for calculating the color of an object
  ▪ A shader can apply transformations to a large set of elements at a time, for example to every vertex of a model or to each pixel in an area of the screen
  ▪ The GPU (graphics processing unit) can provide shading functions
  ▪ Shader functions were introduced in OpenGL version 1.5 and in DirectX 8
• Why are shaders needed?
  ▪ Shaders are well suited to parallel processing
  ▪ GPUs have a multi-core design to facilitate parallel processing
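To make the "well suited to parallel processing" point concrete, here is a minimal Python sketch (not real shader code). The names vertex_shader, pixel_shader, run_vertex_stage and run_pixel_stage are made up for illustration. The idea is only that each element is processed by the same small function without looking at its neighbours, so the work can be spread over many units.

```python
from concurrent.futures import ThreadPoolExecutor


def vertex_shader(vertex, offset=(1.0, 0.0, 0.0)):
    """Translate one vertex. The result depends only on this vertex."""
    return tuple(v + o for v, o in zip(vertex, offset))


def pixel_shader(x, y, width, height):
    """Colour one pixel from its own coordinates (a simple gradient)."""
    return (x / (width - 1), y / (height - 1), 0.0)


def run_vertex_stage(vertices):
    # No vertex needs another vertex's result, so the calls can run
    # concurrently; pool.map still returns results in input order.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(vertex_shader, vertices))


def run_pixel_stage(width, height):
    # Same idea per pixel; width and height must be at least 2 here.
    return [[pixel_shader(x, y, width, height) for x in range(width)]
            for y in range(height)]
```

A GPU does the same thing in hardware: many shader units each run the same program on a different vertex or pixel.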


Shader model 4.0

• Unified shading architecture
  ▪ A program can assign shader units (stream processors) as vertex or pixel shaders
  ▪ New HDR (high dynamic range) formats
    • FP16 (64 bit), FP24 (96 bit), FP32 (128 bit)
    • R11G11B10, R9G9B9+5 (half size, same dynamic range)
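The bit counts on this slide can be checked with a little arithmetic, and the "+5" in R9G9B9+5 can be illustrated with a small encoder. The sketch below is Python and makes two assumptions that are not stated on the slide: the FPxx sizes are for four channels per texel, and R9G9B9+5 is a shared-exponent layout with three 9-bit mantissas, one 5-bit exponent and an exponent bias of 15. The function names are hypothetical.

```python
import math

# Bits per texel. Four channels is an assumption that matches 64/96/128.
FORMAT_BITS = {
    "FP16 x4": 4 * 16,
    "FP24 x4": 4 * 24,
    "FP32 x4": 4 * 32,
    "R11G11B10": 11 + 11 + 10,
    "R9G9B9+5": 9 + 9 + 9 + 5,
}

MANTISSA_BITS = 9   # bits per colour channel
EXP_BIAS = 15       # assumed bias of the shared 5-bit exponent
MAX_EXP = 31        # largest value a 5-bit exponent can hold
MAX_VALUE = ((2 ** MANTISSA_BITS - 1) / 2 ** MANTISSA_BITS
             * 2.0 ** (MAX_EXP - EXP_BIAS))


def encode_rgb9e5(r, g, b):
    """Pack three floats into (r_mantissa, g_mantissa, b_mantissa, exponent)."""
    clamped = [min(max(c, 0.0), MAX_VALUE) for c in (r, g, b)]
    largest = max(clamped)
    raw_exp = math.floor(math.log2(largest)) if largest > 0 else -EXP_BIAS - 1
    exp = max(-EXP_BIAS - 1, raw_exp) + 1 + EXP_BIAS
    scale = 2.0 ** (exp - EXP_BIAS - MANTISSA_BITS)
    # If rounding pushes the largest mantissa to 2**9, use the next exponent.
    if math.floor(largest / scale + 0.5) == 2 ** MANTISSA_BITS:
        exp += 1
        scale *= 2.0
    mantissas = [int(math.floor(c / scale + 0.5)) for c in clamped]
    return (*mantissas, exp)


def decode_rgb9e5(rm, gm, bm, exp):
    """Recover the three floats; all channels share one exponent."""
    scale = 2.0 ** (exp - EXP_BIAS - MANTISSA_BITS)
    return (rm * scale, gm * scale, bm * scale)
```

Both packed formats come to 32 bits, half of the 64-bit FP16 texel, which is what "half size" refers to. Sharing one exponent keeps the range of the brightest channel while the darker channels lose some precision.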


Geometry shader

• Geometry shader
  ▪ Up to shader model 3.0 the GPU could only shade existing data; it could not create new data
  ▪ A geometry shader can add or remove vertices
  ▪ Uses: displacement mapping, stencil shadow extrusion, point sprite creation, motion blur, etc.
  ▪ Generated data leaves through stream output and can be sent back to the input assembler
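As a rough illustration of "add or remove vertices", the following Python sketch mimics point sprite creation. It is not HLSL and not the DirectX API. sprite_geometry_shader and run_geometry_stage are hypothetical names, and the rule for discarding points is an arbitrary choice made for the example.

```python
def sprite_geometry_shader(point, half_size=0.5):
    """Take one input point and emit zero or four output vertices."""
    x, y, z = point
    if z < 0.0:
        # Removing geometry: points behind the z = 0 plane emit nothing.
        return []
    h = half_size
    # Adding geometry: one point becomes the four corners of a quad,
    # listed in triangle-strip order.
    return [(x - h, y - h, z), (x + h, y - h, z),
            (x - h, y + h, z), (x + h, y + h, z)]


def run_geometry_stage(points, shader):
    """Collect everything the shader emits, as a stream-out buffer would."""
    stream_out = []
    for point in points:
        stream_out.extend(shader(point))
    return stream_out
```

Two input points, one in front of and one behind the plane, produce four output vertices in total. The collected list stands in for the stream output that can be drawn again in a later pass.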


Why unify shader

[Figure: two cases with separate vertex and pixel shaders. Complex geometry processing: the vertex shader is fully loaded and the pixel shader is only partially loaded. Complex pixel processing: the pixel shader is fully loaded and the vertex shader is only partially loaded.]


Unified shader

[Figure: unified shader architecture. One pool of units runs vertex, pixel and geometry shader work, for complex geometry and pixel processing alike.]
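The benefit of unifying the shaders can be put into numbers with a very idealized model. The Python sketch below assumes that work divides perfectly across units and ignores scheduling overhead. The unit counts and workloads are invented for the example and are not G80 or R600 figures.

```python
def frame_time_fixed(vertex_work, pixel_work, vertex_units, pixel_units):
    """Separate pools: the frame waits for whichever pool finishes last."""
    return max(vertex_work / vertex_units, pixel_work / pixel_units)


def frame_time_unified(vertex_work, pixel_work, total_units):
    """One pool: every unit can take either kind of work."""
    return (vertex_work + pixel_work) / total_units


# Hypothetical chip: 8 vertex units + 24 pixel units, or 32 unified units.
GEOMETRY_HEAVY = dict(vertex_work=240.0, pixel_work=240.0)
PIXEL_HEAVY = dict(vertex_work=24.0, pixel_work=720.0)

fixed_geometry = frame_time_fixed(vertex_units=8, pixel_units=24, **GEOMETRY_HEAVY)
unified_geometry = frame_time_unified(total_units=32, **GEOMETRY_HEAVY)
fixed_pixel = frame_time_fixed(vertex_units=8, pixel_units=24, **PIXEL_HEAVY)
unified_pixel = frame_time_unified(total_units=32, **PIXEL_HEAVY)
```

In the geometry-heavy case the fixed split takes 30 time units because the 8 vertex units are the bottleneck, while the unified pool takes 15. In the pixel-heavy case the numbers are 30 and 23.25. In both cases the unified pool avoids leaving one kind of unit idle.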


Two competitors

• NVIDIA Corporation
  ▪ Worldwide leader in GPU technologies
  ▪ Major supplier of PC motherboard chipsets and graphics cards
  ▪ Developed GPUs for game consoles: Xbox and PlayStation 3
  ▪ Well-known products
    • RIVA TNT, NVIDIA GeForce series, NVIDIA nForce series
• ATI Technologies
  ▪ Major supplier of GPUs, PC motherboard chipsets and graphics cards
  ▪ Purchased by AMD in October 2006
  ▪ Developed GPUs for game consoles: Nintendo GameCube, Xbox 360, Wii
  ▪ Well-known products
    • Mach32, Rage series, Radeon series, Xpress series


G80 and R600

                    Nvidia G80      ATI R600
Release date        Nov-06          May-07
Transistors         681 million     700 million
Die size            -               20x21 mm
Core clock          575 MHz         742 MHz
Shader clock        1350 MHz        -
Stream processors   128             320
Shader processing   518.4 GFLOPS    475 GFLOPS
Memory clock        900 MHz         825 MHz
Memory interface    384 bit         512 bit
Memory bandwidth    86.4 GB/s       105.6 GB/s
Memory size         768 MB          512 MB
Texture fill rate   36.8 GT/s       11.9 GT/s
Geometry rate       -               742 Mtris/s
Fab process         90 nm           80 nm
ROPs                24              16
Memory type         GDDR3           GDDR3
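The texture fill rate and geometry rate rows follow from the core clocks if a per-clock throughput is assumed. The Python sketch below uses unit counts that are not on the slide and should be read as assumptions: 64 texture filtering units for the G80, 16 texture units for the R600, and one triangle set up per clock on the R600. The function names are hypothetical.

```python
def texture_fill_rate_gtexels(texture_units, core_clock_mhz):
    """Texels per second in GT/s: one texel per unit per core clock."""
    return texture_units * core_clock_mhz / 1000.0


def geometry_rate_mtris(triangles_per_clock, core_clock_mhz):
    """Triangles per second in Mtris/s."""
    return triangles_per_clock * core_clock_mhz


# Assumed unit counts (not from the slides), core clocks from the table.
g80_fill = texture_fill_rate_gtexels(texture_units=64, core_clock_mhz=575)
r600_fill = texture_fill_rate_gtexels(texture_units=16, core_clock_mhz=742)
r600_geometry = geometry_rate_mtris(triangles_per_clock=1, core_clock_mhz=742)
```

Under these assumptions the G80 comes out at 36.8 GT/s and the R600 at about 11.9 GT/s and 742 Mtris/s, which matches the table.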


Nvidia G80, 8800GTX


ATI R600, HD2900XT


Stream processor unit

• R600 - 320 independent ALUs (with 64 special ALUs)
• G80 - 128 ALUs + 128 special ALUs


Cluster architecture

• R600
  ▪ 320 = 4 clusters × 16 × 5 scalar units
  ▪ 8 threads = 2 arbiters × 4 clusters
• G80
  ▪ 128 = 8 clusters × 16 shader units
  ▪ 16 threads = 2 instruction fetchers × 8 clusters
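The products on this slide can be written down as a small data structure, which makes it easier to compare the two layouts side by side. This is a Python sketch; the class ShaderArray and its field names are made up for the example, and the numbers are the ones given on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShaderArray:
    name: str
    clusters: int
    units_per_cluster: int
    lanes_per_unit: int       # scalar ALUs inside one unit
    issuers_per_cluster: int  # arbiters (R600) or instruction fetchers (G80)

    @property
    def stream_processors(self):
        return self.clusters * self.units_per_cluster * self.lanes_per_unit

    @property
    def threads(self):
        return self.issuers_per_cluster * self.clusters


R600 = ShaderArray("R600", clusters=4, units_per_cluster=16,
                   lanes_per_unit=5, issuers_per_cluster=2)
G80 = ShaderArray("G80", clusters=8, units_per_cluster=16,
                  lanes_per_unit=1, issuers_per_cluster=2)
```

R600.stream_processors gives 320 and G80.stream_processors gives 128. The R600 reaches its larger count with fewer, wider clusters, while the G80 uses more clusters of single scalar units.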


Clock war

• R600
  ▪ Core clock and shader clock are the same, 742 MHz
• G80
  ▪ Shader clock is higher than the graphics core clock
  ▪ Shader clock 1350 MHz
  ▪ Core clock 575 MHz
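The clock difference explains how the G80 can post a higher GFLOPS figure with fewer stream processors. The Python sketch below is a back-of-the-envelope check. The flops-per-clock values are assumptions that do not appear in the slides: 3 for the G80 (a multiply-add plus a multiply) and 2 for the R600 (a multiply-add). They were chosen because they reproduce the shader processing figures in the comparison table.

```python
def peak_gflops(stream_processors, shader_clock_mhz, flops_per_clock):
    """Peak shader throughput: units x clock x flops issued per clock."""
    return stream_processors * shader_clock_mhz * flops_per_clock / 1000.0


# flops_per_clock values are assumptions, see the text above.
g80_gflops = peak_gflops(128, 1350, flops_per_clock=3)
r600_gflops = peak_gflops(320, 742, flops_per_clock=2)

# How much faster the G80 shader domain runs than its own core.
g80_clock_ratio = 1350 / 575
```

With these inputs the G80 reaches 518.4 GFLOPS and the R600 about 475 GFLOPS. The G80 shader clock is roughly 2.3 times its core clock, which offsets much of the R600's advantage in unit count.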


Memory interface

• R600
  ▪ 512 bit ring bus
    • Simplifies routing to improve scalability
    • Reduces wire delay
    • Reduces number of repeaters required
    • 105.6 GB/s
• G80
  ▪ 384 bit crossbar bus
  ▪ 86.4 GB/s

[Diagram: ring bus with ring stops and a PCI Express connection]
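Both bandwidth figures can be derived from the bus width and the memory clock. The Python sketch below takes the memory clocks from the comparison table (900 MHz and 825 MHz) and assumes two transfers per clock, since GDDR3 is a double data rate memory. The function name is hypothetical.

```python
def memory_bandwidth_gb_s(bus_width_bits, memory_clock_mhz, transfers_per_clock=2):
    """Peak bandwidth in GB/s: bytes per transfer x transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * memory_clock_mhz * transfers_per_clock / 1000.0


g80_bandwidth = memory_bandwidth_gb_s(384, 900)
r600_bandwidth = memory_bandwidth_gb_s(512, 825)
```

This gives 86.4 GB/s for the G80 and 105.6 GB/s for the R600. The R600 runs a slower memory clock, so its advantage comes entirely from the wider bus.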


HW tessellation

▪ ATI has provided hardware tessellation before (TruForm)
▪ The Xbox 360 already includes it
▪ DirectX 11 will include it
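For readers new to the term, tessellation means splitting coarse geometry into finer geometry on the GPU. The Python sketch below shows the simplest possible form, flat midpoint subdivision. It is not the TruForm algorithm and not the R600 tessellator, both of which also displace the new vertices to follow a curved surface. The function names are hypothetical.

```python
def midpoint(a, b):
    """Point halfway between two vertices."""
    return tuple((p + q) / 2.0 for p, q in zip(a, b))


def subdivide(triangle):
    """Split one triangle into four by connecting its edge midpoints."""
    a, b, c = triangle
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


def tessellate(triangles, levels):
    """Apply the split repeatedly; each level multiplies the count by 4."""
    for _ in range(levels):
        triangles = [small for tri in triangles for small in subdivide(tri)]
    return triangles
```

Two levels turn a single triangle into 16. Doing this in hardware means the application only has to send the coarse mesh across the bus.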


Nvidia G80 architecture


Nvidia G80 diagram


CUDA thread computing

• Compute unified device architecture
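The slide gives only the name, so the following is a rough illustration of the CUDA thread model rather than anything taken from the deck. Real CUDA kernels are written in C and run in parallel on the GPU. This Python sketch runs the "threads" one after another and only shows how each thread derives the element it works on from its block index and thread index. launch and saxpy_kernel are hypothetical names.

```python
def saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y, out):
    """Work done by one thread: compute a single output element."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(x):                          # guard threads past the end
        out[i] = a * x[i] + y[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Sequential stand-in for a kernel launch over grid_dim blocks."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


n = 10
block_dim = 4
grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n
x = [float(i) for i in range(n)]
y = [1.0] * n
out = [0.0] * n
launch(saxpy_kernel, grid_dim, block_dim, 2.0, x, y, out)
```

Three blocks of four threads cover the ten elements, and the two spare threads in the last block are filtered out by the bounds check.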


CUDA thread computing


ATI R600 architecture

• Command Processor
• Setup Engine
• Ultra-Threaded Dispatch Processor
• Stream Processing Units
• Texture Units & Caches
• Memory Read/Write Cache & Stream Out Buffer
• Shader Export
• Render Back-Ends

[Block diagram labels: Command Processor; Setup Engine; Vertex Assembler; Geometry Assembler; Programmable Tessellator; Scan Converter/Rasterizer; Interpolators; Hierarchical Z; Ultra-Threaded Dispatch Processor; Shader Instruction Cache; Shader Constant Cache; Stream Processing Units; Texture Units; Vertex Index Fetch; Vertex Cache; L1 Texture Cache; L2 Texture Cache; Memory Read/Write Cache; Stream Out Buffer; Shader Export; Render Back-Ends; Z/Stencil Cache; Color Cache]


ATI R600 diagram


Q&A

– mailto://[email protected]
– Links
  – http://developer.nvidia.com
  – http://www.beyond3d.com
  – http://www.extremetech.com/article2/0,1697,2053309,00.asp
  – http://en.wikipedia.org/wiki/Radeon_R600
  – http://en.wikipedia.org/wiki/GeForce_8_series
  – http://www.beyond3d.com/content/reviews/1/