
© Imagination Technologies p1 www.imgtec.com

March 2013

The Architecture of High-End Mobile Graphics Hardware


<1 Minute Introduction to Imagination Technologies

IP Licensing Company – anything you need to build an SoC

Focus on PowerVR Graphics and Caustic Ray Tracing at GDC 2013

[Diagram: a customer’s SoC assembled from IP blocks. Labels: PowerVR Video; PowerVR Display; Customer & 3rd Party IP; PowerVR Graphics; IMGworks customised IP; Ensigma communications; Audio; Pixels; HelloSoft VoLTE & V.VoIP; PowerVR Vision; MIPS processors; Meta Audio; Caustic Ray Tracing Solutions; Flow Cloud; connectivity; TV & Radio; Camera]


Immediate Mode Renderer (IMR)

Buffers kept in system memory

High bandwidth use, power consumption & latency

Each triangle is processed to completion in submission order

Wastes processing time and thus power due to “overdraw”

‘Early-Z’ techniques help but are only as good as your geometry sorting


Concept: Tiling

Frame buffer sub-divided into Tiles

32x32 pixels per tile, for example

Varies by device

Geometry is sorted into affected tiles

Allows each tile to be processed independently

Small number of fragments per tile

Allows on-chip memory to be used
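
The binning step can be sketched roughly as follows. This is a minimal illustration, assuming a simple bounding-box overlap test; the 32x32 tile size is the slide's example, while the function name `bin_triangles` and the sample triangles are hypothetical and do not describe how any particular hardware sorts geometry.

```python
from collections import defaultdict

TILE = 32  # tile edge in pixels (the slide's example; varies by device)

def bin_triangles(triangles, width, height, tile=TILE):
    """Map each tile (tx, ty) to the indices of triangles whose
    bounding box overlaps it. Conservative: a box can touch a tile
    that the triangle itself misses."""
    bins = defaultdict(list)
    max_tx = (width - 1) // tile
    max_ty = (height - 1) // tile
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the bounding box to the frame buffer, in tile units.
        tx0 = max(0, int(min(xs)) // tile)
        ty0 = max(0, int(min(ys)) // tile)
        tx1 = min(max_tx, int(max(xs)) // tile)
        ty1 = min(max_ty, int(max(ys)) // tile)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)
    return dict(bins)

triangles = [
    [(2, 2), (20, 4), (10, 25)],     # fits inside tile (0, 0)
    [(30, 30), (40, 30), (35, 40)],  # straddles four tiles
]
bins = bin_triangles(triangles, width=64, height=64)
```

Each tile ends up with its own short triangle list, which is what lets tiles be processed independently.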


Tile Based Renderer (TBR)

Rasterizing performed per-tile

Allows the use of fast, on-chip, buffers

Each triangle is processed to completion in submission order

Wastes processing time and thus power due to “overdraw”

‘Early-Z’ techniques help but are only as good as your geometry sorting


Concept: Deferred Rendering

Fragments - Two stage process

Hidden Surface Removal (HSR)

Shading

HSR is pixel perfect

Only visible fragments pass, no ‘overdraw’

Only requires position data

Less bandwidth & processing, saves power

HSR is submission order independent

No need for applications to submit geometry front to back
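
A toy model of a single pixel shows why submission order matters for early-Z but not for HSR. It is a sketch under simplifying assumptions (opaque fragments only, one pixel, smaller depth means nearer); the function names are hypothetical and the model says nothing about how the hardware resolves visibility internally.

```python
def shaded_with_early_z(depths):
    """Count fragments shaded at one pixel when each opaque fragment is
    depth-tested and shaded immediately, in submission order."""
    nearest = float("inf")
    shaded = 0
    for depth in depths:
        if depth < nearest:  # passes the depth test, so it gets shaded
            nearest = depth
            shaded += 1
    return shaded

def shaded_with_hsr(depths):
    """Count fragments shaded when visibility is fully resolved first:
    only the nearest opaque fragment is shaded, whatever the order."""
    return 1 if depths else 0

back_to_front = [0.9, 0.7, 0.5, 0.3]  # worst case for early-Z
front_to_back = [0.3, 0.5, 0.7, 0.9]  # best case for early-Z
```

With the back-to-front list, the early-Z model shades all four fragments; the deferred model shades one in either order.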


Tile Based Deferred Renderer (TBDR) = PowerVR

Rasterizing performed per-tile

Allows the use of fast, on-chip, buffers

Hidden Surface Removal (HSR) reduces overdraw

Pixel perfect, and submission order independent, no geometry sorting needed

Optimised to only retrieve information required, saving even more bandwidth

Saves power and bandwidth


PowerVR Series5 Implementation: OpenGL ES 2.0 Shader Based GPU

[Block diagram: PowerVR SGX. Labels: Host CPU Bus; Host CPU Interface (Control and Register Bus); Coarse Grain Scheduler (CGS); Vertex Data Master (Geometry); Pixel Data Master (ISP); General Purpose Data Master; Universal Scalable Shader Engine (Thread Schedulers, Multi-Threaded Execution Units); Tiling Co-Processor (TA); Pixel Co-Processor; Texturing Co-Processor; Multi-level Cache; System Memory Interface (System Memory Bus)]


PowerVR Series5XT Implementation: More Performance & More Features

[Block diagram labels: Multi-Core Logic; Task Distribution; Crossbar; System Level Cache; Master Bus Interfaces]


Mobile Device Graphics Performance Evolution: Ever more graphics performance at an ever faster pace…

[Chart: Relative Graphics Performance (y-axis, scale up to 300x) by year (x-axis, 2009 to 2014)]

Nothing is free in 3D Graphics…

More performance costs more power!


PowerVR Series6 “Rogue” Implementation: OpenGL ES 3.0 and Focus on Power Efficiency

[Block diagram: Series6 ‘Rogue’ GPU. Labels: Host CPU Bus; Host CPU Interface (Control and Register Bus); Core Management Unit (META); Coarse Grain Scheduler; Vertex Data Master; Compute Data Master; Pixel Data Master; Tiling Co-Processor; Tessellation Co-Processor; Pixel Co-Processor; Unified Shading Cluster Array (USC0, USC1 … USCn-1, USCn); Texture Unit; Multi-level Memory Cache Unit (MCU); 2D Core (PTLA); System Memory Interface (System Memory Bus)]


OpenGL ES 3.0 – Power Efficiency Focus

Single Geometry Submission fills “Multiple Render Targets”

Reduces bandwidth and geometry processing = lower power & higher efficiency

Multiple Render Targets

[Images: Deferred Shading using MRTs; Cartoon Rendering using MRTs]


OpenGL ES 3.0 – Power Efficiency Focus

Instancing enables efficient drawing of many copies of the same object

Reduced CPU workload (API calls) resulting in lower power and higher efficiency

Instancing

[Images: Instancing of Buildings for City Rendering; Instancing of Leaves]


OpenGL ES 3.0 – Power Efficiency Focus

Transform Feedback allows write out of vertex shader results to memory

Re-using “cached” results equates to higher efficiency by avoiding duplicate calculations

Transform Feedback

[Images: Transform feedback used to cache morphing and skinning animation; Transform feedback used to implement physics-based animation]


ES1.1/2.0 Market Opportunity for Developers: More than 1 Billion PowerVR devices!


ES3.0 Market Opportunity for Developers: Nothing much yet…


PowerVR Market Opportunity for Developers: Enable Near OpenGL ES 3.0 Feature Set on all Series5XT Devices!

Series5XT


OpenGL ES 3.0 Extension List for Series5XT: Start developing today - Full Support in SDK 3.1r2 (coming soon)

Core ES3.0 Feature                     Series5XT with DDK1.10
Vertex Texturing                       Yes (All SGX cores)
FP32 ALU Precision (HighP)             Yes (All SGX cores)
EXT_draw_buffers                       Yes, enables MRTs
EXT_occlusion_query_boolean            Yes
OES_texture_float                      Yes
OES_texture_half_float                 Yes
EXT_texture_rg                         Yes
EXT_texture_mixmax                     Yes
EXT_multisampled_render_to_texture     Yes
GL_IMG_uniform_buffer_object           Yes
More features to come…                 Yes…


Golden Rules


Common Bottlenecks: Based on past observation

Most Likely

CPU Usage

Bandwidth Usage

CPU/GPU Synchronisation

Fragment Shader Instructions

Geometry Upload

Texture Upload

Vertex Shader Instructions

Geometry Complexity

Least Likely

Use our PVRTune and PVRTrace tools to help identify the bottleneck


Warning!

Some of these rules may seem obvious to you…

…we still see them broken every day…

…if you know them, please bear with us


Understand Your Target Device

No two devices are identical

Even when they look the same

Different SoCs will have different bottlenecks

Make sure you test against different chips

Make sure you understand the hardware

You don’t want your optimisation to make things worse

Clearly, you’re already doing this… you’re here

Golden Rule 1


Don’t Waste GPU Time

The Principle of “Good Enough”

Don't waste polygons on unneeded detail

Textures should never be much larger than their size on screen

Why waste time loading a 1Kx1K texture if it’s never going to appear bigger than 128x128?

If the user won't notice it, don’t waste time processing it
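
The slide's 1Kx1K versus 128x128 example can be put into numbers. The helper below is a hypothetical sketch that picks the smallest power-of-two edge length covering the largest on-screen size; the power-of-two restriction is an assumption made for simplicity.

```python
def good_enough_size(max_on_screen_pixels):
    """Smallest power-of-two edge length that covers the on-screen size."""
    size = 1
    while size < max_on_screen_pixels:
        size *= 2
    return size

# How many times more texels a 1Kx1K texture holds than a 128x128 one.
texel_ratio = (1024 * 1024) // (128 * 128)
```

Here `texel_ratio` is 64, so the oversized texture carries 64 times the texel data of the size that is actually visible.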

Golden Rule 2


Promote Calculations up The Chain

Don’t do a calculation you don’t need to do

If you can do it once per scene, do it once per scene

If you can’t, try to do it per vertex

There are generally fewer vertices in a scene than fragments.

If you can, pre-bake

E.g. lighting

Remember, ‘Good Enough’
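
To illustrate the reasoning, the sketch below counts how often one calculation runs per frame depending on where it lives. The vertex count and the 1280x720 resolution are made-up figures, and the fragment count ignores overdraw and off-screen geometry.

```python
def evaluations_per_frame(stage, vertex_count, fragment_count):
    """How many times a calculation runs per frame at a given stage."""
    counts = {
        "scene": 1,                  # computed once on the CPU, passed as a uniform
        "vertex": vertex_count,      # computed in the vertex shader
        "fragment": fragment_count,  # computed in the fragment shader
    }
    return counts[stage]

vertex_count = 50_000         # hypothetical scene
fragment_count = 1280 * 720   # one fragment per pixel, no overdraw

per_scene = evaluations_per_frame("scene", vertex_count, fragment_count)
per_vertex = evaluations_per_frame("vertex", vertex_count, fragment_count)
per_fragment = evaluations_per_frame("fragment", vertex_count, fragment_count)
```

Under these assumed numbers, moving a calculation from the fragment shader to the vertex shader cuts its evaluations from 921,600 to 50,000 per frame.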

Golden Rule 3


Don’t Access an Active Render Target

Accessing a render target from the CPU is very bad for performance

If it’s not done properly it will synchronise the GPU and CPU… This is Bad™

Golden Rule 4


Accessing Render Targets Safely

Use EGL_KHR_fence_sync

Use CPU side handles to GPU mapped memory to avoid blocking calls

E.g. GraphicBuffer (or gralloc) on Android

Golden Rule 4 Cont.


Avoid Updating Active Assets

Assets may need to stay the same for multiple frames

We refer to this as an asset’s ‘Lifespan’

Golden Rule 5

Changing a texture during its lifespan may cause ‘Ghosting’

Changing a buffer during its lifespan is blocking

This can be managed using circular buffers, similarly to render targets
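
A circular buffer scheme can be sketched as below. This is a CPU-side model only: the `BufferRing` class is hypothetical, the ring depth of three is an assumed value, and a real application would still need to confirm (for example with a fence) that a slot is no longer in use before overwriting it.

```python
class BufferRing:
    """Rotate through `count` copies of a dynamic buffer so the CPU never
    writes the copy that in-flight frames may still be reading."""

    def __init__(self, count, size):
        self.buffers = [bytearray(size) for _ in range(count)]
        self.frame = 0

    def begin_frame(self):
        """Return (slot, buffer) for this frame's writes."""
        slot = self.frame % len(self.buffers)
        self.frame += 1
        return slot, self.buffers[slot]

ring = BufferRing(count=3, size=16)
slots = []
for frame in range(5):
    slot, buf = ring.begin_frame()
    buf[0] = frame  # stand-in for uploading this frame's data
    slots.append(slot)
```

The slots cycle 0, 1, 2, 0, 1, so a buffer written in one frame is not touched again until two further frames have been submitted.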


Use VBOs and Indexed Geometry

VBOs benefit from driver level optimisations

Vertex Array Objects (VAOs) may be even better

Index your geometry

It makes your data smaller

It also benefits from driver level optimisations

Use static VBOs ideally, and consider the asset’s lifespan

Don’t use a VBO for dynamic data
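
Indexing can be illustrated with a small de-duplication pass. This is a sketch with a hypothetical helper, `build_index_buffer`, which assumes vertices are hashable tuples and treats only exactly equal vertices as shared.

```python
def build_index_buffer(vertices):
    """Collapse repeated vertices into a unique list plus indices."""
    unique = []
    lookup = {}
    indices = []
    for vertex in vertices:
        if vertex not in lookup:
            lookup[vertex] = len(unique)
            unique.append(vertex)
        indices.append(lookup[vertex])
    return unique, indices

# A quad drawn as two triangles: six vertices, two of them repeated.
quad = [
    (0.0, 0.0), (1.0, 0.0), (1.0, 1.0),
    (0.0, 0.0), (1.0, 1.0), (0.0, 1.0),
]
unique, indices = build_index_buffer(quad)
```

The quad shrinks from six stored vertices to four plus six small indices; the saving grows as vertices carry more attributes and meshes share more corners.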

Golden Rule 6


Batch Your Draw Calls

Group static objects, and draw once

Static objects are objects that are static relative to each other

Sort objects by render state

Emphasis on texture and program state changes

Try using texture atlases

Remember Golden Rule 5 if you’re going to update the contents
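
Sorting by render state can be sketched as follows. The draw list, its field names and the counting function are hypothetical; the model counts only program and texture binds and assumes nothing is bound at the start.

```python
def count_state_changes(draws):
    """Count program and texture binds needed to issue draws in order."""
    changes = 0
    program = texture = None
    for draw in draws:
        if draw["program"] != program:
            program = draw["program"]
            changes += 1
        if draw["texture"] != texture:
            texture = draw["texture"]
            changes += 1
    return changes

draws = [
    {"name": "crate_a", "program": "lit", "texture": "wood"},
    {"name": "sign", "program": "unlit", "texture": "text"},
    {"name": "crate_b", "program": "lit", "texture": "wood"},
    {"name": "label", "program": "unlit", "texture": "text"},
]
sorted_draws = sorted(draws, key=lambda d: (d["program"], d["texture"]))
```

In this example the unsorted list needs eight binds and the sorted list four, and draws that now share identical state become candidates for merging into one call.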

Golden Rule 7


Compress Your Textures

The lower the bitrate the less bandwidth consumed

Use PVRTC & PVRTC2, at 2 & 4bpp RGB/RGBA

Don’t confuse this with PNG or JPG which are decompressed in memory

Usually to 24bpp or 32bpp

PVRTC is read directly from the compressed form

It stays in memory at 2bpp or 4bpp

Use MIP-Mapping and remember ‘Good Enough’
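
The memory difference is easy to estimate. The sketch below is an approximation: the 1024x1024 size is an assumed example, and the calculation ignores any minimum data size a compressed format may impose on very small MIP levels.

```python
def texture_bytes(width, height, bits_per_pixel, mipmaps=False):
    """Approximate storage for a texture, optionally with a full MIP chain."""
    total_bits = 0
    while True:
        total_bits += width * height * bits_per_pixel
        if not mipmaps or (width == 1 and height == 1):
            break
        width, height = max(1, width // 2), max(1, height // 2)
    return total_bits // 8

rgba8888 = texture_bytes(1024, 1024, 32)  # decoded PNG/JPG at 32bpp
pvrtc4 = texture_bytes(1024, 1024, 4)     # PVRTC at 4bpp
pvrtc2 = texture_bytes(1024, 1024, 2)     # PVRTC at 2bpp
```

Under these assumptions the 32bpp texture takes 4 MiB, the 4bpp one 512 KiB and the 2bpp one 256 KiB, and every texture fetch moves correspondingly less data.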

Golden Rule 8


Alpha Test/Discard & Alpha Blend

Alpha Test removes advantages of ‘Early-Z’ techniques and HSR

Fragment visibility isn’t known until fragment shader is run

Prefer Alpha Blending, and render in the order Opaque, Alpha Test, Alpha Blend

Makes best use of HSR
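
The recommended submission order can be sketched as a stable sort on a per-draw category. The category names and the draw list are hypothetical; blended draws keep their relative order, which matters if the application has already sorted them by depth.

```python
PASS_ORDER = {"opaque": 0, "alpha_test": 1, "alpha_blend": 2}

def order_for_hsr(draws):
    """Stable-sort draws into opaque, alpha-test, alpha-blend passes."""
    return sorted(draws, key=lambda draw: PASS_ORDER[draw[1]])

draws = [
    ("window", "alpha_blend"),
    ("fence", "alpha_test"),
    ("wall", "opaque"),
    ("smoke", "alpha_blend"),
    ("floor", "opaque"),
]
ordered = [name for name, _ in order_for_hsr(draws)]
```

The result is wall, floor, fence, window, smoke: all opaque geometry first, then alpha-tested, then blended.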

Golden Rule 9


Use ‘Clear’ and ‘DiscardFrameBuffer’

Calling ‘Clear’ ensures the previous render isn’t uploaded to the GPU

By default, the depth/stencil buffers are written to memory at the end of a render

Calling glDiscardFramebufferEXT(…) ensures these buffers aren’t written to system memory

Look for the ‘GL_EXT_discard_framebuffer’ extension

Do both if you can!
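
To get a feel for what skipping the depth/stencil write-back could save, the sketch below estimates the traffic involved. The 1280x720 resolution, 4 bytes of packed depth/stencil per pixel and 60 frames per second are assumed figures, not taken from the slides.

```python
def depth_stencil_write_bytes(width, height, bytes_per_pixel=4, frames_per_second=60):
    """Bytes per second written out if depth/stencil is stored every frame."""
    return width * height * bytes_per_pixel * frames_per_second

per_second = depth_stencil_write_bytes(1280, 720)
megabytes_per_second = per_second / 1_000_000
```

With these assumptions, roughly 221 MB of writes per second are avoided for a single render target if the buffers are discarded.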

Golden Rule 10


Questions?

Catch us between sessions and ask

Or Visit Booth 512 on the GDC2013 Expo Floor

Or drop us an email: [email protected]

and visit www.powervrinsider.com for latest info
