© Imagination Technologies p1 www.imgtec.com
March 2013
The Architecture of High-End Mobile Graphics Hardware
<1 Minute Introduction to Imagination Technologies
IP Licensing Company – anything you need to build an SoC
Focus on PowerVR Graphics and Caustic Ray Tracing at GDC 2013
[Diagram: Imagination IP inside a customer’s SoC. Labelled blocks: PowerVR Graphics, PowerVR Video, PowerVR Display, PowerVR Vision, Ensigma communications, MIPS processors, Meta Audio, Caustic Ray Tracing Solutions, HelloSoft VoLTE & V.VoIP, Flow Cloud connectivity, IMGworks customised IP, Customer & 3rd Party IP. Other labels: Audio, Pixels, TV & Radio, Camera]
Immediate Mode Renderer (IMR)
Buffers kept in system memory
High bandwidth use, power consumption & latency
Each triangle is processed to completion in submission order
Wastes processing time and thus power due to “overdraw”
‘Early-Z’ techniques help but are only as good as your geometry sorting
Concept: Tiling
Frame buffer sub-divided into Tiles
32x32 pixels per tile, for example
Varies by device
Geometry is sorted into affected tiles
Allows each tile to be processed independently
Small number of fragments per tile
Allows on-chip memory to be used
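The sorting of geometry into tiles can be sketched as follows. This is only an illustration, assuming 32x32 tiles and a conservative bounding-box test; the function name `bin_triangles` and the data layout are hypothetical and are not taken from the talk or from any PowerVR implementation.

```python
from collections import defaultdict

TILE_SIZE = 32  # pixels per tile edge; an example value, real devices vary


def bin_triangles(triangles, width, height, tile_size=TILE_SIZE):
    """Map each tile (tx, ty) to the indices of triangles that may touch it.

    Uses a conservative bounding-box test, so a triangle can be listed in a
    tile that it does not actually cover.
    """
    tiles_x = (width + tile_size - 1) // tile_size
    tiles_y = (height + tile_size - 1) // tile_size
    bins = defaultdict(list)
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Skip triangles whose bounding box lies entirely off screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue
        # Clamp the bounding box to the screen and convert to tile coordinates.
        tx0 = max(0, int(min(xs)) // tile_size)
        ty0 = max(0, int(min(ys)) // tile_size)
        tx1 = min(tiles_x - 1, int(max(xs)) // tile_size)
        ty1 = min(tiles_y - 1, int(max(ys)) // tile_size)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)
    return dict(bins)


triangles = [
    [(2, 2), (20, 4), (10, 25)],     # bounding box fits inside tile (0, 0)
    [(30, 30), (40, 30), (35, 40)],  # bounding box straddles four tiles
]
bins = bin_triangles(triangles, width=64, height=64)
```

Once every triangle has been binned, each tile's list can be rasterized on its own, which is what makes small on-chip buffers sufficient.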
Tile Based Renderer (TBR)
Rasterizing performed per-tile
Allows the use of fast, on-chip, buffers
Each triangle is processed to completion in submission order
Wastes processing time and thus power due to “overdraw”
‘Early-Z’ techniques help but are only as good as your geometry sorting
Concept: Deferred Rendering
Fragments - Two stage process
Hidden Surface Removal (HSR)
Shading
HSR is pixel perfect
Only visible fragments pass, no ‘overdraw’
Only requires position data
Less bandwidth & processing, saves power
HSR is submission order independent
No need for applications to submit geometry front to back
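To make the overdraw argument concrete, the toy model below counts how many fragments get shaded under each approach. It is a simplified sketch that assumes opaque fragments only and represents a fragment as a hypothetical `(pixel, depth)` pair; it does not model real hardware.

```python
def shaded_with_early_z(fragments):
    """Count shaded fragments when each is depth-tested, then shaded, in submission order."""
    depth = {}
    shaded = 0
    for pixel, z in fragments:
        if z < depth.get(pixel, float("inf")):
            depth[pixel] = z
            shaded += 1  # passed the depth test, so it is shaded immediately
    return shaded


def shaded_with_deferred_hsr(fragments):
    """Count shaded fragments when visibility is resolved for all fragments first."""
    nearest = {}
    for pixel, z in fragments:
        if z < nearest.get(pixel, float("inf")):
            nearest[pixel] = z
    return len(nearest)  # exactly one shaded fragment per covered pixel


pixels = range(4)
# Three opaque layers covering the same four pixels.
back_to_front = [(p, z) for z in (0.9, 0.5, 0.1) for p in pixels]
front_to_back = [(p, z) for z in (0.1, 0.5, 0.9) for p in pixels]

early_z_worst = shaded_with_early_z(back_to_front)  # every layer is shaded
early_z_best = shaded_with_early_z(front_to_back)   # only the nearest layer
deferred = shaded_with_deferred_hsr(back_to_front)  # independent of order
```

In this model the early-Z count depends on how well the application sorted its geometry, while the deferred count is the same for either submission order.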
Tile Based Deferred Renderer (TBDR) = PowerVR
Rasterizing performed per-tile
Allows the use of fast, on-chip, buffers
Hidden Surface Removal (HSR) reduces overdraw
Pixel perfect, and submission order independent, no geometry sorting needed
Optimised to only retrieve information required, saving even more bandwidth
Saves power and bandwidth
PowerVR Series5 Implementation
OpenGL ES 2.0 Shader Based GPU
[Block diagram: PowerVR SGX. Labelled blocks: Vertex Data Master (Geometry), Pixel Data Master (ISP), General Purpose Data Master, Coarse Grain Scheduler (CGS), Universal Scalable Shader Engine (Thread Schedulers and Multi-Threaded Execution Units), Tiling Co-Processor (TA), Texturing Co-Processor, Pixel Co-Processor, Multi-level Cache, Control and Register Bus, Host CPU Interface, Host CPU Bus, System Memory Interface, System Memory Bus]
PowerVR Series5XT Implementation
More Performance & More Features
[Block diagram labels: Multi-Core Logic, Task Distribution, Crossbar, System Level Cache, Master Bus Interfaces]
Mobile Device Graphics Performance Evolution
Ever more graphics performance at an ever faster pace…
[Chart: relative graphics performance by year, 2009 to 2014; vertical axis scaled up to 300x]
Nothing is free in 3D Graphics…
More performance costs more power!
PowerVR Series6 “Rogue” Implementation
OpenGL ES 3.0 and Focus on Power Efficiency
[Block diagram: Series 6 ‘Rogue’ GPU. Labelled blocks: Core Management Unit (META), Vertex Data Master, Compute Data Master, Pixel Data Master, Coarse Grain Scheduler, Unified Shading Cluster Array (USC0 to USCn, with Texture Units), Tiling Co-Processor, Tessellation Co-Processor, Pixel Co-Processor, 2D Core (PTLA), Multi-level Memory Cache Unit (MCU), Control and Register Bus, Host CPU Interface, Host CPU Bus, System Memory Interface, System Memory Bus]
OpenGL ES 3.0 – Power Efficiency Focus
Single Geometry Submission fills “Multiple Render Targets”
Reduces bandwidth and geometry processing = lower power & higher efficiency
[Images: deferred shading using MRTs; cartoon rendering using MRTs]
OpenGL ES 3.0 – Power Efficiency Focus
Instancing enables efficient drawing of many copies of the same object
Reduced CPU workload (API calls) resulting in lower power and higher efficiency
[Images: instancing of buildings for city rendering; instancing of leaves]
OpenGL ES 3.0 – Power Efficiency Focus
Transform Feedback allows write out of vertex shader results to memory
Re-using “cached” results equates to higher efficiency by avoiding duplicate calculations
[Images: transform feedback used to cache morphing and skinning animation; transform feedback used to implement physics-based animation]
ES1.1/2.0 Market Opportunity for Developers
More than 1 Billion PowerVR devices!
ES3.0 Market Opportunity for Developers
Nothing much yet…
PowerVR Market Opportunity for Developers
Enable Near OpenGL ES 3.0 Feature Set on all Series5XT Devices!
OpenGL ES 3.0 Extension List for Series5XT
Start developing today - Full Support in SDK 3.1r2 (coming soon)
Core ES3.0 Feature                 | Series5XT with DDK1.10
Vertex Texturing                   | Yes (All SGX cores)
FP32 ALU Precision (HighP)         | Yes (All SGX cores)
EXT_draw_buffers                   | Yes, enables MRTs
EXT_occlusion_query_boolean        | Yes
OES_texture_float                  | Yes
OES_texture_half_float             | Yes
EXT_texture_rg                     | Yes
EXT_texture_mixmax                 | Yes
EXT_multisampled_render_to_texture | Yes
GL_IMG_uniform_buffer_object       | Yes
More features to come…             | Yes…
Golden Rules
Common Bottlenecks
Based on past observation
Most Likely
CPU Usage
Bandwidth Usage
CPU/GPU Synchronisation
Fragment Shader Instructions
Geometry Upload
Texture Upload
Vertex Shader Instructions
Geometry Complexity
Least Likely
Use our PVRTune and PVRTrace Tools to help identify The Bottleneck
Warning!
Some of these rules may seem obvious to you…
…we still see them broken every day…
…if you know them, please bear with us
Understand Your Target Device
No two devices are identical
Even when they look the same
Different SoCs will have different bottlenecks
Make sure you test against different chips
Make sure you understand the hardware
You don’t want your optimisation to make things worse
Clearly, you’re already doing this… you’re here
Golden Rule 1
Don’t Waste GPU Time
The Principle of “Good Enough”
Don't waste polygons on un-needed detail
Textures should never be much larger than their size on screen
Why waste time loading a 1Kx1K texture if it’s never going to appear bigger than 128x128?
If the user won't notice it, don’t waste time processing it
Golden Rule 2
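The texture-size point can be checked with a little arithmetic. The sketch below assumes square, power-of-two, uncompressed 32bpp textures; the helper names are hypothetical and exist only for this illustration.

```python
def smallest_pot_at_least(pixels):
    """Smallest power-of-two edge length that is at least `pixels`."""
    size = 1
    while size < pixels:
        size *= 2
    return size


def rgba8_bytes(edge):
    """Bytes for an uncompressed square texture at 32 bits (4 bytes) per texel."""
    return edge * edge * 4


max_on_screen = 128                            # largest size the texture is ever drawn at
needed = smallest_pot_at_least(max_on_screen)  # edge length that is good enough
authored = 1024                                # the 1Kx1K texture from the slide
saving = rgba8_bytes(authored) // rgba8_bytes(needed)
```

Under these assumptions the 1Kx1K texture holds 64 times as much data as the 128x128 version that would look the same on screen.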
Promote Calculations up The Chain
Don’t do a calculation you don’t need to do
If you can do it once per scene, do it once per scene
If you can’t, try to do it per vertex
There are generally fewer vertices in a scene than fragments.
If you can, pre-bake
E.g. lighting
Remember, ‘Good Enough’
Golden Rule 3
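As a small illustration of moving work up the chain, the sketch below computes a simple diffuse lighting term. It is plain Python standing in for shader code, and the function names are hypothetical. The light direction is the same for every vertex, so it is normalized once per scene instead of once per vertex.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def diffuse_per_vertex(normals, light_dir):
    """Diffuse term for each (already unit-length) vertex normal."""
    # Per scene: identical for every vertex, so do it once, outside the loop.
    unit_light = normalize(light_dir)
    # Per vertex: only the dot product depends on the vertex.
    return [
        max(0.0, sum(n_i * l_i for n_i, l_i in zip(normal, unit_light)))
        for normal in normals
    ]


normals = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
intensities = diffuse_per_vertex(normals, light_dir=(0.0, 0.0, 2.0))
```

The same reasoning applies one level down: a value that can be computed per vertex and interpolated does not need to be recomputed for every fragment.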
Don’t Access an Active Render Target
Accessing a render target from the CPU is very bad for performance
If it’s not done properly, it will synchronise the GPU and CPU… This is Bad™
Golden Rule 4
Accessing Render Targets Safely
Use EGL_KHR_fence_sync
Use CPU side handles to GPU mapped memory to avoid blocking calls
E.g. GraphicBuffer (or gralloc) on Android
Golden Rule 4 Cont.
Avoid Updating Active Assets
Assets may need to stay the same for multiple frames
We refer to this as an asset’s ‘Lifespan’
Changing a texture during its lifespan may cause ‘Ghosting’
Changing a buffer during its lifespan is blocking
This can be managed using circular buffers, similarly to render targets
Golden Rule 5
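A circular buffer scheme can be sketched as below. This is a minimal model, assuming the GPU may still be reading the data written during the previous `frames_in_flight` frames; that number and the `BufferRing` class are hypothetical, and the real lifespan of an asset depends on the device and driver.

```python
class BufferRing:
    """Rotate through several copies of a dynamic asset instead of rewriting one."""

    def __init__(self, frames_in_flight):
        # One slot per frame that may still be in use, plus one to write into.
        self.slots = [None] * (frames_in_flight + 1)
        self.frame = 0

    def write(self, data):
        """Store this frame's data in the oldest slot and return its index."""
        index = self.frame % len(self.slots)
        self.slots[index] = data
        self.frame += 1
        return index


ring = BufferRing(frames_in_flight=2)
indices = [ring.write(f"frame {i}") for i in range(5)]
```

With two frames in flight the ring has three slots, so a slot is only overwritten three frames after it was last written, by which point this model assumes its lifespan has ended.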
Use VBOs and Indexed Geometry
VBOs benefit from driver level optimisations
Vertex Array Objects (VAOs) may be even better
Index your geometry
It makes your data smaller
It also benefits from driver level optimisations
Ideally, use static VBOs, and consider the asset’s lifespan
Don’t use a VBO for dynamic data
Golden Rule 6
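Indexing can be illustrated with a small helper that turns a flat triangle list into unique vertices plus an index list. The function name `build_index_buffer` is hypothetical, and the sketch assumes that vertices are hashable tuples and that two vertices are shared only when all their attributes match exactly.

```python
def build_index_buffer(triangle_vertices):
    """Deduplicate a flat vertex list; return (unique_vertices, indices)."""
    unique = []
    lookup = {}
    indices = []
    for vertex in triangle_vertices:
        if vertex not in lookup:
            lookup[vertex] = len(unique)  # first time seen: assign the next index
            unique.append(vertex)
        indices.append(lookup[vertex])
    return unique, indices


# A quad drawn as two triangles: six submitted vertices, only four distinct.
quad = [(0, 0), (1, 0), (1, 1), (0, 0), (1, 1), (0, 1)]
vertices, indices = build_index_buffer(quad)
```

Here six submitted vertices shrink to four stored vertices plus six small indices, and the saving grows with meshes where many triangles share each vertex.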
Batch Your Draw Calls
Group static objects, and draw once
Static objects are objects that are static relative to each other
Sort objects by render state
Emphasis on texture and program state changes
Try using texture atlases
Remember Golden Rule 5 if you’re going to update the contents
Golden Rule 7
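The effect of sorting by render state can be seen in a toy model that counts program and texture switches. The draw list, its `(program, texture, mesh)` layout and all names in it are hypothetical, and the sketch assumes opaque objects whose draw order does not affect the final image.

```python
def count_state_changes(draws):
    """Count program and texture switches needed to issue draws in the given order."""
    changes = 0
    current_program = None
    current_texture = None
    for program, texture, _mesh in draws:
        if program != current_program:
            changes += 1
            current_program = program
        if texture != current_texture:
            changes += 1
            current_texture = texture
    return changes


draws = [
    ("lit", "brick", "wall_a"),
    ("unlit", "sky", "skybox"),
    ("lit", "wood", "crate"),
    ("lit", "brick", "wall_b"),
]
# Sort by program first, then by texture, so equal states end up adjacent.
sorted_draws = sorted(draws, key=lambda draw: (draw[0], draw[1]))

unsorted_changes = count_state_changes(draws)
sorted_changes = count_state_changes(sorted_draws)
```

After sorting, the two brick walls sit next to each other with identical state, which also makes them natural candidates for merging into a single draw call.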
Compress Your Textures
The lower the bitrate the less bandwidth consumed
Use PVRTC & PVRTC2, at 2 & 4bpp RGB/RGBA
Don’t confuse this with PNG or JPG, which are decompressed in memory
Usually to 24bpp or 32bpp
PVRTC is read directly from the compressed form
It stays in memory at 2bpp or 4bpp
Use MIP-Mapping and remember ‘Good Enough’
Golden Rule 8
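The bitrates above translate directly into memory footprint. The sketch below compares the base mip level of a 1024x1024 texture at each bitrate; the helper name is hypothetical, and the calculation ignores MIP levels and any minimum block size a compressed format may impose.

```python
def base_level_bytes(width, height, bits_per_pixel):
    """Bytes needed for the top mip level at the given bitrate."""
    return width * height * bits_per_pixel // 8


uncompressed = base_level_bytes(1024, 1024, 32)  # e.g. a decoded PNG as RGBA
pvrtc_4bpp = base_level_bytes(1024, 1024, 4)
pvrtc_2bpp = base_level_bytes(1024, 1024, 2)

ratio_4bpp = uncompressed // pvrtc_4bpp
ratio_2bpp = uncompressed // pvrtc_2bpp
```

At these sizes the 4bpp texture needs one eighth of the memory of the 32bpp one, and the 2bpp texture one sixteenth, with a matching reduction in the data fetched when the texture is sampled.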
Alpha Test/Discard & Alpha Blend
Alpha Test removes advantages of ‘Early-Z’ techniques and HSR
Fragment visibility isn’t known until fragment shader is run
Prefer Alpha Blending, and render in the order Opaque, Alpha Test, Alpha Blend
Makes best use of HSR
Golden Rule 9
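The recommended render order can be expressed as a sort key. In this sketch the draw list and its `(name, mode, depth)` layout are hypothetical. Drawing the blended objects back to front is a common convention that is assumed here, not something stated in the talk.

```python
PASS_ORDER = {"opaque": 0, "alpha_test": 1, "alpha_blend": 2}


def order_draws(draws):
    """Order draws as opaque, then alpha test, then alpha blend."""
    def key(draw):
        _name, mode, depth = draw
        # Assumption: blended objects go back to front (largest depth first).
        # Other passes keep their submission order because the sort is stable.
        secondary = -depth if mode == "alpha_blend" else 0
        return (PASS_ORDER[mode], secondary)

    return sorted(draws, key=key)


draws = [
    ("glass_near", "alpha_blend", 1.0),
    ("fence", "alpha_test", 5.0),
    ("wall", "opaque", 8.0),
    ("glass_far", "alpha_blend", 9.0),
    ("floor", "opaque", 3.0),
]
ordered_names = [name for name, _mode, _depth in order_draws(draws)]
```

Opaque geometry needs no depth sorting in this model, which matches the earlier point that HSR is independent of submission order.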
Use ‘Clear’ and ‘DiscardFrameBuffer’
Calling ‘Clear’ ensures the previous render isn’t uploaded to the GPU
By default, the depth/stencil buffers are written to memory at the end of a render
Calling glDiscardFramebufferEXT(…) ensures these buffers aren’t written to system memory
Look for the ‘GL_EXT_discard_framebuffer’ extension
Do both if you can!
Golden Rule 10
Questions?
Catch us between Sessions & Ask
Or Visit Booth 512 on the GDC2013 Expo Floor
Or drop us an email: [email protected]
and visit www.powervrinsider.com for latest info