In The Trenches Optimizing UE4 for Intel


Jeff Rous – Intel

Niklas Smedberg – Epic Games

In The Trenches: Optimizing UE4 For Intel

Intel Software – Developer Relations Division

Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.

Legal

Copyright © 2016 Intel Corporation. All rights reserved.

*Other names and brands may be claimed as the property of others.

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice.

All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.

Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Any code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user.

Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps.

Performance claims: Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.Intel.com/performance

Iris™ graphics is available on select systems. Consult your system manufacturer.

Intel, Intel Inside, the Intel logo, Intel Core and Iris are trademarks of Intel Corporation in the United States and other countries.

Agenda

Rationale

How We Measured

Common Pain Points

Shader Optimizations

Optimizing for DX12

Android x86/x64 Support

Why Work Together?

Benefits all games that use the engine

UE4 runs on more hardware

Intel holds an 18% GPU share as of the latest Steam survey

Optimizations help everyone – high end to phone

Common goals

Leading edge APIs like DX12 are going to power tomorrow’s games

Android is a large market and key for Epic and Intel

Intel® HD Graphics: Roadmap

Sandy Bridge (2011): Intel® 2nd Gen Core™ Processor
• 32nm
• Feature Level 10.1
• Up to 12 EUs

Ivy Bridge (2012): Intel® 3rd Gen Core™ Processor
• 22nm
• Feature Level 11.0
• Up to 16 EUs

Haswell (2013): Intel® 4th Gen Core™ Processor
• Feature Level 11.1
• DX Extensions
• GT3 (40 EUs)
• EDRAM
• Iris Pro™, Iris™ brands

Broadwell (2014): Intel® 5th Gen Core™ Processor
• 14nm
• Feature Level 11.2
• Up to 48 EUs

Skylake (2015-16): Intel® 6th Gen Core™ Processor
• Feature Level 12.0
• GT4 (72 EUs)
• GT3e 15/28W
• DX12 HW

Up to 30X faster graphics over the last 5 years

Intel® HD Graphics: EDRAM

Basic facts

• Located on the same package as the CPU
• 64-128 MB
• Bandwidth: 50 GB/sec each way (100 GB/sec total)
• Acts as a 4th-level cache
• Just works: no API is required to take advantage of it

Bandwidth Saving

Increasing compute requires more bandwidth

EDRAM helps to reduce BW consumption and improve EU efficiency

Just works but efficiency can be improved by re-using frame data

[Diagram: CPU package with an Intel 6th Gen Core™ chip, showing four CPU cores, the Gfx core, and the LL$ (last-level cache) on a ring bus, together with EDRAM and system memory.]

How We Measured – Intel GPA

Use the ToggleDrawEvents console command so that captures show named draw events

Frame debugging and live mode

Experiment!

How We Measured

ProfileGPU command

Stat commands

Windows Performance Analyzer

Intel Extreme Tuning Utility

Intel Pain Points – Memory Bandwidth

Memory bandwidth is at a premium with integrated graphics

G-buffers are memory hungry. UE4 is configurable: you can change the format of G-buffer channels, eliminate them, or even combine them. Scaling down the G-buffer resolution helps up to a point.
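To put rough numbers on the G-buffer cost, here is a minimal C++ sketch that estimates the bytes written when every G-buffer target is filled once per frame. The `RenderTargetDesc` struct, the function name, and the example layout (five 32-bit targets at 1920x1080) are illustrative assumptions, not UE4's actual G-buffer layout or API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one G-buffer render target.
struct RenderTargetDesc {
    std::uint32_t bytesPerPixel;
};

// Bytes needed to write every target once at the given resolution.
// screenPercentage models resolution scaling (100 = native resolution).
std::uint64_t gbufferBytesPerFrame(const std::vector<RenderTargetDesc>& targets,
                                   std::uint32_t width,
                                   std::uint32_t height,
                                   std::uint32_t screenPercentage) {
    assert(screenPercentage > 0 && screenPercentage <= 100);
    const std::uint64_t w = static_cast<std::uint64_t>(width) * screenPercentage / 100;
    const std::uint64_t h = static_cast<std::uint64_t>(height) * screenPercentage / 100;

    std::uint64_t bytesPerPixel = 0;
    for (const RenderTargetDesc& rt : targets) {
        bytesPerPixel += rt.bytesPerPixel;
    }
    return w * h * bytesPerPixel;
}
```

Under these assumptions, five 4-byte targets at 1920x1080 cost 41,472,000 bytes per frame of writes alone (about 2.5 GB/sec at 60 fps, before any reads). Dropping one target brings that to 33,177,600 bytes, and rendering at 50% resolution brings it to 10,368,000 bytes.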

Intel Pain Points – Dense Geometry

With sub-pixel triangles or very dense meshes, vertex shader execution cannot be hidden behind pixel shader execution, which starves the hardware. Use LODs where possible.

The clipper can become a bottleneck in the worst cases. At the very least, frustum-cull on bounding boxes, and use occlusion culling for hidden objects.
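As an illustration of the minimum recommended above, the following C++ sketch tests an axis-aligned bounding box against six frustum planes. It is a generic, conservative plane test, not UE4's culling code; all type and function names are hypothetical, and `makeBoxFrustum` builds a simple box-shaped "frustum" only so the example is easy to check.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

// Plane stored so that dot(n, p) + d >= 0 holds for points on the inside.
struct Plane { Vec3 n; float d; };

struct Aabb { Vec3 min, max; };

// Returns false only when the box lies fully outside at least one plane.
// Conservative: a box near a frustum corner may be kept even if invisible.
bool aabbIntersectsFrustum(const Aabb& box, const std::array<Plane, 6>& planes) {
    for (const Plane& p : planes) {
        // Pick the box corner that lies farthest along the plane normal.
        const Vec3 v{p.n.x >= 0.0f ? box.max.x : box.min.x,
                     p.n.y >= 0.0f ? box.max.y : box.min.y,
                     p.n.z >= 0.0f ? box.max.z : box.min.z};
        if (p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d < 0.0f) {
            return false;  // Even the farthest corner is outside: cull.
        }
    }
    return true;
}

// Hypothetical helper: a cube-shaped view volume covering [-h, h] on each axis.
std::array<Plane, 6> makeBoxFrustum(float h) {
    assert(h > 0.0f);
    std::array<Plane, 6> planes = {{
        Plane{Vec3{ 1.0f,  0.0f,  0.0f}, h}, Plane{Vec3{-1.0f,  0.0f,  0.0f}, h},
        Plane{Vec3{ 0.0f,  1.0f,  0.0f}, h}, Plane{Vec3{ 0.0f, -1.0f,  0.0f}, h},
        Plane{Vec3{ 0.0f,  0.0f,  1.0f}, h}, Plane{Vec3{ 0.0f,  0.0f, -1.0f}, h},
    }};
    return planes;
}
```

Objects rejected by this test never reach the vertex shader or the clipper, which is the point of doing it on the CPU first.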

A Word About Power

Intel graphics are typically found in low-power systems.

Less CPU usage leaves more of the shared power budget for graphics.

Shaders – Local Memory

64-byte cache lines benefit a great deal from loop unrolling.

Avoid small loads in tight loops.
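The advice above is about shader code, where an attribute such as HLSL's `[unroll]` would normally be used. As a CPU-side analogue only, this C++ sketch contrasts a loop that touches one float per iteration with a manually unrolled loop that consumes one 64-byte line (16 floats) per iteration. The function names are hypothetical, and real shader compilers and hardware will behave differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");

constexpr std::size_t kCacheLineBytes = 64;
constexpr std::size_t kFloatsPerLine = kCacheLineBytes / sizeof(float);  // 16

// One small load per iteration.
float sumOneAtATime(const std::vector<float>& data) {
    float total = 0.0f;
    for (float v : data) {
        total += v;
    }
    return total;
}

// Unrolled by 16: each iteration reads a full 64-byte span of the input.
float sumOneLinePerIteration(const std::vector<float>& data) {
    assert(data.size() % kFloatsPerLine == 0);
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    for (std::size_t i = 0; i < data.size(); i += kFloatsPerLine) {
        const float* p = &data[i];
        a0 += p[0] + p[4] + p[8]  + p[12];
        a1 += p[1] + p[5] + p[9]  + p[13];
        a2 += p[2] + p[6] + p[10] + p[14];
        a3 += p[3] + p[7] + p[11] + p[15];
    }
    return (a0 + a1) + (a2 + a3);
}
```

Both functions compute the same sum for exactly representable inputs; the unrolled form simply gives the compiler and memory system whole lines to work with instead of many small loads.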

Shaders – Branching and Sampling

Using lots of temporaries can starve the hardware.

Branching is expensive if the loads are inside the conditional blocks.

Group the loads as early in the shader as possible to help cover latency.
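The pattern can be sketched in C++ as a stand-in for shader code. In a real shader the loads would be texture or buffer fetches; here plain vectors play that role, and both function names are hypothetical. The first version keeps each load inside its branch, and the second issues both loads up front and then selects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Loads live inside the branches: neither fetch can start until the
// condition has been resolved.
float shadeLoadsInBranches(const std::vector<float>& texA,
                           const std::vector<float>& texB,
                           std::size_t i, bool useB) {
    assert(i < texA.size() && i < texB.size());
    if (useB) {
        return texB[i] * 2.0f;
    }
    return texA[i] * 0.5f;
}

// Loads grouped at the top: both fetches are issued together, so their
// latency can overlap with each other and with the math that follows.
float shadeLoadsUpFront(const std::vector<float>& texA,
                        const std::vector<float>& texB,
                        std::size_t i, bool useB) {
    assert(i < texA.size() && i < texB.size());
    const float a = texA[i];
    const float b = texB[i];
    return useB ? b * 2.0f : a * 0.5f;
}
```

The trade-off is one extra load and one extra temporary per invocation, so this has to be balanced against the warning above about using lots of temporaries.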

Demo - DX12 In Engine Metrics

DX12 Performance – Fast Clear

Specify the optional optimized clear value (a D3D12_CLEAR_VALUE) when calling CreateCommittedResource

Intel hardware has a fast clear path for clear values in which each channel is 0 or 1, e.g. (1,0,1,0)

When clearing, use the color specified up front for maximum performance.

~9% performance gain on Elemental Demo on DX12!

In the engine today
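The two conditions on this slide can be modeled in a few lines of C++. This is not Direct3D 12 code: `ClearColor` stands in for the four-float color of the clear value passed at resource creation, and `RenderTargetInfo`, `makeRenderTarget`, and `clearCanUseFastPath` are hypothetical names. The sketch also assumes a normalized color target, and it encodes our reading of the slide rather than documented driver behavior.

```cpp
#include <array>
#include <cassert>

// RGBA clear color; stand-in for the color supplied at resource creation.
using ClearColor = std::array<float, 4>;

// True when every channel is exactly 0 or 1, e.g. (1, 0, 1, 0).
bool isZeroOrOnePerChannel(const ClearColor& c) {
    for (float ch : c) {
        if (ch != 0.0f && ch != 1.0f) {
            return false;
        }
    }
    return true;
}

// Hypothetical record of the clear color chosen when the target was created.
struct RenderTargetInfo {
    ClearColor optimizedClear;
};

RenderTargetInfo makeRenderTarget(const ClearColor& optimizedClear) {
    for (float ch : optimizedClear) {
        assert(ch >= 0.0f && ch <= 1.0f);  // assumption: normalized color target
    }
    return RenderTargetInfo{optimizedClear};
}

// A clear is a fast-path candidate when it reuses the creation-time color
// and that color has only 0 or 1 in each channel.
bool clearCanUseFastPath(const RenderTargetInfo& rt, const ClearColor& requested) {
    return requested == rt.optimizedClear && isZeroOrOnePerChannel(requested);
}
```

A check like this could run in a debug build to flag clears that use a different color than the one given at creation.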

DX12 Performance – Root Signature

Blueprint of resources available

Root constants

Root descriptors

Descriptor tables

Constants that sit directly in the root signature are copied to each invocation of the shader (pushed) rather than read from memory when used (pulled)

Can significantly speed up shader execution

Automatically handled by driver in DX11
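Because pushed constants compete for limited root signature space, it helps to budget them. Direct3D 12 caps a root signature at 64 DWORDs, where each 32-bit root constant costs 1 DWORD, each root descriptor costs 2 DWORDs, and each descriptor table costs 1 DWORD. The C++ sketch below does that arithmetic; the `RootLayout` struct and function names are hypothetical and do not call the D3D12 API.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kMaxRootSignatureDwords = 64;

// Hypothetical summary of what a root signature contains.
struct RootLayout {
    std::uint32_t rootConstants32Bit;  // each costs 1 DWORD
    std::uint32_t rootDescriptors;     // each costs 2 DWORDs
    std::uint32_t descriptorTables;    // each costs 1 DWORD
};

constexpr std::uint32_t rootSignatureDwords(const RootLayout& layout) {
    return layout.rootConstants32Bit + 2 * layout.rootDescriptors +
           layout.descriptorTables;
}

constexpr bool fitsInRootSignature(const RootLayout& layout) {
    return rootSignatureDwords(layout) <= kMaxRootSignatureDwords;
}

// How many more 32-bit constants could still be pushed through the root.
std::uint32_t remainingRootConstants(const RootLayout& layout) {
    assert(fitsInRootSignature(layout));
    return kMaxRootSignatureDwords - rootSignatureDwords(layout);
}
```

For example, a layout with 16 root constants, 2 root descriptors, and 3 descriptor tables uses 23 DWORDs and leaves room for 41 more pushed constants.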

Video - GPA Live Metrics on Android

Android x86/x64 Support

Native apps reduce CPU load, startup times and power consumption

Supported in UE4 today through editor menu

Requires source build

Package as fat or separated APKs

Popular toolchains support x86/x64

Intel INDE

Google NDK

Nvidia CodeWorks

OpenGL ES 3.1 + Android Extension Pack

Supported on latest Intel tablets (Acer Predator 8, Lenovo Yoga Tab 3)

Enabled in UE4 for highest end mobile visuals

Runs with deferred renderer

ASTC textures

PC features are now on mobile

Compute shaders

Indirect drawing

And Announcing…Fast ASTC compression

Next gen format (OpenGL ES, Vulkan)

Very good compression on RGB/RGBA for a variety of block sizes

UE4 is adding support for Intel’s fast texture compressor for ASTC

44x speed improvement

Quality comparable to ARM compressor

UE4 uses Intel’s BC6H/BC7 compressors already

Aiming for 4.12 release

ASTC Quality Comparison

Zoomed in portion of a 2048x2048 normal map

Original: 12 MB
ETC1: 2 MB
ASTC 6x6: 1.8 MB
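These sizes can be reproduced from the block layouts of the two formats. ASTC always stores a block in 128 bits (16 bytes) whatever its footprint, and ETC1 stores a 4x4 block in 64 bits (8 bytes). The C++ sketch below assumes a single mip level and a 24-bit RGB original, which is what the 12 MB figure implies; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kAstcBlockBytes = 16;  // 128 bits per ASTC block
constexpr std::uint64_t kEtc1BlockBytes = 8;   // 64 bits per 4x4 ETC1 block

// Size of a block-compressed image; partial blocks at the edges are
// rounded up to whole blocks.
std::uint64_t blockCompressedBytes(std::uint64_t width, std::uint64_t height,
                                   std::uint64_t blockW, std::uint64_t blockH,
                                   std::uint64_t bytesPerBlock) {
    assert(blockW > 0 && blockH > 0);
    const std::uint64_t blocksX = (width + blockW - 1) / blockW;
    const std::uint64_t blocksY = (height + blockH - 1) / blockH;
    return blocksX * blocksY * bytesPerBlock;
}

std::uint64_t astcBytes(std::uint64_t width, std::uint64_t height,
                        std::uint64_t blockW, std::uint64_t blockH) {
    return blockCompressedBytes(width, height, blockW, blockH, kAstcBlockBytes);
}

std::uint64_t etc1Bytes(std::uint64_t width, std::uint64_t height) {
    return blockCompressedBytes(width, height, 4, 4, kEtc1BlockBytes);
}

std::uint64_t uncompressedBytes(std::uint64_t width, std::uint64_t height,
                                std::uint64_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}
```

For the 2048x2048 map this gives 12,582,912 bytes uncompressed (12 MiB), 2,097,152 bytes for ETC1 (2 MiB), and 1,871,424 bytes for ASTC 6x6 (about 1.8 MiB), in line with the figures above.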

Wrap up

Test on Intel hardware early. UE4 is powerful, but it is easy to overwhelm even a high-end discrete card. With proper optimization, UE4 runs really well on Intel hardware.

Take advantage of scaling features in UE4 – Epic has done a lot of work to support lower end hardware.

UE4 is mobile ready – Take advantage of built-in Android x86/x64 and OpenGL ES 3.1 support in your games for better performance and visuals.

Links

Intel Developer Zone (software.intel.com)

Unreal Engine 4 (unrealengine.com)

Intel GPA (software.intel.com/en-us/gpa)

ISPC Texture Compressor sample (software.intel.com/en-us/articles/fast-ispc-texture-compressor-update)

Using Android x86 on UE4 (software.intel.com/en-us/articles/Unreal-Engine-4-with-x86-Support)

Intel Software Occlusion Culling sample (software.intel.com/en-us/articles/software-occlusion-culling)