Unity - Internals: memory and performance

Internals: Memory and Performance Codemotion Milano, 29/11/2014

by Marco Trivellato - In this presentation we will provide in-depth knowledge about the Unity runtime. The first part will focus on memory and how to deal with fragmentation and garbage collection. The second part will cover implementation details and their memory vs cycles tradeoffs in both Unity 4 and the upcoming Unity 5.

Transcript of Unity - Internals: memory and performance

Page 1: Unity - Internals: memory and performance

Internals: Memory and Performance

Codemotion

Milano, 29/11/2014

Page 2

About me

Field Engineer @ Unity Technologies

Past:

o Worked as Software Engineer on several games at EA, Next Level Games, Milestone

Page 3

Agenda

• Quick Update

• Memory Overview

• Memory vs Cycles

• Graphics

• Scripting

Page 4

Latest News

• New CEO

• Unity 4.6 / New UI

• Unity 5.0

• Support for Apple iOS 64 bit

• WebPlayer

Page 5

MEMORY OVERVIEW

Native and Managed Memory, Garbage Collection

Page 6

Memory Overview

• Native (internal)

– Asset data, game objects and components

– Engine internals

• Managed (Mono)

– Script objects (managed DLLs)

– Wrappers for Unity objects

• Native DLLs

– User’s and third-party DLLs

Page 7

Managed Memory Internals

• Allocates system heap blocks for internal allocator

• Will allocate new heap blocks when needed

• Garbage collector cleans up

• Heap blocks are kept in Mono for later use

– Memory can be given back to the system after a while

– …but it depends on the platform, so don’t count on it

• Fragmentation can cause new heap blocks to be allocated even though memory is not exhausted

Page 8

Reference vs Value Types

Value types (bool, int, float, struct, ...)

• Exist in stack memory when used as local variables

• De-allocated when removed from the stack

• No Garbage

Reference types (classes)

• Exist on the heap and are handled by the Mono/.NET GC

• De-allocated when no longer referenced

• Lots of Garbage

Page 9

Garbage Collection

• Roots are not collected in a GC.Collect

– Thread stacks

– CPU Registers

– GC Handles (used by Unity to hold onto managed objects)

– Static variables!!

• Collection time scales with managed heap size

– The more you allocate, the slower it gets

Page 10

Temporary Allocations

• Don’t use FindObjects or LINQ

• Use StringBuilder for string concatenation

• Reuse large temporary work buffers

• ToString() allocates a new string

• .tag allocates a string: use CompareTag() instead
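
Two of the tips above (StringBuilder and reusable work buffers) can be sketched in plain C#. This is a minimal illustration rather than code from the talk: ScoreLabel, Smoother and their members are hypothetical names, and nothing here depends on UnityEngine.

```csharp
using System;
using System.Text;

// Hypothetical helper: the names are illustrative, not from the talk.
public sealed class ScoreLabel
{
    // One builder reused across calls, instead of "Score: " + score + "/" + max.
    private readonly StringBuilder _builder = new StringBuilder(32);

    public string Format(int score, int max)
    {
        _builder.Length = 0;   // reset without allocating a new builder
        _builder.Append("Score: ").Append(score).Append('/').Append(max);
        return _builder.ToString();   // one result string, no intermediate concatenations
    }
}

public sealed class Smoother
{
    // Work buffer kept between calls so that a steady-state call allocates nothing.
    private float[] _scratch = new float[0];

    // Writes a 3-tap moving average of 'samples' back into 'samples'.
    public void Smooth(float[] samples)
    {
        if (_scratch.Length < samples.Length)
            _scratch = new float[samples.Length];   // grows only when needed

        Array.Copy(samples, _scratch, samples.Length);
        for (int i = 1; i < samples.Length - 1; i++)
            samples[i] = (_scratch[i - 1] + _scratch[i] + _scratch[i + 1]) / 3f;
    }
}
```

Keeping one ScoreLabel or Smoother instance alive per caller is what makes the reuse effective; creating a new one per frame would bring the allocations back.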

Page 11

Internal Temporary Allocations

Some Examples:

– GetComponents<T>

– Vector3[] Mesh.vertices

– Camera[] Camera.allCameras

– foreach

• does not allocate by definition

• However, there can be a small allocation, depending on the implementation of .GetEnumerator()

5.x: We are working on new non-allocating versions

Page 12

Data Layout

// Array of structs: the fields of each element are stored together
struct Stuff
{
    int a;
    float b;
    bool c;
    string name;
}

Stuff[] arrayOfStuff;

// Struct of arrays: each field is stored in its own contiguous array
int[] As;
float[] Bs;
bool[] Cs;
string[] names;
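
The two layouts above can be compared with a small sketch. It assumes the slide's point is memory locality when a loop touches only one field; the fields are made public and DataLayoutDemo is a hypothetical name, both only for the sake of the demo.

```csharp
using System;

// Array-of-structs element mirroring the slide's Stuff (fields public for the demo).
public struct Stuff
{
    public int a;
    public float b;
    public bool c;
    public string name;
}

public static class DataLayoutDemo
{
    // Array of structs: reading only 'b' still strides over a, c and name.
    public static float SumAoS(Stuff[] items)
    {
        float sum = 0f;
        for (int i = 0; i < items.Length; i++)
            sum += items[i].b;
        return sum;
    }

    // Struct of arrays: the 'b' values are packed back to back.
    public static float SumSoA(float[] bs)
    {
        float sum = 0f;
        for (int i = 0; i < bs.Length; i++)
            sum += bs[i];
        return sum;
    }
}
```

Both methods compute the same sum; the difference is how much unrelated data each loop pulls through the cache. Whether that matters in practice should be measured on the target device.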

Page 13

Memory Fragmentation

• Memory fragmentation is hard to account for

– Fully unload dynamically allocated content

– Switch to a blank scene before proceeding to next level

• This scene could have a hook where you may pause the game long enough to sample if there is anything significant in memory

• Ensure you clear out variables so GC.Collect will remove as much as possible

• Avoid allocations where possible

• Reuse objects where possible while a scene is playing

• Clear them out on map load to free the memory

Page 14

Wrappers: Disposable Types

Some objects used in scripts have large native backing memory in Unity

– Memory not freed for some time…

[Diagram: WWW (managed) and its native backing memory: decompression buffer, compressed file, decompressed file]

Page 15

Garbage Collection

• GC.Collect

– Runs on the main thread when

• Mono exhausts the heap space

• Or user calls System.GC.Collect()

• Finalizers

– Run on a separate thread

• Controlled by Mono

• Can have several seconds delay

• Unity native memory

– Dispose() cleans up internal memory

• Eventually called from finalizer

• Manually call Dispose() to clean up

[Diagram: main thread vs finalizer thread]

Main thread: www = null; new(someclass); //no more heap -> GC.Collect();

Finalizer thread: www.Dispose();
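
The difference between waiting for the finalizer and calling Dispose() yourself can be sketched with the standard .NET dispose pattern. NativeBuffer below is a hypothetical stand-in for a wrapper with native backing memory (such as WWW); it is not Unity's implementation.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for a Unity wrapper type with native backing memory.
public sealed class NativeBuffer : IDisposable
{
    private IntPtr _memory;

    public NativeBuffer(int bytes)
    {
        _memory = Marshal.AllocHGlobal(bytes);   // unmanaged, not tracked by the GC
    }

    public bool IsDisposed { get { return _memory == IntPtr.Zero; } }

    // Deterministic cleanup: frees the native block right now.
    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this);   // the finalizer is no longer needed
    }

    // Safety net: runs on the finalizer thread, possibly much later.
    ~NativeBuffer()
    {
        Release();
    }

    private void Release()
    {
        if (_memory != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_memory);
            _memory = IntPtr.Zero;
        }
    }
}

public static class DisposeDemo
{
    public static bool Run()
    {
        var buffer = new NativeBuffer(1024 * 1024);
        using (buffer)
        {
            // ... use the buffer ...
        }   // Dispose() is called here, even if an exception was thrown
        return buffer.IsDisposed;
    }
}
```

Setting a reference to null only makes the object eligible for collection; the native block in this sketch is released either by the explicit Dispose() or whenever the finalizer eventually runs.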

Page 16

Wrappers for Unity Objects

• Inherit from UnityEngine.Object

• Types:

– GameObject

– Assets: Texture2D, AudioClip, Mesh, etc…

– Components: MeshRenderer, Transform, MonoBehaviour

• Native memory is released when Destroy is called

Page 17

Best Practices

• Reuse objects: use object pools

• Prefer stack-based allocations: use struct instead of class

• System.GC.Collect can be used to trigger collection

• Calling it 6 times returns the unused memory to the OS

• Manually call Dispose to clean up immediately
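
The object pool advice can be illustrated with a minimal generic pool. This is a sketch under simple assumptions (single-threaded use, no size limit); Pool<T> and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Minimal generic pool; the type and member names are illustrative.
public sealed class Pool<T> where T : class
{
    private readonly Stack<T> _free = new Stack<T>();
    private readonly Func<T> _create;

    public Pool(Func<T> create, int prewarm)
    {
        _create = create;
        for (int i = 0; i < prewarm; i++)
            _free.Push(create());   // pay the allocation cost up front
    }

    public int FreeCount { get { return _free.Count; } }

    // Hands out a pooled instance, creating one only if the pool is empty.
    public T Get()
    {
        return _free.Count > 0 ? _free.Pop() : _create();
    }

    // Returns an instance to the pool instead of leaving it to the GC.
    public void Release(T item)
    {
        _free.Push(item);
    }
}
```

A caller would typically pre-warm the pool during loading, for example `new Pool<List<int>>(() => new List<int>(64), 8)`, and reset each item's state before releasing it.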

Page 18

MEMORY VS CYCLES

Writable Meshes, Static & Dynamic Batching

Page 19

Mesh Read/Write Option

• It allows you to modify the mesh at run-time

• If enabled, a copy of the Mesh will remain in system memory

• It is enabled by default

• In some cases, disabling this option will not reduce the memory usage

– Skinned meshes

– iOS

Page 20

Non-Uniform scaled Meshes

We need to correctly transform vertex normals

• Unity 4.x:

– transform the mesh on the CPU

– create an extra copy of the data

• Unity 5.0

– Scaled on GPU

– Extra memory no longer needed

Page 21

Static Batching

What is it?

• It’s an optimization that reduces the number of draw calls and state changes

How do I enable it?

• Enable it in the player settings and tag the object as static

Page 22

Static Batching (cont’d)

How does it work internally?

• Build-time: Vertices are transformed to world-space

• Run-time: Index buffer is created with indices of visible objects

Unity 5.0:

• Re-implemented static batching without copying of index buffers

• Beware of misleading stats

Page 23

Dynamic Batching

What is it?

• Similar to Static Batching, but it batches non-static objects at run-time

How do I enable it?

• In the player settings

• No need to tag objects: it works automatically…

Page 24

Dynamic Batching (cont’d)

How does it work internally?

• Objects are transformed to world space on the CPU

• Temporary vertex and index buffers (VB & IB) are created

• Rendered in one draw call
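
The three steps above can be approximated with a toy model. This is not Unity's implementation: MeshInstance and DynamicBatchSketch are hypothetical names, and System.Numerics types are used instead of UnityEngine ones so that the sketch runs outside the engine.

```csharp
using System.Collections.Generic;
using System.Numerics;

// Toy mesh instance: local-space vertices, triangle indices and a world matrix.
public sealed class MeshInstance
{
    public Vector3[] Vertices;
    public int[] Indices;
    public Matrix4x4 LocalToWorld;
}

public static class DynamicBatchSketch
{
    // Fills one temporary vertex buffer and one index buffer for all instances,
    // so that the result could be submitted as a single draw call.
    public static void Build(List<MeshInstance> instances,
                             List<Vector3> vertexBuffer, List<int> indexBuffer)
    {
        vertexBuffer.Clear();
        indexBuffer.Clear();
        for (int m = 0; m < instances.Count; m++)
        {
            MeshInstance instance = instances[m];
            int baseVertex = vertexBuffer.Count;

            // CPU transform to world space: the batch needs no per-object matrix.
            for (int v = 0; v < instance.Vertices.Length; v++)
                vertexBuffer.Add(Vector3.Transform(instance.Vertices[v], instance.LocalToWorld));

            // Indices are shifted so they point into the combined vertex buffer.
            for (int i = 0; i < instance.Indices.Length; i++)
                indexBuffer.Add(baseVertex + instance.Indices[i]);
        }
    }
}
```

The sketch makes the memory vs cycles tradeoff visible: the draw calls saved are paid for with a per-frame CPU transform and temporary buffers.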

Page 25

GRAPHICS

Render Paths, Command Buffers, Shadows

Page 26

Render Paths

• Vertex Lit

• Forward Rendering

– First pass for ambient + directional light

– One additional pass for each light hitting the object

• Deferred Lighting

– Two Geometry passes + Lighting

– GBuffer: Normal + Specular, Depth

Page 27

Deferred Shading

• New Render Path in Unity 5

• Only one Geometry pass

• On Platforms with MRTs

• Fallback is Forward Rendering

Page 28

Deferred Shading

Depth buffer + 4x32bit RTs:

• RT0: diffuse color (rgb), unused (a)

• RT1: spec color (rgb), roughness (a)

• RT2: normal (rgb), unused (a). 10.10.10.2 when available.

• RT3: emission/light (rgb), unused (a)

• Z: depth buffer & stencil
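
The memory cost of this layout follows from simple arithmetic. The sketch below assumes the depth and stencil buffer takes 32 bits per pixel (for instance 24-bit depth plus 8-bit stencil), which the slide does not state; GBufferMath is a hypothetical name.

```csharp
public static class GBufferMath
{
    public const int ColorTargets = 4;         // RT0..RT3 from the slide
    public const int BytesPerColorPixel = 4;   // 32 bits per render target
    public const int BytesPerDepthPixel = 4;   // assumption: 24-bit depth + 8-bit stencil

    // Total bytes for the four render targets plus depth/stencil at a given resolution.
    public static long Bytes(int width, int height)
    {
        long pixels = (long)width * height;
        return pixels * (ColorTargets * BytesPerColorPixel + BytesPerDepthPixel);
    }
}
```

Under these assumptions, `GBufferMath.Bytes(1920, 1080)` evaluates to 41,472,000 bytes (about 39.6 MiB), before any anti-aliasing or additional targets.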

Page 29

Command Buffers

• Command buffers hold a list of rendering commands

• They can be set to execute at various points during camera rendering

Page 30

Shadows

• Directional Light:

– Uses cascaded shadow maps (CSM), up to 4 cascades

– They are rendered into screen space to a 32bit RT

• Point Light:

– Render 6 cube faces

• Spot Light:

– One shadow map per light

Page 31

Mesh Skinning

Different implementations depending on platform:

• x86: SSE

• iOS/Android/WP8: Neon optimizations

• D3D11/XBoxOne/GLES3.0: GPU

• XBox360, WiiU: GPU (memexport)

• PS3: SPU

• WiiU: GPU w/ stream out

Unity 5.0: Skinned meshes use less memory by sharing index buffers between instances

Page 32

Best Practices

• Try different Render Paths

– Performance depends on scene and platform

• Mix Realtime and Baked Lighting

• Use Level-Of-Detail Techniques

– Mesh, Texture, Shader

Page 33

SCRIPTING

Scripting API and JIT compilation performance, allocations

Page 34

GetComponent<T>

It asks the GameObject for a component of the specified type:

• The GO contains a list of Components

• Each Component type is compared to T

• The first Component of type T (or that derives from T) will be returned to the caller

• Not too much overhead, but it still needs to call into native code
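
The lookup described above can be modelled in a few lines of plain C#. This is a toy model for illustration only: the real lookup happens in native code, and every type name below (ToyGameObject, ToyComponent and so on) is hypothetical.

```csharp
using System.Collections.Generic;

public abstract class ToyComponent { }
public class ToyCollider : ToyComponent { }
public sealed class ToyBoxCollider : ToyCollider { }
public sealed class ToyLight : ToyComponent { }

// Toy stand-in for a GameObject; not Unity's actual implementation.
public sealed class ToyGameObject
{
    private readonly List<ToyComponent> _components = new List<ToyComponent>();

    public void Add(ToyComponent component) { _components.Add(component); }

    // Linear scan: returns the first component that is a T or derives from T.
    public T GetComponent<T>() where T : ToyComponent
    {
        for (int i = 0; i < _components.Count; i++)
        {
            T match = _components[i] as T;
            if (match != null)
                return match;
        }
        return null;   // no component of that type
    }
}
```

Because every call repeats the scan, a common pattern is to look a component up once and keep the result in a field rather than calling GetComponent<T> every frame.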

Page 35

Property Accessors

• Most accessors will be removed in Unity 5.0

• The objective is to reduce dependencies and therefore improve modularization

• Transform will remain

• Existing scripts will be converted. Example:

in 5.0:

Page 36

Transform Component

• this.transform is the same as GetComponent<Transform>()

• transform.position/rotation needs to:

– Find the Transform component

– Traverse the hierarchy to calculate the absolute position

– Apply translation/rotation

• transform internally stores the position relative to the parent

– transform.localPosition = new Vector3(…) is a simple assignment

– transform.position = new Vector3(…) costs the same if there is no parent; otherwise it needs to traverse up the hierarchy to transform the absolute position into a local one

• Finally, other components (collider, rigid body, light, camera, etc.) will be notified via messages
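
Why localPosition is cheaper than position can be seen in a translation-only toy model. It deliberately ignores rotation, scale and change notifications; ToyTransform is a hypothetical class, and System.Numerics.Vector3 stands in for Unity's Vector3.

```csharp
using System.Numerics;

// Translation-only toy transform; rotation and scale are left out on purpose.
public sealed class ToyTransform
{
    public ToyTransform Parent;
    public Vector3 LocalPosition;   // stored relative to the parent

    public Vector3 Position
    {
        get
        {
            // Walk up the hierarchy, accumulating every ancestor's offset.
            Vector3 world = LocalPosition;
            for (ToyTransform t = Parent; t != null; t = t.Parent)
                world += t.LocalPosition;
            return world;
        }
        set
        {
            // No parent: plain assignment. Otherwise convert world to local first.
            LocalPosition = Parent == null ? value : value - Parent.Position;
        }
    }
}
```

Reading or writing Position costs a walk over all ancestors in this model, while LocalPosition is a single field access, which matches the cost difference the slide describes.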

Page 37

WWW class properties

WWW.texture: allocates a new Texture2D on each access

…another example is WWW.audioClip

Page 38

Object.Instantiate

API:

• Object Instantiate(Object, Vector3, Quaternion);

• Object Instantiate(Object);

Implementation:

• Clone GameObject Hierarchy and Components

• Copy Properties

• Awake

• Apply new Transform (if provided)

Page 39

Object.Instantiate (cont’d)

• Awake can be expensive

• AwakeFromLoad (main thread)

– clear states

– internal state caching

– pre-compute

Unity 5.0:

• Allocations have been reduced

• Some inner loops for copying the data have been optimized

Page 40

JIT Compilation

What is it?

• The process in which machine code is generated from CIL code during the application's run-time

Pros:

• It generates optimized code for the current platform

Cons:

• Each time a method is called for the first time, the application will suffer a certain performance penalty because of the compilation

Page 41

JIT compilation spikes

What about pre-JITting?

• RuntimeHelpers.PrepareMethod does not work: …better to use MethodHandle.GetFunctionPointer()
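
One way to apply the slide's suggestion is to walk a type's methods with reflection and request each function pointer up front, for example during a loading screen. This is a sketch, not code from the talk: PreJit and EnemyAi are hypothetical names, and whether the call really triggers compilation depends on the runtime (the slide refers to Unity's Mono).

```csharp
using System;
using System.Reflection;

public static class PreJit
{
    // Requests the entry point of every method declared on 'type', following
    // the slide's suggestion; returns how many methods were touched.
    public static int Prepare(Type type)
    {
        const BindingFlags flags = BindingFlags.DeclaredOnly |
            BindingFlags.Public | BindingFlags.NonPublic |
            BindingFlags.Instance | BindingFlags.Static;

        int prepared = 0;
        foreach (MethodInfo method in type.GetMethods(flags))
        {
            // Abstract and open generic methods have no code to compile.
            if (method.IsAbstract || method.ContainsGenericParameters)
                continue;

            method.MethodHandle.GetFunctionPointer();
            prepared++;
        }
        return prepared;
    }
}

// Hypothetical gameplay class used only to demonstrate the call.
public sealed class EnemyAi
{
    public int Think(int x) { return x * 2; }
    private void Patrol() { }
}
```

Calling `PreJit.Prepare(typeof(EnemyAi))` at load time moves the first-call cost away from gameplay; it has no benefit on platforms that compile ahead of time.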

Page 42

CONCLUSIONS

Page 43

Best Practices

• Don’t make assumptions

• Platform X != Platform Y

• Profile on target device

• Editor != Player

• Managed Memory is not returned to Native Land!

• For best results…: Profile early and regularly

Page 44

Want to know more ?

• Unite: http://unity3d.com/unite/archive

• Blog: http://blog.unity3d.com

• Forum: http://forum.unity3d.com

• Support: [email protected]

Page 45

That’s it!

Questions?

@m_trive | [email protected]