Download - Unity Internals: Memory and Performance

Transcript
Page 1: Unity Internals: Memory and Performance

Unity Internals: Memory and Performance Moscow, 16/05/2014 Marco Trivellato – Field Engineer

Page 2: Unity Internals: Memory and Performance

Page 6/9/14 2

This Talk Goals and Benefits

Page 3: Unity Internals: Memory and Performance

Page

Who Am I ? •  Now Field Engineer @ Unity

•  Previously, Software Engineer •  Mainly worked on game engines •  Shipped several video games:

•  Captain America: Super Soldier •  FIFA ‘07 – FIFA ’10 •  Fight Night: Round 3

6/9/14 3

Page 4: Unity Internals: Memory and Performance

Page

Topics •  Memory Overview •  Garbage Collection •  Mesh Internals •  Scripting •  Job System •  How to use the Profiler

6/9/14 4

Page 5: Unity Internals: Memory and Performance

Page 6/9/14 5

Memory Overview

Page 6: Unity Internals: Memory and Performance

Page

Memory Domains •  Native (internal)

•  Asset Data: Textures, AudioClips, Meshes •  Game Objects & Components: Transform, etc.. •  Engine Internals: Managers, Rendering, Physics, etc..

•  Managed - Mono •  Script objects (Managed dlls) •  Wrappers for Unity objects: Game objects, assets,

components •  Native Dlls

•  User’s dlls and external dlls (for example: DirectX)

6/9/14 6

Page 7: Unity Internals: Memory and Performance

Page

Native Memory: Internal Allocators •  Default •  GameObject •  Gfx •  Profiler 5.x: We are considering to expose an API for using a native allocator in Dlls

6/9/14 7

Page 8: Unity Internals: Memory and Performance

Page

Managed Memory •  Value types (bool, int, float, struct, ...)

•  Exist in stack memory. De-allocated when removed from the stack. No Garbage.

•  Reference types (classes) •  Exist on the heap and are handled by the mono/.net

GC. Removed when no longer being referenced. •  Wrappers for Unity Objects :

•  GameObject •  Assets : Texture2D, AudioClip, Mesh, … •  Components : MeshRenderer, Transform, MonoBehaviour

6/9/14 8

Page 9: Unity Internals: Memory and Performance

Page

Mono Memory Internals •  Allocates system heap blocks for internal allocator •  Will allocate new heap blocks when needed •  Heap blocks are kept in Mono for later use

•  Memory can be given back to the system after a while •  …but it depends on the platform è don’t count on it

•  Garbage collector cleans up •  Fragmentation can cause new heap blocks even

though memory is not exhausted

6/9/14 9

Page 10: Unity Internals: Memory and Performance

Page 6/9/14 10

Garbage Collection

Page 11: Unity Internals: Memory and Performance

Page

Unity Object wrapper

•  Some Objects used in scripts have large native backing memory in unity •  Memory not freed until Finalizers have run

6/9/14 11

WWW Decompression buffer

Compressed file

Decompressed file

Managed Native

Page 12: Unity Internals: Memory and Performance

Page

Mono Garbage Collection •  GC.Collect

•  Runs on the main thread when •  Mono exhausts the heap space •  Or user calls System.GC.Collect()

•  Finalizers •  Run on a separate thread

•  Controlled by mono •  Can have several seconds delay

•  Unity native memory •  Dispose() cleans up internal

memory •  Eventually called from finalizer •  Manually call Dispose() to cleanup

6/9/14 12

Main thread Finalizer thread

www = null; new(someclass); //no more heap -> GC.Collect();

www.Dispose();

.....

Page 13: Unity Internals: Memory and Performance

Page

Garbage Collection •  Roots are not collected in a GC.Collect

•  Thread stacks •  CPU Registers •  GC Handles (used by Unity to hold onto managed

objects) •  Static variables!!

•  Collection time scales with managed heap size •  The more you allocate, the slower it gets

6/9/14 13

Page 14: Unity Internals: Memory and Performance

Page

GC: does lata layout matter ? struct Stuff {

int a; float b; bool c; string leString;

} Stuff[] arrayOfStuff; << Everything is scanned. GC takes more time VS int[] As; float[] Bs; bool[] Cs; string[] leStrings; << Only this is scanned. GC takes less time.

6/9/14 14

Page 15: Unity Internals: Memory and Performance

Page

GC: Best Practices •  Reuse objects è Use object pools •  Prefer stack-based allocations è Use struct

instead of class •  System.GC.Collect can be used to trigger

collection •  Calling it 6 times returns the unused memory to

the OS •  Manually call Dispose to cleanup immediately

6/9/14 15

Page 16: Unity Internals: Memory and Performance

Page

Avoid temp allocations •  Don’t use FindObjects or LINQ •  Use StringBuilder for string concatenation •  Reuse large temporary work buffers •  ToString() •  .tag è use CompareTag() instead

6/9/14 16

Page 17: Unity Internals: Memory and Performance

Page

Unity API Temporary Allocations Some Examples: •  GetComponents<T> •  Vector3[] Mesh.vertices •  Camera[] Camera.allCameras •  foreach

•  does not allocate by definition •  However, there can be a small allocation, depending on the

implementation of .GetEnumerator()

5.x: We are working on new non-allocating versions

6/9/14 17

Page 18: Unity Internals: Memory and Performance

Page

Memory fragmentation •  Memory fragmentation is hard to account for

•  Fully unload dynamically allocated content •  Switch to a blank scene before proceeding to next level

•  This scene could have a hook where you may pause the game long enough to sample if there is anything significant in memory

•  Ensure you clear out variables so GC.Collect will remove as much as possible

•  Avoid allocations where possible •  Reuse objects where possible within a scene play •  Clear them out for map load to clean the memory

6/9/14 18

Page 19: Unity Internals: Memory and Performance

Page

Unloading Unused Assets •  Resources.UnloadUnusedAssets will trigger asset

garbage collection •  It looks for all unreferenced assets and unloads them •  It’s an async operation •  It’s called internally after loading a level

•  Resources.UnloadAsset is preferable •  you need to know exactly what you need to Unload •  Unity does not have to scan everything

•  Unity 5.0: Multi-threaded asset garbage collection

6/9/14 19

Page 20: Unity Internals: Memory and Performance

Page 6/9/14 20

Mesh Internals Memory vs. Cycles

Page 21: Unity Internals: Memory and Performance

Page

Mesh Read/Write Option •  It allows you to modify the mesh at run-time •  If enabled, a system-copy of the Mesh will remain in

memory •  It is enabled by default •  In some cases, disabling this option will not reduce the

memory usage •  Skinned meshes •  iOS

Unity 5.0: disable by default – under consideration

6/9/14 21

Page 22: Unity Internals: Memory and Performance

Page

Non-Uniform scaled Meshes We need to correctly transform vertex normals •  Unity 4.x:

•  transform the mesh on the CPU •  create an extra copy of the data

•  Unity 5.0 •  Scaled on GPU •  Extra memory no longer needed

6/9/14 22

Page 23: Unity Internals: Memory and Performance

Page

Static Batching What is it ? •  It’s an optimization that reduces number of draw

calls and state changes How do I enable it ? •  In the player settings + Tag the object as static

6/9/14 23

Page 24: Unity Internals: Memory and Performance

Page

Static Batching How does it work internally ? •  Build-time: Vertices are transformed to world-

space •  Run-time: Index buffer is created with indices of

visible objects

Unity 5.0: •  Re-implemented static batching without copying of

index buffers

6/9/14 24

Page 25: Unity Internals: Memory and Performance

Page

Dynamic Batching What is it ? •  Similar to Static Batching but it batches non-static

objects at run-time How do I enable it ? •  In the player settings •  no need to tag. it auto-magically works…

6/9/14 25

Page 26: Unity Internals: Memory and Performance

Page

Dynamic Batching How does it work internally ? •  objects are transformed to world space on the

CPU •  Temporary VB & IB are created •  Rendered in one draw call

Unity 5.x: we are considering to expose per-platform parameters

6/9/14 26

Page 27: Unity Internals: Memory and Performance

Page

Mesh Skinning Different Implementations depending on platform: •  x86: SSE •  iOS/Android/WP8: Neon optimizations •  D3D11/XBoxOne/GLES3.0: GPU •  XBox360, WiiU: GPU (memexport) •  PS3: SPU •  WiiU: GPU w/ stream out

Unity 5.0: Skinned meshes use less memory by sharing index buffers between instances

6/9/14 27

Page 28: Unity Internals: Memory and Performance

Page 6/9/14 28

Scripting

Page 29: Unity Internals: Memory and Performance

Page

Unity 5.0: Mono •  No upgrade •  Mainly bug fixes •  New tech in WebGL: IL2CPP

•  http://blogs.unity3d.com/2014/04/29/on-the-future-of-web-publishing-in-unity/

•  Stay tuned: there will be a blog post about it

6/9/14 29

Page 30: Unity Internals: Memory and Performance

Page

GetComponent<T> It asks the GameObject, for a component of the specified type: •  The GO contains a list of Components •  Each Component type is compared to T •  The first Component of type T (or that derives from

T), will be returned to the caller •  Not too much overhead but it still needs to call into

native code

6/9/14 30

Page 31: Unity Internals: Memory and Performance

Page

Unity 5.0: Property Accessors •  Most accessors will be removed in Unity 5.0 •  The objective is to reduce dependencies,

therefore improve modularization •  Transform will remain •  Existing scripts will be converted. Example:

in 5.0:

6/9/14 31

Page 32: Unity Internals: Memory and Performance

Page

Transform Component •  this.transform is the same as GetComponent<Transform>() •  transform.position/rotation needs to:

•  find Transform component •  Traverse hierarchy to calculate absolute position •  Apply translation/rotation

•  transform internally stores the position relative to the parent •  transform.localPosition = new Vector(…) è simple

assignment •  transform.position = new Vector(…) è costs the same if

no father, otherwise it will need to traverse the hierarchy up to transform the abs position into local

•  finally, other components (collider, rigid body, light, camera, etc..) will be notified via messages

6/9/14 32

Page 33: Unity Internals: Memory and Performance

Page

Instantiate API: •  Object Instantiate(Object, Vector3, Quaternion); •  Object Instantiate(Object);

Implementation: •  Clone GameObject Hierarchy and Components •  Copy Properties •  Awake •  Apply new Transform (if provided)

6/9/14 33

Page 34: Unity Internals: Memory and Performance

Page

Instantiate cont..ed •  Awake can be expensive •  AwakeFromLoad (main thread)

•  clear states •  internal state caching •  pre-compute

Unity 5.0: •  Allocations have been reduced •  Some inner loops for copying the data have been

optimized

6/9/14 34

Page 35: Unity Internals: Memory and Performance

Page

JIT Compilation What is it ? •  The process in which machine code is generated from CIL

code during the application's run-time

Pros: •  It generates optimized code for the current platform

Cons: •  Each time a method is called for the first time, the

application will suffer a certain performance penalty because of the compilation

6/9/14 35

Page 36: Unity Internals: Memory and Performance

Page

JIT compilation spikes What about pre-JITting ? •  RuntimeHelpers.PrepareMethod does not work:

…better to use MethodHandle.GetFunctionPointer()

6/9/14 36

Page 37: Unity Internals: Memory and Performance

Page 6/9/14 37

Job System

Page 38: Unity Internals: Memory and Performance

Page

Unity 5.0: Job System (internal) The goals of the job system:

•  make it easy to write very efficient job based multithreaded code

•  The jobs should be able to run safely in parallel to script code

6/9/14 38

Page 39: Unity Internals: Memory and Performance

Page

Job System: Why ? Modern architectures are multi-core: •  XBox 360: 3 cores •  PS4/Xbox One: 8 cores

…which includes mobile devices: •  iPhone 4S: 2 cores •  Galaxy S3: 4 cores

6/9/14 39

Page 40: Unity Internals: Memory and Performance

Page

Job System: What is it ? •  It’s a Framework that we are going to use in

existing and new sub-systems •  We want to have Animation, NavMesh, Occlusion,

Rendering, etc… run as much as possible in parallel

•  This will ultimately lead to better performance

6/9/14 40

Page 41: Unity Internals: Memory and Performance

Page

Unity 5.0: Profiler Timeline View It’s a tool that allows you to analyse internal (native) threads execution of a specific frame

6/9/14 41

Page 42: Unity Internals: Memory and Performance

Page

Unity 5.0: Frame Debugger

6/9/14 42

Page 43: Unity Internals: Memory and Performance

Page 6/9/14 43

Conclusions

Page 44: Unity Internals: Memory and Performance

Page

Budgeting Memory How much memory is available ? •  It depends… •  For example, on 512mb devices running iOS 6.0:

~250mb. A bit less with iOS 7.0

What’s the baseline ? •  Create an empty scene and measure memory •  Don’t forget that the profiler requires some

memory •  For example: on Android 15.5mb (+ 12mb profiler)

6/9/14 44

Page 45: Unity Internals: Memory and Performance

Page

Profiling •  Don’t make assumptions •  Profile on target device •  Editor != Player •  Platform X != Platform Y •  Managed Memory is not returned to Native Land!

For best results…: •  Profile early and regularly

6/9/14 45

Page 46: Unity Internals: Memory and Performance

Page 6/9/14 46

Questions ? [email protected] - Twitter: @m_trive