Unity - Internals: memory and performance
Internals: Memory and Performance
Codemotion
Milano, 29/11/2014
About me
Field Engineer @ Unity Technologies
Past:
– Worked as a Software Engineer on several games at EA, Next Level Games, Milestone
Agenda
• Quick Update
• Memory Overview
• Memory vs Cycles
• Graphics
• Scripting
Latest News
• New CEO
• Unity 4.6 / New UI
• Unity 5.0
• Support for Apple iOS 64 bit
• WebPlayer
MEMORY OVERVIEW
Native and Managed Memory, Garbage Collection
Memory Overview
• Native (internal)
– Assets data, game objects and components
– Engine internals
• Managed (Mono)
– Scripts objects (managed DLLs)
– wrappers for Unity objects
• Native Dlls
– User’s and 3rd parties Dlls
Managed Memory Internals
• Allocates system heap blocks for internal allocator
• Will allocate new heap blocks when needed
• Garbage collector cleans up
• Heap blocks are kept in Mono for later use
– Memory can be given back to the system after a while
– …but it depends on the platform, so don’t count on it
• Fragmentation can cause new heap blocks even though memory is not exhausted
Reference vs Value Types
Value types (bool, int, float, struct, ...)
• Live on the stack when used as local variables (or inline inside the object or array that contains them)
• De-allocated when removed from the stack
• No garbage
Reference types (classes)
• Exist on the heap and are handled by the Mono/.NET GC
• De-allocated when no longer referenced
• Lots of garbage
Garbage Collection
• Roots are not collected in a GC.Collect
– Thread stacks
– CPU registers
– GC handles (used by Unity to hold onto managed objects)
– Static variables!!
• Collection time scales with managed heap size
– The more you allocate, the slower it gets
Temporary Allocations
• Don’t use FindObjects or LINQ
• Use StringBuilder for string concatenation
• Reuse large temporary work buffers
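• Avoid ToString() in frequently executed code: it allocates a new string
• .tag allocates a new string on each access: use CompareTag() instead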
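The StringBuilder and buffer-reuse advice above can be sketched in plain C#. This is a minimal illustration, not code from the talk: the `ScoreLabel` class and its method names are hypothetical, and the shared builder assumes single-threaded use (for example, calls made only from the main thread).

```csharp
using System.Text;

public static class ScoreLabel
{
    // One builder reused across calls, so its internal buffer is allocated once.
    private static readonly StringBuilder Builder = new StringBuilder(64);

    // Naive version: every concatenation creates a new intermediate string.
    public static string BuildNaive(string[] names)
    {
        string result = "";
        for (int i = 0; i < names.Length; i++)
        {
            result += names[i] + ";";
        }
        return result;
    }

    // Reuses the shared builder; only the final ToString() allocates a string.
    public static string Build(string[] names)
    {
        Builder.Length = 0; // reset the content but keep the buffer
        for (int i = 0; i < names.Length; i++)
        {
            Builder.Append(names[i]).Append(';');
        }
        return Builder.ToString();
    }
}
```

Both methods return the same text; the difference is the number of temporary strings left behind for the garbage collector.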
Internal Temporary Allocations
Some Examples:
– GetComponents<T>
– Vector3[] Mesh.vertices
– Camera[] Camera.allCameras
– foreach
• does not allocate by definition
• However, there can be a small allocation, depending on
the implementation of .GetEnumerator()
5.x: We are working on new non-allocating versions
Data Layout
// Option 1: one array of structs
struct Stuff
{
    int a;
    float b;
    bool c;
    string name;
};
Stuff[] arrayOfStuff;

// Option 2: one array per field
int[] As;
float[] Bs;
bool[] Cs;
string[] names;
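To make the two layouts above concrete, here is a minimal sketch that sums the `a` field in both forms. The `DataLayoutDemo` class and its method names are hypothetical. With one array per field, a loop that only needs `a` reads a contiguous `int[]`, whereas with the array of structs it steps over the other fields as well. Whether this is measurably faster depends on the platform and the data size, so it is worth profiling rather than assuming.

```csharp
public struct Stuff
{
    public int a;
    public float b;
    public bool c;
    public string name;
}

public static class DataLayoutDemo
{
    // Array of structs: each iteration strides over a whole Stuff element.
    public static long SumArrayOfStructs(Stuff[] items)
    {
        long sum = 0;
        for (int i = 0; i < items.Length; i++)
        {
            sum += items[i].a;
        }
        return sum;
    }

    // One array per field: the loop reads only the tightly packed ints.
    public static long SumFieldArray(int[] As)
    {
        long sum = 0;
        for (int i = 0; i < As.Length; i++)
        {
            sum += As[i];
        }
        return sum;
    }
}
```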
Memory Fragmentation
• Memory fragmentation is hard to account for
– Fully unload dynamically allocated content
– Switch to a blank scene before proceeding to the next level
• This scene could have a hook where you may pause the game long enough to sample if there is anything significant in memory
• Ensure you clear out variables so GC.Collect will remove as much as possible
• Avoid allocations where possible
• Reuse objects where possible within a scene play
• Clear them out for map load to clean the memory
Wrappers: Disposable Types
Some objects used in scripts have large native backing memory in Unity
– Memory not freed for some time…
[Diagram: a managed WWW object and its native backing memory: decompression buffer, compressed file, decompressed file]
Garbage Collection
• GC.Collect
– Runs on the main thread when
• Mono exhausts the heap space
• Or user calls System.GC.Collect()
• Finalizers
– Run on a separate thread
• Controlled by Mono
• Can have several seconds delay
• Unity native memory
– Dispose() cleans up internal memory
• Eventually called from finalizer
• Manually call Dispose() to clean up
Main thread:
www = null;
new(someclass); // no more heap -> GC.Collect();
Finalizer thread:
www.Dispose();
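The difference between waiting for the finalizer and calling Dispose() yourself can be sketched with a stand-in class. `NativeBackedResource` is hypothetical: it uses a managed byte array in place of real native memory, and it is not how Unity's WWW class is implemented. It only shows the general .NET pattern in which the finalizer is a late fallback and Dispose() releases the memory immediately.

```csharp
using System;

public sealed class NativeBackedResource : IDisposable
{
    // Stand-in for native memory; a real wrapper would hold a native handle.
    private byte[] buffer;

    public bool IsDisposed { get; private set; }

    public int Size
    {
        get { return buffer != null ? buffer.Length : 0; }
    }

    public NativeBackedResource(int size)
    {
        buffer = new byte[size];
    }

    // Deterministic cleanup: runs as soon as the caller asks for it.
    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this); // the finalizer has nothing left to do
    }

    // Fallback: runs on the finalizer thread, possibly much later.
    ~NativeBackedResource()
    {
        Release();
    }

    private void Release()
    {
        if (IsDisposed) return;
        buffer = null;
        IsDisposed = true;
    }
}
```

A caller that is done with the object can call `Dispose()` directly or wrap the usage in a `using` block, instead of setting the reference to null and waiting for a collection.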
Wrappers for Unity Objects
• Inherit from Object
• Types:
– GameObject
– Assets: Texture2D, AudioClip, Mesh, etc…
– Components: MeshRenderer, Transform, MonoBehaviour
• Native memory is released when Destroy is called
Best Practices
• Reuse objects: use object pools
• Prefer stack-based allocations: use struct instead of class
• System.GC.Collect can be used to trigger collection
• Calling it 6 times returns the unused memory to the OS
• Manually call Dispose to clean up immediately
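As an illustration of the object pool advice, here is a minimal generic pool in plain C#. The `Pool<T>` class is a hypothetical sketch, not a Unity API; it assumes single-threaded use and leaves resetting the state of returned objects to the caller.

```csharp
using System.Collections.Generic;

public sealed class Pool<T> where T : class, new()
{
    private readonly Stack<T> free = new Stack<T>();

    // Optionally create instances up front, e.g. during a loading screen.
    public Pool(int prewarm)
    {
        for (int i = 0; i < prewarm; i++)
        {
            free.Push(new T());
        }
    }

    public int FreeCount
    {
        get { return free.Count; }
    }

    // Hands out a pooled instance, allocating only when the pool is empty.
    public T Get()
    {
        return free.Count > 0 ? free.Pop() : new T();
    }

    // Returns an instance to the pool instead of leaving it for the GC.
    public void Release(T item)
    {
        free.Push(item);
    }
}
```

The same idea applies to GameObject instances, where pooled objects are typically deactivated rather than destroyed and re-instantiated.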
MEMORY VS CYCLES
Writable Meshes, Static & Dynamic Batching
Mesh Read/Write Option
• It allows you to modify the mesh at run-time
• If enabled, a system-memory copy of the Mesh will remain in memory
• It is enabled by default
• In some cases, disabling this option will not reduce the memory usage
– Skinned meshes
– iOS
Non-Uniform scaled Meshes
We need to correctly transform vertex normals
• Unity 4.x:
– transform the mesh on the CPU
– create an extra copy of the data
• Unity 5.0
– Scaled on GPU
– Extra memory no longer needed
Static Batching
What is it?
• It’s an optimization that reduces the number of draw calls and state changes
How do I enable it?
• In the player settings + tag the object as static
Static Batching (cont.)
How does it work internally?
• Build-time: Vertices are transformed to world-space
• Run-time: Index buffer is created with indices of visible objects
Unity 5.0:
• Re-implemented static batching without copying of index buffers
• Beware of misleading stats
Dynamic Batching
What is it?
• Similar to Static Batching but it batches non-static objects at run-time
How do I enable it?
• In the player settings
• No need to tag, it auto-magically works…
Dynamic Batching (cont.)
How does it work internally?
• Objects are transformed to world space on the CPU
• Temporary VB & IB are created
• Rendered in one draw call
GRAPHICS
Render Paths, Command Buffers, Shadows
Render Paths
• Vertex Lit
• Forward Rendering
• First pass for ambient + directional light
• One additional pass for each light hitting the object
• Deferred Lighting
• Two Geometry passes + Lighting
• GBuffer: Normal + Specular, Depth
Deferred Shading
• New Render Path in Unity 5
• Only one Geometry pass
• On Platforms with MRTs
• Fallback is Forward Rendering
Deferred Shading
Depth buffer + 4x32bit RTs:
• RT0: diffuse color (rgb), unused (a)
• RT1: spec color (rgb), roughness (a)
• RT2: normal (rgb), unused (a). 10.10.10.2 when available.
• RT3: emission/light (rgb), unused (a)
• Z: depth buffer & stencil
Command Buffers
• Command buffers hold a list of rendering commands
• They can be set to execute at various points during camera rendering
Shadows
• Directional Light:
• Use CSM, up to 4 cascades
• They are rendered into screen space to a 32bit RT
• Point Light:
• Render 6 cube faces
• Spot Light:
• One shadow map per light
Mesh Skinning
Different implementations depending on platform:
• x86: SSE
• iOS/Android/WP8: Neon optimizations
• D3D11/XBoxOne/GLES3.0: GPU
• XBox360, WiiU: GPU (memexport)
• PS3: SPU
• WiiU: GPU w/ stream out
Unity 5.0: Skinned meshes use less memory by sharing index buffers between instances
Best Practices
• Try different Render Paths
– Performance depends on scene and platform
• Mix Realtime and Baked Lighting
• Use Level-Of-Detail Techniques
– Mesh, Texture, Shader
SCRIPTING
Scripting API and JIT compilation performance, allocations
GetComponent<T>
It asks the GameObject for a component of the specified type:
• The GO contains a list of Components
• Each Component type is compared to T
• The first Component of type T (or that derives from T) will be returned to the caller
• Not too much overhead, but it still needs to call into native code
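The lookup described above can be modelled in a few lines of plain C#. This is a simplified sketch, not Unity's implementation: the `FakeGameObject`, `FakeComponent`, `FakeRenderer` and `FakeMeshRenderer` types are hypothetical stand-ins, and the real call additionally crosses into native code.

```csharp
using System.Collections.Generic;

public class FakeComponent { }
public class FakeRenderer : FakeComponent { }
public class FakeMeshRenderer : FakeRenderer { }

public class FakeGameObject
{
    private readonly List<FakeComponent> components = new List<FakeComponent>();

    public void Add(FakeComponent component)
    {
        components.Add(component);
    }

    // Linear scan: returns the first component that is a T or derives from T.
    public T GetComponent<T>() where T : FakeComponent
    {
        for (int i = 0; i < components.Count; i++)
        {
            T match = components[i] as T;
            if (match != null) return match;
        }
        return null;
    }
}
```

Because every call repeats the scan, a common approach is to look the component up once (for example during initialization) and keep the result in a field.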
Property Accessors
• Most accessors will be removed in Unity 5.0
• The objective is to reduce dependencies, therefore improve modularization
• Transform will remain
• Existing scripts will be converted. Example:
in 5.0:
Transform Component
• this.transform is the same as GetComponent<Transform>()
• transform.position/rotation needs to:
– Find the Transform component
– Traverse the hierarchy to calculate the absolute position
– Apply translation/rotation
• transform internally stores the position relative to the parent
– transform.localPosition = new Vector3(…) is a simple assignment
– transform.position = new Vector3(…) costs the same if there is no parent; otherwise it needs to traverse the hierarchy upwards to convert the absolute position into a local one
• Finally, other components (collider, rigid body, light, camera, etc.) will be notified via messages
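The cost difference between local and absolute positions can be illustrated with a simplified model. `NodeModel` is hypothetical and handles translation only (no rotation, no scale, no change notifications), and it uses `System.Numerics.Vector3` rather than Unity's own type. It is meant to show why the absolute accessors have to walk the hierarchy, not how Unity's Transform is implemented.

```csharp
using System.Numerics;

public sealed class NodeModel
{
    public NodeModel Parent;

    // Stored value: position relative to the parent. Reading or writing it is a plain field access.
    public Vector3 LocalPosition;

    // Derived value: absolute position, computed by walking up the hierarchy.
    public Vector3 Position
    {
        get
        {
            Vector3 world = LocalPosition;
            for (NodeModel p = Parent; p != null; p = p.Parent)
            {
                world += p.LocalPosition;
            }
            return world;
        }
        set
        {
            // Convert the absolute position back into the parent's space.
            Vector3 parentWorld = Parent != null ? Parent.Position : Vector3.Zero;
            LocalPosition = value - parentWorld;
        }
    }
}
```

In this model a node without a parent pays the same for both accessors, while a deeply nested node pays for one step per ancestor on every absolute read or write.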
WWW class properties
WWW.texture: Allocates a new Texture2D
…another example is WWW.audioClip
Object.Instantiate
API:
• Object Instantiate(Object, Vector3, Quaternion);
• Object Instantiate(Object);
Implementation:
• Clone GameObject Hierarchy and Components
• Copy Properties
• Awake
• Apply new Transform (if provided)
Object.Instantiate (cont.)
• Awake can be expensive
• AwakeFromLoad (main thread)
– clear states
– internal state caching
– pre-compute
Unity 5.0:
• Allocations have been reduced
• Some inner loops for copying the data have been optimized
JIT Compilation
What is it?
• The process in which machine code is generated from CIL code during the application's run-time
Pros:
• It generates optimized code for the current platform
Cons:
• The first time a method is called, the application suffers a performance penalty because of the compilation
JIT compilation spikes
What about pre-JITting?
• RuntimeHelpers.PrepareMethod does not work
• …better to use MethodHandle.GetFunctionPointer()
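One way to apply the GetFunctionPointer() suggestion is to walk the methods of a type through reflection and request a function pointer for each, which asks the runtime to compile them ahead of their first call. The `JitWarmup` helper below is a hypothetical sketch: the filtering rules (skipping abstract, open generic and P/Invoke methods) are assumptions about what is safe to prepare, and the effect depends on the runtime, so it should be verified with the profiler on the target platform.

```csharp
using System;
using System.Reflection;

public static class JitWarmup
{
    // Requests a function pointer for each method declared on the type.
    // Returns the number of methods that were prepared.
    public static int Prewarm(Type type)
    {
        if (type.ContainsGenericParameters) return 0; // open generic types cannot be compiled

        const BindingFlags flags = BindingFlags.Public | BindingFlags.NonPublic |
                                   BindingFlags.Instance | BindingFlags.Static |
                                   BindingFlags.DeclaredOnly;
        int count = 0;
        foreach (MethodInfo method in type.GetMethods(flags))
        {
            // Skip methods that have no compilable body of their own.
            if (method.IsAbstract || method.ContainsGenericParameters) continue;
            if ((method.Attributes & MethodAttributes.PinvokeImpl) != 0) continue;

            method.MethodHandle.GetFunctionPointer();
            count++;
        }
        return count;
    }
}
```

A call such as `JitWarmup.Prewarm(typeof(SomeGameplayClass))` during a loading screen (where `SomeGameplayClass` is a placeholder for your own type) moves the compilation cost away from gameplay.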
CONCLUSIONS
Best Practices
• Don’t make assumptions
• Platform X != Platform Y
• Profile on target device
• Editor != Player
• Managed Memory is not returned to Native Land!
• For best results…: Profile early and regularly
Want to know more?
• Unite: http://unity3d.com/unite/archive
• Blog: http://blog.unity3d.com
• Forum: http://forum.unity3d.com
• Support: [email protected]