Gamedev-grade debugging

Leszek Godlewski, Programmer, Nordic Games. Gamedev-grade debugging. Source: http://igetyourfail.blogspot.com/2009/01/reaching-out-tale-of-failed-skinning.html

description

Part of the proceedings of Code Mesh 2014: http://www.codemesh.io/ Source ODP file (includes videos and presenter notes): https://www.dropbox.com/s/720qkkn27uaz43u/ggd.odp?dl=0 Video games are complex and non-deterministic systems. So complex, in fact, that some days the everyday breakpoint just doesn't cut it when you're looking for that next bug. Drawing from the experience of deploying three large titles to four platforms, this talk will discuss the different approaches and borderline magical tricks to debugging different parts of a game: noise filtering when our breakpoint is hit way too often, memory stomping, time-dependent bugs, rendering glitches… Story of a game programmer's life.

Transcript of Gamedev-grade debugging

Page 1: Gamedev-grade debugging

Leszek Godlewski

Programmer, Nordic Games

Gamedev-grade debugging

Source: http://igetyourfail.blogspot.com/2009/01/reaching-out-tale-of-failed-skinning.html

Page 2: Gamedev-grade debugging

Nordic Games GmbH

● Started in 2011 as a sister company to Nordic Games Publishing (We Sing)

● Base IP acquired from JoWooD and DreamCatcher (SpellForce, The Guild, Aquanox, Painkiller)

● Initially focusing on smaller, niche games

● Acquired THQ IPs in 2013 (Darksiders, Titan Quest, Red Faction, MX vs. ATV)

● Now shifting towards being a production company with internal devs

● Since fall 2013: internal studio in Munich, Germany (Grimlore Games)

Page 3: Gamedev-grade debugging

Who is this guy?

Leszek Godlewski

Programmer, Nordic Games (early 2014 – Nov 2014)

– Linux port of Darksiders

Freelance Programmer (Sep 2013 – early 2014)

– Linux port of Painkiller Hell & Damnation

– Linux port of Deadfall Adventures

Generalist Programmer, The Farm 51 (Mar 2010 – Aug 2013)

– Painkiller Hell & Damnation, Deadfall Adventures

Page 4: Gamedev-grade debugging

Agenda

How is gamedev different?

Bug species

Case studies

Conclusions

Page 5: Gamedev-grade debugging

How is gamedev different?

[Flowchart: the game loop. Nodes: Start, Update, Draw, Exit? (Yes → End; No → loop again)]

Page 6: Gamedev-grade debugging

33 milliseconds

How much time you have to get shit done™

– 30 Hz → 33⅓ ms per frame

– 60 Hz → 16⅔ ms per frame
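The budget is simply the reciprocal of the refresh rate. A minimal sketch of that arithmetic follows; the helper names `frame_budget_ms` and `fits_budget` are made up for illustration and do not come from any engine.

```cpp
#include <cassert>

// Time available for one frame, in milliseconds, at a given refresh rate.
// Hypothetical helper, named for this example only.
inline double frame_budget_ms(double refresh_hz) {
    assert(refresh_hz > 0.0);
    return 1000.0 / refresh_hz;
}

// True if a measured frame time fits within the budget for that rate.
inline bool fits_budget(double frame_ms, double refresh_hz) {
    return frame_ms <= frame_budget_ms(refresh_hz);
}
```

Everything the game does in a frame (update and draw alike) has to fit inside that one number.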

[Diagram of subsystems: Editor, Level tools, Asset tools, Engine, Physics, Rendering, Audio, Network, Platform, Input, Network back-end, Game, UI, Logic, AI]

Page 7: Gamedev-grade debugging

Interdisciplinary working environment

Designers

– Game, Level, Quest, Audio…

Artists

– Environment, Character, 2D, UI, Concept…

Programmers

– Gameplay, Engine, Tools, UI, Audio…

Writers

Composers

Actors

Producers

PR & Marketing Specialists

…

Tightly woven teams

Page 8: Gamedev-grade debugging

Severe, fixed hardware constraints

Main reason for extensive use of native code

Page 9: Gamedev-grade debugging

Different trade-offs

[Chart: trade-offs between Robustness, Cost, Performance and Fun/Coolness, compared for Enterprise/B2B/webdev and Gamedev]

Page 10: Gamedev-grade debugging

Indeterminism & complexity

Leads to poor testability

– Parts make no sense in isolation

– What exactly is correct?

– Performance regressions?

Source: https://github.com/memononen/recastnavigation

Page 11: Gamedev-grade debugging

Aversion to general software engineering

Modelling

Object-Oriented Programming

Design patterns

C++ STL

Templates in general

…

Page 12: Gamedev-grade debugging

Agenda

How is gamedev different?

Bug species

Case studies

Conclusions

Page 13: Gamedev-grade debugging

Bug species

Source: http://benigoat.tumblr.com/post/100306422911/press-b-to-crouch

Page 14: Gamedev-grade debugging

General programming bugs

Memory access violations

Memory stomping/buffer overflows

Infinite loops

Uninitialized variables

Reference cycles

Floating point precision errors

Out-Of-Memory/memory fragmentation

Memory leaks

Threading errors

Page 15: Gamedev-grade debugging

Bad maths

Incorrect transform order

– Matrix multiplication not commutative

– AB ≠ BA

Incorrect transform space

Source: http://leadwerks.com/wiki/index.php?title=TFormQuat
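To make AB ≠ BA concrete, here is a small sketch with 2D homogeneous transforms. The `Mat3` and `Vec2` types and all function names are illustrative, and the column-vector convention (the rightmost matrix is applied first) is an assumption of this example.

```cpp
#include <cassert>

struct Vec2 { double x, y; };
struct Mat3 { double m[3][3]; };  // row-major 2D homogeneous transform

Mat3 multiply(const Mat3& a, const Mat3& b) {
    Mat3 r{};  // zero-initialised
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

Mat3 translation(double tx, double ty) {
    return Mat3{{{1, 0, tx}, {0, 1, ty}, {0, 0, 1}}};
}

// Rotation by 90 degrees counter-clockwise (exact, no trigonometry needed).
Mat3 rotation90() {
    return Mat3{{{0, -1, 0}, {1, 0, 0}, {0, 0, 1}}};
}

// Transform a point, treating it as the column vector (x, y, 1).
Vec2 apply(const Mat3& t, Vec2 p) {
    return Vec2{t.m[0][0] * p.x + t.m[0][1] * p.y + t.m[0][2],
                t.m[1][0] * p.x + t.m[1][1] * p.y + t.m[1][2]};
}
```

With T = `translation(10, 0)` and R = `rotation90()`, the point (1, 0) ends up at (10, 1) under T·R (rotate, then translate) but at (0, 11) under R·T (translate, then rotate).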

Page 16: Gamedev-grade debugging

Temporal bugs

Incorrect update order

– for (int i = 0; i < entities.size(); ++i)
      entities[i].update();

Incorrect interpolation/blending

– Bad alpha term

– Bad blending mode (additive/modulate)

Deferred effects

– After n frames

– After n times an action happens

– n may be random, indeterministic

Page 17: Gamedev-grade debugging

Graphical glitches

Incorrect render state

Shader code bugs

Precision

Source: http://igetyourfail.blogspot.com/2009/01/visit-lake-fail-this-weekend.html

Page 18: Gamedev-grade debugging

Content bugs

Incorrect scripts

Buggy assets

Source: http://www.polycount.com/forum/showpost.php?p=1263124&postcount=10466

Page 19: Gamedev-grade debugging

Worst part?

Most cases are two or more of the aforementioned, intertwined

Page 20: Gamedev-grade debugging

Agenda

How is gamedev different?

Bug species

Case studies

Conclusions

Page 21: Gamedev-grade debugging

Case studies

Most material captured by

Page 22: Gamedev-grade debugging

Video settings not updating

Page 23: Gamedev-grade debugging

Incorrect weapon after demon mode foreshadowing

Page 24: Gamedev-grade debugging

Post-death sprint camera anim

Page 25: Gamedev-grade debugging

Corpses teleported on death

Page 26: Gamedev-grade debugging

Corpses teleported on death

In normal gameplay, pawns have simplified movement

– Sweep the actor's collision primitive through the world

– Slide along slopes, stop against walls

Source: http://udn.epicgames.com/Three/PhysicalAnimation.html

Page 27: Gamedev-grade debugging

Corpses teleported on death

Upon death, pawns switch to physics-based movement (ragdoll)

Source: http://udn.epicgames.com/Three/PhysicalAnimation.html

Page 28: Gamedev-grade debugging

Corpses teleported on death (cont.)

Physics bodies have separate state from the game actor

– Actor does not drive physics bodies, unless requested

– If actor is driven by physics simulation, their location is synchronized to the hips bone body's

Source: http://udn.epicgames.com/Three/PhysicalAnimation.html

Page 29: Gamedev-grade debugging

Corpses teleported on death (cont.)

Idea: breakpoint in FarMove()?

– One function because world octree is updated

– Function gets called a gazillion times per frame

– Terrible noise

Breakpoint condition?

– Teleport from arbitrary point A to arbitrary point B

– Distance?

Breakpoint sequence?

– Break on death instead

– When breakpoint hit, break in FarMove()
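The breakpoint sequence can also be expressed in code when the debugger has no convenient way to chain breakpoints. The following is only a sketch: `BreakSequence` and its members are hypothetical names, and `on_break` stands in for an actual debugger break so that the example stays portable.

```cpp
#include <cassert>
#include <functional>

// Two-stage breakpoint: stage one arms it, stage two fires once.
// Hypothetical helper; the real FarMove() lives in the engine and is not shown.
struct BreakSequence {
    bool armed = false;              // set when the first stage is reached
    std::function<void()> on_break;  // action for the second stage

    // Stage one: call from the death code path.
    void arm() { armed = true; }

    // Stage two: call at the top of the noisy function.
    // Does nothing until armed, then fires exactly once per arming.
    void hit() {
        if (!armed) return;
        armed = false;
        if (on_break) on_break();
    }
};
```

In practice `on_break` would trigger the debugger, for example `__debugbreak()` with MSVC or `raise(SIGTRAP)` on POSIX systems; the thousands of calls made before the pawn dies pass through `hit()` silently.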

Page 30: Gamedev-grade debugging

Corpses teleported on death (cont.)

Cause: physics body driving the actor with out-of-date state

Fix: request physics body state synchronization to animation before switching to ragdoll

Page 31: Gamedev-grade debugging

Weapons floating away from the player

Page 32: Gamedev-grade debugging

Weapons floating away from the player

Page 33: Gamedev-grade debugging

Weapons floating away from the player

Extremely rare, only encountered on consoles

– Reproduction rate somewhere at 1 in 50 attempts

– And never on developer machines

Player pawn in a special state for the rollercoaster ride

– Many things could go wrong

For the lack of repro, sprinkled the code with debug logs

Page 34: Gamedev-grade debugging

Weapons floating away from the player (cont.)

Cause: incorrect update order

– for (int i = 0; i < entities.size(); ++i)
      entities[i].update();

– Player pawn forced to update after rollercoaster car

– Possible for weapons to be updated before player pawns

Fix: enforce weapon update after player pawns
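One way to enforce such an order is to sort entities by an explicit update group before the update loop runs. This is a sketch of the general idea, not the fix that shipped: the `UpdateGroup` key and every name below are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Assumed ordering key: lower groups update first.
enum UpdateGroup { kVehicles = 0, kPawns = 1, kWeapons = 2 };

struct Entity {
    std::string name;
    UpdateGroup update_group;
};

// Reorder so that each group updates after the groups it depends on.
// stable_sort preserves the existing relative order inside a group.
void sort_for_update(std::vector<Entity>& entities) {
    std::stable_sort(entities.begin(), entities.end(),
                     [](const Entity& a, const Entity& b) {
                         return a.update_group < b.update_group;
                     });
}
```

After sorting, the plain index loop from the slide visits the rollercoaster car first, then the player pawn, then the weapons.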

Page 35: Gamedev-grade debugging

Characters with “rapiers”

Page 36: Gamedev-grade debugging

Characters with “rapiers”

UE3 has ”content cooking” as part of game build pipeline

– Redistributable builds are ”cooked” builds

Artifact appears only in cooked builds

Page 37: Gamedev-grade debugging

Characters with “rapiers” (cont.)

Logs contained assertions for ”out-of-bounds vertices”

Mesh vertex compression scheme

– 32-bit float → 16-bit short int (~50% savings)

– Find bounding sphere for all vertices

– Normalize all vertices to said sphere radius

– Map [-1; 1] floats to [-32768; 32767] 16-bit integers

Assert condition

– for (int i = 0; i < 3; ++i)
      assert(v[i] >= -1.f && v[i] <= 1.f, "Out-of-bound vertex!");
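The last step of the compression scheme can be sketched as follows. The scale factor of 32767 and the rounding mode are assumptions (the slide only gives the target range), so this version never produces -32768; the function names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Map a coordinate already normalised to [-1, 1] onto a 16-bit integer.
// Assumed scale: 32767, rounding to nearest.
std::int16_t quantize(float v) {
    assert(v >= -1.f && v <= 1.f && "Out-of-bound vertex!");
    return static_cast<std::int16_t>(std::lround(v * 32767.f));
}

// Inverse mapping back to a float in [-1, 1].
float dequantize(std::int16_t q) {
    return static_cast<float>(q) / 32767.f;
}
```

Multiplying the dequantized value by the bounding sphere radius restores the original scale, at the cost of a small quantization error.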

Page 38: Gamedev-grade debugging

Characters with “rapiers” (cont.)

v[i] was NaN

– Interesting property of NaN: every ordered or equality comparison evaluates to false (only != yields true)

– Even with itself

● float f = nanf(""); bool b = (f == f); // b is false

How did it get there?!

Tracked the NaN all the way down to the raw engine asset!
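Because a NaN fails the range check just like a genuinely out-of-range value, the assertion message pointed in the wrong direction. A sketch of a check that tells the two apart (function names are illustrative):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// The range check from the assertion, written out as a function.
bool in_unit_range(float v) {
    return v >= -1.f && v <= 1.f;
}

// Test for NaN explicitly before testing the range.
bool is_valid_coordinate(float v) {
    return !std::isnan(v) && in_unit_range(v);
}
```

Note that compiler options that assume finite maths (such as GCC's `-ffast-math`) may optimise NaN checks away, so such checks belong in builds compiled without them.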

Page 39: Gamedev-grade debugging

Characters with “rapiers” (cont.)

Cause: ???

Fix: re-export the mesh from 3D software

– Magic!

Page 40: Gamedev-grade debugging

Meta-case: undeniable assertion

Page 41: Gamedev-grade debugging

Undeniable assertion

Happened while debugging ”rapiers”

Texture compression library without sources

Flood of non-critical assertions

– For almost every texture

– Could not ignore in bulk

– Terrible noise

Solution suggestion taken from [SINILO12]

Page 42: Gamedev-grade debugging

Undeniable assertion (cont.)

Enter disassembly

Page 43: Gamedev-grade debugging

Undeniable assertion (cont.)

Locate assert message function call instruction

Page 44: Gamedev-grade debugging

Undeniable assertion (cont.)

Enter memory view and look up the address

– 0xE8 is the CALL opcode

– 4-byte address argument (a relative offset)

Page 45: Gamedev-grade debugging

Undeniable assertion (cont.)

NOP it out!

– 0x90 is the NOP opcode
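On the slides the patch is done by hand in the debugger's memory view. The same byte-level operation, applied to an ordinary buffer rather than live process memory, might look like this (all names are illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kCallRel32 = 0xE8;  // x86 CALL with a 32-bit relative operand
constexpr std::uint8_t kNop = 0x90;        // x86 NOP
constexpr std::size_t kCallLength = 5;     // opcode byte + 4-byte operand

// Overwrite the CALL at `offset` with NOPs. Returns false, changing
// nothing, if the bytes at `offset` are not a complete rel32 CALL.
bool nop_out_call(std::vector<std::uint8_t>& code, std::size_t offset) {
    if (offset > code.size() || code.size() - offset < kCallLength) return false;
    if (code[offset] != kCallRel32) return false;
    for (std::size_t i = 0; i < kCallLength; ++i) code[offset + i] = kNop;
    return true;
}
```

All five bytes have to go: leaving part of the operand behind would make the CPU decode the leftovers as instructions.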

Page 46: Gamedev-grade debugging

Undeniable assertion (cont.)

Page 47: Gamedev-grade debugging

Incorrect player movement

Page 48: Gamedev-grade debugging

Incorrect player movement

Page 49: Gamedev-grade debugging

Incorrect player movement

Recreating player movement from one engine in another (Pain Engine → Unreal Engine 3)

Different physics engines (Havok vs PhysX)

Many nuances

– Air control

– Jump and fall heights

– Slope & stair climbing & sliding down

Page 50: Gamedev-grade debugging

Incorrect player movement (cont.)

Main nuance: capsule vs cylinder

Page 51: Gamedev-grade debugging

Incorrect player movement (cont.)

Switching our pawn collision to capsule-based was not an option

Emulate by sampling the ground under the cylinder instead

No clever way to debug, just make it ”bug out” and break in debugger

Page 52: Gamedev-grade debugging

Incorrect player movement (cont.)

Situation when getting stuck

Cause: vanilla UE3 code sent a player locked between non-walkable surfaces into the ”falling” state

Fix: keep the player ”walking”

Page 53: Gamedev-grade debugging

Incorrect player movement (cont.)

Situation when moving without player intent

Added visualization of sampling, turned on collision display

Cause: undersampling

Fix: increase radial sampling resolution

Page 54: Gamedev-grade debugging

Blinking full-screen damage effects

Page 55: Gamedev-grade debugging

Blinking full-screen damage effects

Post-process effects are organized in one-way chains

Page 56: Gamedev-grade debugging

Blinking full-screen damage effects (cont.)

No debugger available to observe the PP chain

Rolled my own overlay that walked and dumped the chain contents

MaterialEffect 'Vignette' Param 'Strength' 0.83 [IIIIIIII  ]

MaterialEffect 'FilmGrain' Param 'Strength' 0.00 [          ]

UberPostProcessEffect 'None' SceneHighLights (X=0.80,Y=0.80,Z=0.80) SceneMidTones (X=0.80,Y=0.80,Z=0.80) …

MaterialEffect 'Blood' Param 'Strength' 1.00 [IIIIIIIIII]
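The text bars in such a dump are cheap to produce. A sketch of one way to render them is below; the width of 10 characters and the truncation (rather than rounding) are assumptions inferred from the sample values, and `strength_bar` is a made-up name.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Render a parameter in [0, 1] as a fixed-width text bar.
// Values outside the range are clamped first.
std::string strength_bar(float value, int width = 10) {
    const float clamped = std::min(1.f, std::max(0.f, value));
    const int filled = static_cast<int>(clamped * static_cast<float>(width));
    return "[" + std::string(filled, 'I') + std::string(width - filled, ' ') + "]";
}
```

Drawn every frame, a bar like this makes a blinking parameter visible at a glance, which a column of changing digits does not.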

Page 57: Gamedev-grade debugging

Blinking full-screen damage effects (cont.)

Cause: entire PP chain override

– Breakpoint in chain setting revealed the level script as the source

– Overeager level designer ticking one checkbox too many when setting up thunderstorm effects

Fix: disable chain overriding altogether

– No use case for it in our game anyway

Page 58: Gamedev-grade debugging

Incorrect animation states

Page 59: Gamedev-grade debugging

Incorrect animation states

Page 60: Gamedev-grade debugging

Incorrect animation states

Page 61: Gamedev-grade debugging

Incorrect animation states

Animation in UE3 is done by evaluating a tree

– Branches are weight-blended (either replacement or additive blend)

– Sequences (raw animations) for whole-skeleton poses

– Skeletal controls for fine-tuning of individual bones

Source: http://udn.epicgames.com/Three/AnimTreeEditorUserGuide.html

Page 62: Gamedev-grade debugging

Incorrect animation states (cont.)

Prominent case for domain-specific debuggers

No tools for that in UE3, rolled my own visualizer

– Allows inspection of animation state, but not the reasons for transitions

– Still requires conventional debugging, but narrows it down greatly

– Walks the animation tree and dumps active branches and their parameters

Page 63: Gamedev-grade debugging

Incorrect animation states (cont.)

We have developed sort of an animation bug checklist

Inspect the animation state in slow motion

– Is the correct blending mode used?

Inspect the AI and cutscene state

– Capable of animation overrides

Inspect the assets (animation sequences)

– Is the root bone correctly oriented?

– Is the root bone motion correct?

– Are inverse kinematics targets present and correctly placed?

– Is the mesh skeleton complete and correct?

Page 64: Gamedev-grade debugging

Incorrect animation states (cont.)

Incorrect blend of reload animation

– Cause: bad root bone orientation in animation sequence

Left hand off the weapon

– Cause: left hand inverse kinematics was off

– Fix: revise IK state control code

Left hand incorrectly oriented

– Cause: bad IK target marker orientation on weapon mesh

Page 65: Gamedev-grade debugging

Viewport stretched when portals are in view

Page 66: Gamedev-grade debugging

Viewport stretched when portals are in view

Graphics debugging is:

– Tracing & recording graphics API (OpenGL/Direct3D) calls

– Replaying the trace

– Reviewing the renderer state and resources

Trace may be somewhat unreadable at first…

Page 67: Gamedev-grade debugging

Viewport stretched when portals are… (cont.)

Traces may be annotated for clarity

– Direct3D: ID3DUserDefinedAnnotation

– OpenGL: GL_KHR_debug (more info: [GODLEWSKI01])

Page 68: Gamedev-grade debugging

Viewport stretched when portals are… (cont.)

Quick renderer state inspection revealed that viewport dimensions were off

– 1024x1024, 1:1 aspect ratio instead of 1280x720, 16:9

– Shadow map resolution?

Found the latest glViewport() call

– Shadow map indeed

Why wasn't the viewport updated for main scene rendering?

Page 69: Gamedev-grade debugging

Viewport stretched when portals are… (cont.)

Renderer state changes are expensive

– New state needs to be validated

– Modern graphics APIs are asynchronous

– State reading may require synchronization → stalls

Cache the current renderer state to avoid redundant calls

– Cache ↔ state divergence → bugs!
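A sketch of such a cache, and of how it can diverge from the real state, is shown below. `FakeDevice` is a stand-in that counts calls instead of talking to a GPU, and the explicit `invalidate_viewport()` is just one possible remedy chosen for this example; every name here is hypothetical.

```cpp
#include <cassert>

struct Viewport {
    int x, y, width, height;
    bool operator==(const Viewport& o) const {
        return x == o.x && y == o.y && width == o.width && height == o.height;
    }
};

// Stand-in for the graphics API.
struct FakeDevice {
    Viewport viewport{};
    int set_viewport_calls = 0;
    void set_viewport(const Viewport& v) { viewport = v; ++set_viewport_calls; }
};

// Remembers the last viewport sent to the device and skips redundant calls.
class StateCache {
public:
    explicit StateCache(FakeDevice& device) : device_(device) {}

    void set_viewport(const Viewport& v) {
        if (valid_ && cached_ == v) return;  // looks redundant, skip the call
        device_.set_viewport(v);
        cached_ = v;
        valid_ = true;
    }

    // Call whenever the real state may have changed behind the cache's back.
    void invalidate_viewport() { valid_ = false; }

private:
    FakeDevice& device_;
    Viewport cached_{};
    bool valid_ = false;
};
```

If anything changes the device's viewport without going through the cache, the next `set_viewport()` with the old value is wrongly skipped and the stale viewport stays in effect.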

Page 70: Gamedev-grade debugging

Viewport stretched when portals are… (cont.)

Cause: cache ↔ state divergence

– Difference between Direct3D and OpenGL: viewport dimensions as part of render target state, or global state

Fix: tie viewport dimensions to render target in the cache

Page 71: Gamedev-grade debugging

Black artifacts

Page 72: Gamedev-grade debugging

Black artifacts

Page 73: Gamedev-grade debugging

Black artifacts

Page 74: Gamedev-grade debugging

Black artifacts

Page 75: Gamedev-grade debugging

Black artifacts

Page 76: Gamedev-grade debugging

Black artifacts

First thing to do is to inspect the state

Nothing suspicious found, turned to shaders

On OpenGL 4.2+, shaders could be debugged in NSight…

We were on OpenGL 2.1, so had to resort to early returns from shader with debug colours

– Shader equivalent of debug logs, a.k.a. ”Your Mum's Debugger”

”Shotgun debugging” with is*() functions

isnan() returned true!

Page 77: Gamedev-grade debugging

Black artifacts (cont.)

Cause: undefined behaviour in NVIDIA's pow() implementation

– Results are undefined if x < 0. Results are undefined if x = 0 and y <= 0. [GLSL120]

– Undefined means the implementation is free to do whatever

● NVIDIA returns QNaN the Barbarian (displayed as black, poisoning all involved calculations)

● Other vendors usually return 0

Fix: for all pow() calls, clamp either:

– Arguments to their proper ranges

– Output to [0; ∞)
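The argument-clamping variant can be sketched on the CPU side, with C++ `std::pow` standing in for the GLSL `pow()`. The name `safe_pow` and the choice to return 0 for a zero base with a non-positive exponent are assumptions of this example, not something the specification mandates.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// pow() restricted to the domain where the GLSL result is defined.
float safe_pow(float x, float y) {
    const float base = std::max(x, 0.f);      // negative bases become 0
    if (base == 0.f && y <= 0.f) return 0.f;  // policy choice for this sketch
    return std::pow(base, y);
}
```

In C++, `std::pow` with a negative base and a non-integer exponent yields NaN, which mirrors what the NVIDIA driver did in the shader. A NaN passed in as `x` still propagates through this function, so it does not replace finding where the bad input comes from.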

Page 78: Gamedev-grade debugging

Mysterious crash

Page 79: Gamedev-grade debugging

Mysterious crash

Game in content lock (feature freeze) for a while

Playstation 3 port nearly done

Crash ~3-5 frames after entering a specific room

First report included a perfectly normal callstack but no obvious reason

QA reassigned to another task, could not pursue more

Concluded it must've been an OOM crash

Page 80: Gamedev-grade debugging

Mysterious crash (cont.)

Bug comes back, albeit with wildly different callstack

Asked QA to reproduce multiple times, including other platforms

– No crashes on X360 & Windows!

Totally different callstack each time

Confusion!

– OOM? Even in 512 MB developer mode (256 MB in retail units)?

– Bad content?

– Console OS bug?

– Audio thread?

– ???

Page 81: Gamedev-grade debugging

Mysterious crash (cont.)

Reviewed a larger sample of callstacks

Most ended in dlmalloc's integrity checks

– Assertions triggered upon allocations and frees

Memory stomping…? Could it be…?

Page 82: Gamedev-grade debugging

Mysterious crash (cont.)

Started researching memory debugging

No tools provided by Sony

Attempted to use debug allocators (dmalloc et al.)

– Most use the concept of memory fences

– Difficult to hook up to UE3

[Diagram: memory layout returned by malloc for a regular allocation vs. a fenced allocation]
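The concept of memory fences can be sketched in a few lines: pad each allocation with a known byte pattern on both sides and verify the pattern later. This is an illustration only and not how dmalloc is implemented; the fence size, the pattern byte and all names are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

constexpr std::size_t kHeaderSize = 16;        // holds the user size, padded
constexpr std::size_t kFenceSize = 16;         // fence bytes on each side
constexpr unsigned char kFencePattern = 0xFD;  // arbitrary marker value

// Layout: [header: size][front fence][user data][back fence]
void* fenced_malloc(std::size_t size) {
    unsigned char* raw = static_cast<unsigned char*>(
        std::malloc(kHeaderSize + 2 * kFenceSize + size));
    if (!raw) return nullptr;
    std::memcpy(raw, &size, sizeof(size));
    std::memset(raw + kHeaderSize, kFencePattern, kFenceSize);
    std::memset(raw + kHeaderSize + kFenceSize + size, kFencePattern, kFenceSize);
    return raw + kHeaderSize + kFenceSize;
}

// True if neither fence around `ptr` has been written to.
bool fences_intact(const void* ptr) {
    const unsigned char* user = static_cast<const unsigned char*>(ptr);
    const unsigned char* raw = user - kFenceSize - kHeaderSize;
    std::size_t size = 0;
    std::memcpy(&size, raw, sizeof(size));
    for (std::size_t i = 0; i < kFenceSize; ++i) {
        if (raw[kHeaderSize + i] != kFencePattern) return false;
        if (user[size + i] != kFencePattern) return false;
    }
    return true;
}

void fenced_free(void* ptr) {
    if (!ptr) return;
    assert(fences_intact(ptr) && "memory stomp detected");
    std::free(static_cast<unsigned char*>(ptr) - kFenceSize - kHeaderSize);
}
```

A write one byte past the end of the user data lands in the back fence and is caught by the next check, instead of silently corrupting whatever the allocator placed next to it.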

Page 83: Gamedev-grade debugging

Mysterious crash (cont.)

Found and integrated a community-developed tool, Heap Inspector [VANDERBEEK14]

– Memory analyzer

– Focused on consumption and usage patterns monitoring

– Records callstacks for allocations and frees

Several reproduction attempts revealed a correlation

– Crash address

– Construction of a specific class

Gotcha!

Page 84: Gamedev-grade debugging

Mysterious crash (cont.)

// class declaration
class Crasher extends ActorComponent;
var int DummyArray[1024];

// in ammo consumption code
Crash = new class'Crasher';
Comp = new class'ActorComponent' (Crash);

Page 85: Gamedev-grade debugging

Mysterious crash (cont.)

// class declaration
class Crasher extends ActorComponent;
var int DummyArray[1024];

// in ammo consumption code
Crash = new class'Crasher';
Comp = new class'ActorComponent' (Crash);

Page 86: Gamedev-grade debugging

Mysterious crash (cont.)

Cause: buffer overflow vulnerability in UnrealScript VM

– No manifestation on X360 & Windows due to larger allocation alignment value (8 vs 16 bytes)

Fix: make copy-construction with subclassed object as template fail

I wish I had Valgrind! [GODLEWSKI02]

Page 87: Gamedev-grade debugging

Agenda

How is gamedev different?

Bug species

Case studies

Conclusions

Page 88: Gamedev-grade debugging

Takeaway

Time is of the essence!

Always on a tight schedule

Constantly in motion

– Temporal visualization is key

– Custom, domain-specific tools

Complex and indeterministic

– Difficult to automate testing

– Wide knowledge required

Prone to bugs outside the code

– Custom, domain-specific tools, again

Page 89: Gamedev-grade debugging

Takeaway (cont.)

Rendering is a whole separate beast

– Absolutely custom tools in isolation from the rest of the game

– Still far from ideal usability

Good to know your machine down to the metal

Good memory debugging tools make a world of difference

You are never safe, not even in managed languages!

Page 90: Gamedev-grade debugging

lgodlewski@nordicgames.at

@TheIneQuation

www.inequation.org

Questions?

Page 91: Gamedev-grade debugging

Further Nordic Games information: www.nordicgames.at

Development information: www.grimloregames.com

Thank you!

Page 92: Gamedev-grade debugging

References

SINILO12 – Sinilo, M. ”Coding in a debugger” [link]

GODLEWSKI01 – Godlewski, L. ”OpenGL (ES) debugging” [link]

GLSL120 – Kessenich, J. ”The OpenGL® Shading Language”, Language Version: 1.20, Document Revision: 8, p. 57 [link]

VANDERBEEK14 – van der Beek, J. ”Heap Inspector” [link]

GODLEWSKI02 – Godlewski, L. ”Advanced Linux Game Programming” [link]