One Year of Porting - Post-mortem of two Linux/SteamOS launches
Leszek Godlewski
Who is this guy?
Leszek Godlewski
Programmer, Nordic Games (early 2014 – now)
– Unannounced project
Freelance Programmer (Sep 2013 – early 2014)
– Linux port of Painkiller Hell & Damnation
– Linux port of Deadfall Adventures
Generalist Programmer, The Farm 51 (Mar 2010 – Aug 2013)
– Painkiller Hell & Damnation, Deadfall Adventures
Focus
● Not sales figures
● Not business viability
● Not game-specific bugs
● Not the Steam Controller – oops!
● Platform-specific problems
● Mistakes made & mitigation attempts
Agenda
● The ports
● Laying down the foundations
– Build system
– Compilers
– Linking
– Boilerplate
● Release and feedback
– User issues
– Crash handling
– GLSL shader linking
The ports
Painkiller Hell & Damnation (The Farm 51)
Deadfall Adventures (The Farm 51)
Facts
● Unreal Engine 3
● All major Linux distros
– SteamOS, Debian, Ubuntu, Fedora, Arch, Gentoo
● All official drivers
– NVIDIA, AMD, Intel (i965)
● Some open-source drivers
– Gallium r600, Gallium radeonsi
Facts
● Most UE3 middleware has Linux versions
– In our case: PhysX, FaceFX, Scaleform GFx, lzopro, Bink...
● Introduced open-source middleware
– SDL 2.x, GLEW, Steam Runtime
● UE3's build system – Unreal Build Tool
– Handles everything make does
– Written in C#, fixed up to run in Mono on Linux
● UE3's content packaging (cooking) system
– Linux target based on Mac OS X
Facts
● QA department unfamiliar with Linux
– Basic training was required
  ● Installing & running software (including from the command line), file permissions, driver installation, gathering system information...
– Mostly reported false positives in the beginning
● Spare-time project over ~13 months
– After leaving The Farm 51 employment – contracted for further outsourcing directly by TF51
– Occasional support from individual members of TF51 staff
  ● Kudos to Piotr Bąk and Wojciech Knopf!
Overlap
● Noticed how a lot of work was based on OSX code?
● Happens all the time
– POSIX
– OpenGL/OpenAL
[Diagram: overlap between Mac OS X, Linux and Mobile]
Laying down the foundations
Starting point
● Epic's OpenGL 2.1 and OpenAL back-ends
– OpenGL mode somewhat functional in Windows developer builds
● Epic's Mac OS X port
– Limited test builds for Mac OS X had been made before
– Mac OS X binary builds supported via remote compilation
– Existing Mac OS X target for game content packaging (cooking)
● Both of the above – somewhat... unfinished
● On Windows, the games shipped 32-bit binaries only
Building the build tool – C# & Mono
● Patched the Unreal Build Tool to build & run on Mono in Linux
– Mono can handle most .NET command-line apps all right
● Added support for Linux toolchains (duh)
● Fixed hardcoding of backslashes in paths
– Path.Combine() instead
● Fixed regexes on large strings (C++ sources) blowing up the stack
– Break up the string into smaller parts
Cross-compiling for 32/64-bit
● Yes, I agree, 32-bit should die, but one may not be allowed to kill it
● gcc -m32/-m64 is not enough!
– Sets target code generation
– But not headers & libraries (CRT, OpenMP, libgcc etc.)
● Fixed (on Debian & friends) by installing gcc-multilib
– Dependency package for non-default architectures (i.e. i386 on an amd64 system and vice versa)
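One quick way to confirm what a given -m32/-m64 build actually targets is to have the binary report its own pointer width. The sketch below is only an illustration: `pointer_bits` and `report_target_width` are names made up for this example, not part of UE3 or of any toolchain.

```cpp
#include <climits>
#include <cstdio>

// Hypothetical helper: width of a pointer in bits for the current target.
// Expected to be 32 when built with -m32 and 64 when built with -m64.
constexpr int pointer_bits() {
    return static_cast<int>(sizeof(void*) * CHAR_BIT);
}

// Prints the detected target width and returns it for convenience.
int report_target_width() {
    std::printf("built for a %d-bit target\n", pointer_bits());
    return pointer_bits();
}
```

Building the same file once with `g++ -m32` and once with `g++ -m64` exercises both paths. If the multilib packages are missing, the non-default build typically fails while looking for headers or at link time rather than in code generation, which matches the point made on the slide.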
Clang
● Clang is faster
– gcc: 3m47s
– Clang: 3m05s
● Clang has different diagnostics than gcc
● Clang has C++ preprocessor macro compatibility with gcc
– Declares __GNUC__ etc.
● Clang has command-line compatibility with gcc
– Can easily switch back & forth between gcc and Clang
Clang - caveats
● Object files may be incompatible with gcc & fail to link (need full rebuilds)
● gcc is more mature than Clang
– Clang has generated faulty code for me (YMMV)
● Slight inconsistencies in C++ standard strictness
– Templates
– Anonymous structs/unions
– May need to add this-> in some places
– May need to name some anonymous types
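The last two points can be illustrated with a minimal sketch. The types below (`Base`, `Derived`, `Vec2`) are invented for this example and are not taken from the UE3 codebase. They show the usual shape of the fixes when code written for a more permissive compiler meets a stricter one.

```cpp
// Base class template with a member that derived templates want to use.
template <typename T>
struct Base {
    T value;
};

template <typename T>
struct Derived : Base<T> {
    // 'value' lives in a dependent base class, so unqualified lookup does not
    // find it under strict two-phase lookup; 'this->' makes the name dependent.
    T doubled() const { return this->value * 2; }
};

// Anonymous structs inside unions are a compiler extension in C++.
// Naming both the nested type and the member keeps the code portable.
union Vec2 {
    struct Components { float x, y; } c;
    float v[2];
};
```

The cost of naming the struct is that call sites change from `p.x` to `p.c.x`, so in a large codebase this fix tends to touch more code than adding `this->` does.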
So – Clang or gcc?
Both:
● Clang – quick iterations during development
● gcc – final shipping binaries
Linking – GNU ld
● Default linker on Linux
● Ancient
● Single-threaded
● Requires specification of libraries in the order of reverse dependency...
● We are not doomed to use it!
Linking – GNU gold
● Multi-threaded linker for ELF binaries
– ld: 18s
– gold: 5s
● Developed at Google, now officially part of GNU binutils
● Drop-in replacement for ld
– May need an additional parameter or toolchain setup
  ● clang++ -B/usr/lib/gold-ld ...
  ● g++ -fuse-ld=gold ...
● Still needs libs in the order of reverse dependency...
Linking – library groups
● Major headache/game-breaker with circular dependencies
– ”Proper” fix: re-specify the same libraries over and over again
● Declare library groups instead
– Wrap library list with --start-group and --end-group
  ● Shorthand: -(, -)
  ● g++ foo.obj -Wl,-\( -lA -lB -Wl,-\)
● Caveat: results in exhaustive symbol search within the group
– Manual warns of possible performance hit
– Not observed here, but keep that in mind!
Caching the gdb-index
● Large codebase generates heavy debug symbols (hundreds of megabytes)
● gdb generates the index for quick symbol lookup...
● ...at every single gdb startup
– Takes several minutes for said codebases
– Massive waste of time!
● Solution: cache the index, fold it into the build process!
– Full description in the gdb manual
– gdb -batch -ex "save gdb-index $(OUTPUT_PATH)/gdb-index" $(BINARY)
– objcopy --add-section .gdb_index=$(OUTPUT_PATH)/gdb-index/$(BINARY).gdb-index --set-section-flags .gdb_index=readonly $(BINARY) $(BINARY)
Raw X11 or SDL?
● Initially tried rolling my own boilerplate
– Basic X11 mouse, window and key press events are easy
– Unicode text input is not
– Useful windowing is not
– Correct GLX is not
– Linux joystick API is not
– Above all, X11 seems to be on its way out
  ● Wayland & Mir will have emulation layers, but that's bound to have overhead
● You really want to use SDL 2 instead, trust me
– Shameless plug: see my talk from WGK 2013 for benefits of using SDL 2 ☺
Release and feedback
What we shipped initially with the beta
● 32-bit binaries (64-bit added later on)
● Launch script (~20 lines)
– Architecture detection
  ● Initially a stub for 64-bit with fallback to 32-bit
– Steam Runtime injection (if not already present)
● That's about it ☺
● Explicit dependency on the Steam Runtime
– Allows shifting some responsibility to Valve
– And, admittedly, to users who insist on using their own dependencies
User issues
● Missing/incompatible libraries
– Resulting from disabling the Steam Runtime
  ● Gentoo users, mostly... Maintainer of steam package had chosen to disable it by default
– Usually fixed by force-starting Steam with STEAM_RUNTIME=1
  ● $ STEAM_RUNTIME=1 steam
● ”Missing” 32-bit NVIDIA OpenGL libraries on 64-bit systems
– Apparently, they might end up unreachable by the dynamic linker
– Fixed by adding /usr/lib32 to LD_LIBRARY_PATH in the launch script
– Also, prompt user to make sure they did install them
  ● It's an option - ”install compatibility 32-bit libraries”
User issues
● No support for DXT texture compression despite capable hardware (GL_EXT_texture_compression_s3tc)
– Concerns the open-source drivers
– For legal reasons (S3/VIA patents), some distros don't ship it or install it automatically
  ● E.g. Fedora
– If extension not advertised by driver, suggest that the user install libtxc_dxtn
  ● Often a distro package, so no hassle
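The "is the extension advertised?" check can be sketched roughly as follows. This is an assumption-laden illustration rather than the code that shipped: `has_extension` and `s3tc_hint` are hypothetical helpers, and the extension list is passed in as plain strings so the example runs without an OpenGL context.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: 'extensions' is assumed to hold the names the driver
// advertises, e.g. obtained by splitting glGetString(GL_EXTENSIONS) on spaces.
bool has_extension(const std::vector<std::string>& extensions,
                   const std::string& name) {
    for (const std::string& ext : extensions) {
        if (ext == name) return true;   // exact match, not a substring test
    }
    return false;
}

// Returns a user-facing hint, or an empty string when S3TC is available.
std::string s3tc_hint(const std::vector<std::string>& extensions) {
    if (has_extension(extensions, "GL_EXT_texture_compression_s3tc"))
        return "";
    return "DXT texture compression is not available; "
           "installing libtxc_dxtn may enable it on open-source drivers.";
}
```

Comparing whole names rather than searching the raw extension string avoids false positives when one extension name is a prefix of another.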
More user issues...
● Graphical glitches...
● Broken V-sync...
● Broken NVIDIA Optimus with open-source multiplexer...
● Looong & unresponsive loading times...
● A whole lot of crashes...
● Most of the above was my fault – not going to bore you with all of this!
Crash handling
● Unix signals
– Asynchronous IPC notification mechanism in POSIX-compliant systems
  ● Sources can be the process itself, other processes or the kernel
– Default handler terminates process & dumps core for most signals
– Can (must?!) specify custom handlers
● Get/set handlers via the sigaction(2) system call
– Handler prototype (set the SA_SIGINFO flag): void sa_sigaction(int signal, siginfo_t *siginfo, void *context);
● More information
– G. Ben-Yossef, Crash N' Burn: Writing Linux application fault handlers
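A minimal sketch of installing a three-argument handler through sigaction(2) might look like this. The names `on_signal`, `install_handler` and `g_last_signal` are invented for the example. A real crash handler would be registered for SIGSEGV, SIGBUS, SIGILL, SIGFPE and friends, and would do more than record the signal number.

```cpp
#include <signal.h>

// Set by the handler; volatile sig_atomic_t is the type meant for this.
volatile sig_atomic_t g_last_signal = 0;

// Three-argument handler, selected by the SA_SIGINFO flag.
void on_signal(int signum, siginfo_t* info, void* context) {
    (void)info;      // would carry si_code, si_addr, ...
    (void)context;   // would carry the interrupted register state
    g_last_signal = signum;
}

// Hypothetical helper: installs on_signal for 'signum'.
// Returns true on success.
bool install_handler(int signum) {
    struct sigaction action = {};
    action.sa_sigaction = on_signal;   // not sa_handler: that is the 1-arg form
    action.sa_flags = SA_SIGINFO;      // ask for siginfo_t and context
    sigemptyset(&action.sa_mask);      // block no extra signals in the handler
    return sigaction(signum, &action, nullptr) == 0;
}
```

Calling `install_handler(SIGUSR1)` followed by `raise(SIGUSR1)` is a harmless way to check the wiring before pointing the handler at real fault signals.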
Interesting siginfo_t fields
● si_errno – errno value
– Possibly more detailed error code
● si_code – reason for sending the signal
– Both general and per signal type
– Examples: issued by user, issued by kernel, illegal addressing mode, FP over/underflow, invalid memory permissions, unmapped address etc.
● si_addr – memory location at which fault happened
– If applicable: SIGILL, SIGFPE, SIGSEGV, SIGBUS and SIGTRAP
Signal handler caveats
● Not safe to allocate or free heap memory!
– Fault may have corrupted the allocator's data structures
● Prone to race conditions
– Can't share locks with the main program!
  ● If signalled after locking, you'll deadlock
– Can't call async-signal-unsafe functions!
  ● See manual for signal(7) for a list of safe ones (signal-safety(7) in newer man-pages)
● Custom handlers do not dump core (a.k.a. minidump)
– Mitigated by restoring default handler after custom logging and re-signalling self
  ● signal(signum, SIG_DFL); raise(signum);
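Put together, a handler that logs and then hands the signal back to the default action could be sketched as below. This is a simplified illustration, not the shipped handler: `crash_handler` and `run_crash_demo` are made-up names, and the demo forks a child and uses raise(SIGSEGV) as a stand-in for a real fault so that it can be run safely.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Logs with async-signal-safe calls only, then restores the default action
// and re-raises, so the process still terminates (and may dump core).
void crash_handler(int signum, siginfo_t*, void*) {
    static const char msg[] = "fatal signal caught, re-raising\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;
    signal(signum, SIG_DFL);   // restore the default handler
    raise(signum);             // re-signal self; delivered once we return
}

// Hypothetical self-test: crashes a child process on purpose and returns the
// signal that terminated it, or -1 if it did not die from a signal.
int run_crash_demo() {
    pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {
        struct sigaction action = {};
        action.sa_sigaction = crash_handler;
        action.sa_flags = SA_SIGINFO;
        sigemptyset(&action.sa_mask);
        sigaction(SIGSEGV, &action, nullptr);
        raise(SIGSEGV);        // stand-in for a real fault
        _exit(0);              // not reached if the re-raise worked
    }
    int status = 0;
    if (waitpid(child, &status, 0) != child) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Because the parent sees the child killed by SIGSEGV rather than exiting normally, the usual core dump and crash-reporting machinery of the system still gets its chance to run.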
Safe stack walking
● glibc provides backtrace() and friends
● Symbols are read from the dynamic symbol table
– Must pass -rdynamic to gcc/Clang to populate
● Calling backtrace_symbols() allocates heap memory
– Not safe... ☹
– Still, can get away with it most of the time
– Proper solution involves a separate watchdog process & pipes (heap-less backtrace_symbols_fd() call instead)
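The heap-less variant can be sketched as follows, assuming glibc's <execinfo.h>. `dump_backtrace` is a hypothetical helper, and in the watchdog setup described above `fd` would be the write end of a pipe to the separate process.

```cpp
#include <execinfo.h>
#include <unistd.h>

// Hypothetical helper: writes the current call stack to 'fd' without
// allocating memory for the symbol strings.
// Returns the number of frames captured.
int dump_backtrace(int fd) {
    void* frames[64];                          // fixed buffer on the stack
    int count = backtrace(frames, 64);
    backtrace_symbols_fd(frames, count, fd);   // heap-less variant
    return count;
}
```

One caveat documented for glibc: the first call to backtrace() may load libgcc dynamically, which can itself allocate. A common workaround is to call backtrace() once during startup so that nothing has to be loaded inside the signal handler.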
Long load times? Unresponsiveness?
● Profiling quickly places blame on shader linking
– OpenGL shader model operates on program objects, created by linking shader pipeline combinations
  ● Introduces lots of redundancy (see glGetProgramiv() & glGetShaderiv())
  ● Drivers often defer actual compilation until ”link time”
  ● Increased memory consumption
– UE3 OpenGL renderer blocks the render thread for linking
  ● Render thread blocked → Frozen loading screen!
– Both games have thousands of shaders
  ● An awful lot of vertex/fragment shader combinations (programs)
● Moreover – makes async level streaming blocking!
– Bad stuttering during gameplay
● Situation better on subsequent loads on NVIDIA due to in-driver cache
Shader linking
● Short-term fix: background shader linking
– Worker thread with a separate OpenGL context, sharing data with the main one
– Queue all shader link jobs, execute on the worker only
– If on a loading screen, keep spinning it while waiting for the shaders
– Defer ”async streaming done” notifications till shader link queue is empty
● Pros:
– Quick & easy to implement
– Fixes gameplay stuttering
● Cons:
– Only fixes unresponsiveness, not the long load times ☹
Shader linking
● Disaster on the official AMD Catalyst driver!
– Total system hang (PC needs hard reset)
– Apparently, exposed a race condition in AMD driver
– AMD has yet to ship the fix...
● Fallback to old, blocking code path if Catalyst detected
Shader linking
● Possible improvement (suggested by Epic): ARB_separate_shader_objects
– Replaces programs (and linking) with much lighter pipeline objects
  ● Removes a lot of redundancy
– Makes use of separate vertex/fragment shaders (D3D-like)
– Would play well with UE3's RHI, modelled mostly after D3D
● Not implemented ☹ – requires shader syntax upgrade and a refactor of UE3's OpenGL renderer
– Explicit locations for attributes and varyings required for SSO
– Need to bump GLSL from 1.20 (OpenGL 2.1) to at least 1.40 (OpenGL 3.1)
Shader linking
● Proper fix: deferred shader access
– Modern drivers queue shader compiles and links internally and process them in a multithreaded manner
  ● Official NVIDIA & AMD Catalyst
  ● Open-source Mesa drivers in SteamOS (patches pushed upstream recently by Valve)
– Kick all the jobs (i.e. create shader objects) at level load
– Do not access the objects (query, draw) until they are needed
– Not even the compile/link status! This creates a sync point!
● Not implemented ☹ – requires a considerable refactor of UE3's OpenGL renderer
Summary
Takeaway 1/2
● Porting .NET-based tools to Linux is viable
● Many 32/64-bit cross-compile issues are solved with gcc-multilib
● Switching back and forth between Clang and gcc is easy and useful
● Link times can be greatly improved by using gold
● Caching the gdb-index improves debugging experience
● Using SDL 2 is way better than rolling your own boilerplate
Takeaway 2/2
● Using the Steam Runtime is good for you
● Crash handling in Linux is easy to do, tricky to get right
● OpenGL shader model is significantly different from D3D's
● GLSL linking is slow, so defer access if possible
● Multiple concurrent OpenGL contexts can still bite you
● Test on different GPU drivers to avoid unpleasant surprises!
lgodlewski@nordicgames.at
@TheIneQuation
www.inequation.org
Questions?
Further Nordic Games information: www.nordicgames.at
Development information: www.grimloregames.com
Thank you!