Toolchain Independent Distributed Compilation

Dietmar Hauser
Head of Console Technology
Sproing Interactive Media GmbH

http://xkcd.com/303/

Is it possible?

Input
● 1 cpp/c source file
● Many h/hpp header files
→ “Compilation Unit” (“CU”)

Output
● 1 obj/o binary file
● (Misc. helper files)

Research previous solutions

IncrediBuild
● The de-facto standard
● Easy setup, works well
● Pretty, but pricey
● Limited platforms
● Coordinated load balancing

distcc
● Free, but narrow focus
● Needs homogeneous setup
● Two methods:
  ● Preprocess & distribute
  ● Analyse & distribute

The Plan

Send input files
● Start with .cpp/.c
● Find required .h/.inl/...
● Precompiled Header (PCH)
● Compiler executable(s)
● Calculate hash for every file

Receive output files
● Main .obj binary file
● Misc .ti/.sbr/.d/... files
● Log output
● Cache inputs using hash
● Cache output by combining input hashes
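The hashing and caching steps of the plan can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: SHA-256, the function names `file_hash` and `output_cache_key`, and the decision to mix the compiler command line into the key are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of one input file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files (e.g. a PCH) are not loaded at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def output_cache_key(input_hashes: dict[str, str], command_line: list[str]) -> str:
    """Combine per-file hashes into one key for the output cache.

    Assumption: the command line is part of the key, because the same
    inputs compiled with different options give different outputs.
    """
    digest = hashlib.sha256()
    # Sorting by name makes the key independent of discovery order.
    for name in sorted(input_hashes):
        digest.update(name.encode("utf-8") + b"\0")
        digest.update(input_hashes[name].encode("ascii") + b"\0")
    for argument in command_line:
        digest.update(argument.encode("utf-8") + b"\0")
    return digest.hexdigest()
```

A server that already holds a file with a given hash does not need to receive it again, and a compilation whose combined key is already known can be answered straight from the output cache.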

Finding all input files

Preprocess Method
● Run Preprocessor locally
● Distribute Result
● Easy to do
● Less parallelisation
● Input file cache not possible
● No PCH support possible

Analyse Method
● Analyse .cpp file
● Find all dependencies
● Sounds simple, but is tricky
● Slightly more data to send
● Good cache behaviour
● PCH support possible

Precompiled Header Files
● Speeds up compilation
● Contains “global” includes
● Is included in every CU
● Proprietary format
● Often very big
● Contains extra dependency information
● May not be deterministic

Source: http://www.ogre3d.org

Fun with preprocessor directives
● Directory search order
● <> vs. “” includes
● Multi line includes
● Case mismatches
● Conditional includes
● PP constants in includes
● PP macros in includes

● Conservative approach
● Find all possible dependencies
● Reasonable overhead
● Cache dependencies locally

● Still a world of pain
● Trial & Error
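A conservative dependency scan along these lines might look like the sketch below. It is an illustration under stated assumptions, not the author's analyser: the names `scan_includes` and `collect_dependencies` are hypothetical, every `#include` is followed regardless of surrounding `#if`/`#ifdef` blocks, and includes whose operand is a macro are simply skipped. Case mismatches are not handled on case-sensitive file systems.

```python
import re
from pathlib import Path

# Matches "#include" at the start of a line, tolerating spaces around '#'.
INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include\b[ \t]*(.*)$', re.MULTILINE)


def scan_includes(text: str) -> list[tuple[str, bool]]:
    """Return (name, is_quoted) for every #include, ignoring conditionals."""
    # Join lines continued with a trailing backslash (multi-line includes).
    text = text.replace("\\\r\n", "").replace("\\\n", "")
    found = []
    for match in INCLUDE_RE.finditer(text):
        operand = match.group(1).strip()
        if operand.startswith('"') and operand.count('"') >= 2:
            found.append((operand[1:operand.index('"', 1)], True))
        elif operand.startswith("<") and ">" in operand:
            found.append((operand[1:operand.index(">")], False))
        # Any other operand (a macro) would need real macro expansion.
    return found


def collect_dependencies(source: Path, search_dirs: list[Path]) -> set[Path]:
    """Follow includes transitively; the first hit in search order wins."""
    seen: set[Path] = set()
    pending = [source.resolve()]
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        text = current.read_text(encoding="utf-8", errors="replace")
        for name, quoted in scan_includes(text):
            # Quoted includes try the including file's directory first.
            candidates = ([current.parent] if quoted else []) + search_dirs
            for directory in candidates:
                candidate = directory / name
                if candidate.is_file():
                    pending.append(candidate.resolve())
                    break
    return seen
```

Because conditionals are ignored, the result can contain headers the compiler never reads. That costs some extra transfer but never leaves a required file behind, which is the trade-off the conservative approach accepts.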

Putting it all together
● Collect all input files
● Send them if needed (zipped)
● Build directory structure in Temp
● Cache & Compile
● Collect all output files
● Cache & Send back (zipped)
● ???
● Profit!
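The "send zipped" and "build directory structure in Temp" steps could be sketched like this. The helper names `archive_name`, `pack_inputs` and `unpack_inputs` are hypothetical, and the mapping from absolute paths to archive names (dropping the drive colon and the leading separator) is an assumption made for the example.

```python
import io
import zipfile
from pathlib import Path


def archive_name(path: Path) -> str:
    """Map an absolute path to a relative name usable inside a zip file."""
    parts = [part.replace(":", "").strip("\\/") for part in path.resolve().parts]
    return "/".join(part for part in parts if part)


def pack_inputs(paths: list[Path]) -> bytes:
    """Client side: zip the input files, keeping their directory layout."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            archive.write(path, arcname=archive_name(path))
    return buffer.getvalue()


def unpack_inputs(payload: bytes, temp_root: Path) -> list[Path]:
    """Server side: recreate the layout below a temporary root directory."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(temp_root)
        return [temp_root / name for name in archive.namelist()]
```

Note that every unpacked file now lives under the temporary root, so its path is longer than the original and no longer matches the absolute path the client used.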

It kinda works...

Little problems:
● PCH files don't work
● Long & deeply nested file names
● Absolute paths
● Some compilers need registry
● Issues with parallel jobs
● And some more...

Big problem:
● Debug info stores absolute paths to source files!

Sandboxie to the rescue
● Virtual file system
  ● Recreate original paths
  ● No concurrency issues
  ● Simple clean up
● Virtual registry
● Not free (~€10-25 per user)
● Does not solve all problems
● But it's good enough!

Miscellaneous titbits
● “Screen Saver” mode
● Automatic server updates
● Output file cache (~ccache)
● Data compression woes
● 100 Mbit/s vs. 1 Gbit/s
● Local compilation server
● Parallel local compilation
● Parallel linking experiment

So, is it worth the hassle?
● Measuring this is tricky
  ● Real projects
  ● In a live environment
● 34 servers, ~17 available
● Maximum speed up: ~17
● Uncached: 0.6 – 6.68
● Cached: 1.06 – 13.13
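The figures above read naturally as speed-up factors, assuming the usual definition of local build time divided by distributed build time (the slides do not spell this out). Under that assumption a value below 1, such as 0.6, means the distributed build was slower than the local one. A minimal sketch with hypothetical timings:

```python
def speed_up(local_seconds: float, distributed_seconds: float) -> float:
    """Speed-up factor: values above 1 mean the distributed build was faster."""
    if distributed_seconds <= 0:
        raise ValueError("distributed_seconds must be positive")
    return local_seconds / distributed_seconds


# Hypothetical timings, chosen only for illustration (not measured data).
slower = speed_up(120.0, 200.0)  # below 1: distribution overhead dominated
faster = speed_up(600.0, 60.0)   # above 1: the build farm paid off
```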

Sproing's Codebase (21 Projects)

Schlag den Raab 2 (1 Project)

3rd Party Codebase (64 Projects)

Conclusions

It's possible to distribute compilation with any compiler

Speed up is highly dependent on the environment and use case

Speed up is almost always positive, and often large

What's next?
● Get other developers involved?
● Leverage an external cloud?
● Distribute other processes? (Asset conversion, ...)
● Find a better solution for PCH?
● Improve or unify front end with LLVM & Clang?
● Distributed linking?

Thank you for your attention!

[email protected]

@Rattenhirn

http://www.sproing.com

http://fb.me/sproing

Questions?
