10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John...

32
10-20x Faster Software Builds 10-20x Faster Software Builds John Ousterhout 2307 Leghorn Street Mountain View, CA 94043 www.electric-cloud.com

Transcript of 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John...

Page 1: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

10-20x FasterSoftware Builds10-20x Faster

Software Builds

John Ousterhout

2307 Leghorn StreetMountain View, CA 94043www.electric-cloud.com

Page 2: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 2

OverviewOverviewSlow builds impact almost allmedium/large development teams

Electric Cloud speeds up builds10-20x:

Harnesses clusters of inexpensiveserversUnlocks concurrency by deducingdependenciesMinimizes scalability bottlenecks

Faster builds meanFaster time to marketHigher product qualityAbility to do more with less

Design, create,manage sources

SoftwarebuildsTest

Page 3: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 3

OutlineOutlineThe impact of slow builds

The holy grail: concurrent builds

Dependencies: problem and solution

Electric Cloud architecture

Managing files

Limiting bottlenecks

Performance measurements

Page 4: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 4

Problem: Slow BuildsProblem: Slow BuildsOver 500 companies surveyed, average build 2-4 hours

5-15% loss in engineering productivity:Wasted engineering time & frustrationLess time to fix bugs, add features

5-10% delay in time to market:Slow builds add weeks to release cyclesUncertainty & risk due to last-minutebroken builds

Quality & customer satisfaction:Developers can’t rebuild before check-inQA waiting on broken builds or skipping tests tomeet deadlinesMore bugs escape to the field

Page 5: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 5

Personal ExperiencePersonal Experience

Slow builds drove me crazySprite research project (Berkeley, late ’80s):

Most popular feature was “pmake”Painful to return to commercial OS’es

Interwoven, 2000-2001:7-10-hour builds> 1 month with no successful daily builds, late in a release cycle

Discovered that they drive everyone crazy!

Founded Electric Cloud to solve the problem

Page 6: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 6

Theoretical Solution:ConcurrencyTheoretical Solution:Concurrency

01010101010101010101

01010101010101010101

01010101010101010101

01010101010101010101

01010101010101010101

01010101010101010101

01010101010101010101

SourceCode

ObjectFiles

Executables

Release

Builds have inherent parallelismSolution: split up builds and run pieces concurrently

Large SMP Machines (gmake –j)Distributed builds (distcc)01010

101010101010101

01010101010101010101

01010101010101010101

01010101010101010101

Libraries

01010101010101010101

If only it were this easy…

Page 7: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 7

Problem: DependenciesProblem: Dependencies

01010101010101010101

01010101010101010101

01010101010101010101

01010101010101010101

01010101010101010101

01010101010101010101

01010101010101010101

SourceCode

ObjectFiles

Executables

Release

Builds have inherent parallelismSolution: split up builds and run pieces concurrently

Large SMP Machines (gmake –j)Distributed builds (distcc)

Current attempts to speed builds yield small resultsDependency problems:

Incomplete Can’t be expressed between MakefilesResult: broken builds

Difficult to get more than a 2-3x speedupHard to maintain Makefiles

01010101010101010101

01010101010101010101

01010101010101010101

01010101010101010101

Libraries

01010101010101010101

Page 8: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 8

Electric Cloud SolutionElectric Cloud SolutionDeduce dependencies on-the-fly:

Watch all file accesses: these indicate dependenciesAutomatically detect out-of-order steps

10101010101010101010101010101010101010101010101010101010

Linklibrary

x.libLinkapp.

write read

Desired Actual

10101010101010101010101010101010101010101010101010101010

Linklibrary

x.libwrite

10101010101010101010101010101010101010101010101010101010

x.libLinkapp.

read

old!Run inparallel?Error!

Page 9: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 9

Electric Cloud SolutionElectric Cloud SolutionDeduce dependencies on-the-fly:

Watch all file accesses: these indicate dependenciesAutomatically detect and correct out-of-order stepsSave discovered dependencies for future buildsResult: high concurrency possible

10101010101010101010101010101010101010101010101010101010

Linklibrary

x.libLinkapp.

write read

Desired Actual

10101010101010101010101010101010101010101010101010101010

Linklibrary

x.libwrite

10101010101010101010101010101010101010101010101010101010

x.libLinkapp.

read

old!

Discard

Linkapp.

Rerunread

Page 10: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 10

Electric Cloud ArchitectureElectric Cloud Architecture

ClusterManager

Manager

Cluster

Electric Make

Make Machine

Plug-in replacement for GNU Make, Microsoft NMAKE

Plug-in replacement for GNU Make, Microsoft NMAKE

Inexpensive rack-mounted servers run pieces of build in parallel

Inexpensive rack-mounted servers run pieces of build in parallel

Web-based reporting, management tools

Web-based reporting, management tools

Node

Electric File System

AgentNode

Electric File System

AgentNode

Electric File System

Agent Node

Electric File System

Agent

Network

Page 11: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 11

Clustering ApproachClustering Approach

Advantages (vs. multiprocessor):Cost-effective: $1-2K per CPUScalable: no hard limit to cluster size

Potential problems:Build state not necessarily available on nodesOverhead for network communicationRobustness: more pieces that can break

Page 12: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 12

VirtualizationVirtualizationNode environment must duplicate make machine; hard because of

Different environments on different make machinesFile versioning within a buildClearCase views

Simple application-specific network file system:

Electric Make is serverAgent is client, fetches files on demandVirtualizes subtree(s) from make machineFiles cached on nodes during a build

On Windows, registry data is also virtualized on nodes

Electric Make

Make Machine

Node

Electric File System

Agent

Network

Server

Client

Page 13: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 13

Versioning File SystemVersioning File System

Files can have many versions during build:Append to log fileDebug/release versions compiled to same .o files

Each read must return correct version (based on sequential order for build)Electric Make maintains version history for each file

Tricky: name space must be versioned also

Network file system passes appropriate version to each job, flushes caches when necessary

Example: log file extended with series of appends

Read #1 Read #2 Read #3

Page 14: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 14

Network OptimizationNetwork Optimization

P2P file transfers offload 20-25% of outbound traffic:Take advantage of inexpensive bandwidth within switch

Just-in-time compression cuts traffic 2.5-3x:Match network bandwidth to disk

Electric Make

Make Machine

Node

Electric File System

Agent

Network

Node

Electric File System

AgentNode

Electric File System

AgentNode

Electric File System

Agent

Network bandwidth concentrates at make machine

Peer-to-peer file transfer

Page 15: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 15

File System OptimizationFile System OptimizationHighly parallel builds stress build machine’s file system :

Average bandwidth as high as 10-20 MB/sClearCase? High latency

All disk I/O passes through Electric Make: opportunity to manage read & write concurrency

Single disk? Concurrency causes extra head motionNetwork file system? More concurrency hides network latency

Metadata caching improves ClearCase performance significantly

Page 16: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 16

Recursive MakesRecursive Makes

Gmake: separate gmake invocation for each Makefile:Hard to extract & manage concurrencyCan’t manage dependencies across Makefile

Electric Make: merge MakefilesRecursive makes return immediately with parameter infoTop-level emake manages multiple make instances

all: a bcc child1/mod1.a child2/mod2.a ...

a:make -C child1

b:make -C child2

all: a bcc child1/mod1.a child2/mod2.a ...

a:make -C child1

b:make -C child2

mod1.a: a.o b.o c.oar r mod1.a a.o b.o c.oranlib mod1.a

a.o: ...b.o: ...c.o: ...

mod1.a: a.o b.o c.oar r mod1.a a.o b.o c.oranlib mod1.a

a.o: ...b.o: ...c.o: ...

mod2.a: x.o y.o z.oar r mod1.a x.o y.o z.oranlib mod2.a

x.o: ...y.o: ...z.o: ...

mod2.a: x.o y.o z.oar r mod1.a x.o y.o z.oranlib mod2.a

x.o: ...y.o: ...z.o: ...

Makefile

child1/Makefile

child2/Makefile

Page 17: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 17

Recursive Makes, cont’dRecursive Makes, cont’dWhere this works well:all:

for i in “a b c d e f g”; do \cd $$i; $(MAKE); cd ..; \

done

Where this doesn’t work so well (output of submakes is used):all:

for i in “a b c d e f g”; do \cd $$i; $(MAKE) >> log; cd ..; \

done

Must modify Makefiles in some cases

Page 18: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 18

CompatibilityCompatibility

Plug-compatible with GNU Make, Microsoft NMAKE:Change ‘gmake’ or ‘nmake’ to ‘emake’ in build scriptsIdentical command-line optionsIdentical results (except builds run faster)Identical log file outputTypically a few Makefile changes to maximize speedup

Page 19: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 19

ManageabilityManageabilityWeb-based administration

As easy to manage many nodes as 1 node

Can be used by entire team:Supports multiple simultaneous buildsPriority system for node allocation

Robust: automatic fail-over on node failures

Page 20: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 20

Results: Open SourceResults: Open Source

Local 20 CPUs SpeedupSamba 952s 58s 16.4xMySQL 1400s 124s 11.3xGtk 891s 95s 9.4x

0

5

10

15

20

0 5 10 15 20

#CPUs in cluster

Spee

dup

SambaMySQLGtk

0

5

10

15

20

0 5 10 15 20

#CPUs in cluster

Spee

dup

SambaMySQLGtk

Page 21: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 21

Results: Linux KernelResults: Linux KernelLinux Kernel 2.6.1

Make bzimage + modules

2.8 GHz Xeon, 1 GB RAM, IDE Drive

Build Time

[mm:ss]

Speedup

Local 22:085 nodes 5:09 4.3x10 nodes 2:40 8.3x15 nodes* 2:03 10.8x20 nodes* 1:42 13.0x

* Projected build time

1328

309160 123 102

0

200

400

600

800

1000

1200

1400

local 5 10 15 20

Page 22: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 22

Telecom Equip. VendorTelecom Equip. Vendor

110

10

0

20

40

60

80

100

120

GNU Make -j8 Electric Cloud 16 Nodes

Bui

ld T

ime

(min

utes

)

Impact: 3 week savings out ofan 8 month release cycle expected

11xSpeedup!

Page 23: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 23

Enterprise Software Co.Enterprise Software Co.

Solaris 2.8

0

50

100

150

200

250

300

GNU Make Electric Cloud (30 nodes)

Bui

ld T

ime

(min

utes

)

274

0:13

20xSpeedup!

Impact: Enabled worldwide follow-the-sun development

Page 24: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 24

Electric CloudElectric Cloud

We eat our own dog food

Continuous build system:Start build and test cycle whenever changes are committed to the main branch

25.5

7.4

0.0

5.0

10.0

15.0

20.0

25.0

30.0

GNU Make Electric Cloud 7 nodes

Bui

ld T

ime

(min

utes

)

Page 25: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 25

What about distcc?What about distcc?Works with gmake –j

Distributes compile steps to nodes

Preprocesses code on make machine:Preprocessed code is self-contained: eliminates virtualization issues

Page 26: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 26

distcc vs. Electric Clouddistcc vs. Electric Clouddistcc:

FreeWorks with other build tools (SCons?)PortableCompiler-specific (gcc)Less scalable:

Only distributes compiles;preprocessing centralizedMissing dependencies break build

Build log scrambledNo cluster sharing facilities?

Electric Cloud:Not freeOnly works with Make

Windows, Linux, SolarisWorks with all compilersMore scalable:

Distributes all build steps (even Makefile parsing)Deduces dependencies to avoid build breakageParallelizes sub-makes

Build log in sequential orderCluster mgmt/sharing

Page 27: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 27

Electric Make vs. DistccElectric Make vs. Distcc

0

1

2

3

4

5

6

7

8

0 2 4 6 8 10

Electric Make

GNU make/distcc

Apache

Number of Agents

Spee

dup

0

1

2

3

4

5

6

7

8

0 2 4 6 8 10

Electric Make

GNU make/distcc

Number of Agents

Spee

dup

Linux Kernel

0123456789

0 2 4 6 8 10

Electric Make

GNU make/distcc

MySQLSp

eedu

p

Number of Agents

0

1

2

3

4

5

6

7

8

0 2 4 6 8 10

Electric MakeGNU make/distcc

Mozilla

Spee

dup

Number of Agents

distccbreaksbuild

Page 28: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 28

Performance LimitsPerformance LimitsFile system on make machine

ClearCase dynamic views particularly slowWindows: large .pdb and .pch files

Serializations within buildsLinking slow on Linux

Make machine CPU not an issueTypically running at 30% utilization

Page 29: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 29

Impact of 10-20x SpeedupImpact of 10-20x Speedup

Build Time Impact

14 hours Build doesn’t finish overnight

6 hours Overnight build

2 hours Multiple revs in a single day

30 min. Full rebuild before checkin

5 min. Little need to switch context

1 min. No need to switch context

2-3x

2-3x

2-3x

2-3x

2-3x

Electric Cloud can drop you two bands

Page 30: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 30

ConclusionConclusion

No need to tolerate slow builds anymore

Faster builds meanFaster time to marketHigher qualityAbility to do more with less

Page 31: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 31

More InformationMore Information

For more information or to answer additional questions:

Visit our website: www.electric-cloud.comE-mail: [email protected]: 650-962-4777

Page 32: 10-20x Faster Software Builds - USENIX · PDF file10-20x Faster Software Builds John Ousterhout ... The holy grail: concurrent builds ... Sprite research project

Slide 32