The adoption of FOSS workflows in commercial software development: the case of git and github

Daniel M. German

University of Victoria

Canada

Open Source is everywhere

On SSL and Heartbleed

“[Heartbleed] is a software flaw that has left up to two-thirds of the world’s websites vulnerable to attack by hackers.”

– The Economist

“There is no such thing as bad publicity except your own obituary.”

– Brendan Behan

● “Most open-source software – and Open SSL is no exception – is produced voluntarily by people who are not paid for creating it. They do it for love, professional pride or as a way of demonstrating technical virtuosity. And mostly they do it in their spare time.”

– John Naughton, The Observer/The Guardian

'Heartbleed' bug can't be simply blamed on coders, April 13, 2014

“Responsible corporate use of open-source software should therefore involve some measure of reciprocity: a corporation that benefits hugely from such software ought to put something back, either in the form of financial support for a particular open-source project, or – better still – by encouraging its own software people to contribute to the project.”

“Much of the invisible backbone of websites from Google to Amazon to the Federal Bureau of Investigation was built by volunteer programmers in what is known as the open-source community.”

“... volunteers, connected over the Internet, work together to build free software, to maintain and improve it and to look for bugs. Ideally, they check one another’s work in a peer review system similar to that found in science.”

Linus's Law:

“Given Enough Eyeballs, all Bugs are Shallow”

Eric Raymond, The Cathedral and the Bazaar

In the case of Heartbleed

“There weren't enough eyeballs”

– Eric Raymond

● Code was created by a grad student

● Reviewed by S. Henson, core developer of OpenSSL

● Included in OpenSSL in the spring of 2011

● Not discovered for 3 years!

Budget of OpenSSL:

– US$2,000 for 2013

the OpenSSL problem

● important infrastructure projects that are run by small teams of volunteers

● On April 24, 2014, the Linux Foundation announced the “Core Infrastructure Initiative” to address it

Core Infrastructure Initiative

● Funded by:

– Amazon, Cisco, Dell, Facebook, Fujitsu, Google, IBM, Intel, Microsoft, NetApp, Rackspace, Qualcomm, VMware and The Linux Foundation

● Funding to core projects:

– Fellowships to core developers

– as well as other resources to assist the project in improving its security, enabling outside reviews, and improving responsiveness to patch requests.

What is FOSS development?

● Most important feature of FOSS

– its free or open source license

● License

– Guarantees code is available to others to reuse

– Becomes a social contract among participants

What is OSS development?

● Most frequently defined as:

– Self organized teams developing software without a central authority

● Code is open for review

– and reuse!!!

● Anybody can participate

What makes OSS development possible?

● Teams of self-organized developers and contributors

● The Internet

● A common toolkit

● Version control systems

Teams

● Come from all sectors:

– Professionals and hobbyists

– Paid and volunteers

– Novices and Experienced

– High-school students to PhDs

– All over the world!!!

● Highly motivated!

Common Toolkit

● To be able to collaborate you need a common set of tools

– Programming languages

    ● gcc, perl, python, java, ruby, lua, php...

– Editors and IDEs

    ● Emacs, vim, Eclipse, Netbeans...

– Libraries

    ● boost, maven, cpan, Pypi...

– Infrastructure

    ● Make, ant, cmake, bugzilla, etc.

– Hosting infrastructure

    ● Sourceforge, Google Code, github, bitbucket

● They must be available at zero cost to anybody

FOSS Toolkit

● I posit that one of the biggest influences of FOSS on the practice of Software Development is the wide use of FOSS tools for the development of software

– Most implementations of popular programming languages today are open source

– FOSS Editors and IDEs are widely used too

Free Software Foundation

● The FSF had to bootstrap the development of the OSS toolkit

– To build an Operating System you need a compiler

– To build a compiler you need an editor, but to build an editor you need a compiler

– gcc, emacs, coreutils (ls, echo, cat, etc.), etc.

Richard Stallman

Created the legal and technical infrastructure for Free and Open Source software

on Code Reviews

Need for Code Reviews

● Many FOSS teams discovered that to ship good quality software they needed to review the source code

Fagan Code Inspections

● Code reviews performed at specific stages of development

● Effective, but not widely used

Open Source style Code Reviews

● Fagan inspections were unfeasible

– Required participants to be in the same room

● Instead, code reviews started to be incremental

– Rather than reviewing the whole, review the delta (the patch)
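
To make "review the delta" concrete, here is a minimal sketch that produces a patch from two versions of a file using Python's standard `difflib`. The file name `example.c` and the function `make_patch` are illustrative assumptions, not something from the talk; real projects would normally generate patches with their version control system.

```python
import difflib


def make_patch(old_text: str, new_text: str, path: str = "example.c") -> str:
    """Return a unified diff (the delta) between two versions of a file."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


# Hypothetical before/after versions of a tiny function.
old = "int add(int a, int b) {\n    return a - b;\n}\n"
new = "int add(int a, int b) {\n    return a + b;\n}\n"

patch = make_patch(old, new)
print(patch)
```

The reviewer only needs to read the changed lines and their context, not the whole file.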

Code Reviews in FOSS

the spectrum of Code Reviews

code reviews in FOSS

(1) early, frequent reviews

(2) of small, independent, complete contributions

(3) that are broadcast to a large group of stakeholders, but only reviewed by a small set of self-selected experts

(4) resulting in an efficient and effective peer review technique.

– Peter Rigby

Lessons from FOSS

on Version Control systems

Version Control Systems

● At the beginning, FOSS used tar files in USENET

– the FSF would ship physical tapes!

● Today, version control systems are the norm

– Centralized or Distributed

● FOSS has a continuous and proven track record of innovation in version control systems

– FOSS democratized VC

On Version Control

● The VC is the circulatory system of a software development project

● It brings the code to all stakeholders

● A contribution is a patch

– one or more commits

the patch

● the patch should be reviewed

● most VCs don't support reviewing of patches

the patch and its review

● Two models:

– Commit Then Review (CTR)

    ● Review the code after it has been integrated

or

– Review Then Commit (RTC)

    ● Review the patch before it is integrated

Linux

● Linux incorporated RTC early in its process

● Linus needed integration of Review process with VC

● No FOSS VC did it

– he turned to bitkeeper

Bitkeeper and Linux

● Symbiotic relationship

– Free (as in beer) licenses for Linux developers, with one big condition:

    ● Users should not develop competing tools

– Bitkeeper rapidly improved the Linux integration process

    ● Simplified integration of reviewed code

– Bitkeeper was probably influenced by Linus's workflow

– In 2005, Bitkeeper revoked its license to Linux developers

Git

● Many other distributed version control systems before it

● What makes it special?

– Many features, but especially:

    ● Pull requests

    ● git incorporates the code review process with a distributed version control system

        – Even via email patches

How is distributed version control software being used?

Git

● Software engineers are moving towards git

– And other DVCs

● Github is a major reason

The Promise of Git

From: http://thkoch2001.github.io/whygitisbetter/

Challenge 1

● Personal repos are beyond reach

● Local commits might never be observable

Challenge 2: History

“History is written by the victors”

Rebasing changes history

Save history before it is lost!

Super-repository

● Collection of repositories cloned (recursively) from the same repo

– At least one per developer

    ● In their personal computer

– At least one public repository

    ● The blessed

– In git, no way to trace them

Moving commits across the superRepo

Method   Description
Push     Done at source; needs write access to destination
Pull     Done at destination; needs read access to source
Email    Source creates a patch and mails it; recipient applies it

Ecosystem of Repos

Can we learn from Linux?

Life of a Patch in Linux

Continuous Mining of Linux

● Linux has no centralized logging

– Nobody really knows what the superRepo is

– Commits flow without any event broadcasting mechanism

● How do we find the activity?

– Repos

– Commits

Semiautomatic Process

● Every 3 hrs, ask every repo

– What new commits do you have?

– What commits did you delete?

– Automatically resolve propagations

    ● Commits might propagate before we scan

● Daily:

– Are commits in repo by unknown committers?

    ● Answer: is there a new repo? or is committer new to repo?
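
The periodic scan above can be sketched as a set difference between two snapshots of a repository's commit ids. This is only an illustration under stated assumptions: the names `PropagationRecord` and `diff_scan`, the sample commit ids, and the repo name are hypothetical, and the record shape is modeled on the propagation-table rows the talk describes (commit id, added or deleted, repo, when). How the commit ids are actually collected from each repo is not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PropagationRecord:
    """One row: commit id, "added" or "deleted", repo, and scan time."""
    commit_id: str
    event: str
    repo: str
    when: datetime


def diff_scan(repo, previous_ids, current_ids, when):
    """Compare two scans of one repo and emit one record per change."""
    added = sorted(current_ids - previous_ids)
    deleted = sorted(previous_ids - current_ids)
    records = [PropagationRecord(c, "added", repo, when) for c in added]
    records += [PropagationRecord(c, "deleted", repo, when) for c in deleted]
    return records


# Hypothetical scans of one repository, three hours apart.
scan_time = datetime(2012, 1, 1, 3, 0, tzinfo=timezone.utc)
earlier = {"a1", "b2"}
later = {"b2", "c3"}

for record in diff_scan("example-repo", earlier, later, scan_time):
    print(record.commit_id, record.event, record.repo, record.when.isoformat())
```

A commit that disappears between scans (for example after a rebase) shows up as a "deleted" record, which is how history can be saved before it is lost.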

Implementation

● Running since Nov. 2011

– Currently scans 650 repos every 3 hrs

– Retrieved

    ● 2.3 million commits (compared to 400k in Linus's repo)

    ● 109 million records in propagation table

      <commit-id, added|deleted, repo, when>

Metric                             Snapshot (Linus)   Continuous
Number of repos                    1                  479
Commits                            64k                533k
Non-merge commits                  59k                485k
Unique non-merges                  58k                135k
% unique non-merges                98.9%              27.9%
Non-merges that reached blessed                       43.1%
Different author emails            3434               5646
Different authors                  2883               4575
Different committer emails         283                1185
Different committers               245                1058

Commit vs Patches

● Commit ids are insufficient to track patches

● Large amount of work not reaching blessed

Arrival of Commits at Blessed

Arrival of Commits at Blessed...

● We can classify patches as a new feature or bug-fix

The Latency

[Figure: time of authorship vs. time of commit]

The Repos

Path to Linus

● Large ecosystem of repositories

– Producers

– Consumers

Contributors vs Consumers

Linux Dashboard

● We asked two Linux maintainers:

– Can this info be useful?

● Answer:

– “Yes”

… but not for what we expected...

Tracking commits in Linux

● Need to track patches, not commits

– Particularly important in consumer repositories

– Need to cross-reference commits

    ● What commits contain the same patch?

– Some repos track commits from blessed via cherry-picking

    ● Commit ids are useless

    ● So they annotate log with the origin commit id
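
One way to recognize the same patch under different commit ids is to fingerprint the change itself rather than the commit. The sketch below is a simplified illustration, not the method used in the study: `patch_fingerprint` is a hypothetical helper that hashes only the added and removed lines of a unified diff, ignoring hunk positions and whitespace. Git ships a real command in this spirit, `git patch-id`, which should be preferred in practice.

```python
import hashlib


def patch_fingerprint(diff_text: str) -> str:
    """Hash the added/removed lines of a unified diff, ignoring positions.

    Simplification: file headers, hunk headers, and context lines are
    skipped, so a changed line that itself begins with "---" or "+++"
    would be missed by this sketch.
    """
    digest = hashlib.sha1()
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file header lines, not content
        if line.startswith(("+", "-")):
            # Keep the +/- marker, drop all whitespace in the content.
            normalized = line[0] + "".join(line[1:].split())
            digest.update(normalized.encode("utf-8"))
            digest.update(b"\n")
    return digest.hexdigest()


# The same change applied at two different positions (hypothetical diffs).
diff_a = """--- a/f.c
+++ b/f.c
@@ -10,3 +10,3 @@
 int add(int a, int b) {
-    return a - b;
+    return a + b;
 }
"""
diff_b = diff_a.replace("@@ -10,3 +10,3 @@", "@@ -42,3 +42,3 @@")

print(patch_fingerprint(diff_a) == patch_fingerprint(diff_b))
```

Because rebasing and cherry-picking create new commit ids for the same change, a content-based fingerprint like this is what lets the two commits be cross-referenced.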

Linux Commits Dashboard

● Where is my commit?

– My original commit, has it reached Linus?

● What was merged?

– What commits were merged at once by Linus?

● What commits are related to this one?

– Same patch

    ● Rebasing

    ● Cherry picking

– Mentioned in a commit

    ● This commit fixes bug introduced in X

    ● This commit reverts commit X

● http://o.cs.uvic.ca:20810/perl/cid.pl?cid=70cb8bb0d365f0bc8b20fa67347caf9598a4674e
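
The "mentioned in a commit" relations can often be recovered from commit messages. The sketch below is a hedged illustration, not the dashboard's implementation: `related_commits` and `PATTERNS` are hypothetical names. The three patterns match text that git or kernel convention produces: the "This reverts commit ..." line written by `git revert`, the "(cherry picked from commit ...)" line written by `git cherry-pick -x`, and the `Fixes:` tag used in Linux kernel commit messages. The sample id is the one in the dashboard URL above.

```python
import re

SHA = r"[0-9a-f]{7,40}"

# Relation name -> pattern capturing the referenced commit id.
PATTERNS = {
    "reverts": re.compile(rf"This reverts commit ({SHA})"),
    "cherry_picked_from": re.compile(rf"\(cherry picked from commit ({SHA})\)"),
    "fixes": re.compile(rf"^Fixes: ({SHA})", re.MULTILINE),
}


def related_commits(message: str):
    """Return (relation, commit_id) pairs mentioned in a commit message."""
    found = []
    for relation, pattern in PATTERNS.items():
        for match in pattern.finditer(message):
            found.append((relation, match.group(1)))
    return found


sample_id = "70cb8bb0d365f0bc8b20fa67347caf9598a4674e"
message = f'Revert "some change"\n\nThis reverts commit {sample_id}.\n'
print(related_commits(message))
```

Free-form mentions ("fixes the bug introduced in X") would need looser matching than these three fixed phrasings.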

Researcher states:

“40% of pull requests are not merged”

● Based on simply querying ghtorrent data

● But it ignores what really happens

● Many pull requests are merged without being marked as merged in github

● Ghtorrent data has many potential threats to validity
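
A study that wants to count merged pull requests could look for evidence beyond the hosting site's merged flag. The sketch below is an assumption-laden illustration: the dictionary layout (`number`, `merged`, `commit_shas`), the function names, and the two heuristics are hypothetical and are not taken from ghtorrent or from the talk. The keyword list follows github's documented closing keywords (close, fix, resolve and their variants).

```python
import re


def closes_pr(message: str, pr_number: int) -> bool:
    """True if a commit message uses a closing keyword for this PR number."""
    pattern = rf"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#{pr_number}\b"
    return re.search(pattern, message, re.IGNORECASE) is not None


def merge_evidence(pr: dict, base_commits: dict):
    """Return why a PR looks merged, or None if no evidence is found.

    pr: {"number": int, "merged": bool, "commit_shas": [str, ...]}
    base_commits: commit id -> commit message for the main branch.
    """
    if pr["merged"]:
        return "marked_merged"
    if any(sha in base_commits for sha in pr["commit_shas"]):
        return "commit_in_base"  # PR commits landed without the merge flag
    if any(closes_pr(msg, pr["number"]) for msg in base_commits.values()):
        return "closed_by_commit"  # e.g. patch applied by hand, then closed
    return None


# Hypothetical pull request that was applied outside the merge button.
pr = {"number": 12, "merged": False, "commit_shas": ["abc123"]}
print(merge_evidence(pr, {"abc123": "Add feature"}))
```

Matching by commit id fails when the commits were reworked before landing, so a content-based comparison of patches would still be needed for those cases.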

What is github used for?

"I store my presentations in github. I don't need a USB stick anymore!"

Are there potential threats to validity for studies that assume github is about software engineering only?

Methodology

● Data sources:

– Surveys

– Sampling of repositories

● Mixed methods:

– Quantitative, and

– Qualitative

I. A repository is not necessarily a project

II. Most projects have few commits

III. Most projects are inactive

IV. A large proportion of repositories are not for software engineering

V. More than two thirds of projects are personal

VI. Only a fraction of repos use pull requests

VII. If the commits in a pull-request are reworked, github only records the resulting patch

VIII. Most pull-requests appear as non-merged, even though they were merged

IX. Many active projects do not conduct all their software development activity in github

Uses:

Most projects are inactive

Social?

67% of projects are personal repos

95% have 3 or fewer committers

Self contained?

“Any serious project would have to have some separate infrastructure - mailing lists, forums, irc channels and their archives, build farms, etc. [...] Thus while GitHub and all other project hosts are used for collaboration, they are not and can not be a complete solution.”

Others are already using github's information to reach conclusions!

the open source report card

http://osrc.dfm.io/dmgerman/

how are github users collaborating?

How does github support collaboration?

● Methodology:

– Survey

    ● 240 responses (24% response rate)

– Interviews

    ● 35 interviews from survey respondents

        – 71% professional developers

        – 11% managers

        – 9% students

        – 9% interns

    ● Approximately 1 hr each

Survey: why do you use github?

Code centric collaboration

Themes: focus

● Simple tools

– git branching/merging

– github features seem to be enough for most

    ● Pull requests and issue tracking

● Focused interaction

– code-centric, focused communication

– asynchronous and unobtrusive

Focus: independence

● Decentralized work:

– git allows them to work independently

– yet they have visibility of what others do

● Low need for management:

– Need for a clear process (the workflow)

– They shy away from rigid management and team structure

– Team managers recognize this

– Managers should be educated on using git/github

Focus: Exposure

● Easy contribution process

– Fork and potentially contribute without pre-authorization

● Peer pressure

– Developers are conscious that their code is readily visible to others

– Adoption of small, frequent contributions

OSS mentality

● At the operational level

– the nature of the work allows independence and self-organization.

– developers are familiar with the idea of working this way and share the mentality behind it.

● developers are self-driven

● share the mentality of

– self-organizing,

– minimizing communication and coordination needs,

– having ownership of code, and

– operating on a meritocratic, expertise-based model

The github ecosystem

The Github Ecosystem

● github is creating an ecosystem of proprietary, cloud enabled applications for software development teams

– Service integration

– JSON API

● Asana, Campfire, Lighthouse, Jira, Travis, Trello, etc.

Conclusions

● git and github are promoting the use of the pull-request workflow

– small, independent contributions

– that can be reviewed before integration

● Effectively, commercial teams are adopting open source practices in their own development

– Independent work

– Code reviews of contributions before they are integrated