My adventures with open · 2008 google summer of code participant 2008 founding member, greek OSS...

My adventures with open .* Georgios Gousios TU Delft / EWI @gousiosg

Page 1: My adventures with open · 2008 google summer of code participant 2008 founding member, greek OSS society 2010 work on OSS cloud infrastructures 2011 started the GHTorrent project.

My adventures with open .*

Georgios Gousios

TU Delft / EWI

@gousiosg

Page 2:

how I got trapped

1997 first read about open source

1999 installed Linux

2001 wrote a HOWTO for the Linux Documentation Project

2003 contributor to the KDE 3.x/Kaffeine media player

2006 leading the development of Alitheia Core

2008 google summer of code participant

2008 founding member, greek OSS society

2010 work on OSS cloud infrastructures

2011 started the GHTorrent project

Page 3:

alitheia core

50k LOC!

Page 4:

demo.sqo-oss.org

Page 5:

alitheia core in numbers

• 750 OSS repositories, 1.5GB data dump

• the most refined software engineering dataset at the time

• supported by an EC FP6 project

• 6 partners

• ~20 publications

• 4 PhDs, mine included, funded

Page 6:

alitheia core impact

2 external publications

1 external user

0 industry adoption

Page 7:

api.github.com/v3

Page 8:

recursive dependency retrieval

<<event>> PushEvent

<<api>> /users/:user

ensure_user

<<api>> /repos/:user/:repo/

ensure_repo

<<api>> /repos/:user/:repo/commits

ensure_commits

ensure_user

<<api>> /:user/:repo/sha

ensure_commit

ensure_user

<<api>> /users/:user/followers

ensure_followers

<<api>> /repos/:user/:repo/commits/:sha/comments

ensure_commit_comments

<<api>> /users/:user/orgs

ensure_orgs

<<api>> /orgs/:org/teams

ensure_teams
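
A minimal sketch of what recursive dependency retrieval could look like, reusing the `ensure_*` names from the slide. This is not the GHTorrent code: the `Retriever` class, the `FAKE_API` dictionary (an offline stand-in for the GitHub API), the `author` field and the exact commit path are all assumptions made for illustration. The idea shown is that ensuring one entity first ensures everything it depends on, and that each endpoint is fetched at most once.

```python
from typing import Callable, Dict

# Hypothetical offline stand-in for the GitHub API: endpoint path -> response.
FAKE_API: Dict[str, dict] = {
    "/users/alice": {"login": "alice"},
    "/repos/alice/demo": {"name": "demo", "owner": "alice"},
    "/repos/alice/demo/commits/abc123": {"sha": "abc123", "author": "alice"},
}


class Retriever:
    """Fetches an entity only after the entities it depends on exist."""

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        self.fetch = fetch
        self.store: Dict[str, dict] = {}  # entities retrieved so far
        self.calls = 0                    # number of API requests issued

    def _ensure(self, path: str) -> dict:
        # Hit the API only if the entity has not been retrieved before.
        if path not in self.store:
            self.calls += 1
            self.store[path] = self.fetch(path)
        return self.store[path]

    def ensure_user(self, user: str) -> dict:
        return self._ensure(f"/users/{user}")

    def ensure_repo(self, user: str, repo: str) -> dict:
        self.ensure_user(user)  # a repository depends on its owner
        return self._ensure(f"/repos/{user}/{repo}")

    def ensure_commit(self, user: str, repo: str, sha: str) -> dict:
        self.ensure_repo(user, repo)  # a commit depends on its repository
        commit = self._ensure(f"/repos/{user}/{repo}/commits/{sha}")
        self.ensure_user(commit["author"])  # ...and on its author
        return commit


retriever = Retriever(FAKE_API.__getitem__)
commit = retriever.ensure_commit("alice", "demo", "abc123")
```

Handling a PushEvent would then amount to calling `ensure_commit` for every commit in the event; repeated calls for the same user or repository are served from `store` rather than the API.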

Page 9:

relational database

Page 10:

repositories

users

organizations

issues

/users/:user

/user/repos

/repos/:user/:repo/issues

/orgs/:org

{"type":"User","public_gists":0,"login":"gousiosg","followers":8,"name":"Georgios Gousios","public_repos":4,"created_at":...,"id":386172,"following":4}

{...

noSQL database as cache
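
The split between a noSQL cache and a relational database can be sketched as follows. This is an illustration under assumptions, not the GHTorrent schema: a plain dictionary stands in for the document store, SQLite stands in for the relational database, and the `users` table with its columns is hypothetical. Raw API responses are kept whole in the cache, while only selected fields are copied into relational tables for querying.

```python
import json
import sqlite3

# Stand-in for the noSQL store: raw API responses keyed by endpoint.
raw_cache = {}


def cache_response(endpoint: str, body: str) -> dict:
    """Keep the full JSON document so it never has to be re-fetched."""
    raw_cache[endpoint] = json.loads(body)
    return raw_cache[endpoint]


def store_user(db: sqlite3.Connection, doc: dict) -> None:
    """Copy a few fields of a cached user document into a relational table."""
    db.execute(
        "INSERT OR REPLACE INTO users (id, login, name, followers) "
        "VALUES (?, ?, ?, ?)",
        (doc["id"], doc["login"], doc.get("name"), doc.get("followers", 0)),
    )


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, login TEXT UNIQUE, name TEXT, followers INTEGER)"
)

# Response modelled on the slide's example; the elided created_at is left out.
body = (
    '{"type":"User","public_gists":0,"login":"gousiosg","followers":8,'
    '"name":"Georgios Gousios","public_repos":4,"id":386172,"following":4}'
)
doc = cache_response("/users/gousiosg", body)
store_user(db, doc)

row = db.execute(
    "SELECT login, followers FROM users WHERE id = ?", (386172,)
).fetchone()
```

Fields that are not modelled relationally, such as `public_repos` here, remain available in the cached document, so the relational schema can be extended later without querying the API again.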

Page 11:

periodic dumps of DBs online

Page 12:

ghtorrent facts

• 1 developer, no external funding

• 3 papers

• advertised on social media

• since 2012

Page 13:

ghtorrent impact

300+ external users

150+ external papers

msr14, vissoft14, github data mining challenge

40% of all papers on GitHub (Cosentino et al. 2016)

many best paper awards

used at: microsoft, deloitte, blackduck

received funding from: microsoft, google

Page 14:

why such a difference?

• github is hot as a research target!

• true, but so was Sourceforge when Alitheia Core analysed it

• (alitheia core was) not invented here!

• true, but GHTorrent was of worse quality when available

• i don’t want to invest time in your infrastructure!

• true, but you still do it with GHTorrent (ok, less)

Page 15:

be open or

be irrelevant

Page 16:

Tools Datasets

Page 17:

what to open?

at the very least:

tools

datasets

but also:

papers

talk slides

lecture notes

technical designs

(successful?) research proposals

Page 18:

how to open?

• choose a license

• BSD or MIT for source code

• CC-BY-SA for data and other materials

• choose a platform

• github for src

• zenodo for data, gives a DOI!

• slideshare or speakerdeck for slides

• figshare, pure.tudelft.nl or your site for papers

Page 20:

open now trumps

open when it’s done

Page 21:

Page 22:

how to open now?

• think in terms of Minimum Viable Product

• what is the least possible amount of work that will make sense to somebody else?

• work in iterations

• open, gather feedback, improve, repeat

• Embrace the “Hacker Way”

Page 23:

but somebody will steal my data/code/ideas!

it feels amazing to have created something worth stealing!

• if someone invests time in stealing:

• what you created is great

• you have a head start

• if nobody invests time in stealing:

• is what you created worth your time/effort?

• is your research relevant?

good artists copy; great artists steal

Page 24:

“The problem in this business isn't to keep people from stealing your ideas; it is making them steal your ideas”

–Howard H. Aiken

@gousiosg