Post on 26-Jun-2020
My adventures with open .*
Georgios Gousios
TU Delft / EWI
@gousiosg
how I got trapped
1997 first read about open source
1999 installed Linux
2001 wrote a how-to for the Linux doc project
2003 contributor to the KDE 3.x/Kaffeine media player
2006 leading the development of Alitheia Core
2008 google summer of code participant
2008 founding member, greek OSS society
2010 work on OSS cloud infrastructures
2011 started the GHTorrent project
alitheia core
50k LOC!
demo.sqo-oss.org
alitheia core in numbers
• 750 OSS repositories, 1.5GB data dump
• the most refined software engineering dataset at the time
• supported by an EC FP6 project
• 6 partners
• ~20 publications
• 4 PhDs, mine included, funded
alitheia core impact
2 external publications
1 external user
0 industry adoption
api.github.com/v3
<<event>>PushEvent
<<api>>/users/:user
ensure_user
<<api>>/repos/:user/:repo/
ensure_repo
<<api>>/repos/:user/:repo/commits
ensure_commits
ensure_user
<<api>>/:user/:repo/sha
ensure_commit
ensure_user
<<api>>/users/:user/followers
ensure_followers
<<api>>/repos/:user/:repo/commits/:sha/comments
ensure_commit_comments
<<api>>/users/:user/orgs
ensure_orgs
<<api>>/orgs/:org/teams
ensure_teams
recursive dependency retrieval
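A minimal sketch of the recursive dependency retrieval pictured above, in Python. The `ensure_*` names come from the slide; everything else is an assumption made for illustration: the `Retriever` class, the `fetch` callable, the in-memory `FAKE_API`, and the simplified event and commit shapes (real GitHub payloads are richer). Each `ensure_*` call first makes sure the entities it depends on exist, and only then retrieves its own.

```python
from typing import Any, Callable, Dict, List, Tuple


class Retriever:
    """Fetch an entity only after the entities it depends on are stored."""

    def __init__(self, fetch: Callable[[str], Any]) -> None:
        self.fetch = fetch  # hypothetical: maps an API path to parsed JSON
        self.users: Dict[str, Any] = {}
        self.repos: Dict[Tuple[str, str], Any] = {}
        self.commits: Dict[Tuple[str, str, str], Any] = {}

    def ensure_user(self, login: str) -> Any:
        if login not in self.users:
            self.users[login] = self.fetch(f"/users/{login}")
        return self.users[login]

    def ensure_repo(self, owner: str, name: str) -> Any:
        key = (owner, name)
        if key not in self.repos:
            self.ensure_user(owner)  # a repo depends on its owner
            self.repos[key] = self.fetch(f"/repos/{owner}/{name}")
        return self.repos[key]

    def ensure_commit(self, owner: str, name: str, sha: str) -> Any:
        key = (owner, name, sha)
        if key not in self.commits:
            self.ensure_repo(owner, name)  # a commit depends on its repo
            commit = self.fetch(f"/repos/{owner}/{name}/commits/{sha}")
            self.ensure_user(commit["author"])  # ... and on its author
            self.commits[key] = commit
        return self.commits[key]

    def on_push_event(self, event: Dict[str, Any]) -> None:
        # Simplified event shape (assumption): {"repo": "owner/name", "shas": [...]}
        owner, name = event["repo"].split("/")
        for sha in event["shas"]:
            self.ensure_commit(owner, name, sha)


# In-memory stand-in for the remote API, so the sketch runs offline.
FAKE_API = {
    "/users/alice": {"login": "alice"},
    "/users/bob": {"login": "bob"},
    "/repos/alice/demo": {"name": "demo", "owner": "alice"},
    "/repos/alice/demo/commits/abc123": {"sha": "abc123", "author": "bob"},
}
calls: List[str] = []


def fake_fetch(path: str) -> Any:
    calls.append(path)  # record each request to show the retrieval order
    return FAKE_API[path]


retriever = Retriever(fake_fetch)
retriever.on_push_event({"repo": "alice/demo", "shas": ["abc123"]})
print(calls)
```

Processing the same event a second time issues no further requests, because every `ensure_*` call checks the local store first.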
relational database
repositories
users
organizations
issues
/users/:user
/user/repos
/repos/:user/:repo/issues
/orgs/:org
{"type":"User","public_gists":0,"login":"gousiosg","followers":8,"name":"Georgios Gousios","public_repos":4,"created_at":...,"id":386172,"following":4}
{...
noSQL database as cache
periodic dumps of DBs online
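One way to read the two storage layers above, sketched in Python under stated assumptions: a plain dict stands in for the noSQL cache of raw API responses, and an in-memory SQLite table stands in for the relational database. The class `CachedMirror`, the `fetch` callable and the table layout are hypothetical; the sample user fields are the ones shown on the slide.

```python
import json
import sqlite3
from typing import Any, Callable, Dict, Tuple


class CachedMirror:
    """Keep raw JSON in a cache; derive relational rows from the cache."""

    def __init__(self, fetch: Callable[[str], Any]) -> None:
        self.fetch = fetch  # hypothetical: maps an API path to parsed JSON
        self.raw_cache: Dict[str, str] = {}  # stand-in for the noSQL store
        self.db = sqlite3.connect(":memory:")  # stand-in for the relational DB
        self.db.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT UNIQUE,"
            " name TEXT, followers INTEGER)"
        )

    def get_raw(self, path: str) -> Any:
        # Hit the remote API only when the raw response is not cached yet.
        if path not in self.raw_cache:
            self.raw_cache[path] = json.dumps(self.fetch(path))
        return json.loads(self.raw_cache[path])

    def _insert_user(self, doc: Dict[str, Any]) -> None:
        self.db.execute(
            "INSERT OR REPLACE INTO users (id, login, name, followers)"
            " VALUES (?, ?, ?, ?)",
            (doc["id"], doc["login"], doc.get("name"), doc.get("followers")),
        )

    def ensure_user(self, login: str) -> Tuple[Any, ...]:
        query = "SELECT id, login, name, followers FROM users WHERE login = ?"
        row = self.db.execute(query, (login,)).fetchone()
        if row is None:
            self._insert_user(self.get_raw(f"/users/{login}"))
            row = self.db.execute(query, (login,)).fetchone()
        return row

    def rebuild_users(self) -> None:
        # Re-derive the relational table from cached JSON, with no API calls.
        self.db.execute("DELETE FROM users")
        for path, text in self.raw_cache.items():
            if path.startswith("/users/") and path.count("/") == 2:
                self._insert_user(json.loads(text))


requests_made = []


def fake_fetch(path: str) -> Any:
    requests_made.append(path)
    # Fields taken from the sample response on the slide.
    return {"type": "User", "login": "gousiosg", "id": 386172,
            "name": "Georgios Gousios", "followers": 8}


mirror = CachedMirror(fake_fetch)
print(mirror.ensure_user("gousiosg"))
```

Keeping the raw responses means the relational schema can be changed and repopulated later without fetching anything again.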
ghtorrent facts
• 1 developer, no external funding
• 3 papers
• advertised on social media
• since 2012
ghtorrent impact
300+ external users
150+ external papers
msr14, vissoft14, github data mining challenge
40% of all papers on GitHub (Cosentino et al. 2016)
many best paper awards
used at: microsoft, deloitte, blackduck
received funding from: microsoft, google
why such a difference?
• github is hot as a research target!
• true, but so was Sourceforge when Alitheia Core analysed it
• (alitheia core was) not invented here!
• true, but GHTorrent was of worse quality when available
• i don’t want to invest time in your infrastructure!
• true, but you still do it with GHTorrent (ok, less)
be open or
be irrelevant
Tools Datasets
what to open?
at the very least:
tools
datasets
but also:
papers
talk slides
lecture notes
technical designs
(successful?) research proposals
how to open?
• choose a license
• BSD or MIT for source code
• CC-BY-SA for data and other materials
• choose a platform
• github for src
• zenodo for data, gives a DOI!
• slideshare or speakerdeck for slides
• figshare, pure.tudelft.nl or your site for papers
open now trumps
open when it’s done
how to open now?
• think in terms of Minimum Viable Product
• what is the least possible amount of work that will make sense to somebody else?
• work in iterations
• open, gather feedback, improve, repeat
• Embrace the “Hacker Way”
but somebody will steal my data/code/ideas!
it feels amazing to have created something worth stealing!
• if someone invests time in stealing:
• what you created is great
• you have a head start
• if nobody invests time in stealing:
• is what you created worth your time/effort?
• is your research relevant?
good artists copy; great artists steal
“The problem in this business isn't to keep people from stealing your ideas; it is making them steal your ideas”
–Howard H. Aiken
@gousiosg