
Post on 26-Jun-2020


My adventures with open .*

Georgios Gousios

TU Delft / EWI

@gousiosg

how I got trapped

1997 first read about open source

1999 installed Linux

2001 wrote how to for the linux doc project

2003 contributor to the KDE 3.x/Kaffeine media player

2006 leading the development of Alitheia Core

2008 google summer of code participant

2008 founding member, greek OSS society

2010 work on OSS cloud infrastructures

2011 started the GHTorrent project

alitheia core

50k LOC!

demo.sqo-oss.org

alitheia core in numbers

• 750 OSS repositories, 1.5GB data dump

• the most refined software engineering dataset at the time

• supported by an EC FP6 project

• 6 partners

• ~20 publications

• 4 PhDs, mine included, funded

alitheia core impact

• 2 external publications

• 1 external user

• 0 industry adoption

recursive dependency retrieval (api.github.com/v3)

<<event>> PushEvent

<<api>> /users/:user → ensure_user

<<api>> /repos/:user/:repo/ → ensure_repo

<<api>> /repos/:user/:repo/commits → ensure_commits

ensure_user

<<api>> /:user/:repo/sha → ensure_commit

ensure_user

<<api>> /users/:user/followers → ensure_followers

<<api>> /repos/:user/:repo/commits/:sha/comments → ensure_commit_comments

<<api>> /users/:user/orgs → ensure_orgs

<<api>> /orgs/:org/teams → ensure_teams
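The idea of recursive dependency retrieval can be sketched roughly as below. This is a minimal illustration, not GHTorrent's actual code: the `Retriever` class, the `FAKE_API` table, the simplified URL paths, and the shape of the event dictionary are all assumptions made for the example. Only the `ensure_*` naming comes from the slide. Each `ensure_*` method first makes sure that the entities it depends on exist, and only calls the API for entities that are not stored yet.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the GitHub API: URL path -> JSON-like payload.
FAKE_API: Dict[str, dict] = {
    "/users/alice": {"login": "alice"},
    "/users/bob": {"login": "bob"},
    "/repos/alice/demo": {"name": "demo", "owner": "alice"},
    "/repos/alice/demo/commits/abc123": {"sha": "abc123", "author": "bob"},
}


class Retriever:
    """Resolves an entity and, recursively, everything it depends on."""

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        self.fetch = fetch
        # Stand-in for the relational store: one dict per "table".
        self.db: Dict[str, dict] = {"users": {}, "repos": {}, "commits": {}}
        self.api_calls: List[str] = []  # recorded only to show the call order

    def _get(self, path: str) -> dict:
        self.api_calls.append(path)
        return self.fetch(path)

    def ensure_user(self, login: str) -> dict:
        if login not in self.db["users"]:
            self.db["users"][login] = self._get(f"/users/{login}")
        return self.db["users"][login]

    def ensure_repo(self, owner: str, repo: str) -> dict:
        key = (owner, repo)
        if key not in self.db["repos"]:
            self.ensure_user(owner)  # a repo depends on its owner
            self.db["repos"][key] = self._get(f"/repos/{owner}/{repo}")
        return self.db["repos"][key]

    def ensure_commit(self, owner: str, repo: str, sha: str) -> dict:
        if sha not in self.db["commits"]:
            self.ensure_repo(owner, repo)  # a commit depends on its repo
            commit = self._get(f"/repos/{owner}/{repo}/commits/{sha}")
            self.ensure_user(commit["author"])  # ... and on its author
            self.db["commits"][sha] = commit
        return self.db["commits"][sha]

    def on_push_event(self, event: dict) -> None:
        # A push event only names commits; the rest is pulled in on demand.
        for sha in event["shas"]:
            self.ensure_commit(event["owner"], event["repo"], sha)
```

Feeding one push event, for example `Retriever(FAKE_API.__getitem__).on_push_event({"owner": "alice", "repo": "demo", "shas": ["abc123"]})`, retrieves the owner, the repository, the commit and the commit author in that order; replaying the same event triggers no further API calls.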

relational database

repositories

users

organizations

issues

/users/:user

/user/repos

/repos/:user/:repo/issues

/orgs/:org

{"type":"User","public_gists":0,"login":"gousiosg","followers":8,"name":"Georgios Gousios","public_repos":4,"created_at":...,"id":386172,"following":4}

{...

noSQL database as cache

periodic dumps of DBs online
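The two-store design can be sketched as follows: raw API responses are kept verbatim in a document store that doubles as a cache, and selected fields are copied into relational tables for querying. This is a hedged sketch under stated assumptions, not the project's implementation. A plain dictionary stands in for the noSQL database, SQLite stands in for the relational database, and the function names and the `users` table layout are invented for the example.

```python
import json
import sqlite3
from typing import Callable, Dict

# Stand-in for the noSQL store: URL path -> raw JSON text, kept verbatim.
raw_cache: Dict[str, str] = {}


def fetch_cached(path: str, fetch: Callable[[str], str]) -> dict:
    """Return the parsed document, calling the API only on a cache miss."""
    if path not in raw_cache:
        raw_cache[path] = fetch(path)
    return json.loads(raw_cache[path])


def open_relational_db() -> sqlite3.Connection:
    """Create an in-memory relational store with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT UNIQUE, name TEXT)"
    )
    return conn


def store_user(conn: sqlite3.Connection, doc: dict) -> None:
    """Copy the queryable fields of a raw user document into the users table."""
    conn.execute(
        "INSERT OR IGNORE INTO users (id, login, name) VALUES (?, ?, ?)",
        (doc["id"], doc["login"], doc.get("name")),
    )
    conn.commit()


def ensure_user(conn: sqlite3.Connection, login: str,
                fetch: Callable[[str], str]) -> None:
    """Fetch a user through the cache, then store it relationally."""
    store_user(conn, fetch_cached(f"/users/{login}", fetch))
```

Keeping the raw documents means the relational schema can be rebuilt or extended later without going back to the API.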

ghtorrent facts

• 1 developer, no external funding

• 3 papers

• advertised on social media

• since 2012

ghtorrent impact

• 300+ external users

• 150+ external papers

• msr14, vissoft14, github data mining challenge

• 40% of all papers on GitHub (Cosentino et al. 2016)

• many best paper awards

• used at: microsoft, deloitte, blackduck

• received funding from: microsoft, google

why such a difference?

• github is hot as a research target!

• true, but so was Sourceforge when Alitheia Core analysed it

• (alitheia core was) not invented here!

• true, but GHTorrent was of worse quality when it first became available

• i don’t want to invest time in your infrastructure!

• true, but you still do it with GHTorrent (ok, less)

be open or be irrelevant


what to open?

at the very least:

tools

datasets

but also:

papers

talk slides

lecture notes

technical designs

(successful?) research proposals

how to open?

• choose a license

• BSD or MIT for source code

• CC-BY-SA for data and other materials

• choose a platform

• github for src

• zenodo for data, gives a DOI!

• slideshare or speakerdeck for slides

• figshare, pure.tudelft.nl or your site for papers


open now trumps open when it’s done

how to open now?

• think in terms of Minimum Viable Product

• what is the least possible amount of work that will make sense to somebody else?

• work in iterations

• open, gather feedback, improve, repeat

• Embrace the “Hacker Way”

but somebody will steal my data/code/ideas!

it feels amazing to have created something worth stealing!

• if someone invests time in stealing:

• what you created is great

• you have a head start

• if nobody invests time in stealing:

• is what you created worth your time/effort?

• is your research relevant?

good artists copy; great artists steal

“The problem in this business isn't to keep people from stealing your ideas; it is making them steal your ideas”

–Howard H. Aiken

@gousiosg