The Ultimate Debian Database

Israel Herraiz <[email protected]>

Davis, CA, July 26th 2012

Download these slides at http://slideshare.net/herraiz/the-ultimate-debian-database

description

Some comments about the sources of data stored in the Ultimate Debian Database

Transcript of The Ultimate Debian Database

Outline

1. Debian: what is it and sources of data

2. The UDD: what is it and where to get it

3. What has been done and what we can do

1. Debian: what is it and sources of data

Debian

• GNU/Linux software distribution

• Goal: to deliver an entirely and exclusively free distribution

• Maintained by volunteers

• Bureaucratic organization (policies, constitution, social contract)

• Release when ready

• > 10 years history

• > 500 MSLOC

• > 15k packages

Debian Releases

Debian Source Packages

Source and Binary Packages

• A source package generates one or more binary packages

[Diagram: the octave source package and its binary packages octave-core, octave-doc, liboctave, and liboctave-dev]

Package uploads

• There are no source code repositories like in other software projects

• Although developers may use version control systems privately

• When a bug is fixed, a new version is uploaded

• Uploads == commits

Source Packages metadata

Source: octave
Section: math
Priority: extra
Maintainer: Debian Octave Group <[email protected]>
Uploaders: Thomas Weber <[email protected]>, Sébastien Villemot <[email protected]>
DM-Upload-Allowed: yes
Build-Depends: gfortran, debhelper (>= 9), automake, dh-autoreconf, texinfo ….
Standards-Version: 3.9.3
Homepage: http://www.octave.org/
Vcs-Git: git://git.debian.org/git/pkg-octave/octave.git
Vcs-Browser: http://git.debian.org/?p=pkg-octave/octave.git
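A minimal sketch of how source package metadata of this kind can be read programmatically, assuming the usual "Field: value" layout in which continuation lines start with whitespace. The name parse_stanza is made up for this illustration, and the Build-Depends list stops where the slide truncates it. The python-debian library offers a deb822 module that is probably a better choice for real work.

```python
def parse_stanza(text):
    """Parse one 'Field: value' stanza into a dict.

    Lines starting with whitespace continue the previous field.
    """
    fields = {}
    last = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines inside this single stanza
        if line[0] in " \t" and last is not None:
            fields[last] += " " + line.strip()
        else:
            # Split on the first colon only, so URLs in values survive.
            name, _, value = line.partition(":")
            last = name.strip()
            fields[last] = value.strip()
    return fields


# Subset of the fields shown on the slide; Build-Depends is truncated there.
STANZA = """\
Source: octave
Section: math
Priority: extra
Build-Depends: gfortran, debhelper (>= 9), automake,
 dh-autoreconf, texinfo
Standards-Version: 3.9.3
Homepage: http://www.octave.org/
Vcs-Git: git://git.debian.org/git/pkg-octave/octave.git
"""

fields = parse_stanza(STANZA)
print(fields["Source"], fields["Standards-Version"])
```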

Binary Packages metadata

Package: octave
Priority: extra
Section: math
Installed-Size: 4760
Maintainer: Ubuntu Developers <[email protected]>
Architecture: amd64
Version: 3.6.1-1ubuntu1ppa1~precise1
Recommends: gnuplot, libatlas3gf-base
Replaces: octave3.2
Suggests: octave-info, octave-doc, octave-htmldoc
Depends: libamd2.2.0 (>= 1:3.4.0), libarpack2 (>= 2.1), …
Conflicts: octave3.2
Filename: pool/main/o/octave/octave_3.6.1-1ubuntu1ppa1~precise1_amd64.deb
Size: 1746050
MD5sum: 2c431556d6cf98fd8a341e865ac63058
SHA1: b333c49e6f6cb7d4445378020dfffdb5a1626de7
Description: GNU Octave language for numerical computations…
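Relationship fields such as Depends and Recommends carry most of the dependency information in this metadata. The sketch below shows one way to split such a field into (package, operator, version) triples. It is a simplification: the regular expression and the name parse_relations are assumptions for this example, alternatives written with "|" and architecture qualifiers are skipped, and the input covers only the part of Depends visible on the slide.

```python
import re

# One relation: a package name, optionally followed by "(op version)".
REL = re.compile(
    r"^\s*(?P<name>[a-z0-9][a-z0-9+.\-]*)\s*"
    r"(?:\(\s*(?P<op><<|<=|=|>=|>>)\s*(?P<ver>[^)\s]+)\s*\))?\s*$"
)


def parse_relations(value):
    """Split a comma-separated relationship field into triples."""
    result = []
    for item in value.split(","):
        m = REL.match(item)
        if m is None:
            continue  # not a simple relation; skipped in this sketch
        result.append((m.group("name"), m.group("op"), m.group("ver")))
    return result


# Only the entries shown on the slide; the rest of the field is elided there.
deps = parse_relations("libamd2.2.0 (>= 1:3.4.0), libarpack2 (>= 2.1)")
recommends = parse_relations("gnuplot, libatlas3gf-base")
print(deps)
print(recommends)
```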

Debian Popcon: Tracking Installations

• Popularity: total install counts

• Recent use (< 30 days)

• Old use (beyond 30 days)

• Data collected daily

• Users opt in voluntarily

• This self-selection is a source of bias
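As a rough illustration of how the three popcon counts relate, the sketch below ranks packages by install count and computes the share of installations with recent use. The record layout, the field names, and the numbers are all invented for this example; they are not real popcon data or the actual popcon file format.

```python
from typing import NamedTuple


class PopconRow(NamedTuple):
    package: str
    installs: int  # total install count
    recent: int    # installations with use in the last 30 days
    old: int       # installations last used more than 30 days ago


def recent_share(row: PopconRow) -> float:
    """Fraction of installations that were used in the last 30 days."""
    return row.recent / row.installs if row.installs else 0.0


# Made-up numbers, for illustration only.
rows = [
    PopconRow("pkg-b", installs=200, recent=20, old=170),
    PopconRow("pkg-a", installs=1000, recent=800, old=150),
]

# Rank by popularity (total installs), most installed first.
ranked = sorted(rows, key=lambda r: r.installs, reverse=True)
for r in ranked:
    print(f"{r.package}: {r.installs} installs, {recent_share(r):.0%} recently used")
```

Because submission is opt-in, such shares describe the reporting population only, not all Debian users.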

Debian Bugs

• People find bugs in binary packages

• ~500 bugs per month

• But bugs are linked to source packages

• Bugs can be:
  • Accepted and solved in Debian
  • Rejected
  • Forwarded upstream

• Everything else is similar to other bug tracking systems
  • Life cycle, comments, severity levels…

2. The UDD: what is it and where to get it

Research work: main paper (at MSR 2010)

Other papers at MSR 2010

What is the UDD?

• PostgreSQL database with all the information from the data sources described so far

• http://udd.debian.org

• New dumps available every two days (~500 MB, bz2-compressed)

• Used by some internal Debian services

• Schema too complex and too big for a slide

• Technical detail: you need a Debian-based system to load the UDD dump

Debian sources of data

• Sources / Packages metadata

• Bugs
  • Including *all* archived bugs
  • Going back to 1995-96-97

• Carnivore

• Debtags

• Popularity Contest

• DEHS

• Lintian

• Migrations to testing

• Uploads
  • All the way back to 1998!

• New packages queue

• Translations status

• Orphaned packages

• Screenshots

Bear in mind!

• You can also obtain the source code of the packages
  • Easy to automate

• And the modifications made by the Debian maintainers

• So product metrics can be added to the set of data sources

• But none of this is included in the UDD

3. What has been done and what we can do

What kind of questions does Debian answer with the UDD?

• Which high-priority packages have release-critical (RC) bugs?

• Which developers have very buggy and/or outdated packages?

• Who uploaded this package to unstable?

• Who reported the RC bugs since the last release?
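To give a feel for how a question such as "high priority packages with RC bugs" turns into a query, here is a self-contained sketch. It does not use the real UDD: the two tables, their columns, and the rows are hypothetical, and an in-memory SQLite database stands in for PostgreSQL so the example runs anywhere. The only Debian facts relied on are that the serious, grave, and critical severities are release-critical and that required, important, and standard are the higher package priorities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, heavily simplified schema; the real UDD schema differs.
conn.executescript("""
CREATE TABLE packages (source TEXT PRIMARY KEY, priority TEXT);
CREATE TABLE bugs (id INTEGER PRIMARY KEY, source TEXT,
                   severity TEXT, done INTEGER);
""")

# Made-up rows, for illustration only.
conn.executemany("INSERT INTO packages VALUES (?, ?)", [
    ("pkg-a", "required"),
    ("pkg-b", "important"),
    ("pkg-c", "extra"),
])
conn.executemany("INSERT INTO bugs VALUES (?, ?, ?, ?)", [
    (1, "pkg-a", "grave", 0),
    (2, "pkg-a", "minor", 0),     # open, but not release-critical
    (3, "pkg-b", "serious", 1),   # release-critical, but already closed
    (4, "pkg-c", "critical", 0),  # release-critical, but low priority
])

QUERY = """
SELECT p.source, COUNT(*) AS rc_bugs
FROM packages p
JOIN bugs b ON b.source = p.source
WHERE p.priority IN ('required', 'important', 'standard')
  AND b.severity IN ('serious', 'grave', 'critical')
  AND b.done = 0
GROUP BY p.source
ORDER BY rc_bugs DESC, p.source
"""

rows = conn.execute(QUERY).fetchall()
for source, rc_bugs in rows:
    print(source, rc_bugs)
```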

Some questions answered in the literature

• The popularity bias
  • http://oa.upm.es/9585/
  • Open source projects get more bug reports if they are popular
  • The actual number of bugs is not related to the number of bugs reported
  • So more bug reports may actually mean higher quality
  • Well, at least more people who decide to use the software

The popularity bias

[Figure: plot of Log(Bugs) against Log(installations), labeled "Required packages"]
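A relationship between bug counts and installations on logarithmic axes can be summarized with an ordinary least-squares fit on the log-transformed values. The sketch below is not the analysis from the cited paper; it only illustrates the kind of computation involved, on synthetic numbers chosen to follow an exact power law. The function name loglog_fit is made up for this example.

```python
import math


def loglog_fit(installs, bugs):
    """Least-squares fit of log10(bugs) on log10(installs).

    Returns (slope, intercept, Pearson r) in log-log space.
    All input values must be positive.
    """
    xs = [math.log10(v) for v in installs]
    ys = [math.log10(v) for v in bugs]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Synthetic data: bugs = installs ** 0.5, not real Debian figures.
installs = [100, 10_000, 1_000_000]
bugs = [10, 100, 1_000]

slope, intercept, r = loglog_fit(installs, bugs)
print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.2f}")
```

A slope between 0 and 1 in such a fit would mean that bug reports grow with popularity, but more slowly than installations do.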

Summary

• Packages and sources metadata
  • And source code

• Bugs
  • All the way back to 1995-96-97!

• Popularity contest

• Maintainers' activity (uploads)
  • All the way back to 1998!

• And much more…

• Now, what do you think we can do with this?