The Ultimate Debian Database
Israel Herraiz
Davis, CA, July 26th 2012
Download these slides at http://slideshare.net/herraiz/the-ultimate-debian-database
Outline
1. Debian: what is it and sources of data
2. The UDD: what is it and where to get it
3. What has been done and what we can do
1. Debian: what is it and sources of data
Debian
• GNU/Linux software distribution
• Goal: to deliver an entirely and exclusively free distribution
• Maintained by volunteers
• Bureaucratic organization (policies, constitution, social contract)
• Release when ready
• > 10 years of history
• > 500 MSLOC
• > 15k packages
Debian Releases
Debian Source Packages
Source and Binary Packages
• A source package generates one or more binary packages
[Diagram: octave → octave-core, octave-doc, liboctave, liboctave-dev]
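The one-to-many relation between a source package and its binary packages can be sketched by reading a control file, where the first stanza names the source and each later stanza names one binary package. This is a minimal sketch: the helper names `parse_stanzas` and `binaries_by_source` and the two-package `CONTROL` sample are made up for illustration and are not the real octave packaging.

```python
def parse_stanzas(text):
    """Split control-file text into a list of {field: value} dicts."""
    stanzas, current, last_field = [], {}, None
    for line in text.splitlines():
        if line.startswith("#"):          # comment line
            continue
        if not line.strip():              # blank line ends a stanza
            if current:
                stanzas.append(current)
            current, last_field = {}, None
        elif line[0] in " \t":            # continuation of the previous field
            if last_field is not None:
                current[last_field] += "\n" + line.strip()
        elif ":" in line:                 # "Field: value"
            field, _, value = line.partition(":")
            last_field = field.strip()
            current[last_field] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def binaries_by_source(control_text):
    """Map the source package name to the binary packages it declares."""
    stanzas = parse_stanzas(control_text)
    source = stanzas[0]["Source"]
    return {source: [s["Package"] for s in stanzas[1:] if "Package" in s]}


# Illustrative sample only (hypothetical, shortened control file).
CONTROL = """\
Source: octave
Section: math

Package: octave-core
Architecture: any

Package: octave-doc
Architecture: all
"""

print(binaries_by_source(CONTROL))
```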
Package uploads
• There are no project-wide source repositories like in other software projects
• Although developers may privately use version control systems
• When a bug is fixed, a new version is uploaded
• Uploads == commits
Source Packages metadata
Source: octave
Section: math
Priority: extra
Maintainer: Debian Octave Group <[email protected]>
Uploaders: Thomas Weber <[email protected]>, Sébastien Villemot
DM-Upload-Allowed: yes
Build-Depends: gfortran, debhelper (>= 9), automake, dh-autoreconf, texinfo ….
Standards-Version: 3.9.3
Homepage: http://www.octave.org/
Vcs-Git: git://git.debian.org/git/pkg-octave/octave.git
Vcs-Browser: http://git.debian.org/?p=pkg-octave/octave.git
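Relationship fields such as Build-Depends and Depends are comma-separated lists of package names with optional version constraints. The sketch below splits such a field into (name, operator, version) tuples. The function name `parse_relations` is hypothetical, and the sketch deliberately ignores alternatives ("a | b") and architecture qualifiers, which real fields may contain.

```python
import re

# name, optionally followed by "(<op> <version>)"
_RELATION = re.compile(
    r"^([a-z0-9][a-z0-9+.-]*)\s*"
    r"(?:\(\s*(<<|<=|=|>=|>>)\s*([^)\s]+)\s*\))?$"
)


def parse_relations(value):
    """Return [(package, operator or None, version or None), ...]."""
    relations = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        match = _RELATION.match(item)
        if match is None:
            # Alternatives, arch qualifiers, etc. are out of scope here.
            raise ValueError(f"unsupported relation syntax: {item!r}")
        relations.append(match.groups())
    return relations


# The part of the Build-Depends line that is visible on the slide.
print(parse_relations("gfortran, debhelper (>= 9), automake, dh-autoreconf, texinfo"))
```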
Binary Packages metadata
Package: octave
Priority: extra
Section: math
Installed-Size: 4760
Maintainer: Ubuntu Developers <[email protected]>
Architecture: amd64
Version: 3.6.1-1ubuntu1ppa1~precise1
Recommends: gnuplot, libatlas3gf-base
Replaces: octave3.2
Suggests: octave-info, octave-doc, octave-htmldoc
Depends: libamd2.2.0 (>= 1:3.4.0), libarpack2 (>= 2.1), …
Conflicts: octave3.2
Filename: pool/main/o/octave/octave_3.6.1-1ubuntu1ppa1~precise1_amd64.deb
Size: 1746050
MD5sum: 2c431556d6cf98fd8a341e865ac63058
SHA1: b333c49e6f6cb7d4445378020dfffdb5a1626de7
Description: GNU Octave language for numerical computations…
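The Version field follows the layout [epoch:]upstream_version[-debian_revision], so a version string can be taken apart at the first colon and the last hyphen. The following is a small sketch of that split; `split_version` is a hypothetical helper, and it does not implement Debian's version comparison rules.

```python
def split_version(version):
    """Split a Debian version string into (epoch, upstream, revision).

    A missing epoch is reported as "0"; a missing revision as "".
    """
    epoch, sep, rest = version.partition(":")
    if not sep:                       # no epoch present
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:                       # no Debian revision present
        upstream, revision = rest, ""
    return epoch, upstream, revision


print(split_version("3.6.1-1ubuntu1ppa1~precise1"))
print(split_version("1:3.4.0"))
```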
Debian Popcon: Tracking Installations
• Popularity: total install counts
• Recent Use (< 30 days)
• Old Use (Beyond 30 days)
• Data collected daily
• Users voluntarily opt-in
• Source of bias
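Popcon publishes its counts as plain-text tables. The sketch below parses rows under the assumption that each data line has the columns rank, name, inst, vote, old, recent, no-files, followed by the maintainer in parentheses. That column layout, the `PopconEntry` type and the sample rows are assumptions for illustration; how the columns map onto the "recent use" and "old use" wording above is left open here.

```python
from typing import NamedTuple


class PopconEntry(NamedTuple):
    name: str
    inst: int
    vote: int
    old: int
    recent: int
    no_files: int


def parse_popcon(text):
    """Parse popcon-style rows; header and separator lines are skipped."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows start with a numeric rank and have at least 7 columns.
        if len(parts) < 7 or not parts[0].isdigit():
            continue
        counts = [int(p) for p in parts[2:7]]
        entries.append(PopconEntry(parts[1], *counts))
    return entries


# Synthetic sample, not real popcon data.
SAMPLE = """\
#rank name    inst  vote   old recent no-files (maintainer)
1     dpkg    1000   900    50     40       10 (Example Maintainer)
2     octave   200    60   120     15        5 (Example Maintainer)
"""

for entry in parse_popcon(SAMPLE):
    print(entry.name, entry.inst)
```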
Debian Bugs
• People find bugs in binary packages
• ~500 bugs per month
• But bugs are linked to source packages
• Bugs can be
  • Accepted and solved in Debian
  • Rejected
  • Forwarded to upstream
• Everything else, similar to other bug tracking systems
  • Life cycle, comments, severity levels…
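Because reports arrive against binary packages while bugs are tracked per source package, counting bugs per source needs a binary-to-source lookup. The sketch below shows that aggregation; the function name, the mapping and the bug list are hypothetical, and counting an unknown binary under its own name is an assumption of this sketch.

```python
from collections import Counter


def bugs_per_source(bug_reports, binary_to_source):
    """Count bugs per source package.

    bug_reports: iterable of (bug_id, binary_package) pairs.
    binary_to_source: dict mapping binary package -> source package.
    """
    counts = Counter()
    for _bug_id, binary in bug_reports:
        # Assumption: fall back to the binary name if no mapping is known.
        counts[binary_to_source.get(binary, binary)] += 1
    return counts


# Hypothetical data for illustration.
mapping = {"octave-core": "octave", "octave-doc": "octave"}
reports = [(1, "octave-core"), (2, "octave-doc"), (3, "gnuplot")]
print(bugs_per_source(reports, mapping))
```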
2. The UDD: what is it and where to get it
Research work: main paper (at MSR 2010)
Other papers at MSR 2010
What is the UDD?
• PostgreSQL database with all the information from the sources described so far
• http://udd.debian.org
• New dumps available every two days
• ~ 500 MB bz2
• Used for some Debian internal services
• Schema too complex and too big for a slide
• Technical detail: you need a Debian-based system to load the dump of the UDD
Debian sources of data
• Sources / Packages metadata
• Bugs
  • Including *all* archived bugs
  • 1995-96-97
• Carnivore
• Debtags
• Popularity Contest
• DEHS
• Lintian
• Migrations to testing
• Uploads
  • All the way back to 1998!
• New packages queue
• Translations status
• Orphaned packages
• Screenshots
Bear in mind!
• You can also obtain the source code of the packages
• Easy to automate
• And the modifications done by the Debian maintainers
• So add product metrics to the set of data sources
• But this is not included in the UDD
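One way to automate fetching package sources on a Debian-based system is to drive `apt-get source` from a script. The sketch below only builds and runs that command; the helper names are hypothetical, and it assumes `deb-src` entries are configured in the APT sources, since otherwise `apt-get source` has nothing to download.

```python
import subprocess


def source_download_command(package, download_only=True):
    """Build the apt-get command line that fetches a source package."""
    cmd = ["apt-get", "source"]
    if download_only:
        # Fetch the source files without unpacking them.
        cmd.append("--download-only")
    cmd.append(package)
    return cmd


def fetch_source(package, workdir="."):
    """Run the download in workdir; raises CalledProcessError on failure."""
    return subprocess.run(source_download_command(package), cwd=workdir, check=True)


print(" ".join(source_download_command("octave")))
```

Looping `fetch_source` over a list of package names then gives the raw material for product metrics.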
3. What has been done and what we can do
What kind of questions does Debian answer with the UDD?
• High priority packages that have release-critical (RC) bugs
• Developers with very buggy and/or outdated packages
• Who uploaded this package to the unstable release?
• Who reported the RC bugs since the last release?
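The first question above can be sketched as a SQL join between packages and bugs. The real UDD schema is much larger, so this example uses a tiny in-memory SQLite stand-in with two hypothetical tables, `sources(source, priority)` and `bugs(id, source, severity)`. Treating the priorities required, important and standard as "high priority" is an assumption of this sketch; serious, grave and critical are the release-critical severities.

```python
import sqlite3

# Hypothetical, simplified table and column names (not the real UDD schema).
QUERY = """
SELECT s.source, COUNT(b.id) AS rc_bugs
FROM sources AS s
JOIN bugs AS b ON b.source = s.source
WHERE s.priority IN ('required', 'important', 'standard')
  AND b.severity IN ('serious', 'grave', 'critical')
GROUP BY s.source
ORDER BY rc_bugs DESC, s.source
"""


def demo_connection():
    """Build an in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sources (source TEXT PRIMARY KEY, priority TEXT);
        CREATE TABLE bugs (id INTEGER PRIMARY KEY, source TEXT, severity TEXT);
        INSERT INTO sources VALUES ('dpkg', 'required');
        INSERT INTO sources VALUES ('bash', 'required');
        INSERT INTO sources VALUES ('octave', 'extra');
        INSERT INTO bugs VALUES (1, 'dpkg', 'grave');
        INSERT INTO bugs VALUES (2, 'dpkg', 'serious');
        INSERT INTO bugs VALUES (3, 'octave', 'critical');
        INSERT INTO bugs VALUES (4, 'bash', 'minor');
        INSERT INTO bugs VALUES (5, 'bash', 'serious');
    """)
    return conn


def high_priority_rc(conn):
    """Return (source, rc_bug_count) rows for high priority packages."""
    cur = conn.cursor()
    cur.execute(QUERY)
    return cur.fetchall()


print(high_priority_rc(demo_connection()))
```

The query uses no driver-specific parameters, so the same cursor pattern should carry over to a PostgreSQL connection once the table and column names are adapted to the actual schema.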
Some questions answered in the literature
• The popularity bias
  • http://oa.upm.es/9585/
• Open source projects get more bug reports if they are popular
• The actual number of bugs is not related to the number of bugs reported
• So more bugs actually means more quality
• Well, at least more people who decide to use the software
The popularity bias
[Figure: Log(Bugs) versus Log(installations), labelled "Required packages"]
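A relation like the one in this plot can be explored with an ordinary least-squares fit on log-transformed counts. The sketch below is not the analysis from the cited paper; `loglog_fit` is a hypothetical helper and the numbers are synthetic, chosen only so the fit is easy to check by hand. Packages with zero installations or zero bugs would have to be filtered out first, since their logarithm is undefined.

```python
import math


def loglog_fit(installations, bugs):
    """Least-squares fit of log10(bugs) = slope * log10(installations) + intercept."""
    xs = [math.log10(x) for x in installations]
    ys = [math.log10(y) for y in bugs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic data: bugs grow proportionally to installations.
installs = [10, 100, 1000, 10000]
bug_counts = [1, 10, 100, 1000]
slope, intercept = loglog_fit(installs, bug_counts)
print(round(slope, 6), round(intercept, 6))
```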
Summary
• Packages and sources metadata
  • And source code
• Bugs
  • All the way back to 1995-96-97!
• Popularity contest
• Maintainer activity (uploads)
  • All the way back to 1998!
• And much more…
• Now, what do you think we can do with this?