GoOpen 2010: Roger Bivand
Experience | R project | R spatial
Managing research in collaborative networks
Open Source software, research and higher education: a practitioner’s view
GoOpen 2010 (Fou thread), Aker Brygge, Oslo, 19–20 April.
Roger Bivand
Department of Economics, Norwegian School of Economics and Business Administration
Bergen, Norway
20 April 2010
Roger Bivand A practitioner’s view
Outline
This talk will examine how open source software development and use may interact with their institutional contexts in research and higher education
The talk will be based on experience of open source development in applied statistics and geospatial applications
Reasons will be discussed for the mismatch between an institutional context that prefers secrecy when applying for funding, restricted deliverables, and races to publication, and the ways in which open source development occurs
In particular, the roles of mutual trust and community-building in open source development will be stressed; these factors appear to express externalities between developers and users of software that are neglected in the exclusive management models prevalent in research and higher education
Contextual background
To justify presenting a “practitioner’s view”, some background information beyond my affiliation may be useful
Although employed in the Department of Economics at Norges Handelshøyskole, I am an academic geographer, educated in Cambridge and at the London School of Economics
My specialities within geography are quantitative methods and geographical information systems, and I have used and developed software since 1973, for research and teaching
During the EU 5th Framework, I was involved in the evaluation of three open source Information Society Technologies (IST) calls; I also founded the MBA programmes at Warsaw University of Technology in 1991/92
Little languages
My first “open source” publication was an extra module for the proprietary program Systat, with both source code and DOS binaries available for FTP download, and an accompanying paper in Computers & Geosciences in 1992
While much early software (Fortran, later C) was compiled (I only had limited exposure to BASIC), by the 1980s little languages, generally interpreted, began to appear as glue for compiled programs
The languages covered in two of my papers published in 1996 and 1997 were the Unix shell scripting language and AWK, used as glue for the GRASS GIS, and for GMT for map production; I have been using Unix/Linux since 1985
In these papers and other work in the mid 1990s, I pointed up the benefits of scripting in permitting work to be reproduced and audited, contrasted with non-journalling GUIs that were becoming prevalent in academic practice
Glimpse from 1997
Here is a slide from a talk given in Italy about software for handling geographical information (GI) in early 1997:
MAPPING GI USERS:
PRODUCTION: high training costs, application specific macro languages
CASUAL: generic likeness to familiar GUI, looks & behaves like Excel or Netscape (cf. plug-ins)
PROFESSIONALS: as consultants customising GI handling technologies for clients in long/medium term relationships; as researchers in GI handling technologies
few linking requirements (cf. COTS)
CURIOUS: as researchers analysing geographic information; as citizens challenging the use of GI by private companies and public administration
Axes: STANDARDISED TASKS (MORE, LESS); NEED OPEN SOFTWARE (MORE, LESS)
Using the R project
My first message to the R project was in mid January 1997, as I had begun using early alpha releases to re-implement a number of spatial analysis functions
The initial motivation to systematise code for functions for spatial data analysis was for a course given in the University of Bergen Department of Geography; we were a joint department until administrative changes split us
By 1998, Albrecht Gebhardt (Klagenfurt, Austria) and I had provided code for most simple spatial data analysis for R, either porting existing code, or writing fresh contributions (presentation at a congress in Vienna)
But what is the R project?
www.r-project.org
While its website offers no eye candy, R is becoming a central resource for statistical and computational data analysis across the sciences and in business:
The R project
R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Alcatel–Lucent) by John Chambers and colleagues
R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R
The term “environment” is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software
Many users think of R as a statistics system. We prefer to think of it as an environment within which statistical techniques are implemented; R can be extended (easily) via packages
The R foundation
The R project began as an academic
initiative with no funding in Auckland,
New Zealand, and was licensed under
GPL as more collaborators joined. This
group was strengthened by academic
contributors to S, who began to work
with R in the late 1990s. By 2002, a
more formal structure was needed, and a
foundation was formed. I was invited to
join as an ordinary member in March
2003, so have seen things “from the
kitchen” since then.
The R community
While the software system was intended
to be “fully planned and coherent”, the
community that has grown up around R
is neither planned nor coherent. Since
1997, there have been two main mailing
lists, one for users, the other for
developers. John Fox (another non-core
ordinary foundation member) has
described the social structure of the
project in a recent paper in the R
Journal, from which this graph is taken:
CRAN and contributed packages
The community has also grown thanks
to the ease with which packages may be
contributed. Both writing packages, and
their formal checking against R are not
hard — the check process executes all
the examples on the help pages and
other documentation. The
Comprehensive R Archive Network
(CRAN) thus distributes R itself (source
and binaries for multiple platforms) and
packages (source and binaries), and
packages may also be installed and
updated from within R.
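The idea that a check process executes every documented example can be sketched outside R. The following is a minimal illustration in Python, using the standard doctest module as a stand-in for the R package check; it is not how CRAN itself works, and the functions centroid, stale and run_examples are hypothetical names introduced only for this sketch.

```python
import doctest


def centroid(points):
    """Return the mean x and mean y of a list of (x, y) pairs.

    >>> centroid([(0.0, 0.0), (2.0, 4.0)])
    (1.0, 2.0)
    """
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def stale(points):
    """Documentation whose example no longer matches the code.

    >>> stale([(0.0, 0.0), (2.0, 4.0)])
    (2.0, 4.0)
    """
    return centroid(points)


def run_examples(func):
    """Execute the examples in func's docstring; return (failures, tries)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # The examples only see the function under test, by name.
    globs = {func.__name__: func}
    for test in finder.find(func, name=func.__name__, globs=globs):
        runner.run(test)
    return runner.failures, runner.tries


if __name__ == "__main__":
    print(run_examples(centroid))  # example still agrees with the code
    print(run_examples(stale))     # example has drifted and is reported
```

Running documented examples automatically means that an example which has stopped working is reported by the check rather than discovered by a user.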
CRAN
CRAN task views
Since so many packages have been
contributed to R, and distributed
through CRAN, it became necessary to
provide a mechanism for guiding users
towards solutions to their problems. It is
helpful to see the complexity of CRAN
as an advantage, with “ecologically” more
fit packages establishing themselves in
“niches”, possibly even in competition
with other packages providing similar
facilities. Task views have been added as
a light-weight non-authoritative way of
offering suggestions:
R Forge
In addition to CRAN running the
released, patched, and development
versions of R on the CRAN packages’
examples nightly, packages may also be
hosted on the R Forge repository. This
provides the usual *forge services, such
as SVN, but also builds Windows and
OSX binary packages, and checks
package source on multiple platforms
nightly. So even alpha or beta packages
may be made available, and may begin
to harvest user input, before being
released to CRAN:
R spatial
In 1999 I had interfaced R and the open source GIS GRASS, and presented a paper on this at a Scandinavian GIS meeting; the paper was rejected by Norsk Geografisk Tidsskrift, but published in extended form in Computers & Geosciences in 2000
This, and the publication of a paper based on my 1998 presentation with Albrecht Gebhardt in Journal of Geographical Systems, and a presentation with Markus Neteler, the lead GRASS developer, at the 2000 GeoComputation conference, led to closer personal contacts with R core
Kurt Hornik, who runs CRAN, encouraged me to talk about R and GIS at the March 2001 Distributed Statistical Computing meeting in Vienna, at which I got to know active developers personally
By the next DSC meeting in March 2003, I was organising a thematic session on spatial statistics, and a crucial fringe developers’ workshop to discuss how to advance spatial data analysis in R
CRAN Spatial task view
Since 2003, a number of
community-building steps have been
made over and above developing
contributed packages. From the CRAN
side, the Spatial task view is the hub, to
which traffic is channelled to package
pages and to ancillary websites, as well
as the special interest group mailing list.
Some package authors contact me to
ask to be included, others are asked
whether they want to be added to the
web of information
R-sig-geo mailing list
Following the 2003 workshop, we
started a project on Sourceforge to
permit joint development, and a mailing
list served within the family of R lists
from Zurich. Traffic on the list has
grown steadily, with a subscribed
membership in April 2010 of over 1600.
Naturally, many of these “lurk” without
posting, while others post without
helping, and many fewer help by
answering posted questions. This final
group is however growing, and since the
list archives are also kept on Nabble,
they are easy to search for information.
[Figure: monthly number of emails on r-sig-geo, 2004–2010; y-axis: # of emails, 0–300]
The sp package
In 2003, we agreed that a shared system of
new-style classes to contain spatial data would
permit many-to-one and one-to-many conversion
of representations, avoiding the then prevalent
many-to-many conversion problem. The idea
was to make it easier for GIS people and stats
people to work together by creating objects that
“looked” familiar to both groups, although the
groups differ a lot in how they “see” data
objects. Package dependencies have grown: here
the upper diagram shows packages depending
on sp in April 2008, the lower diagram in April
2010:
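The many-to-many conversion problem described above can be made concrete with a small sketch. With n representations converting directly to one another, n(n-1) directed converters are needed; with one shared class in the middle, each representation needs only one importer and one exporter, 2n in total. The Python below illustrates that arithmetic and a hub-style converter; it is an assumption-laden toy and not the sp package's design or API. The format names ("columns", "flat", "text") and all function names are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

# Shared "hub" representation: a list of (x, y) coordinate pairs.
Points = List[Tuple[float, float]]


def pairwise_converters(n: int) -> int:
    """Directed converters needed if every format converts to every other."""
    return n * (n - 1)


def hub_converters(n: int) -> int:
    """Converters needed with a shared class: one in and one out per format."""
    return 2 * n


# Many-to-one: each hypothetical format is read into the hub representation.
TO_HUB: Dict[str, Callable[[Any], Points]] = {
    "columns": lambda d: list(zip(d["x"], d["y"])),
    "flat": lambda v: list(zip(v[0::2], v[1::2])),
    "text": lambda s: [tuple(float(c) for c in p.split()) for p in s.split(";")],
}

# One-to-many: the hub representation is written out to each format.
FROM_HUB: Dict[str, Callable[[Points], Any]] = {
    "columns": lambda pts: {"x": [p[0] for p in pts], "y": [p[1] for p in pts]},
    "flat": lambda pts: [c for p in pts for c in p],
    "text": lambda pts: ";".join(f"{x} {y}" for x, y in pts),
}


def convert(data: Any, src: str, dst: str) -> Any:
    """Convert between any two formats by passing through the hub."""
    return FROM_HUB[dst](TO_HUB[src](data))


if __name__ == "__main__":
    print(pairwise_converters(10), hub_converters(10))
    print(convert("0 1;2 3", "text", "columns"))
```

Adding an eleventh format under the hub scheme means writing two functions, whereas the pairwise scheme would need twenty more.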
R Wiki
In addition to the “coordinated”
information sources, a community Wiki
does exist. While it seems to suit some
users, the general impression (among
older people?) is that there is little
feeling of responsibility for following up
tips given there. On the mailing list and
its archive, usually experienced
developers or users will clarify
misunderstandings, while on the Wiki,
posters do not feel obliged to update
their contributions, as when examples
stop working (they are never run,
unlike CRAN package examples):
Spatial on R Forge
R Forge is used actively by individuals
and groups in developing packages for
spatial data analysis, with 52 projects
registered in April 2010. Some projects
are registered in more than one topical
area, some may never mature, but some
are already in active use; the raster
package is already frequently discussed
on R-sig-geo — it was released to CRAN
in late March 2010 after a gestation of
16 months.
Book website
Finally, I’ll mention a book that I wrote
with Edzer Pebesma and Virgilio
Gomez-Rubio, and published in the
Springer useR series in 2008. Not only
does the book seem to be doing OK, but
the website with dataset and code
download is visited frequently (450–600
unique visitors per month). The code is
run nightly against current R and the
various required contributed packages. It
may be of interest to note that the text
was written using the literate
programming tool Sweave in R, which is
designed to support reproducible
research (as indeed is this talk).
Managing research in higher education
While the links between the knowledge economy and Open Source software are evident, there are very real challenges to the management of research and higher education in policy terms that need to be addressed
Most research and higher education organisations have been rationalised and subjected to the styles of management practices introduced in commercial corporations years and even decades ago
In particular, budget discipline is a favoured tool in attempting to point organisational units in directions seen as being appropriate
Given that these organisations clearly face a “missing market”, in that neither potential students nor grant-giving bodies are analogues of customers in a fast-food restaurant, those responsible for management have a measurement problem
Grant processes
Universities and research institutions appear to “compete” in grant processes, and thereby seem to have an interest in locking potential competitors out, by securing privileged access to knowledge
While such advantage may be quite real in the case of laboratory skills and quality (the institution does deliver services of higher quality), or when the institution has secured the services of high-flying academics, this model is not directly transferable to software
Given the steadily increasing importance of software in teaching and research, it seems clear that care is needed in constructing management tools for activities which may produce or modify software (see the UEA “climategate” scandal)
Software deliverables
It does make sense for institutions to develop expertise in customising software, in training, and in publishing materials of benefit to software users on a for-profit basis
It does not in general, however, make sense to mandate source closure in research programs or projects, in the same way that mandating openness might be mistaken
The question as to whether software deliverables, or software developed in the process of creating deliverables, should be opened is one that is relevant in all grant processes
It is also highly relevant in evaluation routines associated with program and project execution
Handling software in research projects
In grant awarding and evaluation processes, the grant-making body should consider at least two factors: the importance of Open Source for enhanced efficiency in providing the software needed in a project, and the importance of reproducibility and peer-review in the scientific process generally
It can thus be argued that the management of the boundary between what the institution “owns”, what can sensibly be commercialised on a for-profit basis, and research productivity and efficiency deserves attention
Otherwise, naive and rather outdated management practices can endanger research quality and productivity with regard to software innovation and incremental improvement by seeing products where one should see services
Software, research and higher education
There are clearly cases in which source code should not be opened, although when the public purse has funded the research involved, the number of real cases will in practice be very few, even for projects with very small communities of interest
To enable and empower actors in the knowledge economy, it is important that unnecessary barriers to the diffusion of knowledge be removed, and that new ones not be permitted to emerge
As a corollary, researchers should perhaps be given incentives in career terms to contribute to the pool of knowledge by opening source code, and by contributing to the improvement of software in their domain of science, in the same way that publications are rewarded
Round-up
As far as I am aware, no research council has played any relevant role in the progress of the R project directly
Indirectly, research council funded projects have included software deliverables defined as contributed packages, including spatial packages (but none that I have handled)
Even more indirectly, people in research council funded doctoral and post-doctoral positions have not only used R and R spatial, but have contributed to software development, even though this was not required or mentioned in their projects
Finally, the diffuseness and unpredictability of collaborative networks of “amator” developers make it very hard to reply to calls; if a research council wanted to be pro-active, it might fund travel for active developers to enable them to meet, or similar enabling measures