Towards Understanding SE Experiments Replication (ESEM'13 Keynote)
Towards Understanding
the Replication of
SE Experiments
Natalia Juristo
Universidad Politecnica de Madrid (Spain)
& University of Oulu (Finland)
ESEM Conference Baltimore (USA) October 11th, 2013
Scope & Terminology
This talk focuses on the replication of
experiments
I will refer to the study whose results we
want to check as the baseline experiment
Replication Intuitive Definition
Deliberate repetition of research
procedures in a second investigation for
the purpose of determining if earlier
results can be reproduced
Content
Replication & the experimental paradigm
State of replication in ESE: practice & theory
Shedding a bit of light: purposes for replicating; replication functions
Some answers: replication limits; baseline and replication minimum degree of similarity; admissible changes; reproduction of results; threats to reuse materials
Summary
Role of Replication in
Experimental Paradigm
Searching for Regularities
Science does not settle for anecdotes. A
scientific law or theory describes a regular
occurrence in the world
Regularities existing in reality are identified by
reproducing the same event in different
replications
The result of one experiment is an isolated
event
One Result, Three Meanings
Without reproduction of results it is impossible
to distinguish whether they
occurred by chance
are artifactual
the event occurs only in the experiment, not in reality
really correspond to a regularity
State of Replication in ESE
Practice
Not Enough Replications
Most SE experiments have not yet been
replicated
Two reviews provide empirical data to
support this point
Let us look at their results
Experiments in Leading Journals
& Conferences (1993-2002)
5,453 articles published from 1993 to 2002
in major SE journals and conference
proceedings
113 experiments
20 (17.7%) described as replications
Sjøberg et al. “A survey of
controlled experiments in SE” TSE, 2005
All Publications
96 papers reporting replications
133 replications of 72 baseline studies
Any type of empirical study
Quasi-Experiments        35   49%
Controlled Experiments   21   29%
Case Study               15   21%
Survey                    1    1%
da Silva et al. “Replication of Empirical Studies in
Software Engineering Research: A Systematic
Mapping Study” EMSE 2013
Mostly Internal & Not Stand-Alone
Publications [Da Silva et al. 2013]
[Chart: number of replications per year, 1994 to 2010. Series: internal (original-included reports), internal (replication-only reports), external, and total.]
First Paper Published on a
Replication [Da Silva et al. 2012]
In SE, the first article that explicitly reported
a replication of an empirical study was
published in 1994
Daly, Brooks, Miller, Roper & Wood
Verification of Results in Sw Maintenance Through
External Replication
Intl Conf on Software Maintenance
EMSE Special Issue on
Replication
The large number of submissions was
admittedly more than we expected
We received a total of 16 submissions
Encouraging the publication of replications
will encourage researchers to replicate more
studies
State of Replication in ESE
Theory
First Theoretical Publication on
Replication
In 1999, a paper discussed a framework to
organize sets of related experiments
(families) and the generation of knowledge
from such sets
Basili, Shull & Lanubile. Building knowledge through
families of experiments. TSE
Is a family exactly a set of replications?
…experiments can be viewed as part of common
families of studies, rather than being isolated
events…
More Activity in the Last 10 Years
Shull, Basili, Carver, Maldonado, Travassos, Mendonça & Fabbri. Replicating software engineering experiments: Addressing the tacit knowledge problem. ISESE 2002
Vegas, Juristo, Moreno, Solari & Letelier. Analysis of the influence of communication between researchers on experiment replication. ISESE 2006
Brooks, Roper, Wood, Daly & Miller. Replication’s role in software engineering. Guide to Advanced Empirical SE. Springer 2008
Juristo & Vegas. Using differences among replications of software engineering experiments to gain knowledge. ESEM 2009 [Juristo & Vegas. The Role of Non-Exact Replications in SE. EMSE Journal 2011]
Krein & Knutson. A Case for replication: Synthesizing research methodologies in SE. RESER 2010
Gómez, Juristo & Vegas. Replications types in experimental disciplines. ESEM 2010
Juristo, Vegas, Solari, Abrahao & Ramos. A Process for Managing Interaction between Experimenters to Get Useful Similar Replications. IST 2013
State of the Theory
There is no agreement yet on terminology,
typology, purposes, operation and other
replication issues
There is not even agreement on what a
replication is!!
Different authors consider different types of
changes to the baseline experiment as
admissible
Example of Divergent Views
Some researchers advise the use of different protocols
and materials to preserve independence and prevent
the error propagation that reusing the same
configuration could cause
Kitchenham
The role of replications in ESE - a word of warning EMSE 2008
Other researchers recommend the reuse of materials to
assure that replications are similar enough for results to
be comparable
Shull, Carver, Vegas & Juristo
The role of replications in ESE EMSE 2008
Shedding some Light on
Replication
Role of Replication
The Two Roles of Replication
Validation
Learning
Learning Relevant Conditions
As more replications of Thompson and
McConnell’s baseline experiment were run,
different conditions influencing the results of this
experiment were identified
After several hundred experiments had been run,
experimenters managed to identify around 70
conditions influencing the behavior of this type of
invertebrate
Which are the Important
Variables?
“…In fact, the principle of Transversely
Excited Atmospheric (TEA) lasers,
scientists did not know that the inductance
of the top was important”
A physicist quoted in
Changing Order: Replication and Induction in Scientific Practice
Harry Collins 1992
First Learn, Then Validate
“In the early stages, failure to get the expected
results is not falsification but a step in the
discovery of some interfering factor.
For immature experimental knowledge, the first
step is … to find out which experimental
conditions should be controlled”
Validity and the Research Process
Brinberg and McGrath 1985
SE Problems with Identical
Replications
SE has tried to repeat experiments identically, but no exact replications have yet been achieved
The complexity of the software development setting prevents the many experimental conditions from being reproduced identically
Yet this is a regular rather than an exceptional situation
In the Beginning Most is Unknown
“Most aspects are unknown when we start
to study a phenomenon experimentally.
Even the tiniest change in a replication
can lead to inexplicable differences in the
results”
Validity and the Research Process
Brinberg and McGrath 1985
Start with Similar Replications
“The less that is known about an area the more
power a very similar experiment has ... This is
because, in the absence of a well worked out set
of crucial variables, any change in the
experiment configuration, however trivial in
appearance, may well entail invisible but
significant changes in conditions”
Changing Order:
Replication and Induction in Scientific Practice
Harry Collins 1992
Learning & Validation Process
1. Start with identical replications
At the beginning of experimental research, equality, even if
targeted, will not happen
There will be either invisible but significant changes in conditions or
induced changes due to context adaptation or both
Failure to get the expected results should not be construed as
falsification, but as a step towards the discovery of some new
factor
2. Later on, both knowledge discovery and testing
can be more systematic
Changes in the configuration will be made purposely to learn
more variables and rule out artifactual results
Learning is Even More Important
Replication is needed not merely to
validate one’s findings, but more
importantly, to establish the increasing
range of radically different conditions
under which the findings hold, and the
predictable exceptions
The design of replicated studies. American Statistician
Lindsay and Ehrenberg 1993
Shedding some Light on
Replication
Functions of Replication
Reminder of Experimental Setting
Operationalization: treatments; response variable
Protocol: experimental design; experimental objects; guides; measuring instruments; data analysis techniques
Population: objects; subjects
Verification Functions
Control experimental errors
Control protocol independence
Understand operationalization limits
Understand population limits
Control Experimental Errors
Verify that the results of the baseline experiment
are not a chance product of an error
All elements of the experiment must resemble
the baseline experiment as closely as possible
Collateral benefit
Provide an understanding of the natural (random)
variation of the observed results
critical for being able to decide whether or not results hold in
dissimilar replications
Control Protocol Independence
Verify that the results of the baseline experiment
are not artifactual
An artifactual result is due to the experimental
configuration and cannot be guaranteed to exist in
reality
The experimental protocol needs to be changed
for this purpose
If an experiment is replicated several times using the
same materials, the observed results may occur due
to the materials
The same applies for all protocol elements
Understanding Operationalization
Limits
Learn how sensitive results are to different
operationalizations
Treatment operationalizations
treatment application procedures, treatment
instructions, resources, treatment
transmission …
Effect operationalizations
Metrics, measurement procedures …
Understand Population Limits
Learn the extent to which results hold for
other subject types or other types of
experimental objects
Learn to which specific population the
experimental sample belongs and what
the characteristics of such population are
Changes & Replication Functions
Experimental configuration elements (columns) by function of replication (rows)
Function of Replication                 Operationalization   Population   Protocol
Control Experimental Error              =                    =            =
Control Protocol Independence           =                    =            ≠
Understand Operationalization Limits    ≠                    =            =
Understand Population Limits            =                    ≠            =
LEGEND: = the element is equal to, or as similar as possible to, the baseline experiment
≠ the element varies with respect to the baseline experiment
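The relation between changed elements and replication functions can be sketched in code. This is only an illustrative sketch in Python: the names `DIMENSIONS`, `FUNCTION_BY_CHANGED_DIMENSION` and `replication_functions` are hypothetical, and returning several functions when several elements change at once is an assumption that goes beyond the single-change cases stated in the talk.

```python
# Hypothetical names; the mapping itself follows the slide:
# nothing changed -> control experimental error, otherwise one
# function per changed element of the experimental configuration.
DIMENSIONS = ("operationalization", "population", "protocol")

FUNCTION_BY_CHANGED_DIMENSION = {
    "operationalization": "understand operationalization limits",
    "population": "understand population limits",
    "protocol": "control protocol independence",
}


def replication_functions(changed):
    """Return the verification functions served by a replication.

    `changed` is the set of configuration elements that differ from
    the baseline experiment. Handling several changed elements at
    once is an assumption of this sketch.
    """
    changed = set(changed)
    unknown = changed - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown configuration elements: {sorted(unknown)}")
    if not changed:
        # Everything as similar as possible to the baseline.
        return ["control experimental error"]
    # Iterate in a fixed order so the result is deterministic.
    return [FUNCTION_BY_CHANGED_DIMENSION[d] for d in DIMENSIONS if d in changed]


print(replication_functions({"protocol"}))
```

When more than one element is passed, the returned list is a reminder that differences in results can no longer be attributed to a single change.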
Some Answers
Question 1
What exactly is a replication
study?
Establishing Limits
Amount of Changes (increasing)
RE-ANALYSIS (not run): same data, different models or statistical methods
REPETITION (run): identical, same site & researchers
REPLICATION (run): protocol changes, operationalization changes, population changes
REPRODUCTION (run): just the hypothesis kept
Question 2
What level of similarity should an
experiment have to be
considered a replication rather
than a new experiment?
Unchanged elements of a
replication
A replication must share the hypothesis
with the baseline experiment
Same response variable
although not same metric
Same treatments
although not same operationalization
at least two treatments in common
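The minimum-similarity rule above can be expressed as a small check. This is a sketch under stated assumptions: the `Experiment` record and `is_replication_of` are hypothetical names, and comparing hypotheses and response variables by plain string equality is a simplification (deciding whether two studies share a hypothesis is a human judgment). Metrics and treatment operationalizations are deliberately not compared, since they are allowed to differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    """Hypothetical, minimal description of an experiment."""
    hypothesis: str
    response_variable: str
    treatments: frozenset


def is_replication_of(candidate: Experiment, baseline: Experiment) -> bool:
    """Apply the minimum-similarity rule: same hypothesis, same
    response variable, and at least two treatments in common."""
    shared_treatments = candidate.treatments & baseline.treatments
    return (
        candidate.hypothesis == baseline.hypothesis
        and candidate.response_variable == baseline.response_variable
        and len(shared_treatments) >= 2
    )


# Illustrative data only, not taken from any real study.
baseline = Experiment("H1", "effectiveness", frozenset({"T1", "T2", "T3"}))
candidate = Experiment("H1", "effectiveness", frozenset({"T1", "T2", "T4"}))
print(is_replication_of(candidate, baseline))
```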
Partial Replications
[Figure: Exp. A and Exp. B (each either baseline or replication), Replication C and Exp. D, shown against treatments T1 to T6 and response variables RV1 to RV3.]
Question 3
What changes are acceptable?
Levels of Verification
Similarity between the baseline experiment and a replication serves different verification purposes depending on the changes made
Replicating an experiment as closely as possible
Verifies results are not accidental
Varying the experimental protocol
Verifies results are not artifactual
Varying the population properties
Verifies types of populations for which the results hold
Varying the operationalization
Verifies range of operationalizations for which the results hold
Can I Change Everything?
It is better not to change everything at the
same time
We can understand the source of differences
in results better if only one change is made at
a time
But a replication with a lot of changes is
not rendered useless or doomed to failure
We will just need to wait until other
replications are run
Everything Different
[Figure: changes across experiment elements; legend: Different / Identical]
One Change at a Time
[Figure: changes across experiment elements; legend: Different / Identical]
Increasing Changes
[Figure: changes across experiment elements; legend: Different / Identical]
Question 4
What is the level of similarity that
results must have to be
considered as reproduced?
Understanding the Natural
Variation
Identical replications are useful for
understanding the range of variability of
results
This provides an estimate for other
experimenters to use as a baseline when
they replicate the experiment
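One possible way to turn this idea into a check is sketched below. The talk does not prescribe a criterion; the band of mean plus or minus `k` sample standard deviations, the function names and the effect values are all assumptions made for illustration.

```python
from statistics import mean, stdev


def natural_variation_range(effects, k=2.0):
    """Estimate a band of natural variation from the results of
    identical replications: mean +/- k sample standard deviations.
    The choice of band (and of k) is an assumption of this sketch."""
    if len(effects) < 2:
        raise ValueError("at least two identical replications are needed")
    centre = mean(effects)
    spread = stdev(effects)
    return centre - k * spread, centre + k * spread


def within_natural_variation(new_effect, effects, k=2.0):
    """True if a new replication's result falls inside the band."""
    low, high = natural_variation_range(effects, k)
    return low <= new_effect <= high


# Hypothetical effect estimates from identical replications.
identical_runs = [0.40, 0.50, 0.60]
print(within_natural_variation(0.55, identical_runs))
```

A result outside the band is not a verdict by itself; in a dissimilar replication it is a prompt to relate the difference to the changes that were made.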
Question 5
Should replications reuse
materials?
Accommodating Opposing Views
The possible threat of errors being propagated
by experimenters exchanging materials does not
mean discarding replications sharing materials
Replications with identical materials and protocols
(and possibly the same errors) are a necessary step
for verifying that an exact replication run by others
reproduces the same results
Other replications that alter the design and other
protocol details should be performed in order to
assure that the results are not induced by the protocol
Accommodating Opposing Views
Replication functions accommodate
opposing views within a broader
framework
Contrary stances are really tantamount to
different types of replication conducted for
different purposes
Different ways of running replications are
useful for gradually advancing towards
verified experimental results
Summarizing
Main Ideas
Replication in ESE
Replication plays an essential role in the experimental paradigm
Replication is not a regular practice in ESE today
More methodological research on the adoption and tailoring of
replication in ESE is still necessary
Clarifying conceptions
Replication is necessary not merely to validate findings, but,
more importantly, to discover the range of conditions under
which the findings hold
Replications provide different knowledge depending on the
changes to the baseline experiment
Knowledge gained from a replication needs to relate changes
and findings