
1

Experimentation in Computer Science and Software Engineering

Kavi Khedo
Senior Lecturer

Department of Computer Science and Engineering
Faculty of Engineering
University of Mauritius


http://khedo.wordpress.com/

2

References

Tichy, W.F., “Should Computer Scientists Experiment More?”, IEEE Computer, May 1998.

Zelkowitz, M.V., and Wallace, D.R., “Experimental Models for Validating Technology”, IEEE Computer, May 1998.

3

Outline

Nature of computing
Why experiment?
Methods of experimentation
Issues and possible approaches
Looking ahead
Conclusion

4

Nature of Computing

Science or engineering?
• Computers and programs are human creations.
• CS is not a natural science in the traditional sense.

Computers and software
• The subject of enquiry is not just technical issues, but models of information and information processes.

5

Computer Science

“A science is any discipline in which the fool of this generation can go beyond the point reached by the genius of the last generation.”

Max Gluckman

Computer science is a young and constantly evolving discipline. It is therefore viewed in different ways by different people, leading to different perceptions of whether it is a “science” at all.

6

Modeling information processes

Are information processes artificial? Where and how do they occur?

Computer models compare poorly with information processes found in nature.
• e.g., nervous systems, immune systems, genetic processes, brains of programmers and users, etc.

7

Why experiment?

Experiments don’t prove a thing!

View of mathematicians
• No amount of experimentation provides proof with absolute certainty.
• Experiments show the presence of errors but not their absence.

A theory can be shot down by contrary evidence
• Test theoretical predictions against reality.
• A theory gets accepted if all known facts in its domain can be deduced from it and are verified by experiments, e.g., astrophysics.

8

Why experiment?

Example of a failed theory:
• Failure probability of multi-version programs is the product of the failure probabilities of individual versions.

Experiments by Knight and Leveson showed a significantly higher failure probability than predicted.

False assumption detected by experiment: faults in program versions are statistically independent.
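
A small numerical sketch shows why the independence assumption matters. It is not a reconstruction of Knight and Leveson's experiment: the split of inputs into "hard" and "easy" classes, all probabilities, and the function names are hypothetical values chosen only for illustration. The sketch assumes that a small fraction of inputs is hard and that every version is far more likely to fail on those inputs, so failures cluster on the same inputs.

```python
from math import prod


def independent_prediction(failure_probs):
    """Predicted probability that all versions fail together,
    assuming their faults are statistically independent."""
    return prod(failure_probs)


def per_version_failure(p_hard, fail_on_hard, fail_on_easy):
    """Overall failure probability of a single version."""
    return p_hard * fail_on_hard + (1 - p_hard) * fail_on_easy


def joint_failure(n_versions, p_hard, fail_on_hard, fail_on_easy):
    """Probability that all versions fail on the same input when
    failures are independent only within each input class."""
    return (p_hard * fail_on_hard ** n_versions
            + (1 - p_hard) * fail_on_easy ** n_versions)


# Hypothetical illustration values, not measured data.
N_VERSIONS = 3
P_HARD = 0.01        # fraction of inputs that are hard
FAIL_ON_HARD = 0.5   # each version's failure probability on a hard input
FAIL_ON_EASY = 0.001 # each version's failure probability on an easy input

per_version = per_version_failure(P_HARD, FAIL_ON_HARD, FAIL_ON_EASY)
predicted = independent_prediction([per_version] * N_VERSIONS)
actual = joint_failure(N_VERSIONS, P_HARD, FAIL_ON_HARD, FAIL_ON_EASY)

print(f"per-version failure probability: {per_version:.5f}")
print(f"predicted under independence:    {predicted:.3e}")
print(f"joint failure with shared inputs: {actual:.3e}")
```

With these made-up numbers the joint failure probability is several thousand times larger than the product rule predicts, which is the kind of gap only an experiment would reveal.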

9

Why experiment?

Another example:
• Artificial neural networks were originally discarded on theoretical grounds.
• Experiments showed properties better than predicted.
• Researchers have since developed better theories to explain what is observed.

10

Benefits of experimentation

Help build a reliable base of knowledge.
• Reduce uncertainty about the adequacy of theories, methods and tools.

Lead to new, useful and unexpected insights.
• Open new areas of investigation.

Accelerate progress by eliminating fruitless approaches, erroneous assumptions and fads.

11

How to experiment

General categories of experiments:
• Scientific method.
• Engineering method.
• Empirical method.

12

Scientific method

Develop a theory to explain a phenomenon.
Propose a hypothesis and test alternative variations of it.
Collect data to verify or refute claims of the hypothesis.

13

Engineering method

Develop and test a solution to a hypothesis.
Based on results of the test, improve the solution.
Iterate until no further improvement is needed.

14

Empirical method

Statistical method proposed as a means to validate a hypothesis.

There may not be a formal model or theory describing the hypothesis.

Data collected to verify the hypothesis.
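
One way to make the statistical step concrete is a permutation test, which needs no formal model of the data. The sketch below is only one possible choice of statistical method, not the one the slides prescribe. The two samples are invented task-completion times for two hypothetical tools, and the function name and parameters are assumptions made for this illustration.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def permutation_test(sample_a, sample_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an approximate p-value: the share of random relabelings
    whose difference in means is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the analysis is repeatable
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (extreme + 1) / (n_resamples + 1)


# Invented data: task-completion times in minutes for two hypothetical tools.
tool_a = [31, 28, 35, 30, 33, 29, 34, 32]
tool_b = [27, 25, 30, 26, 28, 24, 29, 27]

p_value = permutation_test(tool_a, tool_b)
print(f"approximate p-value: {p_value:.4f}")
```

A small p-value is evidence against the hypothesis that the two tools perform the same; it is not proof, which matches the view of experiments taken in these slides.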

15

[Figure: A comparison of the scientific method (left) with the role of experimentation in system design (right).]

16

Other important aspects

Replication
• Other researchers must be able to reproduce the experiments.

Influence
• Impact of experimental design on the result.

Temporal properties
• Historical or current data?
• Is any required information missing?
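
For computational experiments, replication can be supported by making every run a function of an explicit, recorded configuration. The following is a minimal sketch under that assumption. The toy experiment, the configuration keys, and the function name are all hypothetical and stand in for whatever a real study would measure.

```python
import json
import random


def run_experiment(config):
    """Run a toy randomized experiment fully determined by `config`."""
    rng = random.Random(config["seed"])  # no hidden global random state
    samples = [rng.gauss(config["mu"], config["sigma"])
               for _ in range(config["n"])]
    return {"config": config, "mean": sum(samples) / len(samples)}


# Hypothetical configuration; publishing it alongside the result lets
# other researchers rerun exactly the same experiment.
config = {"seed": 42, "mu": 10.0, "sigma": 2.0, "n": 1000}

first = run_experiment(config)
second = run_experiment(config)

# Serialized record that could be archived with the paper.
record = json.dumps(first, sort_keys=True)

print("runs identical:", first == second)
```

Keeping the seed and parameters in the archived record also addresses the question of missing information: everything needed to repeat the run is in one place.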

17

Lack of validation in CS and SE

40% of papers requiring empirical evaluation had none.
• In a sample of 400 papers published by the ACM in 1993.
• 50% in software-related journals.

40-50% of SE papers were found to be unvalidated.
• Study by Zelkowitz and Wallace (Computer, May 1998).

Much smaller percentage in disciplines such as physics, psychology and anthropology.

18

Argument: Experiments do not prove anything

Response:

True, experiments show only evidence for or against a theory, but cannot prove or disprove it.

However:
• Experiments are used for theory testing, and for exploration leading to theory development.
• A theory is accepted gradually by the community as evidence accumulates. (Note the importance of repeatability.)

19

Argument: Traditional scientific method is not applicable

Response:

Applicability is identical; only the object or subject of study changes.

We are dealing partly with human processes and activities, which have clearly been amenable to experimentation in other disciplines.

Likewise, encodings of processes (e.g., programs) can be investigated.

20

Argument: The current level of experimentation is sufficient

Response:

Not when compared with other sciences
• Tichy: 50% vs. 15% of unsupported claims
• Zelkowitz/Wallace: 40-50% unvalidated papers

Note: Tichy is not advocating replacing theory and engineering by experiment, but advocating balance.

21

Argument: Experiments are expensive

Response:

So what!?
Depends on the importance of the research questions; some are clearly important enough.
There is a spectrum of experimental approaches, differing in cost, from which to choose.
Benchmarks could amortize costs.
Other scientific disciplines accept this.

22

Cost of experiments

Experiments require more resources than theory. So what?

Example:
• A significant segment of the software industry switched from C to C++ at a substantial cost.
• There was no solid evidence to show that C++ is superior to C for programmer productivity and software quality.

23

Benchmarks

A sample of the task domain
Effective and affordable way to experiment
Well-defined performance measurements
Used in several areas:
• Speech understanding, information retrieval, pattern recognition, data warehousing and OLAP, etc.

Help to eliminate unpromising approaches and exaggerated claims
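
The three ingredients listed above (a task sample, an affordable procedure, a well-defined measurement) can be sketched as a tiny benchmark harness. Everything in it is an assumption made for illustration: sorting stands in for the task domain, the workload sizes are arbitrary, and the function names are hypothetical rather than taken from any established benchmark suite.

```python
import random
import timeit


def make_workload(n_tasks=10, size=200, seed=1):
    """Fixed, seeded sample of the task domain shared by all candidates."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(size)] for _ in range(n_tasks)]


def insertion_sort(items):
    """Simple quadratic sort used as one candidate approach."""
    result = list(items)
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result


def benchmark(candidates, workload, repeats=3):
    """Return the best-of-`repeats` total time in seconds per candidate."""
    results = {}
    for name, func in candidates.items():
        # Check correctness first: a fast wrong answer is not a result.
        assert all(func(task) == sorted(task) for task in workload)
        timings = timeit.repeat(
            lambda: [func(task) for task in workload],
            number=1,
            repeat=repeats,
        )
        results[name] = min(timings)
    return results


workload = make_workload()
results = benchmark(
    {"insertion_sort": insertion_sort, "builtin_sorted": sorted},
    workload,
)
for name, seconds in results.items():
    print(f"{name}: {seconds:.6f} s")
```

Because every candidate runs on the same seeded workload and is scored by the same measurement, claims about one approach being faster can be checked by anyone who reruns the harness.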

24

Argument: Demos are sufficient

Demos provide proof-of-concepts in the engineering sense.
• Illustrate a potential, but depend on observers’ imagination and extrapolation.
• Do not produce solid evidence.
• Not a substitute for the scientific process.

Satisfactory when presenting a radically new idea or a significant breakthrough.
• e.g., first compiler, time-sharing system, OO language, web browser, etc.

Demos don’t investigate cause and effect, and don’t provide (statistically) quantifiable results.

25

Examples of questions for experimentation

Introduce theories of how requirements are refined into programs and test them.
Deeper understanding of what intelligence is.
Quality of human-computer interactions.
Relative merits of parallel machine models and algorithms.
Behavior of algorithms on typical problems.

26

Argument: Too much noise (too many variables to control)

Too many variables make experimentation hard.
No more than in other fields; this is just laziness.
Human-subject experiments are particularly difficult, but other fields have developed many techniques for addressing these difficulties.
Benchmarking can simplify many questions in CS.
Benchmark development can help
• Composition of the benchmark is subjective, and so the weakest link.
• Is the benchmark representative enough?
• Evolve over time to be close to what needs to be tested.

27

Argument: Progress will slow

(e.g. requiring experimentation with every paper will prevent ideas from emerging.)

We are wasting time by targeting unproductive research and development; productivity might actually improve given more experimentation.

There’s no reason to prohibit conceptual papers and papers formulating new theories or hypotheses. (It’s a question of balance.)

28

Argument: Technology changes too fast

Technology changes too fast; experiments are irrelevant by the time they’ve been completed.

Response:

The experiment’s focus is then too narrow.
Consider instead the bigger picture (e.g., fundamental underlying questions, not ephemeral concerns).

29

Argument: You’ll never get it published

Response:

Can be true, especially when you run into reviewers who don’t understand empirical science!

But this has been changing. Still, a painful process of education in empirical research methods continues to be needed.

30

Potential Substitutes for Experimentation

Feature comparison
• Okay sometimes, but it isn’t science.

Intuition
• There are plenty of examples of times when intuition has been wrong.

Expert judgment
• Get real. Science is built on skepticism.

31

Concepts vs. Experiments

Rapid publication of novel concepts and new hypotheses is important.

But questionable ideas need to be weeded out by meaningful validation.
• Then scientists can concentrate on promising approaches.

Need for balance.

32

Problems with experiments

Unrealistic assumptions, manipulated data
Failure to provide details for repeating experiments
Results over-interpreted, or do not generalise
Scientific process can self-correct errors, hoaxes and even fraud.

33

CS as a harder science

Most papers take small steps forward.
Scientists should create models, formulate hypotheses and test them using experiments.
Competing theories: a new theory replacing an old one leads to a paradigm shift.
• In physics, but not so evident in CS.
• Physical symbol system theory vs. knowledge processing theory in AI.
• A theory is needed for the behavior of algorithms on typical problems.

34

Conclusion

CS research used to rely far less on experiments than most other disciplines.
A good case exists for more experimentation.
Conventional scientific methods have made CS a ‘hard’ science.
Balance between theory, engineering and experimentation needed.
