Fast Feedback Cycles in Empirical Software Engineering Research
Dr. Antonio Vetrò, Technische Universität München, Germany
24 February 2014
VU Amsterdam, Software and Services (S2) Research Group
Research and Engineering challenges
Outline
• What this talk is about: Empirical Software Engineering Research
• Feedback cycles
• Fast Feedback Cycles in Software Engineering Research
Fast Feedback Cycles in Empirical Software Engineering Research
Empirical cycle
Source: Andreas Jedlitschka, Liliana Guzmán, Jessica Jung, Constanza Lampasona, Silke Steinbach, "Empirical Practice in Software Engineering", in Perspectives on the Future of Software Engineering, 2013, pp. 217-233
Big Picture, 3rd layer: Methods

[Diagram: the empirical cycle]
Theory / system of theories → (tentative) hypotheses → observations / evaluations → study population; deduction runs down the cycle, induction back up.
• Theory building: formal / conceptual analysis; grounded theory
• Falsification / support (confirmatory): case & field studies; experiments; simulations; survey and interview research
• Pattern building (exploratory): case & field studies; data analysis; ethnographic studies; folklore gathering

For now, prototyping is not part of this "method view" (nor are reference models).

Further reading: Vessey et al., A unified classification system for research in the computing disciplines
Traditional approach: the need for speed
• Benefits:
  – a scientific tool for evaluation, validation, and discovery
  – possible theory building
• Drawbacks:
  – long, difficult knowledge generation and transfer to industry
  – results sensitive to context variables and time
  – it does not fit the new paradigms of data streams
  – it does not keep pace with innovation
  – lack of flexibility (e.g., you cannot change the study design after the fact)
Empirical Software Engineering 2.0

• Zeller A., MSR 2007: software archives; data from any artefact; data always available; instantaneous results
• Menzies T., ICSE 2011: data mining on multiple sources; domain knowledge (case studies); adaptive agents (mining + monitor + repair); local models
• Shull F., IEEE SW 2012: hybrid approach (manual hypothesis testing + the speed of mining); data-driven decisions
• Shull F., IEEE SW 2013: collaborative effort in the hypothesis-testing process; iterative model building
The path towards EMSE 2.0

EMSE 1.0:
• Case studies: watch, don't touch
• Experiments: vary a few conditions in a project
• Simple analyses: a little ANOVA, regression, maybe a t-test

EMSE 2.0:
• Data generators: case studies, experiments, data streams
• Data analysis: 10K possible data miners
• Crowd-sourcing: 10K possible analysts

Adapted from Tim Menzies, Forrest Shull, "Empirical Software Engineering 2.0"
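The "simple analyses" of EMSE 1.0 are easy to reproduce. As an illustration, here is a minimal Welch's two-sample t-test in plain Python; the defect data and group names are invented for the example, not taken from any study in the talk:

```python
import math

def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic (unequal variances).

    Returns (t, approximate degrees of freedom via Welch-Satterthwaite).
    """
    na, nb = len(sample_a), len(sample_b)
    ma, mb = sum(sample_a) / na, sum(sample_b) / nb
    # Unbiased sample variances
    va = sum((x - ma) ** 2 for x in sample_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in sample_b) / (nb - 1)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical defects per KLOC in projects with and without a new practice
control = [4.1, 3.8, 5.0, 4.6, 4.9]
treatment = [3.0, 2.7, 3.4, 3.1, 2.6]
t, df = welch_t(treatment, control)
print(f"t = {t:.2f}, df = {df:.1f}")
```

In an EMSE 1.0 study this single comparison might be the entire analysis; the EMSE 2.0 point is that the same question can be asked continuously, over many data sources at once.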
What this talk is about

[Diagram: the same empirical cycle as above (theory → hypotheses → observations / evaluations → study population), with the scope of this talk highlighted: exploratory data analysis, feeding pattern building and, ultimately, theory building]

Further reading: Vessey et al., A unified classification system for research in the computing disciplines
Outline
• What this talk is about: Empirical Software Engineering Research
• Feedback cycles
• Fast Feedback Cycles in Software Engineering Research
Fast Feedback Cycles in Empirical Software Engineering Research
Lean and Agile
Earned value in iterations
Source: Hakan Erdogmus, "The Economic Impact of Learning and Flexibility on Process Decisions", IEEE Software 22(6), November 2005, pp. 76-83. DOI: 10.1109/MS.2005.165, http://dx.doi.org/10.1109/MS.2005.165
Our scope is: knowledge
Big Data

In the Web 2.0: Feedback Cycles + Big Data
Implies crowd-sourcing
And, finally, back to Software Engineering
Big Data in SE: example from agile development

[Diagram: artefacts and data produced along one iteration, for Customer A]
• Stories: metrics from stories (e.g., requirements smells), …
• Planning: story points, tasks, features, dependencies, metrics from stories, estimations, …
• Implementation: implementation time, code, features from code, metrics from code, bugs, changes, …
• Sprints: discussion, problems in sprints, …
• UATs
• Feedback: acceptance, motivations, improvements, bugs, …

Even simple processes produce a huge amount of mutually interconnected information.
Even simple mechanisms can collect valuable feedback.
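To make that interconnection concrete, here is a hypothetical sketch of how planning, code, and quality data from such a process might be linked per story. All records, field names, and the `story_view` helper are invented for illustration; real projects would pull these from issue trackers, version control, and CI systems:

```python
# Hypothetical, simplified records from one iteration
stories = [
    {"id": "S1", "points": 3, "req_smells": 1},
    {"id": "S2", "points": 8, "req_smells": 4},
]
commits = [
    {"story": "S1", "files": 2, "churn": 120},
    {"story": "S2", "files": 9, "churn": 860},
    {"story": "S2", "files": 3, "churn": 240},
]
bugs = [
    {"story": "S2", "severity": "major"},
]

def story_view(story_id):
    """Link the artefacts: one row per story, joining planning, code, and quality data."""
    story = next(s for s in stories if s["id"] == story_id)
    churn = sum(c["churn"] for c in commits if c["story"] == story_id)
    n_bugs = sum(1 for b in bugs if b["story"] == story_id)
    return {**story, "churn": churn, "bugs": n_bugs}

rows = [story_view(s["id"]) for s in stories]
```

Even this toy join already connects three data sources; a real process adds sprints, UATs, and customer feedback on top.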
Big Data + Feedback in SE: example from agile development

What to mine: data from any step (also from past projects):
• stories and metrics from stories (e.g., requirements smells)
• story points, tasks, features, dependencies, estimations
• implementation time, code, features and metrics from code, bugs, changes
• discussion time, problems in sprints (list), UATs

What we try to find:
• indicators for problems: maintenance problems, wrong effort estimations, test effort, UAT outcomes, development problems
• new patterns (relationships: "what", not "why")

Results are continuously visualised to stakeholders, so we can check them and collect fast feedback.

Fast feedback enables:
• iterative local model building
• knowledge earned value, as in the lean approach
• input for follow-up studies

Outcome: risk minimisation and focus on value.
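A minimal sketch of what finding a pattern ("what", not "why") can look like in practice: a plain-Python Pearson correlation flagging a candidate indicator. The history data and the 0.7 threshold are invented for the example:

```python
import math

def pearson(xs, ys):
    """Pearson correlation: a first, crude pattern detector."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical history: requirements smells per story vs. bugs found later
smells = [0, 1, 1, 2, 3, 4, 5]
bugs   = [0, 0, 1, 1, 2, 3, 4]
r = pearson(smells, bugs)
if r > 0.7:  # arbitrary illustrative threshold
    print(f"candidate indicator: req smells ~ later bugs (r = {r:.2f})")
```

The correlation alone says nothing about causes; it only surfaces a relationship that stakeholders can then confirm or reject in the feedback cycle.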
Outline
• What this talk is about: Empirical Software Engineering Research
• Feedback cycles
• Fast Feedback Cycles in Software Engineering Research
Fast Feedback Cycles in Empirical Software Engineering Research
Process to enable fast feedback cycles

[Diagram]
Data sources feed the automatic data analysis, which produces facts and hypotheses (testable knowledge); these are checked against the experience base (tested knowledge), which in turn is a model for the data sources. Stakeholders (implicit/explicit feedback) view and edit the facts and hypotheses, edit the experience base, and trigger and tune the analysis.

• Data sources: any data collected from software development, execution, and maintenance; stream and snapshot data; data from past studies is also considered.
• Automatic data analysis: data mining techniques are applied to the data sources and used to reveal, strengthen, or deny hypotheses.
• Facts, hypotheses: facts and hypotheses derived from the data analyses.
• Stakeholders: industry and research collaborators.
• Experience base: collection of tested knowledge and well-established theories in the field.
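The cycle in the diagram could be prototyped, very roughly, as follows. Every name here is an illustrative placeholder (the `analyse` and `stakeholder_check` stand-ins in particular), not the actual system described in the talk:

```python
experience_base = []   # tested knowledge
testable_facts = []    # facts and hypotheses awaiting feedback

def analyse(data_sources):
    """Stand-in for the automatic mining step: emit candidate facts."""
    return [f"high churn correlates with bugs in {src}" for src in data_sources]

def stakeholder_check(fact):
    """Stand-in for implicit/explicit stakeholder feedback."""
    return "project-A" in fact  # pretend only project-A findings get confirmed

def feedback_cycle(data_sources):
    # Data sources feed the analysis, which produces testable facts
    testable_facts.extend(analyse(data_sources))
    # Stakeholders check each fact; confirmed facts are promoted
    for fact in list(testable_facts):
        if stakeholder_check(fact):
            testable_facts.remove(fact)
            experience_base.append(fact)   # now tested knowledge

feedback_cycle(["project-A", "project-B"])
```

The point of the sketch is the separation of states: knowledge is testable until stakeholders confirm it, and only then enters the experience base.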
Fast Feedback Cycles in Empirical Software Engineering Research
Research and Engineering challenges
Process to enable fast feedback cycles: objectives and challenges

Data sources
• Objective: any data collected from software development, execution, and maintenance; stream and snapshot data; data from past studies is also considered.
• Challenges: integration of different datasets; guaranteeing high data quality; applicability of temporal abstractions.

Automatic data analysis
• Objective: data mining techniques are applied to the data sources and used to reveal, strengthen, or deny hypotheses.
• Challenges: appropriate selection and tuning of techniques (also automatically and iteratively); appropriate response-variable selection; incorporation of a priori knowledge and human feedback; exploration should take industry needs (usually short-term) into account.

Facts, hypotheses (testable knowledge)
• Objective: facts and hypotheses derived from the data analyses; facts not supported by statistical significance are not rejected.
• Challenges: meaningfulness of the generated knowledge; consistency checks (formal, semantic) against human expertise and the experience base; representation issues (see experience base).

Experience base (tested knowledge)
• Objective: collection of tested knowledge and well-established theories in the field.
• Challenges: representing the information in an easily queryable, presentable, and modifiable way; representing uncertainty and soft constraints; protecting sensitive data.

Stakeholders (industry and research collaborators)
• Challenges: value both types of stakeholders; collect and give meaningful fast feedback; create and test pragmatic feedback mechanisms, e.g., shared dashboards, interactive visualisations, …
Some useful and inspiring references

Books:
• Münch, Jürgen; Schmid, Klaus (eds.), Perspectives on the Future of Software Engineering: Essays in Honor of Dieter Rombach, 2013, XVI, 366 pp.
• Mayer-Schönberger, V.; Cukier, K., Big Data: A Revolution That Will Transform How We Live, Work, and Think, Houghton Mifflin Harcourt, Boston, 2013. ISBN 0544002695, 9780544002692.
• Ries, Eric, The Lean Startup: How Today's Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses, Crown Business, New York, 2011.
• Biffl, S.; Aurum, A.; Boehm, B.; Erdogmus, H.; Grünbacher, P. (eds.), Value-Based Software Engineering, 2006, XXII, 388 pp., 69 illus.

Articles:
• Hassan, A.E.; Hindle, A.; Runeson, P.; Shepperd, M.; Devanbu, P.; Kim, S., "Roundtable: What's Next in Software Analytics", IEEE Software, vol. 30, no. 4, pp. 53-56, July-Aug. 2013.
• Forrest Shull, "Getting an Intuition for Big Data", IEEE Software 30(4): 3-6 (2013).
• J. Münch, F. Fagerholm, P. Johnson, J. Pirttilahti, J. Torkkel, J. Järvinen, "Creating Minimum Viable Products in Industry-Academia Collaborations", Proceedings of the Lean Enterprise Software and Systems Conference (LESS 2013), Galway, Ireland, 2013.
• Forrest Shull, "Research 2.0?", IEEE Software 29(6): 4-8 (2012).
• Dag I. K. Sjøberg, Tore Dybå, Bente C. D. Anda, Jo E. Hannay, "Building theories in software engineering", in F. Shull, J. Singer, D. I. K. Sjøberg (eds.), Guide to Advanced Empirical Software Engineering, ch. 12, pp. 312-336, Springer London, 2008.
• Victor R. Basili, Jens Heidrich, Mikael Lindvall, Jürgen Münch, Myrna Regardie, Adam Trendowicz, "GQM+Strategies - Aligning Business Strategies with Software Measurement", ESEM, pp. 488-490, IEEE Computer Society, 2007.

Presentations:
• Tim Menzies, Forrest Shull, 2011, "Empirical Software Engineering 2.0", http://www.slideshare.net/timmenzies/empirical-software-engineering-v20
• Thomas Zimmermann, ICSM 2010, "Analytics for Software Development", http://www.slideshare.net/tom.zimmermann/analytics-for-software-development
• Andreas Zeller, MSR 2007, "Empirical Software Engineering 2.0: How mining software repositories changes the game for empirical software engineering research", http://msr.uwaterloo.ca/msr2007/Empirical-SE-2.0-Zeller.pdf
Acknowledgements
Thanks for feedback, reviews, writing, and even listening to me :) to: Daniel Méndez Fernández, Manfred Broy, Florian Grigoleit, Henning Femmer, Peter Struss, Jacob Mund, Benedikt Hauptmann, Andreas Vogelsang, Stefan Wagner, Forrest Shull, Davide Falessi, Andreas Jedlitschka, and Jens Heidrich.
THANKS!