An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis

Presented at Software Engineering Conference 2013 in Aachen

Transcript of An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis

Page 1: An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis

Lehrstuhl Informatik 5 (Information Systems)

Prof. Dr. M. Jarke

This slide deck is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis

Advanced Community Information Systems (ACIS) RWTH Aachen University, Germany

Anna Hannemann, Michael Hackstein, Ralf Klamma, Matthias Jarke

Page 2: An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis

Open Source Software Projects

• Community-driven development
• Voluntary participation
• Communication, project management and development via Web tools
• Some successful and famous examples
• Smaller niche projects
• A long tail of unsuccessful projects

Page 3: An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis

Open Source Software Analysis for Software Engineering

• Understand, model, simulate and organize community-driven development
• Agile development practices
• Distributed and intercultural practices
• New success factors
• Long-term freely available datasets
• Low-cost empirical studies

Page 4: An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis

Open Source Software Analysis Research Results

Scacchi, “The Future of Research in Free/Open Source Software Development”, 2010

Page 5: An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis

Techniques for Knowledge Mining in Development Repositories

• Results are only as good as the data!
• Remember the DNA Phantom? “A hypothesized unknown female serial killer as a result of contaminated cotton swabs used for collecting DNA”
• Mine data, not noise! Artifacts from communication and development repositories need to be cleaned before analysis.

Page 6: An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis

Data Cleaning for Knowledge Mining in Development Repositories

• Data-structure independence: variable artifact types
• Additive filtering: filter only new data
• Filter nesting: sequence of arbitrary order
• Consistent data format: cross-medium analysis
• Consistent and easy-to-use interface
• Extensibility: continuous evolution
• Adaptive database insertion

Page 7: An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis

Adaptive-Filtering Approach: Cross-Media Mapping

Artifact types
• Mail
• Comment
• Post
• ...

Cross-media mapping
• Assignment of semantic meaning to artifact elements
• Extensibility to new data sources
• Same filters for different data
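
The idea of cross-media mapping can be sketched as follows. This is a minimal illustration, not the framework's actual data model: the `Artifact` fields, the `MAPPINGS` table and the raw key names (`Message-ID`, `user`, ...) are all assumptions made for the example. Each artifact type gets one mapping that assigns a semantic meaning to its raw elements, so the same filters can later run on mails, comments and posts alike.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Artifact:
    """Common format shared by all filters (hypothetical field set)."""
    artifact_id: str
    author: str
    timestamp: str
    text: str
    source: str  # artifact type, e.g. "mail", "comment", "post"


# One mapping per artifact type: semantic field -> key in the raw record.
# The raw key names are invented for illustration.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "mail": {"artifact_id": "Message-ID", "author": "From",
             "timestamp": "Date", "text": "Body"},
    "comment": {"artifact_id": "id", "author": "user",
                "timestamp": "created", "text": "content"},
}


def to_artifact(source: str, raw: Dict[str, str]) -> Artifact:
    """Translate a raw record into the common format via its mapping."""
    mapping = MAPPINGS[source]
    fields = {semantic: raw[key] for semantic, key in mapping.items()}
    return Artifact(source=source, **fields)


mail = to_artifact("mail", {"Message-ID": "<1@example.org>", "From": "alice",
                            "Date": "2004-01-05", "Body": "Hello list"})
comment = to_artifact("comment", {"id": "c7", "user": "bob",
                                  "created": "2004-02-01", "content": "Nice"})
```

Supporting a new data source then only means adding one entry to `MAPPINGS`; the filters themselves stay unchanged.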

Page 8: An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis

Adaptive-Filtering Approach: Filter Nesting

• Sequence of filters F1, F2, …, FN
• Results in same predefined format
• One filter per cleaning (analysis) task
• Each filter triggers its predecessor
• Complex filter as a combination of several filters
• Filtering triggered on demand
• Filtering of a subset possible
• Simple filters first, then analysis of the reduced data set with more filters of higher complexity
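
A minimal sketch of filter nesting, assuming artifacts are plain dictionaries with a `text` field. The class names `Filter`, `NonEmptyFilter` and `LowercaseFilter` are hypothetical and only illustrate the pattern: each filter does one task, returns data in the same format, and triggers its predecessor when its own result is requested.

```python
from typing import Dict, Iterable, List, Optional


class Filter:
    """One filter performs one cleaning task and may wrap a predecessor."""

    def __init__(self, predecessor: Optional["Filter"] = None) -> None:
        self.predecessor = predecessor

    def apply(self, artifacts: Iterable[Dict]) -> List[Dict]:
        # Trigger the predecessor first, then run this filter's own task.
        if self.predecessor is not None:
            artifacts = self.predecessor.apply(artifacts)
        return self.process(list(artifacts))

    def process(self, artifacts: List[Dict]) -> List[Dict]:
        raise NotImplementedError


class NonEmptyFilter(Filter):
    """Cheap reduction step: drop artifacts without any text."""

    def process(self, artifacts: List[Dict]) -> List[Dict]:
        return [a for a in artifacts if a["text"].strip()]


class LowercaseFilter(Filter):
    """Content step: normalize text, keeping the artifact format unchanged."""

    def process(self, artifacts: List[Dict]) -> List[Dict]:
        return [{**a, "text": a["text"].lower()} for a in artifacts]


# F2 wraps F1: requesting F2's result triggers F1 on demand.
pipeline = LowercaseFilter(predecessor=NonEmptyFilter())
result = pipeline.apply([{"id": 1, "text": "Hello"}, {"id": 2, "text": "  "}])
```

Because every filter accepts and returns the same format, the cheap reduction step can run first and the more expensive filters only see the reduced data set. Passing a subset of the artifacts to `apply` filters just that subset.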

Page 9: An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis

Adaptive-Filtering Approach: Multi-Threading

• Only new data is filtered
• Asynchronous processing: the filtered data subset is provided directly to the next analysis task
• Synchronous processing: wait until the complete data set is filtered
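
The two processing modes can be illustrated with a worker thread and a queue. This is a sketch under assumptions, not the framework's implementation: the function names `filter_async` and `filter_sync`, the `seen` set used for additive filtering, and the dictionary-shaped artifacts are all hypothetical.

```python
import queue
import threading
from typing import Callable, Dict, Iterable, Iterator, List, Set

_DONE = object()  # sentinel marking the end of the filtered stream


def filter_async(artifacts: Iterable[Dict], keep: Callable[[Dict], bool],
                 seen: Set[int]) -> Iterator[Dict]:
    """Yield filtered artifacts as soon as the worker thread produces them."""
    out: queue.Queue = queue.Queue()

    def worker() -> None:
        try:
            for artifact in artifacts:
                if artifact["id"] in seen:
                    continue  # additive filtering: skip already-filtered data
                seen.add(artifact["id"])
                if keep(artifact):
                    out.put(artifact)
        finally:
            out.put(_DONE)  # always unblock the consumer

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = out.get()
        if item is _DONE:
            return
        yield item


def filter_sync(artifacts: Iterable[Dict], keep: Callable[[Dict], bool],
                seen: Set[int]) -> List[Dict]:
    """Block until the complete data set is filtered."""
    return list(filter_async(artifacts, keep, seen))


seen_ids: Set[int] = set()
data = [{"id": 1, "text": "ok"}, {"id": 2, "text": ""}, {"id": 3, "text": "fine"}]
has_text = lambda a: bool(a["text"])
first_run = filter_sync(data, has_text, seen_ids)
second_run = filter_sync(data + [{"id": 4, "text": "new"}], has_text, seen_ids)
```

On the second run only the artifact with `id` 4 is processed, because the first three are already recorded in `seen_ids`. An analysis task that iterates over `filter_async` directly can start working on the first filtered artifacts while the rest are still being filtered.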

Page 10: An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis

Dataset Reduction and Content Cleaning Filters

• Dataset Reduction Filter (DRF)
  – Reduces the number of artifacts
  – Selects artifacts that fulfill certain criteria
  – Example: spam detection (artifact classification based on the Bayes decision rule)
• Content Cleaning Filter (CCF)
  – Modifies the content of artifacts
  – Example: quotation filter (detection of predefined patterns in content)
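
Both filter kinds can be sketched in a few lines. The spam filter below is a plain multinomial naive Bayes classifier with Laplace smoothing, used here as one common way to apply the Bayes decision rule; the slides do not specify the features or training data, so the class name, the tokenizer and the tiny training sets are assumptions. The quotation filter assumes the usual mail conventions of `>`-prefixed lines and an "On ... wrote:" header as its predefined patterns.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSpamFilter:
    """Dataset reduction: keep only artifacts classified as non-spam."""

    def __init__(self, spam_texts: List[str], ham_texts: List[str]) -> None:
        # Both training lists are assumed to be non-empty.
        self.counts = {"spam": Counter(), "ham": Counter()}
        for text in spam_texts:
            self.counts["spam"].update(tokenize(text))
        for text in ham_texts:
            self.counts["ham"].update(tokenize(text))
        total = len(spam_texts) + len(ham_texts)
        self.log_prior = {"spam": math.log(len(spam_texts) / total),
                          "ham": math.log(len(ham_texts) / total)}
        self.vocab = set(self.counts["spam"]) | set(self.counts["ham"])

    def _log_posterior(self, label: str, tokens: List[str]) -> float:
        counts = self.counts[label]
        denom = sum(counts.values()) + len(self.vocab)  # Laplace smoothing
        return self.log_prior[label] + sum(
            math.log((counts[t] + 1) / denom) for t in tokens)

    def is_spam(self, text: str) -> bool:
        # Bayes decision rule: pick the class with the larger posterior.
        tokens = tokenize(text)
        return self._log_posterior("spam", tokens) > self._log_posterior("ham", tokens)

    def apply(self, artifacts: List[Dict]) -> List[Dict]:
        return [a for a in artifacts if not self.is_spam(a["text"])]


def strip_quotations(text: str) -> str:
    """Content cleaning: drop '>'-quoted lines and 'On ... wrote:' headers."""
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith(">")
            and not re.match(r"^On .* wrote:$", line.strip())]
    return "\n".join(kept).strip()


spam_filter = NaiveBayesSpamFilter(
    spam_texts=["cheap pills buy now", "buy cheap watches now"],
    ham_texts=["patch for the parser bug", "release of the new parser version"])
kept = spam_filter.apply([{"id": 1, "text": "buy cheap pills"},
                          {"id": 2, "text": "parser bug patch"}])
cleaned = strip_quotations("Thanks!\nOn Mon, Bob wrote:\n> old text\nSee you")
```

The reduction filter changes which artifacts remain, while the cleaning filter leaves the set untouched and only rewrites the text of each artifact.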

Page 11: An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis

Artifact Transformation Filters

• Filter as analysis task
• Modifies artifact attributes
• Example:
  – Core-Periphery Filter: separates the core of the community from the periphery
  – Hierarchical clustering based on power law distribution
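
The slides do not give the details of the clustering, so the following is only a simplified stand-in rather than the authors' method. It assumes that activity is measured as messages per author and cuts the ranked authors into two groups at the largest gap between consecutive log-counts, which is what a two-cluster single-linkage clustering does on one-dimensional data. The log scale is used because activity in such communities is heavy-tailed. The function name `core_periphery` and the `role` attribute are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def core_periphery(authors: Iterable[str]) -> Dict[str, str]:
    """Label each author 'core' or 'periphery' from message counts."""
    ranked = sorted(Counter(authors).items(), key=lambda kv: kv[1], reverse=True)
    logs = [math.log(count) for _, count in ranked]
    gaps = [logs[i] - logs[i + 1] for i in range(len(logs) - 1)]
    if not gaps or max(gaps) == 0:
        # Nothing to separate: a single author or equal activity everywhere.
        return {name: "core" for name, _ in ranked}
    # Cut after the position with the largest gap between neighbours.
    cut = max(range(len(gaps)), key=gaps.__getitem__) + 1
    return {name: ("core" if i < cut else "periphery")
            for i, (name, _) in enumerate(ranked)}


def annotate(artifacts: List[Dict]) -> List[Dict]:
    """Transformation filter: add a 'role' attribute to every artifact."""
    roles = core_periphery(a["author"] for a in artifacts)
    return [{**a, "role": roles[a["author"]]} for a in artifacts]


mails = ([{"author": "a"}] * 50 + [{"author": "b"}] * 40
         + [{"author": "c"}] * 3 + [{"author": "d"}] * 2 + [{"author": "e"}])
roles = core_periphery(m["author"] for m in mails)
annotated = annotate(mails)
```

Unlike the reduction and cleaning filters, this filter neither drops artifacts nor changes their text; it adds a derived attribute that later analysis tasks can use.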

Page 12: An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis

Validation in BioJava, Biopython and BioPerl OSS: Spam Detection

Spam and spammer level in mailing lists of OSS
• Significant amount (up to 60%)
• Non-monotonic
• Distortion of dynamics

[Figure: BioJava]

Page 13: An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis

Validation in BioJava, Biopython and BioPerl OSS: Results Distortion

Mood within project community
• Summarized sentiment of project mails per month
• Positive sentiment of spam advertisement
• Incorrect sentiment assignment due to quotation

[Figure: Year 2004, BioJava]

Page 14: An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis

Adaptive Filter-Framework and OSS Analysis

• OSS Analysis for SE
  – Methods/metrics for knowledge mining in company communication and development repositories
  – Understanding of community-oriented development: principles, obstacles and advantages
• Data Cleaning: Results are only as good as the data!
• Adaptive Filter-Framework
  – Significant noise level in data
  – Adaptable for any Web artifact format
  – Filter nesting
  – Filter as analysis method