An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis
Lehrstuhl Informatik 5 (Information Systems)
Prof. Dr. M. Jarke 1
Layers
This slide deck is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis
Advanced Community Information Systems (ACIS) RWTH Aachen University, Germany
Anna Hannemann, Michael Hackstein, Ralf Klamma, Matthias Jarke
Open Source Software Projects
Community-driven development
Voluntary participation
Communication, project management and development via Web tools
Some successful and famous examples
Smaller niche projects
A long tail of unsuccessful projects
Open Source Software Analysis for Software Engineering
Understand, model, simulate and organize community-driven development
Agile development practices
Distributed and intercultural practices
New success factors
Long-term freely available datasets
Low-cost empirical studies
Open Source Software Analysis Research Results
Scacchi, “The Future of Research in Free/Open Source Software Development”, 2010
Techniques for Knowledge Mining in Development Repositories
Results are only as good as the data! Remember the DNA Phantom?
“A hypothesized unknown female serial killer as a result of contaminated cotton swabs used for collecting DNA”
Mine data, not noise!
Cleaning of artifacts from communication and development repositories is needed
Data Cleaning for Knowledge Mining in Development Repositories
Data-structure independence: variable artifact types
Additive filtering: filter only new data
Filter nesting: sequence of arbitrary order
Consistent data format: cross-medium analysis
Consistent and easy-to-use interface
Extensibility: continuous evolution
Adaptive database insertion
Adaptive-Filtering Approach: Cross-Media Mapping
Artifact types
– Mail
– Comment
– Post
– ...
Cross-media mapping
– Assignment of semantic meaning to artifact elements
– Extensibility to new data sources
– Same filters for different data
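A minimal sketch of how such a cross-media mapping could look, assuming that each medium delivers raw records as dictionaries. The `Artifact` class, the `MAPPINGS` table and all field names are hypothetical and chosen for illustration only; the slides do not specify the actual data model.

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    """Common format shared by all filters (field names are assumptions)."""
    source: str
    author: str
    title: str
    body: str


# Hypothetical mapping tables: semantic field -> key in the raw record.
# Supporting a new data source means adding one more entry here.
MAPPINGS = {
    "mail": {"author": "From", "title": "Subject", "body": "Body"},
    "comment": {"author": "user", "title": "issue", "body": "text"},
    "post": {"author": "poster", "title": "thread", "body": "message"},
}


def to_artifact(kind: str, raw: dict) -> Artifact:
    """Assign semantic meaning to the elements of a raw record."""
    mapping = MAPPINGS[kind]
    fields = {field: raw[key] for field, key in mapping.items()}
    return Artifact(source=kind, **fields)


def word_count(artifact: Artifact) -> int:
    """Example of one analysis step that works on every medium."""
    return len(artifact.body.split())


mail = to_artifact("mail", {
    "From": "alice@example.org",
    "Subject": "Build fails",
    "Body": "The build fails on my machine",
})
post = to_artifact("post", {
    "poster": "bob",
    "thread": "Build fails",
    "message": "Same here",
})
```

Because `word_count` only reads the common `body` field, the same function can be applied to the mail and to the forum post without knowing where either came from.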
Adaptive-Filtering Approach: Filter Nesting
Sequence of filters F1, F2, …, FN
Results in same predefined format
One filter – one cleaning (analysis) task
Each filter triggers its predecessor
Complex filter as a combination of several filters
Filtering triggered on demand
Filtering of a subset possible
Simple filters first, then analysis of the reduced data set with more filters of higher complexity
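The nesting idea can be sketched as a chain in which every filter holds a reference to its predecessor and triggers it before doing its own work. This is an illustrative sketch only: the `Filter` class, the two example tasks and the dictionary-based artifact format are assumptions, not the framework's actual interface.

```python
from typing import Callable, List, Optional

Artifact = dict  # stand-in for the common predefined format (assumption)
Task = Callable[[List[Artifact]], List[Artifact]]


class Filter:
    """One filter wraps one cleaning task and an optional predecessor."""

    def __init__(self, task: Task, predecessor: Optional["Filter"] = None):
        self.task = task
        self.predecessor = predecessor

    def run(self, artifacts: List[Artifact]) -> List[Artifact]:
        # Nothing happens until run() is called (filtering on demand).
        # The predecessor is triggered first, so F_N sees the output of F_N-1.
        if self.predecessor is not None:
            artifacts = self.predecessor.run(artifacts)
        return self.task(list(artifacts))


def drop_empty(artifacts: List[Artifact]) -> List[Artifact]:
    """Simple, cheap filter: discard artifacts without content."""
    return [a for a in artifacts if a["body"].strip()]


def lowercase_body(artifacts: List[Artifact]) -> List[Artifact]:
    """Second filter: works only on the already reduced data set."""
    return [{**a, "body": a["body"].lower()} for a in artifacts]


f1 = Filter(drop_empty)
f2 = Filter(lowercase_body, predecessor=f1)  # complex filter = F1 then F2

data = [{"body": "Hello World"}, {"body": "   "}]
result = f2.run(data)
```

Calling `f2.run(data)` on any subset of the data runs the whole chain for that subset, and every stage returns the same list-of-artifacts format, so the filters can be recombined in a different order.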
Adaptive-Filtering Approach: Multi-Threading
Only new data is filtered
Asynchronous processing: filtered data subset is provided directly to the next analysis task
Synchronous processing: wait until the complete data set is filtered
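The difference between the two processing modes can be illustrated with Python's standard `concurrent.futures` module. This is a sketch under assumptions: the `IncrementalFilter` class, the `clean` task and the `id` field used to recognise already filtered artifacts are hypothetical and not taken from the framework.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def clean(artifact: dict) -> dict:
    """Placeholder cleaning task (assumption): trim whitespace."""
    return {**artifact, "body": artifact["body"].strip()}


class IncrementalFilter:
    def __init__(self):
        self.seen_ids = set()  # ids of artifacts that were already filtered

    def _new_only(self, artifacts):
        """Additive filtering: keep only artifacts not processed before."""
        fresh = [a for a in artifacts if a["id"] not in self.seen_ids]
        self.seen_ids.update(a["id"] for a in fresh)
        return fresh

    def filter_async(self, artifacts):
        """Yield each cleaned artifact as soon as its worker finishes,
        so the next analysis task can start on a partial result."""
        fresh = self._new_only(artifacts)
        with ThreadPoolExecutor(max_workers=4) as pool:
            futures = [pool.submit(clean, a) for a in fresh]
            for future in as_completed(futures):
                yield future.result()

    def filter_sync(self, artifacts):
        """Return only after the complete new data set is filtered."""
        fresh = self._new_only(artifacts)
        with ThreadPoolExecutor(max_workers=4) as pool:
            return list(pool.map(clean, fresh))


flt = IncrementalFilter()
batch1 = [{"id": 1, "body": " a "}, {"id": 2, "body": "b "}]
first = flt.filter_sync(batch1)

# The second batch repeats the old artifacts and adds one new one.
batch2 = batch1 + [{"id": 3, "body": " c"}]
second = list(flt.filter_async(batch2))
```

In the asynchronous mode the order of results depends on which worker finishes first, so a consumer should not rely on the input order being preserved.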
Dataset Reduction and Content Cleaning Filters
Dataset Reduction Filter (DRF)
– Reduces the number of artifacts
– Selects artifacts that fulfill certain criteria
– Example: spam detection
– Artifact classification based on the Bayes decision rule
Content Cleaning Filter (CRF)
– Modifies the content of artifacts
– Example: quotation filter
– Detection of predefined patterns in content
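Both filter kinds can be sketched in a few lines. The example below is a simplified illustration, not the framework's implementation: the quotation patterns (lines starting with `>` and "On ... wrote:" attribution lines), the class and function names, and the word-based naive Bayes model with add-one smoothing are all assumptions. The classifier applies the Bayes decision rule by choosing the class with the larger posterior, and it needs at least one training example per class.

```python
import math
import re
from collections import Counter

# Content cleaning: predefined patterns for quoted text (assumed patterns).
QUOTE_LINE = re.compile(r"^\s*>")
ATTRIBUTION = re.compile(r"^On .+ wrote:\s*$")


def strip_quotations(body: str) -> str:
    """Remove quoted lines and attribution lines from a mail body."""
    kept = [line for line in body.splitlines()
            if not QUOTE_LINE.match(line) and not ATTRIBUTION.match(line)]
    return "\n".join(kept).strip()


class BayesSpamFilter:
    """Dataset reduction: drop artifacts classified as spam."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, body: str, label: str) -> None:
        self.doc_counts[label] += 1
        self.word_counts[label].update(body.lower().split())

    def _log_posterior(self, words, label: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total = sum(self.word_counts[label].values())
        # log prior + sum of log likelihoods with add-one smoothing
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in words:
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total + len(vocab)))
        return score

    def is_spam(self, body: str) -> bool:
        words = body.lower().split()
        # Bayes decision rule: pick the class with the larger posterior.
        return self._log_posterior(words, "spam") > self._log_posterior(words, "ham")

    def reduce(self, bodies):
        return [b for b in bodies if not self.is_spam(b)]


spam_filter = BayesSpamFilter()
spam_filter.train("buy cheap pills now", "spam")
spam_filter.train("cheap pills online", "spam")
spam_filter.train("patch for the parser bug", "ham")
spam_filter.train("release notes for the parser", "ham")

kept = spam_filter.reduce(["cheap pills", "parser bug"])
reply = "I agree.\nOn Mon, Bob wrote:\n> It is broken\n> badly\nThanks"
cleaned = strip_quotations(reply)
```

The reduction filter changes how many artifacts remain, while the cleaning filter leaves the number of artifacts unchanged and only rewrites their content.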
Artifact Transformation Filters
Filter as analysis task
Modifies artifact attributes
Example:
– Core-Periphery Filter: separates the core of the community from the periphery
– Hierarchical clustering based on power-law distribution
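The slides do not give the details of the clustering, so the following is only a simplified stand-in, not the authors' method. It assumes that activity is measured as messages per author, takes the logarithm of the counts (a common choice for heavy-tailed, power-law-like data) and splits the ranked authors at the largest gap. For one-dimensional data this is the same partition that single-linkage hierarchical clustering produces when cut into two clusters. The function name and the input format are hypothetical.

```python
import math
from collections import Counter


def core_periphery(message_authors):
    """Split authors into (core, periphery) by message activity.

    message_authors: one author name per message (assumed input format).
    """
    counts = Counter(message_authors)
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) < 2:
        return [author for author, _ in ranked], []

    # Work on log counts so that the heavy tail does not dominate the gaps.
    logs = [math.log(count) for _, count in ranked]
    gaps = [logs[i] - logs[i + 1] for i in range(len(logs) - 1)]

    # Cut at the largest gap between two neighbours in the ranking.
    split = gaps.index(max(gaps)) + 1
    core = [author for author, _ in ranked[:split]]
    periphery = [author for author, _ in ranked[split:]]
    return core, periphery


messages = ["ann"] * 40 + ["bob"] * 30 + ["cy"] * 3 + ["dee"] * 2 + ["eve"]
core, periphery = core_periphery(messages)
```

In line with the slide, the result could be written back as a new attribute of each artifact (for example a core or periphery label on the author), so later filters and analyses can use it.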
Validation in BioJava, Biopython and BioPerl OSS: Spam Detection
Spam and spammer level in mailing lists of OSS
– Significant amount (up to 60%)
– Non-monotonic
– Distortion of dynamics
BioJava
Validation in BioJava, Biopython and BioPerl OSS: Results Distortion
Mood within project community
Summarized sentiment of project
Mails per month
Positive sentiment of spam advertisement
Incorrect sentiment assignment due to quotation
Year 2004, BioJava
Adaptive Filter-Framework and OSS Analysis
OSS Analysis for SE
– Methods/metrics for knowledge mining in company communication and development repositories
– Understanding of community-oriented development: principles, obstacles and advantages
Data Cleaning: Results are only as good as the data!
Adaptive Filter-Framework
– Significant noise level in data
– Adaptable for any Web artifact format
– Filter nesting
– Filter as analysis method