Research Article

Understanding Open Source Software Evolution Using Fuzzy Data Mining Algorithm for Time Series Data

Munish Saini,1 Sandeep Mehmi,2 and Kuljit Kaur Chahal1

1 Department of Computer Science, Guru Nanak Dev University, Amritsar, India
2 Department of Computer Science, I.K.G. Punjab Technical University, Jalandhar, Punjab, India

Correspondence should be addressed to Munish Saini; munish_1_saini@yahoo.co.in

Received 6 June 2016; Accepted 20 July 2016

Academic Editor: Gözde Ulutagay

Copyright © 2016 Munish Saini et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Source code management systems (such as Concurrent Versions System (CVS), Subversion, and git) record changes to code repositories of open source software projects. This study explores a fuzzy data mining algorithm for time series data to generate association rules for evaluating the existing trend and regularity in the evolution of an open source software project. A fuzzy data mining algorithm for time series data is chosen because of the stochastic nature of the open source software development process. The commit activity of an open source project indicates the activeness of its development community, and an active development community is a strong contributor to the success of an open source project. Therefore, commit activity analysis, along with trend and regularity analysis of the commit activity of an open source software project, acts as an important indicator to project managers and analysts regarding the evolutionary prospects of the project.

1. Introduction

Understanding software evolution in general, and open source software (OSS) evolution in particular, has been of wide interest in the recent past. A wide range of research studies have analysed OSS project evolution from different points of view, such as growth [1], quality [2], and group dynamics [3]. However, there are very few studies on commit activity in OSS projects. A commit is a change to a source code entity submitted by a developer through a source code management (SCM) system. SCM systems, such as Subversion (SVN) and git [4], manage the source code files of OSS systems and maintain a log of each change (a.k.a. commit) made to the files. Committing is an important activity of the OSS development approach. As most OSS developers are volunteers, the success of these OSS projects is mainly determined by the committing activities of developers [5]. Commit activity indicates project activity, which is further related to project success [6, 7]. Stakeholders of an OSS project, such as project managers, developers, and users, are interested in its future change behavior. Analysing the commit activity of an OSS project to find trends and regularities in its evolution helps in indicating the future change behavior of the project and supports decision making as far as project usage and management are concerned.

OSS development is a stochastic process. Unlike traditional development, in which the environment is controlled, OSS development is based on contributions from volunteers who cannot be forced to work even if something is of high priority for the project [1]. Along with this unplanned activity, there is a lack of planned documentation related to requirements and detailed design [8]. Classical time series techniques are inappropriate for analysis and forecasting of data which involve random variables [8, 9]. Fuzzy time series can work for domains which involve uncertainty.

Commit activity of an OSS project is measured with the number of commits per month metric [5]. Kemerer and Slaughter [9] and Mockus and Votta [10] emphasize that the commits available in an SCM system (such as git [4]) can be used as a metric to study the evolution of OSS systems. These studies motivate us to choose the number of commits per month as a metric to analyse and predict software evolution.

In this research work, the hybrid approach proposed by Chen et al. [11] (fuzzy theory along with a data mining algorithm) is used to generate linguistic rules from the time series data. In [11], Chen et al. specified finding applications for their algorithm and validating it as future work. The present study takes up both of these issues as one of its objectives. In this work, we divide the time series dataset into two subsets, that is, a training set and a remaining set. We first use the algorithm to generate association rules from the training set and then validate the accuracy and prediction capability of these rules on the remaining set. High prediction accuracy indicates that the commits in the remaining set show regularity and trend in the number of commits performed for Eclipse CDT. Low prediction accuracy indicates that the commits in the remaining set are not consistent with those of the training set and that the commit activity is irregular and lacks a trend.

Hindawi Publishing Corporation, Advances in Fuzzy Systems, Volume 2016, Article ID 1479692, 13 pages, http://dx.doi.org/10.1155/2016/1479692

The main objective of the present study is to explore the fuzzy data mining approach for time series, which is used to generate association rules for evaluating the existing trend and regularity in the evolution of OSS projects. As commit activity is a good indicator of the continuous development activity of an OSS project, another objective of the present work is to develop a commit prediction model for OSS systems. Project managers, developers, and users can use the commit prediction model to understand the future commit activity of an OSS project and then plan, schedule, and allocate resources according to their role.

The rest of the paper is organized as follows. Section 2 presents the related work on software evolution and on the fuzzy data mining technique used. Section 3 explains the research methodology. Section 4 presents details of the experimental setup used for performing the study. Section 5 gives the results and analysis. Threats to the validity of the results are explained in Section 6. The last section concludes the paper.

2. Related Work

The idea of analysing and predicting software evolution was first seen in the late 1980s, when Yuen published his papers on the subject in a series of conferences on software maintenance [12–14]. He used time series analysis as a technique for software evolution prediction. Later, several studies used time series analysis for predicting software evolution. The software evolution metrics considered for prediction include the monthly number of changes [8, 9], change requests [15, 16], size and complexity [17, 18], defects [19, 20], clones [21], and maintenance effort [22].

Kemerer and Slaughter [9] looked at the evolution of two proprietary systems using two approaches: one based on time series analysis (ARIMA) and the other based on a technique called sequence analysis. They found ARIMA models inappropriate for the analysis, as the dataset was largely random in nature. Antoniol et al. [21] presented an approach for monitoring and predicting the evolution of software clones across subsequent versions of a software system (mSQL) using time series analysis (ARIMA).

Caprio et al. [17] used time series analysis to estimate the size and complexity of the Linux Kernel. They used an ARIMA model to predict the evolution of the Linux Kernel by using a dataset related to 68 stable releases of the software system.

Herraiz et al. [8] applied a stationary model based on time series analysis to the monthly number of changes in the CVS repository of Eclipse. Their model predicted the number of changes per month for the next three months. They employed kernel smoothing to reduce noise, a lesson learned from the study of Kemerer and Slaughter [9], who could not get good results from ARIMA modelling for predicting the number of changes because they ignored the noise present in the data.

Kenmei et al. [16] applied ARIMA to model and forecast change requests per unit of size of large open source projects. Data from three large open source projects (Mozilla, JBoss, and Eclipse) confirm the capability of the approach to effectively perform prediction and identify trends. They report evidence that ARIMA models almost always outperform the predictive accuracy of simple models such as linear regression or random walk [16]. However, the benchmark models selected by Kenmei et al. [16] for evaluating the prediction accuracy of the ARIMA model are not rigorous.

Raja et al. [20] did a time series analysis of defect reports of eight open source software projects over a period of five years and found the ARIMA (0, 1, 1) model to be useful for defect prediction. Kläs et al. [19] combined time series analysis with expert opinion to create prediction models for defects. They suggest that a hybrid model is more powerful than data models in the early phases of a project's life cycle.

Goulão et al. [15] used a time series technique for long-term prediction of the overall number of change requests. They investigated the suitability of the ARIMA model to predict the long-term fluctuation of all change requests for a project having seasonal patterns, such as Eclipse. They found that their ARIMA model is statistically more significant and outperforms the nonseasonal models.

Amin et al. [23] used the ARIMA model in place of the software reliability growth model (SRGM) to predict software reliability. SRGM makes restrictive assumptions on the environment of the software under analysis. They reported that ARIMA modelling performs far better than the SRGM approach, as ARIMA is data oriented and covers the limitations of the previous approaches.

A perusal of the existing research in this area shows ARIMA modelling as the most frequently used prediction procedure. However, OSS development is a stochastic process. Unlike traditional development, in which the environment is controlled, OSS development is based on contributions from volunteers who cannot be forced to work even if something is of high priority for the project [1]. Along with this unplanned activity, there is a lack of planned documentation related to requirements and detailed design [8].

Open source projects without any tight organizational support face many uncertainties. Uncertainty lies in an uncontrolled development environment, such as the availability of contributors at any point of time. Due to this uncertainty, there is a large fluctuation in consecutive values (as observed in the monthly commit data of Eclipse CDT in Figure 1). Most of the classical time series techniques are inappropriate for analysis and forecasting of data which involve uncertainty [8, 9]. Research literature also indicates that ARIMA modelling can be useful when there is uncertainty in the data, but only after applying smoothing to reduce noise [24].

Table 1: Descriptive statistics of the open source software projects.

| Software | Origin date | Data collection date | Number of years | Number of authors | Total commits analysed |
|---|---|---|---|---|---|
| Eclipse CDT | 6/27/2002 | 9/23/2013 | 11 | 135 | 26246 |

Figure 1: Eclipse CDT commit activity (number of commits per month, 0 to 500 on the vertical axis, plotted against year and month from 2002 to 2012 on the horizontal axis).

Hong et al. [25] introduced the fuzzy data mining algorithm for quantitative values. The algorithm extracts useful knowledge from a transactional database having quantitative values. It combines the fuzzy set concept with the Apriori algorithm.

Chen et al. [11] extended the work of Hong et al. and analysed the fuzzy data mining algorithm on time series data. They proposed an approach in which the concepts of fuzzy sets are used along with the Apriori data mining algorithm to generate linguistic association rules. They use a fuzzy membership function to convert the time series data to fuzzy sets and then apply the Apriori algorithm to generate association rules accordingly. They specified the validation of their algorithm and the finding of applications for it as future work. In continuation of their work, we use the algorithm to analyse the commit activity of an open source software project and to find existing regularity and trend in that activity.

Suresh and Raimond [26] extended the work of Chen et al. [11]. They proposed a new algorithm, called the extended fuzzy frequent pattern algorithm, for analysing time series data. The association rules are generated without generating the candidate sets.

This paper uses a fuzzy data mining approach on time series [11] to analyse the commit activity in open source software projects. The objectives that this study pursues are (1) analysing the commit activity of an OSS project to find trend and regularity and (2) validating the fuzzy data mining algorithm on OSS project data.

3. Methodology

The objective of this empirical study is to investigate the suitability of the fuzzy data mining method for analysing the number of commits per month as a software system evolves, in order to find regularity and trend. This section describes the data collection process, the basic concepts of fuzzy time series, and the fuzzy data mining method.

3.1. Data Collection. The development repository of the open source software project (Eclipse CDT) is obtained from GitHub [27]. The repository is downloaded by cloning the original repository onto the local machine using Git Bash [4]. A script written in Java fetches the number of commits per month for the observation period. The descriptive statistics of the development repository are shown in Table 1.
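The monthly aggregation described above can be sketched as follows. This is a minimal Python illustration, not the authors' Java script; the function name `commits_per_month` and the sample dates are hypothetical, and the input is assumed to be one `YYYY-MM-DD` date per commit (for example, as printed by `git log --date=short --pretty=format:%ad`).

```python
from collections import Counter
from datetime import datetime


def commits_per_month(commit_dates):
    """Count commits per (year, month) from YYYY-MM-DD date strings."""
    counts = Counter()
    for text in commit_dates:
        stamp = datetime.strptime(text[:10], "%Y-%m-%d")
        counts[(stamp.year, stamp.month)] += 1
    # Sort chronologically so the result reads as a time series.
    return dict(sorted(counts.items()))


# Hypothetical sample dates, one per commit.
sample = ["2002-06-27", "2002-06-30", "2002-07-01", "2002-07-15", "2002-07-16"]
monthly = commits_per_month(sample)
print(monthly)
```

Months without any commit do not appear in the result of this sketch; a complete time series would need those months filled in with zero.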

Eclipse [28] is an integrated development environment. Eclipse has a base workspace and an extensible plug-in system for customizing the development environment. It is used to develop applications in different languages, such as Java, C/C++, COBOL, and PHP, by using the available plug-ins. Two variations, Eclipse SDK and Eclipse CDT, are well known for developing applications. Eclipse SDK is compatible with Java and is used by Java developers for building projects on the Eclipse platform. Eclipse CDT provides C/C++ development tooling [29] and allows developing applications in C/C++ using Eclipse. Eclipse CDT provides various features [30], such as a full-featured editor, debugging, refactoring, a parser, and indexes. In this study, we consider Eclipse CDT only. The number of commits for each month from 6/27/2002 to 9/23/2013 is arranged month-wise to form a time series (of 136 months), shown in Figure 1.

3.2. Basics of Fuzzy Set Theory and Fuzzy Time Series. The concept of a fuzzy set was introduced by Zadeh [31] in 1965 as an extension of classical set theory. A fuzzy set is characterized by a membership function that assigns each element a degree of membership [31]. The membership function can take various forms, such as the triangular function, $L$-function, $R$-function, and trapezoidal function, depending upon the application and requirement. The present study uses both $L$ and $R$ membership functions to define the values of the fuzzy variable.

3.3. Apriori Algorithm. The Apriori algorithm [32, 33] is one of the classical algorithms, proposed by R. Srikant and R. Agrawal in 1994, for finding frequent patterns for generating association rules. Apriori employs an iterative approach known as level-wise search, where $k$-itemsets are used to explore $(k + 1)$-itemsets.

The Apriori algorithm is executed in two steps. Firstly, it retrieves all the frequent itemsets whose support is not smaller than the minimum support (min_sup). This first step further consists of join and pruning actions. In joining, the candidate set $C_k$ is produced by joining $L_{k-1}$ with itself. In pruning, the candidate sets are pruned by applying the Apriori property, that is, all the nonempty subsets of a frequent itemset must also be frequent.

The pseudocode for generation of frequent itemsets is as follows:

    C_k: candidate itemsets of size k
    L_k: frequent itemsets of size k
    L_1 = {frequent 1-itemsets}
    For (k = 1; L_k ≠ ∅; k++)
        C_{k+1} = candidates generated by joining L_k with L_k
        L_{k+1} = candidates in C_{k+1} with support greater than or equal to min_sup
    End
    Return ∪_k L_k
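The level-wise search above can be sketched in Python as follows. This is a minimal illustration of the classical (crisp) Apriori frequent itemset step under simple assumptions: transactions are sets of hashable items, `min_sup` is an absolute count, and the function name and sample baskets are hypothetical.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_sup):
    """Return {itemset: support_count} for all itemsets with count >= min_sup."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(level)
        # Join step: unite pairs of frequent k-itemsets into (k+1)-candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        current = {c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k))}
        k += 1
    return frequent


# Hypothetical transactions for illustration only.
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori_frequent_itemsets(baskets, min_sup=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The sketch returns the frequent itemsets of every level together with their support counts, since the rule generation step needs the counts of all subsets.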

Next, it uses the frequent itemsets to generate the strong association rules satisfying the minimum confidence (min_conf) threshold. The pseudocode for generation of strong association rules is as follows:

    Input: frequent itemsets L; minimum confidence threshold min_conf
    Output: strong association rules R
    R = ∅
    for each frequent itemset I in L
        for each nonempty proper subset S of I
            conf(S → I − S) = support_count(I) / support_count(S)
            if conf >= min_conf
                generate strong association rule r = "S → (I − S)"
                R = R ∪ {r}
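The rule generation step can be sketched in Python as follows. This is a minimal illustration under the assumption that `support_count` maps every frequent itemset, including all of its subsets, to a support count (as the Apriori step produces); the function name and the sample counts are hypothetical.

```python
from itertools import combinations


def strong_rules(support_count, min_conf):
    """Return (antecedent, consequent, confidence) for rules with conf >= min_conf."""
    rules = []
    for itemset, count in support_count.items():
        # Only nonempty proper subsets can serve as antecedents.
        for size in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), size):
                antecedent = frozenset(subset)
                conf = count / support_count[antecedent]
                if conf >= min_conf:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


# Hypothetical support counts for illustration only.
counts = {frozenset("a"): 4, frozenset("b"): 5, frozenset("ab"): 4}
rules = strong_rules(counts, min_conf=0.9)
for antecedent, consequent, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), round(conf, 3))
```

With these counts, the rule a → b has confidence 4/4 and is kept, whereas b → a has confidence 4/5 and falls below the threshold.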

3.4. Fuzzy Data Mining Algorithm for Time Series Data. Chen et al. [11] extended the work of Hong et al. [25] and proposed fuzzy data mining for time series data. The time series data of $K$ points are entered as input, along with the predefined minimum support $\lambda$, minimum confidence $\alpha$, and window size $w$.

The input data is first converted into $(K - w + 1)$ subsequences, each of which has $w$ elements. The fuzzy membership function is used to convert each data item into the equivalent fuzzy set. The Apriori algorithm is then used to mine frequent fuzzy sets. Moreover, a data reduction method is used to remove redundant data items.

The association rules are generated in the same way as in the Apriori algorithm. The stepwise process of the fuzzy Apriori algorithm is given below.

Input: time series with $K$ data points; membership function values $h$; minimum support $\lambda$; minimum confidence $\alpha$; sliding window size $w$; fuzzy set $f_p$.

Output: set of fuzzy association rules.

Step 1. Convert the time series data into $(K - w + 1)$ subsequences, where each subsequence has a maximum of $w$ elements. For example, if $w = 5$, then each subsequence has a maximum of 5 elements. The elements of a subsequence are referred to as data variables and are denoted by A1, A2, A3, A4, and A5, respectively.

Step 2. Apply the fuzzy membership function to the elements of the time series to generate the fuzzy set ($f_p$).

Step 3. Based on the membership function and its user-defined levels (say, low, middle, and high), each data variable, after conversion to fuzzy items, lies in different user-defined levels (such as A1.Low, A2.Middle, and A3.High, referred to as fuzzy items). Then calculate the scalar cardinality count of each fuzzy item over the subsequences.

Step 4. Compare the total scalar cardinality count of each fuzzy item with the minimum support value. The fuzzy items with a count greater than or equal to $\lambda$ are kept in $L_1$.

The support value of each fuzzy item is computed as

$$\text{Support value} = \frac{\text{Count}}{K - w + 1}. \tag{1}$$

Step 5. If $L_1$ = NULL, then exit; else do the following steps for $r = 1$ to $K$.

Step 6. Join $L_r$ with $L_r$ to generate the candidate $(r + 1)$ fuzzy itemsets (i.e., $C_{r+1}$). The join is similar to that of the Apriori algorithm, except that items generated from the same order of data point are not joined, and a join is possible only if $(r - 1)$ data items in both sets are the same.

(i) Calculate the fuzzy value of each candidate fuzzy itemset by using fuzzy set theory, that is, $\min(f_1 \wedge f_2 \wedge \cdots \wedge f_k)$.

(ii) Count the scalar cardinality of each fuzzy candidate itemset.

(iii) If the count is more than or equal to $\lambda$, then put the itemset in $L_{r+1}$ and calculate its support value as

$$\text{Support value} = \frac{\text{Count}}{K - w + 1}. \tag{2}$$

Step 7. If $L_{r+1}$ = NULL, then exit; otherwise go to Step 6 again.

Step 8. Remove redundant large itemsets, that is, shift each large itemset $(I_1, I_2, I_3, \ldots, I_q)$ into $(I'_1, I'_2, I'_3, \ldots, I'_q)$ such that fuzzy region $R_{11}$ becomes $R'_{11}$ when shifted.

Table 2: Fuzzy candidate itemsets in $C_2$.

| A1.Low ∩ A2.Low | A1.Low ∩ A2.Middle | A1.Low ∩ A3.Low | A1.Low ∩ A3.Middle | A1.Low ∩ A4.Low |
| A1.Low ∩ A4.Middle | A1.Low ∩ A5.Low | A1.Low ∩ A5.Middle | A1.Middle ∩ A2.Low | A1.Middle ∩ A2.Middle |
| A1.Middle ∩ A3.Low | A1.Middle ∩ A3.Middle | A1.Middle ∩ A4.Low | A1.Middle ∩ A4.Middle | A1.Middle ∩ A5.Low |
| A1.Middle ∩ A5.Middle | A2.Low ∩ A3.Low | A2.Low ∩ A3.Middle | A2.Low ∩ A4.Low | A2.Low ∩ A4.Middle |
| A2.Low ∩ A5.Low | A2.Low ∩ A5.Middle | A2.Middle ∩ A3.Low | A2.Middle ∩ A3.Middle | A2.Middle ∩ A4.Low |
| A2.Middle ∩ A4.Middle | A2.Middle ∩ A5.Low | A2.Middle ∩ A5.Middle | A3.Low ∩ A4.Low | A3.Low ∩ A4.Middle |
| A3.Low ∩ A5.Low | A3.Low ∩ A5.Middle | A3.Middle ∩ A4.Low | A3.Middle ∩ A4.Middle | A3.Middle ∩ A5.Low |
| A3.Middle ∩ A5.Middle | A4.Low ∩ A5.Low | A4.Low ∩ A5.Middle | A4.Middle ∩ A5.Low | A4.Middle ∩ A5.Middle |

Figure 2: Membership function (fuzzy regions Low, Middle, and High; membership degree from 0 to 1 on the vertical axis; breakpoints at 0, 100, 250, 300, and 450 commits on the horizontal axis).

Step 9. Generate the association rules using the Apriori rule generation method and calculate the confidence of each rule. The only variation is that, in place of normal itemsets, we have fuzzy itemsets, so the concept of intersection, not union, is used in the rules. If the confidence value is not less than the minimum confidence $\alpha$, then keep the rule; otherwise reject it.

4. Experimental Setup

The experiment is performed on an x86 Family 6 Model 15 Stepping 6 GenuineIntel ~2131 MHz processor with 1 GB RAM, the Microsoft Windows XP Professional operating system (version 5.1.2600, Service Pack 3, Build 2600), and a 240 GB hard disk. The fuzzy data mining algorithm for time series data is implemented in Java.

5. Results and Analysis

The dataset is divided into two subsets: a training dataset (120 months) and a remaining dataset (16 months). The complete process of generating the association rules using the fuzzy data mining algorithm is performed in two steps:

(A) The association rules are generated on the training dataset using the fuzzy data mining algorithm for time series data.

(B) The generated association rules are validated using the training and remaining datasets.

5.1. Generation of Association Rules from Training Dataset. The fuzzy data mining algorithm for time series [11] is applied to the training dataset to generate association rules. We assumed $w = 5$, $\lambda = 25\%$ (25/100 × number of subsequences), and $\alpha = 65\%$; the membership function and its values are shown in Figure 2. These assumptions are made by referring to the concepts of the fuzzy data mining algorithm given in [11]. The commits are divided according to the level of activity. Commits in the ranges 0–100, 250–300, and 450 onwards indicate low, middle (average), and high levels of activity, respectively. The level of commit activity indicates the amount of work done in a commit.
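A membership function of this kind can be sketched in Python as follows. The breakpoints 100, 250, 300, and 450 are taken from Figure 2 and the ranges above; the linear ramps between the breakpoints and the function name `membership` are assumptions of this sketch, since the exact shape of the curves is not stated in the text.

```python
def membership(commits):
    """Return membership degrees of a monthly commit count in Low, Middle, High.

    Assumption: degrees change linearly between the breakpoints 100-250 and 300-450.
    """
    x = float(commits)
    # Low: full membership up to 100, falling to 0 at 250 (L-shaped).
    if x <= 100:
        low = 1.0
    elif x < 250:
        low = (250 - x) / 150
    else:
        low = 0.0
    # Middle: rises over 100-250, flat over 250-300, falls over 300-450.
    if x <= 100 or x >= 450:
        middle = 0.0
    elif x < 250:
        middle = (x - 100) / 150
    elif x <= 300:
        middle = 1.0
    else:
        middle = (450 - x) / 150
    # High: rises over 300-450, full membership from 450 onwards (R-shaped).
    if x <= 300:
        high = 0.0
    elif x < 450:
        high = (x - 300) / 150
    else:
        high = 1.0
    return {"Low": low, "Middle": middle, "High": high}


for value in (50, 175, 275, 375, 500):
    print(value, membership(value))
```

Under these assumptions, a month with 175 commits belongs equally to the Low and Middle regions, which is the kind of partial membership that a crisp discretization would lose.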

(i) Generation of Subsequences. We have $K = 120$ (number of months) and $w = 5$ (window size). The number of subsequences is $(K - w + 1)$; therefore, number of subsequences = $(K - w + 1) = (120 - 5 + 1) = 116$.

All generated subsequences for $K = 120$ are shown in Table 8 of the Appendix.
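The sliding window step can be sketched in Python as follows. The function name `subsequences` is hypothetical, and the series used here is a placeholder standing in for the 120 monthly commit counts of the training dataset.

```python
def subsequences(series, w):
    """Return the K - w + 1 sliding windows of length w over the series."""
    return [series[i:i + w] for i in range(len(series) - w + 1)]


# Placeholder values standing in for 120 monthly commit counts.
series = list(range(120))
windows = subsequences(series, w=5)
print(len(windows))   # K - w + 1 windows
print(windows[0], windows[-1])
```

Position $j$ within each window corresponds to the data variable A$j$, so every window contributes one value to each of A1 to A5.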

(ii) Transformation of Data to Fuzzy Sets. In this step, we transform the commit activity data into fuzzy sets (using the membership function) and count the scalar cardinality of each data variable (shown in Table 9 of the Appendix).

All these data variables are considered as candidate itemsets ($C_1$). For generation of $L_1$, data variables having a count of more than $\lambda = 25\%$, that is, $(25/100 \times 116) = 29$, are considered. It is found that $L_1$ contains A1.Low, A1.Middle, A2.Low, A2.Middle, A3.Low, A3.Middle, A4.Low, A4.Middle, A5.Low, and A5.Middle.

Further, calculate the support value of each candidate itemset using

$$\text{Support value} = \frac{\text{Count}}{K - w + 1}. \tag{3}$$
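The threshold and support computation can be illustrated with a short Python sketch. The values $K = 120$, $w = 5$, and $\lambda = 25\%$ come from the text; the list of membership degrees is hypothetical and only stands in for the degrees of one fuzzy item (say, A1.Low) over the 116 subsequences.

```python
def scalar_cardinality(degrees):
    """Scalar cardinality of a fuzzy item: the sum of its membership degrees."""
    return sum(degrees)


K, w = 120, 5
n_subsequences = K - w + 1              # 116 subsequences
min_count = 25 / 100 * n_subsequences   # lambda = 25% expressed as a count

# Hypothetical membership degrees of one fuzzy item over the subsequences.
degrees = [1.0] * 30 + [0.5] * 10 + [0.0] * 76
count = scalar_cardinality(degrees)
support = count / n_subsequences        # support value as in (3)
is_large = count >= min_count           # comparison as stated in Step 4

print(n_subsequences, min_count, count, round(support, 3), is_large)
```

Expressing $\lambda$ as a count (29) rather than as a fraction keeps the comparison in the same units as the scalar cardinality.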

(iii) Generation of $C_2$ and $L_2$. For generating $C_2$ (shown in Table 2), join $L_1$ with $L_1$, without joining the items generated from the same order of data points; that is, joining A1.Low with A1.Middle is not allowed, and similarly for all others. Otherwise, all the properties of the Apriori join are used.

Next, use the $\min(f_1 \cap f_2)$ function to find the value of each of the candidate fuzzy sets. Count the scalar cardinality of each of the candidate sets in $C_2$.
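The join and the minimum operator can be sketched in Python as follows. Fuzzy items are represented here as (data variable, region) pairs, which is a representation chosen for this sketch; the function names and the two fuzzified subsequences are hypothetical.

```python
from itertools import combinations


def join_candidates(items):
    """Pair fuzzy items (variable, region), skipping pairs on the same data variable."""
    return [(a, b) for a, b in combinations(sorted(items), 2) if a[0] != b[0]]


def pair_count(rows, a, b):
    """Scalar cardinality of a ∩ b: sum over subsequences of the smaller degree."""
    return sum(min(row[a], row[b]) for row in rows)


# L1 as reported in the text: Low and Middle for each of A1 to A5.
l1 = [(f"A{i}", region) for i in range(1, 6) for region in ("Low", "Middle")]
c2 = join_candidates(l1)
print(len(c2))  # number of candidate 2-itemsets

# Two hypothetical fuzzified subsequences (degrees per fuzzy item).
rows = [
    {("A1", "Low"): 0.8, ("A2", "Low"): 0.6},
    {("A1", "Low"): 0.3, ("A2", "Low"): 1.0},
]
count = pair_count(rows, ("A1", "Low"), ("A2", "Low"))  # 0.6 + 0.3
print(count)
```

With the ten items of $L_1$, the join yields the 40 candidates listed in Table 2: 45 unordered pairs minus the 5 pairs that combine two regions of the same data variable.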


Table 3: Fuzzy itemsets in $L_2$.

| A1.Low ∩ A2.Low | A1.Low ∩ A3.Low | A1.Low ∩ A4.Middle | A1.Middle ∩ A2.Middle | A1.Middle ∩ A3.Middle |
| A1.Middle ∩ A4.Middle | A1.Middle ∩ A5.Low | A1.Middle ∩ A5.Middle | A2.Low ∩ A3.Low | A2.Low ∩ A4.Low |
| A2.Middle ∩ A3.Middle | A2.Middle ∩ A4.Middle | A2.Middle ∩ A5.Middle | A3.Low ∩ A4.Low | A3.Low ∩ A5.Low |
| A3.Middle ∩ A4.Middle | A3.Middle ∩ A5.Middle | A4.Low ∩ A5.Low | A4.Middle ∩ A5.Middle | |

Table 4: Fuzzy itemsets in $L_3$.

| A1.Middle ∩ A2.Middle ∩ A3.Middle | A1.Middle ∩ A2.Middle ∩ A4.Middle | A1.Middle ∩ A2.Middle ∩ A5.Middle |
| A1.Middle ∩ A3.Middle ∩ A4.Middle | A1.Middle ∩ A3.Middle ∩ A5.Middle | A1.Middle ∩ A4.Middle ∩ A5.Middle |
| A2.Middle ∩ A3.Middle ∩ A4.Middle | A2.Middle ∩ A3.Middle ∩ A5.Middle | A2.Middle ∩ A4.Middle ∩ A5.Middle |
| A3.Middle ∩ A4.Middle ∩ A5.Middle | | |

Table 5: Largest fuzzy itemsets generated.

| A1.Middle ∩ A2.Middle ∩ A3.Middle | A1.Middle ∩ A2.Middle ∩ A4.Middle | A1.Middle ∩ A2.Middle ∩ A5.Middle |
| A1.Middle ∩ A3.Middle ∩ A4.Middle | A1.Middle ∩ A3.Middle ∩ A5.Middle | A1.Middle ∩ A4.Middle ∩ A5.Middle |

The candidate fuzzy sets with a count not less than the threshold value are kept in $L_2$, and their support values are also computed. $L_2$ now contains the fuzzy itemsets shown in Table 3.

(iv) Generation of $C_3$ and $L_3$. For generating $C_3$, join $L_2$ with $L_2$, without joining the items generated from the same order of data points. In the case of $C_3$, a join is possible only between those fuzzy sets that have at least one data item in common. After generation of $C_3$, only those fuzzy itemsets whose count is not less than the threshold value are put in $L_3$ (shown in Table 4).

(v) Generation of $C_4$. Join $L_3$ with $L_3$, without joining the items generated from the same order of data point. A join is possible only for those fuzzy sets that have at least two data items in common.

After generation of $C_4$, it is found that no element has a count of more than the threshold value; therefore, $L_4$ = NULL.

(vi) Removal of Redundant Large Itemsets. Remove the redundant large itemsets from $L_3$ using Step 8 of the algorithm described in Section 3.4.

After applying Step 8 of the algorithm, we are left with the large itemsets shown in Table 5.
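One possible reading of this reduction can be sketched in Python as follows. The sketch assumes that an itemset is redundant when shifting its data variable indices so that the smallest index becomes 1 reproduces an itemset that is already kept; this interpretation, the (index, region) representation, and the function names are assumptions of the sketch rather than details given in the text.

```python
def shift_to_start(itemset):
    """Shift data-variable indices so that the smallest index becomes 1."""
    offset = min(index for index, _ in itemset) - 1
    return tuple((index - offset, region) for index, region in itemset)


def remove_redundant(itemsets):
    """Keep one representative per shifted form, preserving first-seen order."""
    seen, kept = set(), []
    for itemset in itemsets:
        canonical = shift_to_start(sorted(itemset))
        if canonical not in seen:
            seen.add(canonical)
            kept.append(canonical)
    return kept


M = "Middle"
# The ten itemsets of L3 (Table 4), written as (index, region) pairs.
l3 = [((a, M), (b, M), (c, M)) for a, b, c in [
    (1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5), (1, 4, 5),
    (2, 3, 4), (2, 3, 5), (2, 4, 5), (3, 4, 5)]]
kept = remove_redundant(l3)
for itemset in kept:
    print(" ∩ ".join(f"A{index}.{region}" for index, region in itemset))
```

Under this reading, A2.Middle ∩ A3.Middle ∩ A4.Middle shifts onto A1.Middle ∩ A2.Middle ∩ A3.Middle and is dropped, and the six itemsets that remain are those listed in Table 5.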

(vii) Generate Association Rules. Generate the association rules from these large itemsets using Step 9 of the algorithm described in Section 3.4. All the generated rules are shown in Table 6, where strong rules, whose confidence is not less than the threshold confidence, are marked in bold.

In this experiment, we use $\alpha = 65\%$, which means that only those rules whose confidence value is not less than 65% are valid (and are called strong association rules). We found 18 rules in this case; each rule acts as a knowledge base for the project managers and developers of the project. For example, the generated rule A1.Middle ∩ A2.Middle → A3.Middle specifies that if the values of the first and second data points lie in the middle range, then there is a high probability that the third data point has a middle value also. All these rules act as precise and compact knowledge for project managers and analysts.
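The confidence of a fuzzy rule can be sketched in Python as follows. The sketch follows Step 9 in using the minimum operator for intersections and divides the scalar cardinality of the whole itemset by that of the antecedent; the function names and the four fuzzified subsequences are hypothetical and do not come from the Eclipse CDT data.

```python
def fuzzy_count(rows, items):
    """Scalar cardinality of the intersection of items (min of degrees per row)."""
    return sum(min(row[item] for item in items) for row in rows)


def rule_confidence(rows, antecedent, consequent):
    """Confidence of antecedent -> consequent over fuzzified subsequences."""
    return fuzzy_count(rows, antecedent + consequent) / fuzzy_count(rows, antecedent)


# Hypothetical degrees of three fuzzy items over four subsequences.
rows = [
    {"A1.Middle": 1.0, "A2.Middle": 1.0, "A3.Middle": 1.0},
    {"A1.Middle": 1.0, "A2.Middle": 0.5, "A3.Middle": 0.5},
    {"A1.Middle": 0.5, "A2.Middle": 1.0, "A3.Middle": 0.0},
    {"A1.Middle": 0.0, "A2.Middle": 0.0, "A3.Middle": 1.0},
]
conf = rule_confidence(rows, ["A1.Middle", "A2.Middle"], ["A3.Middle"])
is_strong = conf >= 0.65  # alpha = 65% as used in the experiment
print(conf, is_strong)
```

In this toy example the antecedent has a count of 2.0 and the full itemset a count of 1.5, so the rule would be kept as strong at the 65% threshold.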

5.2. Validation of Association Rules Using Training and Remaining Dataset

(i) Validation on Training Dataset. It is found that 65% of the transactions in the Eclipse CDT training set follow these rules. These rules act as compact and concrete knowledge of this data.

(ii) Validation on Remaining Dataset. All generated association rules are tested on the remaining dataset to find the regularity and trend in the number of commits, that is, whether commits are performed at the same rate or not. If the rate is the same, Eclipse has consistent growth; otherwise, there is variation in the growth of the software. It is found that the applicability of these rules decreases on the remaining set. This is verified by generating association rules from the remaining set itself: the rules generated from the remaining dataset indicate that its data lie more towards the lower range of commits.
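One way to quantify how well a rule carries over from the training set to the remaining set is sketched below. The paper does not define the applicability measure precisely, so this is only an assumption: each data point is first reduced to its dominant linguistic label, and applicability is taken as the fraction of windows matching the rule's antecedent that also match its consequent. The function name and the example windows are hypothetical and are not taken from the Eclipse CDT data.

```python
def rule_applicability(labelled_windows, antecedent, consequent):
    """Fraction of windows matching the antecedent that also match the consequent.

    labelled_windows: list of tuples of linguistic labels, one per data point.
    antecedent, consequent: dicts mapping a 0-based position to a required label.
    Returns None when no window matches the antecedent.
    """
    def matches(window, pattern):
        return all(window[pos] == label for pos, label in pattern.items())

    covered = [w for w in labelled_windows if matches(w, antecedent)]
    if not covered:
        return None
    hits = sum(1 for w in covered if matches(w, consequent))
    return hits / len(covered)


# Hypothetical labelled windows, for illustration only.
training = [("Middle", "Middle", "Middle"), ("Middle", "Middle", "Low"),
            ("Middle", "Middle", "Middle"), ("Low", "Low", "Low")]
remaining = [("Low", "Low", "Low"), ("Middle", "Middle", "Low"),
             ("Low", "Middle", "Low")]

# Rule: A1.Middle ∩ A2.Middle -> A3.Middle
rule = ({0: "Middle", 1: "Middle"}, {2: "Middle"})
train_score = rule_applicability(training, *rule)
remain_score = rule_applicability(remaining, *rule)
```

A lower score on the remaining windows than on the training windows corresponds to the decrease in applicability described above.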

The following factors account for this variation in the commit rate:

(a) Most of the values in the remaining dataset lie towards the low range. This point is verified by finding the largest frequent fuzzy itemsets and then generating the association rules from the remaining dataset using the fuzzy data mining algorithm for time series data. The largest frequent itemsets found are shown in Table 7.

The generated frequent itemsets consist only of data points in the low range. Hence, most of the commit entries in the remaining dataset are probably low. This indicates that the number of commits analysed from 6/1/2012 to 9/23/2013 is less than the number of commits analysed in the training dataset, which points to less development activity for the software in this period.

Advances in Fuzzy Systems 7

Table 6: Calculated confidence value of various association rules.

Rule                                 Confidence  %    Rule                                 Confidence  %
A1.Middle ∩ A2.Middle → A3.Middle    0.741       74   A3.Middle → A1.Middle ∩ A2.Middle    0.522       52
A1.Middle ∩ A3.Middle → A2.Middle    0.781       78   A2.Middle → A1.Middle ∩ A3.Middle    0.526       53
A2.Middle ∩ A3.Middle → A1.Middle    0.735       74   A1.Middle → A2.Middle ∩ A3.Middle    0.529       53
A1.Middle ∩ A2.Middle → A4.Middle    0.692       69   A4.Middle → A1.Middle ∩ A2.Middle    0.491       49
A1.Middle ∩ A4.Middle → A2.Middle    0.762       76   A2.Middle → A1.Middle ∩ A4.Middle    0.491       49
A2.Middle ∩ A4.Middle → A1.Middle    0.724       72   A1.Middle → A2.Middle ∩ A4.Middle    0.494       49
A1.Middle ∩ A2.Middle → A5.Middle    0.695       69   A5.Middle → A1.Middle ∩ A2.Middle    0.497       50
A1.Middle ∩ A5.Middle → A2.Middle    0.737       74   A2.Middle → A1.Middle ∩ A5.Middle    0.493       49
A2.Middle ∩ A5.Middle → A1.Middle    0.759       76   A1.Middle → A2.Middle ∩ A5.Middle    0.496       50
A1.Middle ∩ A3.Middle → A4.Middle    0.742       74   A4.Middle → A1.Middle ∩ A3.Middle    0.499       50
A1.Middle ∩ A4.Middle → A3.Middle    0.775       78   A3.Middle → A1.Middle ∩ A4.Middle    0.496       50
A3.Middle ∩ A4.Middle → A1.Middle    0.691       69   A1.Middle → A3.Middle ∩ A4.Middle    0.502       50
A1.Middle ∩ A3.Middle → A5.Middle    0.751       75   A5.Middle → A1.Middle ∩ A3.Middle    0.510       51
A1.Middle ∩ A5.Middle → A3.Middle    0.757       76   A3.Middle → A1.Middle ∩ A5.Middle    0.502       50
A3.Middle ∩ A5.Middle → A1.Middle    0.738       74   A1.Middle → A3.Middle ∩ A5.Middle    0.509       51
A1.Middle ∩ A4.Middle → A5.Middle    0.792       79   A5.Middle → A1.Middle ∩ A4.Middle    0.515       51
A1.Middle ∩ A5.Middle → A4.Middle    0.764       76   A4.Middle → A1.Middle ∩ A5.Middle    0.511       51
A4.Middle ∩ A5.Middle → A1.Middle    0.715       71   A1.Middle → A4.Middle ∩ A5.Middle    0.513       51

Table 7: Largest frequent fuzzy item set for remaining dataset.

A1.Low ∩ A2.Low ∩ A3.Low ∩ A4.Low
A1.Low ∩ A2.Low ∩ A3.Low ∩ A5.Low
A1.Low ∩ A2.Low ∩ A4.Low ∩ A5.Low
A1.Low ∩ A3.Low ∩ A4.Low ∩ A5.Low

(b) Factors such as the number of active users and the number of files changed (addition, deletion, and modification) are lower.

6. Discussion

The fuzzy data mining algorithm for time series data [11] allows efficient mining of association rules from a large dataset. The generated rules help in finding the regularity and existing trend of OSS projects. We have used the commit activity data of Eclipse CDT. The commits in a repository are directly related to activities such as files or code being changed, deleted, or modified. By analysing the trend in the commit data, we can interpret the development or evolution activity of the software under consideration. In the above experiment, the original dataset is divided into two subdatasets (training and remaining dataset).

Table 8: Generated subsequences.

Sp    Subsequences
1     9 25 252 240 361
2     25 252 240 361 298
3     252 240 361 298 118
4     240 361 298 118 294
5     361 298 118 294 219
6     298 118 294 219 193
7     118 294 219 193 359
8     294 219 193 359 96
9     219 193 359 96 151
10    193 359 96 151 141
11    359 96 151 141 198
12    96 151 141 198 296
13    151 141 198 296 183
14    141 198 296 183 160
15    198 296 183 160 117
16    296 183 160 117 139
17    183 160 117 139 179
18    160 117 139 179 255
19    117 139 179 255 351
20    139 179 255 351 320
21    179 255 351 320 440
22    255 351 320 440 219
23    351 320 440 219 260
24    320 440 219 260 254
25    440 219 260 254 227
26    219 260 254 227 363
27    260 254 227 363 185
28    254 227 363 185 200
29    227 363 185 200 254
30    363 185 200 254 175
31    185 200 254 175 253
32    200 254 175 253 221
33    254 175 253 221 281
34    175 253 221 281 263
35    253 221 281 263 87
36    221 281 263 87 83
37    281 263 87 83 42
38    263 87 83 42 80
39    87 83 42 80 44
40    83 42 80 44 99
41    42 80 44 99 71
42    80 44 99 71 83
43    44 99 71 83 150
44    99 71 83 150 120
45    71 83 150 120 60
46    83 150 120 60 132
47    150 120 60 132 127
48    120 60 132 127 151
49    60 132 127 151 109
50    132 127 151 109 141
51    127 151 109 141 141
52    151 109 141 141 135
53    109 141 141 135 242
54    141 141 135 242 320
55    141 135 242 320 380
56    135 242 320 380 438
57    242 320 380 438 355
58    320 380 438 355 141
59    380 438 355 141 107
60    438 355 141 107 114
61    355 141 107 114 153
62    141 107 114 153 212
63    107 114 153 212 133
64    114 153 212 133 212
65    153 212 133 212 273
66    212 133 212 273 326
67    133 212 273 326 455
68    212 273 326 455 378
69    273 326 455 378 186
70    326 455 378 186 317
71    455 378 186 317 139
72    378 186 317 139 155
73    186 317 139 155 257
74    317 139 155 257 208
75    139 155 257 208 162
76    155 257 208 162 229
77    257 208 162 229 205
78    208 162 229 205 203
79    162 229 205 203 234
80    229 205 203 234 214
81    205 203 234 214 144
82    203 234 214 144 185
83    234 214 144 185 153
84    214 144 185 153 206
85    144 185 153 206 315
86    185 153 206 315 204
87    153 206 315 204 103
88    206 315 204 103 336
89    315 204 103 336 278
90    204 103 336 278 329
91    103 336 278 329 370
92    336 278 329 370 419
93    278 329 370 419 230
94    329 370 419 230 224
95    370 419 230 224 217
96    419 230 224 217 188
97    230 224 217 188 176
98    224 217 188 176 129
99    217 188 176 129 91
100   188 176 129 91 188
101   176 129 91 188 169
102   129 91 188 169 256
103   91 188 169 256 245
104   188 169 256 245 237
105   169 256 245 237 74
106   256 245 237 74 228
107   245 237 74 228 179
108   237 74 228 179 212
109   74 228 179 212 128
110   228 179 212 128 154
111   179 212 128 154 150
112   212 128 154 150 192
113   128 154 150 192 148
114   154 150 192 148 171
115   150 192 148 171 181
116   192 148 171 181 163
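The subsequences listed in Table 8 follow a sliding-window pattern: each row shifts the previous one by one data point and keeps five consecutive values. A minimal sketch of that windowing step is given below; the function name is hypothetical, and the sample series is the first seven values that appear in Table 8.

```python
def sliding_windows(series, size=5):
    """Return every run of `size` consecutive values, shifted by one position."""
    return [tuple(series[i:i + size]) for i in range(len(series) - size + 1)]


# First seven values of the series underlying Table 8.
commits = [9, 25, 252, 240, 361, 298, 118]
windows = sliding_windows(commits)
```

A series of n values yields n - 5 + 1 windows of size five, so the 116 subsequences in Table 8 correspond to 120 data points.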

The association rules generated from the training set allow the regularity and trends in the commits of the OSS project to be analysed. These rules also help in predicting and analysing the future evolution or development activity of Eclipse CDT. The generated rules are validated on the remaining dataset to test their applicability, and it is found that their applicability on the remaining set decreases. The results of the algorithm indicate that the commit activity in the remaining dataset lies in the low range. Other factors may also contribute to this decrease in the number of commits, such as a lower number of active users and a lower number of files changed (addition, deletion, and modification).

7. Threats to Validity

This section discusses the threats to validity of the study.

Construct validity threats concern the relationship between theory and observation. These threats arise mainly because we assumed that all commits are posted in the revision control tool git [4]. Any changes performed in the source code but not logged through the tool may not have become part of the study.

Internal validity concerns the selection of subject systems and the analysis methods. This study uses a month as the unit of measure for tracking the types of change activities. In the future, we would like to use a more natural and insightful partition, based on major/minor versions of the OSS project, for analysing the change activity of OSS projects. Subject systems were selected from public repositories, but the selection is biased towards projects with valid git repositories.

Table 9: Transformation of data elements into fuzzy set.

Columns: Sp, followed by the Low, Middle, and High membership values for each of A1 to A5.

Count   Low      Middle   High
A1      47.595   60.227   8.178
A2      47.275   60.547   8.178
A3      46.802   61.02    8.178
A4      47.262   60.56    8.178
A5      47.775   60.047   8.178

[Membership values for subsequences Sp 1 to 116 are garbled in the source text and are not reproduced here.]

External validity concerns the generalization of the findings. In the future, we would like to provide more generalized results by considering a higher number of OSS projects.

Reliability validity concerns the possibility of replication of the study. The subject systems are available in the public domain. We have attempted to put all the necessary details of the experiment process in the paper.

8. Conclusion and Future Work

The commit activity data available in the development repository of open source software can be used to analyse the evolution of OSS projects, as each commit is directly related to a development activity such as code deletion, addition, modification, comments, and file addition. In this study, the Eclipse CDT commit data is analysed to find the regularity and trend in the commit data. The fuzzy data mining algorithm for time series data is used to generate association rules from the dataset. The dataset is divided into two subsets (training and remaining dataset) to evaluate the pattern of evolution of Eclipse CDT.

After applying and validating the association rules generated from the training dataset, it is found that the rates at which commits are performed in the training dataset and in the remaining dataset are different. This is verified by generating association rules from the remaining set: the rules generated from the remaining dataset indicate that its data lie more towards the lower range of commits. This validates the applicability of the rules generated from the training dataset.

These association rules indicate that the overall commits in Eclipse CDT lie in the middle range, except for the variation found near the end, where there is a high probability that commits lie in the lower range. The continuous availability of commits in the repository of Eclipse CDT illustrates that the development or evolution of Eclipse CDT is active, with most of the commits per month lying in the middle range and, towards the end, near the lower range. In the future, we want to combine a prediction algorithm with the fuzzy data mining algorithm for time series data to predict the number of commits to be performed in a particular month, although the number of commits depends on various other factors that also need to be considered.

Appendix

See Tables 8 and 9.

Competing Interests

The authors declare that they have no competing interests.

References

[1] M. W. Godfrey and Q. Tu, "Evolution in open source software: a case study," in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '00), pp. 131–142, October 2000.

[2] H. Zhang and S. Kim, "Monitoring software quality evolution for defects," IEEE Software, vol. 27, no. 4, pp. 58–64, 2010.

[3] Y. Fang and D. Neufeld, "Understanding sustained participation in open source software projects," Journal of Management Information Systems, vol. 25, no. 4, pp. 9–50, 2008.

[4] "GIT," July 2015, http://git-scm.com/.

[5] R. Sen, S. S. Singh, and S. Borle, "Open source software success: measures and analysis," Decision Support Systems, vol. 52, no. 2, pp. 364–372, 2012.

[6] R. McCleary, R. A. Hay, E. E. Meidinger, and D. McDowall, Applied Time Series Analysis for the Social Sciences, Sage, Beverly Hills, Calif, USA, 1980.

[7] R. Grewal, G. L. Lilien, and G. Mallapragada, "Location, location, location: how network embeddedness affects project success in open source systems," Management Science, vol. 52, no. 7, pp. 1043–1056, 2006.

[8] I. Herraiz, J. M. Gonzalez-Barahona, G. Robles, and D. M. German, "On the prediction of the evolution of libre software projects," in Proceedings of the 23rd International Conference on Software Maintenance (ICSM '07), pp. 405–414, Paris, France, October 2007.

[9] C. F. Kemerer and S. Slaughter, "An empirical approach to studying software evolution," IEEE Transactions on Software Engineering, vol. 25, no. 4, pp. 493–509, 1999.

[10] A. Mockus and L. G. Votta, "Identifying reasons for software changes using historic databases," in Proceedings of the International Conference on Software Maintenance (ICSM '00), pp. 120–130, IEEE, San Jose, Calif, USA, 2000.

[11] C.-H. Chen, T.-P. Hong, and V. S. Tseng, "Fuzzy data mining for time-series data," Applied Soft Computing Journal, vol. 12, no. 1, pp. 536–542, 2012.

[12] C. C. H. Yuen, "An empirical approach to the study of errors in large software under maintenance," in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '85), pp. 96–105, 1985.

[13] C. C. H. Yuen, "A statistical rationale for evolution dynamics concepts," in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '87), pp. 156–164, September 1987.

[14] C. C. H. Yuen, "On analytic maintenance process data at the global and the detailed levels: a case study," in Proceedings of the International Conference on Software Maintenance (ICSM '88), pp. 248–255, IEEE, Scottsdale, Ariz, USA, 1988.

[15] M. Goulao, N. Fonte, M. Wermelinger, and F. B. Abreu, "Software evolution prediction using seasonal time analysis: a comparative study," in Proceedings of the 16th European Conference on Software Maintenance and Reengineering (CSMR '12), pp. 213–222, IEEE, Szeged, Hungary, March 2012.

[16] B. Kenmei, G. Antoniol, and M. Di Penta, "Trend analysis and issue prediction in large-scale open source systems," in Proceedings of the 12th IEEE European Conference on Software Maintenance and Reengineering (CSMR '08), pp. 73–82, Athens, Greece, April 2008.

[17] F. Caprio, G. Casazza, U. Penta, M. D. Penta, and U. Villano, "Measuring and predicting the Linux kernel evolution," in Proceedings of the International Workshop of Empirical Studies on Software Maintenance, Florence, Italy, November 2001.

[18] E. Fuentetaja and D. J. Bagert, "Software evolution from a time-series perspective," in Proceedings of the IEEE International Conference on Software Maintenance, pp. 226–229, October 2002.

[19] M. Klas, F. Elberzhager, J. Munch, K. Hartjes, and O. Von Graevemeyer, "Transparent combination of expert and measurement data for defect prediction: an industrial case study," in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE '10), pp. 119–128, Cape Town, South Africa, May 2010.

[20] U. Raja, D. P. Hale, and J. E. Hale, "Modeling software evolution defects: a time series approach," Journal of Software Maintenance and Evolution: Research and Practice, vol. 21, no. 1, pp. 49–71, 2009.

[21] G. Antoniol, G. Casazza, M. D. Penta, and E. Merlo, "Modeling clones evolution through time series," in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '01), pp. 273–280, IEEE Computer Society, Florence, Italy, November 2001.

[22] L. Yu, "Indirectly predicting the maintenance effort of open-source software," Journal of Software Maintenance and Evolution, vol. 18, no. 5, pp. 311–332, 2006.

[23] A. Amin, L. Grunske, and A. Colman, "An approach to software reliability prediction based on time series modeling," Journal of Systems and Software, vol. 86, no. 7, pp. 1923–1932, 2013.

[24] Forecasting Model, http://people.duke.edu/~rnau/411fcst.htm.

[25] T.-P. Hong, C.-S. Kuo, and C.-C. Chi, "A fuzzy data mining algorithm for quantitative values," in Proceedings of the 3rd International Conference on Knowledge-Based Intelligent Information Engineering Systems (KES '99), pp. 480–483, September 1999.

[26] H. Suresh and K. Raimond, "Mining association rules from time series data using hybrid approaches," Proceedings of International Journal of Computational Engineering Research, vol. 3, no. 3, pp. 181–189, 2013.

[27] "GIT HUB," July 2015, https://github.com/.

[28] "Eclipse," "Where did Eclipse come from?" Eclipse Wiki.

[29] "Eclipse," 2015, https://eclipse.org/cdt/.

[30] "Eclipse," July 2015, http://www.eclipse.org/cdt/doc/overview 3.

[31] L. A. Zadeh, "Fuzzy sets," Information and Computation, vol. 8, no. 3, pp. 338–353, 1965.

[32] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufman, Academic Press, 2001.

[33] R. Agrawal, "Mining association rules between sets of items in large databases," in Proceedings of the ACM SIGMOD Conference, pp. 207–216, Washington, DC, USA, May 1993.


algorithm) is used to generate linguistic rules from the time series data. In [11], Chen et al. specified finding applications for their algorithm and validating it as future work. The present study takes up both of these issues as one of its objectives. In this work, we divide the time series dataset into two subsets, that is, a training set and a remaining set. We first use the algorithm to generate association rules from the training set and then validate the accuracy and prediction capability of these rules on the remaining set. A high prediction accuracy indicates that the commits in the remaining set have regularity and trend in the number of commits performed for Eclipse CDT. A low prediction accuracy indicates that the commits in the remaining set are not consistent with those of the training set and that the commits performed are irregular and lack a trend.

The main objective of the present study is to explore the fuzzy data mining approach for time series to generate association rules for evaluating the existing trend and regularity in the evolution of OSS projects. As commit activity is a good indicator of the continuous development activity of an OSS project, another objective of the present work is to develop a commit prediction model for OSS systems. Project managers, developers, and users can use the commit prediction model to understand the future commit activity of an OSS project and then plan, schedule, and allocate resources according to their role.

The rest of the paper is organized as follows. Section 2 presents the related work on software evolution and the fuzzy data mining technique used. Section 3 explains the research methodology. Section 4 presents details of the experimental setup used for performing the study. Section 5 gives the results and analysis. Section 6 discusses the findings, and threats to validity of the results are explained in Section 7. The last section concludes the paper.

2 Related Work

The idea to analyse and predict the software evolution wasseen initially in the late 1980s when Yuen published hispapers on the subject in a series of conferences on softwaremaintenance [12ndash14] He used time series analysis as atechnique for software evolution prediction Later severalstudies used time series analysis for predicting softwareevolution The software evolution metrics undertaken forprediction include the monthly number of changes [8 9]change requests [15 16] size and complexity [17 18] defects[19 20] clones [21] and maintenance effort [22]

Kemerer and Slaughter [9] looked at the evolution oftwo proprietary systems using two approaches one based onthe time series analysis (ARIMA) and the other based ona technique called sequence analysis They found ARIMAmodels inappropriate for analysis as the dataset was largelyrandom in nature Antoniol et al [21] presented an approachfor monitoring and predicting evolution of software clonesacross subsequent versions of a software system (mSQL)using time series analysis (ARIMA)

Caprio et al [17] used time series analysis to estimatesize and complexity of the Linux Kernel They used ARIMAmodel to predict evolution of the Linux Kernel by usingdataset related to 68 stable releases of the software system

Herraiz et al [8] applied a stationery model based ontime series analysis to the monthly number of changes inthe CVS repository of Eclipse Their model predicted thenumber of changes per month for the next three monthsThey employed Kernel smoothing to reduce noise a lessonlearned from Kemerer and Slaughter [9] study who couldnot get good results of ARIMA modelling for predictingnumber of changes as they ignored noise present in thedata

Kenmei et al [16] applied ARIMA to model and forecastchange requests per unit of size of large open source projectsData from three large open source projects Mozilla JBossand Eclipse confirm the capability of the approach to effec-tively perform prediction and identify trendsThey report theevidence that ARIMA models almost always outperform thepredictive accuracy of simplemodels such as linear regressionor random walk [16] The benchmark models selected byKenmei et al [16] for evaluating the prediction accuracy ofARIMA model are not rigorous

Raja et al [20] did time series analysis of defect reportsof eight open source software projects over a period of fiveyears and found ARIMA (0 1 1) model to be useful for defectprediction Klas et al [19] combined time series analysiswith expert opinion to create prediction models for defectsThey suggest that a hybrid model is more powerful than datamodels in the early phases of a projectrsquos life cycle

Goulao et al [15] used time series technique for long-term prediction of the overall number of change requestsThey investigated the suitability of ARIMA model to predictthe long-term fluctuation of all change requests for a projecthaving seasonal patterns such as Eclipse They found thattheir ARIMA model is statistically more significant andoutperforms the nonseasonal models

Amin et al [23] used the ARIMA model in place of soft-ware reliability growthmodel (SRGM) to predict the softwarereliability SRGM has restrictive assumption on environmentof the software under analysisThey specified that theARIMAmodelling is far better than SRGM approach as ARIMAis data oriented and cover all limitations of the previousapproaches

A perusal of the existing research in this area shows ARIMA modelling as the most frequently used prediction procedure. However, OSS development is a stochastic process. Unlike traditional development, in which the environment is controlled, OSS development is based on contributions from volunteers who cannot be forced to work even if something is of high priority for the project [1]. Along with this unplanned activity, there is a lack of planned documentation related to requirements and detailed design [8].

Open source projects without any tight organizational support face many uncertainties. Uncertainty lies in an uncontrolled development environment, such as the availability of contributors at any point of time. Due to this uncertainty there is a large fluctuation in consecutive values (as observed in the monthly commit data of Eclipse CDT in Figure 1). Most of the classical time series techniques are inappropriate for analysis and forecasting of data which involve uncertainty [8, 9]. Research literature also indicates that ARIMA modelling can be useful when there is uncertainty in the data, but only after applying smoothing to reduce noise [24].

Table 1: Descriptive statistics of the open source software projects.

Software | Origin date | Data collection date | Number of years | Number of authors | Total commits analysed
Eclipse CDT | 6/27/2002 | 9/23/2013 | 11 | 135 | 26246

Figure 1: Eclipse CDT commit activity (horizontal axis: year and month, 2002 to 2012; vertical axis: number of commits, 0 to 500).

Hong et al. [25] introduced the fuzzy data mining algorithm for quantitative values. The algorithm extracts useful knowledge from a transactional database having quantitative values. It combines the fuzzy set concept with the Apriori algorithm.

Chen et al. [11] extended the work of Hong et al. and analysed the fuzzy data mining algorithm on time series data. They proposed an approach in which the concepts of fuzzy sets are used along with the Apriori data mining algorithm to generate linguistic association rules. They use the fuzzy membership function to convert the time series data to fuzzy sets and then apply the Apriori algorithm to generate association rules accordingly. They left the validation of their algorithm and the search for applications as future work. Continuing their work, we use the algorithm to analyse the commit activity of an open source software project in order to find the existing regularity and trend in that activity.

Suresh and Raimond [26] extended the work of Chen et al. [11]. They proposed a new algorithm, called the extended fuzzy frequent pattern algorithm, for analysing time series data. The association rules are generated without generating the candidate sets.

This paper uses a fuzzy data mining approach on time series [11] to analyse the commit activity in open source software projects. The research questions that this study aims to answer are (1) analysing the commit activity of an OSS project to find the trend and regularity and (2) validating the fuzzy data mining algorithm on OSS project data.

3. Methodology

The objective of this empirical study is to investigate the suitability of the fuzzy data mining method for analysing the number of commits per month as a software system evolves, in order to find the regularity and trend. This section describes the data collection process, the basic concepts of fuzzy time series, and the fuzzy data mining method.

3.1. Data Collection. The development repository of the open source software project (Eclipse CDT) is obtained from GitHub [27]. The repository is downloaded by cloning the original repository onto the local machine using Git Bash [4]. A script written in Java fetches the number of commits per month for the observation period for all the software projects. The descriptive statistics of the development repositories of all software projects are shown in Table 1.

Eclipse [28] is an integrated development environment. Eclipse has a base workspace and an extendable plug-in system for customizing the development environment. It is used to develop applications in different languages, such as Java, C/C++, COBOL, and PHP, by using the available plug-ins. The two variations, Eclipse SDK and Eclipse CDT, are well known for developing applications. Eclipse SDK is compatible with Java and is used by Java developers for building projects on the Eclipse platform. Eclipse CDT provides C/C++ development tooling [29] and allows developing applications in C/C++ using Eclipse. Eclipse CDT provides various features [30], such as a full-featured editor, debugging, refactoring, parser, and indexes. In this study we consider Eclipse CDT only. The number of commits for each month from 6/27/2002 to 9/23/2013 is arranged month-wise to form a time series (for 136 months), shown in Figure 1.

3.2. Basics of Fuzzy Set Theory and Fuzzy Time Series. The concept of the fuzzy set was introduced by Zadeh [31] in 1965 as an extension of classical set theory. A fuzzy set is characterized by a membership function that assigns a degree of membership to each element [31]. The membership function can take various forms, such as triangular, $L$-function, $R$-function, and trapezoidal function, depending upon the application and requirement. The present study uses both $L$ and $R$ membership functions to define the values of the fuzzy variable.

3.3. Apriori Algorithm. The Apriori algorithm [32, 33] is one of the classical algorithms, proposed by R. Srikant and R. Agrawal in 1994, for finding frequent patterns for generating association rules. Apriori employs an iterative approach known as level-wise search, where $k$-itemsets are used to explore $(k+1)$-itemsets.

The Apriori algorithm is executed in two steps. Firstly, it retrieves all the frequent itemsets whose support is not smaller than the minimum support (min_sup). The first step further consists of a join and a pruning action. In joining, the candidate set $C_k$ is produced by joining $L_{k-1}$ with itself. In pruning, the candidate sets are pruned by applying the Apriori property, that is, all the nonempty subsets of a frequent itemset must also be frequent.

The pseudocode for generation of frequent itemsets is as follows:

    $C_k$: candidate itemsets of size $k$
    $L_k$: frequent itemsets of size $k$
    $L_1$ = {frequent 1-itemsets}
    For ($k = 1$; $L_k \neq \emptyset$; $k$++)
        $C_{k+1}$ = join $L_k$ with $L_k$ to generate $C_{k+1}$
        $L_{k+1}$ = candidates in $C_{k+1}$ with support greater than or equal to min_sup
    End
    Return $\bigcup_k L_k$
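The level-wise search above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the study (which was written in Java); the function name, the use of an absolute support count for `min_sup`, and the toy transactions are assumptions made for the example.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_sup):
    """Return {itemset: support count} for every frequent itemset.

    min_sup is an absolute count here (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    def support_count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    # L1: frequent 1-itemsets.
    level = {frozenset([i]) for i in items
             if support_count(frozenset([i])) >= min_sup}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support_count(itemset)
        # Join: unions of two frequent k-itemsets that have k + 1 items.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        level = {c for c in candidates if support_count(c) >= min_sup}
        k += 1
    return frequent


# Hypothetical toy transactions, not data from the study.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent = apriori_frequent_itemsets(toy, min_sup=3)
```

In the toy data every single item and every pair is frequent, while the triple {a, b, c} occurs only twice and is therefore dropped.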

Next, it uses the frequent itemsets to generate the strong association rules satisfying the minimum confidence (min_conf) threshold. The pseudocode for generation of strong association rules is as follows:

    Input: frequent itemsets $L$; minimum confidence threshold min_conf
    Output: strong association rules $R$
    $R = \emptyset$
    for each frequent itemset $I$ in $L$
        for each nonempty proper subset $S$ of $I$
            conf($S \rightarrow I - S$) = support_count($I$) / support_count($S$)
            if conf >= min_conf
                generate strong association rule $r$ = "$S \rightarrow (I - S)$"
                $R = R \cup \{r\}$
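The rule generation step can be sketched in the same way. The function below is a hedged illustration: its name, the dictionary of support counts, and the toy numbers are assumptions for the example, and only proper subsets are used as antecedents so that the consequent is never empty.

```python
from itertools import combinations


def strong_rules(support_counts, min_conf):
    """Return (antecedent, consequent, confidence) for each strong rule."""
    rules = []
    for itemset, count in support_counts.items():
        # Proper nonempty subsets only, so the consequent is never empty.
        for size in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), size):
                s = frozenset(subset)
                conf = count / support_counts[s]
                if conf >= min_conf:
                    rules.append((s, itemset - s, conf))
    return rules


# Hypothetical support counts, not data from the study.
counts = {frozenset("a"): 4, frozenset("b"): 4, frozenset("ab"): 3}
rules = strong_rules(counts, min_conf=0.7)
```

Because Apriori keeps every subset of a frequent itemset, the lookup `support_counts[s]` is expected to succeed whenever the input comes from a complete frequent-itemset search.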

3.4. Fuzzy Data Mining Algorithm for Time Series Data. Chen et al. [11] extended the work of Hong et al. [25] and proposed fuzzy data mining for time series data. The time series data of $K$ points are entered as input along with the predefined minimum support $\lambda$, minimum confidence $\alpha$, and window size $w$.

The input data is first converted to generate $(K - w + 1)$ subsequences; each subsequence has $w$ elements. The fuzzy membership function is used to convert each data item into the equivalent fuzzy set. The Apriori algorithm is used to mine frequent fuzzy sets. Moreover, the data reduction method is used to remove the redundant data items.

The association rules are generated in the same way as in the Apriori algorithm. The stepwise process of the fuzzy Apriori algorithm is given below.

Input: time series with $K$ data points; membership function values $h$; minimum support $\lambda$; minimum confidence $\alpha$; sliding window size $w$; fuzzy set $f_p$.
Output: set of fuzzy association rules.

Step 1. Convert the time series data into $(K - w + 1)$ subsequences, where each subsequence has a maximum of $w$ elements. For example, if $w = 5$, then each subsequence has a maximum of 5 elements. The elements of a subsequence are referred to as data variables and denoted by A1, A2, A3, A4, and A5, respectively.

Step 2. Apply the fuzzy membership function to the elements of the time series to generate the fuzzy set ($f_p$).

Step 3. Based on the membership function and its user-defined levels (say low, middle, and high), each data variable, after conversion to a fuzzy item, lies in different user-defined levels (such as A1.Low, A2.Middle, and A3.High, referred to as fuzzy items).

Step 3. Calculate the scalar cardinality count of each fuzzy item of the subsequences.

Step 4. Compare the total scalar cardinality count of each fuzzy item with the minimum support value. The fuzzy items with a count greater than or equal to $\lambda$ are kept in $L_1$.

The support value of a fuzzy item is calculated as

$$\text{Support value} = \frac{\text{Count}}{K - w + 1}. \tag{1}$$

Step 5. If $L_1$ = NULL, then exit; else do the following steps for $r = 1$ to $K$.

Step 6. Join $L_r$ with $L_r$ to generate the candidate $(r + 1)$ fuzzy itemsets (i.e., $C_{r+1}$). This is similar to the Apriori algorithm, except that items generated from the same order of data point are not joined, and a join is possible only if $(r - 1)$ data items in both sets are the same.

(i) Calculate the fuzzy value of each candidate fuzzy itemset by using fuzzy set theory, that is, Min($f_1 \wedge f_2 \wedge \cdots \wedge f_k$).

(ii) Count the scalar cardinality of each fuzzy candidate itemset.

(iii) If the count is greater than or equal to $\lambda$, then put the itemset in $L_{r+1}$ and calculate its support value as

$$\text{Support value} = \frac{\text{Count}}{K - w + 1}. \tag{2}$$

Step 7. If $L_{r+1}$ = NULL, then exit; otherwise go to Step 6 again.

Step 8. Remove the redundant large itemsets, that is, shift each large itemset $(I_1, I_2, I_3, \ldots, I_q)$ into $(I'_1, I'_2, I'_3, \ldots, I'_q)$ such that fuzzy region $R_{11}$ becomes $R'_{11}$ when shifted.


Table 2: Fuzzy candidate itemsets in $C_2$.

A1.Low ∩ A2.Low | A1.Low ∩ A2.Middle | A1.Low ∩ A3.Low | A1.Low ∩ A3.Middle | A1.Low ∩ A4.Low
A1.Low ∩ A4.Middle | A1.Low ∩ A5.Low | A1.Low ∩ A5.Middle | A1.Middle ∩ A2.Low | A1.Middle ∩ A2.Middle
A1.Middle ∩ A3.Low | A1.Middle ∩ A3.Middle | A1.Middle ∩ A4.Low | A1.Middle ∩ A4.Middle | A1.Middle ∩ A5.Low
A1.Middle ∩ A5.Middle | A2.Low ∩ A3.Low | A2.Low ∩ A3.Middle | A2.Low ∩ A4.Low | A2.Low ∩ A4.Middle
A2.Low ∩ A5.Low | A2.Low ∩ A5.Middle | A2.Middle ∩ A3.Low | A2.Middle ∩ A3.Middle | A2.Middle ∩ A4.Low
A2.Middle ∩ A4.Middle | A2.Middle ∩ A5.Low | A2.Middle ∩ A5.Middle | A3.Low ∩ A4.Low | A3.Low ∩ A4.Middle
A3.Low ∩ A5.Low | A3.Low ∩ A5.Middle | A3.Middle ∩ A4.Low | A3.Middle ∩ A4.Middle | A3.Middle ∩ A5.Low
A3.Middle ∩ A5.Middle | A4.Low ∩ A5.Low | A4.Low ∩ A5.Middle | A4.Middle ∩ A5.Low | A4.Middle ∩ A5.Middle

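Figure 2: Membership function (fuzzy regions Low, Middle, and High; vertical axis: membership degree from 0 to 1; horizontal axis: number of commits, with breakpoints at 0, 100, 250, 300, and 450).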

Step 9. Generate the association rules using the Apriori rule generation method and calculate the confidence of each rule. The only variation is that, in place of normal data itemsets, we have fuzzy itemsets, so the concept of intersection is used in the rules, not union. If the confidence value is not less than the minimum confidence $\alpha$, then keep the rule; otherwise reject it.

4. Experimental Setup

The experiment is performed on an x86 Family 6 Model 15 Stepping 6 GenuineIntel ~2131 MHz processor with 1 GB RAM and a 240 GB hard disk, running the Microsoft Windows XP Professional operating system, version 5.1.2600 Service Pack 3 Build 2600. The fuzzy data mining algorithm for time series data is implemented in Java.

5. Results and Analysis

The dataset is divided into two subsets: a training dataset (120 months) and a remaining dataset (16 months). The complete process of generating the association rules using the fuzzy data mining algorithm is performed in two steps:

(A) The association rules are generated on the training dataset using the fuzzy data mining algorithm for time series data.

(B) The generated association rules are validated using the training and remaining datasets.

5.1. Generation of Association Rules from Training Dataset. The fuzzy data mining algorithm for time series [11] is applied to the training dataset to generate association rules. We assumed $w = 5$, $\lambda$ = 25% (25/100 * number of subsequences), and $\alpha$ = 65%; the membership function and its values are shown in Figure 2. These assumptions are made by referring to the concepts of the fuzzy data mining algorithm given in [11]. The commits are divided according to the level of activity. Commits in the ranges 0-100, 250-300, and 450 onwards indicate the low, middle (average), and high levels of activity, respectively; values between these ranges belong partially to the two neighbouring levels. The level of commit activity indicates the amount of work done in a commit.
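The membership function of Figure 2 can be sketched as a small Python function. The breakpoints (100, 250, 300, 450) come from the figure; the linear interpolation between them and the function name `fuzzify` are assumptions of this sketch, although the resulting degrees agree with the values listed in Table 9 (for example, 240 commits and 361 commits).

```python
def fuzzify(commits):
    """Return (low, middle, high) membership degrees for a monthly commit count."""
    x = float(commits)
    if x <= 100:
        return (1.0, 0.0, 0.0)
    if x < 250:
        # Between the Low plateau and the Middle plateau.
        low = (250 - x) / 150
        return (low, 1.0 - low, 0.0)
    if x <= 300:
        return (0.0, 1.0, 0.0)
    if x < 450:
        # Between the Middle plateau and the High plateau.
        high = (x - 300) / 150
        return (0.0, 1.0 - high, high)
    return (0.0, 0.0, 1.0)


low_240, middle_240, high_240 = fuzzify(240)
low_361, middle_361, high_361 = fuzzify(361)
```

With these assumptions, a month with 240 commits is mostly Middle with a small Low component, and a month with 361 commits is split between Middle and High.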

(i) Generation of Subsequences

We have $K = 120$ (the number of months) and $w = 5$ (the window size). The number of subsequences is $(K - w + 1)$. Therefore, the number of subsequences = $(K - w + 1) = (120 - 5 + 1) = 116$.

All generated subsequences for $K = 120$ are shown in Table 8 of the Appendix.
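The subsequence generation can be illustrated with a short sliding-window sketch. The function name is an assumption; the sample values are the first eight monthly commit counts as they appear in Table 8.

```python
def sliding_windows(series, w):
    """Return the len(series) - w + 1 consecutive windows of length w."""
    return [series[i:i + w] for i in range(len(series) - w + 1)]


# First eight monthly commit counts, as listed in Table 8.
first_months = [9, 25, 252, 240, 361, 298, 118, 294]
windows = sliding_windows(first_months, w=5)

# With K = 120 points and w = 5 the same function yields 116 windows.
n_training_windows = len(sliding_windows(list(range(120)), w=5))
```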

(ii) Transformation of Data to Fuzzy Sets. In this step we transform the commit activity data into fuzzy sets (using the membership function) and count the scalar cardinality of each data variable (shown in Table 9 of the Appendix).

All these data variables are considered as candidate itemsets ($C_1$). For the generation of $L_1$, data variables having a count not less than $\lambda$ = 25% (25/100 * 116 = 29) are considered. It is found that $L_1$ contains A1.Low, A1.Middle, A2.Low, A2.Middle, A3.Low, A3.Middle, A4.Low, A4.Middle, A5.Low, and A5.Middle.
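The selection of $L_1$ can be reproduced with a few lines of Python. The counts below are read from the Count row of Table 9; the variable names and the dictionary layout are assumptions of this sketch.

```python
K, w = 120, 5
n_subsequences = K - w + 1           # 116 subsequences
min_count = 0.25 * n_subsequences    # lambda = 25% of 116 = 29

# Scalar cardinality counts, as read from the Count row of Table 9.
counts = {
    "A1.Low": 47.595, "A1.Middle": 60.227, "A1.High": 8.178,
    "A2.Low": 47.275, "A2.Middle": 60.547, "A2.High": 8.178,
    "A3.Low": 46.802, "A3.Middle": 61.02, "A3.High": 8.178,
    "A4.Low": 47.262, "A4.Middle": 60.56, "A4.High": 8.178,
    "A5.Low": 47.775, "A5.Middle": 60.047, "A5.High": 8.178,
}

# Keep the fuzzy items that reach the minimum count, with their support values.
l1 = {item: count / n_subsequences
      for item, count in counts.items() if count >= min_count}
```

Only the Low and Middle items survive, since every High item has a count of 8.178, well below 29.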

Further, calculate the support value of each candidate itemset using

$$\text{Support value} = \frac{\text{Count}}{K - w + 1}. \tag{3}$$

(iii) Generation of $C_2$ and $L_2$. For generating $C_2$ (shown in Table 2), join $L_1$ with $L_1$, not joining the items generated from the same order of data points; that is, joining A1.Low with A1.Middle is not allowed, and similarly for all others. While joining, all the properties of the Apriori algorithm are used.
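The restricted join can be sketched as below. Representing a fuzzy item as a (data variable, region) pair and the function name are assumptions of this sketch; the count of 40 candidates matches the number of entries in Table 2.

```python
from itertools import combinations


def join_fuzzy_candidates(l1_items):
    """Pair fuzzy items, skipping pairs that come from the same data variable."""
    return [(a, b) for a, b in combinations(l1_items, 2) if a[0] != b[0]]


# The ten items of L1: Low and Middle for each of A1..A5.
l1_items = [(var, region)
            for var in ("A1", "A2", "A3", "A4", "A5")
            for region in ("Low", "Middle")]
c2 = join_fuzzy_candidates(l1_items)
```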

Next, use the Min($f_1 \cap f_2$) function to find the value of each of the candidate fuzzy sets. Count the scalar cardinality of each of the candidate sets in $C_2$.
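As an illustration of the Min operation and the scalar cardinality count, the following sketch sums the minimum membership degree over subsequences. The function name and the row layout are assumptions; the two sample rows use the Middle degrees of subsequences 3 and 4 as listed in Table 9.

```python
def scalar_cardinality(fuzzy_rows, itemset):
    """Sum, over all subsequences, of the minimum membership among the items."""
    return sum(min(row[item] for item in itemset) for row in fuzzy_rows)


# Membership degrees for two subsequences (rows 3 and 4 of Table 9).
rows = [
    {"A1.Middle": 1.0, "A2.Middle": 0.933},    # subsequence 3 starts 252, 240
    {"A1.Middle": 0.933, "A2.Middle": 0.593},  # subsequence 4 starts 240, 361
]
count = scalar_cardinality(rows, ("A1.Middle", "A2.Middle"))
```

Over the full training set the same sum would run across all 116 subsequences before being compared with the minimum count.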


Table 3: Fuzzy itemsets in $L_2$.

A1.Low ∩ A2.Low | A1.Low ∩ A3.Low | A1.Low ∩ A4.Middle | A1.Middle ∩ A2.Middle | A1.Middle ∩ A3.Middle
A1.Middle ∩ A4.Middle | A1.Middle ∩ A5.Low | A1.Middle ∩ A5.Middle | A2.Low ∩ A3.Low | A2.Low ∩ A4.Low
A2.Middle ∩ A3.Middle | A2.Middle ∩ A4.Middle | A2.Middle ∩ A5.Middle | A3.Low ∩ A4.Low | A3.Low ∩ A5.Low
A3.Middle ∩ A4.Middle | A3.Middle ∩ A5.Middle | A4.Low ∩ A5.Low | A4.Middle ∩ A5.Middle

Table 4: Fuzzy itemsets in $L_3$.

A1.Middle ∩ A2.Middle ∩ A3.Middle | A1.Middle ∩ A2.Middle ∩ A4.Middle | A1.Middle ∩ A2.Middle ∩ A5.Middle
A1.Middle ∩ A3.Middle ∩ A4.Middle | A1.Middle ∩ A3.Middle ∩ A5.Middle | A1.Middle ∩ A4.Middle ∩ A5.Middle
A2.Middle ∩ A3.Middle ∩ A4.Middle | A2.Middle ∩ A3.Middle ∩ A5.Middle | A2.Middle ∩ A4.Middle ∩ A5.Middle
A3.Middle ∩ A4.Middle ∩ A5.Middle

Table 5: Largest fuzzy itemsets generated.

A1.Middle ∩ A2.Middle ∩ A3.Middle | A1.Middle ∩ A2.Middle ∩ A4.Middle | A1.Middle ∩ A2.Middle ∩ A5.Middle
A1.Middle ∩ A3.Middle ∩ A4.Middle | A1.Middle ∩ A3.Middle ∩ A5.Middle | A1.Middle ∩ A4.Middle ∩ A5.Middle

The candidate fuzzy sets with a count not less than the threshold value are kept in $L_2$, and their support values are also calculated. $L_2$ now contains the fuzzy itemsets shown in Table 3.

(iv) Generation of $C_3$ and $L_3$. For generating $C_3$, join $L_2$ with $L_2$, not joining the items generated from the same order of data points. In the case of $C_3$, a join is possible only between those fuzzy sets where at least one data item is common to both. After generation of $C_3$, only those fuzzy itemsets whose count is not less than the threshold value are put in $L_3$ (shown in Table 4).

(v) Generation of $C_4$. Join $L_3$ with $L_3$, not joining the items generated from the same order of data point. A join is possible only for those fuzzy sets where at least two data items are common to both.

After generation of $C_4$, it is found that no element has a count reaching the threshold value; therefore $L_4$ = NULL.

(vi) Removal of Redundant Large Itemsets. Remove the redundant large itemsets from $L_3$ using Step 8 of the algorithm described in Section 3.4.

After applying Step 8 of the algorithm, we are left with the large itemsets shown in Table 5.

(vii) Generate Association Rules. Generate the association rules from these large itemsets using Step 9 of the algorithm described in Section 3.4. All the generated rules are shown in Table 6, where strong rules, whose confidence is not less than the threshold confidence, are marked in bold.

In this experiment we use $\alpha$ = 65%, which means that only those rules whose confidence value is not less than 65% are valid (and are called strong association rules). We found 18 such rules; each rule acts as a knowledge base for the project manager and the developers of the project. For example, the generated rule A1.Middle ∩ A2.Middle → A3.Middle specifies that if the values of the first and second data points lie in the middle range, then there is a high probability that the third data point also has a middle value. All these rules act as precise and compact knowledge for the project manager and analyst.
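The confidence threshold can be applied as in the following sketch. The tuple layout and variable names are assumptions; the six sample rules and their confidence values are the first six entries of Table 6.

```python
ALPHA = 0.65  # minimum confidence used in the experiment

# (antecedent items, consequent items, confidence), first six rows of Table 6.
sample_rules = [
    (("A1.Middle", "A2.Middle"), ("A3.Middle",), 0.741),
    (("A3.Middle",), ("A1.Middle", "A2.Middle"), 0.522),
    (("A1.Middle", "A3.Middle"), ("A2.Middle",), 0.781),
    (("A2.Middle",), ("A1.Middle", "A3.Middle"), 0.526),
    (("A2.Middle", "A3.Middle"), ("A1.Middle",), 0.735),
    (("A1.Middle",), ("A2.Middle", "A3.Middle"), 0.529),
]

# A rule is strong when its confidence is not less than ALPHA.
strong = [rule for rule in sample_rules if rule[2] >= ALPHA]
```

In this sample the three rules with a two-item antecedent pass the threshold, while the three rules with a single-item antecedent do not.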

5.2. Validation of Association Rules Using Training and Remaining Dataset

(i) Validation on Training Dataset. It is found that 65% of the transactions in the Eclipse CDT training set follow these rules. These rules act as compact and concrete knowledge of this data.

(ii) Validation on Remaining Dataset. All generated association rules are tested on the remaining dataset to find the regularity and trend in the number of commits, that is, to find whether the commits are performed at the same rate or not. If the rate is the same, then Eclipse has consistent growth; otherwise there exists a variation in the growth of the considered software. It is found that, in the case of the remaining set, the applicability of these rules decreases. This is verified again by generating the association rules from the remaining set. The rules generated from the remaining dataset indicate that its data lie mostly in the lower range of commits.

The following factors explain this variation in the commit rate:

(a) Most of the values in the remaining dataset are in the low range. This point is verified by finding the largest frequent fuzzy itemsets and then generating the association rules from the remaining dataset by using the fuzzy data mining algorithm for time series data. The largest frequent itemsets found are shown in Table 7.

The generated frequent itemsets consist only of data points in the low range. Hence most of the commit entries in the remaining dataset are probably low. This indicates that the number of commits analysed from 6/1/2012 to 9/23/2013 is lower than the number of commits analysed in the training dataset, and that there is less development activity in the considered software in this period.


Table 6: Calculated confidence value of various association rules. Strong rules (shown in bold in the original) are marked with an asterisk.

Rule | Confidence | Confidence (%)
A1.Middle ∩ A2.Middle → A3.Middle * | 0.741 | 74
A3.Middle → A1.Middle ∩ A2.Middle | 0.522 | 52
A1.Middle ∩ A3.Middle → A2.Middle * | 0.781 | 78
A2.Middle → A1.Middle ∩ A3.Middle | 0.526 | 53
A2.Middle ∩ A3.Middle → A1.Middle * | 0.735 | 74
A1.Middle → A2.Middle ∩ A3.Middle | 0.529 | 53
A1.Middle ∩ A2.Middle → A4.Middle * | 0.692 | 69
A4.Middle → A1.Middle ∩ A2.Middle | 0.491 | 49
A1.Middle ∩ A4.Middle → A2.Middle * | 0.762 | 76
A2.Middle → A1.Middle ∩ A4.Middle | 0.491 | 49
A2.Middle ∩ A4.Middle → A1.Middle * | 0.724 | 72
A1.Middle → A2.Middle ∩ A4.Middle | 0.494 | 49
A1.Middle ∩ A2.Middle → A5.Middle * | 0.695 | 69
A5.Middle → A1.Middle ∩ A2.Middle | 0.497 | 50
A1.Middle ∩ A5.Middle → A2.Middle * | 0.737 | 74
A2.Middle → A1.Middle ∩ A5.Middle | 0.493 | 49
A2.Middle ∩ A5.Middle → A1.Middle * | 0.759 | 76
A1.Middle → A2.Middle ∩ A5.Middle | 0.496 | 50
A1.Middle ∩ A3.Middle → A4.Middle * | 0.742 | 74
A4.Middle → A1.Middle ∩ A3.Middle | 0.499 | 50
A1.Middle ∩ A4.Middle → A3.Middle * | 0.775 | 78
A3.Middle → A1.Middle ∩ A4.Middle | 0.496 | 50
A3.Middle ∩ A4.Middle → A1.Middle * | 0.691 | 69
A1.Middle → A3.Middle ∩ A4.Middle | 0.502 | 50
A1.Middle ∩ A3.Middle → A5.Middle * | 0.751 | 75
A5.Middle → A1.Middle ∩ A3.Middle | 0.510 | 51
A1.Middle ∩ A5.Middle → A3.Middle * | 0.757 | 76
A3.Middle → A1.Middle ∩ A5.Middle | 0.502 | 50
A3.Middle ∩ A5.Middle → A1.Middle * | 0.738 | 74
A1.Middle → A3.Middle ∩ A5.Middle | 0.509 | 51
A1.Middle ∩ A4.Middle → A5.Middle * | 0.792 | 79
A5.Middle → A1.Middle ∩ A4.Middle | 0.515 | 51
A1.Middle ∩ A5.Middle → A4.Middle * | 0.764 | 76
A4.Middle → A1.Middle ∩ A5.Middle | 0.511 | 51
A4.Middle ∩ A5.Middle → A1.Middle * | 0.715 | 71
A1.Middle → A4.Middle ∩ A5.Middle | 0.513 | 51

Table 7: Largest frequent fuzzy itemsets for the remaining dataset.

A1.Low ∩ A2.Low ∩ A3.Low ∩ A4.Low
A1.Low ∩ A2.Low ∩ A3.Low ∩ A5.Low
A1.Low ∩ A2.Low ∩ A4.Low ∩ A5.Low
A1.Low ∩ A3.Low ∩ A4.Low ∩ A5.Low

(b) Factors like the number of active users and the number of files changed (addition, deletion, and modification) are lower.

6. Discussion

The fuzzy data mining algorithm for time series data [11] allows efficient mining of association rules from a large dataset. The generated rules help in finding the regularity and existing trend for OSS projects. We have used the commit activity data of Eclipse CDT. The commits in the repository are directly related to activities such as files or code being changed, deleted, or modified. By analysing the trend in the commit data, we can interpret the development or evolution activity of the considered software. In the above experiment, the original dataset is divided into two subdatasets (the training and remaining datasets).

Table 8: Generated subsequences.

Sp | Subsequence
1 | 9, 25, 252, 240, 361
2 | 25, 252, 240, 361, 298
3 | 252, 240, 361, 298, 118
4 | 240, 361, 298, 118, 294
5 | 361, 298, 118, 294, 219
6 | 298, 118, 294, 219, 193
7 | 118, 294, 219, 193, 359
8 | 294, 219, 193, 359, 96
9 | 219, 193, 359, 96, 151
10 | 193, 359, 96, 151, 141
11 | 359, 96, 151, 141, 198
12 | 96, 151, 141, 198, 296
13 | 151, 141, 198, 296, 183
14 | 141, 198, 296, 183, 160
15 | 198, 296, 183, 160, 117
16 | 296, 183, 160, 117, 139
17 | 183, 160, 117, 139, 179
18 | 160, 117, 139, 179, 255
19 | 117, 139, 179, 255, 351
20 | 139, 179, 255, 351, 320
21 | 179, 255, 351, 320, 440
22 | 255, 351, 320, 440, 219
23 | 351, 320, 440, 219, 260
24 | 320, 440, 219, 260, 254
25 | 440, 219, 260, 254, 227
26 | 219, 260, 254, 227, 363
27 | 260, 254, 227, 363, 185
28 | 254, 227, 363, 185, 200
29 | 227, 363, 185, 200, 254
30 | 363, 185, 200, 254, 175
31 | 185, 200, 254, 175, 253
32 | 200, 254, 175, 253, 221
33 | 254, 175, 253, 221, 281
34 | 175, 253, 221, 281, 263
35 | 253, 221, 281, 263, 87
36 | 221, 281, 263, 87, 83
37 | 281, 263, 87, 83, 42
38 | 263, 87, 83, 42, 80
39 | 87, 83, 42, 80, 44
40 | 83, 42, 80, 44, 99
41 | 42, 80, 44, 99, 71
42 | 80, 44, 99, 71, 83
43 | 44, 99, 71, 83, 150
44 | 99, 71, 83, 150, 120
45 | 71, 83, 150, 120, 60
46 | 83, 150, 120, 60, 132
47 | 150, 120, 60, 132, 127
48 | 120, 60, 132, 127, 151
49 | 60, 132, 127, 151, 109
50 | 132, 127, 151, 109, 141
51 | 127, 151, 109, 141, 141
52 | 151, 109, 141, 141, 135
53 | 109, 141, 141, 135, 242
54 | 141, 141, 135, 242, 320
55 | 141, 135, 242, 320, 380
56 | 135, 242, 320, 380, 438
57 | 242, 320, 380, 438, 355
58 | 320, 380, 438, 355, 141
59 | 380, 438, 355, 141, 107
60 | 438, 355, 141, 107, 114
61 | 355, 141, 107, 114, 153
62 | 141, 107, 114, 153, 212
63 | 107, 114, 153, 212, 133
64 | 114, 153, 212, 133, 212
65 | 153, 212, 133, 212, 273
66 | 212, 133, 212, 273, 326
67 | 133, 212, 273, 326, 455
68 | 212, 273, 326, 455, 378
69 | 273, 326, 455, 378, 186
70 | 326, 455, 378, 186, 317
71 | 455, 378, 186, 317, 139
72 | 378, 186, 317, 139, 155
73 | 186, 317, 139, 155, 257
74 | 317, 139, 155, 257, 208
75 | 139, 155, 257, 208, 162
76 | 155, 257, 208, 162, 229
77 | 257, 208, 162, 229, 205
78 | 208, 162, 229, 205, 203
79 | 162, 229, 205, 203, 234
80 | 229, 205, 203, 234, 214
81 | 205, 203, 234, 214, 144
82 | 203, 234, 214, 144, 185
83 | 234, 214, 144, 185, 153
84 | 214, 144, 185, 153, 206
85 | 144, 185, 153, 206, 315
86 | 185, 153, 206, 315, 204
87 | 153, 206, 315, 204, 103
88 | 206, 315, 204, 103, 336
89 | 315, 204, 103, 336, 278
90 | 204, 103, 336, 278, 329
91 | 103, 336, 278, 329, 370
92 | 336, 278, 329, 370, 419
93 | 278, 329, 370, 419, 230
94 | 329, 370, 419, 230, 224
95 | 370, 419, 230, 224, 217
96 | 419, 230, 224, 217, 188
97 | 230, 224, 217, 188, 176
98 | 224, 217, 188, 176, 129
99 | 217, 188, 176, 129, 91
100 | 188, 176, 129, 91, 188
101 | 176, 129, 91, 188, 169
102 | 129, 91, 188, 169, 256
103 | 91, 188, 169, 256, 245
104 | 188, 169, 256, 245, 237
105 | 169, 256, 245, 237, 74
106 | 256, 245, 237, 74, 228
107 | 245, 237, 74, 228, 179
108 | 237, 74, 228, 179, 212
109 | 74, 228, 179, 212, 128
110 | 228, 179, 212, 128, 154
111 | 179, 212, 128, 154, 150
112 | 212, 128, 154, 150, 192
113 | 128, 154, 150, 192, 148
114 | 154, 150, 192, 148, 171
115 | 150, 192, 148, 171, 181
116 | 192, 148, 171, 181, 163

The association rules generated from the training set allow analysing the regularity and trends in the commits of an OSS project. These rules also help in predicting and analysing the future evolution or development activity of Eclipse CDT. The generated rules are validated on the remaining dataset to find their applicability. It is found that the applicability of these rules on the remaining set decreases. The results of the algorithm indicate that the commit activity of the remaining dataset lies in the low activity range. There may be other factors which also cause this decrease in the number of commits. These may include factors like the number of active users and the number of files changed (addition, deletion, and modification), which are lower.

7. Threats to Validity

This section discusses the threats to validity of the study.

Construct validity threats concern the relationship between theory and observation. These threats mainly arise from our assumption that all the commits are posted in the revision control tool git [4]. Any changes performed in the source code but not logged through the tool may not have become part of the study.

Internal validity concerns the selection of subject systems and the analysis methods. This study uses a month as the unit of measure for tracking the types of change activities. In the future we would like to use a more natural and insightful partition based on major/minor versions of the OSS project.


Table9Transfo

rmationof

dataele

mentsinto

fuzzyset

Cou

nt47595

60227

8178

47275

60547

8178

46802

6102

8178

47262

6056

8178

47775

60047

8178

SpA

1Lo

wA

1Middle

A1High

A2Lo

wA

2Middle

A2High

A3Lo

wA

3Middle

A3High

A4Lo

wA

4Middle

A4High

A5Lo

wA

5Middle

A5High

11

00

10

00

10

0067

0933

00

0593

040

72

10

00

10

0067

0933

00

0593

040

70

10

30

10

0067

0933

00

0593

040

70

10

088

012

04

0067

0933

00

0593

040

70

10

088

012

00

10

50

0593

040

70

10

088

012

00

10

0207

0793

06

01

0088

012

00

10

0207

0793

0038

062

07

088

012

00

10

0207

0793

0038

062

00

0607

0393

80

10

0207

0793

0038

062

00

0607

0393

10

09

0207

0793

0038

062

00

0607

0393

10

0066

034

010

038

062

00

0607

0393

10

0066

034

00727

0273

011

00607

0393

10

0066

034

00727

0273

00347

0653

012

10

0066

034

00727

0273

00347

0653

00

10

13066

034

00727

0273

00347

0653

00

10

044

70553

014

0727

0273

00347

0653

00

10

044

70553

006

04

015

0347

0653

00

10

044

70553

006

04

00887

0113

016

01

0044

70553

006

04

00887

0113

0074

026

017

044

70553

006

04

00887

0113

0074

026

00473

0527

018

06

04

00887

0113

0074

026

00473

0527

00

10

190887

0113

0074

026

00473

0527

00

10

0066

034

20074

026

00473

0527

00

10

0066

034

00867

0133

210473

0527

00

10

0066

034

00867

0133

00067

0933

220

10

0066

034

00867

0133

00067

0933

0207

0793

023

0066

034

00867

0133

00067

0933

0207

0793

00

10

240

0867

0133

00067

0933

0207

0793

00

10

01

025

00067

0933

0207

0793

00

10

01

00153

0847

026

0207

0793

00

10

01

00153

0847

00

058

042

270

10

01

00153

0847

00

058

042

0433

0567

028

01

00153

0847

00

058

042

0433

0567

00333

0667

029

0153

0847

00

058

042

0433

0567

00333

0667

00

10

300

058

042

0433

0567

00333

0667

00

10

05

05

031

0433

0567

00333

0667

00

10

05

05

00

10

320333

0667

00

10

05

05

00

10

0193

0807

033

01

005

05

00

10

0193

0807

00

10

3405

05

00

10

0193

0807

00

10

01

035

01

00193

0807

00

10

01

01

00

360193

0807

00

10

01

01

00

10

037

01

00

10

10

01

00

10

038

01

01

00

10

01

00

10

039

10

01

00

10

01

00

10

040

10

01

00

10

01

00

10

041

10

01

00

10

01

00

10

0

10 Advances in Fuzzy Systems

Table9Con

tinued

Cou

nt47595

60227

8178

47275

60547

8178

46802

6102

8178

47262

6056

8178

47775

60047

8178

SpA

1Lo

wA

1Middle

A1High

A2Lo

wA

2Middle

A2High

A3Lo

wA

3Middle

A3High

A4Lo

wA

4Middle

A4High

A5Lo

wA

5Middle

A5High

421

00

10

01

00

10

01

00

431

00

10

01

00

10

00667

0333

044

10

01

00

10

0066

70333

00867

0133

045

10

01

00

Table 9: Continued.

Sp | A1Low | A1Middle | A1High | A2Low | A2Middle | A2High | A3Low | A3Middle | A3High | A4Low | A4Middle | A4High | A5Low | A5Middle | A5High
Count | 47.595 | 60.227 | 8.178 | 47.275 | 60.547 | 8.178 | 46.802 | 61.02 | 8.178 | 47.262 | 60.56 | 8.178 | 47.775 | 60.047 | 8.178

[Rows for the individual subsequences (up to Sp = 116) give the Low, Middle, and High membership values of A1 to A5.]


for analysing the change activity of OSS projects. Subject systems were selected from public repositories, but selection is biased towards projects with valid git repositories.

External validity concerns the generalization of the findings. In the future, we would like to provide more generalized results by considering a higher number of OSS projects.

Reliability validity concerns the possibility of replication of the study. The subject systems are available in the public domain. We have attempted to put all the necessary details of the experiment process in the paper.

8. Conclusion and Future Work

The commit activity data available in the development repository of open source software can be used to analyse the evolution of OSS projects, as each commit is directly related to a development activity such as code deletion, addition, modification, comments, and file addition. In this study, the Eclipse CDT commit data is analysed to find the regularity and trend in the commit data. The fuzzy data mining algorithm for time series data is used to generate association rules from the dataset. The dataset is divided into two subsets (training and remaining dataset) to evaluate the pattern of evolution of Eclipse CDT.

After applying and validating the association rules generated from the training dataset, it is found that the rates at which commits are performed in the training dataset and in the remaining dataset are different. This is verified again by generating association rules from the remaining set. The rules generated from the remaining dataset indicate that its data lies mostly in the lower range of commits. This validates the applicability of the rules generated from the training dataset.

These association rules indicate that the overall commits in Eclipse CDT lie in the middle range, except for the variation found near the end, where there is a high probability that commits lie in the lower range. The continuous availability of commits in the Eclipse CDT repository shows that the development of Eclipse CDT is active, with most of the commits per month lying in the middle range and, towards the end, near the lower range. In the future, we want to combine a prediction algorithm with the fuzzy data mining algorithm for time series data to predict the number of commits to be performed in a particular month, although various other factors on which the number of commits depends also need to be considered.

Appendix

See Tables 8 and 9.

Competing Interests

The authors declare that they have no competing interests.

References

[1] M. W. Godfrey and Q. Tu, “Evolution in open source software: a case study,” in Proceedings of the IEEE International Conference on Software Maintenance (ICSM ’00), pp. 131–142, October 2000.

[2] H. Zhang and S. Kim, “Monitoring software quality evolution for defects,” IEEE Software, vol. 27, no. 4, pp. 58–64, 2010.

[3] Y. Fang and D. Neufeld, “Understanding sustained participation in open source software projects,” Journal of Management Information Systems, vol. 25, no. 4, pp. 9–50, 2008.

[4] “GIT,” July 2015, http://git-scm.com/.

[5] R. Sen, S. S. Singh, and S. Borle, “Open source software success: measures and analysis,” Decision Support Systems, vol. 52, no. 2, pp. 364–372, 2012.

[6] R. McCleary, R. A. Hay, E. E. Meidinger, and D. McDowall, Applied Time Series Analysis for the Social Sciences, Sage, Beverly Hills, Calif, USA, 1980.

[7] R. Grewal, G. L. Lilien, and G. Mallapragada, “Location, location, location: how network embeddedness affects project success in open source systems,” Management Science, vol. 52, no. 7, pp. 1043–1056, 2006.

[8] I. Herraiz, J. M. Gonzalez-Barahona, G. Robles, and D. M. German, “On the prediction of the evolution of libre software projects,” in Proceedings of the 23rd International Conference on Software Maintenance (ICSM ’07), pp. 405–414, Paris, France, October 2007.

[9] C. F. Kemerer and S. Slaughter, “An empirical approach to studying software evolution,” IEEE Transactions on Software Engineering, vol. 25, no. 4, pp. 493–509, 1999.

[10] A. Mockus and L. G. Votta, “Identifying reasons for software changes using historic databases,” in Proceedings of the International Conference on Software Maintenance (ICSM ’00), pp. 120–130, IEEE, San Jose, Calif, USA, 2000.

[11] C.-H. Chen, T.-P. Hong, and V. S. Tseng, “Fuzzy data mining for time-series data,” Applied Soft Computing Journal, vol. 12, no. 1, pp. 536–542, 2012.

[12] C. C. H. Yuen, “An empirical approach to the study of errors in large software under maintenance,” in Proceedings of the IEEE International Conference on Software Maintenance (ICSM ’85), pp. 96–105, 1985.

[13] C. C. H. Yuen, “A statistical rationale for evolution dynamics concepts,” in Proceedings of the IEEE International Conference on Software Maintenance (ICSM ’87), pp. 156–164, September 1987.

[14] C. C. H. Yuen, “On analytic maintenance process data at the global and the detailed levels: a case study,” in Proceedings of the International Conference on Software Maintenance (ICSM ’88), pp. 248–255, IEEE, Scottsdale, Ariz, USA, 1988.

[15] M. Goulao, N. Fonte, M. Wermelinger, and F. B. Abreu, “Software evolution prediction using seasonal time analysis: a comparative study,” in Proceedings of the 16th European Conference on Software Maintenance and Reengineering (CSMR ’12), pp. 213–222, IEEE, Szeged, Hungary, March 2012.

[16] B. Kenmei, G. Antoniol, and M. Di Penta, “Trend analysis and issue prediction in large-scale open source systems,” in Proceedings of the 12th IEEE European Conference on Software Maintenance and Reengineering (CSMR ’08), pp. 73–82, Athens, Greece, April 2008.

[17] F. Caprio, G. Casazza, U. Penta, M. D. Penta, and U. Villano, “Measuring and predicting the Linux kernel evolution,” in Proceedings of the International Workshop of Empirical Studies on Software Maintenance, Florence, Italy, November 2001.

[18] E. Fuentetaja and D. J. Bagert, “Software evolution from a time-series perspective,” in Proceedings of the IEEE International Conference on Software Maintenance, pp. 226–229, October 2002.

[19] M. Klas, F. Elberzhager, J. Munch, K. Hartjes, and O. Von Graevemeyer, “Transparent combination of expert and measurement data for defect prediction: an industrial case study,” in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE ’10), pp. 119–128, Cape Town, South Africa, May 2010.

[20] U. Raja, D. P. Hale, and J. E. Hale, “Modeling software evolution defects: a time series approach,” Journal of Software Maintenance and Evolution: Research and Practice, vol. 21, no. 1, pp. 49–71, 2009.

[21] G. Antoniol, G. Casazza, M. D. Penta, and E. Merlo, “Modeling clones evolution through time series,” in Proceedings of the IEEE International Conference on Software Maintenance (ICSM ’01), pp. 273–280, IEEE Computer Society, Florence, Italy, November 2001.

[22] L. Yu, “Indirectly predicting the maintenance effort of open-source software,” Journal of Software Maintenance and Evolution, vol. 18, no. 5, pp. 311–332, 2006.

[23] A. Amin, L. Grunske, and A. Colman, “An approach to software reliability prediction based on time series modeling,” Journal of Systems and Software, vol. 86, no. 7, pp. 1923–1932, 2013.

[24] Forecasting Model, http://people.duke.edu/~rnau/411fcst.htm.

[25] T.-P. Hong, C.-S. Kuo, and C.-C. Chi, “A fuzzy data mining algorithm for quantitative values,” in Proceedings of the 3rd International Conference on Knowledge-Based Intelligent Information Engineering Systems (KES ’99), pp. 480–483, September 1999.

[26] H. Suresh and K. Raimond, “Mining association rules from time series data using hybrid approaches,” Proceedings of International Journal of Computational Engineering Research, vol. 3, no. 3, pp. 181–189, 2013.

[27] “GIT HUB,” July 2015, https://github.com/.

[28] “Eclipse,” “Where did Eclipse come from?” Eclipse Wiki.

[29] “Eclipse,” 2015, https://eclipse.org/cdt/.

[30] “Eclipse,” July 2015, http://www.eclipse.org/cdt/doc/overview/#/3.

[31] L. A. Zadeh, “Fuzzy sets,” Information and Control, vol. 8, no. 3, pp. 338–353, 1965.

[32] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, Academic Press, 2001.

[33] R. Agrawal, “Mining association rules between sets of items in large databases,” in Proceedings of the ACM SIGMOD Conference, pp. 207–216, Washington, DC, USA, May 1993.


Table 1: Descriptive statistics of the open source software projects.

Software | Origin date | Data collection date | Number of years | Number of authors | Total commits analysed
Eclipse CDT | 6/27/2002 | 9/23/2013 | 11 | 135 | 26246

Figure 1: Eclipse CDT commit activity (x-axis: year/months, 2002 to 2012; y-axis: number of commits, 0 to 500).

be useful when there is uncertainty in the data, but only after applying smoothing to reduce noise [24].

Hong et al. [25] introduced the fuzzy data mining algorithm for quantitative values. The algorithm extracts useful knowledge from a transactional database having quantitative values. It combines the fuzzy set concept with the Apriori algorithm.

Chen et al. [11] extended the work of Hong et al. and analysed the fuzzy data mining algorithm on time series data. They proposed an approach in which the concepts of fuzzy sets are used along with the Apriori data mining algorithm to generate linguistic association rules. They use the fuzzy membership function to convert the time series data to fuzzy sets and then apply the Apriori algorithm to generate association rules accordingly. They specified the validation of their algorithm and the finding of applications for it as future work. In continuation of their work, we use the algorithm to analyse the commit activity of an open source software project in order to find the existing regularity and trend in that activity.

Suresh and Raimond [26] extended the work of Chen et al. [11]. They proposed a new algorithm, called the extended fuzzy frequent pattern algorithm, for analysing time series data. The association rules are generated without generating the candidate sets.

This paper uses a fuzzy data mining approach on time series [11] to analyse the commit activity in open source software projects. The research questions that this study aims to answer are (1) analysing the commit activity of an OSS project to find the trend and regularity and (2) validating the fuzzy data mining algorithm on OSS project data.

3. Methodology

The objective of this empirical study is to investigate the suitability of the fuzzy data mining method to analyse the number of commits per month as a software system evolves, in order to find the regularity and trend. This section describes the data collection process, the basic concepts of fuzzy time series, and the fuzzy data mining method.

3.1. Data Collection. The development repository of the open source software project (Eclipse CDT) is obtained from GitHub [27]. The repository is downloaded by cloning the original repository onto the local machine using Git Bash [4]. A script is written in Java to fetch the number of commits per month over the observation period. The descriptive statistics of the development repository are shown in Table 1.
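The counting step described above can be sketched as follows. This is a minimal illustration in Python, not the authors' Java script: the function names `commits_per_month` and `read_commit_dates` are hypothetical, and the sketch assumes a local clone whose history is read with `git log --date=short`.

```python
from collections import Counter
import subprocess


def commits_per_month(date_lines):
    """Count commits per calendar month from 'YYYY-MM-DD' date strings."""
    counts = Counter(line.strip()[:7] for line in date_lines if line.strip())
    return dict(sorted(counts.items()))


def read_commit_dates(repo_path):
    """Return one author date (YYYY-MM-DD) per commit of a local clone.

    Assumes git is installed and repo_path points at a cloned repository.
    """
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=format:%ad", "--date=short"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

Months without any commit do not appear in the result, so a caller building a continuous monthly series would need to fill those months with zero.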

Eclipse [28] is an integrated development environment. Eclipse has a base workspace and an extendable plug-in system for customizing the development environment. It is used to develop applications in different languages such as Java, C/C++, COBOL, and PHP by using the available plug-ins. Two variations, Eclipse SDK and Eclipse CDT, are well known for developing applications. Eclipse SDK is compatible with Java and is used by Java developers for building projects on the Eclipse platform. Eclipse CDT provides C/C++ development tooling [29] and allows developing applications in C/C++ using Eclipse. Eclipse CDT provides various features [30] such as a full-featured editor, debugging, refactoring, a parser, and indexes. In this study we consider Eclipse CDT only. The number of commits for each month from 6/27/2002 to 9/23/2013 is arranged month-wise to form a time series (for 136 months), shown in Figure 1.

3.2. Basics of Fuzzy Set Theory and Fuzzy Time Series. The concept of the fuzzy set was introduced by Zadeh [31] in 1965 as an extension of classical set theory. A fuzzy set is characterized by a membership function giving the degree of membership [31]. The membership function can take various forms, such as triangular, $L$-function, $R$-function, and trapezoidal function, depending upon the application and requirement. The present study uses both $L$ and $R$ membership functions to define the values of the fuzzy variable.

3.3. Apriori Algorithm. The Apriori algorithm [32, 33] is one of the classical algorithms, proposed by R. Srikant and R. Agrawal in 1994, for finding frequent patterns for generating association rules. Apriori employs an iterative approach known as level-wise search, where $k$-itemsets are used to explore $(k + 1)$-itemsets.

The Apriori algorithm is executed in two steps. Firstly, it retrieves all the frequent itemsets whose support is not smaller than the minimum support (min_sup). This first step further consists of a join and a pruning action. In joining, the candidate set $C_k$ is produced by joining $L_{k-1}$ with itself. In pruning, the candidate sets are pruned by applying the Apriori property, that is, all the nonempty subsets of a frequent itemset must also be frequent.

The pseudocode for generation of frequent itemsets is as follows:

    C_k: candidate itemsets of size k
    L_k: frequent itemsets of size k
    L_1 = {frequent 1-itemsets}
    For (k = 1; L_k != ∅; k++)
        C_{k+1} = join L_k with L_k to generate C_{k+1}
        L_{k+1} = candidates in C_{k+1} with support greater than or equal to min_sup
    End
    Return the union of all L_k
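The level-wise search above can be sketched in Python as follows. This is an illustrative implementation of the classical (crisp) Apriori frequent-itemset step under simple assumptions: transactions are plain sets of hashable items, support is an absolute count, and the function name `apriori_frequent_itemsets` is hypothetical.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support_count):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if support(c) >= min_support_count}

    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        # Join step: merge pairs of frequent k-itemsets into (k + 1)-itemsets.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k + 1}
        # Prune step (Apriori property): every k-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k))}
        current = {c for c in candidates if support(c) >= min_support_count}
        k += 1
    return frequent
```

The sketch recomputes supports by scanning all transactions, which is adequate for small inputs but not tuned for large databases.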

Next, it uses the frequent itemsets to generate the strong association rules satisfying the minimum confidence (min_conf) threshold. The pseudocode for generation of strong association rules is as follows:

    Input: frequent itemsets L; minimum confidence threshold min_conf
    Output: strong association rules R
    R = ∅
    for each frequent itemset I in L
        for each non-empty proper subset S of I
            conf(S → I − S) = support_count(I) / support_count(S)
            if conf >= min_conf
                generate strong association rule r = "S → (I − S)"
                R = R ∪ {r}
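A short Python sketch of this rule-generation step is given below. It assumes the frequent itemsets are supplied as a dictionary from `frozenset` to support count that also contains every subset of each itemset (which the Apriori property guarantees); the function name `strong_rules` is hypothetical.

```python
from itertools import combinations


def strong_rules(frequent, min_conf):
    """Return (antecedent, consequent, confidence) for every strong rule."""
    rules = []
    for itemset, count in frequent.items():
        # Enumerate the non-empty proper subsets S of the itemset I.
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                confidence = count / frequent[antecedent]
                if confidence >= min_conf:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules
```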

3.4. Fuzzy Data Mining Algorithm for Time Series Data. Chen et al. [11] extended the work of Hong et al. [25] and proposed fuzzy data mining for time series data. The time series data of $K$ points are entered as input along with the predefined minimum support $\lambda$, minimum confidence $\alpha$, and window size $w$.

The input data is first converted to generate $(K - w + 1)$ subsequences; each subsequence has $w$ elements. The fuzzy membership function is used to convert each data item into the equivalent fuzzy set. The Apriori algorithm is used to mine frequent fuzzy sets. Moreover, a data reduction method is used to remove the redundant data items.
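The conversion of a series of $K$ points into $(K - w + 1)$ subsequences can be illustrated with a small Python sketch; the function name `sliding_subsequences` is a hypothetical helper, not part of the original implementation.

```python
def sliding_subsequences(series, w):
    """Return the K - w + 1 windows of length w over a series of K points."""
    if w < 1 or w > len(series):
        raise ValueError("window size must be between 1 and len(series)")
    return [list(series[i:i + w]) for i in range(len(series) - w + 1)]
```

For instance, a series of 120 monthly values with a window of 5 yields 116 subsequences, and consecutive subsequences overlap in 4 elements.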

The association rules are generated in the same way as in the Apriori algorithm. The stepwise process of the fuzzy Apriori algorithm is given below.

Input:
Time series with $K$ data points
Membership function values $h$
Minimum support $\lambda$
Minimum confidence $\alpha$
Sliding window size $w$
Fuzzy set $f_p$

Output: Set of fuzzy association rules

Step 1. Convert the time series data into $(K - w + 1)$ subsequences, where each subsequence has a maximum of $w$ elements. If we assume $w = 5$, then each subsequence has a maximum of 5 elements. The elements of a subsequence are referred to as data variables and denoted A1, A2, A3, A4, and A5, respectively.

Step 2. Apply the fuzzy membership function on the elements of the time series to generate the fuzzy set ($f_p$).

Step 3. Based on the membership function and its user-defined levels (suppose low, middle, and high), each data variable, after conversion to a fuzzy item, lies in different user-defined levels (such as A1Low, A2Middle, and A3High, referred to as fuzzy items). Calculate the scalar cardinality count of each fuzzy item over the subsequences.

Step 4. Compare the total scalar cardinality count of each fuzzy item with the minimum support value. The fuzzy items with a count greater than or equal to $\lambda$ are kept in $L_1$.

The support value of a fuzzy item is generated as

$$\text{Support value} = \frac{\text{Count}}{K - w + 1}. \tag{1}$$

Step 5. If $L_1$ = NULL, then exit; else do the following steps for $r = 1$ to $K$.

Step 6. Join $L_r$ with $L_r$ to generate the candidate $(r + 1)$ fuzzy itemsets (i.e., $C_{r+1}$) (similar to the Apriori algorithm, except that items generated from the same order of data point are not joined, and a join is possible only if $(r - 1)$ data items in both sets are the same).

(i) Calculate the fuzzy value of each candidate fuzzy itemset by using fuzzy set theory, that is, $\min(f_1 \wedge f_2 \wedge \dots \wedge f_k)$.

(ii) Count the scalar cardinality of each fuzzy candidate itemset.

(iii) If the count is more than or equal to $\lambda$, then put it in $L_{r+1}$ and calculate its support value as

$$\text{Support value} = \frac{\text{Count}}{K - w + 1}. \tag{2}$$

Step 7. If $L_{r+1}$ = NULL, then exit; otherwise go to Step 6 again.

Step 8. Remove redundant large itemsets (i.e., by shifting each large itemset $(I_1, I_2, I_3, \dots, I_q)$ into $(I'_1, I'_2, I'_3, \dots, I'_q)$ such that fuzzy region $R_{11}$ becomes $R'_{11}$ when shifted).


Table 2: Fuzzy candidate item sets in $C_2$.

A1Low ∩ A2Low | A1Low ∩ A2Middle | A1Low ∩ A3Low | A1Low ∩ A3Middle | A1Low ∩ A4Low
A1Low ∩ A4Middle | A1Low ∩ A5Low | A1Low ∩ A5Middle | A1Middle ∩ A2Low | A1Middle ∩ A2Middle
A1Middle ∩ A3Low | A1Middle ∩ A3Middle | A1Middle ∩ A4Low | A1Middle ∩ A4Middle | A1Middle ∩ A5Low
A1Middle ∩ A5Middle | A2Low ∩ A3Low | A2Low ∩ A3Middle | A2Low ∩ A4Low | A2Low ∩ A4Middle
A2Low ∩ A5Low | A2Low ∩ A5Middle | A2Middle ∩ A3Low | A2Middle ∩ A3Middle | A2Middle ∩ A4Low
A2Middle ∩ A4Middle | A2Middle ∩ A5Low | A2Middle ∩ A5Middle | A3Low ∩ A4Low | A3Low ∩ A4Middle
A3Low ∩ A5Low | A3Low ∩ A5Middle | A3Middle ∩ A4Low | A3Middle ∩ A4Middle | A3Middle ∩ A5Low
A3Middle ∩ A5Middle | A4Low ∩ A5Low | A4Low ∩ A5Middle | A4Middle ∩ A5Low | A4Middle ∩ A5Middle

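Figure 2: Membership function (regions Low, Middle, and High; membership degree from 0 to 1 over the number of commits, with breakpoints at 0, 100, 250, 300, and 450).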

Step 9. Generate the association rules using the Apriori rule generation method and calculate the confidence of each rule. The only variation is that, in place of normal data itemsets, we have fuzzy itemsets, so the concept of intersection is used in the rule, not union. If the confidence value is not less than the minimum confidence $\alpha$, then keep the rule; otherwise reject it.
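Steps 3 to 7 can be sketched in Python as follows. This is an illustrative reading of the steps, not the authors' Java implementation. It assumes the subsequences are already fuzzified into dictionaries mapping a fuzzy item label (such as "A1Low") to its membership value, that labels start with a two-character variable name, and that the minimum support is given as a ratio of the number of subsequences. The names `scalar_cardinality`, `variable_of`, and `mine_fuzzy_itemsets` are hypothetical, and no separate pruning of candidates is performed.

```python
from itertools import combinations


def scalar_cardinality(fuzzy_rows, itemset):
    """Sum, over all subsequences, of the minimum membership among the items."""
    return sum(min(row.get(item, 0.0) for item in itemset) for row in fuzzy_rows)


def variable_of(item):
    """'A1Low' -> 'A1' (assumes a two-character data variable prefix)."""
    return item[:2]


def mine_fuzzy_itemsets(fuzzy_rows, min_support_ratio):
    """Return {itemset tuple: scalar cardinality count} for all large itemsets."""
    threshold = min_support_ratio * len(fuzzy_rows)
    items = sorted({item for row in fuzzy_rows for item in row})
    # Step 4: L1 holds the fuzzy items whose count reaches the threshold.
    level = [(i,) for i in items
             if scalar_cardinality(fuzzy_rows, (i,)) >= threshold]
    large = {}
    while level:
        for itemset in level:
            large[itemset] = scalar_cardinality(fuzzy_rows, itemset)
        # Step 6: join L_r with itself, never pairing two fuzzy items that
        # come from the same data variable (e.g. A1Low with A1Middle).
        size = len(level[0]) + 1
        candidates = set()
        for a, b in combinations(level, 2):
            merged = tuple(sorted(set(a) | set(b)))
            variables = {variable_of(i) for i in merged}
            if len(merged) == size and len(variables) == size:
                candidates.add(merged)
        # Step 6 (iii) and Step 7: keep candidates reaching the threshold.
        level = sorted(c for c in candidates
                       if scalar_cardinality(fuzzy_rows, c) >= threshold)
    return large
```

The support value of an itemset, as in equations (1) and (2), is its returned count divided by the number of subsequences.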

4. Experimental Setup

The experiment is performed on an x86 Family 6 Model 15 Stepping 6 GenuineIntel ~2131 MHz processor with 1 GB RAM, the Microsoft Windows XP Professional operating system (version 5.1.2600, Service Pack 3, Build 2600), and a 240 GB hard disk. The fuzzy data mining algorithm for time series data is implemented using Java.

5. Results and Analysis

The dataset is divided into two subsets: the training dataset (120 months) and the remaining dataset (16 months). The complete process of generating the association rules using the fuzzy data mining algorithm is performed in two steps:

(A) The association rules are generated on the training dataset using the fuzzy data mining algorithm for time series data.

(B) The validation of the generated association rules is done using the training and remaining datasets.

5.1. Generation of Association Rules from Training Dataset. The fuzzy data mining algorithm for time series [11] is applied on the training dataset to generate association rules. We assumed $w = 5$, $\lambda = 25\%$ (25/100 × number of subsequences), and $\alpha = 65\%$; the membership function and its values are shown in Figure 2. These assumptions are made by referring to and understanding the concepts of the fuzzy data mining algorithm given in [11]. The commits are divided according to the level of activity. Commits in the range of 0–100, 250–300, and 450 onwards indicate the low, middle (average), and high level of activity, respectively. The level of commit activity indicates the amount of work done in a commit.
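The membership function of Figure 2 can be sketched in Python as below. The breakpoints 100, 250, 300, and 450 come from the text; the linear ramps between them are an assumption read off the shape of the figure, and the function name `fuzzify` is hypothetical.

```python
def fuzzify(commits):
    """Return (low, middle, high) membership degrees for a monthly commit count.

    Assumes linear transitions between the breakpoints 100-250 and 300-450.
    """
    x = float(commits)
    if x <= 100:
        return (1.0, 0.0, 0.0)
    if x < 250:
        middle = (x - 100) / 150
        return (1.0 - middle, middle, 0.0)
    if x <= 300:
        return (0.0, 1.0, 0.0)
    if x < 450:
        high = (x - 300) / 150
        return (0.0, 1.0 - high, high)
    return (0.0, 0.0, 1.0)
```

Under this assumption, a month with 175 commits belongs half to the Low region and half to the Middle region.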

(i) Generation of Subsequences. We have $K = 120$ (the number of months) and $w = 5$ (the window size). The number of subsequences generated is $(K - w + 1)$. Therefore, the number of subsequences $= (K - w + 1) = (120 - 5 + 1) = 116$.

All generated subsequences for $K = 120$ are shown in Table 8 of the Appendix.

(ii) Transformation of Data to Fuzzy Sets. In this step we transform the commit activity data into fuzzy sets (using the membership function) and count the scalar cardinality of each data variable (shown in Table 9 of the Appendix).

All these data variables are considered as candidate itemsets ($C_1$). For generation of $L_1$, data variables having a count of more than $\lambda = 25\%$ (25/100 × 116) = 29 are considered. It is found that $L_1$ contains A1Low, A1Middle, A2Low, A2Middle, A3Low, A3Middle, A4Low, A4Middle, A5Low, and A5Middle.

Further, calculate the support value of each candidate itemset using

$$\text{Support value} = \frac{\text{Count}}{K - w + 1}. \tag{3}$$

(iii) Generation of $C_2$ and $L_2$. For generating $C_2$ (shown in Table 2), join $L_1$ with $L_1$, not joining the items generated from the same order of data points; that is, a join of A1Low with A1Middle is not allowed, and similarly for all others. All the properties of the Apriori algorithm are used while joining.

Next, use the $\min(f_1 \cap f_2)$ function to find the value of each of the candidate fuzzy sets. Count the scalar cardinality of each of the candidate sets in $C_2$.


Table 3: Fuzzy item sets in $L_2$.

A1Low ∩ A2Low | A1Low ∩ A3Low | A1Low ∩ A4Middle | A1Middle ∩ A2Middle | A1Middle ∩ A3Middle
A1Middle ∩ A4Middle | A1Middle ∩ A5Low | A1Middle ∩ A5Middle | A2Low ∩ A3Low | A2Low ∩ A4Low
A2Middle ∩ A3Middle | A2Middle ∩ A4Middle | A2Middle ∩ A5Middle | A3Low ∩ A4Low | A3Low ∩ A5Low
A3Middle ∩ A4Middle | A3Middle ∩ A5Middle | A4Low ∩ A5Low | A4Middle ∩ A5Middle

Table 4: Fuzzy item sets in $L_3$.

A1Middle ∩ A2Middle ∩ A3Middle | A1Middle ∩ A2Middle ∩ A4Middle | A1Middle ∩ A2Middle ∩ A5Middle
A1Middle ∩ A3Middle ∩ A4Middle | A1Middle ∩ A3Middle ∩ A5Middle | A1Middle ∩ A4Middle ∩ A5Middle
A2Middle ∩ A3Middle ∩ A4Middle | A2Middle ∩ A3Middle ∩ A5Middle | A2Middle ∩ A4Middle ∩ A5Middle
A3Middle ∩ A4Middle ∩ A5Middle

Table 5: Largest fuzzy item sets generated.

A1Middle ∩ A2Middle ∩ A3Middle | A1Middle ∩ A2Middle ∩ A4Middle | A1Middle ∩ A2Middle ∩ A5Middle
A1Middle ∩ A3Middle ∩ A4Middle | A1Middle ∩ A3Middle ∩ A5Middle | A1Middle ∩ A4Middle ∩ A5Middle

The candidate fuzzy sets with a value not less than the threshold value are kept in $L_2$, and their support values are also found. $L_2$ now contains the fuzzy itemsets shown in Table 3.

(iv) Generation of $C_3$ and $L_3$. For generating $C_3$, join $L_2$ with $L_2$, not joining the items generated from the same order of data points. In the case of $C_3$, a join is possible only between those fuzzy sets where at least one data item is common to both. After generation of $C_3$, only those fuzzy itemsets whose count is not less than the threshold value are put in $L_3$ (shown in Table 4).

(v) Generation of $C_4$. Join $L_3$ with $L_3$, not joining the items generated from the same order of data point. A join is possible only for those fuzzy sets where at least two data items are common to both.

After generation of $C_4$, it is found that no element has a count of more than the threshold value; therefore $L_4$ = Null.

(vi) Removal of Redundant Large Itemsets. Remove the redundant large itemsets from $L_3$ using Step 8 of the algorithm described in Section 3.4.

After applying Step 8 of the algorithm, we are left with the large itemsets shown in Table 5.
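One way to read this redundancy removal, consistent with the reduction from Table 4 to Table 5, is to shift every large itemset so that its earliest data variable becomes A1 and to keep a single copy of each shifted itemset. The Python sketch below encodes that interpretation; it is an assumption about Step 8, not a transcription of the authors' code, and the names `shift_to_first` and `remove_redundant` are hypothetical.

```python
import re


def shift_to_first(itemset):
    """Shift variable indices so that the earliest data variable becomes A1."""
    parsed = [re.fullmatch(r"A(\d+)([A-Za-z]+)", item) for item in itemset]
    offset = min(int(m.group(1)) for m in parsed) - 1
    return tuple(sorted(f"A{int(m.group(1)) - offset}{m.group(2)}"
                        for m in parsed))


def remove_redundant(itemsets):
    """Keep one representative per shifted itemset, preserving input order."""
    seen, kept = set(), []
    for itemset in itemsets:
        canonical = shift_to_first(itemset)
        if canonical not in seen:
            seen.add(canonical)
            kept.append(canonical)
    return kept
```

Applied to the ten itemsets of Table 4, this reading leaves the six itemsets of Table 5; for example, A2Middle ∩ A3Middle ∩ A4Middle shifts to A1Middle ∩ A2Middle ∩ A3Middle, which is already present.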

(vii) Generate Association Rules. Generate the association rules from these large itemsets using Step 9 of the algorithm described in Section 3.4. All the generated rules are shown in Table 6, where the strong rules, whose confidence is not less than the threshold confidence, are highlighted.

In this experiment we use $\alpha = 65\%$; that is, only those rules whose confidence value is not less than 65% are valid (and are called strong association rules). We found 18 such rules in this case; each rule acts as a knowledge base for the project manager and developers of the project. For example, the generated rule A1Middle ∩ A2Middle → A3Middle specifies that if the values of the first and second data points lie in the middle range, then there is a high probability that the third data point also has a middle value. All these rules act as precise and compact knowledge for the project manager and analyst.
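The confidence of a fuzzy rule can be illustrated with a short Python sketch. It assumes the usual fuzzy Apriori definition, namely the scalar cardinality of the whole itemset divided by that of the antecedent, with the minimum used for intersection; the function names and the toy membership values in the usage note are hypothetical.

```python
def fuzzy_count(fuzzy_rows, items):
    """Scalar cardinality: sum of the minimum membership among the items."""
    return sum(min(row.get(i, 0.0) for i in items) for row in fuzzy_rows)


def rule_confidence(fuzzy_rows, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent over fuzzified rows."""
    denominator = fuzzy_count(fuzzy_rows, antecedent)
    if denominator == 0:
        return 0.0
    whole = tuple(antecedent) + tuple(consequent)
    return fuzzy_count(fuzzy_rows, whole) / denominator
```

With two toy subsequences in which the antecedent A1Middle ∩ A2Middle has counts 1.0 and 0.5 and the full itemset has counts 0.5 and 0.5, the confidence of A1Middle ∩ A2Middle → A3Middle is 1.0 / 1.5, which would pass a 65% threshold.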

5.2. Validation of Association Rules Using Training and Remaining Dataset

(i) Validation on Training Dataset. It is found that 65% of the transactions in the Eclipse CDT training set follow these rules. These rules act as compact and concrete knowledge of this data.

(ii) Validation on Remaining Dataset. All generated association rules are tested on the remaining dataset values to find the regularity and trend in the number of commits analysed, that is, to find whether the commits are performed at the same rate or not. If the rate is the same, then Eclipse CDT has consistent growth; otherwise there is a variation in the growth of the considered software. It is found that, in the case of the remaining set, the applicability of these rules decreases. This is verified again by generating the association rules from the remaining set. The rules generated from the remaining dataset indicate that its data lies mostly in the lower range of commits.

The following factor explains this variation in the commit rate:

(a) Most of the values in the remaining dataset lie in the low range. This point is verified by finding the largest frequent fuzzy itemsets and then generating the association rules from the remaining dataset using the fuzzy data mining algorithm for time series data. The largest frequent itemset found is shown in Table 7.

The generated frequent itemsets consist only of data points in the low range. Hence most of the commit entries in the remaining dataset are probably low. This indicates that the number of commits analysed from 6/1/2012 to 9/23/2013 is lower than the number of commits analysed in the training dataset, which points to less activity in the development of the considered software in this period.


Table 6: Calculated confidence value of various association rules.

Association rule                      Confidence   Confidence (%)
A1Middle ∩ A2Middle → A3Middle        0.741        74
A3Middle → A1Middle ∩ A2Middle        0.522        52
A1Middle ∩ A3Middle → A2Middle        0.781        78
A2Middle → A1Middle ∩ A3Middle        0.526        53
A2Middle ∩ A3Middle → A1Middle        0.735        74
A1Middle → A2Middle ∩ A3Middle        0.529        53
A1Middle ∩ A2Middle → A4Middle        0.692        69
A4Middle → A1Middle ∩ A2Middle        0.491        49
A1Middle ∩ A4Middle → A2Middle        0.762        76
A2Middle → A1Middle ∩ A4Middle        0.491        49
A2Middle ∩ A4Middle → A1Middle        0.724        72
A1Middle → A2Middle ∩ A4Middle        0.494        49
A1Middle ∩ A2Middle → A5Middle        0.695        69
A5Middle → A1Middle ∩ A2Middle        0.497        50
A1Middle ∩ A5Middle → A2Middle        0.737        74
A2Middle → A1Middle ∩ A5Middle        0.493        49
A2Middle ∩ A5Middle → A1Middle        0.759        76
A1Middle → A2Middle ∩ A5Middle        0.496        50
A1Middle ∩ A3Middle → A4Middle        0.742        74
A4Middle → A1Middle ∩ A3Middle        0.499        50
A1Middle ∩ A4Middle → A3Middle        0.775        78
A3Middle → A1Middle ∩ A4Middle        0.496        50
A3Middle ∩ A4Middle → A1Middle        0.691        69
A1Middle → A3Middle ∩ A4Middle        0.502        50
A1Middle ∩ A3Middle → A5Middle        0.751        75
A5Middle → A1Middle ∩ A3Middle        0.510        51
A1Middle ∩ A5Middle → A3Middle        0.757        76
A3Middle → A1Middle ∩ A5Middle        0.502        50
A3Middle ∩ A5Middle → A1Middle        0.738        74
A1Middle → A3Middle ∩ A5Middle        0.509        51
A1Middle ∩ A4Middle → A5Middle        0.792        79
A5Middle → A1Middle ∩ A4Middle        0.515        51
A1Middle ∩ A5Middle → A4Middle        0.764        76
A4Middle → A1Middle ∩ A5Middle        0.511        51
A4Middle ∩ A5Middle → A1Middle        0.715        71
A1Middle → A4Middle ∩ A5Middle        0.513        51

Table 7: Largest frequent fuzzy itemsets for remaining dataset.

A1Low ∩ A2Low ∩ A3Low ∩ A4Low
A1Low ∩ A2Low ∩ A3Low ∩ A5Low
A1Low ∩ A2Low ∩ A4Low ∩ A5Low
A1Low ∩ A3Low ∩ A4Low ∩ A5Low

(b) Factors such as the number of active users and the number of files changed (addition, deletion, and modification) are lower.

6. Discussion

The fuzzy data mining algorithm for time series data [11] allows efficient mining of association rules from a large dataset. The generated rules help in finding the regularity and existing trend of OSS projects. We have used the commit activity data of Eclipse CDT. The commits in the repository are directly related to activities such as files or code being changed, deleted, or modified. By analysing the trend in the commit data, we can interpret the development or evolution activity of the considered software. In the above experiment, the original dataset is divided into two subdatasets (training and remaining dataset).

Table 8: Generated subsequences.

Sp    Subsequence
1     9 25 252 240 361
2     25 252 240 361 298
3     252 240 361 298 118
4     240 361 298 118 294
5     361 298 118 294 219
6     298 118 294 219 193
7     118 294 219 193 359
8     294 219 193 359 96
9     219 193 359 96 151
10    193 359 96 151 141
11    359 96 151 141 198
12    96 151 141 198 296
13    151 141 198 296 183
14    141 198 296 183 160
15    198 296 183 160 117
16    296 183 160 117 139
17    183 160 117 139 179
18    160 117 139 179 255
19    117 139 179 255 351
20    139 179 255 351 320
21    179 255 351 320 440
22    255 351 320 440 219
23    351 320 440 219 260
24    320 440 219 260 254
25    440 219 260 254 227
26    219 260 254 227 363
27    260 254 227 363 185
28    254 227 363 185 200
29    227 363 185 200 254
30    363 185 200 254 175
31    185 200 254 175 253
32    200 254 175 253 221
33    254 175 253 221 281
34    175 253 221 281 263
35    253 221 281 263 87
36    221 281 263 87 83
37    281 263 87 83 42
38    263 87 83 42 80
39    87 83 42 80 44
40    83 42 80 44 99
41    42 80 44 99 71
42    80 44 99 71 83
43    44 99 71 83 150
44    99 71 83 150 120
45    71 83 150 120 60
46    83 150 120 60 132
47    150 120 60 132 127
48    120 60 132 127 151
49    60 132 127 151 109
50    132 127 151 109 141
51    127 151 109 141 141
52    151 109 141 141 135
53    109 141 141 135 242
54    141 141 135 242 320
55    141 135 242 320 380
56    135 242 320 380 438
57    242 320 380 438 355
58    320 380 438 355 141
59    380 438 355 141 107
60    438 355 141 107 114
61    355 141 107 114 153
62    141 107 114 153 212
63    107 114 153 212 133
64    114 153 212 133 212
65    153 212 133 212 273
66    212 133 212 273 326
67    133 212 273 326 455
68    212 273 326 455 378
69    273 326 455 378 186
70    326 455 378 186 317
71    455 378 186 317 139
72    378 186 317 139 155
73    186 317 139 155 257
74    317 139 155 257 208
75    139 155 257 208 162
76    155 257 208 162 229
77    257 208 162 229 205
78    208 162 229 205 203
79    162 229 205 203 234
80    229 205 203 234 214
81    205 203 234 214 144
82    203 234 214 144 185
83    234 214 144 185 153
84    214 144 185 153 206
85    144 185 153 206 315
86    185 153 206 315 204
87    153 206 315 204 103
88    206 315 204 103 336
89    315 204 103 336 278
90    204 103 336 278 329
91    103 336 278 329 370
92    336 278 329 370 419
93    278 329 370 419 230
94    329 370 419 230 224
95    370 419 230 224 217
96    419 230 224 217 188
97    230 224 217 188 176
98    224 217 188 176 129
99    217 188 176 129 91
100   188 176 129 91 188
101   176 129 91 188 169
102   129 91 188 169 256
103   91 188 169 256 245
104   188 169 256 245 237
105   169 256 245 237 74
106   256 245 237 74 228
107   245 237 74 228 179
108   237 74 228 179 212
109   74 228 179 212 128
110   228 179 212 128 154
111   179 212 128 154 150
112   212 128 154 150 192
113   128 154 150 192 148
114   154 150 192 148 171
115   150 192 148 171 181
116   192 148 171 181 163

The association rules generated from the training set allow analysis of the regularity and trends in the commits of the OSS project. These rules also help in predicting and analysing the future evolution or development activity of Eclipse CDT. The generated rules are validated on the remaining dataset to determine their applicability. It is found that the applicability of these rules decreases on the remaining set. The results of the algorithm indicate that the commit activity in the remaining dataset lies in the low range. Other factors may also contribute to this decrease in the number of commits, such as a lower number of active users and a lower number of files changed (addition, deletion, and modification).

7. Threats to Validity

This section discusses the threats to the validity of the study.

Construct validity threats concern the relationship between theory and observation. These threats arise mainly from our assumption that all commits are recorded in the revision control tool git [4]. Any changes made to the source code but not logged through the tool may not have become part of the study.

Internal validity concerns the selection of subject systems and the analysis methods. This study uses a month as the unit of measure for tracking the types of change activities. In the future we would like to use a more natural and insightful partition based on major/minor versions of the OSS project for analysing the change activity of OSS projects. Subject systems were selected from public repositories, but the selection is biased towards projects with valid git repositories.

Table 9: Transformation of data elements into fuzzy set.

Sp | A1Low A1Middle A1High | A2Low A2Middle A2High | A3Low A3Middle A3High | A4Low A4Middle A4High | A5Low A5Middle A5High
Count | 47.595 60.227 8.178 | 47.275 60.547 8.178 | 46.802 61.02 8.178 | 47.262 60.56 8.178 | 47.775 60.047 8.178
1 | 1 0 0 | 1 0 0 | 0 1 0 | 0.067 0.933 0 | 0 0.593 0.407
2 | 1 0 0 | 0 1 0 | 0.067 0.933 0 | 0 0.593 0.407 | 0 1 0
3 | 0 1 0 | 0.067 0.933 0 | 0 0.593 0.407 | 0 1 0 | 0.88 0.12 0
4 | 0.067 0.933 0 | 0 0.593 0.407 | 0 1 0 | 0.88 0.12 0 | 0 1 0
5 | 0 0.593 0.407 | 0 1 0 | 0.88 0.12 0 | 0 1 0 | 0.207 0.793 0
6 | 0 1 0 | 0.88 0.12 0 | 0 1 0 | 0.207 0.793 0 | 0.38 0.62 0
7 | 0.88 0.12 0 | 0 1 0 | 0.207 0.793 0 | 0.38 0.62 0 | 0 0.607 0.393
8 | 0 1 0 | 0.207 0.793 0 | 0.38 0.62 0 | 0 0.607 0.393 | 1 0 0
9 | 0.207 0.793 0 | 0.38 0.62 0 | 0 0.607 0.393 | 1 0 0 | 0.66 0.34 0
10 | 0.38 0.62 0 | 0 0.607 0.393 | 1 0 0 | 0.66 0.34 0 | 0.727 0.273 0
11 | 0 0.607 0.393 | 1 0 0 | 0.66 0.34 0 | 0.727 0.273 0 | 0.347 0.653 0
12 | 1 0 0 | 0.66 0.34 0 | 0.727 0.273 0 | 0.347 0.653 0 | 0 1 0
13 | 0.66 0.34 0 | 0.727 0.273 0 | 0.347 0.653 0 | 0 1 0 | 0.447 0.553 0
14 | 0.727 0.273 0 | 0.347 0.653 0 | 0 1 0 | 0.447 0.553 0 | 0.6 0.4 0
15 | 0.347 0.653 0 | 0 1 0 | 0.447 0.553 0 | 0.6 0.4 0 | 0.887 0.113 0
16 | 0 1 0 | 0.447 0.553 0 | 0.6 0.4 0 | 0.887 0.113 0 | 0.74 0.26 0
17 | 0.447 0.553 0 | 0.6 0.4 0 | 0.887 0.113 0 | 0.74 0.26 0 | 0.473 0.527 0
18 | 0.6 0.4 0 | 0.887 0.113 0 | 0.74 0.26 0 | 0.473 0.527 0 | 0 1 0
19 | 0.887 0.113 0 | 0.74 0.26 0 | 0.473 0.527 0 | 0 1 0 | 0 0.66 0.34
20 | 0.74 0.26 0 | 0.473 0.527 0 | 0 1 0 | 0 0.66 0.34 | 0 0.867 0.133
21 | 0.473 0.527 0 | 0 1 0 | 0 0.66 0.34 | 0 0.867 0.133 | 0 0.067 0.933
22 | 0 1 0 | 0 0.66 0.34 | 0 0.867 0.133 | 0 0.067 0.933 | 0.207 0.793 0
23 | 0 0.66 0.34 | 0 0.867 0.133 | 0 0.067 0.933 | 0.207 0.793 0 | 0 1 0
24 | 0 0.867 0.133 | 0 0.067 0.933 | 0.207 0.793 0 | 0 1 0 | 0 1 0
25 | 0 0.067 0.933 | 0.207 0.793 0 | 0 1 0 | 0 1 0 | 0.153 0.847 0
26 | 0.207 0.793 0 | 0 1 0 | 0 1 0 | 0.153 0.847 0 | 0 0.58 0.42
27 | 0 1 0 | 0 1 0 | 0.153 0.847 0 | 0 0.58 0.42 | 0.433 0.567 0
28 | 0 1 0 | 0.153 0.847 0 | 0 0.58 0.42 | 0.433 0.567 0 | 0.333 0.667 0
29 | 0.153 0.847 0 | 0 0.58 0.42 | 0.433 0.567 0 | 0.333 0.667 0 | 0 1 0
30 | 0 0.58 0.42 | 0.433 0.567 0 | 0.333 0.667 0 | 0 1 0 | 0.5 0.5 0
31 | 0.433 0.567 0 | 0.333 0.667 0 | 0 1 0 | 0.5 0.5 0 | 0 1 0
32 | 0.333 0.667 0 | 0 1 0 | 0.5 0.5 0 | 0 1 0 | 0.193 0.807 0
33 | 0 1 0 | 0.5 0.5 0 | 0 1 0 | 0.193 0.807 0 | 0 1 0
34 | 0.5 0.5 0 | 0 1 0 | 0.193 0.807 0 | 0 1 0 | 0 1 0
35 | 0 1 0 | 0.193 0.807 0 | 0 1 0 | 0 1 0 | 1 0 0
36 | 0.193 0.807 0 | 0 1 0 | 0 1 0 | 1 0 0 | 1 0 0
37 | 0 1 0 | 0 1 0 | 1 0 0 | 1 0 0 | 1 0 0
38 | 0 1 0 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0
39 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0
40 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0
41 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0
42 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0
43 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0 | 0.667 0.333 0
44 | 1 0 0 | 1 0 0 | 1 0 0 | 0.667 0.333 0 | 0.867 0.133 0
45 | 1 0 0 | 1 0 0 | 0.667 0.333 0 | 0.867 0.133 0 | 1 0 0
46 | 1 0 0 | 0.667 0.333 0 | 0.867 0.133 0 | 1 0 0 | 0.787 0.213 0
47 | 0.667 0.333 0 | 0.867 0.133 0 | 1 0 0 | 0.787 0.213 0 | 0.82 0.18 0
48 | 0.867 0.133 0 | 1 0 0 | 0.787 0.213 0 | 0.82 0.18 0 | 0.66 0.34 0
49 | 1 0 0 | 0.787 0.213 0 | 0.82 0.18 0 | 0.66 0.34 0 | 0.94 0.06 0
50 | 0.787 0.213 0 | 0.82 0.18 0 | 0.66 0.34 0 | 0.94 0.06 0 | 0.727 0.273 0
51 | 0.82 0.18 0 | 0.66 0.34 0 | 0.94 0.06 0 | 0.727 0.273 0 | 0.727 0.273 0
52 | 0.66 0.34 0 | 0.94 0.06 0 | 0.727 0.273 0 | 0.727 0.273 0 | 0.767 0.233 0
53 | 0.94 0.06 0 | 0.727 0.273 0 | 0.727 0.273 0 | 0.767 0.233 0 | 0.053 0.947 0
54 | 0.727 0.273 0 | 0.727 0.273 0 | 0.767 0.233 0 | 0.053 0.947 0 | 0 0.867 0.133
55 | 0.727 0.273 0 | 0.767 0.233 0 | 0.053 0.947 0 | 0 0.867 0.133 | 0 0.467 0.533
56 | 0.767 0.233 0 | 0.053 0.947 0 | 0 0.867 0.133 | 0 0.467 0.533 | 0 0.08 0.92
57 | 0.053 0.947 0 | 0 0.867 0.133 | 0 0.467 0.533 | 0 0.08 0.92 | 0 0.633 0.367
58 | 0 0.867 0.133 | 0 0.467 0.533 | 0 0.08 0.92 | 0 0.633 0.367 | 0.727 0.273 0
59 | 0 0.467 0.533 | 0 0.08 0.92 | 0 0.633 0.367 | 0.727 0.273 0 | 0.953 0.047 0
60 | 0 0.08 0.92 | 0 0.633 0.367 | 0.727 0.273 0 | 0.953 0.047 0 | 0.907 0.093 0
61 | 0 0.633 0.367 | 0.727 0.273 0 | 0.953 0.047 0 | 0.907 0.093 0 | 0.647 0.353 0
62 | 0.727 0.273 0 | 0.953 0.047 0 | 0.907 0.093 0 | 0.647 0.353 0 | 0.253 0.747 0
63 | 0.953 0.047 0 | 0.907 0.093 0 | 0.647 0.353 0 | 0.253 0.747 0 | 0.78 0.22 0
64 | 0.907 0.093 0 | 0.647 0.353 0 | 0.253 0.747 0 | 0.78 0.22 0 | 0.253 0.747 0
65 | 0.647 0.353 0 | 0.253 0.747 0 | 0.78 0.22 0 | 0.253 0.747 0 | 0 1 0
66 | 0.253 0.747 0 | 0.78 0.22 0 | 0.253 0.747 0 | 0 1 0 | 0 0.827 0.173
67 | 0.78 0.22 0 | 0.253 0.747 0 | 0 1 0 | 0 0.827 0.173 | 0 0 1
68 | 0.253 0.747 0 | 0 1 0 | 0 0.827 0.173 | 0 0 1 | 0 0.48 0.52
69 | 0 1 0 | 0 0.827 0.173 | 0 0 1 | 0 0.48 0.52 | 0.427 0.573 0
70 | 0 0.827 0.173 | 0 0 1 | 0 0.48 0.52 | 0.427 0.573 0 | 0 0.887 0.113
71 | 0 0 1 | 0 0.48 0.52 | 0.427 0.573 0 | 0 0.887 0.113 | 0.74 0.26 0
72 | 0 0.48 0.52 | 0.427 0.573 0 | 0 0.887 0.113 | 0.74 0.26 0 | 0.633 0.367 0
73 | 0.427 0.573 0 | 0 0.887 0.113 | 0.74 0.26 0 | 0.633 0.367 0 | 0 1 0
74 | 0 0.887 0.113 | 0.74 0.26 0 | 0.633 0.367 0 | 0 1 0 | 0.28 0.72 0
75 | 0.74 0.26 0 | 0.633 0.367 0 | 0 1 0 | 0.28 0.72 0 | 0.587 0.413 0
76 | 0.633 0.367 0 | 0 1 0 | 0.28 0.72 0 | 0.587 0.413 0 | 0.14 0.86 0
77 | 0 1 0 | 0.28 0.72 0 | 0.587 0.413 0 | 0.14 0.86 0 | 0.3 0.7 0
78 | 0.28 0.72 0 | 0.587 0.413 0 | 0.14 0.86 0 | 0.3 0.7 0 | 0.313 0.687 0
79 | 0.587 0.413 0 | 0.14 0.86 0 | 0.3 0.7 0 | 0.313 0.687 0 | 0.107 0.893 0
80 | 0.14 0.86 0 | 0.3 0.7 0 | 0.313 0.687 0 | 0.107 0.893 0 | 0.24 0.76 0
81 | 0.3 0.7 0 | 0.313 0.687 0 | 0.107 0.893 0 | 0.24 0.76 0 | 0.707 0.293 0
82 | 0.313 0.687 0 | 0.107 0.893 0 | 0.24 0.76 0 | 0.707 0.293 0 | 0.433 0.567 0
83 | 0.107 0.893 0 | 0.24 0.76 0 | 0.707 0.293 0 | 0.433 0.567 0 | 0.647 0.353 0
84 | 0.24 0.76 0 | 0.707 0.293 0 | 0.433 0.567 0 | 0.647 0.353 0 | 0.293 0.707 0
85 | 0.707 0.293 0 | 0.433 0.567 0 | 0.647 0.353 0 | 0.293 0.707 0 | 0 0.9 0.1
86 | 0.433 0.567 0 | 0.647 0.353 0 | 0.293 0.707 0 | 0 0.9 0.1 | 0.307 0.693 0
87 | 0.647 0.353 0 | 0.293 0.707 0 | 0 0.9 0.1 | 0.307 0.693 0 | 0.98 0.02 0
88 | 0.293 0.707 0 | 0 0.9 0.1 | 0.307 0.693 0 | 0.98 0.02 0 | 0 0.76 0.24
89 | 0 0.9 0.1 | 0.307 0.693 0 | 0.98 0.02 0 | 0 0.76 0.24 | 0 1 0
90 | 0.307 0.693 0 | 0.98 0.02 0 | 0 0.76 0.24 | 0 1 0 | 0 0.807 0.193
91 | 0.98 0.02 0 | 0 0.76 0.24 | 0 1 0 | 0 0.807 0.193 | 0 0.533 0.467
92 | 0 0.76 0.24 | 0 1 0 | 0 0.807 0.193 | 0 0.533 0.467 | 0 0.207 0.793
93 | 0 1 0 | 0 0.807 0.193 | 0 0.533 0.467 | 0 0.207 0.793 | 0.133 0.867 0
94 | 0 0.807 0.193 | 0 0.533 0.467 | 0 0.207 0.793 | 0.133 0.867 0 | 0.173 0.827 0
95 | 0 0.533 0.467 | 0 0.207 0.793 | 0.133 0.867 0 | 0.173 0.827 0 | 0.22 0.78 0
96 | 0 0.207 0.793 | 0.133 0.867 0 | 0.173 0.827 0 | 0.22 0.78 0 | 0.413 0.587 0
97 | 0.133 0.867 0 | 0.173 0.827 0 | 0.22 0.78 0 | 0.413 0.587 0 | 0.493 0.507 0
98 | 0.173 0.827 0 | 0.22 0.78 0 | 0.413 0.587 0 | 0.493 0.507 0 | 0.807 0.193 0
99 | 0.22 0.78 0 | 0.413 0.587 0 | 0.493 0.507 0 | 0.807 0.193 0 | 1 0 0
100 | 0.413 0.587 0 | 0.493 0.507 0 | 0.807 0.193 0 | 1 0 0 | 0.413 0.587 0
101 | 0.493 0.507 0 | 0.807 0.193 0 | 1 0 0 | 0.413 0.587 0 | 0.54 0.46 0
102 | 0.807 0.193 0 | 1 0 0 | 0.413 0.587 0 | 0.54 0.46 0 | 0 1 0
103 | 1 0 0 | 0.413 0.587 0 | 0.54 0.46 0 | 0 1 0 | 0.033 0.967 0
104 | 0.413 0.587 0 | 0.54 0.46 0 | 0 1 0 | 0.033 0.967 0 | 0.087 0.913 0
105 | 0.54 0.46 0 | 0 1 0 | 0.033 0.967 0 | 0.087 0.913 0 | 1 0 0
106 | 0 1 0 | 0.033 0.967 0 | 0.087 0.913 0 | 1 0 0 | 0.147 0.853 0
107 | 0.033 0.967 0 | 0.087 0.913 0 | 1 0 0 | 0.147 0.853 0 | 0.473 0.527 0
108 | 0.087 0.913 0 | 1 0 0 | 0.147 0.853 0 | 0.473 0.527 0 | 0.253 0.747 0
109 | 1 0 0 | 0.147 0.853 0 | 0.473 0.527 0 | 0.253 0.747 0 | 0.813 0.187 0
110 | 0.147 0.853 0 | 0.473 0.527 0 | 0.253 0.747 0 | 0.813 0.187 0 | 0.64 0.36 0
111 | 0.473 0.527 0 | 0.253 0.747 0 | 0.813 0.187 0 | 0.64 0.36 0 | 0.667 0.333 0
112 | 0.253 0.747 0 | 0.813 0.187 0 | 0.64 0.36 0 | 0.667 0.333 0 | 0.387 0.613 0
113 | 0.813 0.187 0 | 0.64 0.36 0 | 0.667 0.333 0 | 0.387 0.613 0 | 0.68 0.32 0
114 | 0.64 0.36 0 | 0.667 0.333 0 | 0.387 0.613 0 | 0.68 0.32 0 | 0.527 0.473 0
115 | 0.667 0.333 0 | 0.387 0.613 0 | 0.68 0.32 0 | 0.527 0.473 0 | 0.46 0.54 0
116 | 0.387 0.613 0 | 0.68 0.32 0 | 0.527 0.473 0 | 0.46 0.54 0 | 0.58 0.42 0

External validity concerns the generalization of the findings. In the future we would like to provide more generalized results by considering a larger number of OSS projects.

Reliability validity concerns the possibility of replication of the study. The subject systems are available in the public domain. We have attempted to put all the necessary details of the experiment process in the paper.

8. Conclusion and Future Work

The commit activity data available in the development repository of open source software can be used to analyse the evolution of OSS projects, as each commit is directly related to development activity such as code deletion, addition, modification, comments, and file addition. In this study, the Eclipse CDT commit data is analysed to find the regularity and trend in the commit data. The fuzzy data mining algorithm for time series data is used to generate the association rules from the dataset. The dataset is divided into two subsets (training and remaining dataset) to evaluate the pattern of evolution of Eclipse CDT.

After applying and validating the association rules generated from the training dataset, it is found that the rates at which commits are performed in the training dataset and the remaining dataset are different. This is verified again by generating the association rules from the remaining set. The rules generated from the remaining dataset indicate that its data lies mostly in the lower range of commits. This validates the applicability of the rules generated from the training dataset.

These association rules indicate that the overall commits in Eclipse CDT lie in the middle range, except for the variation found near the end, where there is a high probability that commits lie in the lower range. The continuous availability and existence of commits in the repository of Eclipse CDT illustrate that the development or evolution of Eclipse CDT is active, with most of the commits per month lying in the middle range and, near the end, close to the lower range. In the future we want to combine a prediction algorithm with the fuzzy data mining algorithm for time series data to predict the number of commits to be performed in a particular month, although the number of commits also depends on various other factors that need to be considered.

Appendix

See Tables 8 and 9.

Competing Interests

The authors declare that they have no competing interests.

References

[1] M. W. Godfrey and Q. Tu, "Evolution in open source software: a case study," in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '00), pp. 131–142, October 2000.

[2] H. Zhang and S. Kim, "Monitoring software quality evolution for defects," IEEE Software, vol. 27, no. 4, pp. 58–64, 2010.

[3] Y. Fang and D. Neufeld, "Understanding sustained participation in open source software projects," Journal of Management Information Systems, vol. 25, no. 4, pp. 9–50, 2008.

[4] "GIT," July 2015, http://git-scm.com/.

[5] R. Sen, S. S. Singh, and S. Borle, "Open source software success: measures and analysis," Decision Support Systems, vol. 52, no. 2, pp. 364–372, 2012.

[6] R. McCleary, R. A. Hay, E. E. Meidinger, and D. McDowall, Applied Time Series Analysis for the Social Sciences, Sage, Beverly Hills, Calif, USA, 1980.

[7] R. Grewal, G. L. Lilien, and G. Mallapragada, "Location, location, location: how network embeddedness affects project success in open source systems," Management Science, vol. 52, no. 7, pp. 1043–1056, 2006.

[8] I. Herraiz, J. M. Gonzalez-Barahona, G. Robles, and D. M. German, "On the prediction of the evolution of libre software projects," in Proceedings of the 23rd International Conference on Software Maintenance (ICSM '07), pp. 405–414, Paris, France, October 2007.

[9] C. F. Kemerer and S. Slaughter, "An empirical approach to studying software evolution," IEEE Transactions on Software Engineering, vol. 25, no. 4, pp. 493–509, 1999.

[10] A. Mockus and L. G. Votta, "Identifying reasons for software changes using historic databases," in Proceedings of the International Conference on Software Maintenance (ICSM '00), pp. 120–130, IEEE, San Jose, Calif, USA, 2000.

[11] C.-H. Chen, T.-P. Hong, and V. S. Tseng, "Fuzzy data mining for time-series data," Applied Soft Computing Journal, vol. 12, no. 1, pp. 536–542, 2012.

[12] C. C. H. Yuen, "An empirical approach to the study of errors in large software under maintenance," in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '85), pp. 96–105, 1985.

[13] C. C. H. Yuen, "A statistical rationale for evolution dynamics concepts," in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '87), pp. 156–164, September 1987.

[14] C. C. H. Yuen, "On analytic maintenance process data at the global and the detailed levels: a case study," in Proceedings of the International Conference on Software Maintenance (ICSM '88), pp. 248–255, IEEE, Scottsdale, Ariz, USA, 1988.

[15] M. Goulao, N. Fonte, M. Wermelinger, and F. B. Abreu, "Software evolution prediction using seasonal time analysis: a comparative study," in Proceedings of the 16th European Conference on Software Maintenance and Reengineering (CSMR '12), pp. 213–222, IEEE, Szeged, Hungary, March 2012.

[16] B. Kenmei, G. Antoniol, and M. Di Penta, "Trend analysis and issue prediction in large-scale open source systems," in Proceedings of the 12th IEEE European Conference on Software Maintenance and Reengineering (CSMR '08), pp. 73–82, Athens, Greece, April 2008.

[17] F. Caprio, G. Casazza, U. Penta, M. D. Penta, and U. Villano, "Measuring and predicting the Linux kernel evolution," in Proceedings of the International Workshop of Empirical Studies on Software Maintenance, Florence, Italy, November 2001.

[18] E. Fuentetaja and D. J. Bagert, "Software evolution from a time-series perspective," in Proceedings of the IEEE International Conference on Software Maintenance, pp. 226–229, October 2002.

[19] M. Klas, F. Elberzhager, J. Munch, K. Hartjes, and O. Von Graevemeyer, "Transparent combination of expert and measurement data for defect prediction: an industrial case study," in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE '10), pp. 119–128, Cape Town, South Africa, May 2010.

[20] U. Raja, D. P. Hale, and J. E. Hale, "Modeling software evolution defects: a time series approach," Journal of Software Maintenance and Evolution: Research and Practice, vol. 21, no. 1, pp. 49–71, 2009.

[21] G. Antoniol, G. Casazza, M. D. Penta, and E. Merlo, "Modeling clones evolution through time series," in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '01), pp. 273–280, IEEE Computer Society, Florence, Italy, November 2001.

[22] L. Yu, "Indirectly predicting the maintenance effort of open-source software," Journal of Software Maintenance and Evolution, vol. 18, no. 5, pp. 311–332, 2006.

[23] A. Amin, L. Grunske, and A. Colman, "An approach to software reliability prediction based on time series modeling," Journal of Systems and Software, vol. 86, no. 7, pp. 1923–1932, 2013.

[24] Forecasting Model, http://people.duke.edu/~rnau/411fcst.htm.

[25] T.-P. Hong, C.-S. Kuo, and C.-C. Chi, "A fuzzy data mining algorithm for quantitative values," in Proceedings of the 3rd International Conference on Knowledge-Based Intelligent Information Engineering Systems (KES '99), pp. 480–483, September 1999.

[26] H. Suresh and K. Raimond, "Mining association rules from time series data using hybrid approaches," Proceedings of International Journal of Computational Engineering Research, vol. 3, no. 3, pp. 181–189, 2013.

[27] "GIT HUB," July 2015, https://github.com/.

[28] "Eclipse," "Where did Eclipse come from?" Eclipse Wiki.

[29] "Eclipse," 2015, https://eclipse.org/cdt/.

[30] "Eclipse," July 2015, http://www.eclipse.org/cdt/doc/overview/#/3.

[31] L. A. Zadeh, "Fuzzy sets," Information and Computation, vol. 8, no. 3, pp. 338–353, 1965.

[32] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufman, Academic Press, 2001.

[33] R. Agrawal, "Mining association rules between sets of items in large databases," in Proceedings of the ACM SIGMOD Conference, pp. 207–216, Washington, DC, USA, May 1993.



For (k = 1; L_k ≠ ∅; k++)
    C_{k+1} = Join L_k with L_k to generate C_{k+1}
    L_{k+1} = Candidates in C_{k+1} with support greater than or equal to min_support
End
Return L_k
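The level-wise loop above can be sketched in Python as follows. This is a minimal illustration of the classical (crisp) Apriori idea rather than the implementation used in this study; the function name `apriori`, the use of absolute support counts, and the toy transactions are assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_support_count):
    """Return {frequent itemset: support count} (illustrative sketch)."""
    transactions = [frozenset(t) for t in transactions]

    def count(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items
             if count(frozenset([i])) >= min_support_count}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = count(itemset)
        # Join step: unions of two frequent k-itemsets that contain k + 1 items.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        level = {c for c in candidates if count(c) >= min_support_count}
        k += 1
    return frequent

# Hypothetical toy transactions, not taken from the Eclipse CDT data.
frequent = apriori([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}], 2)
```

Unlike the pseudocode, the sketch returns the frequent itemsets of every level, which is what the rule generation step needs.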

Next it uses the frequent itemsets to generate the strong association rules satisfying the minimum confidence (min_conf) threshold. The pseudocode for generation of strong association rules is as follows.

Input:
    Frequent itemsets L
    Minimum confidence threshold min_conf
Output: Strong association rules R

R = Φ
for each frequent itemset I in L
    for each non-empty proper subset S of I
        conf(S → I − S) = support_count(I) / support_count(S)
        if conf >= min_conf
            generate strong association rule
            Rule r = "S → (I − S)"
            R = R ∪ r
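The rule generation pseudocode can be illustrated with a short Python sketch. The function name `generate_rules`, the dictionary of support counts, and the toy itemsets are hypothetical and chosen only for the example; every subset of a frequent itemset is assumed to be present in the dictionary, as Apriori guarantees.

```python
from itertools import combinations

def generate_rules(support_count, min_conf):
    """Return (antecedent, consequent, confidence) for every strong rule.

    support_count maps frozenset itemsets to their support counts.
    """
    rules = []
    for itemset, count in support_count.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                s = frozenset(antecedent)
                conf = count / support_count[s]
                if conf >= min_conf:
                    rules.append((s, itemset - s, conf))
    return rules

# Hypothetical counts: "bread" occurs in 4 transactions, "milk" in 3, both in 3.
counts = {
    frozenset({"bread"}): 4,
    frozenset({"milk"}): 3,
    frozenset({"bread", "milk"}): 3,
}
rules = generate_rules(counts, min_conf=0.8)
```

With these counts only the rule milk → bread is kept (confidence 3/3), while bread → milk (confidence 3/4) falls below the threshold.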

3.4. Fuzzy Data Mining Algorithm for Time Series Data. Chen et al. [11] extended the work of Hong et al. [25] and proposed fuzzy data mining for time series data. The time series data of K points are entered as input along with the predefined minimum support λ, minimum confidence α, and window size w.

The input data is first converted into (K − w + 1) subsequences; each subsequence has w elements. The fuzzy membership function is used to convert each data item into the equivalent fuzzy set. The Apriori algorithm is used to mine frequent fuzzy sets. Moreover, a data reduction method is used to remove redundant data items.

The association rules are generated in the same way as in the Apriori algorithm. The stepwise process of the fuzzy Apriori algorithm is given below.

Input:
    Time series with K data points
    Membership function values h
    Minimum support λ
    Minimum confidence α
    Sliding window size w
    Fuzzy set f_p
Output: Set of fuzzy association rules

Step 1. Convert the time series data into (K − w + 1) subsequences, where each subsequence has a maximum of w elements. If we assume w = 5, then each subsequence has a maximum of 5 elements. The elements of a subsequence are referred to as data variables and are denoted by A1, A2, A3, A4, and A5, respectively.

Step 2. Apply the fuzzy membership function to the elements of the time series to generate the fuzzy set (f_p). Based on the membership function and its user-defined levels (say low, middle, and high), each data variable, after conversion to a fuzzy item, lies in different user-defined levels (such as A1Low, A2Middle, and A3High, referred to as fuzzy items).

Step 3. Calculate the scalar cardinality count of each fuzzy item of the subsequences.

Step 4. Compare the total scalar cardinality count of each fuzzy item with the minimum support value. The items with a count greater than or equal to λ are kept in L1.

The support value is calculated as

$$\text{Support value} = \frac{\text{Count}}{K - w + 1} \tag{1}$$

Step 5. If L1 = NULL, then exit; else do the following steps for r = 1 to K.

Step 6. Join L_r with L_r to generate the candidate (r + 1) fuzzy itemsets (i.e., C_{r+1}). This is similar to the Apriori algorithm, except that items generated from the same data point are not joined, and a join is possible only if (r − 1) data items in both sets are the same.

(i) Calculate the fuzzy value of each candidate fuzzy itemset by using fuzzy set theory, that is, Min(f1 ∧ f2 ∧ ... ∧ fk).

(ii) Count the scalar cardinality of each fuzzy candidate itemset.

(iii) If the count is greater than or equal to λ, then put the itemset in L_{r+1} and calculate its support value as

$$\text{Support value} = \frac{\text{Count}}{K - w + 1} \tag{2}$$

Step 7. If L_{r+1} = NULL, then exit; otherwise go to Step 6 again.

Step 8. Remove redundant large itemsets (i.e., by shifting each large itemset (I1, I2, I3, ..., Iq) into (I′1, I′2, I′3, ..., I′q) such that fuzzy region R11 becomes R′11 when shifted).


Table 2: Fuzzy candidate itemsets in C2.

A1Low ∩ A2Low          A1Low ∩ A2Middle       A1Low ∩ A3Low          A1Low ∩ A3Middle       A1Low ∩ A4Low
A1Low ∩ A4Middle       A1Low ∩ A5Low          A1Low ∩ A5Middle       A1Middle ∩ A2Low       A1Middle ∩ A2Middle
A1Middle ∩ A3Low       A1Middle ∩ A3Middle    A1Middle ∩ A4Low       A1Middle ∩ A4Middle    A1Middle ∩ A5Low
A1Middle ∩ A5Middle    A2Low ∩ A3Low          A2Low ∩ A3Middle       A2Low ∩ A4Low          A2Low ∩ A4Middle
A2Low ∩ A5Low          A2Low ∩ A5Middle       A2Middle ∩ A3Low       A2Middle ∩ A3Middle    A2Middle ∩ A4Low
A2Middle ∩ A4Middle    A2Middle ∩ A5Low       A2Middle ∩ A5Middle    A3Low ∩ A4Low          A3Low ∩ A4Middle
A3Low ∩ A5Low          A3Low ∩ A5Middle       A3Middle ∩ A4Low       A3Middle ∩ A4Middle    A3Middle ∩ A5Low
A3Middle ∩ A5Middle    A4Low ∩ A5Low          A4Low ∩ A5Middle       A4Middle ∩ A5Low       A4Middle ∩ A5Middle

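Figure 2: Membership function (fuzzy regions Low, Middle, and High; membership value from 0 to 1 on the vertical axis; horizontal axis marked at 0, 100, 250, 300, and 450).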

Step 9. Generate the association rules using the Apriori rule generation method and calculate the confidence of each rule. The only variation is that fuzzy itemsets are used in place of normal data itemsets, so the concept of intersection, not union, is used in the rules. If the confidence value is not less than the minimum confidence α, then keep the rule; otherwise reject it.
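The confidence check of Step 9 can be illustrated with a small Python sketch. The scalar cardinality counts below are hypothetical values chosen for the example, not counts reported in this study, and the helper name `fuzzy_confidence` is an assumption.

```python
def fuzzy_confidence(count_itemset, count_antecedent):
    """Confidence of S -> (I - S) from fuzzy scalar cardinality counts."""
    return count_itemset / count_antecedent

MIN_CONF = 0.65  # minimum confidence alpha

# Hypothetical scalar cardinality counts (illustration only).
count_pair = 40.0    # count(A1Middle ∩ A2Middle)
count_single = 60.0  # count(A3Middle)
count_all = 30.0     # count(A1Middle ∩ A2Middle ∩ A3Middle)

forward = fuzzy_confidence(count_all, count_pair)    # A1Middle ∩ A2Middle → A3Middle
reverse = fuzzy_confidence(count_all, count_single)  # A3Middle → A1Middle ∩ A2Middle

kept = {
    "A1Middle ∩ A2Middle → A3Middle": forward >= MIN_CONF,
    "A3Middle → A1Middle ∩ A2Middle": reverse >= MIN_CONF,
}
```

Because the antecedent of the reverse rule is a single fuzzy item with a larger count, its confidence is lower, so a rule and its reverse can fall on opposite sides of the threshold.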

4. Experimental Setup

The experiment is performed on an x86 Family 6 Model 15 Stepping 6 GenuineIntel ~2131 MHz processor with 1 GB RAM, the Microsoft Windows XP Professional operating system (version 5.1.2600, Service Pack 3, Build 2600), and a 240 GB hard disk. The fuzzy data mining algorithm for time series data is implemented in Java.

5. Results and Analysis

The dataset is divided into two subsets: the training dataset (120 months) and the remaining dataset (16 months). The complete process of generating the association rules using the fuzzy data mining algorithm is performed in two steps:

(A) The association rules are generated on the training dataset using the fuzzy data mining algorithm for time series data.

(B) The generated association rules are validated using the training and remaining datasets.

5.1. Generation of Association Rules from Training Dataset. The fuzzy data mining algorithm for time series data [11] is applied to the training dataset to generate association rules. We assumed w = 5, λ = 25% (25/100 × number of subsequences), and α = 65%; the membership function and its values are shown in Figure 2. These assumptions are made by referring to and understanding the concepts of the fuzzy data mining algorithm given in [11]. The commits are divided according to the level of activity. Commits in the ranges 0–100, 250–300, and 450 onwards indicate the low, middle (average), and high levels of activity, respectively. The level of commit activity indicates the amount of work done in a commit.
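The membership function of Figure 2 can be sketched in Python as follows. The breakpoints 0, 100, 250, 300, and 450 are read from the figure; the linear slopes between them (a trapezoidal shape) are an assumption, although they appear consistent with the values listed in Table 9. The function name `membership` is hypothetical.

```python
def membership(x):
    """Return (low, middle, high) membership degrees for a monthly commit count x.

    Assumes linear transitions between the breakpoints 100, 250, 300, and 450.
    """
    if x <= 100:
        return (1.0, 0.0, 0.0)
    if x < 250:
        return ((250 - x) / 150, (x - 100) / 150, 0.0)
    if x <= 300:
        return (0.0, 1.0, 0.0)
    if x < 450:
        return (0.0, (450 - x) / 150, (x - 300) / 150)
    return (0.0, 0.0, 1.0)

# Data points of the first subsequence: 9, 25, 252, 240, 361.
first = [tuple(round(d, 3) for d in membership(x)) for x in (9, 25, 252, 240, 361)]
```

For example, 240 commits belong to the low region with degree (250 − 240)/150 ≈ 0.067 and to the middle region with degree (240 − 100)/150 ≈ 0.933.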

(i) Generation of Subsequences

We have K = 120 (the number of months) and w = 5 (the window size). The number of subsequences is K − w + 1 = 120 − 5 + 1 = 116.

All generated subsequences for K = 120 are shown in Table 8 of the Appendix.
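The sliding-window step can be illustrated with a short sketch. The function name `subsequences` is hypothetical; the sample values are the first eight monthly commit counts that appear in Table 8.

```python
def subsequences(series, w):
    """Return every window of w consecutive values (K - w + 1 windows in total)."""
    return [series[i:i + w] for i in range(len(series) - w + 1)]


# First eight monthly commit counts, taken from Table 8.
commits = [9, 25, 252, 240, 361, 298, 118, 294]
windows = subsequences(commits, 5)

# With K = 120 months and w = 5 the same function yields 116 windows.
n_training_windows = len(subsequences(list(range(120)), 5))
```

Each window shifts by one month, so consecutive subsequences share four of their five values.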

(ii) Transformation of Data to Fuzzy Sets. In this step we transform the commit activity data into fuzzy sets (using the membership function) and count the scalar cardinality of each data variable (shown in Table 9 of the Appendix).

All these data variables are considered as candidate itemsets (C1). For the generation of L1, only data variables having a count of more than the threshold λ = 25% (25/100 ∗ 116 = 29) are considered. It is found that L1 contains A1.Low, A1.Middle, A2.Low, A2.Middle, A3.Low, A3.Middle, A4.Low, A4.Middle, A5.Low, and A5.Middle.

Further, calculate the support value of each candidate itemset using

\[ \text{Support value} = \frac{\text{Count}}{K - w + 1}. \tag{3} \]
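A minimal sketch of the counting and filtering in this step is given below. The function names and the item labels such as `A1.Low` (data point 1, region Low) are illustrative choices; the three counts used in the example are the A1 entries of the Count row of Table 9, read with their decimal points restored (an assumption supported by the fact that they sum to 116).

```python
def scalar_cardinality(fuzzy_rows):
    """Sum the membership degrees of every item over all subsequences."""
    counts = {}
    for row in fuzzy_rows:
        for item, degree in row.items():
            counts[item] = counts.get(item, 0.0) + degree
    return counts


def large_1_itemsets(counts, n_subsequences, min_support_ratio):
    """Keep items whose count reaches the threshold; attach the support of equation (3)."""
    threshold = min_support_ratio * n_subsequences
    return {
        item: (count, count / n_subsequences)
        for item, count in counts.items()
        if count >= threshold
    }


# Tiny hypothetical input, only to show how the counts accumulate.
demo_counts = scalar_cardinality([
    {"A1.Low": 1.0, "A1.Middle": 0.0},
    {"A1.Low": 0.5, "A1.Middle": 0.5},
])

# Counts for data point A1 (Table 9), with K - w + 1 = 116 and lambda = 25%.
a1_counts = {"A1.Low": 47.595, "A1.Middle": 60.227, "A1.High": 8.178}
l1_for_a1 = large_1_itemsets(a1_counts, 116, 0.25)
```

With a threshold of 29, the High region of A1 is dropped while Low and Middle survive, in line with the contents of L1 listed above.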

(iii) Generation of C2 and L2. To generate C2 (shown in Table 2), join L1 with L1 without joining items generated from the same order of data point; that is, joining A1.Low with A1.Middle is not allowed, and the same holds for all other data points. While joining, all the properties of the Apriori algorithm are used.

Next, use the Min(f1 ∩ f2) function, that is, the minimum of the two membership values, to find the value of each of the candidate fuzzy sets. Count the scalar cardinality of each of the candidate sets in C2.

Table 3: Fuzzy item sets in L2.

A1.Low ∩ A2.Low          A1.Low ∩ A3.Low          A1.Low ∩ A4.Middle       A1.Middle ∩ A2.Middle    A1.Middle ∩ A3.Middle
A1.Middle ∩ A4.Middle    A1.Middle ∩ A5.Low       A1.Middle ∩ A5.Middle    A2.Low ∩ A3.Low          A2.Low ∩ A4.Low
A2.Middle ∩ A3.Middle    A2.Middle ∩ A4.Middle    A2.Middle ∩ A5.Middle    A3.Low ∩ A4.Low          A3.Low ∩ A5.Low
A3.Middle ∩ A4.Middle    A3.Middle ∩ A5.Middle    A4.Low ∩ A5.Low          A4.Middle ∩ A5.Middle

Table 4: Fuzzy item sets in L3.

A1.Middle ∩ A2.Middle ∩ A3.Middle    A1.Middle ∩ A2.Middle ∩ A4.Middle    A1.Middle ∩ A2.Middle ∩ A5.Middle
A1.Middle ∩ A3.Middle ∩ A4.Middle    A1.Middle ∩ A3.Middle ∩ A5.Middle    A1.Middle ∩ A4.Middle ∩ A5.Middle
A2.Middle ∩ A3.Middle ∩ A4.Middle    A2.Middle ∩ A3.Middle ∩ A5.Middle    A2.Middle ∩ A4.Middle ∩ A5.Middle
A3.Middle ∩ A4.Middle ∩ A5.Middle

Table 5: Largest fuzzy item sets generated.

A1.Middle ∩ A2.Middle ∩ A3.Middle    A1.Middle ∩ A2.Middle ∩ A4.Middle    A1.Middle ∩ A2.Middle ∩ A5.Middle
A1.Middle ∩ A3.Middle ∩ A4.Middle    A1.Middle ∩ A3.Middle ∩ A5.Middle    A1.Middle ∩ A4.Middle ∩ A5.Middle

The candidate fuzzy sets whose count is not less than the threshold value are kept in L2, and their support values are calculated. L2 now contains the fuzzy itemsets shown in Table 3.
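Step (iii) can be sketched as follows. The helper names are hypothetical, the item labels follow the pattern `A1.Low`, and the two fuzzy rows at the end are made-up values used only to show how the minimum-based count works.

```python
from itertools import combinations


def position(item):
    """'A3.Middle' -> 'A3' (the data point the item was generated from)."""
    return item.split(".")[0]


def candidate_pairs(l1):
    """Join L1 with itself, skipping pairs that come from the same data point."""
    return [
        (a, b)
        for a, b in combinations(sorted(l1), 2)
        if position(a) != position(b)
    ]


def fuzzy_count(itemset, fuzzy_rows):
    """Scalar cardinality of an itemset: per row, take the minimum membership."""
    return sum(min(row[item] for item in itemset) for row in fuzzy_rows)


# L1 as listed in step (ii): Low and Middle for each of the five data points.
l1 = [f"A{p}.{region}" for p in range(1, 6) for region in ("Low", "Middle")]
c2 = candidate_pairs(l1)

# Two hypothetical fuzzy rows, only to illustrate the min-based count.
rows = [
    {"A1.Low": 0.067, "A2.Low": 0.0},
    {"A1.Low": 1.0, "A2.Low": 0.88},
]
count = fuzzy_count(("A1.Low", "A2.Low"), rows)
```

The ten items of L1 give 45 unordered pairs; removing the five same-point pairs leaves 40 candidates, which matches the size of Table 2.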

(iv) Generation of C3 and L3. To generate C3, join L2 with L2 without joining items generated from the same order of data point. For C3, a join is possible only between fuzzy sets that have at least one data item in common. After the generation of C3, only those fuzzy itemsets whose count is not less than the threshold value are put in L3 (shown in Table 4).

(v) Generation of C4. Join L3 with L3 without joining items generated from the same order of data point. A join is possible only between fuzzy sets that have at least two data items in common.

After the generation of C4, it is found that no element has a count of more than the threshold value; therefore, L4 = Null.
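The join used in steps (iv) and (v) can be sketched as below. The function names are hypothetical, and the sketch covers only candidate generation; counting and thresholding of the candidates are left out. For brevity the example joins only the ten Middle pairs, all of which appear in Table 3.

```python
from itertools import combinations


def position(item):
    """'A3.Middle' -> 'A3' (the data point the item was generated from)."""
    return item.split(".")[0]


def join(large_itemsets):
    """Build (k+1)-item candidates from a list of large k-itemsets."""
    if not large_itemsets:
        return []
    k = len(large_itemsets[0])
    candidates = set()
    for a, b in combinations(large_itemsets, 2):
        union = set(a) | set(b)
        # Two k-itemsets join only if they share k - 1 items ...
        shares_enough = len(union) == k + 1
        # ... and the result never uses the same data point twice.
        distinct_points = len({position(i) for i in union}) == k + 1
        if shares_enough and distinct_points:
            candidates.add(tuple(sorted(union)))
    return sorted(candidates)


middle_pairs = [
    (f"A{i}.Middle", f"A{j}.Middle") for i, j in combinations(range(1, 6), 2)
]
c3_middle = join(middle_pairs)  # pairs sharing one item -> triples
c4_middle = join(c3_middle)     # triples sharing two items -> quadruples
```

For k = 2 the shared-item requirement is one item and for k = 3 it is two, which corresponds to the conditions stated for C3 and C4.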

(vi) Removal of Redundant Large Itemsets. Remove the redundant large itemsets from L3 using Step 8 of the algorithm described in Section 3.4.

After applying Step 8 of the algorithm, we are left with the large itemsets shown in Table 5.
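The effect of this step can be reproduced with a small sketch. The rule used here is an assumption inferred from comparing Tables 4 and 5, not a restatement of Step 8: an itemset that does not start at the first data point is treated as a time-shifted copy of one that does (for example, A2.Middle ∩ A3.Middle ∩ A4.Middle is A1.Middle ∩ A2.Middle ∩ A3.Middle moved one month later), so only itemsets containing A1 are kept.

```python
from itertools import combinations


def positions(itemset):
    """('A1.Middle', 'A3.Middle') -> [1, 3]"""
    return [int(item.split(".")[0][1:]) for item in itemset]


def drop_shifted_itemsets(itemsets):
    """Keep only itemsets whose earliest data point is A1 (assumed redundancy rule)."""
    return [s for s in itemsets if min(positions(s)) == 1]


# The ten itemsets of Table 4: every triple of Middle regions.
l3 = [
    tuple(f"A{p}.Middle" for p in combo)
    for combo in combinations(range(1, 6), 3)
]
largest = drop_shifted_itemsets(l3)
```

Applied to the ten itemsets of Table 4, this filter leaves the six itemsets listed in Table 5.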

(vii) Generate Association Rules. Generate the association rules from these large itemsets using Step 9 of the algorithm described in Section 3.4. All the generated rules are shown in Table 6, where the strong rules, those whose confidence is not less than the minimum confidence, are highlighted.

In this experiment we use α = 65%, which means that only rules with a confidence value of not less than 65% are valid (these are called strong association rules). We found 18 such rules; each rule acts as part of the knowledge base for the project manager and developers of the project. For example, the generated rule A1.Middle ∩ A2.Middle → A3.Middle specifies that if the values of the first and second data points lie in the middle range, then there is a high probability that the third data point also has a middle value. Together, these rules act as precise and compact knowledge for the project manager and analyst.
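The confidence check can be sketched as follows. The ratio used in `confidence` is the standard Apriori definition applied to fuzzy counts, which is assumed here to be what Step 9 computes; the counts passed to it are hypothetical, while the two confidence values in the filter example are taken from Table 6.

```python
def confidence(count_whole_itemset, count_antecedent):
    """Confidence of X -> Y: fuzzy count of (X and Y) divided by fuzzy count of X."""
    return count_whole_itemset / count_antecedent


def strong_rules(rules, min_confidence):
    """Keep the rules whose confidence is not less than the minimum confidence."""
    return [rule for rule, conf in rules if conf >= min_confidence]


# Hypothetical counts, only to show the arithmetic.
example = confidence(30.0, 40.0)

# Two rules with their confidence values as reported in Table 6.
rules = [
    ("A1.Middle & A2.Middle -> A3.Middle", 0.741),
    ("A3.Middle -> A1.Middle & A2.Middle", 0.522),
]
kept = strong_rules(rules, 0.65)
```

Only the first rule passes the 65% threshold, which illustrates why the rules with a two-item antecedent dominate the set of strong rules.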

5.2. Validation of Association Rules Using Training and Remaining Dataset

(i) Validation on Training Dataset. It is found that 65% of the transactions in the Eclipse CDT training set follow these rules. These rules act as compact and concrete knowledge of this data.

(ii) Validation on Remaining Dataset. All generated association rules are tested on the remaining dataset to find the regularity and trend in the number of commits, that is, whether the commits are performed at the same rate or not. If the rate is the same, Eclipse has consistent growth; otherwise, there is a variation in the growth of the considered software. It is found that the applicability of these rules decreases on the remaining set. This is verified again by generating association rules from the remaining set: the rules generated from the remaining dataset indicate that its data lies more towards the lower range of commits.

The following factors explain this variation in the commit rate:

(a) Most of the values in the remaining dataset are towards the low range. This point is verified by finding the largest frequent fuzzy itemsets and then generating the association rules from the remaining dataset using the fuzzy data mining algorithm for time series data. The largest frequent itemsets found are shown in Table 7.

The generated frequent itemsets consist only of data points in the low range. Hence, most of the commit entries in the remaining dataset are probably low. This indicates that the number of commits analysed from 6/1/2012 to 9/23/2013 is smaller than the number of commits analysed in the training dataset, and it points to less development activity in the considered software during this period.

Table 6: Calculated confidence value of various association rules. Each entry gives the rule, its confidence, and the confidence in percent. Strong rules (confidence not less than α = 65%) are marked with *.

* A1.Middle ∩ A2.Middle → A3.Middle   0.741   74   |   A3.Middle → A1.Middle ∩ A2.Middle   0.522   52
* A1.Middle ∩ A3.Middle → A2.Middle   0.781   78   |   A2.Middle → A1.Middle ∩ A3.Middle   0.526   53
* A2.Middle ∩ A3.Middle → A1.Middle   0.735   74   |   A1.Middle → A2.Middle ∩ A3.Middle   0.529   53
* A1.Middle ∩ A2.Middle → A4.Middle   0.692   69   |   A4.Middle → A1.Middle ∩ A2.Middle   0.491   49
* A1.Middle ∩ A4.Middle → A2.Middle   0.762   76   |   A2.Middle → A1.Middle ∩ A4.Middle   0.491   49
* A2.Middle ∩ A4.Middle → A1.Middle   0.724   72   |   A1.Middle → A2.Middle ∩ A4.Middle   0.494   49
* A1.Middle ∩ A2.Middle → A5.Middle   0.695   69   |   A5.Middle → A1.Middle ∩ A2.Middle   0.497   50
* A1.Middle ∩ A5.Middle → A2.Middle   0.737   74   |   A2.Middle → A1.Middle ∩ A5.Middle   0.493   49
* A2.Middle ∩ A5.Middle → A1.Middle   0.759   76   |   A1.Middle → A2.Middle ∩ A5.Middle   0.496   50
* A1.Middle ∩ A3.Middle → A4.Middle   0.742   74   |   A4.Middle → A1.Middle ∩ A3.Middle   0.499   50
* A1.Middle ∩ A4.Middle → A3.Middle   0.775   78   |   A3.Middle → A1.Middle ∩ A4.Middle   0.496   50
* A3.Middle ∩ A4.Middle → A1.Middle   0.691   69   |   A1.Middle → A3.Middle ∩ A4.Middle   0.502   50
* A1.Middle ∩ A3.Middle → A5.Middle   0.751   75   |   A5.Middle → A1.Middle ∩ A3.Middle   0.510   51
* A1.Middle ∩ A5.Middle → A3.Middle   0.757   76   |   A3.Middle → A1.Middle ∩ A5.Middle   0.502   50
* A3.Middle ∩ A5.Middle → A1.Middle   0.738   74   |   A1.Middle → A3.Middle ∩ A5.Middle   0.509   51
* A1.Middle ∩ A4.Middle → A5.Middle   0.792   79   |   A5.Middle → A1.Middle ∩ A4.Middle   0.515   51
* A1.Middle ∩ A5.Middle → A4.Middle   0.764   76   |   A4.Middle → A1.Middle ∩ A5.Middle   0.511   51
* A4.Middle ∩ A5.Middle → A1.Middle   0.715   71   |   A1.Middle → A4.Middle ∩ A5.Middle   0.513   51

Table 7: Largest frequent fuzzy item sets for remaining dataset.

A1.Low ∩ A2.Low ∩ A3.Low ∩ A4.Low
A1.Low ∩ A2.Low ∩ A3.Low ∩ A5.Low
A1.Low ∩ A2.Low ∩ A4.Low ∩ A5.Low
A1.Low ∩ A3.Low ∩ A4.Low ∩ A5.Low

(b) Factors such as the number of active users and the number of files changed (added, deleted, or modified) are lower.

6. Discussion

The fuzzy data mining algorithm for time series data [11] allows efficient mining of association rules from a large dataset. The generated rules help in finding the regularity and existing trend of OSS projects. We have used the commit activity data of Eclipse CDT. The commits in the repository are directly related to activities such as files or code being changed, deleted, or modified. By analysing the trend in the commit data, we can interpret the development or evolution activity of the considered software. In the above experiment, the original dataset is divided into two subsets (a training dataset and a remaining dataset).

The association rules generated from the training set allow the regularity and trends in the commits of the OSS project to be analysed. These rules also help in predicting and analysing the future evolution or development activity of Eclipse CDT. The generated rules are validated on the remaining dataset to find their applicability. It is found that the applicability of these rules decreases on the remaining set. The results of the algorithm indicate that the commit activity in the remaining dataset lies in the low range. Other factors may also contribute to this decrease in the number of commits, such as a lower number of active users and fewer files changed (added, deleted, or modified).

Table 8: Generated subsequences.

Sp    Subsequence
1     9 25 252 240 361
2     25 252 240 361 298
3     252 240 361 298 118
4     240 361 298 118 294
5     361 298 118 294 219
6     298 118 294 219 193
7     118 294 219 193 359
8     294 219 193 359 96
9     219 193 359 96 151
10    193 359 96 151 141
11    359 96 151 141 198
12    96 151 141 198 296
13    151 141 198 296 183
14    141 198 296 183 160
15    198 296 183 160 117
16    296 183 160 117 139
17    183 160 117 139 179
18    160 117 139 179 255
19    117 139 179 255 351
20    139 179 255 351 320
21    179 255 351 320 440
22    255 351 320 440 219
23    351 320 440 219 260
24    320 440 219 260 254
25    440 219 260 254 227
26    219 260 254 227 363
27    260 254 227 363 185
28    254 227 363 185 200
29    227 363 185 200 254
30    363 185 200 254 175
31    185 200 254 175 253
32    200 254 175 253 221
33    254 175 253 221 281
34    175 253 221 281 263
35    253 221 281 263 87
36    221 281 263 87 83
37    281 263 87 83 42
38    263 87 83 42 80
39    87 83 42 80 44
40    83 42 80 44 99
41    42 80 44 99 71
42    80 44 99 71 83
43    44 99 71 83 150
44    99 71 83 150 120
45    71 83 150 120 60
46    83 150 120 60 132
47    150 120 60 132 127
48    120 60 132 127 151
49    60 132 127 151 109
50    132 127 151 109 141
51    127 151 109 141 141
52    151 109 141 141 135
53    109 141 141 135 242
54    141 141 135 242 320
55    141 135 242 320 380
56    135 242 320 380 438
57    242 320 380 438 355
58    320 380 438 355 141
59    380 438 355 141 107
60    438 355 141 107 114
61    355 141 107 114 153
62    141 107 114 153 212
63    107 114 153 212 133
64    114 153 212 133 212
65    153 212 133 212 273
66    212 133 212 273 326
67    133 212 273 326 455
68    212 273 326 455 378
69    273 326 455 378 186
70    326 455 378 186 317
71    455 378 186 317 139
72    378 186 317 139 155
73    186 317 139 155 257
74    317 139 155 257 208
75    139 155 257 208 162
76    155 257 208 162 229
77    257 208 162 229 205
78    208 162 229 205 203
79    162 229 205 203 234
80    229 205 203 234 214
81    205 203 234 214 144
82    203 234 214 144 185
83    234 214 144 185 153
84    214 144 185 153 206
85    144 185 153 206 315
86    185 153 206 315 204
87    153 206 315 204 103
88    206 315 204 103 336
89    315 204 103 336 278
90    204 103 336 278 329
91    103 336 278 329 370
92    336 278 329 370 419
93    278 329 370 419 230
94    329 370 419 230 224
95    370 419 230 224 217
96    419 230 224 217 188
97    230 224 217 188 176
98    224 217 188 176 129
99    217 188 176 129 91
100   188 176 129 91 188
101   176 129 91 188 169
102   129 91 188 169 256
103   91 188 169 256 245
104   188 169 256 245 237
105   169 256 245 237 74
106   256 245 237 74 228
107   245 237 74 228 179
108   237 74 228 179 212
109   74 228 179 212 128
110   228 179 212 128 154
111   179 212 128 154 150
112   212 128 154 150 192
113   128 154 150 192 148
114   154 150 192 148 171
115   150 192 148 171 181
116   192 148 171 181 163

7. Threats to Validity

This section discusses the threats to the validity of the study.

Construct validity threats concern the relationship between theory and observation. These threats arise mainly because we considered only the commits recorded in the revision control tool git [4]. Any changes made to the source code but not logged through the tool may not have become part of the study.

Internal validity concerns the selection of subject systems and the analysis methods. This study uses a month as the unit of measure for tracking the types of change activities. In the future, we would like to use a more natural and insightful partition, based on major/minor versions of the OSS project, for analysing the change activity of OSS projects. Subject systems were selected from public repositories, but the selection is biased towards projects with valid git repositories.

External validity concerns the generalization of the findings. In the future, we would like to provide more generalized results by considering a higher number of OSS projects.

Reliability validity concerns the possibility of replication of the study. The subject systems are available in the public domain. We have attempted to put all the necessary details of the experiment process in the paper.

Table 9: Transformation of data elements into fuzzy sets.

Columns: Sp (subsequence number, 1 to 116), followed by the membership value of each data point in each fuzzy region: A1.Low, A1.Middle, A1.High, A2.Low, A2.Middle, A2.High, A3.Low, A3.Middle, A3.High, A4.Low, A4.Middle, A4.High, A5.Low, A5.Middle, A5.High.

Count (scalar cardinality of each column over the 116 subsequences):

Data point   Low      Middle   High
A1           47.595   60.227   8.178
A2           47.275   60.547   8.178
A3           46.802   61.02    8.178
A4           47.262   60.56    8.178
A5           47.775   60.047   8.178

[Rows Sp = 1 to 116: membership values of the subsequences of Table 8 under the membership function of Figure 2; the individual rows are not legible in this copy.]

8. Conclusion and Future Work

The commit activity data available in the development repository of open source software can be used to analyse the evolution of OSS projects, as each commit is directly related to development activity such as code deletion, addition, modification, comments, and file addition. In this study, the Eclipse CDT commit data is analysed to find the regularity and trend in the commit data. The fuzzy data mining algorithm for time series data is used to generate association rules from the dataset. The dataset is divided into two subsets (a training dataset and a remaining dataset) to evaluate the pattern of evolution of Eclipse CDT.

After applying and validating the association rules generated from the training dataset, it is found that the rates at which commits are performed in the training dataset and in the remaining dataset are different. This is verified again by generating association rules from the remaining set. The rules generated from the remaining dataset indicate that its data lies more towards the lower range of commits. This validates the applicability of the rules generated from the training dataset.

These association rules indicate that the overall commits in Eclipse CDT are towards the middle range, except for the variation found near the end, where there is a high probability that commits lie in the lower range. The continuous availability and existence of commits in the repository of Eclipse CDT illustrate that the development or evolution of Eclipse CDT is active, with most of the commits per month lying in the middle range and, towards the end, near the lower range. In the future, we want to combine a prediction algorithm with the fuzzy data mining algorithm for time series data to predict the number of commits to be performed in a particular month, although various other factors on which the number of commits depends also need to be considered.

Appendix

See Tables 8 and 9.

Competing Interests

The authors declare that they have no competing interests.

References

[1] M. W. Godfrey and Q. Tu, "Evolution in open source software: a case study," in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '00), pp. 131–142, October 2000.

[2] H. Zhang and S. Kim, "Monitoring software quality evolution for defects," IEEE Software, vol. 27, no. 4, pp. 58–64, 2010.

[3] Y. Fang and D. Neufeld, "Understanding sustained participation in open source software projects," Journal of Management Information Systems, vol. 25, no. 4, pp. 9–50, 2008.

[4] "GIT," July 2015, http://git-scm.com.

[5] R. Sen, S. S. Singh, and S. Borle, "Open source software success: measures and analysis," Decision Support Systems, vol. 52, no. 2, pp. 364–372, 2012.

[6] R. McCleary, R. A. Hay, E. E. Meidinger, and D. McDowall, Applied Time Series Analysis for the Social Sciences, Sage, Beverly Hills, Calif, USA, 1980.

[7] R. Grewal, G. L. Lilien, and G. Mallapragada, "Location, location, location: how network embeddedness affects project success in open source systems," Management Science, vol. 52, no. 7, pp. 1043–1056, 2006.

[8] I. Herraiz, J. M. Gonzalez-Barahona, G. Robles, and D. M. German, "On the prediction of the evolution of libre software projects," in Proceedings of the 23rd International Conference on Software Maintenance (ICSM '07), pp. 405–414, Paris, France, October 2007.

[9] C. F. Kemerer and S. Slaughter, "An empirical approach to studying software evolution," IEEE Transactions on Software Engineering, vol. 25, no. 4, pp. 493–509, 1999.

[10] A. Mockus and L. G. Votta, "Identifying reasons for software changes using historic databases," in Proceedings of the International Conference on Software Maintenance (ICSM '00), pp. 120–130, IEEE, San Jose, Calif, USA, 2000.

[11] C.-H. Chen, T.-P. Hong, and V. S. Tseng, "Fuzzy data mining for time-series data," Applied Soft Computing Journal, vol. 12, no. 1, pp. 536–542, 2012.

[12] C. C. H. Yuen, "An empirical approach to the study of errors in large software under maintenance," in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '85), pp. 96–105, 1985.

[13] C. C. H. Yuen, "A statistical rationale for evolution dynamics concepts," in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '87), pp. 156–164, September 1987.

[14] C. C. H. Yuen, "On analytic maintenance process data at the global and the detailed levels: a case study," in Proceedings of the International Conference on Software Maintenance (ICSM '88), pp. 248–255, IEEE, Scottsdale, Ariz, USA, 1988.

[15] M. Goulao, N. Fonte, M. Wermelinger, and F. B. Abreu, "Software evolution prediction using seasonal time analysis: a comparative study," in Proceedings of the 16th European Conference on Software Maintenance and Reengineering (CSMR '12), pp. 213–222, IEEE, Szeged, Hungary, March 2012.

[16] B. Kenmei, G. Antoniol, and M. Di Penta, "Trend analysis and issue prediction in large-scale open source systems," in Proceedings of the 12th IEEE European Conference on Software Maintenance and Reengineering (CSMR '08), pp. 73–82, Athens, Greece, April 2008.

[17] F. Caprio, G. Casazza, U. Penta, M. D. Penta, and U. Villano, "Measuring and predicting the Linux kernel evolution," in Proceedings of the International Workshop of Empirical Studies on Software Maintenance, Florence, Italy, November 2001.

[18] E. Fuentetaja and D. J. Bagert, "Software evolution from a time-series perspective," in Proceedings of the IEEE International Conference on Software Maintenance, pp. 226–229, October 2002.

[19] M. Klas, F. Elberzhager, J. Munch, K. Hartjes, and O. Von Graevemeyer, "Transparent combination of expert and measurement data for defect prediction: an industrial case study," in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE '10), pp. 119–128, Cape Town, South Africa, May 2010.

[20] U. Raja, D. P. Hale, and J. E. Hale, "Modeling software evolution defects: a time series approach," Journal of Software Maintenance and Evolution: Research and Practice, vol. 21, no. 1, pp. 49–71, 2009.

[21] G. Antoniol, G. Casazza, M. D. Penta, and E. Merlo, "Modeling clones evolution through time series," in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '01), pp. 273–280, IEEE Computer Society, Florence, Italy, November 2001.

[22] L. Yu, "Indirectly predicting the maintenance effort of open-source software," Journal of Software Maintenance and Evolution, vol. 18, no. 5, pp. 311–332, 2006.

[23] A. Amin, L. Grunske, and A. Colman, "An approach to software reliability prediction based on time series modeling," Journal of Systems and Software, vol. 86, no. 7, pp. 1923–1932, 2013.

[24] Forecasting Model, http://people.duke.edu/~rnau/411fcst.htm.

[25] T.-P. Hong, C.-S. Kuo, and C.-C. Chi, "A fuzzy data mining algorithm for quantitative values," in Proceedings of the 3rd International Conference on Knowledge-Based Intelligent Information Engineering Systems (KES '99), pp. 480–483, September 1999.

[26] H. Suresh and K. Raimond, "Mining association rules from time series data using hybrid approaches," Proceedings of International Journal of Computational Engineering Research, vol. 3, no. 3, pp. 181–189, 2013.

[27] "GIT HUB," July 2015, https://github.com.

[28] "Eclipse," "Where did Eclipse come from?," Eclipse Wiki.

[29] "Eclipse," 2015, https://eclipse.org/cdt.

[30] "Eclipse," July 2015, http://www.eclipse.org/cdt/doc/overview/#/3.

[31] L. A. Zadeh, "Fuzzy sets," Information and Computation, vol. 8, no. 3, pp. 338–353, 1965.

[32] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, Academic Press, 2001.

[33] R. Agrawal, "Mining association rules between sets of items in large databases," in Proceedings of the ACM SIGMOD Conference, pp. 207–216, Washington, DC, USA, May 1993.


Table 2: Fuzzy candidate item sets in C2.

A1.Low ∩ A2.Low          A1.Low ∩ A2.Middle       A1.Low ∩ A3.Low          A1.Low ∩ A3.Middle       A1.Low ∩ A4.Low
A1.Low ∩ A4.Middle       A1.Low ∩ A5.Low          A1.Low ∩ A5.Middle       A1.Middle ∩ A2.Low       A1.Middle ∩ A2.Middle
A1.Middle ∩ A3.Low       A1.Middle ∩ A3.Middle    A1.Middle ∩ A4.Low       A1.Middle ∩ A4.Middle    A1.Middle ∩ A5.Low
A1.Middle ∩ A5.Middle    A2.Low ∩ A3.Low          A2.Low ∩ A3.Middle       A2.Low ∩ A4.Low          A2.Low ∩ A4.Middle
A2.Low ∩ A5.Low          A2.Low ∩ A5.Middle       A2.Middle ∩ A3.Low       A2.Middle ∩ A3.Middle    A2.Middle ∩ A4.Low
A2.Middle ∩ A4.Middle    A2.Middle ∩ A5.Low       A2.Middle ∩ A5.Middle    A3.Low ∩ A4.Low          A3.Low ∩ A4.Middle
A3.Low ∩ A5.Low          A3.Low ∩ A5.Middle       A3.Middle ∩ A4.Low       A3.Middle ∩ A4.Middle    A3.Middle ∩ A5.Low
A3.Middle ∩ A5.Middle    A4.Low ∩ A5.Low          A4.Low ∩ A5.Middle       A4.Middle ∩ A5.Low       A4.Middle ∩ A5.Middle

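Figure 2: Membership function. [Plot with three fuzzy regions labelled Low, Middle, and High; vertical axis ticks at 0 and 1; horizontal axis ticks at 0, 100, 250, 300, and 450.]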

Step 9. Generate the association rules using the Apriori rule generation method and calculate the confidence of each rule. The only variation is that, in place of normal data itemsets, we have fuzzy itemsets, so the concept of intersection is used in the rule, not union. If the confidence value is not less than the minimum confidence $\alpha$, keep the rule; otherwise reject it.

4. Experimental Setup

The experiment is performed on an x86 Family 6 Model 15 Stepping 6 GenuineIntel ~2131 MHz processor with 1 GB RAM, the Microsoft Windows XP Professional operating system (version 5.1.2600, Service Pack 3, Build 2600), and a 240 GB hard disk. The fuzzy data mining algorithm for time series data is implemented in Java.

5. Results and Analysis

The dataset is divided into two subsets: the training dataset (120 months) and the remaining dataset (16 months). The complete process of generating the association rules using the fuzzy data mining algorithm is performed in two steps:

(A) The association rules are generated on the training dataset using the fuzzy data mining algorithm for time series data.

(B) The generated association rules are validated using the training and remaining datasets.

5.1. Generation of Association Rules from Training Dataset. The fuzzy data mining algorithm for time series [11] is applied on the training dataset to generate association rules. We assumed $w$ = 5, $\lambda$ = 25% (25/100 * number of subsequences), and $\alpha$ = 65%; the membership function and its values are shown in Figure 2. These assumptions are made by referring to and understanding the concepts of the fuzzy data mining algorithm given in [11]. The commits are divided according to the level of activity. Commit counts in the ranges 0–100, 250–300, and 450 onwards indicate the low, middle (average), and high level of activity, respectively. The level of commit activity indicates the amount of work done in a commit.
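The membership function of Figure 2 can be sketched as a piecewise-linear function. This is a minimal sketch, assuming full membership on the plateaus named in the text (0–100, 250–300, 450 onwards) and linear ramps between the breakpoints 100, 250, 300, and 450. The ramp shape is an assumption read off the figure; it agrees with the values listed in Table 9 (for example, 240 commits gives Low ≈ 0.067 and Middle ≈ 0.933). The function name `membership` is illustrative only.

```python
def membership(commits):
    """Return (low, middle, high) membership degrees for a monthly commit count.

    Assumed shape: trapezoids with breakpoints 100, 250, 300, 450 (Figure 2).
    """
    x = float(commits)
    if x <= 100:                       # fully Low
        return (1.0, 0.0, 0.0)
    if x < 250:                        # ramp from Low to Middle
        return ((250 - x) / 150, (x - 100) / 150, 0.0)
    if x <= 300:                       # fully Middle
        return (0.0, 1.0, 0.0)
    if x < 450:                        # ramp from Middle to High
        return (0.0, (450 - x) / 150, (x - 300) / 150)
    return (0.0, 0.0, 1.0)             # fully High


# First subsequence of Table 8: 9, 25, 252, 240, 361
for value in (9, 25, 252, 240, 361):
    print(value, [round(m, 3) for m in membership(value)])
```

Under these assumptions the three degrees always sum to 1, so each commit count is shared between at most two adjacent regions.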

(i) Generation of Subsequences. We have $K$ = 120 (the number of months) and $w$ = 5 (the window size). The number of subsequences generated is ($K$ − $w$ + 1); therefore, the number of subsequences = (120 − 5 + 1) = 116.

All generated subsequences for $K$ = 120 are shown in Table 8 of the Appendix.
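The sliding-window step can be illustrated with a few lines of code. This is a sketch only: the helper name `subsequences` is hypothetical, and the sample list holds just the first seven monthly values that appear in Table 8.

```python
def subsequences(series, w):
    """Return all K - w + 1 sliding windows of length w over the series."""
    return [tuple(series[i:i + w]) for i in range(len(series) - w + 1)]


# First seven monthly commit counts, as listed in Table 8
commits = [9, 25, 252, 240, 361, 298, 118]
windows = subsequences(commits, 5)
for number, window in enumerate(windows, start=1):
    print(number, window)
```

With the full series of 120 monthly values and a window of 5, the same function yields the 116 subsequences of Table 8.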

(ii) Transformation of Data to Fuzzy Sets. In this step, we transform the commit activity data into fuzzy sets (using the membership function) and count the scalar cardinality of each data variable (shown in Table 9 of the Appendix).

All these data variables are considered as candidate itemsets ($C_1$). For the generation of $L_1$, data variables having a count of more than $\lambda$ = 25% (25/100 * 116 = 29) are considered. It is found that $L_1$ contains A1.Low, A1.Middle, A2.Low, A2.Middle, A3.Low, A3.Middle, A4.Low, A4.Middle, A5.Low, and A5.Middle.

Further, calculate the support value of each candidate itemset using

$$\text{Support value} = \frac{\text{Count}}{K - w + 1} \tag{3}$$
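The threshold test and equation (3) can be checked numerically. The sketch below uses the scalar cardinalities reported for the first data point in the Count row of Table 9; the variable names are illustrative, and the comparison uses "not less than" the minimum count, which is an assumption where the text says "more than".

```python
n_subsequences = 120 - 5 + 1            # K - w + 1 = 116
min_count = 0.25 * n_subsequences       # lambda = 25% of 116 = 29

# Scalar cardinalities (sums of membership degrees) for A1, from Table 9
counts_a1 = {"A1.Low": 47.595, "A1.Middle": 60.227, "A1.High": 8.178}

# Items whose count reaches the minimum count enter L1
l1 = [item for item, count in counts_a1.items() if count >= min_count]

# Equation (3): support = count / (K - w + 1)
support = {item: count / n_subsequences for item, count in counts_a1.items()}

print(l1)
print({item: round(value, 3) for item, value in support.items()})
```

A1.High falls well below the minimum count of 29, which is why no High item appears in $L_1$.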

(iii) Generation of $C_2$ and $L_2$. For generating $C_2$ (shown in Table 2), join $L_1$ with $L_1$, not joining items generated from the same order of data points; that is, joining A1.Low with A1.Middle is not allowed, and similarly for all others. While joining, all the properties of the Apriori algorithm are used.

Next, use the Min($f_1 \cap f_2$) function to find the value of each of the candidate fuzzy sets. Count the scalar cardinality of each of the candidate sets in $C_2$.
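The join and the Min-based count can be sketched as follows. This is an illustrative sketch, not the authors' Java implementation: item names are plain strings, `pair_count` is a hypothetical helper, and the two sample rows hold the A3.Middle and A4.Middle degrees of the first two subsequences as listed in Table 9.

```python
from itertools import combinations

# L1: Low and Middle items for each of the five data points
l1 = [f"A{pos}.{region}" for pos in range(1, 6) for region in ("Low", "Middle")]

# C2: all pairs of L1 items that come from different data points
c2 = [(a, b) for a, b in combinations(l1, 2)
      if a.split(".")[0] != b.split(".")[0]]
print(len(c2))  # 40 candidates, matching Table 2


def pair_count(rows, a, b):
    """Scalar cardinality of a ∩ b, using min as the fuzzy intersection."""
    return sum(min(row[a], row[b]) for row in rows)


rows = [
    {"A3.Middle": 1.0, "A4.Middle": 0.933},    # subsequence 1
    {"A3.Middle": 0.933, "A4.Middle": 0.593},  # subsequence 2
]
print(pair_count(rows, "A3.Middle", "A4.Middle"))
```

Summing `pair_count` over all 116 subsequences gives the count that is compared with the threshold when $L_2$ is formed.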


Table 3: Fuzzy item sets in $L_2$.

A1.Low ∩ A2.Low        A1.Low ∩ A3.Low        A1.Low ∩ A4.Middle     A1.Middle ∩ A2.Middle  A1.Middle ∩ A3.Middle
A1.Middle ∩ A4.Middle  A1.Middle ∩ A5.Low     A1.Middle ∩ A5.Middle  A2.Low ∩ A3.Low        A2.Low ∩ A4.Low
A2.Middle ∩ A3.Middle  A2.Middle ∩ A4.Middle  A2.Middle ∩ A5.Middle  A3.Low ∩ A4.Low        A3.Low ∩ A5.Low
A3.Middle ∩ A4.Middle  A3.Middle ∩ A5.Middle  A4.Low ∩ A5.Low        A4.Middle ∩ A5.Middle

Table 4: Fuzzy item sets in $L_3$.

A1.Middle ∩ A2.Middle ∩ A3.Middle    A1.Middle ∩ A2.Middle ∩ A4.Middle    A1.Middle ∩ A2.Middle ∩ A5.Middle
A1.Middle ∩ A3.Middle ∩ A4.Middle    A1.Middle ∩ A3.Middle ∩ A5.Middle    A1.Middle ∩ A4.Middle ∩ A5.Middle
A2.Middle ∩ A3.Middle ∩ A4.Middle    A2.Middle ∩ A3.Middle ∩ A5.Middle    A2.Middle ∩ A4.Middle ∩ A5.Middle
A3.Middle ∩ A4.Middle ∩ A5.Middle

Table 5: Largest fuzzy item sets generated.

A1.Middle ∩ A2.Middle ∩ A3.Middle    A1.Middle ∩ A2.Middle ∩ A4.Middle    A1.Middle ∩ A2.Middle ∩ A5.Middle
A1.Middle ∩ A3.Middle ∩ A4.Middle    A1.Middle ∩ A3.Middle ∩ A5.Middle    A1.Middle ∩ A4.Middle ∩ A5.Middle

The candidate fuzzy sets with a count not less than the threshold value are kept in $L_2$, and their support values are also calculated. $L_2$ now contains the fuzzy itemsets shown in Table 3.

(iv) Generation of $C_3$ and $L_3$. For generating $C_3$, join $L_2$ with $L_2$, not joining items generated from the same order of data points. In the case of $C_3$, a join is possible only between those fuzzy sets that have at least one data item in common. After the generation of $C_3$, only those fuzzy itemsets whose count is not less than the threshold value are put in $L_3$ (shown in Table 4).

(v) Generation of $C_4$. Join $L_3$ with $L_3$, not joining items generated from the same order of data points. A join is possible only for those fuzzy sets that have at least two data items in common.

After the generation of $C_4$, it is found that no element has a count of more than the threshold value; therefore $L_4$ = Null.
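The level-wise join used for $C_3$ and $C_4$ can be sketched generically. This is a simplified sketch: items are modelled as hypothetical (position, region) tuples, the function name `join` is illustrative, and the Apriori subset-pruning step is omitted for brevity.

```python
from itertools import combinations


def join(large_itemsets, k):
    """Join (k-1)-itemsets that share k-2 items into k-item candidates.

    Candidates holding two items from the same data point are rejected.
    """
    candidates = set()
    for a, b in combinations(large_itemsets, 2):
        union = a | b
        positions = {pos for pos, _ in union}
        if len(union) == k and len(positions) == k:
            candidates.add(frozenset(union))
    return candidates


# L3 as in Table 4: every three-item combination of the Middle items
l3 = {frozenset((pos, "Middle") for pos in combo)
      for combo in combinations(range(1, 6), 3)}
c4 = join(l3, 4)
print(len(c4))
```

Two itemsets of size k-1 share k-2 items exactly when their union has k items, which reproduces the "at least one common item" rule for $C_3$ and the "at least two common items" rule for $C_4$.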

(vi) Removal of Redundant Large Itemsets. Remove the redundant large itemsets from $L_3$ using Step 8 of the algorithm described in Section 3.4.

After applying Step 8 of the algorithm, we are left with the large itemsets shown in Table 5.
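Step 8 itself is not reproduced in this section, so the following is only a sketch under an assumption: an itemset is treated as redundant when shifting it back in time, so that its earliest data point becomes A1, yields an itemset that is already large. Under that assumption the sketch reproduces Table 5 from Table 4. The (position, region) tuples and both function names are hypothetical.

```python
from itertools import combinations


def shift_to_first(itemset):
    """Shift an itemset in time so that its earliest data point is A1."""
    offset = min(pos for pos, _ in itemset) - 1
    return frozenset((pos - offset, region) for pos, region in itemset)


def remove_redundant(large_itemsets):
    """Drop itemsets whose time-shifted form is already a large itemset."""
    kept = set(large_itemsets)
    return {s for s in kept
            if shift_to_first(s) == s or shift_to_first(s) not in kept}


# L3 as in Table 4: every three-item combination of the Middle items
l3 = {frozenset((pos, "Middle") for pos in combo)
      for combo in combinations(range(1, 6), 3)}
largest = remove_redundant(l3)
for itemset in sorted(sorted(s) for s in largest):
    print(itemset)
```

For example, A2.Middle ∩ A3.Middle ∩ A4.Middle shifts to A1.Middle ∩ A2.Middle ∩ A3.Middle, which is already in $L_3$, so only the six itemsets that start at A1 remain.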

(vii) Generate Association Rules. Generate the association rules from these large itemsets using Step 9 of the algorithm described in Section 3.4. All the generated rules are shown in Table 6, where strong rules, those whose confidence is not less than the threshold confidence, are marked in bold.

In this experiment we use $\alpha$ = 65%, which means that only rules with a confidence value not less than 65% are valid (and are called strong association rules). We found 18 such rules; each rule acts as a knowledge base for the project manager and developers of the project. For example, the generated rule A1.Middle ∩ A2.Middle → A3.Middle specifies that if the values of the first and second data points lie in the middle range, then there is a high probability that the third data point also has a middle value. All these rules act as precise and compact knowledge for the project manager and analyst.
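The confidence of a fuzzy rule can be sketched as the ratio of two scalar cardinalities, with min as the intersection. The helper names below are hypothetical, and the three rows hold only the Middle degrees of the first three subsequences (as listed in Table 9), so the resulting number illustrates the calculation and is not a value from Table 6.

```python
def fuzzy_count(rows, items):
    """Scalar cardinality of the intersection of items (min per row)."""
    return sum(min(row[item] for item in items) for row in rows)


def confidence(rows, antecedent, consequent):
    """Confidence of antecedent -> consequent for fuzzy itemsets."""
    return fuzzy_count(rows, antecedent + consequent) / fuzzy_count(rows, antecedent)


rows = [
    {"A1.Middle": 0.0, "A2.Middle": 0.0, "A3.Middle": 1.0},      # 9, 25, 252
    {"A1.Middle": 0.0, "A2.Middle": 1.0, "A3.Middle": 0.933},    # 25, 252, 240
    {"A1.Middle": 1.0, "A2.Middle": 0.933, "A3.Middle": 0.593},  # 252, 240, 361
]
value = confidence(rows, ["A1.Middle", "A2.Middle"], ["A3.Middle"])
is_strong = value >= 0.65   # alpha = 65%
print(round(value, 3), is_strong)
```

Over all 116 training subsequences the same ratio gives the confidence values of Table 6, and the comparison with $\alpha$ separates strong rules from the rest.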

5.2. Validation of Association Rules Using Training and Remaining Dataset

(i) Validation on Training Dataset. It is found that 65% of the transactions in the Eclipse CDT training set follow these rules. These rules act as compact and concrete knowledge of this data.

(ii) Validation on Remaining Dataset. All generated association rules are tested on the remaining dataset to find the regularity and trend in the number of commits, that is, to find whether the commits are performed at the same rate or not. If the rate is the same, Eclipse has consistent growth; otherwise there is a variation in the growth of the considered software. It is found that, in the case of the remaining set, the applicability of these rules decreases. This is verified again by generating the association rules from the remaining set. The rules generated from the remaining dataset specify that the data in the remaining dataset lies more towards the lower range of commits.

The following factors account for this variation in the commit rate:

(a) Most of the values in the remaining dataset lie in the low range. This point is verified by finding the largest frequent fuzzy itemsets and then generating the association rules from the remaining dataset by using the fuzzy data mining algorithm for time series data. The largest frequent itemsets found are shown in Table 7.

The generated frequent itemsets consist only of data points in the low range. Hence most of the commit entries in the remaining dataset are probably low. This specifies that the number of commits analysed from 6/1/2012 to 9/23/2013 is less than the number of commits analysed in the training dataset. It indicates less activity in the development of the considered software in this period.


Table 6: Calculated confidence value of various association rules. Strong rules (set in bold in the original table) are marked with *.

Rule                                   Confidence   Confidence (%)
A1.Middle ∩ A2.Middle → A3.Middle *    0.741        74
A3.Middle → A1.Middle ∩ A2.Middle      0.522        52
A1.Middle ∩ A3.Middle → A2.Middle *    0.781        78
A2.Middle → A1.Middle ∩ A3.Middle      0.526        53
A2.Middle ∩ A3.Middle → A1.Middle *    0.735        74
A1.Middle → A2.Middle ∩ A3.Middle      0.529        53
A1.Middle ∩ A2.Middle → A4.Middle *    0.692        69
A4.Middle → A1.Middle ∩ A2.Middle      0.491        49
A1.Middle ∩ A4.Middle → A2.Middle *    0.762        76
A2.Middle → A1.Middle ∩ A4.Middle      0.491        49
A2.Middle ∩ A4.Middle → A1.Middle *    0.724        72
A1.Middle → A2.Middle ∩ A4.Middle      0.494        49
A1.Middle ∩ A2.Middle → A5.Middle *    0.695        69
A5.Middle → A1.Middle ∩ A2.Middle      0.497        50
A1.Middle ∩ A5.Middle → A2.Middle *    0.737        74
A2.Middle → A1.Middle ∩ A5.Middle      0.493        49
A2.Middle ∩ A5.Middle → A1.Middle *    0.759        76
A1.Middle → A2.Middle ∩ A5.Middle      0.496        50
A1.Middle ∩ A3.Middle → A4.Middle *    0.742        74
A4.Middle → A1.Middle ∩ A3.Middle      0.499        50
A1.Middle ∩ A4.Middle → A3.Middle *    0.775        78
A3.Middle → A1.Middle ∩ A4.Middle      0.496        50
A3.Middle ∩ A4.Middle → A1.Middle *    0.691        69
A1.Middle → A3.Middle ∩ A4.Middle      0.502        50
A1.Middle ∩ A3.Middle → A5.Middle *    0.751        75
A5.Middle → A1.Middle ∩ A3.Middle      0.510        51
A1.Middle ∩ A5.Middle → A3.Middle *    0.757        76
A3.Middle → A1.Middle ∩ A5.Middle      0.502        50
A3.Middle ∩ A5.Middle → A1.Middle *    0.738        74
A1.Middle → A3.Middle ∩ A5.Middle      0.509        51
A1.Middle ∩ A4.Middle → A5.Middle *    0.792        79
A5.Middle → A1.Middle ∩ A4.Middle      0.515        51
A1.Middle ∩ A5.Middle → A4.Middle *    0.764        76
A4.Middle → A1.Middle ∩ A5.Middle      0.511        51
A4.Middle ∩ A5.Middle → A1.Middle *    0.715        71
A1.Middle → A4.Middle ∩ A5.Middle      0.513        51

Table 7: Largest frequent fuzzy item sets for the remaining dataset.

A1.Low ∩ A2.Low ∩ A3.Low ∩ A4.Low
A1.Low ∩ A2.Low ∩ A3.Low ∩ A5.Low
A1.Low ∩ A2.Low ∩ A4.Low ∩ A5.Low
A1.Low ∩ A3.Low ∩ A4.Low ∩ A5.Low

(b) Factors such as the number of active users and the number of files changed (addition, deletion, and modification) are lower.

6. Discussion

The fuzzy data mining algorithm for time series data [11] allows efficient mining of association rules from a large dataset. The generated rules help in finding the regularity and existing trend for OSS projects. We have used the commit activity data of Eclipse CDT. The commits in the repository are directly related to activities such as files or code being changed, deleted, or modified. By analysing the trend in the commit data, we can interpret the development or evolution activity of the considered software. In the above experiment, the original dataset is divided into two subdatasets (training and remaining dataset).

The association rules generated from the training set allow analysing the regularity and trends in the commits of an OSS project. These rules also help in predicting and analysing the future evolution or development activity of Eclipse CDT. The generated rules are validated on the remaining dataset to find their applicability. It is found that the applicability of these rules on the remaining set decreases. The results of the algorithm indicate that the commit activity of the remaining dataset lies in the low activity range. There may also be other factors that cause this decrease in the number of commits. These may include factors such as the number of active users and the number of files changed (addition, deletion, and modification), which are lower.

Table 8: Generated subsequences.

Sp    Subsequence
1     9 25 252 240 361
2     25 252 240 361 298
3     252 240 361 298 118
4     240 361 298 118 294
5     361 298 118 294 219
6     298 118 294 219 193
7     118 294 219 193 359
8     294 219 193 359 96
9     219 193 359 96 151
10    193 359 96 151 141
11    359 96 151 141 198
12    96 151 141 198 296
13    151 141 198 296 183
14    141 198 296 183 160
15    198 296 183 160 117
16    296 183 160 117 139
17    183 160 117 139 179
18    160 117 139 179 255
19    117 139 179 255 351
20    139 179 255 351 320
21    179 255 351 320 440
22    255 351 320 440 219
23    351 320 440 219 260
24    320 440 219 260 254
25    440 219 260 254 227
26    219 260 254 227 363
27    260 254 227 363 185
28    254 227 363 185 200
29    227 363 185 200 254
30    363 185 200 254 175
31    185 200 254 175 253
32    200 254 175 253 221
33    254 175 253 221 281
34    175 253 221 281 263
35    253 221 281 263 87
36    221 281 263 87 83
37    281 263 87 83 42
38    263 87 83 42 80
39    87 83 42 80 44
40    83 42 80 44 99
41    42 80 44 99 71
42    80 44 99 71 83
43    44 99 71 83 150
44    99 71 83 150 120
45    71 83 150 120 60
46    83 150 120 60 132
47    150 120 60 132 127
48    120 60 132 127 151
49    60 132 127 151 109
50    132 127 151 109 141
51    127 151 109 141 141
52    151 109 141 141 135
53    109 141 141 135 242
54    141 141 135 242 320
55    141 135 242 320 380
56    135 242 320 380 438
57    242 320 380 438 355
58    320 380 438 355 141
59    380 438 355 141 107
60    438 355 141 107 114
61    355 141 107 114 153
62    141 107 114 153 212
63    107 114 153 212 133
64    114 153 212 133 212
65    153 212 133 212 273
66    212 133 212 273 326
67    133 212 273 326 455
68    212 273 326 455 378
69    273 326 455 378 186
70    326 455 378 186 317
71    455 378 186 317 139
72    378 186 317 139 155
73    186 317 139 155 257
74    317 139 155 257 208
75    139 155 257 208 162
76    155 257 208 162 229
77    257 208 162 229 205
78    208 162 229 205 203
79    162 229 205 203 234
80    229 205 203 234 214
81    205 203 234 214 144
82    203 234 214 144 185
83    234 214 144 185 153
84    214 144 185 153 206
85    144 185 153 206 315
86    185 153 206 315 204
87    153 206 315 204 103
88    206 315 204 103 336
89    315 204 103 336 278
90    204 103 336 278 329
91    103 336 278 329 370
92    336 278 329 370 419
93    278 329 370 419 230
94    329 370 419 230 224
95    370 419 230 224 217
96    419 230 224 217 188
97    230 224 217 188 176
98    224 217 188 176 129
99    217 188 176 129 91
100   188 176 129 91 188
101   176 129 91 188 169
102   129 91 188 169 256
103   91 188 169 256 245
104   188 169 256 245 237
105   169 256 245 237 74
106   256 245 237 74 228
107   245 237 74 228 179
108   237 74 228 179 212
109   74 228 179 212 128
110   228 179 212 128 154
111   179 212 128 154 150
112   212 128 154 150 192
113   128 154 150 192 148
114   154 150 192 148 171
115   150 192 148 171 181
116   192 148 171 181 163

7. Threats to Validity

This section discusses the threats to validity of the study.

Construct validity threats concern the relationship between theory and observation. These threats are mainly due to the fact that we assumed all the commits are posted in the revision control tool git [4]. Any changes performed in the source code but not logged through the tool may not have become part of the study.

Internal validity concerns the selection of subject systems and the analysis methods. This study uses a month as the unit of measure for tracking the types of change activities. In the future, we would like to use a more natural and insightful partition based on major/minor versions of the OSS project for analysing the change activity of OSS projects. Subject systems were selected from public repositories, but selection is biased towards projects with valid git repositories.

External validity concerns the generalization of the findings. In the future, we would like to provide more generalized results by considering a higher number of OSS projects.

Reliability validity concerns the possibility of replication of the study. The subject systems are available in the public domain. We have attempted to put all the necessary details of the experiment process in the paper.

Table 9: Transformation of data elements into fuzzy sets. Each group lists the Low, Middle, and High membership values of one data point.

Sp | A1 (Low Middle High) | A2 (Low Middle High) | A3 (Low Middle High) | A4 (Low Middle High) | A5 (Low Middle High)
Count | 47.595 60.227 8.178 | 47.275 60.547 8.178 | 46.802 61.02 8.178 | 47.262 60.56 8.178 | 47.775 60.047 8.178
1 | 1 0 0 | 1 0 0 | 0 1 0 | 0.067 0.933 0 | 0 0.593 0.407
2 | 1 0 0 | 0 1 0 | 0.067 0.933 0 | 0 0.593 0.407 | 0 1 0
3 | 0 1 0 | 0.067 0.933 0 | 0 0.593 0.407 | 0 1 0 | 0.88 0.12 0
4 | 0.067 0.933 0 | 0 0.593 0.407 | 0 1 0 | 0.88 0.12 0 | 0 1 0
5 | 0 0.593 0.407 | 0 1 0 | 0.88 0.12 0 | 0 1 0 | 0.207 0.793 0
6 | 0 1 0 | 0.88 0.12 0 | 0 1 0 | 0.207 0.793 0 | 0.38 0.62 0
7 | 0.88 0.12 0 | 0 1 0 | 0.207 0.793 0 | 0.38 0.62 0 | 0 0.607 0.393
8 | 0 1 0 | 0.207 0.793 0 | 0.38 0.62 0 | 0 0.607 0.393 | 1 0 0
9 | 0.207 0.793 0 | 0.38 0.62 0 | 0 0.607 0.393 | 1 0 0 | 0.66 0.34 0
10 | 0.38 0.62 0 | 0 0.607 0.393 | 1 0 0 | 0.66 0.34 0 | 0.727 0.273 0
11 | 0 0.607 0.393 | 1 0 0 | 0.66 0.34 0 | 0.727 0.273 0 | 0.347 0.653 0
12 | 1 0 0 | 0.66 0.34 0 | 0.727 0.273 0 | 0.347 0.653 0 | 0 1 0
13 | 0.66 0.34 0 | 0.727 0.273 0 | 0.347 0.653 0 | 0 1 0 | 0.447 0.553 0
14 | 0.727 0.273 0 | 0.347 0.653 0 | 0 1 0 | 0.447 0.553 0 | 0.6 0.4 0
15 | 0.347 0.653 0 | 0 1 0 | 0.447 0.553 0 | 0.6 0.4 0 | 0.887 0.113 0
16 | 0 1 0 | 0.447 0.553 0 | 0.6 0.4 0 | 0.887 0.113 0 | 0.74 0.26 0
17 | 0.447 0.553 0 | 0.6 0.4 0 | 0.887 0.113 0 | 0.74 0.26 0 | 0.473 0.527 0
18 | 0.6 0.4 0 | 0.887 0.113 0 | 0.74 0.26 0 | 0.473 0.527 0 | 0 1 0
19 | 0.887 0.113 0 | 0.74 0.26 0 | 0.473 0.527 0 | 0 1 0 | 0 0.66 0.34
20 | 0.74 0.26 0 | 0.473 0.527 0 | 0 1 0 | 0 0.66 0.34 | 0 0.867 0.133
21 | 0.473 0.527 0 | 0 1 0 | 0 0.66 0.34 | 0 0.867 0.133 | 0 0.067 0.933
22 | 0 1 0 | 0 0.66 0.34 | 0 0.867 0.133 | 0 0.067 0.933 | 0.207 0.793 0
23 | 0 0.66 0.34 | 0 0.867 0.133 | 0 0.067 0.933 | 0.207 0.793 0 | 0 1 0
24 | 0 0.867 0.133 | 0 0.067 0.933 | 0.207 0.793 0 | 0 1 0 | 0 1 0
25 | 0 0.067 0.933 | 0.207 0.793 0 | 0 1 0 | 0 1 0 | 0.153 0.847 0
26 | 0.207 0.793 0 | 0 1 0 | 0 1 0 | 0.153 0.847 0 | 0 0.58 0.42
27 | 0 1 0 | 0 1 0 | 0.153 0.847 0 | 0 0.58 0.42 | 0.433 0.567 0
28 | 0 1 0 | 0.153 0.847 0 | 0 0.58 0.42 | 0.433 0.567 0 | 0.333 0.667 0
29 | 0.153 0.847 0 | 0 0.58 0.42 | 0.433 0.567 0 | 0.333 0.667 0 | 0 1 0
30 | 0 0.58 0.42 | 0.433 0.567 0 | 0.333 0.667 0 | 0 1 0 | 0.5 0.5 0
31 | 0.433 0.567 0 | 0.333 0.667 0 | 0 1 0 | 0.5 0.5 0 | 0 1 0
32 | 0.333 0.667 0 | 0 1 0 | 0.5 0.5 0 | 0 1 0 | 0.193 0.807 0
33 | 0 1 0 | 0.5 0.5 0 | 0 1 0 | 0.193 0.807 0 | 0 1 0
34 | 0.5 0.5 0 | 0 1 0 | 0.193 0.807 0 | 0 1 0 | 0 1 0
35 | 0 1 0 | 0.193 0.807 0 | 0 1 0 | 0 1 0 | 1 0 0
36 | 0.193 0.807 0 | 0 1 0 | 0 1 0 | 1 0 0 | 1 0 0
37 | 0 1 0 | 0 1 0 | 1 0 0 | 1 0 0 | 1 0 0
38 | 0 1 0 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0
39 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0
40 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0
41 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0
42 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0
43 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0 | 0.667 0.333 0
44 | 1 0 0 | 1 0 0 | 1 0 0 | 0.667 0.333 0 | 0.867 0.133 0
45 | 1 0 0 | 1 0 0 | 0.667 0.333 0 | 0.867 0.133 0 | 1 0 0
46 | 1 0 0 | 0.667 0.333 0 | 0.867 0.133 0 | 1 0 0 | 0.787 0.213 0
47 | 0.667 0.333 0 | 0.867 0.133 0 | 1 0 0 | 0.787 0.213 0 | 0.82 0.18 0
48 | 0.867 0.133 0 | 1 0 0 | 0.787 0.213 0 | 0.82 0.18 0 | 0.66 0.34 0
49 | 1 0 0 | 0.787 0.213 0 | 0.82 0.18 0 | 0.66 0.34 0 | 0.94 0.06 0
50 | 0.787 0.213 0 | 0.82 0.18 0 | 0.66 0.34 0 | 0.94 0.06 0 | 0.727 0.273 0
51 | 0.82 0.18 0 | 0.66 0.34 0 | 0.94 0.06 0 | 0.727 0.273 0 | 0.727 0.273 0
52 | 0.66 0.34 0 | 0.94 0.06 0 | 0.727 0.273 0 | 0.727 0.273 0 | 0.767 0.233 0
53 | 0.94 0.06 0 | 0.727 0.273 0 | 0.727 0.273 0 | 0.767 0.233 0 | 0.053 0.947 0
54 | 0.727 0.273 0 | 0.727 0.273 0 | 0.767 0.233 0 | 0.053 0.947 0 | 0 0.867 0.133
55 | 0.727 0.273 0 | 0.767 0.233 0 | 0.053 0.947 0 | 0 0.867 0.133 | 0 0.467 0.533
56 | 0.767 0.233 0 | 0.053 0.947 0 | 0 0.867 0.133 | 0 0.467 0.533 | 0 0.08 0.92
57 | 0.053 0.947 0 | 0 0.867 0.133 | 0 0.467 0.533 | 0 0.08 0.92 | 0 0.633 0.367
58 | 0 0.867 0.133 | 0 0.467 0.533 | 0 0.08 0.92 | 0 0.633 0.367 | 0.727 0.273 0
59 | 0 0.467 0.533 | 0 0.08 0.92 | 0 0.633 0.367 | 0.727 0.273 0 | 0.953 0.047 0
60 | 0 0.08 0.92 | 0 0.633 0.367 | 0.727 0.273 0 | 0.953 0.047 0 | 0.907 0.093 0
61 | 0 0.633 0.367 | 0.727 0.273 0 | 0.953 0.047 0 | 0.907 0.093 0 | 0.647 0.353 0
62 | 0.727 0.273 0 | 0.953 0.047 0 | 0.907 0.093 0 | 0.647 0.353 0 | 0.253 0.747 0
63 | 0.953 0.047 0 | 0.907 0.093 0 | 0.647 0.353 0 | 0.253 0.747 0 | 0.78 0.22 0
64 | 0.907 0.093 0 | 0.647 0.353 0 | 0.253 0.747 0 | 0.78 0.22 0 | 0.253 0.747 0
65 | 0.647 0.353 0 | 0.253 0.747 0 | 0.78 0.22 0 | 0.253 0.747 0 | 0 1 0
66 | 0.253 0.747 0 | 0.78 0.22 0 | 0.253 0.747 0 | 0 1 0 | 0 0.827 0.173
67 | 0.78 0.22 0 | 0.253 0.747 0 | 0 1 0 | 0 0.827 0.173 | 0 0 1
68 | 0.253 0.747 0 | 0 1 0 | 0 0.827 0.173 | 0 0 1 | 0 0.48 0.52
69 | 0 1 0 | 0 0.827 0.173 | 0 0 1 | 0 0.48 0.52 | 0.427 0.573 0
70 | 0 0.827 0.173 | 0 0 1 | 0 0.48 0.52 | 0.427 0.573 0 | 0 0.887 0.113
71 | 0 0 1 | 0 0.48 0.52 | 0.427 0.573 0 | 0 0.887 0.113 | 0.74 0.26 0
72 | 0 0.48 0.52 | 0.427 0.573 0 | 0 0.887 0.113 | 0.74 0.26 0 | 0.633 0.367 0
73 | 0.427 0.573 0 | 0 0.887 0.113 | 0.74 0.26 0 | 0.633 0.367 0 | 0 1 0
74 | 0 0.887 0.113 | 0.74 0.26 0 | 0.633 0.367 0 | 0 1 0 | 0.28 0.72 0
75 | 0.74 0.26 0 | 0.633 0.367 0 | 0 1 0 | 0.28 0.72 0 | 0.587 0.413 0
76 | 0.633 0.367 0 | 0 1 0 | 0.28 0.72 0 | 0.587 0.413 0 | 0.14 0.86 0
77 | 0 1 0 | 0.28 0.72 0 | 0.587 0.413 0 | 0.14 0.86 0 | 0.3 0.7 0
78 | 0.28 0.72 0 | 0.587 0.413 0 | 0.14 0.86 0 | 0.3 0.7 0 | 0.313 0.687 0
79 | 0.587 0.413 0 | 0.14 0.86 0 | 0.3 0.7 0 | 0.313 0.687 0 | 0.107 0.893 0
80 | 0.14 0.86 0 | 0.3 0.7 0 | 0.313 0.687 0 | 0.107 0.893 0 | 0.24 0.76 0
81 | 0.3 0.7 0 | 0.313 0.687 0 | 0.107 0.893 0 | 0.24 0.76 0 | 0.707 0.293 0
82 | 0.313 0.687 0 | 0.107 0.893 0 | 0.24 0.76 0 | 0.707 0.293 0 | 0.433 0.567 0
83 | 0.107 0.893 0 | 0.24 0.76 0 | 0.707 0.293 0 | 0.433 0.567 0 | 0.647 0.353 0
84 | 0.24 0.76 0 | 0.707 0.293 0 | 0.433 0.567 0 | 0.647 0.353 0 | 0.293 0.707 0
85 | 0.707 0.293 0 | 0.433 0.567 0 | 0.647 0.353 0 | 0.293 0.707 0 | 0 0.9 0.1
86 | 0.433 0.567 0 | 0.647 0.353 0 | 0.293 0.707 0 | 0 0.9 0.1 | 0.307 0.693 0
87 | 0.647 0.353 0 | 0.293 0.707 0 | 0 0.9 0.1 | 0.307 0.693 0 | 0.98 0.02 0
88 | 0.293 0.707 0 | 0 0.9 0.1 | 0.307 0.693 0 | 0.98 0.02 0 | 0 0.76 0.24
89 | 0 0.9 0.1 | 0.307 0.693 0 | 0.98 0.02 0 | 0 0.76 0.24 | 0 1 0
90 | 0.307 0.693 0 | 0.98 0.02 0 | 0 0.76 0.24 | 0 1 0 | 0 0.807 0.193
91 | 0.98 0.02 0 | 0 0.76 0.24 | 0 1 0 | 0 0.807 0.193 | 0 0.533 0.467
92 | 0 0.76 0.24 | 0 1 0 | 0 0.807 0.193 | 0 0.533 0.467 | 0 0.207 0.793
93 | 0 1 0 | 0 0.807 0.193 | 0 0.533 0.467 | 0 0.207 0.793 | 0.133 0.867 0
94 | 0 0.807 0.193 | 0 0.533 0.467 | 0 0.207 0.793 | 0.133 0.867 0 | 0.173 0.827 0
95 | 0 0.533 0.467 | 0 0.207 0.793 | 0.133 0.867 0 | 0.173 0.827 0 | 0.22 0.78 0
96 | 0 0.207 0.793 | 0.133 0.867 0 | 0.173 0.827 0 | 0.22 0.78 0 | 0.413 0.587 0
97 | 0.133 0.867 0 | 0.173 0.827 0 | 0.22 0.78 0 | 0.413 0.587 0 | 0.493 0.507 0
98 | 0.173 0.827 0 | 0.22 0.78 0 | 0.413 0.587 0 | 0.493 0.507 0 | 0.807 0.193 0
99 | 0.22 0.78 0 | 0.413 0.587 0 | 0.493 0.507 0 | 0.807 0.193 0 | 1 0 0
100 | 0.413 0.587 0 | 0.493 0.507 0 | 0.807 0.193 0 | 1 0 0 | 0.413 0.587 0
101 | 0.493 0.507 0 | 0.807 0.193 0 | 1 0 0 | 0.413 0.587 0 | 0.54 0.46 0
102 | 0.807 0.193 0 | 1 0 0 | 0.413 0.587 0 | 0.54 0.46 0 | 0 1 0
103 | 1 0 0 | 0.413 0.587 0 | 0.54 0.46 0 | 0 1 0 | 0.033 0.967 0
104 | 0.413 0.587 0 | 0.54 0.46 0 | 0 1 0 | 0.033 0.967 0 | 0.087 0.913 0
105 | 0.54 0.46 0 | 0 1 0 | 0.033 0.967 0 | 0.087 0.913 0 | 1 0 0
106 | 0 1 0 | 0.033 0.967 0 | 0.087 0.913 0 | 1 0 0 | 0.147 0.853 0
107 | 0.033 0.967 0 | 0.087 0.913 0 | 1 0 0 | 0.147 0.853 0 | 0.473 0.527 0
108 | 0.087 0.913 0 | 1 0 0 | 0.147 0.853 0 | 0.473 0.527 0 | 0.253 0.747 0
109 | 1 0 0 | 0.147 0.853 0 | 0.473 0.527 0 | 0.253 0.747 0 | 0.813 0.187 0
110 | 0.147 0.853 0 | 0.473 0.527 0 | 0.253 0.747 0 | 0.813 0.187 0 | 0.64 0.36 0
111 | 0.473 0.527 0 | 0.253 0.747 0 | 0.813 0.187 0 | 0.64 0.36 0 | 0.667 0.333 0
112 | 0.253 0.747 0 | 0.813 0.187 0 | 0.64 0.36 0 | 0.667 0.333 0 | 0.387 0.613 0
113 | 0.813 0.187 0 | 0.64 0.36 0 | 0.667 0.333 0 | 0.387 0.613 0 | 0.68 0.32 0
114 | 0.64 0.36 0 | 0.667 0.333 0 | 0.387 0.613 0 | 0.68 0.32 0 | 0.527 0.473 0
115 | 0.667 0.333 0 | 0.387 0.613 0 | 0.68 0.32 0 | 0.527 0.473 0 | 0.46 0.54 0
116 | 0.387 0.613 0 | 0.68 0.32 0 | 0.527 0.473 0 | 0.46 0.54 0 | 0.58 0.42 0

8. Conclusion and Future Work

The commit activity data available in the development repository of open source software can be used to analyse the evolution of OSS projects, as each commit is directly related to development activity such as code deletion, addition, modification, comments, and file addition. In this study, the Eclipse CDT commit data is analysed to find the regularity and trend in the commit data. The fuzzy data mining algorithm for time series data is used to generate the association rules from the dataset. The dataset is divided into two subsets (training and remaining dataset) to evaluate the pattern of evolution of Eclipse CDT.

After applying and validating the association rules generated from the training dataset, it is found that the rates at which commits are performed in the training dataset and the remaining dataset are different. This is verified again by generating the association rules from the remaining set. The rules generated from the remaining dataset specify that the data in the remaining dataset lies more towards the lower range of commits. This validates the applicability of the rules generated from the training dataset.

These association rules indicate that the overall commits in Eclipse CDT lie in the middle range, except for the variation found near the end, where there is a high probability that commits lie in the lower range. The continuous availability and existence of commits in the repository of Eclipse CDT illustrate that the development or evolution of Eclipse CDT is active, with most of the commits per month lying in the middle range and, near the end, close to the lower range. In the future, we want to combine a prediction algorithm with the fuzzy data mining algorithm for time series data to predict the number of commits to be performed in a particular month, although there are various other factors on which the number of commits depends that need to be considered.

Appendix

See Tables 8 and 9.

Competing Interests

The authors declare that they have no competing interests.

References

[1] M. W. Godfrey and Q. Tu, “Evolution in open source software: a case study,” in Proceedings of the IEEE International Conference on Software Maintenance (ICSM ’00), pp. 131–142, October 2000.

[2] H. Zhang and S. Kim, “Monitoring software quality evolution for defects,” IEEE Software, vol. 27, no. 4, pp. 58–64, 2010.

[3] Y. Fang and D. Neufeld, “Understanding sustained participation in open source software projects,” Journal of Management Information Systems, vol. 25, no. 4, pp. 9–50, 2008.

[4] “GIT,” July 2015, http://git-scm.com.

[5] R. Sen, S. S. Singh, and S. Borle, “Open source software success: measures and analysis,” Decision Support Systems, vol. 52, no. 2, pp. 364–372, 2012.

[6] R. McCleary, R. A. Hay, E. E. Meidinger, and D. McDowall, Applied Time Series Analysis for the Social Sciences, Sage, Beverly Hills, Calif, USA, 1980.

[7] R. Grewal, G. L. Lilien, and G. Mallapragada, “Location, location, location: how network embeddedness affects project success in open source systems,” Management Science, vol. 52, no. 7, pp. 1043–1056, 2006.

[8] I. Herraiz, J. M. Gonzalez-Barahona, G. Robles, and D. M. German, “On the prediction of the evolution of libre software projects,” in Proceedings of the 23rd International Conference on Software Maintenance (ICSM ’07), pp. 405–414, Paris, France, October 2007.

[9] C. F. Kemerer and S. Slaughter, “An empirical approach to studying software evolution,” IEEE Transactions on Software Engineering, vol. 25, no. 4, pp. 493–509, 1999.

[10] A. Mockus and L. G. Votta, “Identifying reasons for software changes using historic databases,” in Proceedings of the International Conference on Software Maintenance (ICSM ’00), pp. 120–130, IEEE, San Jose, Calif, USA, 2000.

[11] C.-H. Chen, T.-P. Hong, and V. S. Tseng, “Fuzzy data mining for time-series data,” Applied Soft Computing Journal, vol. 12, no. 1, pp. 536–542, 2012.

[12] C. C. H. Yuen, “An empirical approach to the study of errors in large software under maintenance,” in Proceedings of the IEEE International Conference on Software Maintenance (ICSM ’85), pp. 96–105, 1985.

[13] C. C. H. Yuen, “A statistical rationale for evolution dynamics concepts,” in Proceedings of the IEEE International Conference on Software Maintenance (ICSM ’87), pp. 156–164, September 1987.

[14] C. C. H. Yuen, “On analytic maintenance process data at the global and the detailed levels: a case study,” in Proceedings of the International Conference on Software Maintenance (ICSM ’88), pp. 248–255, IEEE, Scottsdale, Ariz, USA, 1988.

[15] M. Goulão, N. Fonte, M. Wermelinger, and F. B. Abreu, “Software evolution prediction using seasonal time analysis: a comparative study,” in Proceedings of the 16th European Conference on Software Maintenance and Reengineering (CSMR ’12), pp. 213–222, IEEE, Szeged, Hungary, March 2012.

[16] B. Kenmei, G. Antoniol, and M. Di Penta, “Trend analysis and issue prediction in large-scale open source systems,” in Proceedings of the 12th IEEE European Conference on Software Maintenance and Reengineering (CSMR ’08), pp. 73–82, Athens, Greece, April 2008.

[17] F. Caprio, G. Casazza, U. Penta, M. D. Penta, and U. Villano, “Measuring and predicting the Linux kernel evolution,” in Proceedings of the International Workshop of Empirical Studies on Software Maintenance, Florence, Italy, November 2001.

[18] E. Fuentetaja and D. J. Bagert, “Software evolution from a time-series perspective,” in Proceedings of the IEEE International Conference on Software Maintenance, pp. 226–229, October 2002.

[19] M. Kläs, F. Elberzhager, J. Münch, K. Hartjes, and O. Von Graevemeyer, “Transparent combination of expert and measurement data for defect prediction: an industrial case study,” in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE ’10), pp. 119–128, Cape Town, South Africa, May 2010.

[20] U. Raja, D. P. Hale, and J. E. Hale, “Modeling software evolution defects: a time series approach,” Journal of Software Maintenance and Evolution: Research and Practice, vol. 21, no. 1, pp. 49–71, 2009.

[21] G. Antoniol, G. Casazza, M. D. Penta, and E. Merlo, “Modeling clones evolution through time series,” in Proceedings of the IEEE International Conference on Software Maintenance (ICSM ’01), pp. 273–280, IEEE Computer Society, Florence, Italy, November 2001.

[22] L. Yu, “Indirectly predicting the maintenance effort of open-source software,” Journal of Software Maintenance and Evolution, vol. 18, no. 5, pp. 311–332, 2006.

[23] A. Amin, L. Grunske, and A. Colman, “An approach to software reliability prediction based on time series modeling,” Journal of Systems and Software, vol. 86, no. 7, pp. 1923–1932, 2013.

[24] Forecasting Model, http://people.duke.edu/~rnau/411fcst.htm.

[25] T.-P. Hong, C.-S. Kuo, and C.-C. Chi, “A fuzzy data mining algorithm for quantitative values,” in Proceedings of the 3rd International Conference on Knowledge-Based Intelligent Information Engineering Systems (KES ’99), pp. 480–483, September 1999.

[26] H. Suresh and K. Raimond, “Mining association rules from time series data using hybrid approaches,” Proceedings of International Journal of Computational Engineering Research, vol. 3, no. 3, pp. 181–189, 2013.

[27] “GIT HUB,” July 2015, https://github.com.

[28] “Eclipse,” “Where did Eclipse come from?” Eclipse Wiki.

[29] “Eclipse,” 2015, https://eclipse.org/cdt.

[30] “Eclipse,” July 2015, http://www.eclipse.org/cdt/doc/overview/#/3.

[31] L. A. Zadeh, “Fuzzy sets,” Information and Control, vol. 8, no. 3, pp. 338–353, 1965.

[32] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufman, Academic Press, 2001.

[33] R. Agrawal, “Mining association rules between sets of items in large databases,” in Proceedings of the ACM SIGMOD Conference, pp. 207–216, Washington, DC, USA, May 1993.


6 Advances in Fuzzy Systems

Table 3: Fuzzy item sets in L2.

A1.Low ∩ A2.Low          A1.Low ∩ A3.Low          A1.Low ∩ A4.Middle       A1.Middle ∩ A2.Middle    A1.Middle ∩ A3.Middle
A1.Middle ∩ A4.Middle    A1.Middle ∩ A5.Low       A1.Middle ∩ A5.Middle    A2.Low ∩ A3.Low          A2.Low ∩ A4.Low
A2.Middle ∩ A3.Middle    A2.Middle ∩ A4.Middle    A2.Middle ∩ A5.Middle    A3.Low ∩ A4.Low          A3.Low ∩ A5.Low
A3.Middle ∩ A4.Middle    A3.Middle ∩ A5.Middle    A4.Low ∩ A5.Low          A4.Middle ∩ A5.Middle

Table 4: Fuzzy item sets in L3.

A1.Middle ∩ A2.Middle ∩ A3.Middle    A1.Middle ∩ A2.Middle ∩ A4.Middle    A1.Middle ∩ A2.Middle ∩ A5.Middle
A1.Middle ∩ A3.Middle ∩ A4.Middle    A1.Middle ∩ A3.Middle ∩ A5.Middle    A1.Middle ∩ A4.Middle ∩ A5.Middle
A2.Middle ∩ A3.Middle ∩ A4.Middle    A2.Middle ∩ A3.Middle ∩ A5.Middle    A2.Middle ∩ A4.Middle ∩ A5.Middle
A3.Middle ∩ A4.Middle ∩ A5.Middle

Table 5: Largest fuzzy item sets generated.

A1.Middle ∩ A2.Middle ∩ A3.Middle    A1.Middle ∩ A2.Middle ∩ A4.Middle    A1.Middle ∩ A2.Middle ∩ A5.Middle
A1.Middle ∩ A3.Middle ∩ A4.Middle    A1.Middle ∩ A3.Middle ∩ A5.Middle    A1.Middle ∩ A4.Middle ∩ A5.Middle

The candidate fuzzy itemsets with a count not less than the threshold value are kept in L2, together with their support values. L2 now contains the fuzzy itemsets shown in Table 3.

(iv) Generation of C3 and L3. To generate C3, join L2 with L2, without joining items generated from the same order of data points. For C3, a join is possible only between fuzzy itemsets that have at least one data item in common. After C3 is generated, only those fuzzy itemsets whose count is not less than the threshold value are put in L3 (shown in Table 4).

(v) Generation of C4. Join L3 with L3, without joining items generated from the same order of data point. A join is possible only between fuzzy itemsets that have at least two data items in common.

After generation of C4, it is found that no element has a count more than the threshold value; therefore, L4 = Null.
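The join used to build C3 and C4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the representation of a fuzzy item as a (data point, region) pair, the function names join_candidates and fuzzy_count, and the use of the minimum operator for the fuzzy intersection are choices made for this example, and the membership values are invented. Unlike a full Apriori implementation, the sketch does not prune candidates whose subsets are not all large.

```python
from itertools import combinations


def join_candidates(large_itemsets):
    """Join k-itemsets that share k-1 items into (k+1)-item candidates."""
    candidates = set()
    for a, b in combinations(large_itemsets, 2):
        merged = a | b
        if len(merged) != len(a) + 1:
            continue  # the two itemsets do not share exactly k-1 items
        positions = [position for position, _region in merged]
        if len(set(positions)) != len(positions):
            continue  # two regions of the same data point, e.g. A5.Low and A5.Middle
        candidates.add(merged)
    return candidates


def fuzzy_count(itemset, transactions):
    """Sum, over all transactions, of the minimum membership among the items."""
    return sum(min(t.get(item, 0.0) for item in itemset) for t in transactions)


# A small, hypothetical L2 (a subset of the pairs listed in Table 3).
l2 = [
    frozenset({("A1", "Middle"), ("A2", "Middle")}),
    frozenset({("A1", "Middle"), ("A3", "Middle")}),
    frozenset({("A2", "Middle"), ("A3", "Middle")}),
    frozenset({("A1", "Middle"), ("A5", "Low")}),
    frozenset({("A1", "Middle"), ("A5", "Middle")}),
]
c3 = join_candidates(l2)

# Invented membership values for two transactions, for illustration only.
transactions = [
    {("A1", "Middle"): 0.933, ("A2", "Middle"): 0.593, ("A3", "Middle"): 1.0},
    {("A1", "Middle"): 0.12, ("A2", "Middle"): 1.0, ("A3", "Middle"): 0.793},
]
triple = frozenset({("A1", "Middle"), ("A2", "Middle"), ("A3", "Middle")})
count = fuzzy_count(triple, transactions)
```

A candidate would then be kept in L3 only if its fuzzy count is not less than the threshold value.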

(vi) Removal of Redundant Large Itemsets. Remove the redundant large itemsets from L3 using Step 8 of the algorithm described in Section 3.4.

After applying Step 8 of the algorithm, we are left with the large itemsets shown in Table 5.

(vii) Generate Association Rules. Generate the association rules from these large itemsets using Step 8 of the algorithm described in Section 3.4. All the generated rules are shown in Table 6; the strong rules are those whose confidence is more than the threshold confidence.

In this experiment we use α = 65%; that is, only rules with a confidence value not less than 65% are valid and are called strong association rules. We found 18 such rules; each rule acts as a knowledge base for the project manager and developers of the project. For example, the generated rule A1.Middle ∩ A2.Middle → A3.Middle specifies that if the values of the first and second data points lie in the middle range, then there is a high probability that the third data point also has a middle value. Together, these rules give the project manager and analyst precise and compact knowledge of the commit data.
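The confidence test for a rule can be illustrated with a short Python sketch. It assumes the usual definition, confidence(X → Y) = count(X ∪ Y) / count(X), with fuzzy counts computed by the minimum operator; the function names and the three transactions are hypothetical and are not taken from the Eclipse CDT data.

```python
from itertools import combinations


def fuzzy_count(items, transactions):
    """Sum, over all transactions, of the minimum membership among the items."""
    return sum(min(t.get(item, 0.0) for item in items) for t in transactions)


def rules_from_itemset(itemset, transactions, min_confidence):
    """Return (antecedent, consequent, confidence, is_strong) for every split."""
    items = sorted(itemset)
    whole = fuzzy_count(items, transactions)
    rules = []
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            base = fuzzy_count(antecedent, transactions)
            if base == 0:
                continue  # the antecedent never occurs, so no rule can be formed
            consequent = tuple(i for i in items if i not in antecedent)
            confidence = whole / base
            rules.append((antecedent, consequent, confidence,
                          confidence >= min_confidence))
    return rules


# Invented membership values, for illustration only.
transactions = [
    {"A1.Middle": 0.8, "A2.Middle": 0.6, "A3.Middle": 1.0},
    {"A1.Middle": 0.2, "A2.Middle": 0.9, "A3.Middle": 0.5},
    {"A1.Middle": 1.0, "A2.Middle": 1.0, "A3.Middle": 0.7},
]
itemset = {"A1.Middle", "A2.Middle", "A3.Middle"}
rules = rules_from_itemset(itemset, transactions, min_confidence=0.65)

for antecedent, consequent, confidence, is_strong in rules:
    print(" ∩ ".join(antecedent), "→", " ∩ ".join(consequent),
          round(confidence, 3), "strong" if is_strong else "")
```

Each large 3-itemset yields six candidate rules (three with a two-item antecedent and three with a one-item antecedent), which matches the layout of Table 6.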

5.2. Validation of Association Rules Using Training and Remaining Dataset

(i) Validation on Training Dataset. It is found that 65% of the transactions in the Eclipse CDT training set follow these rules. These rules act as compact and concrete knowledge of this data.

(ii) Validation on Remaining Dataset. All generated association rules are tested on the remaining dataset to find the regularity and trend in the number of commits, that is, whether the commits are performed at the same rate or not. If the rate is the same, Eclipse has consistent growth; otherwise, there is variation in the growth of the software. It is found that the applicability of these rules decreases on the remaining set. This is verified again by generating association rules from the remaining set. The rules generated from the remaining dataset indicate that its data lies mostly in the lower range of commits.

The following factors account for this variation in the commit rate:

(a) Most of the values in the remaining dataset are in the low range. This is verified by finding the largest frequent fuzzy itemsets and then generating the association rules from the remaining dataset using the fuzzy data mining algorithm for time series data. The largest frequent itemsets found are shown in Table 7.

The generated frequent itemsets consist only of data points in the low range. Hence most of the commit counts in the remaining dataset are probably low. This indicates that the number of commits from 6/1/2012 to 9/23/2013 is lower than the number of commits in the training dataset, that is, there was less development activity on the software in this period.


Table 6: Calculated confidence value of various association rules.

Rule                                     Confidence    %
A1.Middle ∩ A2.Middle → A3.Middle        0.741         74
A3.Middle → A1.Middle ∩ A2.Middle        0.522         52
A1.Middle ∩ A3.Middle → A2.Middle        0.781         78
A2.Middle → A1.Middle ∩ A3.Middle        0.526         53
A2.Middle ∩ A3.Middle → A1.Middle        0.735         74
A1.Middle → A2.Middle ∩ A3.Middle        0.529         53
A1.Middle ∩ A2.Middle → A4.Middle        0.692         69
A4.Middle → A1.Middle ∩ A2.Middle        0.491         49
A1.Middle ∩ A4.Middle → A2.Middle        0.762         76
A2.Middle → A1.Middle ∩ A4.Middle        0.491         49
A2.Middle ∩ A4.Middle → A1.Middle        0.724         72
A1.Middle → A2.Middle ∩ A4.Middle        0.494         49
A1.Middle ∩ A2.Middle → A5.Middle        0.695         69
A5.Middle → A1.Middle ∩ A2.Middle        0.497         50
A1.Middle ∩ A5.Middle → A2.Middle        0.737         74
A2.Middle → A1.Middle ∩ A5.Middle        0.493         49
A2.Middle ∩ A5.Middle → A1.Middle        0.759         76
A1.Middle → A2.Middle ∩ A5.Middle        0.496         50
A1.Middle ∩ A3.Middle → A4.Middle        0.742         74
A4.Middle → A1.Middle ∩ A3.Middle        0.499         50
A1.Middle ∩ A4.Middle → A3.Middle        0.775         78
A3.Middle → A1.Middle ∩ A4.Middle        0.496         50
A3.Middle ∩ A4.Middle → A1.Middle        0.691         69
A1.Middle → A3.Middle ∩ A4.Middle        0.502         50
A1.Middle ∩ A3.Middle → A5.Middle        0.751         75
A5.Middle → A1.Middle ∩ A3.Middle        0.510         51
A1.Middle ∩ A5.Middle → A3.Middle        0.757         76
A3.Middle → A1.Middle ∩ A5.Middle        0.502         50
A3.Middle ∩ A5.Middle → A1.Middle        0.738         74
A1.Middle → A3.Middle ∩ A5.Middle        0.509         51
A1.Middle ∩ A4.Middle → A5.Middle        0.792         79
A5.Middle → A1.Middle ∩ A4.Middle        0.515         51
A1.Middle ∩ A5.Middle → A4.Middle        0.764         76
A4.Middle → A1.Middle ∩ A5.Middle        0.511         51
A4.Middle ∩ A5.Middle → A1.Middle        0.715         71
A1.Middle → A4.Middle ∩ A5.Middle        0.513         51

Table 7: Largest frequent fuzzy item set for remaining dataset.

A1.Low ∩ A2.Low ∩ A3.Low ∩ A4.Low
A1.Low ∩ A2.Low ∩ A3.Low ∩ A5.Low
A1.Low ∩ A2.Low ∩ A4.Low ∩ A5.Low
A1.Low ∩ A3.Low ∩ A4.Low ∩ A5.Low

(b) Factors such as the number of active users and the number of files changed (added, deleted, or modified) are lower in this period.

6. Discussion

The fuzzy data mining algorithm for time series data [11] allows efficient mining of association rules from a large dataset. The generated rules help in finding the regularity and existing trend for OSS projects. We have used the commit activity data of Eclipse CDT. The commits in a repository are directly related to activities such as files or code being changed, deleted, or modified. By analysing the trend in the commit data, we can interpret the development or evolution activity of the software under consideration. In the above experiment, the original dataset is divided into two subdatasets (training and remaining dataset).

The association rules generated from the training set allow analysing the regularity and trends in the commits of an OSS project. These rules also help in predicting and analysing the future evolution or development activity of Eclipse CDT. The generated rules are validated on the remaining dataset to find their applicability. It is found that the applicability of these rules decreases on the remaining set. The results of the algorithm indicate that the commit activity in the remaining dataset lies in the low range. Other factors may also contribute to this decrease in the number of commits, such as a lower number of active users and fewer files changed (added, deleted, or modified).

Table 8: Generated subsequences.

Sp     Subsequence
1      9 25 252 240 361
2      25 252 240 361 298
3      252 240 361 298 118
4      240 361 298 118 294
5      361 298 118 294 219
6      298 118 294 219 193
7      118 294 219 193 359
8      294 219 193 359 96
9      219 193 359 96 151
10     193 359 96 151 141
11     359 96 151 141 198
12     96 151 141 198 296
13     151 141 198 296 183
14     141 198 296 183 160
15     198 296 183 160 117
16     296 183 160 117 139
17     183 160 117 139 179
18     160 117 139 179 255
19     117 139 179 255 351
20     139 179 255 351 320
21     179 255 351 320 440
22     255 351 320 440 219
23     351 320 440 219 260
24     320 440 219 260 254
25     440 219 260 254 227
26     219 260 254 227 363
27     260 254 227 363 185
28     254 227 363 185 200
29     227 363 185 200 254
30     363 185 200 254 175
31     185 200 254 175 253
32     200 254 175 253 221
33     254 175 253 221 281
34     175 253 221 281 263
35     253 221 281 263 87
36     221 281 263 87 83
37     281 263 87 83 42
38     263 87 83 42 80
39     87 83 42 80 44
40     83 42 80 44 99
41     42 80 44 99 71
42     80 44 99 71 83
43     44 99 71 83 150
44     99 71 83 150 120
45     71 83 150 120 60
46     83 150 120 60 132
47     150 120 60 132 127
48     120 60 132 127 151
49     60 132 127 151 109
50     132 127 151 109 141
51     127 151 109 141 141
52     151 109 141 141 135
53     109 141 141 135 242
54     141 141 135 242 320
55     141 135 242 320 380
56     135 242 320 380 438
57     242 320 380 438 355
58     320 380 438 355 141
59     380 438 355 141 107
60     438 355 141 107 114
61     355 141 107 114 153
62     141 107 114 153 212
63     107 114 153 212 133
64     114 153 212 133 212
65     153 212 133 212 273
66     212 133 212 273 326
67     133 212 273 326 455
68     212 273 326 455 378
69     273 326 455 378 186
70     326 455 378 186 317
71     455 378 186 317 139
72     378 186 317 139 155
73     186 317 139 155 257
74     317 139 155 257 208
75     139 155 257 208 162
76     155 257 208 162 229
77     257 208 162 229 205
78     208 162 229 205 203
79     162 229 205 203 234
80     229 205 203 234 214
81     205 203 234 214 144
82     203 234 214 144 185
83     234 214 144 185 153
84     214 144 185 153 206
85     144 185 153 206 315
86     185 153 206 315 204
87     153 206 315 204 103
88     206 315 204 103 336
89     315 204 103 336 278
90     204 103 336 278 329
91     103 336 278 329 370
92     336 278 329 370 419
93     278 329 370 419 230
94     329 370 419 230 224
95     370 419 230 224 217
96     419 230 224 217 188
97     230 224 217 188 176
98     224 217 188 176 129
99     217 188 176 129 91
100    188 176 129 91 188
101    176 129 91 188 169
102    129 91 188 169 256
103    91 188 169 256 245
104    188 169 256 245 237
105    169 256 245 237 74
106    256 245 237 74 228
107    245 237 74 228 179
108    237 74 228 179 212
109    74 228 179 212 128
110    228 179 212 128 154
111    179 212 128 154 150
112    212 128 154 150 192
113    128 154 150 192 148
114    154 150 192 148 171
115    150 192 148 171 181
116    192 148 171 181 163
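The subsequences listed in Table 8 have the shape of a sliding window of five consecutive monthly commit counts, advanced by one data point at a time. A minimal Python sketch of that windowing step is given below; the function name subsequences and the default window length of 5 are assumptions read off the table layout rather than code from the study.

```python
def subsequences(series, window=5):
    """Return every run of `window` consecutive values, advancing by one."""
    return [series[i:i + window] for i in range(len(series) - window + 1)]


# The first seven commit counts that appear in Table 8.
commits = [9, 25, 252, 240, 361, 298, 118]

for sp, sub in enumerate(subsequences(commits), start=1):
    print(sp, sub)
```

With this scheme a series of n data points yields n - 4 subsequences, so the 116 rows of Table 8 correspond to 120 monthly values.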

7. Threats to Validity

This section discusses the threats to the validity of the study.

Construct validity threats concern the relationship between theory and observation. These threats arise mainly because we assumed that all the commits are posted in the revision control tool git [4]. Any changes performed in the source code but not logged through the tool may not have become part of the study.

Internal validity concerns the selection of subject systems and the analysis methods. This study uses a month as the unit of measure for tracking the types of change activities. In the future, we would like to use a more natural and insightful partition based on major/minor versions of the OSS project for analysing the change activity of OSS projects. Subject systems were selected from public repositories, but the selection is biased towards projects with valid git repositories.

External validity concerns the generalization of the findings. In the future, we would like to provide more generalized results by considering a higher number of OSS projects.

Reliability validity concerns the possibility of replication of the study. The subject systems are available in the public domain. We have attempted to put all the necessary details of the experiment process in the paper.

Table 9: Transformation of data elements into fuzzy sets.

Sp | A1: Low Middle High | A2: Low Middle High | A3: Low Middle High | A4: Low Middle High | A5: Low Middle High
Count | 47.595 60.227 8.178 | 47.275 60.547 8.178 | 46.802 61.02 8.178 | 47.262 60.56 8.178 | 47.775 60.047 8.178
1 | 1 0 0 | 1 0 0 | 0 1 0 | 0.067 0.933 0 | 0 0.593 0.407
2 | 1 0 0 | 0 1 0 | 0.067 0.933 0 | 0 0.593 0.407 | 0 1 0
3 | 0 1 0 | 0.067 0.933 0 | 0 0.593 0.407 | 0 1 0 | 0.88 0.12 0
4 | 0.067 0.933 0 | 0 0.593 0.407 | 0 1 0 | 0.88 0.12 0 | 0 1 0
5 | 0 0.593 0.407 | 0 1 0 | 0.88 0.12 0 | 0 1 0 | 0.207 0.793 0
6 | 0 1 0 | 0.88 0.12 0 | 0 1 0 | 0.207 0.793 0 | 0.38 0.62 0
7 | 0.88 0.12 0 | 0 1 0 | 0.207 0.793 0 | 0.38 0.62 0 | 0 0.607 0.393
8 | 0 1 0 | 0.207 0.793 0 | 0.38 0.62 0 | 0 0.607 0.393 | 1 0 0
9 | 0.207 0.793 0 | 0.38 0.62 0 | 0 0.607 0.393 | 1 0 0 | 0.66 0.34 0
10 | 0.38 0.62 0 | 0 0.607 0.393 | 1 0 0 | 0.66 0.34 0 | 0.727 0.273 0
11 | 0 0.607 0.393 | 1 0 0 | 0.66 0.34 0 | 0.727 0.273 0 | 0.347 0.653 0
12 | 1 0 0 | 0.66 0.34 0 | 0.727 0.273 0 | 0.347 0.653 0 | 0 1 0
13 | 0.66 0.34 0 | 0.727 0.273 0 | 0.347 0.653 0 | 0 1 0 | 0.447 0.553 0
14 | 0.727 0.273 0 | 0.347 0.653 0 | 0 1 0 | 0.447 0.553 0 | 0.6 0.4 0
15 | 0.347 0.653 0 | 0 1 0 | 0.447 0.553 0 | 0.6 0.4 0 | 0.887 0.113 0
16 | 0 1 0 | 0.447 0.553 0 | 0.6 0.4 0 | 0.887 0.113 0 | 0.74 0.26 0
17 | 0.447 0.553 0 | 0.6 0.4 0 | 0.887 0.113 0 | 0.74 0.26 0 | 0.473 0.527 0
18 | 0.6 0.4 0 | 0.887 0.113 0 | 0.74 0.26 0 | 0.473 0.527 0 | 0 1 0
19 | 0.887 0.113 0 | 0.74 0.26 0 | 0.473 0.527 0 | 0 1 0 | 0 0.66 0.34
20 | 0.74 0.26 0 | 0.473 0.527 0 | 0 1 0 | 0 0.66 0.34 | 0 0.867 0.133
21 | 0.473 0.527 0 | 0 1 0 | 0 0.66 0.34 | 0 0.867 0.133 | 0 0.067 0.933
22 | 0 1 0 | 0 0.66 0.34 | 0 0.867 0.133 | 0 0.067 0.933 | 0.207 0.793 0
23 | 0 0.66 0.34 | 0 0.867 0.133 | 0 0.067 0.933 | 0.207 0.793 0 | 0 1 0
24 | 0 0.867 0.133 | 0 0.067 0.933 | 0.207 0.793 0 | 0 1 0 | 0 1 0
25 | 0 0.067 0.933 | 0.207 0.793 0 | 0 1 0 | 0 1 0 | 0.153 0.847 0
26 | 0.207 0.793 0 | 0 1 0 | 0 1 0 | 0.153 0.847 0 | 0 0.58 0.42
27 | 0 1 0 | 0 1 0 | 0.153 0.847 0 | 0 0.58 0.42 | 0.433 0.567 0
28 | 0 1 0 | 0.153 0.847 0 | 0 0.58 0.42 | 0.433 0.567 0 | 0.333 0.667 0
29 | 0.153 0.847 0 | 0 0.58 0.42 | 0.433 0.567 0 | 0.333 0.667 0 | 0 1 0
30 | 0 0.58 0.42 | 0.433 0.567 0 | 0.333 0.667 0 | 0 1 0 | 0.5 0.5 0
31 | 0.433 0.567 0 | 0.333 0.667 0 | 0 1 0 | 0.5 0.5 0 | 0 1 0
32 | 0.333 0.667 0 | 0 1 0 | 0.5 0.5 0 | 0 1 0 | 0.193 0.807 0
33 | 0 1 0 | 0.5 0.5 0 | 0 1 0 | 0.193 0.807 0 | 0 1 0
34 | 0.5 0.5 0 | 0 1 0 | 0.193 0.807 0 | 0 1 0 | 0 1 0
35 | 0 1 0 | 0.193 0.807 0 | 0 1 0 | 0 1 0 | 1 0 0
36 | 0.193 0.807 0 | 0 1 0 | 0 1 0 | 1 0 0 | 1 0 0
37 | 0 1 0 | 0 1 0 | 1 0 0 | 1 0 0 | 1 0 0
38 | 0 1 0 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0
39 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0
40 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0
41 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0
42 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0
43 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0 | 0.667 0.333 0
44 | 1 0 0 | 1 0 0 | 1 0 0 | 0.667 0.333 0 | 0.867 0.133 0
45 | 1 0 0 | 1 0 0 | 0.667 0.333 0 | 0.867 0.133 0 | 1 0 0
46 | 1 0 0 | 0.667 0.333 0 | 0.867 0.133 0 | 1 0 0 | 0.787 0.213 0
47 | 0.667 0.333 0 | 0.867 0.133 0 | 1 0 0 | 0.787 0.213 0 | 0.82 0.18 0
48 | 0.867 0.133 0 | 1 0 0 | 0.787 0.213 0 | 0.82 0.18 0 | 0.66 0.34 0
49 | 1 0 0 | 0.787 0.213 0 | 0.82 0.18 0 | 0.66 0.34 0 | 0.94 0.06 0
50 | 0.787 0.213 0 | 0.82 0.18 0 | 0.66 0.34 0 | 0.94 0.06 0 | 0.727 0.273 0
51 | 0.82 0.18 0 | 0.66 0.34 0 | 0.94 0.06 0 | 0.727 0.273 0 | 0.727 0.273 0
52 | 0.66 0.34 0 | 0.94 0.06 0 | 0.727 0.273 0 | 0.727 0.273 0 | 0.767 0.233 0
53 | 0.94 0.06 0 | 0.727 0.273 0 | 0.727 0.273 0 | 0.767 0.233 0 | 0.053 0.947 0
54 | 0.727 0.273 0 | 0.727 0.273 0 | 0.767 0.233 0 | 0.053 0.947 0 | 0 0.867 0.133
55 | 0.727 0.273 0 | 0.767 0.233 0 | 0.053 0.947 0 | 0 0.867 0.133 | 0 0.467 0.533
56 | 0.767 0.233 0 | 0.053 0.947 0 | 0 0.867 0.133 | 0 0.467 0.533 | 0 0.08 0.92
57 | 0.053 0.947 0 | 0 0.867 0.133 | 0 0.467 0.533 | 0 0.08 0.92 | 0 0.633 0.367
58 | 0 0.867 0.133 | 0 0.467 0.533 | 0 0.08 0.92 | 0 0.633 0.367 | 0.727 0.273 0
59 | 0 0.467 0.533 | 0 0.08 0.92 | 0 0.633 0.367 | 0.727 0.273 0 | 0.953 0.047 0
60 | 0 0.08 0.92 | 0 0.633 0.367 | 0.727 0.273 0 | 0.953 0.047 0 | 0.907 0.093 0
61 | 0 0.633 0.367 | 0.727 0.273 0 | 0.953 0.047 0 | 0.907 0.093 0 | 0.647 0.353 0
62 | 0.727 0.273 0 | 0.953 0.047 0 | 0.907 0.093 0 | 0.647 0.353 0 | 0.253 0.747 0
63 | 0.953 0.047 0 | 0.907 0.093 0 | 0.647 0.353 0 | 0.253 0.747 0 | 0.78 0.22 0
64 | 0.907 0.093 0 | 0.647 0.353 0 | 0.253 0.747 0 | 0.78 0.22 0 | 0.253 0.747 0
65 | 0.647 0.353 0 | 0.253 0.747 0 | 0.78 0.22 0 | 0.253 0.747 0 | 0 1 0
66 | 0.253 0.747 0 | 0.78 0.22 0 | 0.253 0.747 0 | 0 1 0 | 0 0.827 0.173
67 | 0.78 0.22 0 | 0.253 0.747 0 | 0 1 0 | 0 0.827 0.173 | 0 0 1
68 | 0.253 0.747 0 | 0 1 0 | 0 0.827 0.173 | 0 0 1 | 0 0.48 0.52
69 | 0 1 0 | 0 0.827 0.173 | 0 0 1 | 0 0.48 0.52 | 0.427 0.573 0
70 | 0 0.827 0.173 | 0 0 1 | 0 0.48 0.52 | 0.427 0.573 0 | 0 0.887 0.113
71 | 0 0 1 | 0 0.48 0.52 | 0.427 0.573 0 | 0 0.887 0.113 | 0.74 0.26 0
72 | 0 0.48 0.52 | 0.427 0.573 0 | 0 0.887 0.113 | 0.74 0.26 0 | 0.633 0.367 0
73 | 0.427 0.573 0 | 0 0.887 0.113 | 0.74 0.26 0 | 0.633 0.367 0 | 0 1 0
74 | 0 0.887 0.113 | 0.74 0.26 0 | 0.633 0.367 0 | 0 1 0 | 0.28 0.72 0
75 | 0.74 0.26 0 | 0.633 0.367 0 | 0 1 0 | 0.28 0.72 0 | 0.587 0.413 0
76 | 0.633 0.367 0 | 0 1 0 | 0.28 0.72 0 | 0.587 0.413 0 | 0.14 0.86 0
77 | 0 1 0 | 0.28 0.72 0 | 0.587 0.413 0 | 0.14 0.86 0 | 0.3 0.7 0
78 | 0.28 0.72 0 | 0.587 0.413 0 | 0.14 0.86 0 | 0.3 0.7 0 | 0.313 0.687 0
79 | 0.587 0.413 0 | 0.14 0.86 0 | 0.3 0.7 0 | 0.313 0.687 0 | 0.107 0.893 0
80 | 0.14 0.86 0 | 0.3 0.7 0 | 0.313 0.687 0 | 0.107 0.893 0 | 0.24 0.76 0
81 | 0.3 0.7 0 | 0.313 0.687 0 | 0.107 0.893 0 | 0.24 0.76 0 | 0.707 0.293 0
82 | 0.313 0.687 0 | 0.107 0.893 0 | 0.24 0.76 0 | 0.707 0.293 0 | 0.433 0.567 0
83 | 0.107 0.893 0 | 0.24 0.76 0 | 0.707 0.293 0 | 0.433 0.567 0 | 0.647 0.353 0
84 | 0.24 0.76 0 | 0.707 0.293 0 | 0.433 0.567 0 | 0.647 0.353 0 | 0.293 0.707 0
85 | 0.707 0.293 0 | 0.433 0.567 0 | 0.647 0.353 0 | 0.293 0.707 0 | 0 0.9 0.1
86 | 0.433 0.567 0 | 0.647 0.353 0 | 0.293 0.707 0 | 0 0.9 0.1 | 0.307 0.693 0
87 | 0.647 0.353 0 | 0.293 0.707 0 | 0 0.9 0.1 | 0.307 0.693 0 | 0.98 0.02 0
88 | 0.293 0.707 0 | 0 0.9 0.1 | 0.307 0.693 0 | 0.98 0.02 0 | 0 0.76 0.24
89 | 0 0.9 0.1 | 0.307 0.693 0 | 0.98 0.02 0 | 0 0.76 0.24 | 0 1 0
90 | 0.307 0.693 0 | 0.98 0.02 0 | 0 0.76 0.24 | 0 1 0 | 0 0.807 0.193
91 | 0.98 0.02 0 | 0 0.76 0.24 | 0 1 0 | 0 0.807 0.193 | 0 0.533 0.467
92 | 0 0.76 0.24 | 0 1 0 | 0 0.807 0.193 | 0 0.533 0.467 | 0 0.207 0.793
93 | 0 1 0 | 0 0.807 0.193 | 0 0.533 0.467 | 0 0.207 0.793 | 0.133 0.867 0
94 | 0 0.807 0.193 | 0 0.533 0.467 | 0 0.207 0.793 | 0.133 0.867 0 | 0.173 0.827 0
95 | 0 0.533 0.467 | 0 0.207 0.793 | 0.133 0.867 0 | 0.173 0.827 0 | 0.22 0.78 0
96 | 0 0.207 0.793 | 0.133 0.867 0 | 0.173 0.827 0 | 0.22 0.78 0 | 0.413 0.587 0
97 | 0.133 0.867 0 | 0.173 0.827 0 | 0.22 0.78 0 | 0.413 0.587 0 | 0.493 0.507 0
98 | 0.173 0.827 0 | 0.22 0.78 0 | 0.413 0.587 0 | 0.493 0.507 0 | 0.807 0.193 0
99 | 0.22 0.78 0 | 0.413 0.587 0 | 0.493 0.507 0 | 0.807 0.193 0 | 1 0 0
100 | 0.413 0.587 0 | 0.493 0.507 0 | 0.807 0.193 0 | 1 0 0 | 0.413 0.587 0
101 | 0.493 0.507 0 | 0.807 0.193 0 | 1 0 0 | 0.413 0.587 0 | 0.54 0.46 0
102 | 0.807 0.193 0 | 1 0 0 | 0.413 0.587 0 | 0.54 0.46 0 | 0 1 0
103 | 1 0 0 | 0.413 0.587 0 | 0.54 0.46 0 | 0 1 0 | 0.033 0.967 0
104 | 0.413 0.587 0 | 0.54 0.46 0 | 0 1 0 | 0.033 0.967 0 | 0.087 0.913 0
105 | 0.54 0.46 0 | 0 1 0 | 0.033 0.967 0 | 0.087 0.913 0 | 1 0 0
106 | 0 1 0 | 0.033 0.967 0 | 0.087 0.913 0 | 1 0 0 | 0.147 0.853 0
107 | 0.033 0.967 0 | 0.087 0.913 0 | 1 0 0 | 0.147 0.853 0 | 0.473 0.527 0
108 | 0.087 0.913 0 | 1 0 0 | 0.147 0.853 0 | 0.473 0.527 0 | 0.253 0.747 0
109 | 1 0 0 | 0.147 0.853 0 | 0.473 0.527 0 | 0.253 0.747 0 | 0.813 0.187 0
110 | 0.147 0.853 0 | 0.473 0.527 0 | 0.253 0.747 0 | 0.813 0.187 0 | 0.64 0.36 0
111 | 0.473 0.527 0 | 0.253 0.747 0 | 0.813 0.187 0 | 0.64 0.36 0 | 0.667 0.333 0
112 | 0.253 0.747 0 | 0.813 0.187 0 | 0.64 0.36 0 | 0.667 0.333 0 | 0.387 0.613 0
113 | 0.813 0.187 0 | 0.64 0.36 0 | 0.667 0.333 0 | 0.387 0.613 0 | 0.68 0.32 0
114 | 0.64 0.36 0 | 0.667 0.333 0 | 0.387 0.613 0 | 0.68 0.32 0 | 0.527 0.473 0
115 | 0.667 0.333 0 | 0.387 0.613 0 | 0.68 0.32 0 | 0.527 0.473 0 | 0.46 0.54 0
116 | 0.387 0.613 0 | 0.68 0.32 0 | 0.527 0.473 0 | 0.46 0.54 0 | 0.58 0.42 0
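The membership degrees in Table 9 can be reproduced from the commit counts in Table 8 with three piecewise-linear membership functions. The sketch below is an inference from the tabulated values, not a definition quoted from the study: the breakpoints 100, 250, 300, and 450 commits per month and the function names low, middle, high, and fuzzify are assumptions chosen because they match the table entries checked in the example.

```python
def low(x, a=100.0, b=250.0):
    """Full membership up to a, falling linearly to zero at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def middle(x, a=100.0, b=250.0, c=300.0, d=450.0):
    """Trapezoid: rises from a to b, flat from b to c, falls from c to d."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def high(x, c=300.0, d=450.0):
    """Zero up to c, rising linearly to full membership at d."""
    if x <= c:
        return 0.0
    if x >= d:
        return 1.0
    return (x - c) / (d - c)


def fuzzify(x):
    """Return the (Low, Middle, High) degrees, rounded to three decimals."""
    return tuple(round(f(x), 3) for f in (low, middle, high))


# First subsequence of Table 8; compare with row 1 of Table 9.
for value in [9, 25, 252, 240, 361]:
    print(value, fuzzify(value))
```

Under these assumed breakpoints the Low and Middle degrees (and the Middle and High degrees) of any value sum to one, which is consistent with the Count row of Table 9 adding up to 116 for each data point.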

8. Conclusion and Future Work

The commit activity data available in the development repository of open source software can be used to analyse the evolution of OSS projects, as each commit is directly related to development activity such as code deletion, addition, modification, comments, and file addition. In this study, the Eclipse CDT commit data is analysed to find the regularity and trend in the commit data. The fuzzy data mining algorithm for time series data is used to generate the association rules from the dataset. The dataset is divided into two subsets (training and remaining dataset) to evaluate the pattern of evolution of Eclipse CDT.

After applying and validating the association rules generated from the training dataset, it is found that the rates at which commits are performed in the training dataset and the remaining dataset are different. This is verified again by generating association rules from the remaining set. The rules generated from the remaining dataset indicate that its data lies mostly in the lower range of commits. This validates the applicability of the rules generated from the training dataset.

These association rules indicate that the overall commits in Eclipse CDT are in the middle range, except for the variation found near the end, where there is a high probability that commits lie in the lower range. The continuous availability and existence of commits in the repository of Eclipse CDT illustrate that the development or evolution of Eclipse CDT is active, with most of the commits per month lying in the middle range and, towards the end, near the lower range. In the future, we want to combine a prediction algorithm with the fuzzy data mining algorithm for time series data to predict the number of commits to be performed in a particular month, although various other factors on which the number of commits depends also need to be considered.

Appendix

See Tables 8 and 9.

Competing Interests

The authors declare that they have no competing interests.

References

[1] M. W. Godfrey and Q. Tu, “Evolution in open source software: a case study,” in Proceedings of the IEEE International Conference on Software Maintenance (ICSM ’00), pp. 131–142, October 2000.

[2] H. Zhang and S. Kim, “Monitoring software quality evolution for defects,” IEEE Software, vol. 27, no. 4, pp. 58–64, 2010.

[3] Y. Fang and D. Neufeld, “Understanding sustained participation in open source software projects,” Journal of Management Information Systems, vol. 25, no. 4, pp. 9–50, 2008.

[4] “GIT,” July 2015, http://git-scm.com/.

[5] R. Sen, S. S. Singh, and S. Borle, “Open source software success: measures and analysis,” Decision Support Systems, vol. 52, no. 2, pp. 364–372, 2012.

[6] R. McCleary, R. A. Hay, E. E. Meidinger, and D. McDowall, Applied Time Series Analysis for the Social Sciences, Sage, Beverly Hills, Calif, USA, 1980.

[7] R. Grewal, G. L. Lilien, and G. Mallapragada, “Location, location, location: how network embeddedness affects project success in open source systems,” Management Science, vol. 52, no. 7, pp. 1043–1056, 2006.

[8] I. Herraiz, J. M. Gonzalez-Barahona, G. Robles, and D. M. German, “On the prediction of the evolution of libre software projects,” in Proceedings of the 23rd International Conference on Software Maintenance (ICSM ’07), pp. 405–414, Paris, France, October 2007.

[9] C. F. Kemerer and S. Slaughter, “An empirical approach to studying software evolution,” IEEE Transactions on Software Engineering, vol. 25, no. 4, pp. 493–509, 1999.

[10] A. Mockus and L. G. Votta, “Identifying reasons for software changes using historic databases,” in Proceedings of the International Conference on Software Maintenance (ICSM ’00), pp. 120–130, IEEE, San Jose, Calif, USA, 2000.

[11] C.-H. Chen, T.-P. Hong, and V. S. Tseng, “Fuzzy data mining for time-series data,” Applied Soft Computing Journal, vol. 12, no. 1, pp. 536–542, 2012.

[12] C. C. H. Yuen, “An empirical approach to the study of errors in large software under maintenance,” in Proceedings of the IEEE International Conference on Software Maintenance (ICSM ’85), pp. 96–105, 1985.

[13] C. C. H. Yuen, “A statistical rationale for evolution dynamics concepts,” in Proceedings of the IEEE International Conference on Software Maintenance (ICSM ’87), pp. 156–164, September 1987.

[14] C. C. H. Yuen, “On analytic maintenance process data at the global and the detailed levels: a case study,” in Proceedings of the International Conference on Software Maintenance (ICSM ’88), pp. 248–255, IEEE, Scottsdale, Ariz, USA, 1988.

[15] M. Goulao, N. Fonte, M. Wermelinger, and F. B. Abreu, “Software evolution prediction using seasonal time analysis: a comparative study,” in Proceedings of the 16th European Conference on Software Maintenance and Reengineering (CSMR ’12), pp. 213–222, IEEE, Szeged, Hungary, March 2012.

[16] B. Kenmei, G. Antoniol, and M. Di Penta, “Trend analysis and issue prediction in large-scale open source systems,” in Proceedings of the 12th IEEE European Conference on Software Maintenance and Reengineering (CSMR ’08), pp. 73–82, Athens, Greece, April 2008.

[17] F. Caprio, G. Casazza, U. Penta, M. D. Penta, and U. Villano, “Measuring and predicting the Linux kernel evolution,” in Proceedings of the International Workshop of Empirical Studies on Software Maintenance, Florence, Italy, November 2001.

[18] E. Fuentetaja and D. J. Bagert, “Software evolution from a time-series perspective,” in Proceedings of the IEEE International Conference on Software Maintenance, pp. 226–229, October 2002.

[19] M. Klas, F. Elberzhager, J. Munch, K. Hartjes, and O. Von Graevemeyer, “Transparent combination of expert and measurement data for defect prediction: an industrial case study,” in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE ’10), pp. 119–128, Cape Town, South Africa, May 2010.

[20] U. Raja, D. P. Hale, and J. E. Hale, “Modeling software evolution defects: a time series approach,” Journal of Software Maintenance and Evolution: Research and Practice, vol. 21, no. 1, pp. 49–71, 2009.

[21] G. Antoniol, G. Casazza, M. D. Penta, and E. Merlo, “Modeling clones evolution through time series,” in Proceedings of the IEEE International Conference on Software Maintenance (ICSM ’01), pp. 273–280, IEEE Computer Society, Florence, Italy, November 2001.

[22] L. Yu, “Indirectly predicting the maintenance effort of open-source software,” Journal of Software Maintenance and Evolution, vol. 18, no. 5, pp. 311–332, 2006.

[23] A. Amin, L. Grunske, and A. Colman, “An approach to software reliability prediction based on time series modeling,” Journal of Systems and Software, vol. 86, no. 7, pp. 1923–1932, 2013.

[24] Forecasting Model, http://people.duke.edu/~rnau/411fcst.htm.

[25] T.-P. Hong, C.-S. Kuo, and C.-C. Chi, “A fuzzy data mining algorithm for quantitative values,” in Proceedings of the 3rd International Conference on Knowledge-Based Intelligent Information Engineering Systems (KES ’99), pp. 480–483, September 1999.

[26] H. Suresh and K. Raimond, “Mining association rules from time series data using hybrid approaches,” Proceedings of International Journal of Computational Engineering Research, vol. 3, no. 3, pp. 181–189, 2013.

[27] “GIT HUB,” July 2015, https://github.com/.

[28] “Eclipse,” “Where did Eclipse come from?” Eclipse Wiki.

[29] “Eclipse,” 2015, https://eclipse.org/cdt/.

[30] “Eclipse,” July 2015, http://www.eclipse.org/cdt/doc/overview/#/3.

[31] L. A. Zadeh, “Fuzzy sets,” Information and Computation, vol. 8, no. 3, pp. 338–353, 1965.

[32] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufman, Academic Press, 2001.

[33] R. Agrawal, “Mining association rules between sets of items in large databases,” in Proceedings of the ACM SIGMOD Conference, pp. 207–216, Washington, DC, USA, May 1993.


Advances in Fuzzy Systems 7

Table 6: Calculated confidence value of various association rules.

Association rule                     Confidence   Confidence (%)
A1.Middle ∩ A2.Middle → A3.Middle    0.741        74
A3.Middle → A1.Middle ∩ A2.Middle    0.522        52
A1.Middle ∩ A3.Middle → A2.Middle    0.781        78
A2.Middle → A1.Middle ∩ A3.Middle    0.526        53
A2.Middle ∩ A3.Middle → A1.Middle    0.735        74
A1.Middle → A2.Middle ∩ A3.Middle    0.529        53
A1.Middle ∩ A2.Middle → A4.Middle    0.692        69
A4.Middle → A1.Middle ∩ A2.Middle    0.491        49
A1.Middle ∩ A4.Middle → A2.Middle    0.762        76
A2.Middle → A1.Middle ∩ A4.Middle    0.491        49
A2.Middle ∩ A4.Middle → A1.Middle    0.724        72
A1.Middle → A2.Middle ∩ A4.Middle    0.494        49
A1.Middle ∩ A2.Middle → A5.Middle    0.695        69
A5.Middle → A1.Middle ∩ A2.Middle    0.497        50
A1.Middle ∩ A5.Middle → A2.Middle    0.737        74
A2.Middle → A1.Middle ∩ A5.Middle    0.493        49
A2.Middle ∩ A5.Middle → A1.Middle    0.759        76
A1.Middle → A2.Middle ∩ A5.Middle    0.496        50
A1.Middle ∩ A3.Middle → A4.Middle    0.742        74
A4.Middle → A1.Middle ∩ A3.Middle    0.499        50
A1.Middle ∩ A4.Middle → A3.Middle    0.775        78
A3.Middle → A1.Middle ∩ A4.Middle    0.496        50
A3.Middle ∩ A4.Middle → A1.Middle    0.691        69
A1.Middle → A3.Middle ∩ A4.Middle    0.502        50
A1.Middle ∩ A3.Middle → A5.Middle    0.751        75
A5.Middle → A1.Middle ∩ A3.Middle    0.510        51
A1.Middle ∩ A5.Middle → A3.Middle    0.757        76
A3.Middle → A1.Middle ∩ A5.Middle    0.502        50
A3.Middle ∩ A5.Middle → A1.Middle    0.738        74
A1.Middle → A3.Middle ∩ A5.Middle    0.509        51
A1.Middle ∩ A4.Middle → A5.Middle    0.792        79
A5.Middle → A1.Middle ∩ A4.Middle    0.515        51
A1.Middle ∩ A5.Middle → A4.Middle    0.764        76
A4.Middle → A1.Middle ∩ A5.Middle    0.511        51
A4.Middle ∩ A5.Middle → A1.Middle    0.715        71
A1.Middle → A4.Middle ∩ A5.Middle    0.513        51
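To illustrate how confidence values such as those in Table 6 can be obtained, the following minimal Python sketch assumes that the fuzzy count of an item set is the sum, over all subsequences, of the minimum membership value of its items (the min operator as fuzzy intersection), and that the confidence of a rule is the ratio of two such counts. The function names and the two-row example data are hypothetical; they are not taken from the Eclipse CDT dataset.

```python
from typing import Dict, List, Sequence

# One row holds the membership values of one subsequence, keyed by fuzzy item.
Row = Dict[str, float]


def fuzzy_count(rows: Sequence[Row], items: Sequence[str]) -> float:
    """Sum over rows of the minimum membership among the given items."""
    return sum(min(row[item] for item in items) for row in rows)


def confidence(rows: Sequence[Row],
               antecedent: Sequence[str],
               consequent: Sequence[str]) -> float:
    """Confidence of antecedent -> consequent as a ratio of fuzzy counts."""
    denominator = fuzzy_count(rows, antecedent)
    if denominator == 0:
        return 0.0  # rule is undefined when the antecedent never occurs
    numerator = fuzzy_count(rows, list(antecedent) + list(consequent))
    return numerator / denominator


# Hypothetical membership values, chosen only to keep the arithmetic simple.
rows: List[Row] = [
    {"A1.Middle": 1.0, "A2.Middle": 0.5, "A3.Middle": 0.5},
    {"A1.Middle": 0.5, "A2.Middle": 1.0, "A3.Middle": 0.0},
]

forward = confidence(rows, ["A1.Middle", "A2.Middle"], ["A3.Middle"])
reverse = confidence(rows, ["A3.Middle"], ["A1.Middle", "A2.Middle"])
print(forward, reverse)
```

A rule and its reverse share the same numerator but are divided by different antecedent counts, which is why each pair of rules in Table 6 can have quite different confidence values.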

Table 7: Largest frequent fuzzy item set for remaining dataset.

A1.Low ∩ A2.Low ∩ A3.Low ∩ A4.Low
A1.Low ∩ A2.Low ∩ A3.Low ∩ A5.Low
A1.Low ∩ A2.Low ∩ A4.Low ∩ A5.Low
A1.Low ∩ A3.Low ∩ A4.Low ∩ A5.Low

(b) Factors such as the number of active users and the number of files changed (addition, deletion, and modification) are lower.

Table 8: Generated subsequences.

Sp    Subsequence
1     9 25 252 240 361
2     25 252 240 361 298
3     252 240 361 298 118
4     240 361 298 118 294
5     361 298 118 294 219
6     298 118 294 219 193
7     118 294 219 193 359
8     294 219 193 359 96
9     219 193 359 96 151
10    193 359 96 151 141
11    359 96 151 141 198
12    96 151 141 198 296
13    151 141 198 296 183
14    141 198 296 183 160
15    198 296 183 160 117
16    296 183 160 117 139
17    183 160 117 139 179
18    160 117 139 179 255
19    117 139 179 255 351
20    139 179 255 351 320
21    179 255 351 320 440
22    255 351 320 440 219
23    351 320 440 219 260
24    320 440 219 260 254
25    440 219 260 254 227
26    219 260 254 227 363
27    260 254 227 363 185
28    254 227 363 185 200
29    227 363 185 200 254
30    363 185 200 254 175
31    185 200 254 175 253
32    200 254 175 253 221
33    254 175 253 221 281
34    175 253 221 281 263
35    253 221 281 263 87
36    221 281 263 87 83
37    281 263 87 83 42
38    263 87 83 42 80
39    87 83 42 80 44
40    83 42 80 44 99
41    42 80 44 99 71
42    80 44 99 71 83
43    44 99 71 83 150
44    99 71 83 150 120
45    71 83 150 120 60
46    83 150 120 60 132
47    150 120 60 132 127
48    120 60 132 127 151
49    60 132 127 151 109
50    132 127 151 109 141
51    127 151 109 141 141
52    151 109 141 141 135
53    109 141 141 135 242
54    141 141 135 242 320
55    141 135 242 320 380
56    135 242 320 380 438
57    242 320 380 438 355
58    320 380 438 355 141
59    380 438 355 141 107
60    438 355 141 107 114
61    355 141 107 114 153
62    141 107 114 153 212
63    107 114 153 212 133
64    114 153 212 133 212
65    153 212 133 212 273
66    212 133 212 273 326
67    133 212 273 326 455
68    212 273 326 455 378
69    273 326 455 378 186
70    326 455 378 186 317
71    455 378 186 317 139
72    378 186 317 139 155
73    186 317 139 155 257
74    317 139 155 257 208
75    139 155 257 208 162
76    155 257 208 162 229
77    257 208 162 229 205
78    208 162 229 205 203
79    162 229 205 203 234
80    229 205 203 234 214
81    205 203 234 214 144
82    203 234 214 144 185
83    234 214 144 185 153
84    214 144 185 153 206
85    144 185 153 206 315
86    185 153 206 315 204
87    153 206 315 204 103
88    206 315 204 103 336
89    315 204 103 336 278
90    204 103 336 278 329
91    103 336 278 329 370
92    336 278 329 370 419
93    278 329 370 419 230
94    329 370 419 230 224
95    370 419 230 224 217
96    419 230 224 217 188
97    230 224 217 188 176
98    224 217 188 176 129
99    217 188 176 129 91
100   188 176 129 91 188
101   176 129 91 188 169
102   129 91 188 169 256
103   91 188 169 256 245
104   188 169 256 245 237
105   169 256 245 237 74
106   256 245 237 74 228
107   245 237 74 228 179
108   237 74 228 179 212
109   74 228 179 212 128
110   228 179 212 128 154
111   179 212 128 154 150
112   212 128 154 150 192
113   128 154 150 192 148
114   154 150 192 148 171
115   150 192 148 171 181
116   192 148 171 181 163

6. Discussion

The fuzzy data mining algorithm for time series data [11] allows efficient mining of association rules from a large dataset. The generated rules help in finding the regularity and existing trend of OSS projects. We have used the commit activity data of Eclipse CDT. The commits in a repository are directly related to activities such as files or code being changed, deleted, or modified. By analysing the trend in the commit data, we can interpret the development or evolution activity of the software under consideration. In the above experiment, the original dataset is divided into two subdatasets (the training dataset and the remaining dataset).

The association rules generated from the training set allow analysing the regularity and trends in the commits of an OSS project. These rules also help in predicting and analysing the future evolution or development activity of Eclipse CDT. The generated rules are validated on the remaining dataset to assess their applicability. It is found that the applicability of these rules to the remaining set decreases. The results of the algorithm indicate that the commit activity of the remaining dataset lies in the low activity range. Other factors may also contribute to this decrease in the number of commits, such as a lower number of active users and fewer files changed (added, deleted, or modified).

Table 9: Transformation of data elements into fuzzy set.

Sp | A1.Low A1.Middle A1.High | A2.Low A2.Middle A2.High | A3.Low A3.Middle A3.High | A4.Low A4.Middle A4.High | A5.Low A5.Middle A5.High
1 | 1 0 0 | 1 0 0 | 0 1 0 | 0.067 0.933 0 | 0 0.593 0.407
2 | 1 0 0 | 0 1 0 | 0.067 0.933 0 | 0 0.593 0.407 | 0 1 0
3 | 0 1 0 | 0.067 0.933 0 | 0 0.593 0.407 | 0 1 0 | 0.88 0.12 0
4 | 0.067 0.933 0 | 0 0.593 0.407 | 0 1 0 | 0.88 0.12 0 | 0 1 0
5 | 0 0.593 0.407 | 0 1 0 | 0.88 0.12 0 | 0 1 0 | 0.207 0.793 0
6 | 0 1 0 | 0.88 0.12 0 | 0 1 0 | 0.207 0.793 0 | 0.38 0.62 0
7 | 0.88 0.12 0 | 0 1 0 | 0.207 0.793 0 | 0.38 0.62 0 | 0 0.607 0.393
8 | 0 1 0 | 0.207 0.793 0 | 0.38 0.62 0 | 0 0.607 0.393 | 1 0 0
9 | 0.207 0.793 0 | 0.38 0.62 0 | 0 0.607 0.393 | 1 0 0 | 0.66 0.34 0
10 | 0.38 0.62 0 | 0 0.607 0.393 | 1 0 0 | 0.66 0.34 0 | 0.727 0.273 0
11 | 0 0.607 0.393 | 1 0 0 | 0.66 0.34 0 | 0.727 0.273 0 | 0.347 0.653 0
12 | 1 0 0 | 0.66 0.34 0 | 0.727 0.273 0 | 0.347 0.653 0 | 0 1 0
13 | 0.66 0.34 0 | 0.727 0.273 0 | 0.347 0.653 0 | 0 1 0 | 0.447 0.553 0
14 | 0.727 0.273 0 | 0.347 0.653 0 | 0 1 0 | 0.447 0.553 0 | 0.6 0.4 0
15 | 0.347 0.653 0 | 0 1 0 | 0.447 0.553 0 | 0.6 0.4 0 | 0.887 0.113 0
16 | 0 1 0 | 0.447 0.553 0 | 0.6 0.4 0 | 0.887 0.113 0 | 0.74 0.26 0
17 | 0.447 0.553 0 | 0.6 0.4 0 | 0.887 0.113 0 | 0.74 0.26 0 | 0.473 0.527 0
18 | 0.6 0.4 0 | 0.887 0.113 0 | 0.74 0.26 0 | 0.473 0.527 0 | 0 1 0
19 | 0.887 0.113 0 | 0.74 0.26 0 | 0.473 0.527 0 | 0 1 0 | 0 0.66 0.34
20 | 0.74 0.26 0 | 0.473 0.527 0 | 0 1 0 | 0 0.66 0.34 | 0 0.867 0.133
21 | 0.473 0.527 0 | 0 1 0 | 0 0.66 0.34 | 0 0.867 0.133 | 0 0.067 0.933
22 | 0 1 0 | 0 0.66 0.34 | 0 0.867 0.133 | 0 0.067 0.933 | 0.207 0.793 0
23 | 0 0.66 0.34 | 0 0.867 0.133 | 0 0.067 0.933 | 0.207 0.793 0 | 0 1 0
24 | 0 0.867 0.133 | 0 0.067 0.933 | 0.207 0.793 0 | 0 1 0 | 0 1 0
25 | 0 0.067 0.933 | 0.207 0.793 0 | 0 1 0 | 0 1 0 | 0.153 0.847 0
26 | 0.207 0.793 0 | 0 1 0 | 0 1 0 | 0.153 0.847 0 | 0 0.58 0.42
27 | 0 1 0 | 0 1 0 | 0.153 0.847 0 | 0 0.58 0.42 | 0.433 0.567 0
28 | 0 1 0 | 0.153 0.847 0 | 0 0.58 0.42 | 0.433 0.567 0 | 0.333 0.667 0
29 | 0.153 0.847 0 | 0 0.58 0.42 | 0.433 0.567 0 | 0.333 0.667 0 | 0 1 0
30 | 0 0.58 0.42 | 0.433 0.567 0 | 0.333 0.667 0 | 0 1 0 | 0.5 0.5 0
31 | 0.433 0.567 0 | 0.333 0.667 0 | 0 1 0 | 0.5 0.5 0 | 0 1 0
32 | 0.333 0.667 0 | 0 1 0 | 0.5 0.5 0 | 0 1 0 | 0.193 0.807 0
33 | 0 1 0 | 0.5 0.5 0 | 0 1 0 | 0.193 0.807 0 | 0 1 0
34 | 0.5 0.5 0 | 0 1 0 | 0.193 0.807 0 | 0 1 0 | 0 1 0
35 | 0 1 0 | 0.193 0.807 0 | 0 1 0 | 0 1 0 | 1 0 0
36 | 0.193 0.807 0 | 0 1 0 | 0 1 0 | 1 0 0 | 1 0 0
37 | 0 1 0 | 0 1 0 | 1 0 0 | 1 0 0 | 1 0 0
38 | 0 1 0 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0
39 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0
40 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0
41 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0
42 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0
43 | 1 0 0 | 1 0 0 | 1 0 0 | 1 0 0 | 0.667 0.333 0
44 | 1 0 0 | 1 0 0 | 1 0 0 | 0.667 0.333 0 | 0.867 0.133 0
45 | 1 0 0 | 1 0 0 | 0.667 0.333 0 | 0.867 0.133 0 | 1 0 0
46 | 1 0 0 | 0.667 0.333 0 | 0.867 0.133 0 | 1 0 0 | 0.787 0.213 0
47 | 0.667 0.333 0 | 0.867 0.133 0 | 1 0 0 | 0.787 0.213 0 | 0.82 0.18 0
48 | 0.867 0.133 0 | 1 0 0 | 0.787 0.213 0 | 0.82 0.18 0 | 0.66 0.34 0
49 | 1 0 0 | 0.787 0.213 0 | 0.82 0.18 0 | 0.66 0.34 0 | 0.94 0.06 0
50 | 0.787 0.213 0 | 0.82 0.18 0 | 0.66 0.34 0 | 0.94 0.06 0 | 0.727 0.273 0
51 | 0.82 0.18 0 | 0.66 0.34 0 | 0.94 0.06 0 | 0.727 0.273 0 | 0.727 0.273 0
52 | 0.66 0.34 0 | 0.94 0.06 0 | 0.727 0.273 0 | 0.727 0.273 0 | 0.767 0.233 0
53 | 0.94 0.06 0 | 0.727 0.273 0 | 0.727 0.273 0 | 0.767 0.233 0 | 0.053 0.947 0
54 | 0.727 0.273 0 | 0.727 0.273 0 | 0.767 0.233 0 | 0.053 0.947 0 | 0 0.867 0.133
55 | 0.727 0.273 0 | 0.767 0.233 0 | 0.053 0.947 0 | 0 0.867 0.133 | 0 0.467 0.533
56 | 0.767 0.233 0 | 0.053 0.947 0 | 0 0.867 0.133 | 0 0.467 0.533 | 0 0.08 0.92
57 | 0.053 0.947 0 | 0 0.867 0.133 | 0 0.467 0.533 | 0 0.08 0.92 | 0 0.633 0.367
58 | 0 0.867 0.133 | 0 0.467 0.533 | 0 0.08 0.92 | 0 0.633 0.367 | 0.727 0.273 0
59 | 0 0.467 0.533 | 0 0.08 0.92 | 0 0.633 0.367 | 0.727 0.273 0 | 0.953 0.047 0
60 | 0 0.08 0.92 | 0 0.633 0.367 | 0.727 0.273 0 | 0.953 0.047 0 | 0.907 0.093 0
61 | 0 0.633 0.367 | 0.727 0.273 0 | 0.953 0.047 0 | 0.907 0.093 0 | 0.647 0.353 0
62 | 0.727 0.273 0 | 0.953 0.047 0 | 0.907 0.093 0 | 0.647 0.353 0 | 0.253 0.747 0
63 | 0.953 0.047 0 | 0.907 0.093 0 | 0.647 0.353 0 | 0.253 0.747 0 | 0.78 0.22 0
64 | 0.907 0.093 0 | 0.647 0.353 0 | 0.253 0.747 0 | 0.78 0.22 0 | 0.253 0.747 0
65 | 0.647 0.353 0 | 0.253 0.747 0 | 0.78 0.22 0 | 0.253 0.747 0 | 0 1 0
66 | 0.253 0.747 0 | 0.78 0.22 0 | 0.253 0.747 0 | 0 1 0 | 0 0.827 0.173
67 | 0.78 0.22 0 | 0.253 0.747 0 | 0 1 0 | 0 0.827 0.173 | 0 0 1
68 | 0.253 0.747 0 | 0 1 0 | 0 0.827 0.173 | 0 0 1 | 0 0.48 0.52
69 | 0 1 0 | 0 0.827 0.173 | 0 0 1 | 0 0.48 0.52 | 0.427 0.573 0
70 | 0 0.827 0.173 | 0 0 1 | 0 0.48 0.52 | 0.427 0.573 0 | 0 0.887 0.113
71 | 0 0 1 | 0 0.48 0.52 | 0.427 0.573 0 | 0 0.887 0.113 | 0.74 0.26 0
72 | 0 0.48 0.52 | 0.427 0.573 0 | 0 0.887 0.113 | 0.74 0.26 0 | 0.633 0.367 0
73 | 0.427 0.573 0 | 0 0.887 0.113 | 0.74 0.26 0 | 0.633 0.367 0 | 0 1 0
74 | 0 0.887 0.113 | 0.74 0.26 0 | 0.633 0.367 0 | 0 1 0 | 0.28 0.72 0
75 | 0.74 0.26 0 | 0.633 0.367 0 | 0 1 0 | 0.28 0.72 0 | 0.587 0.413 0
76 | 0.633 0.367 0 | 0 1 0 | 0.28 0.72 0 | 0.587 0.413 0 | 0.14 0.86 0
77 | 0 1 0 | 0.28 0.72 0 | 0.587 0.413 0 | 0.14 0.86 0 | 0.3 0.7 0
78 | 0.28 0.72 0 | 0.587 0.413 0 | 0.14 0.86 0 | 0.3 0.7 0 | 0.313 0.687 0
79 | 0.587 0.413 0 | 0.14 0.86 0 | 0.3 0.7 0 | 0.313 0.687 0 | 0.107 0.893 0
80 | 0.14 0.86 0 | 0.3 0.7 0 | 0.313 0.687 0 | 0.107 0.893 0 | 0.24 0.76 0
81 | 0.3 0.7 0 | 0.313 0.687 0 | 0.107 0.893 0 | 0.24 0.76 0 | 0.707 0.293 0
82 | 0.313 0.687 0 | 0.107 0.893 0 | 0.24 0.76 0 | 0.707 0.293 0 | 0.433 0.567 0
83 | 0.107 0.893 0 | 0.24 0.76 0 | 0.707 0.293 0 | 0.433 0.567 0 | 0.647 0.353 0
84 | 0.24 0.76 0 | 0.707 0.293 0 | 0.433 0.567 0 | 0.647 0.353 0 | 0.293 0.707 0
85 | 0.707 0.293 0 | 0.433 0.567 0 | 0.647 0.353 0 | 0.293 0.707 0 | 0 0.9 0.1
86 | 0.433 0.567 0 | 0.647 0.353 0 | 0.293 0.707 0 | 0 0.9 0.1 | 0.307 0.693 0
87 | 0.647 0.353 0 | 0.293 0.707 0 | 0 0.9 0.1 | 0.307 0.693 0 | 0.98 0.02 0
88 | 0.293 0.707 0 | 0 0.9 0.1 | 0.307 0.693 0 | 0.98 0.02 0 | 0 0.76 0.24
89 | 0 0.9 0.1 | 0.307 0.693 0 | 0.98 0.02 0 | 0 0.76 0.24 | 0 1 0
90 | 0.307 0.693 0 | 0.98 0.02 0 | 0 0.76 0.24 | 0 1 0 | 0 0.807 0.193
91 | 0.98 0.02 0 | 0 0.76 0.24 | 0 1 0 | 0 0.807 0.193 | 0 0.533 0.467
92 | 0 0.76 0.24 | 0 1 0 | 0 0.807 0.193 | 0 0.533 0.467 | 0 0.207 0.793
93 | 0 1 0 | 0 0.807 0.193 | 0 0.533 0.467 | 0 0.207 0.793 | 0.133 0.867 0
94 | 0 0.807 0.193 | 0 0.533 0.467 | 0 0.207 0.793 | 0.133 0.867 0 | 0.173 0.827 0
95 | 0 0.533 0.467 | 0 0.207 0.793 | 0.133 0.867 0 | 0.173 0.827 0 | 0.22 0.78 0
96 | 0 0.207 0.793 | 0.133 0.867 0 | 0.173 0.827 0 | 0.22 0.78 0 | 0.413 0.587 0
97 | 0.133 0.867 0 | 0.173 0.827 0 | 0.22 0.78 0 | 0.413 0.587 0 | 0.493 0.507 0
98 | 0.173 0.827 0 | 0.22 0.78 0 | 0.413 0.587 0 | 0.493 0.507 0 | 0.807 0.193 0
99 | 0.22 0.78 0 | 0.413 0.587 0 | 0.493 0.507 0 | 0.807 0.193 0 | 1 0 0
100 | 0.413 0.587 0 | 0.493 0.507 0 | 0.807 0.193 0 | 1 0 0 | 0.413 0.587 0
101 | 0.493 0.507 0 | 0.807 0.193 0 | 1 0 0 | 0.413 0.587 0 | 0.54 0.46 0
102 | 0.807 0.193 0 | 1 0 0 | 0.413 0.587 0 | 0.54 0.46 0 | 0 1 0
103 | 1 0 0 | 0.413 0.587 0 | 0.54 0.46 0 | 0 1 0 | 0.033 0.967 0
104 | 0.413 0.587 0 | 0.54 0.46 0 | 0 1 0 | 0.033 0.967 0 | 0.087 0.913 0
105 | 0.54 0.46 0 | 0 1 0 | 0.033 0.967 0 | 0.087 0.913 0 | 1 0 0
106 | 0 1 0 | 0.033 0.967 0 | 0.087 0.913 0 | 1 0 0 | 0.147 0.853 0
107 | 0.033 0.967 0 | 0.087 0.913 0 | 1 0 0 | 0.147 0.853 0 | 0.473 0.527 0
108 | 0.087 0.913 0 | 1 0 0 | 0.147 0.853 0 | 0.473 0.527 0 | 0.253 0.747 0
109 | 1 0 0 | 0.147 0.853 0 | 0.473 0.527 0 | 0.253 0.747 0 | 0.813 0.187 0
110 | 0.147 0.853 0 | 0.473 0.527 0 | 0.253 0.747 0 | 0.813 0.187 0 | 0.64 0.36 0
111 | 0.473 0.527 0 | 0.253 0.747 0 | 0.813 0.187 0 | 0.64 0.36 0 | 0.667 0.333 0
112 | 0.253 0.747 0 | 0.813 0.187 0 | 0.64 0.36 0 | 0.667 0.333 0 | 0.387 0.613 0
113 | 0.813 0.187 0 | 0.64 0.36 0 | 0.667 0.333 0 | 0.387 0.613 0 | 0.68 0.32 0
114 | 0.64 0.36 0 | 0.667 0.333 0 | 0.387 0.613 0 | 0.68 0.32 0 | 0.527 0.473 0
115 | 0.667 0.333 0 | 0.387 0.613 0 | 0.68 0.32 0 | 0.527 0.473 0 | 0.46 0.54 0
116 | 0.387 0.613 0 | 0.68 0.32 0 | 0.527 0.473 0 | 0.46 0.54 0 | 0.58 0.42 0
Count | 47.595 60.227 8.178 | 47.275 60.547 8.178 | 46.802 61.02 8.178 | 47.262 60.56 8.178 | 47.775 60.047 8.178

7. Threats to Validity

This section discusses the threats to validity of the study.

Construct validity threats concern the relationship between theory and observation. These threats arise mainly because we assumed that all commits were posted through the revision control tool git [4]. Any changes made to the source code but not logged through the tool may not have become part of the study.

Internal validity concerns the selection of subject systems and the analysis methods. This study uses a month as the unit of measure for tracking the types of change activities. In the future, we would like to use a more natural and insightful partition, based on major/minor versions of the OSS project, for analysing the change activity of OSS projects. Subject systems were selected from public repositories, but the selection is biased towards projects with valid git repositories.

External validity concerns the generalization of the findings. In the future, we would like to provide more generalized results by considering a higher number of OSS projects.

Reliability validity concerns the possibility of replication of the study. The subject systems are available in the public domain. We have attempted to put all the necessary details of the experiment process in the paper.

8. Conclusion and Future Work

The commit activity data available in the development repository of open source software can be used to analyse the evolution of OSS projects, as each commit is directly related to a development activity such as code deletion, addition, or modification, comments, and file addition. In this study, the Eclipse CDT commit data is analysed to find the regularity and trend in the commit activity. The fuzzy data mining algorithm for time series data is used to generate association rules from the dataset. The dataset is divided into two subsets (the training dataset and the remaining dataset) to evaluate the pattern of evolution of Eclipse CDT.

After applying and validating the association rules generated from the training dataset, it is found that the rates at which commits are performed in the training dataset and in the remaining dataset are different. This is verified again by generating association rules from the remaining set. The rules generated from the remaining dataset indicate that its data lies mostly in the lower range of commits. This validates the applicability of the rules generated from the training dataset.

These association rules indicate that the overall commits in Eclipse CDT lie in the middle range, except for the variation found near the end, where there is a high probability that commits lie in the lower range. The continuous availability of commits in the repository of Eclipse CDT illustrates that the development or evolution of Eclipse CDT is active, with most of the commits per month lying in the middle range and, towards the end, near the lower range. In the future, we want to combine a prediction algorithm with the fuzzy data mining algorithm for time series data to predict the number of commits in a particular month, although various other factors on which the number of commits depends also need to be considered.

Appendix

See Tables 8 and 9.
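The construction of the two appendix tables can be sketched in Python as follows. The sketch assumes a sliding window of five months for Table 8 and, for Table 9, trapezoidal Low/Middle/High membership functions with breakpoints at 100, 250, 300, and 450 commits per month. These breakpoints are not stated in this section; they are inferred from the tabulated membership values (for example, 240 maps to Low 0.067 and Middle 0.933, and 361 maps to Middle 0.593 and High 0.407). The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def sliding_windows(series: Sequence[int], size: int = 5) -> List[List[int]]:
    """All consecutive subsequences of the given length (rows of Table 8)."""
    return [list(series[i:i + size]) for i in range(len(series) - size + 1)]


def fuzzify(x: float,
            a: float = 100.0, b: float = 250.0,
            c: float = 300.0, d: float = 450.0) -> Tuple[float, float, float]:
    """Return (Low, Middle, High) membership values for one commit count.

    Breakpoints a..d are assumptions inferred from the values in Table 9.
    """
    if x <= a:
        return (1.0, 0.0, 0.0)
    if x < b:  # Low falls while Middle rises
        middle = (x - a) / (b - a)
        return (1.0 - middle, middle, 0.0)
    if x <= c:  # plateau of the Middle region
        return (0.0, 1.0, 0.0)
    if x < d:  # Middle falls while High rises
        high = (x - c) / (d - c)
        return (0.0, 1.0 - high, high)
    return (0.0, 0.0, 1.0)


# First seven monthly commit counts, taken from the subsequences in Table 8.
commits = [9, 25, 252, 240, 361, 298, 118]

windows = sliding_windows(commits)
first_row = [tuple(round(m, 3) for m in fuzzify(x)) for x in windows[0]]
print(windows[0])
print(first_row)
```

With these assumed breakpoints, the first window reproduces subsequence 1 of Table 8, and its memberships agree with the first row of Table 9 after rounding to three decimals.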

Competing Interests

The authors declare that they have no competing interests.

References

[1] M. W. Godfrey and Q. Tu, “Evolution in open source software: a case study,” in Proceedings of the IEEE International Conference on Software Maintenance (ICSM ’00), pp. 131–142, October 2000.

[2] H. Zhang and S. Kim, “Monitoring software quality evolution for defects,” IEEE Software, vol. 27, no. 4, pp. 58–64, 2010.

[3] Y. Fang and D. Neufeld, “Understanding sustained participation in open source software projects,” Journal of Management Information Systems, vol. 25, no. 4, pp. 9–50, 2008.

[4] “GIT,” July 2015, http://git-scm.com/.

[5] R. Sen, S. S. Singh, and S. Borle, “Open source software success: measures and analysis,” Decision Support Systems, vol. 52, no. 2, pp. 364–372, 2012.

[6] R. McCleary, R. A. Hay, E. E. Meidinger, and D. McDowall, Applied Time Series Analysis for the Social Sciences, Sage, Beverly Hills, Calif, USA, 1980.

[7] R. Grewal, G. L. Lilien, and G. Mallapragada, “Location, location, location: how network embeddedness affects project success in open source systems,” Management Science, vol. 52, no. 7, pp. 1043–1056, 2006.

[8] I. Herraiz, J. M. Gonzalez-Barahona, G. Robles, and D. M. German, “On the prediction of the evolution of libre software projects,” in Proceedings of the 23rd International Conference on Software Maintenance (ICSM ’07), pp. 405–414, Paris, France, October 2007.

[9] C. F. Kemerer and S. Slaughter, “An empirical approach to studying software evolution,” IEEE Transactions on Software Engineering, vol. 25, no. 4, pp. 493–509, 1999.

[10] A. Mockus and L. G. Votta, “Identifying reasons for software changes using historic databases,” in Proceedings of the International Conference on Software Maintenance (ICSM ’00), pp. 120–130, IEEE, San Jose, Calif, USA, 2000.

[11] C.-H. Chen, T.-P. Hong, and V. S. Tseng, “Fuzzy data mining for time-series data,” Applied Soft Computing Journal, vol. 12, no. 1, pp. 536–542, 2012.

[12] C. C. H. Yuen, “An empirical approach to the study of errors in large software under maintenance,” in Proceedings of the IEEE International Conference on Software Maintenance (ICSM ’85), pp. 96–105, 1985.

[13] C. C. H. Yuen, “A statistical rationale for evolution dynamics concepts,” in Proceedings of the IEEE International Conference on Software Maintenance (ICSM ’87), pp. 156–164, September 1987.

[14] C. C. H. Yuen, “On analytic maintenance process data at the global and the detailed levels: a case study,” in Proceedings of the International Conference on Software Maintenance (ICSM ’88), pp. 248–255, IEEE, Scottsdale, Ariz, USA, 1988.

[15] M. Goulao, N. Fonte, M. Wermelinger, and F. B. Abreu, “Software evolution prediction using seasonal time analysis: a comparative study,” in Proceedings of the 16th European Conference on Software Maintenance and Reengineering (CSMR ’12), pp. 213–222, IEEE, Szeged, Hungary, March 2012.

[16] B. Kenmei, G. Antoniol, and M. Di Penta, “Trend analysis and issue prediction in large-scale open source systems,” in Proceedings of the 12th IEEE European Conference on Software Maintenance and Reengineering (CSMR ’08), pp. 73–82, Athens, Greece, April 2008.

[17] F. Caprio, G. Casazza, U. Penta, M. D. Penta, and U. Villano, “Measuring and predicting the Linux kernel evolution,” in Proceedings of the International Workshop of Empirical Studies on Software Maintenance, Florence, Italy, November 2001.

[18] E. Fuentetaja and D. J. Bagert, “Software evolution from a time-series perspective,” in Proceedings of the IEEE International Conference on Software Maintenance, pp. 226–229, October 2002.

[19] M. Klas, F. Elberzhager, J. Munch, K. Hartjes, and O. Von Graevemeyer, “Transparent combination of expert and measurement data for defect prediction: an industrial case study,” in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE ’10), pp. 119–128, Cape Town, South Africa, May 2010.

[20] U. Raja, D. P. Hale, and J. E. Hale, “Modeling software evolution defects: a time series approach,” Journal of Software Maintenance and Evolution: Research and Practice, vol. 21, no. 1, pp. 49–71, 2009.

[21] G. Antoniol, G. Casazza, M. D. Penta, and E. Merlo, “Modeling clones evolution through time series,” in Proceedings of the IEEE International Conference on Software Maintenance (ICSM ’01), pp. 273–280, IEEE Computer Society, Florence, Italy, November 2001.

[22] L. Yu, “Indirectly predicting the maintenance effort of open-source software,” Journal of Software Maintenance and Evolution, vol. 18, no. 5, pp. 311–332, 2006.

[23] A. Amin, L. Grunske, and A. Colman, “An approach to software reliability prediction based on time series modeling,” Journal of Systems and Software, vol. 86, no. 7, pp. 1923–1932, 2013.

[24] Forecasting Model, http://people.duke.edu/~rnau/411fcst.htm.

[25] T.-P. Hong, C.-S. Kuo, and C.-C. Chi, “A fuzzy data mining algorithm for quantitative values,” in Proceedings of the 3rd International Conference on Knowledge-Based Intelligent Information Engineering Systems (KES ’99), pp. 480–483, September 1999.

[26] H. Suresh and K. Raimond, “Mining association rules from time series data using hybrid approaches,” Proceedings of International Journal of Computational Engineering Research, vol. 3, no. 3, pp. 181–189, 2013.

[27] “GIT HUB,” July 2015, https://github.com/.

[28] “Eclipse,” “Where did Eclipse come from?” Eclipse Wiki.

[29] “Eclipse,” 2015, https://eclipse.org/cdt/.

[30] “Eclipse,” July 2015, http://www.eclipse.org/cdt/doc/overview/#/3.

[31] L. A. Zadeh, “Fuzzy sets,” Information and Computation, vol. 8, no. 3, pp. 338–353, 1965.

[32] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, Academic Press, 2001.

[33] R. Agrawal, “Mining association rules between sets of items in large databases,” in Proceedings of the ACM SIGMOD Conference, pp. 207–216, Washington, DC, USA, May 1993.


00

10

048

052

0427

0573

070

00827

0173

00

10

048

052

0427

0573

00

0887

0113

710

01

0048

052

0427

0573

00

0887

0113

074

026

072

0048

052

0427

0573

00

0887

0113

074

026

00633

0367

073

0427

0573

00

0887

0113

074

026

00633

0367

00

10

740

0887

0113

074

026

00633

0367

00

10

028

072

075

074

026

00633

0367

00

10

028

072

00587

0413

076

0633

0367

00

10

028

072

00587

0413

0014

086

077

01

0028

072

00587

0413

0014

086

003

07

078

028

072

00587

0413

0014

086

003

07

00313

0687

079

0587

0413

0014

086

003

07

00313

0687

00107

0893

080

014

086

003

07

00313

0687

00107

0893

0024

076

081

03

07

00313

0687

00107

0893

0024

076

00707

0293

0

Advances in Fuzzy Systems 11

Table9Con

tinued

Cou

nt47595

60227

8178

47275

60547

8178

46802

6102

8178

47262

6056

8178

47775

60047

8178

SpA

1Lo

wA

1Middle

A1High

A2Lo

wA

2Middle

A2High

A3Lo

wA

3Middle

A3High

A4Lo

wA

4Middle

A4High

A5Lo

wA

5Middle

A5High

820313

0687

00107

0893

0024

076

00707

0293

00433

0567

083

0107

0893

0024

076

00707

0293

00433

0567

0064

70353

084

024

076

00707

0293

00433

0567

0064

70353

00293

0707

085

0707

0293

00433

0567

0064

70353

00293

0707

00

09

01

860433

0567

0064

70353

00293

0707

00

09

01

0307

0693

087

064

70353

00293

0707

00

09

01

0307

0693

0098

002

088

0293

0707

00

09

01

0307

0693

0098

002

00

076

024

890

09

01

0307

0693

0098

002

00

076

024

01

090

0307

0693

0098

002

00

076

024

01

00

0807

0193

91098

002

00

076

024

01

00

0807

0193

00533

046

792

0076

024

01

00

0807

0193

00533

046

70

0207

0793

930

10

00807

0193

00533

046

70

0207

0793

0133

0867

094

00807

0193

00533

046

70

0207

0793

0133

0867

00173

0827

095

00533

046

70

0207

0793

0133

0867

00173

0827

0022

078

096

00207

0793

0133

0867

00173

0827

0022

078

00413

0587

097

0133

0867

00173

0827

0022

078

00413

0587

00493

0507

098

0173

0827

0022

078

00413

0587

00493

0507

00807

0193

099

022

078

00413

0587

00493

0507

00807

0193

01

00

100

0413

0587

00493

0507

00807

0193

01

00

0413

0587

0101

0493

0507

00807

0193

01

00

0413

0587

0054

046

0102

0807

0193

01

00

0413

0587

0054

046

00

10

103

10

00413

0587

0054

046

00

10

0033

0967

0104

0413

0587

0054

046

00

10

0033

0967

00087

0913

0105

054

046

00

10

0033

0967

00087

0913

01

00

106

01

00033

0967

00087

0913

01

00

0147

0853

0107

0033

0967

00087

0913

01

00

0147

0853

00473

0527

0108

0087

0913

01

00

0147

0853

00473

0527

00253

0747

0109

10

00147

0853

00473

0527

00253

0747

00813

0187

0110

0147

0853

00473

0527

00253

0747

00813

0187

0064

036

0111

0473

0527

00253

0747

00813

0187

0064

036

00667

0333

0112

0253

0747

00813

0187

0064

036

00667

0333

00387

0613

0113

0813

0187

0064

036

00667

0333

00387

0613

0068

032

0114

064

036

00667

0333

00387

0613

0068

032

00527

0473

0115

0667

0333

00387

0613

0068

032

00527

0473

0046

054

0116

0387

0613

0068

032

00527

0473

0046

054

0058

042

0
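The Count row of Table 9 can be checked with a short sketch. The values below are the Count entries with decimal points restored on the assumption that the three counts of each element sum to the 116 subsequences. Keeping only the region with the largest count per element is a step used in fuzzy mining procedures of the kind described in [25]; that this study applies exactly that step is an assumption here, and the function names are illustrative.

```python
# Count row of Table 9: summed membership of each fuzzy region over 116 subsequences.
# Decimal placement is assumed (each element's three counts should sum to 116).
counts = {
    "A1": {"Low": 47.595, "Middle": 60.227, "High": 8.178},
    "A2": {"Low": 47.275, "Middle": 60.547, "High": 8.178},
    "A3": {"Low": 46.802, "Middle": 61.02, "High": 8.178},
    "A4": {"Low": 47.262, "Middle": 60.56, "High": 8.178},
    "A5": {"Low": 47.775, "Middle": 60.047, "High": 8.178},
}


def region_totals(counts):
    """Sum the three region counts of each window element."""
    return {elem: round(sum(regions.values()), 3) for elem, regions in counts.items()}


def dominant_regions(counts):
    """Pick, for each window element, the fuzzy region with the largest count."""
    return {elem: max(regions, key=regions.get) for elem, regions in counts.items()}


totals = region_totals(counts)
dominant = dominant_regions(counts)
```

Under these assumptions every element totals 116 and its dominant region is Middle, which agrees with the observation in the conclusion that the overall commits lie towards the middle range.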

12 Advances in Fuzzy Systems

for analysing the change activity of OSS projects. Subject systems were selected from public repositories, but the selection is biased towards projects with valid git repositories.

External validity concerns the generalization of the findings. In the future, we would like to provide more generalized results by considering a higher number of OSS projects.

Reliability validity concerns the possibility of replication of the study. The subject systems are available in the public domain. We have attempted to put all the necessary details of the experiment process in the paper.

8. Conclusion and Future Work

The commit activity data available in the development repository of open source software can be used to analyse the evolution of OSS projects, as each commit is directly related to a development activity such as code deletion, addition, modification, comments, and file addition. In this study, the Eclipse CDT commit data is analysed to find the regularity and trend in the commit data. The fuzzy data mining algorithm for time series data is used to generate the association rules from the dataset. The dataset is divided into two subsets (training and remaining dataset) to evaluate the pattern of evolution of the Eclipse CDT.
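The transformation summarized above can be sketched as follows: the monthly series is cut into sliding windows of five elements (A1 to A5), and each element is mapped to Low, Middle, and High membership values. This is a minimal sketch only; the piecewise-linear membership functions, the breakpoints `lo`, `mid`, and `hi`, the commit counts, and the split point are all hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple

Membership = Tuple[float, float, float]  # (Low, Middle, High)


def fuzzify(x: float, lo: float, mid: float, hi: float) -> Membership:
    """Map a value to (Low, Middle, High) memberships that sum to 1."""
    if x <= lo:
        return (1.0, 0.0, 0.0)
    if x <= mid:
        m = (x - lo) / (mid - lo)
        return (1.0 - m, m, 0.0)
    if x <= hi:
        h = (x - mid) / (hi - mid)
        return (0.0, 1.0 - h, h)
    return (0.0, 0.0, 1.0)


def sliding_windows(series: Sequence[float], size: int = 5) -> List[Sequence[float]]:
    """Return every run of `size` consecutive values in the series."""
    return [series[i:i + size] for i in range(len(series) - size + 1)]


def fuzzy_rows(series, lo, mid, hi, size=5) -> List[List[Membership]]:
    """One row per window; each row holds the memberships of A1..A5."""
    return [[fuzzify(x, lo, mid, hi) for x in window]
            for window in sliding_windows(series, size)]


# Hypothetical monthly commit counts and breakpoints, for illustration only.
commits = [5, 12, 20, 31, 18, 9, 25]
rows = fuzzy_rows(commits, lo=10, mid=20, hi=30)

# Hypothetical split into a training part and a remaining part.
split = 2
training, remaining = rows[:split], rows[split:]
```

A series of n values gives n - 4 windows of size five, so each row of the resulting table corresponds to one subsequence.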

After applying and validating the association rules generated from the training dataset, it is found that the rates at which commits are performed in the training dataset and in the remaining dataset are different. This is verified again by generating the association rules from the remaining dataset. The rules generated from the remaining dataset indicate that the data in it lies more towards the lower range of commits performed. This validates the applicability of the rules generated from the training dataset.
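One way to compare how well a rule holds on the two subsets is to compute its fuzzy support and confidence on each. The sketch below assumes the common convention of taking the minimum membership as the fuzzy intersection and dividing the summed value by the number of rows; the item names and membership values are invented for illustration and do not come from the Eclipse CDT data.

```python
def fuzzy_support(rows, items):
    """Average, over all rows, of the minimum membership among the given items."""
    total = sum(min(row[item] for item in items) for row in rows)
    return total / len(rows)


def fuzzy_confidence(rows, antecedent, consequent):
    """Support of antecedent and consequent together, relative to the antecedent alone."""
    base = fuzzy_support(rows, antecedent)
    if base == 0:
        return 0.0
    return fuzzy_support(rows, antecedent + consequent) / base


# Hypothetical membership values for the rule A1.Middle -> A2.Middle.
training_rows = [
    {"A1.Middle": 0.8, "A2.Middle": 0.6},
    {"A1.Middle": 0.4, "A2.Middle": 0.9},
]
remaining_rows = [
    {"A1.Middle": 0.2, "A2.Middle": 0.1},
    {"A1.Middle": 0.6, "A2.Middle": 0.3},
]

conf_training = fuzzy_confidence(training_rows, ["A1.Middle"], ["A2.Middle"])
conf_remaining = fuzzy_confidence(remaining_rows, ["A1.Middle"], ["A2.Middle"])
```

With these made-up values the rule is stronger on the training rows than on the remaining rows, which is the kind of difference between the two subsets that the paragraph above reports.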

These association rules indicate that the overall commits in the Eclipse CDT are towards the middle range, except for the variation found near the end, where there is a high probability that commits lie in the lower range. The continuous availability and existence of commits in the repository of the Eclipse CDT illustrate that the development or evolution of Eclipse CDT is active, with most of the commits per month lying in the middle range and, near the end, close to the lower range. In the future, we want to combine a prediction algorithm with the fuzzy data mining algorithm for time series data to predict the number of commits to be performed in a particular month, although there are various other factors on which the number of commits depends that also need to be considered.

Appendix

See Tables 8 and 9.

Competing Interests

The authors declare that they have no competing interests.

References

[1] M. W. Godfrey and Q. Tu, "Evolution in open source software: a case study," in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '00), pp. 131–142, October 2000.

[2] H. Zhang and S. Kim, "Monitoring software quality evolution for defects," IEEE Software, vol. 27, no. 4, pp. 58–64, 2010.

[3] Y. Fang and D. Neufeld, "Understanding sustained participation in open source software projects," Journal of Management Information Systems, vol. 25, no. 4, pp. 9–50, 2008.

[4] "GIT," July 2015, http://git-scm.com/.

[5] R. Sen, S. S. Singh, and S. Borle, "Open source software success: measures and analysis," Decision Support Systems, vol. 52, no. 2, pp. 364–372, 2012.

[6] R. McCleary, R. A. Hay, E. E. Meidinger, and D. McDowall, Applied Time Series Analysis for the Social Sciences, Sage, Beverly Hills, Calif, USA, 1980.

[7] R. Grewal, G. L. Lilien, and G. Mallapragada, "Location, location, location: how network embeddedness affects project success in open source systems," Management Science, vol. 52, no. 7, pp. 1043–1056, 2006.

[8] I. Herraiz, J. M. Gonzalez-Barahona, G. Robles, and D. M. German, "On the prediction of the evolution of libre software projects," in Proceedings of the 23rd International Conference on Software Maintenance (ICSM '07), pp. 405–414, Paris, France, October 2007.

[9] C. F. Kemerer and S. Slaughter, "An empirical approach to studying software evolution," IEEE Transactions on Software Engineering, vol. 25, no. 4, pp. 493–509, 1999.

[10] A. Mockus and L. G. Votta, "Identifying reasons for software changes using historic databases," in Proceedings of the International Conference on Software Maintenance (ICSM '00), pp. 120–130, IEEE, San Jose, Calif, USA, 2000.

[11] C.-H. Chen, T.-P. Hong, and V. S. Tseng, "Fuzzy data mining for time-series data," Applied Soft Computing Journal, vol. 12, no. 1, pp. 536–542, 2012.

[12] C. C. H. Yuen, "An empirical approach to the study of errors in large software under maintenance," in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '85), pp. 96–105, 1985.

[13] C. C. H. Yuen, "A statistical rationale for evolution dynamics concepts," in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '87), pp. 156–164, September 1987.

[14] C. C. H. Yuen, "On analytic maintenance process data at the global and the detailed levels: a case study," in Proceedings of the International Conference on Software Maintenance (ICSM '88), pp. 248–255, IEEE, Scottsdale, Ariz, USA, 1988.

[15] M. Goulao, N. Fonte, M. Wermelinger, and F. B. Abreu, "Software evolution prediction using seasonal time analysis: a comparative study," in Proceedings of the 16th European Conference on Software Maintenance and Reengineering (CSMR '12), pp. 213–222, IEEE, Szeged, Hungary, March 2012.

[16] B. Kenmei, G. Antoniol, and M. Di Penta, "Trend analysis and issue prediction in large-scale open source systems," in Proceedings of the 12th IEEE European Conference on Software Maintenance and Reengineering (CSMR '08), pp. 73–82, Athens, Greece, April 2008.

[17] F. Caprio, G. Casazza, U. Penta, M. D. Penta, and U. Villano, "Measuring and predicting the Linux kernel evolution," in Proceedings of the International Workshop of Empirical Studies on Software Maintenance, Florence, Italy, November 2001.

[18] E. Fuentetaja and D. J. Bagert, "Software evolution from a time-series perspective," in Proceedings of the IEEE International Conference on Software Maintenance, pp. 226–229, October 2002.

[19] M. Klas, F. Elberzhager, J. Munch, K. Hartjes, and O. Von Graevemeyer, "Transparent combination of expert and measurement data for defect prediction: an industrial case study," in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE '10), pp. 119–128, Cape Town, South Africa, May 2010.

[20] U. Raja, D. P. Hale, and J. E. Hale, "Modeling software evolution defects: a time series approach," Journal of Software Maintenance and Evolution: Research and Practice, vol. 21, no. 1, pp. 49–71, 2009.

[21] G. Antoniol, G. Casazza, M. D. Penta, and E. Merlo, "Modeling clones evolution through time series," in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '01), pp. 273–280, IEEE Computer Society, Florence, Italy, November 2001.

[22] L. Yu, "Indirectly predicting the maintenance effort of open-source software," Journal of Software Maintenance and Evolution, vol. 18, no. 5, pp. 311–332, 2006.

[23] A. Amin, L. Grunske, and A. Colman, "An approach to software reliability prediction based on time series modeling," Journal of Systems and Software, vol. 86, no. 7, pp. 1923–1932, 2013.

[24] Forecasting Model, http://people.duke.edu/~rnau/411fcst.htm.

[25] T.-P. Hong, C.-S. Kuo, and C.-C. Chi, "A fuzzy data mining algorithm for quantitative values," in Proceedings of the 3rd International Conference on Knowledge-Based Intelligent Information Engineering Systems (KES '99), pp. 480–483, September 1999.

[26] H. Suresh and K. Raimond, "Mining association rules from time series data using hybrid approaches," Proceedings of International Journal of Computational Engineering Research, vol. 3, no. 3, pp. 181–189, 2013.

[27] "GIT HUB," July 2015, https://github.com/.

[28] "Eclipse," "Where did Eclipse come from?," Eclipse Wiki.

[29] "Eclipse," 2015, https://eclipse.org/cdt/.

[30] "Eclipse," July 2015, http://www.eclipse.org/cdt/doc/overview/#/3.

[31] L. A. Zadeh, "Fuzzy sets," Information and Control, vol. 8, no. 3, pp. 338–353, 1965.

[32] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufman, Academic Press, 2001.

[33] R. Agrawal, "Mining association rules between sets of items in large databases," in Proceedings of the ACM SIGMOD Conference, pp. 207–216, Washington, DC, USA, May 1993.


Advances in Fuzzy Systems 9

Table9Transfo

rmationof

dataele

mentsinto

fuzzyset

Cou

nt47595

60227

8178

47275

60547

8178

46802

6102

8178

47262

6056

8178

47775

60047

8178

SpA

1Lo

wA

1Middle

A1High

A2Lo

wA

2Middle

A2High

A3Lo

wA

3Middle

A3High

A4Lo

wA

4Middle

A4High

A5Lo

wA

5Middle

A5High

11

00

10

00

10

0067

0933

00

0593

040

72

10

00

10

0067

0933

00

0593

040

70

10

30

10

0067

0933

00

0593

040

70

10

088

012

04

0067

0933

00

0593

040

70

10

088

012

00

10

50

0593

040

70

10

088

012

00

10

0207

0793

06

01

0088

012

00

10

0207

0793

0038

062

07

088

012

00

10

0207

0793

0038

062

00

0607

0393

80

10

0207

0793

0038

062

00

0607

0393

10

09

0207

0793

0038

062

00

0607

0393

10

0066

034

010

038

062

00

0607

0393

10

0066

034

00727

0273

011

00607

0393

10

0066

034

00727

0273

00347

0653

012

10

0066

034

00727

0273

00347

0653

00

10

13066

034

00727

0273

00347

0653

00

10

044

70553

014

0727

0273

00347

0653

00

10

044

70553

006

04

015

0347

0653

00

10

044

70553

006

04

00887

0113

016

01

0044

70553

006

04

00887

0113

0074

026

017

044

70553

006

04

00887

0113

0074

026

00473

0527

018

06

04

00887

0113

0074

026

00473

0527

00

10

190887

0113

0074

026

00473

0527

00

10

0066

034

20074

026

00473

0527

00

10

0066

034

00867

0133

210473

0527

00

10

0066

034

00867

0133

00067

0933

220

10

0066

034

00867

0133

00067

0933

0207

0793

023

0066

034

00867

0133

00067

0933

0207

0793

00

10

240

0867

0133

00067

0933

0207

0793

00

10

01

025

00067

0933

0207

0793

00

10

01

00153

0847

026

0207

0793

00

10

01

00153

0847

00

058

042

270

10

01

00153

0847

00

058

042

0433

0567

028

01

00153

0847

00

058

042

0433

0567

00333

0667

029

0153

0847

00

058

042

0433

0567

00333

0667

00

10

300

058

042

0433

0567

00333

0667

00

10

05

05

031

0433

0567

00333

0667

00

10

05

05

00

10

320333

0667

00

10

05

05

00

10

0193

0807

033

01

005

05

00

10

0193

0807

00

10

3405

05

00

10

0193

0807

00

10

01

035

01

00193

0807

00

10

01

01

00

360193

0807

00

10

01

01

00

10

037

01

00

10

10

01

00

10

038

01

01

00

10

01

00

10

039

10

01

00

10

01

00

10

040

10

01

00

10

01

00

10

041

10

01

00

10

01

00

10

0

10 Advances in Fuzzy Systems

Table9Con

tinued

Cou

nt47595

60227

8178

47275

60547

8178

46802

6102

8178

47262

6056

8178

47775

60047

8178

SpA

1Lo

wA

1Middle

A1High

A2Lo

wA

2Middle

A2High

A3Lo

wA

3Middle

A3High

A4Lo

wA

4Middle

A4High

A5Lo

wA

5Middle

A5High

421

00

10

01

00

10

01

00

431

00

10

01

00

10

00667

0333

044

10

01

00

10

0066

70333

00867

0133

045

10

01

00

0667

0333

00867

0133

01

00

461

00

0667

0333

00867

0133

01

00

0787

0213

047

0667

0333

00867

0133

01

00

0787

0213

0082

018

048

0867

0133

01

00

0787

0213

0082

018

0066

034

049

10

00787

0213

0082

018

0066

034

0094

006

050

0787

0213

0082

018

0066

034

0094

006

00727

0273

051

082

018

0066

034

0094

006

00727

0273

00727

0273

052

066

034

0094

006

00727

0273

00727

0273

00767

0233

053

094

006

00727

0273

00727

0273

00767

0233

00053

0947

054

0727

0273

00727

0273

00767

0233

00053

0947

00

0867

0133

550727

0273

00767

0233

00053

0947

00

0867

0133

0046

70533

560767

0233

00053

0947

00

0867

0133

0046

70533

0008

092

570053

0947

00

0867

0133

0046

70533

0008

092

00633

0367

580

0867

0133

0046

70533

0008

092

00633

0367

0727

0273

059

0046

70533

0008

092

00633

0367

0727

0273

00953

0047

060

0008

092

00633

0367

0727

0273

00953

0047

00907

0093

061

00633

0367

0727

0273

00953

0047

00907

0093

0064

70353

062

0727

0273

00953

0047

00907

0093

0064

70353

00253

0747

063

0953

0047

00907

0093

0064

70353

00253

0747

0078

022

064

0907

0093

0064

70353

00253

0747

0078

022

00253

0747

065

064

70353

00253

0747

0078

022

00253

0747

00

10

660253

0747

0078

022

00253

0747

00

10

00827

0173

67078

022

00253

0747

00

10

00827

0173

00

168

0253

0747

00

10

00827

0173

00

10

048

052

690

10

00827

0173

00

10

048

052

0427

0573

070

00827

0173

00

10

048

052

0427

0573

00

0887

0113

710

01

0048

052

0427

0573

00

0887

0113

074

026

072

0048

052

0427

0573

00

0887

0113

074

026

00633

0367

073

0427

0573

00

0887

0113

074

026

00633

0367

00

10

740

0887

0113

074

026

00633

0367

00

10

028

072

075

074

026

00633

0367

00

10

028

072

00587

0413

076

0633

0367

00

10

028

072

00587

0413

0014

086

077

01

0028

072

00587

0413

0014

086

003

07

078

028

072

00587

0413

0014

086

003

07

00313

0687

079

0587

0413

0014

086

003

07

00313

0687

00107

0893

080

014

086

003

07

00313

0687

00107

0893

0024

076

081

03

07

00313

0687

00107

0893

0024

076

00707

0293

0

Advances in Fuzzy Systems 11

Table9Con

tinued

Cou

nt47595

60227

8178

47275

60547

8178

46802

6102

8178

47262

6056

8178

47775

60047

8178

SpA

1Lo

wA

1Middle

A1High

A2Lo

wA

2Middle

A2High

A3Lo

wA

3Middle

A3High

A4Lo

wA

4Middle

A4High

A5Lo

wA

5Middle

A5High

820313

0687

00107

0893

0024

076

00707

0293

00433

0567

083

0107

0893

0024

076

00707

0293

00433

0567

0064

70353

084

024

076

00707

0293

00433

0567

0064

70353

00293

0707

085

0707

0293

00433

0567

0064

70353

00293

0707

00

09

01

860433

0567

0064

70353

00293

0707

00

09

01

0307

0693

087

064

70353

00293

0707

00

09

01

0307

0693

0098

002

088

0293

0707

00

09

01

0307

0693

0098

002

00

076

024

890

09

01

0307

0693

0098

002

00

076

024

01

090

0307

0693

0098

002

00

076

024

01

00

0807

0193

91098

002

00

076

024

01

00

0807

0193

00533

046

792

0076

024

01

00

0807

0193

00533

046

70

0207

0793

930

10

00807

0193

00533

046

70

0207

0793

0133

0867

094

00807

0193

00533

046

70

0207

0793

0133

0867

00173

0827

095

00533

046

70

0207

0793

0133

0867

00173

0827

0022

078

096

00207

0793

0133

0867

00173

0827

0022

078

00413

0587

097

0133

0867

00173

0827

0022

078

00413

0587

00493

0507

098

0173

0827

0022

078

00413

0587

00493

0507

00807

0193

099

022

078

00413

0587

00493

0507

00807

0193

01

00

100

0413

0587

00493

0507

00807

0193

01

00

0413

0587

0101

0493

0507

00807

0193

01

00

0413

0587

0054

046

0102

0807

0193

01

00

0413

0587

0054

046

00

10

103

10

00413

0587

0054

046

00

10

0033

0967

0104

0413

0587

0054

046

00

10

0033

0967

00087

0913

0105

054

046

00

10

0033

0967

00087

0913

01

00

106

01

00033

0967

00087

0913

01

00

0147

0853

0107

0033

0967

00087

0913

01

00

0147

0853

00473

0527

0108

0087

0913

01

00

0147

0853

00473

0527

00253

0747

0109

10

00147

0853

00473

0527

00253

0747

00813

0187

0110

0147

0853

00473

0527

00253

0747

00813

0187

0064

036

0111

0473

0527

00253

0747

00813

0187

0064

036

00667

0333

0112

0253

0747

00813

0187

0064

036

00667

0333

00387

0613

0113

0813

0187

0064

036

00667

0333

00387

0613

0068

032

0114

064

036

00667

0333

00387

0613

0068

032

00527

0473

0115

0667

0333

00387

0613

0068

032

00527

0473

0046

054

0116

0387

0613

0068

032

00527

0473

0046

054

0058

042

0

12 Advances in Fuzzy Systems

for analysing the change activity of OSS projects Subjectsystems were selected from public repositories but selectionis biased towards projects with valid git repositories

External validity concerns the generalization of the find-ings In the future we would like to providemore generalizedresults by considering higher number of OSS projects

Reliability validity concerns the possibility of replicationof the study The subject systems are available in the publicdomain We have attempted to put all the necessary details ofthe experiment process in the paper

8 Conclusion and Future Work

The commit activity data available in the development repos-itory of open source software can be used to analyse theevolution of OSS projects as each commit is directly relatedto the development activity such as code deletion additionmodification comments and file addition In this studythe Eclipse CDT commits data is analysed to find theregularity and trend in the commit data The fuzzy datamining algorithm for time series data is used to generate theassociation rules from the dataset The dataset is divided intotwo subsets (training and remaining dataset) to evaluate thepattern of evolution of the Eclipse CDT

After applying and validating the generated associationrule from training dataset it is found that the rates at whichcommits performed in training dataset and remaining datasetare different This thing is again verified by generating theassociation rules from the remaining set The generated rulesfrom remaining dataset specify that the data in the consideredremaining dataset is more towards lower range of commitsperformed This thing validates the applicability of the rulesgenerated from the training dataset

These association rules indicate that the overall commitsin the Eclipse CDT are towardsmiddle range except the varia-tion found near the end where there is a high probability thatcommits lie in the lower range The continuous availabilityand existence of commits in the repository of the EclipseCDTillustrate that the development or evolution of Eclipse CDTis active with most of the commits per month that lie in themiddle range and at the end lie near to the lower range In thefuture we want to consider any prediction algorithm alongwith the concept of fuzzy data mining algorithm for timeseries data to give a prediction about the number of commitsto be performed in the particular month although there areother various factors that need to be considered on which thenumber of commits depends

Appendix

See Tables 8 and 9

Competing Interests

The authors declare that they have no competing interests

References

[1] MW Godfrey and Q Tu ldquoEvolution in open source software acase studyrdquo in Proceedings of the IEEE Interantional Conferenceon SoftwareMaintenance (ICMS rsquo00) pp 131ndash142October 2000

[2] H Zhang and S Kim ldquoMonitoring software quality evolutionfor defectsrdquo IEEE Software vol 27 no 4 pp 58ndash64 2010

[3] Y Fang andDNeufeld ldquoUnderstanding sustained participationin open source software projectsrdquo Journal of ManagementInformation Systems vol 25 no 4 pp 9ndash50 2008

[4] ldquoGITrdquo July 2015 httpgit-scmcom[5] R Sen S S Singh and S Borle ldquoOpen source software success

measures and analysisrdquo Decision Support Systems vol 52 no 2pp 364ndash372 2012

[6] R McCleary R A Hay E E Meidinger and D McDowallApplied Time Series Analysis for the Social Sciences Sage BeverlyHills Calif USA 1980

[7] R Grewal G L Lilien and G Mallapragada ldquoLocation loca-tion location how network embeddedness affects project suc-cess in open source systemsrdquo Management Science vol 52 no7 pp 1043ndash1056 2006

[8] I Herraiz J M Gonzalez-Barahona G Robles and D MGerman ldquoOn the prediction of the evolution of libre softwareprojectsrdquo in Proceedings of the 23rd International Conference onSoftware Maintenance (ICSM rsquo07) pp 405ndash414 Paris FranceOctober 2007

[9] C F Kemerer and S Slaughter ldquoAn empirical approach tostudying software evolutionrdquo IEEE Transactions on SoftwareEngineering vol 25 no 4 pp 493ndash509 1999

[10] A Mockus and L G Votta ldquoIdentifying reasons for softwareChanges using historic databasesrdquo in Proceedings of the Inter-national Conference on Software Maintenance (ICSM rsquo00) pp120ndash130 IEEE San Jose Calif USA 2000

[11] C-H Chen T-P Hong and V S Tseng ldquoFuzzy datamining fortime-series datardquo Applied Soft Computing Journal vol 12 no 1pp 536ndash542 2012

[12] C C H Yuen ldquoAn empirical approach to the study of errors inlarge software under maintenancerdquo in Proceedings of the IEEEInternational Conference on Software Maintenance (ICSM rsquo85)pp 96ndash105 1985

[13] C C H Yuen ldquoA statistical rationale for evolution dynamicsconceptsrdquo in Proceedings of the IEEE International Conferenceon Software Maintenance (ICSM rsquo87) pp 156ndash164 September1987

[14] C C H Yuen ldquoOn analytic maintenance process data at theglobal and the detailed levels a case studyrdquo in Proceedings of theInternational Conference on Software Maintenance (ICSM rsquo88)pp 248ndash255 IEEE Scottsdale Ariz USA 1988

[15] M Goulao N Fonte M Wermelinger and F B AbreuldquoSoftware evolution prediction using seasonal time analysisa comparative studyrdquo in Proceedings of the 16th EuropeanConference on Software Maintenance and Reengineering (CSMRrsquo12) pp 213ndash222 IEEE Szeged Hungary March 2012

[16] B Kenmei G Antoniol and M Di Penta ldquoTrend analysisand issue prediction in large-scale open source systemsrdquo inProceedings of the 12th IEEE European Conference on SoftwareMaintenance and Reengineering (CSMR rsquo08) pp 73ndash82 AthensGreece April 2008

[17] F Caprio G Casazza U Penta M D Penta and U VillanoldquoMeasuring and predicting the Linux kernel evolutionrdquo inProceedings of the International Workshop of Empirical Studieson Software Maintenance Florence Italy November 2001

[18] E Fuentetaja and D J Bagert ldquoSoftware evolution from a time-series perspectiverdquo in Proceedings of the IEEE InternationalConference on Software Maintenance pp 226ndash229 October2002

Advances in Fuzzy Systems 13

[19] M Klas F Elberzhager J Munch K Hartjes and O VonGraevemeyer ldquoTransparent combination of expert and mea-surement data for defect predictionmdashAn Industrial Case Studyrdquoin Proceedings of the 32nd ACMIEEE International Conferenceon Software Engineering (ICSE rsquo10) pp 119ndash128 Cape TownSouth Africa May 2010

[20] U Raja D P Hale and J E Hale ldquoModeling software evolutiondefects a time series approachrdquo Journal of Software Mainte-nance and Evolution Research and Practice vol 21 no 1 pp49ndash71 2009

[21] G Antoniol G Casazza M D Penta and E Merlo ldquoModelingclones evolution through time seriesrdquo in Proceedings of the IEEEInternational Conference on Software Maintenance (ICSM rsquo01)pp 273ndash280 IEEEComputer Society Florence Italy November2001

[22] L Yu ldquoIndirectly predicting the maintenance effort of open-source softwarerdquo Journal of Software Maintenance and Evolu-tion vol 18 no 5 pp 311ndash332 2006

[23] A Amin L Grunske and A Colman ldquoAn approach to softwarereliability prediction based on time series modelingrdquo Journal ofSystems and Software vol 86 no 7 pp 1923ndash1932 2013

[24] Forecasting Model httppeopledukeedusimrnau411fcsthtm[25] T-P Hong C-S Kuo and C-C Chi ldquoA fuzzy data mining

algorithm for quantitative valuesrdquo inProceedings of the 3rd Inter-national Conference on Knowledge-Based Intelligent InformationEngineering Systems (KES rsquo99) pp 480ndash483 September 1999

[26] H Suresh andK Raimond ldquoMining association rules from timeseries data using hybrid approachesrdquo Proceedings of Interna-tional Journal of Computational Engineering Research vol 3 no3 pp 181ndash189 2013

[27] ldquoGIT HUBrdquo July 2015 httpsgithubcom[28] ldquoEclipserdquo ldquoWhere did Eclipse come fromrdquo Eclipse Wiki[29] ldquoEclipserdquo 2015 httpseclipseorgcdt[30] ldquoEclipserdquo July 2015 httpwwweclipseorgcdtdocoverview

3[31] L A Zadeh ldquoFuzzy setsrdquo Information and Computation vol 8

no 3 pp 338ndash353 1965[32] J Han and M Kamber Data Mining Concepts and Techniques

Morgan Kaufman Academic Press 2001[33] R Agrawal ldquoMining association rules between sets of items in

large databasesrdquo in Proceedings of the ACM SIGMOD Confer-ence pp 207ndash216 Washington DC USA May 1993


10 Advances in Fuzzy Systems

Table 9: Continued.

Sp    A1.Low    A1.Middle A1.High   A2.Low    A2.Middle A2.High   A3.Low    A3.Middle A3.High   A4.Low    A4.Middle A4.High   A5.Low    A5.Middle A5.High
Count 47.595    60.227    8.178     47.275    60.547    8.178     46.802    61.02     8.178     47.262    60.56     8.178     47.775    60.047    8.178
42    1         0         0         1         0         0         1         0         0         1         0         0         1         0         0
43    1         0         0         1         0         0         1         0         0         1         0         0         0.667     0.333     0
44    1         0         0         1         0         0         1         0         0         0.667     0.333     0         0.867     0.133     0
45    1         0         0         1         0         0         0.667     0.333     0         0.867     0.133     0         1         0         0
46    1         0         0         0.667     0.333     0         0.867     0.133     0         1         0         0         0.787     0.213     0
47    0.667     0.333     0         0.867     0.133     0         1         0         0         0.787     0.213     0         0.82      0.18      0
48    0.867     0.133     0         1         0         0         0.787     0.213     0         0.82      0.18      0         0.66      0.34      0
49    1         0         0         0.787     0.213     0         0.82      0.18      0         0.66      0.34      0         0.94      0.06      0
50    0.787     0.213     0         0.82      0.18      0         0.66      0.34      0         0.94      0.06      0         0.727     0.273     0
51    0.82      0.18      0         0.66      0.34      0         0.94      0.06      0         0.727     0.273     0         0.727     0.273     0
52    0.66      0.34      0         0.94      0.06      0         0.727     0.273     0         0.727     0.273     0         0.767     0.233     0
53    0.94      0.06      0         0.727     0.273     0         0.727     0.273     0         0.767     0.233     0         0.053     0.947     0
54    0.727     0.273     0         0.727     0.273     0         0.767     0.233     0         0.053     0.947     0         0         0.867     0.133
55    0.727     0.273     0         0.767     0.233     0         0.053     0.947     0         0         0.867     0.133     0         0.467     0.533
56    0.767     0.233     0         0.053     0.947     0         0         0.867     0.133     0         0.467     0.533     0         0.08      0.92
57    0.053     0.947     0         0         0.867     0.133     0         0.467     0.533     0         0.08      0.92      0         0.633     0.367
58    0         0.867     0.133     0         0.467     0.533     0         0.08      0.92      0         0.633     0.367     0.727     0.273     0
59    0         0.467     0.533     0         0.08      0.92      0         0.633     0.367     0.727     0.273     0         0.953     0.047     0
60    0         0.08      0.92      0         0.633     0.367     0.727     0.273     0         0.953     0.047     0         0.907     0.093     0
61    0         0.633     0.367     0.727     0.273     0         0.953     0.047     0         0.907     0.093     0         0.647     0.353     0
62    0.727     0.273     0         0.953     0.047     0         0.907     0.093     0         0.647     0.353     0         0.253     0.747     0
63    0.953     0.047     0         0.907     0.093     0         0.647     0.353     0         0.253     0.747     0         0.78      0.22      0
64    0.907     0.093     0         0.647     0.353     0         0.253     0.747     0         0.78      0.22      0         0.253     0.747     0
65    0.647     0.353     0         0.253     0.747     0         0.78      0.22      0         0.253     0.747     0         0         1         0
66    0.253     0.747     0         0.78      0.22      0         0.253     0.747     0         0         1         0         0         0.827     0.173
67    0.78      0.22      0         0.253     0.747     0         0         1         0         0         0.827     0.173     0         0         1
68    0.253     0.747     0         0         1         0         0         0.827     0.173     0         0         1         0         0.48      0.52
69    0         1         0         0         0.827     0.173     0         0         1         0         0.48      0.52      0.427     0.573     0
70    0         0.827     0.173     0         0         1         0         0.48      0.52      0.427     0.573     0         0         0.887     0.113
71    0         0         1         0         0.48      0.52      0.427     0.573     0         0         0.887     0.113     0.74      0.26      0
72    0         0.48      0.52      0.427     0.573     0         0         0.887     0.113     0.74      0.26      0         0.633     0.367     0
73    0.427     0.573     0         0         0.887     0.113     0.74      0.26      0         0.633     0.367     0         0         1         0
74    0         0.887     0.113     0.74      0.26      0         0.633     0.367     0         0         1         0         0.28      0.72      0
75    0.74      0.26      0         0.633     0.367     0         0         1         0         0.28      0.72      0         0.587     0.413     0
76    0.633     0.367     0         0         1         0         0.28      0.72      0         0.587     0.413     0         0.14      0.86      0
77    0         1         0         0.28      0.72      0         0.587     0.413     0         0.14      0.86      0         0.3       0.7       0
78    0.28      0.72      0         0.587     0.413     0         0.14      0.86      0         0.3       0.7       0         0.313     0.687     0
79    0.587     0.413     0         0.14      0.86      0         0.3       0.7       0         0.313     0.687     0         0.107     0.893     0
80    0.14      0.86      0         0.3       0.7       0         0.313     0.687     0         0.107     0.893     0         0.24      0.76      0
81    0.3       0.7       0         0.313     0.687     0         0.107     0.893     0         0.24      0.76      0         0.707     0.293     0

Table 9: Continued.

Sp    A1.Low    A1.Middle A1.High   A2.Low    A2.Middle A2.High   A3.Low    A3.Middle A3.High   A4.Low    A4.Middle A4.High   A5.Low    A5.Middle A5.High
Count 47.595    60.227    8.178     47.275    60.547    8.178     46.802    61.02     8.178     47.262    60.56     8.178     47.775    60.047    8.178
82    0.313     0.687     0         0.107     0.893     0         0.24      0.76      0         0.707     0.293     0         0.433     0.567     0
83    0.107     0.893     0         0.24      0.76      0         0.707     0.293     0         0.433     0.567     0         0.647     0.353     0
84    0.24      0.76      0         0.707     0.293     0         0.433     0.567     0         0.647     0.353     0         0.293     0.707     0
85    0.707     0.293     0         0.433     0.567     0         0.647     0.353     0         0.293     0.707     0         0         0.9       0.1
86    0.433     0.567     0         0.647     0.353     0         0.293     0.707     0         0         0.9       0.1       0.307     0.693     0
87    0.647     0.353     0         0.293     0.707     0         0         0.9       0.1       0.307     0.693     0         0.98      0.02      0
88    0.293     0.707     0         0         0.9       0.1       0.307     0.693     0         0.98      0.02      0         0         0.76      0.24
89    0         0.9       0.1       0.307     0.693     0         0.98      0.02      0         0         0.76      0.24      0         1         0
90    0.307     0.693     0         0.98      0.02      0         0         0.76      0.24      0         1         0         0         0.807     0.193
91    0.98      0.02      0         0         0.76      0.24      0         1         0         0         0.807     0.193     0         0.533     0.467
92    0         0.76      0.24      0         1         0         0         0.807     0.193     0         0.533     0.467     0         0.207     0.793
93    0         1         0         0         0.807     0.193     0         0.533     0.467     0         0.207     0.793     0.133     0.867     0
94    0         0.807     0.193     0         0.533     0.467     0         0.207     0.793     0.133     0.867     0         0.173     0.827     0
95    0         0.533     0.467     0         0.207     0.793     0.133     0.867     0         0.173     0.827     0         0.22      0.78      0
96    0         0.207     0.793     0.133     0.867     0         0.173     0.827     0         0.22      0.78      0         0.413     0.587     0
97    0.133     0.867     0         0.173     0.827     0         0.22      0.78      0         0.413     0.587     0         0.493     0.507     0
98    0.173     0.827     0         0.22      0.78      0         0.413     0.587     0         0.493     0.507     0         0.807     0.193     0
99    0.22      0.78      0         0.413     0.587     0         0.493     0.507     0         0.807     0.193     0         1         0         0
100   0.413     0.587     0         0.493     0.507     0         0.807     0.193     0         1         0         0         0.413     0.587     0
101   0.493     0.507     0         0.807     0.193     0         1         0         0         0.413     0.587     0         0.54      0.46      0
102   0.807     0.193     0         1         0         0         0.413     0.587     0         0.54      0.46      0         0         1         0
103   1         0         0         0.413     0.587     0         0.54      0.46      0         0         1         0         0.033     0.967     0
104   0.413     0.587     0         0.54      0.46      0         0         1         0         0.033     0.967     0         0.087     0.913     0
105   0.54      0.46      0         0         1         0         0.033     0.967     0         0.087     0.913     0         1         0         0
106   0         1         0         0.033     0.967     0         0.087     0.913     0         1         0         0         0.147     0.853     0
107   0.033     0.967     0         0.087     0.913     0         1         0         0         0.147     0.853     0         0.473     0.527     0
108   0.087     0.913     0         1         0         0         0.147     0.853     0         0.473     0.527     0         0.253     0.747     0
109   1         0         0         0.147     0.853     0         0.473     0.527     0         0.253     0.747     0         0.813     0.187     0
110   0.147     0.853     0         0.473     0.527     0         0.253     0.747     0         0.813     0.187     0         0.64      0.36      0
111   0.473     0.527     0         0.253     0.747     0         0.813     0.187     0         0.64      0.36      0         0.667     0.333     0
112   0.253     0.747     0         0.813     0.187     0         0.64      0.36      0         0.667     0.333     0         0.387     0.613     0
113   0.813     0.187     0         0.64      0.36      0         0.667     0.333     0         0.387     0.613     0         0.68      0.32      0
114   0.64      0.36      0         0.667     0.333     0         0.387     0.613     0         0.68      0.32      0         0.527     0.473     0
115   0.667     0.333     0         0.387     0.613     0         0.68      0.32      0         0.527     0.473     0         0.46      0.54      0
116   0.387     0.613     0         0.68      0.32      0         0.527     0.473     0         0.46      0.54      0         0.58      0.42      0
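The layout of Table 9 can be illustrated with a short sketch: each row Sp holds five consecutive data points (A1 to A5), each described by its (Low, Middle, High) membership values, and the Count row is the column-wise sum of those values. The function names below are hypothetical, and the nine input triples are only the last data points read off the table, so the resulting sums cover five rows rather than the full 116.

```python
from typing import Dict, List, Tuple

Triple = Tuple[float, float, float]  # (Low, Middle, High) membership values

REGIONS = ("Low", "Middle", "High")


def sliding_windows(points: List[Triple], size: int = 5) -> List[List[Triple]]:
    """Return every run of `size` consecutive data points (one table row each)."""
    return [points[i:i + size] for i in range(len(points) - size + 1)]


def fuzzy_counts(windows: List[List[Triple]]) -> Dict[str, float]:
    """Sum each membership column over all rows, as in the Count row."""
    counts: Dict[str, float] = {}
    for window in windows:
        for pos, triple in enumerate(window, start=1):
            for region, value in zip(REGIONS, triple):
                key = f"A{pos}.{region}"
                counts[key] = counts.get(key, 0.0) + value
    return counts


# Membership triples for data points 112 to 120 (last rows of Table 9).
points = [
    (0.253, 0.747, 0.0), (0.813, 0.187, 0.0), (0.64, 0.36, 0.0),
    (0.667, 0.333, 0.0), (0.387, 0.613, 0.0), (0.68, 0.32, 0.0),
    (0.527, 0.473, 0.0), (0.46, 0.54, 0.0), (0.58, 0.42, 0.0),
]
windows = sliding_windows(points)  # five rows, Sp = 112 .. 116
counts = fuzzy_counts(windows)
```

Because every data point distributes a total membership of 1 over the three regions, the three counts for any position Ak add up to the number of rows, which is a quick consistency check on a reconstructed table.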

for analysing the change activity of OSS projects. Subject systems were selected from public repositories, but the selection is biased towards projects with valid git repositories.

External validity concerns the generalization of the findings. In the future, we would like to provide more generalized results by considering a higher number of OSS projects.

Reliability validity concerns the possibility of replication of the study. The subject systems are available in the public domain. We have attempted to put all the necessary details of the experiment process in the paper.

8. Conclusion and Future Work

The commit activity data available in the development repository of open source software can be used to analyse the evolution of OSS projects, as each commit is directly related to a development activity such as code deletion, addition, or modification, comments, and file addition. In this study, the Eclipse CDT commit data is analysed to find the regularity and trend in the commit data. The fuzzy data mining algorithm for time series data is used to generate association rules from the dataset. The dataset is divided into two subsets (a training dataset and the remaining dataset) to evaluate the pattern of evolution of Eclipse CDT.
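As an illustration of the first step of such an analysis, the following minimal sketch maps a monthly commit count to Low, Middle, and High membership values using overlapping linear membership functions. The breakpoints `LOW_PEAK`, `MIDDLE_PEAK`, and `HIGH_PEAK` and the function name `fuzzify` are hypothetical choices made for this example, not the values used in the study.

```python
from typing import Tuple

# Hypothetical breakpoints in commits per month (assumed for illustration only).
LOW_PEAK, MIDDLE_PEAK, HIGH_PEAK = 50.0, 200.0, 350.0


def fuzzify(commits: float) -> Tuple[float, float, float]:
    """Map a monthly commit count to (Low, Middle, High) membership values."""
    if commits <= LOW_PEAK:
        return (1.0, 0.0, 0.0)  # fully Low
    if commits >= HIGH_PEAK:
        return (0.0, 0.0, 1.0)  # fully High
    if commits <= MIDDLE_PEAK:
        # Between the Low and Middle peaks: membership shifts linearly to Middle.
        middle = (commits - LOW_PEAK) / (MIDDLE_PEAK - LOW_PEAK)
        return (1.0 - middle, middle, 0.0)
    # Between the Middle and High peaks: membership shifts linearly to High.
    high = (commits - MIDDLE_PEAK) / (HIGH_PEAK - MIDDLE_PEAK)
    return (0.0, 1.0 - high, high)
```

With these assumed breakpoints, a month with 125 commits is half Low and half Middle. The three memberships always sum to 1, the same pattern seen in the rows of Table 9.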

After applying and validating the association rules generated from the training dataset, it is found that the rates at which commits are performed in the training dataset and in the remaining dataset are different. This is verified again by generating association rules from the remaining dataset. The rules generated from the remaining dataset indicate that its data lies more towards the lower range of commits performed. This validates the applicability of the rules generated from the training dataset.
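How a fuzzy association rule can be scored against a set of subsequences is sketched below. The sketch assumes the minimum operator for fuzzy intersection, which is common in fuzzy association rule mining; the exact procedure used in the study may differ. The function names and the toy rows are hypothetical and are not the Eclipse CDT data.

```python
from typing import Dict, List

Row = Dict[str, float]  # fuzzy item name -> membership value in one subsequence


def fuzzy_count(rows: List[Row], items: List[str]) -> float:
    """Sum over rows of the minimum membership among `items` (fuzzy intersection)."""
    return sum(min(row[item] for item in items) for row in rows)


def rule_confidence(rows: List[Row], antecedent: List[str],
                    consequent: List[str]) -> float:
    """Confidence of antecedent -> consequent: count(both) / count(antecedent)."""
    base = fuzzy_count(rows, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return fuzzy_count(rows, antecedent + consequent) / base


# Toy rows with illustrative values only.
rows = [
    {"A1.Middle": 0.5, "A2.Middle": 0.75},
    {"A1.Middle": 1.0, "A2.Middle": 0.25},
    {"A1.Middle": 0.5, "A2.Middle": 0.5},
]
support = fuzzy_count(rows, ["A1.Middle", "A2.Middle"]) / len(rows)
confidence = rule_confidence(rows, ["A1.Middle"], ["A2.Middle"])
```

Scoring the same rule on the training rows and on the remaining rows gives two confidence values. A marked difference between them is one way to express that the two subsets follow different commit patterns.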

These association rules indicate that the overall commits in Eclipse CDT lie in the middle range, except for the variation found near the end, where there is a high probability that commits lie in the lower range. The continuous availability of commits in the Eclipse CDT repository illustrates that the development of Eclipse CDT is active, with most of the commits per month lying in the middle range and, towards the end, near the lower range. In the future, we want to combine a prediction algorithm with the fuzzy data mining algorithm for time series data to predict the number of commits to be performed in a particular month, although various other factors on which the number of commits depends also need to be considered.

Appendix

See Tables 8 and 9.

Competing Interests

The authors declare that they have no competing interests.

References

[1] M. W. Godfrey and Q. Tu, "Evolution in open source software: a case study," in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '00), pp. 131–142, October 2000.

[2] H. Zhang and S. Kim, "Monitoring software quality evolution for defects," IEEE Software, vol. 27, no. 4, pp. 58–64, 2010.

[3] Y. Fang and D. Neufeld, "Understanding sustained participation in open source software projects," Journal of Management Information Systems, vol. 25, no. 4, pp. 9–50, 2008.

[4] "GIT," July 2015, http://git-scm.com/.

[5] R. Sen, S. S. Singh, and S. Borle, "Open source software success: measures and analysis," Decision Support Systems, vol. 52, no. 2, pp. 364–372, 2012.

[6] R. McCleary, R. A. Hay, E. E. Meidinger, and D. McDowall, Applied Time Series Analysis for the Social Sciences, Sage, Beverly Hills, Calif, USA, 1980.

[7] R. Grewal, G. L. Lilien, and G. Mallapragada, "Location, location, location: how network embeddedness affects project success in open source systems," Management Science, vol. 52, no. 7, pp. 1043–1056, 2006.

[8] I. Herraiz, J. M. Gonzalez-Barahona, G. Robles, and D. M. German, "On the prediction of the evolution of libre software projects," in Proceedings of the 23rd International Conference on Software Maintenance (ICSM '07), pp. 405–414, Paris, France, October 2007.

[9] C. F. Kemerer and S. Slaughter, "An empirical approach to studying software evolution," IEEE Transactions on Software Engineering, vol. 25, no. 4, pp. 493–509, 1999.

[10] A. Mockus and L. G. Votta, "Identifying reasons for software changes using historic databases," in Proceedings of the International Conference on Software Maintenance (ICSM '00), pp. 120–130, IEEE, San Jose, Calif, USA, 2000.

[11] C.-H. Chen, T.-P. Hong, and V. S. Tseng, "Fuzzy data mining for time-series data," Applied Soft Computing Journal, vol. 12, no. 1, pp. 536–542, 2012.

[12] C. C. H. Yuen, "An empirical approach to the study of errors in large software under maintenance," in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '85), pp. 96–105, 1985.

[13] C. C. H. Yuen, "A statistical rationale for evolution dynamics concepts," in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '87), pp. 156–164, September 1987.

[14] C. C. H. Yuen, "On analytic maintenance process data at the global and the detailed levels: a case study," in Proceedings of the International Conference on Software Maintenance (ICSM '88), pp. 248–255, IEEE, Scottsdale, Ariz, USA, 1988.

[15] M. Goulao, N. Fonte, M. Wermelinger, and F. B. Abreu, "Software evolution prediction using seasonal time analysis: a comparative study," in Proceedings of the 16th European Conference on Software Maintenance and Reengineering (CSMR '12), pp. 213–222, IEEE, Szeged, Hungary, March 2012.

[16] B. Kenmei, G. Antoniol, and M. Di Penta, "Trend analysis and issue prediction in large-scale open source systems," in Proceedings of the 12th IEEE European Conference on Software Maintenance and Reengineering (CSMR '08), pp. 73–82, Athens, Greece, April 2008.

[17] F. Caprio, G. Casazza, U. Penta, M. D. Penta, and U. Villano, "Measuring and predicting the Linux kernel evolution," in Proceedings of the International Workshop of Empirical Studies on Software Maintenance, Florence, Italy, November 2001.

[18] E. Fuentetaja and D. J. Bagert, "Software evolution from a time-series perspective," in Proceedings of the IEEE International Conference on Software Maintenance, pp. 226–229, October 2002.

[19] M. Klas, F. Elberzhager, J. Munch, K. Hartjes, and O. Von Graevemeyer, "Transparent combination of expert and measurement data for defect prediction: an industrial case study," in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE '10), pp. 119–128, Cape Town, South Africa, May 2010.

[20] U. Raja, D. P. Hale, and J. E. Hale, "Modeling software evolution defects: a time series approach," Journal of Software Maintenance and Evolution: Research and Practice, vol. 21, no. 1, pp. 49–71, 2009.

[21] G. Antoniol, G. Casazza, M. D. Penta, and E. Merlo, "Modeling clones evolution through time series," in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '01), pp. 273–280, IEEE Computer Society, Florence, Italy, November 2001.

[22] L. Yu, "Indirectly predicting the maintenance effort of open-source software," Journal of Software Maintenance and Evolution, vol. 18, no. 5, pp. 311–332, 2006.

[23] A. Amin, L. Grunske, and A. Colman, "An approach to software reliability prediction based on time series modeling," Journal of Systems and Software, vol. 86, no. 7, pp. 1923–1932, 2013.

[24] Forecasting Model, http://people.duke.edu/~rnau/411fcst.htm.

[25] T.-P. Hong, C.-S. Kuo, and C.-C. Chi, "A fuzzy data mining algorithm for quantitative values," in Proceedings of the 3rd International Conference on Knowledge-Based Intelligent Information Engineering Systems (KES '99), pp. 480–483, September 1999.

[26] H. Suresh and K. Raimond, "Mining association rules from time series data using hybrid approaches," Proceedings of International Journal of Computational Engineering Research, vol. 3, no. 3, pp. 181–189, 2013.

[27] "GIT HUB," July 2015, https://github.com/.

[28] "Eclipse," "Where did Eclipse come from?" Eclipse Wiki.

[29] "Eclipse," 2015, https://eclipse.org/cdt/.

[30] "Eclipse," July 2015, http://www.eclipse.org/cdt/doc/overview/#/3.

[31] L. A. Zadeh, "Fuzzy sets," Information and Computation, vol. 8, no. 3, pp. 338–353, 1965.

[32] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufman, Academic Press, 2001.

[33] R. Agrawal, "Mining association rules between sets of items in large databases," in Proceedings of the ACM SIGMOD Conference, pp. 207–216, Washington, DC, USA, May 1993.




12 Advances in Fuzzy Systems

for analysing the change activity of OSS projects Subjectsystems were selected from public repositories but selectionis biased towards projects with valid git repositories

External validity concerns the generalization of the find-ings In the future we would like to providemore generalizedresults by considering higher number of OSS projects

Reliability validity concerns the possibility of replicationof the study The subject systems are available in the publicdomain We have attempted to put all the necessary details ofthe experiment process in the paper

8 Conclusion and Future Work

The commit activity data available in the development repos-itory of open source software can be used to analyse theevolution of OSS projects as each commit is directly relatedto the development activity such as code deletion additionmodification comments and file addition In this studythe Eclipse CDT commits data is analysed to find theregularity and trend in the commit data The fuzzy datamining algorithm for time series data is used to generate theassociation rules from the dataset The dataset is divided intotwo subsets (training and remaining dataset) to evaluate thepattern of evolution of the Eclipse CDT

After applying and validating the generated associationrule from training dataset it is found that the rates at whichcommits performed in training dataset and remaining datasetare different This thing is again verified by generating theassociation rules from the remaining set The generated rulesfrom remaining dataset specify that the data in the consideredremaining dataset is more towards lower range of commitsperformed This thing validates the applicability of the rulesgenerated from the training dataset

These association rules indicate that the overall commits in Eclipse CDT lie in the middle range, except for the variation found near the end, where there is a high probability that commits lie in the lower range. The continuous availability and existence of commits in the Eclipse CDT repository illustrate that the development or evolution of Eclipse CDT is active, with most of the commits per month lying in the middle range and, near the end, close to the lower range. In the future, we want to combine a prediction algorithm with the fuzzy data mining algorithm for time series data to predict the number of commits to be performed in a particular month, although there are various other factors on which the number of commits depends that need to be considered.

Appendix

See Tables 8 and 9.

Competing Interests

The authors declare that they have no competing interests.

References

[1] MW Godfrey and Q Tu ldquoEvolution in open source software acase studyrdquo in Proceedings of the IEEE Interantional Conferenceon SoftwareMaintenance (ICMS rsquo00) pp 131ndash142October 2000

[2] H Zhang and S Kim ldquoMonitoring software quality evolutionfor defectsrdquo IEEE Software vol 27 no 4 pp 58ndash64 2010

[3] Y Fang andDNeufeld ldquoUnderstanding sustained participationin open source software projectsrdquo Journal of ManagementInformation Systems vol 25 no 4 pp 9ndash50 2008

[4] ldquoGITrdquo July 2015 httpgit-scmcom[5] R Sen S S Singh and S Borle ldquoOpen source software success

measures and analysisrdquo Decision Support Systems vol 52 no 2pp 364ndash372 2012

[6] R McCleary R A Hay E E Meidinger and D McDowallApplied Time Series Analysis for the Social Sciences Sage BeverlyHills Calif USA 1980

[7] R Grewal G L Lilien and G Mallapragada ldquoLocation loca-tion location how network embeddedness affects project suc-cess in open source systemsrdquo Management Science vol 52 no7 pp 1043ndash1056 2006

[8] I Herraiz J M Gonzalez-Barahona G Robles and D MGerman ldquoOn the prediction of the evolution of libre softwareprojectsrdquo in Proceedings of the 23rd International Conference onSoftware Maintenance (ICSM rsquo07) pp 405ndash414 Paris FranceOctober 2007

[9] C F Kemerer and S Slaughter ldquoAn empirical approach tostudying software evolutionrdquo IEEE Transactions on SoftwareEngineering vol 25 no 4 pp 493ndash509 1999

[10] A Mockus and L G Votta ldquoIdentifying reasons for softwareChanges using historic databasesrdquo in Proceedings of the Inter-national Conference on Software Maintenance (ICSM rsquo00) pp120ndash130 IEEE San Jose Calif USA 2000

[11] C-H Chen T-P Hong and V S Tseng ldquoFuzzy datamining fortime-series datardquo Applied Soft Computing Journal vol 12 no 1pp 536ndash542 2012

[12] C C H Yuen ldquoAn empirical approach to the study of errors inlarge software under maintenancerdquo in Proceedings of the IEEEInternational Conference on Software Maintenance (ICSM rsquo85)pp 96ndash105 1985

[13] C C H Yuen ldquoA statistical rationale for evolution dynamicsconceptsrdquo in Proceedings of the IEEE International Conferenceon Software Maintenance (ICSM rsquo87) pp 156ndash164 September1987

[14] C C H Yuen ldquoOn analytic maintenance process data at theglobal and the detailed levels a case studyrdquo in Proceedings of theInternational Conference on Software Maintenance (ICSM rsquo88)pp 248ndash255 IEEE Scottsdale Ariz USA 1988

[15] M Goulao N Fonte M Wermelinger and F B AbreuldquoSoftware evolution prediction using seasonal time analysisa comparative studyrdquo in Proceedings of the 16th EuropeanConference on Software Maintenance and Reengineering (CSMRrsquo12) pp 213ndash222 IEEE Szeged Hungary March 2012

[16] B Kenmei G Antoniol and M Di Penta ldquoTrend analysisand issue prediction in large-scale open source systemsrdquo inProceedings of the 12th IEEE European Conference on SoftwareMaintenance and Reengineering (CSMR rsquo08) pp 73ndash82 AthensGreece April 2008

[17] F Caprio G Casazza U Penta M D Penta and U VillanoldquoMeasuring and predicting the Linux kernel evolutionrdquo inProceedings of the International Workshop of Empirical Studieson Software Maintenance Florence Italy November 2001

[18] E Fuentetaja and D J Bagert ldquoSoftware evolution from a time-series perspectiverdquo in Proceedings of the IEEE InternationalConference on Software Maintenance pp 226ndash229 October2002

Advances in Fuzzy Systems 13

[19] M Klas F Elberzhager J Munch K Hartjes and O VonGraevemeyer ldquoTransparent combination of expert and mea-surement data for defect predictionmdashAn Industrial Case Studyrdquoin Proceedings of the 32nd ACMIEEE International Conferenceon Software Engineering (ICSE rsquo10) pp 119ndash128 Cape TownSouth Africa May 2010

[20] U Raja D P Hale and J E Hale ldquoModeling software evolutiondefects a time series approachrdquo Journal of Software Mainte-nance and Evolution Research and Practice vol 21 no 1 pp49ndash71 2009

[21] G Antoniol G Casazza M D Penta and E Merlo ldquoModelingclones evolution through time seriesrdquo in Proceedings of the IEEEInternational Conference on Software Maintenance (ICSM rsquo01)pp 273ndash280 IEEEComputer Society Florence Italy November2001

[22] L Yu ldquoIndirectly predicting the maintenance effort of open-source softwarerdquo Journal of Software Maintenance and Evolu-tion vol 18 no 5 pp 311ndash332 2006

[23] A Amin L Grunske and A Colman ldquoAn approach to softwarereliability prediction based on time series modelingrdquo Journal ofSystems and Software vol 86 no 7 pp 1923ndash1932 2013

[24] Forecasting Model httppeopledukeedusimrnau411fcsthtm[25] T-P Hong C-S Kuo and C-C Chi ldquoA fuzzy data mining

algorithm for quantitative valuesrdquo inProceedings of the 3rd Inter-national Conference on Knowledge-Based Intelligent InformationEngineering Systems (KES rsquo99) pp 480ndash483 September 1999

[26] H Suresh andK Raimond ldquoMining association rules from timeseries data using hybrid approachesrdquo Proceedings of Interna-tional Journal of Computational Engineering Research vol 3 no3 pp 181ndash189 2013

[27] ldquoGIT HUBrdquo July 2015 httpsgithubcom[28] ldquoEclipserdquo ldquoWhere did Eclipse come fromrdquo Eclipse Wiki[29] ldquoEclipserdquo 2015 httpseclipseorgcdt[30] ldquoEclipserdquo July 2015 httpwwweclipseorgcdtdocoverview

3[31] L A Zadeh ldquoFuzzy setsrdquo Information and Computation vol 8

no 3 pp 338ndash353 1965[32] J Han and M Kamber Data Mining Concepts and Techniques

Morgan Kaufman Academic Press 2001[33] R Agrawal ldquoMining association rules between sets of items in

large databasesrdquo in Proceedings of the ACM SIGMOD Confer-ence pp 207ndash216 Washington DC USA May 1993
