
Using Structured Text Source Code Metrics and Artificial Neural Networks to Predict Change Proneness at Code Tab and Program Organization Level

Lov Kumar
[email protected]
ABB Corporate Research Center, Bangalore, India

Ashish Sureka
[email protected]
ABB Corporate Research Center, Bangalore, India

ABSTRACT

Structured Text (ST) is a high-level text-based programming language which is part of the IEC 61131-3 standard. ST is widely used in the domain of industrial automation engineering to create Programmable Logic Controller (PLC) programs. ST is a Domain Specific Language (DSL) specialized to the Automation Engineering (AE) application domain. ST has specialized features and programming constructs which differ from those of general purpose programming languages. We define 10 source code metrics, develop a tool to compute them, and study their correlation with each other at the Code Tab (CT) and Program Organization Unit (POU) level for two real-world industrial projects at a leading automation engineering company. We study the correlation between the 10 ST source code metrics and their relationship with change proneness at the CT and POU level by creating an experimental dataset consisting of different versions of the systems. We build predictive models using Artificial Neural Network (ANN) based techniques to predict the change proneness of the software. We conduct a series of experiments using various training algorithms and measure the performance of our approach using the accuracy and F-measure metrics. We also apply two feature selection techniques to select optimal features, aiming to improve the overall accuracy of the classifier.

Keywords

Artificial Neural Networks (ANN), Change Proneness Prediction, Machine Learning Applications in Software Engineering, Programmable Logic Controller (PLC) Applications, Source Code Analysis, Source Code Metrics, Structured Text (ST)

1. RESEARCH MOTIVATION AND AIM

The IEC (International Electrotechnical Commission) 61131-3 international standard provides guidelines for PLC (Programmable Logic Controller) programming and is accepted as a standard by PLC manufacturers [3][10].


The IEC 61131-3 standard defines five languages: Ladder Diagram (LD), Function Block Diagram (FBD), Sequential Function Chart (SFC), Instruction List (IL) and Structured Text (ST). LD, FBD and SFC are graphical languages, whereas IL and ST are text-based languages. ST is a powerful high-level language which is easy to use, is widely accepted by PLC manufacturers, and is probably the most widely used controller programming language in the industrial automation engineering domain [3][10]. ST is a domain specific language, and several of its language features and programming constructs differ from those of general purpose programming languages.

Several characteristics of ST are different from those of general purpose programming languages because the primary purpose of a PLC is to control an industrial process [13]. A PLC has several digitized analogue signals representing the inputs and outputs which are controlled by the PLC. There are several fundamental differences between ST and general purpose programming languages. For example, ST has the concept of a function block for which there is no precise equivalent in C#, Java or Python. Another difference is that in ST, a function can return more than one output value without the values being encapsulated in a structure or array. Recursion is common in general purpose programming languages, whereas the notion of recursion is absent in ST [13]. Hence the source code metrics for computing the structural complexity of PLC applications and control systems developed using ST are different from those of non-industrial-automation applications developed using general purpose programming languages.

There has been a lot of work in the area of predicting the change proneness of software using source code metrics for general purpose programming languages [4][7][12]. However, the area of change proneness prediction using source code metrics for control systems in the automation engineering domain is unexplored, due to the lack of availability of open-source automation engineering projects and the lack of research studies from industry on commercial data. The work presented in this paper is motivated by the need to contribute to the body of knowledge on empirical software engineering and measurement based research studies on change proneness prediction in automation engineering for domain specific programming languages like ST. Our specific research aim is to conduct experiments on two real-world, large, complex, mature, active and diverse projects at an automation engineering company. Our aim is to investigate the relationships among the 10 source code metrics and


their relationship with change-proneness at two levels of granularity (Code Tab and Program Organization Unit), and to build change-proneness prediction models based on Artificial Neural Networks (ANN). Our aim is also to examine the effectiveness of PCA based and rough set analysis based feature selection techniques, as well as of different ANN training algorithms, on the predictive accuracy of the statistical models.

2. RELATED WORK AND RESEARCH CONTRIBUTIONS

In this section, we present work closely related to our research and list our novel research contributions in the context of existing work. We organize the closely related work into two lines of research: (1) source code analysis for PLC programming languages, and (2) change-proneness prediction using source code metrics for general purpose languages.

Code Analysis for PLC Languages: Kumar et al. present source code level metrics to measure the size, vocabulary, cognitive complexity and testing complexity of Ladder Diagram (LD), which is a visual PLC programming language [5]. Nair et al. present a methodology to define metrics for IEC 61131-3 domain specific languages. Using their proposed methodology, they define a set of product metrics that can be used for managing software project development using PLC languages [9]. Prahofer et al. mention that static code analysis tools are rare in the domain of PLC programming, and they present an approach and tool support for static code analysis of PLC programs [11].

Code Metrics to Predict Change-Proneness: Lu et al. conduct experiments on 102 Java systems to investigate the ability of 62 Object-Oriented metrics to predict change-proneness [7]. They conclude that size metrics exhibit moderate or almost moderate ability in discriminating between change-prone and not change-prone classes [7]. Koru et al. conduct experiments on two open-source projects (KOffice and Mozilla) and identify and characterize the change-prone classes in these two products by producing tree-based models [4]. Romano et al. investigate the extent to which existing source code metrics can be used for predicting change-prone Java interfaces [12].

Research Contributions: While there have been several studies on change-proneness prediction based on source code metrics for software systems developed using general purpose languages, the research presented in this paper is the first study on change-proneness prediction in the domain of automation engineering for PLC programming languages. While there have been several empirical studies on open-source software, there is a lack of case studies on industrial closed-source software. We conduct experiments on two real-world, large and complex software systems developed and maintained at an industrial automation engineering company. We define and implement 10 source code metrics for Structured Text, covering size, vocabulary, program length, cognitive complexity and testing complexity. We conduct a series of experiments to compute the proposed 10 source code metrics, investigate the relationship between the source code metrics and change-proneness, examine the impact of PCA and rough set analysis based feature extraction and selection techniques, and apply artificial neural networks with three different training algorithms to build statistical models for predicting change proneness.

3. EXPERIMENTAL DATASET

In the Structured Text PLC programming language, Program Organization Units (POUs) are software units within an application. POUs are independent units or building blocks of the programming system. POUs provide data encapsulation (the scope of a variable is limited to the unit), and data exchange between POUs is done using interfaces. A POU within the programming system can contain multiple Code Tabs (input and output variable declarations and control logic) [3][10]. Table 1 shows the experimental dataset used in our study. We take two versions of two real-world projects in our organization. As shown in Table 1, Version 1 of Project 1 consists of 214 Code Tabs and Version 2 consists of 240 Code Tabs. For Project 1, the number of common Code Tabs between the two versions is 158. POUs are basic code containers, and a program is structured in one or more POUs.

Table 1 shows the number of POUs for both versions of the two projects. POUs can be further classified into three types: Functions (FUN), Function Blocks (FB) and Programs (PROG) [8]. However, differentiating between the types of POUs is not required for the study presented in this paper. We conduct experiments on datasets belonging to two projects of different sizes to test the generalizability of our approach. We chose the two projects (the context or experimental dataset of our study) such that their sizes and domains are different. All the source code for both projects was implemented in our organization, which embodies industrial practice and is representative of an industrial setting in the automation engineering domain.

Table 1: Experimental Dataset [CT: Number of Code Tabs, POU: Number of Program Organization Units]

            Version 1      Version 2      Common
            CT     POU     CT     POU     CT     POU
Project 1   214    82      240    82      158    56
Project 2   293    104     344    104     209    71

4. RESEARCH METHODOLOGY AND FRAMEWORK

Figure 1 shows our research methodology, which consists of several steps. The first step in the process is to compute the proposed source code metrics at the Code Tab and POU level. The proposed metrics serve as the predictor or independent variables. We apply a data-driven fuzzy clustering based algorithm to annotate the data into three categories: high, medium and low change-proneness. The annotated data serves as ground truth for conducting machine learning experiments. As shown in Figure 1, we conduct experiments with three different sets of source code metrics. One set consists of all 10 metrics. Another set results from applying Principal Component Analysis (PCA) as a pre-processing step for dimensionality reduction. PCA reduces dimensionality by transforming the original variables into a smaller set of principal components that preserve as much of the information present in the original variables as possible.


Figure 1: Research Methodology and Framework

We also apply a second technique, based on rough set theory, for feature selection. We use rough set analysis to remove features with little or no effect on the dependent variable. As shown in Figure 1, we apply an Artificial Neural Network (ANN) with three different training algorithms, i.e., the Gradient Descent method (GD), the Quasi-Newton method (NM), and Levenberg-Marquardt (LM), to estimate the change-proneness of a Structured Text PLC program at the Code Tab and POU level. We apply the supervised learning paradigm using ANNs as they have been shown to perform well on nonlinear statistical modeling problems and can detect complex nonlinear relationships between the dependent variable (change proneness) and the independent variables (source code metrics). We use 10-fold cross validation to create different partitions of training and testing data and to generalize the results of our analysis. We normalize the attributes and scale the data matrix to the range 0 to 1. The normalization is done to standardize the data before fitting the machine learning estimators (data pre-processing). The last step in the process consists of computing the predictive accuracy of the various models using confusion matrices and then applying t-test analysis to identify the most accurate model.
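
To make the pre-processing step concrete, the following sketch shows min-max normalization combined with 10-fold cross validation, roughly as described above. It is an illustrative Python sketch using scikit-learn, not the authors' actual tooling, and the names X (metric matrix) and y (change-proneness labels) are assumed placeholders.

```python
# Illustrative sketch of the pre-processing and 10-fold cross validation step.
# `X` (one row of metric values per Code Tab/POU) and `y` (change-proneness
# labels) are hypothetical placeholders for the experimental data.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import MinMaxScaler

def cross_validate(model, X, y, n_splits=10):
    """Scale each fold to the 0..1 range and return the mean test accuracy."""
    scores = []
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, test_idx in folds.split(X, y):
        scaler = MinMaxScaler()                      # normalize attributes to [0, 1]
        X_train = scaler.fit_transform(X[train_idx])
        X_test = scaler.transform(X[test_idx])       # reuse training-fold scaling
        model.fit(X_train, y[train_idx])
        scores.append(model.score(X_test, y[test_idx]))
    return float(np.mean(scores))
```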

5. PROPOSED SOURCE CODE METRICS

We propose, define and implement 10 source code metrics. Except for Cognitive Complexity (CC) and Testing Complexity (TC), all metrics are the same at the POU and Code Tab level.

Size: We define size as the number of lines of code (LOC), excluding comments, which is equal to the number of executable statements in the Structured Text program.

Vocabulary: We define vocabulary as the total number of distinct operators and operands used in a given program.

%CMT: This metric measures the percentage of comments in the source code (% of comments relative to LOC).

Program Length: We define program length as the total number of operators and operands used in a given ST program.

Calculated Program Length (Cproglen): We define the Cproglen of an ST program using the following equation:

$Cproglen = \eta_1 \log \eta_1 + \eta_2 \log \eta_2$   (1)

where $\eta_1$ and $\eta_2$ represent the number of distinct operators and operands respectively.

Volume: We define the Volume of an ST program using the following equation:

$Volume = (N_1 + N_2) \log(\eta_1 + \eta_2)$   (2)

where $N_1$ and $N_2$ represent the total number of operators and operands respectively in a given program.

Difficulty: We define the Difficulty of an ST program using the following equation:

$Difficulty = \frac{\eta_1}{2} \times \frac{N_2}{\eta_2}$   (3)

Effort: The Effort of a Structured Text program is computed as the product of Volume and Difficulty.
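
As an illustration of how the metrics above can be computed from operator and operand counts, the following Python sketch implements Equations 1-3 and the Effort metric. It assumes the counts have already been extracted from the ST source by a parser that is not shown, and it uses base-2 logarithms (the classical Halstead convention); the paper does not state the logarithm base.

```python
# Halstead-style metrics from operator/operand counts (Equations 1-3 and Effort).
# Assumes n1, n2 >= 1; the counts come from an ST parser that is not shown here.
import math

def halstead_metrics(n1, n2, N1, N2):
    """n1/n2: distinct operators/operands; N1/N2: total operators/operands."""
    vocabulary = n1 + n2
    program_length = N1 + N2
    cproglen = n1 * math.log2(n1) + n2 * math.log2(n2)   # Equation 1
    volume = program_length * math.log2(vocabulary)      # Equation 2
    difficulty = (n1 / 2.0) * (N2 / n2)                  # Equation 3
    effort = volume * difficulty                         # Effort = Volume * Difficulty
    return {"Vocabulary": vocabulary, "ProgLen": program_length,
            "Cproglen": cproglen, "Volume": volume,
            "Difficulty": difficulty, "Effort": effort}
```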

Cognitive Complexity: We define the Cognitive Complexity (CC) of a given POU as a metric that measures how easy or difficult it is to understand and comprehend the POU. If there are n Code Tabs, m attributes, and the POU is derived from o POUs, then the cognitive complexity of the POU is calculated using the following equation:

$CC_{POU} = (\eta_1 + \eta_2) \times \left( \sum_{i=1}^{n} CW_{CT_i} + \sum_{j=1}^{m} CW_{AT_j} + \sum_{k=1}^{o} CW_{IH_k} \right)$   (4)

where $\eta_1$ and $\eta_2$ are the number of distinct operators and operands in the POU, and $CW_{CT}$, $CW_{AT}$, and $CW_{IH}$ represent the cognitive weights of the Code Tab, Attribute, and Inheritance Level respectively.

Code Tab Cognitive Weight ($CW_{CT}$) is used to calculate the complexity of a Code Tab in a POU. It is computed using the following equation:

$CW_{CT} = \sum_{i=1}^{P} W_i$   (5)

where P represents the number of basic control structures and $W_i$ represents the cognitive weight of the i-th basic control structure. The cognitive weights of the different basic control structures are given in the paper by Kumar et al. [5]. A basic control structure (such as an if, for or while statement) may consist of Q layers of nested basic control structures, which increases the cognitive complexity of the program. Hence, to incorporate nested control structures, we extend Equation 5 to Equation 6.

$CW_{CT} = \prod_{j=1}^{Q} \sum_{i=1}^{P} W_{i,j}$   (6)

Attribute Cognitive Weight ($CW_{AT}$) is used to calculate the complexity of the attributes used in the POU. It is calculated using the following equation:

$CW_{AT} = N_{PDT} \times W_{PDT} + N_{UDT} \times W_{UDT}$   (7)

where $N_{PDT}$ and $N_{UDT}$ represent the number of predefined and user defined data types in the POU, and $W_{PDT}$ and $W_{UDT}$ represent the weights of predefined and user defined data types respectively. In this work, we consider $W_{PDT} = 1$ and $W_{UDT} = 2$.

Inheritance Level Cognitive Weight ($CW_{IH}$) is used to calculate the complexity of the inheritance level of a POU. It is calculated using the following equation:

$CW_{IH} = DIT \times NOA \times CL$   (8)

where DIT, NOA, and CL represent the depth of the inheritance tree, the number of attributes used for inheritance, and the cognitive weight of the inheritance level respectively.
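
A minimal sketch of how Equations 4-8 fit together is given below. The input data structures (lists of per-layer weights, counts of data types, inheritance figures) are hypothetical simplifications; the cognitive weights of the basic control structures themselves are taken from [5] and are not reproduced here.

```python
# Cognitive complexity of a POU (Equations 4-8); inputs are simplified stand-ins.
def cw_ct(nested_weights):
    """Equation 6: product over nesting layers of the summed weights per layer."""
    cw = 1
    for layer in nested_weights:          # one list of weights per nesting layer
        cw *= sum(layer)
    return cw

def cw_at(n_predefined, n_user_defined, w_pdt=1, w_udt=2):
    """Equation 7, with W_PDT = 1 and W_UDT = 2 as used in this work."""
    return n_predefined * w_pdt + n_user_defined * w_udt

def cw_ih(dit, noa, cl):
    """Equation 8: inheritance depth * inherited attributes * level weight."""
    return dit * noa * cl

def cc_pou(n1, n2, code_tab_weights, attribute_weights, inheritance_weights):
    """Equation 4: (distinct operators + operands) * summed cognitive weights."""
    return (n1 + n2) * (sum(code_tab_weights) + sum(attribute_weights)
                        + sum(inheritance_weights))
```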

Testing Complexity: The Testing Complexity (TC) of a POU is defined as the number of test cases required to test the program. We define TC as a measure of the number of all possible control flows in the POU. TC at the POU level is defined using the following equation:

$TC_{POU} = \sum_{i=1}^{n} TC_{CT_i} - n$   (9)

where $TC_{POU}$ and $TC_{CT}$ represent the testing complexity of the POU and a Code Tab respectively, and n represents the number of Code Tabs in the given POU.

The testing complexity of a Code Tab, $TC_{CT}$, is computed using the following equation:

$TC_{CT} = E - N + 2P$   (10)

where E represents the number of edges of the control flow graph, N represents the number of nodes of the control flow graph, and P represents the number of connected components.
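
The testing complexity metric can be sketched in the same style; the edge, node and component counts of each Code Tab's control flow graph are assumed to be available as inputs rather than derived from ST source here.

```python
# Testing complexity per Equations 9 and 10; CFG counts are assumed inputs.
def tc_code_tab(edges, nodes, components=1):
    """Equation 10: cyclomatic-style complexity of a single Code Tab."""
    return edges - nodes + 2 * components

def tc_pou(code_tab_complexities):
    """Equation 9: sum of Code Tab complexities minus the number of Code Tabs."""
    n = len(code_tab_complexities)
    return sum(code_tab_complexities) - n
```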

6. CODE METRICS - DESCRIPTIVE STATISTICS AND CORRELATION

Table 2: Descriptive Statistics of the 10 Source Code Metrics of Version 1 of both Projects in the Experimental Dataset

Project 1
                          Code Tab Level                                      POU Level
Metrics        Min.   Max.        Mean       Median    Std Dev.    Min.    Max.         Mean        Median     Std Dev.
Size           1      198         16         6         27.46       1       263          44.23       19         55.4
ProgLen        4      2752        195.18     83        327.74      11      3804         559.74      267        715.4
Vocabulary     4      227         37.27      30        30.47       10      342          91.47       69         70.14
Cproglen       2.77   1163.79     126.03     82.32     148.12      18.02   1826.33      385.63      246.44     374.53
Volume         5.55   14929.46    812.74     290.03    1626.02     25.33   22195.62     2787.9      1088.7     3961.5
Difficulty     1      69.59       12.17      8.42      12.52       3.5     215.44       46.9        29.05      48.34
Effort         5.55   941205.24   24044.21   2402.74   88355.21    101.31  4753333.28   284672.67   26049.2    711924.9
CC             4      66057       1804.26    256       6577.14     340     142272       23674.05    10752      31139.87
TC             1      42          3.49       2         5.31        1       52           7.91        3          10.61
% Comment      0      0.55        0.03       0.01      0.06        0       0.29         0.05        0.03       0.06

Project 2
                          Code Tab Level                                      POU Level
Metrics        Min.   Max.        Mean       Median    Std Dev.    Min.    Max.         Mean        Median     Std Dev.
Size           1      216         23.29      10        35.18       1       426          67.81       27.5       84.54
ProgLen        4      3791        350        114       571.7       11      6600         1041.44     461        1356.68
Vocabulary     4      245         47.66      34        42.96       10      416          121.79      87.5       95.65
Cproglen       2.77   1310.45     178.77     100.88    221.86      18.02   2335.37      557.52      337.8      537.64
Volume         5.55   16265.13    1542.44    395.09    2744.13     25.33   32846.44     5445.73     2064.45    7465.05
Difficulty     1      186.55      17.38      9.39      25.39       3.5     711.95       76.37       31.77      115.96
Effort         5.55   3034260.33  75949.77   3634.67   284794.55   101.31  23385158.95  1129510.9   76361.37   3351393.02
CC             4      104976      4287.78    378       12755.03    440     719766       63289.15    24996.5    105317.94
TC             1      42          3.56       2         5.21        1       92           8.49        3.5        13.84
% Comment      0      0.55        0.04       0.02      0.06        0       0.29         0.05        0.03       0.06


Figure 2: Correlations between the 10 Source Code Metrics (SZ, PL, VC, CPL, VOL, DIF, EFF, CC, TC, PC) at the Code Tab level and the POU level; in each correlation matrix, the upper triangle shows Project 1 and the lower triangle shows Project 2.

We compute the descriptive statistics for the 10 source code metrics for both projects at the Code Tab and POU level. Table 2 displays the minimum, maximum, mean, median and standard deviation for all the metrics across both projects. The descriptive statistics in Table 2 describe and characterize the features of the two versions of the two projects. We observe sufficient variance and dispersion in the majority of the variable values in Table 2 and hence we believe that our experimental dataset provides a good sample or context for our analysis. Table 2 reveals that the cognitive complexity of Project 2 is much higher than the cognitive complexity of Project 1, but the difference between the testing complexity of the two projects is small. Table 2 also reveals a wide variance in the various source code complexity metrics within the same project across Code Tabs and POUs. The variance in the code complexity metrics shows that some modules or units are more complex than others. We notice that the range, computed as the difference between the minimum and maximum values in the distribution of the ten metrics, varies substantially between the two versions of the same project. We use the range as one measure of variability, but the high standard deviation for several metrics also shows a wide distribution around the mean.

We compute the dependency between the 10 code metrics using the Pearson correlation coefficient. The coefficient of correlation (r) measures the strength and direction of the linear relationship between two variables. Figure 2 displays the correlation between all the metrics for both projects at the Code Tab and POU level. A black circle denotes an r value between 0.7 and 1.0, indicating a strong positive linear relationship, and a white circle denotes an r value between 0.3 and 0.7, indicating a weak positive linear relationship. An empty cell indicates no linear relationship. We did not find instances of negative relationships. Figure 2 shows whether there is a correlation between the different metrics; in the subsequent sections we investigate whether there is a cause and effect relationship between the metrics and change proneness.
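
The correlation analysis can be reproduced with a few lines of pandas; the sketch below assumes the metrics are collected in a DataFrame with one row per Code Tab or POU, and the column names follow the abbreviations of Figure 2 rather than the authors' actual tool output.

```python
# Pairwise Pearson correlations between the 10 metrics (cf. Figure 2).
# The DataFrame layout and column names are hypothetical.
import pandas as pd

METRIC_COLUMNS = ["SZ", "PL", "VC", "CPL", "VOL", "DIF", "EFF", "CC", "TC", "PC"]

def correlation_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Return the pairwise Pearson correlation coefficients (r)."""
    return df[METRIC_COLUMNS].corr(method="pearson")

def strong_pairs(corr: pd.DataFrame, threshold=0.7):
    """Metric pairs with a strong positive linear relationship (r >= threshold)."""
    return [(a, b, corr.loc[a, b])
            for a in corr.columns for b in corr.columns
            if a < b and corr.loc[a, b] >= threshold]
```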

7. DATA ANNOTATION USING FUZZY CLUSTERING

Table 3: Fuzzy Cluster Centers [Lines Changed]

             Project 1               Project 2
Proneness    CT Level   POU Level    CT Level   POU Level
C1           4.93       16.02        8.45       23.04
C2           60.30      205.42       97.87      331.91
C3           172.03     505.45       244.49     788.43

Table 4: Descriptive Statistics of Code Tabs and POUs in terms of Categorization into High, Medium and Low Lines Changed [CP: Change Proneness]

                      Project 1                         Project 2
              Code Tab Level    POU Level       Code Tab Level    POU Level
CP            No.     %         No.    %        No.     %         No.    %
Low (L)       127     80.89     48     84.21    166     79.43     59     81.94
Medium (M)    21      13.38     7      12.28    33      15.79     11     15.28
High (H)      9       5.73      2      3.51     10      4.78      2      2.78

We define the change proneness of a POU or a Code Tab in terms of three categories: High (H), Medium (M) and Low (L). We compute the number of changed lines of code between the two versions of the system at the Code Tab and POU level for both projects. Instead of arbitrarily defining a fixed threshold on the number of changed lines of code to categorize each Code Tab or POU as H, M or L, we apply a data-driven fuzzy clustering technique to automatically derive the thresholds and the classification of each unit into a category. Table 3 shows the output of the fuzzy clustering algorithm and the center points of the clusters. Table 4 shows the descriptive statistics of the Code Tabs and POUs in terms of their categorization into High, Medium and Low lines changed. Table 4 reveals that, for Project 1 at the Code Tab level, there are 127, 21 and 9 Code Tabs belonging to the Low, Medium and High categories respectively.
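
A compact fuzzy c-means sketch that derives three clusters from the lines-changed values is shown below; this is a generic illustrative implementation, not the specific clustering tool used in the study, and the cluster-to-category mapping (Low/Medium/High) is obtained by ranking the cluster centers.

```python
# Fuzzy c-means over the 1-D lines-changed values (illustrative implementation).
import numpy as np

def fuzzy_cmeans_1d(values, c=3, m=2.0, iters=100, seed=0):
    """Cluster a 1-D array into c fuzzy clusters; returns centers and hard labels."""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float)
    u = rng.random((c, x.size))
    u /= u.sum(axis=0)                     # memberships of each point sum to 1
    for _ in range(iters):
        um = u ** m
        centers = um @ x / um.sum(axis=1)  # fuzzily weighted cluster centers
        dist = np.abs(x[None, :] - centers[:, None]) + 1e-12
        u = 1.0 / dist ** (2.0 / (m - 1.0))
        u /= u.sum(axis=0)                 # renormalize memberships
    labels = u.argmax(axis=0)              # hard label = strongest membership
    return centers, labels                 # rank centers to map clusters to L/M/H
```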

8. EXPERIMENTAL RESULTS

8.1 Feature Extraction and Selection

Table 5 shows the output of Principal Component Analysis (PCA): a small number of principal components (four, or five for Project 2 at the POU level) accounts for the majority of the variance in the metric data. Our objective is to determine the principal components and then use them as predictors for change proneness. Table 5 shows the score of each of the 10 source code metrics on each of the principal components.


Table 5: Principal Component Analysis

Project 1
                          Code Tab Level                      POU Level
Metrics                   PC1     PC2     PC3     PC4         PC1     PC2     PC3     PC4
Size                      0.33    0.30    0.32    0.05        0.31    0.43    0.21    0.11
ProgLen                   0.36    -0.13   0.02    -0.09       0.36    -0.10   0.04    -0.32
Vocabulary                0.32    -0.30   -0.01   0.55        0.34    -0.26   0.19    0.21
Cproglen                  0.32    -0.35   0.01    0.43        0.34    -0.30   0.18    0.17
Volume                    0.36    -0.19   0.00    -0.17       0.36    -0.15   0.02    -0.32
Difficulty                0.31    0.35    -0.18   0.14        0.34    0.24    -0.15   -0.16
Effort                    0.34    -0.12   -0.12   -0.55       0.33    -0.11   -0.24   -0.45
CC                        0.33    -0.16   0.22    -0.38       0.31    -0.21   0.13    0.60
TC                        0.22    0.62    0.40    0.12        0.21    0.70    0.21    0.09
% cmt                     0.23    0.32    -0.80   0.02        0.19    0.10    -0.86   0.35
Eigenvalues               6.98    1.36    0.68    0.49        6.97    1.28    0.89    0.43
% Variance                69.81   13.56   6.76    4.89        69.75   12.83   8.87    4.34
Cumulative % Variance     69.81   83.37   90.13   95.02       69.75   82.58   91.45   95.79

Project 2
                          Code Tab Level                      POU Level
Metrics                   PC1     PC2     PC3     PC4         PC1     PC2     PC3     PC4     PC5
Size                      0.35    -0.23   -0.30   0.20        0.34    -0.27   -0.31   -0.06   -0.35
ProgLen                   0.39    0.16    0.20    0.06        0.39    0.16    0.14    0.14    -0.10
Vocabulary                0.36    -0.29   0.04    -0.14       0.35    -0.33   0.24    -0.07   -0.16
Cproglen                  0.36    -0.33   0.08    -0.13       0.35    -0.35   0.25    -0.05   -0.18
Volume                    0.40    0.08    0.20    0.03        0.40    0.10    0.15    0.13    -0.11
Difficulty                0.27    0.52    0.03    0.08        0.32    0.44    -0.01   0.12    0.16
Effort                    0.26    0.48    0.32    0.05        0.30    0.47    0.04    0.24    0.15
CC                        0.33    -0.37   0.02    0.09        0.26    -0.40   -0.08   0.00    0.86
TC                        0.16    0.20    -0.72   0.47        0.22    0.05    -0.85   -0.11   -0.07
% cmt                     0.18    0.21    -0.45   -0.83       0.14    0.27    0.14    -0.93   0.08
Eigenvalues               5.87    1.88    1.19    0.71        5.96    1.67    0.96    0.84    0.42
% Variance                58.68   18.76   11.91   7.11        59.59   16.74   9.56    8.41    4.20
Cumulative % Variance     58.68   77.44   89.35   96.46       59.59   76.34   85.89   94.31   98.50

Table 6: Selected Set of Source Code Metrics using Rough Set Analysis

Project     Level       Source Code Metrics
Project 1   Code Tab    Size, ProgLen, Vocabulary, Cproglen, Difficulty, Effort, TC, % cmt
            POU         Cproglen, Volume, Effort, TC, % cmt
Project 2   Code Tab    Size, ProgLen, Vocabulary, Cproglen, Volume, Difficulty, TC, % cmt
            POU         Size, Vocabulary, Volume, Difficulty, Effort, CC

Table 5 also shows the eigenvalues, which indicate the variance of the data in the direction of the corresponding eigenvectors. Table 5 shows the correlation between each of the 10 independent variables and the principal components. Some of the variables are strongly correlated and some are weakly correlated with the principal components. For example, Vocabulary has a strong correlation with PC4 and % cmt has a strong correlation with PC3. Similarly, we observe a strong correlation between TC and PC2. We observe that PC1 has a nearly equal correlation with all the variables. The third principal component decreases significantly with an increase in % cmt and increases significantly with an increase in Size and TC.

We apply rough set theory and an expectation maximization based clustering algorithm for feature selection [2][6]. Table 6 shows the subset of source code metrics selected after applying the rough set theory based technique. The PCA technique falls into the category of feature extraction, wherein we map the 10 source code metrics into a new space consisting of a few principal components. Rough set analysis falls into the category of feature selection, wherein we choose the most informative subset of features from the original set of features. Table 6 reveals that the dimensionality of the attributes has been reduced from 10 to 5 for Project 1 at the POU level. Similarly, the dimensionality has been reduced for Project 2 at both the Code Tab and POU level.
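
The PCA-based feature extraction step can be sketched as follows; retaining enough components to cover 95% of the variance is an illustrative choice consistent with the cumulative variance reported in Table 5, not a parameter stated by the authors.

```python
# PCA feature extraction: project the normalized metric matrix onto the
# principal components covering most of the variance (threshold is illustrative).
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

def pca_features(X, variance_threshold=0.95):
    """Return the principal-component scores and the explained variance ratios."""
    X_scaled = MinMaxScaler().fit_transform(X)
    pca = PCA(n_components=variance_threshold)   # keep enough PCs for the threshold
    scores = pca.fit_transform(X_scaled)
    return scores, pca.explained_variance_ratio_
```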

8.2 ANN Training and Performance Evaluation

Table 7: Confusion Matrix (Project 1 [Code Tab Level])

(a) ALL (ANN+GD)        (b) PCA (ANN+GD)        (c) RSA (ANN+GD)
     L    M   H              L    M   H              L    M   H
L    124  3   0         L    127  0   0         L    121  6   0
M    18   3   0         M    20   0   1         M    17   4   0
H    7    1   1         H    9    0   0         H    7    2   0

(d) ALL (ANN+NM)        (e) PCA (ANN+NM)        (f) RSA (ANN+NM)
     L    M   H              L    M   H              L    M   H
L    124  2   1         L    125  2   0         L    125  2   0
M    15   6   0         M    20   0   1         M    18   2   1
H    7    2   0         H    9    0   0         H    8    1   0

(g) ALL (ANN+LM)        (h) PCA (ANN+LM)        (i) RSA (ANN+LM)
     L    M   H              L    M   H              L    M   H
L    119  7   1         L    123  4   0         L    121  5   1
M    10   11  0         M    14   5   2         M    13   8   0
H    2    4   3         H    7    1   1         H    5    1   3

We consider three different subsets of metrics as input to design models for predicting the change proneness of PLC programs using an ANN with three different types of training algorithms, i.e., GD, NM, and LM. The performance of each prediction model is evaluated in terms of two different performance parameters, i.e., accuracy and F-measure. Tables 7, 8 and 9 show the resulting confusion matrices after applying the ANN algorithm (we do not show the confusion matrices for Project 2 at the POU level due to limited space in the paper).

Tables 7, 8 and 9 describe the performance of all the classification models and show the type and number of errors made by the respective classifiers. The columns represent the predicted class whereas the rows represent the actual class. Table 7 reveals that the classifier confuses the classes M and L, indicating an area for improvement. Similarly, in the error matrix of Table 9, we observe false positives and false negatives between the L and M classes. However, a detailed analysis of Tables 7, 8 and 9 shows a good proportion of correct predictions.
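
The evaluation step can be sketched with scikit-learn as below. The 'sgd' and 'lbfgs' solvers stand in for the gradient descent and quasi-Newton training methods; Levenberg-Marquardt training is not available in scikit-learn and would need a separate implementation. The hidden layer size, the assumption that class labels are the strings 'L', 'M' and 'H', and the weighted F-measure averaging are illustrative assumptions, not settings reported in the paper.

```python
# Train a feed-forward ANN and report confusion matrix, accuracy and F-measure.
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.neural_network import MLPClassifier

def evaluate(X_train, y_train, X_test, y_test, solver="sgd"):
    """solver='sgd' ~ gradient descent, solver='lbfgs' ~ quasi-Newton method."""
    model = MLPClassifier(hidden_layer_sizes=(10,), solver=solver,
                          max_iter=2000, random_state=0)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {"confusion": confusion_matrix(y_test, pred, labels=["L", "M", "H"]),
            "accuracy": accuracy_score(y_test, pred),
            "f_measure": f1_score(y_test, pred, average="weighted")}
```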

Figures 3, 4, 5 and 6 show the box-plot diagrams for each of the experimental results, enabling a visual comparison.


Table 8: Confusion Matrix (Project 1 [POU Level])

(a) ALL (ANN+GD)        (b) PCA (ANN+GD)        (c) RSA (ANN+GD)
     L   M   H               L   M   H               L   M   H
L    48  0   0          L    48  0   0          L    46  2   0
M    4   3   0          M    6   1   0          M    5   2   0
H    1   0   1          H    2   0   0          H    1   0   1

(d) ALL (ANN+NM)        (e) PCA (ANN+NM)        (f) RSA (ANN+NM)
     L   M   H               L   M   H               L   M   H
L    48  0   0          L    48  0   0          L    47  1   0
M    4   2   1          M    6   1   0          M    4   3   0
H    0   1   1          H    1   1   0          H    1   0   1

(g) ALL (ANN+LM)        (h) PCA (ANN+LM)        (i) RSA (ANN+LM)
     L   M   H               L   M   H               L   M   H
L    47  1   0          L    48  0   0          L    47  1   0
M    2   5   0          M    6   1   0          M    1   5   1
H    0   1   1          H    1   0   1          H    0   2   0

The line in the middle of each box represents the median of the performance parameter. We apply 10-fold cross validation for all the combinations, and the accuracy and F-measure values are summarized in the box plots. Figures 3, 4, 5 and 6 each contain three different groups of box-plots: one for the set of all metrics, one for PCA, and one for RSA. In our study, an ANN with three different training algorithms and two different performance parameters is considered for change-proneness prediction of the PLC projects, and hence six different box-plot panels are displayed (one for each combination). Each box-plot diagram is partitioned into three parts: one for all metrics, one for PCA and one for RSA. The box-plot diagrams present the performance of all feature selection methods within a single diagram. Table 10 shows the performance results after applying the ANN with the three different training algorithms to the PLC projects.

Figures 3 and 4 show the distributional characteristics of the accuracy results. Figures 3 and 4 reveal that the median (middle quartile) of the accuracy, marked with a red line, varies significantly across the box-plots. We observe that for Project 2 at the Code Tab level, the box plots are relatively short in comparison to the box plots for Project 2 at the POU level. A taller box plot for both projects at the POU level shows that accuracy varies significantly with the different folds of the training and testing data. From Figures 5 and 6 we observe uneven sizes in the various box plots. We observe that for Project 1 at the Code Tab level (ALL), Q1, Q2 and Q3 are higher for LM than for GD and NM. For Project 2 at the Code Tab level (RSA), we notice that Q1 for LM is higher than Q3 for both GD and NM. At the POU level (ALL) for Project 1, the median value of the F-measure is the same for GD, NM and LM.

We use the pairwise t-test to compare the performance of the feature selection techniques and the classifier training methods. We use the pairwise t-test to investigate whether the differences between the multiple classifiers in terms of their accuracy are coincidental or random, or whether they are real [1].

Table 9: Confusion Matrix (Project 2 [Code Tab Level])

(a) ALL (ANN+GD)        (b) PCA (ANN+GD)        (c) RSA (ANN+GD)
     L    M   H              L    M   H              L    M   H
L    156  20  6         L    164  33  10        L    156  21  3
M    10   12  4         M    0    0   0         M    9    12  7
H    0    1   0         H    2    0   0         H    1    0   0

(d) ALL (ANN+NM)        (e) PCA (ANN+NM)        (f) RSA (ANN+NM)
     L    M   H              L    M   H              L    M   H
L    157  16  2         L    144  17  3         L    156  13  2
M    9    16  3         M    21   16  5         M    10   20  6
H    0    1   5         H    1    0   2         H    0    0   2

(g) ALL (ANN+LM)        (h) PCA (ANN+LM)        (i) RSA (ANN+LM)
     L    M   H              L    M   H              L    M   H
L    160  11  2         L    152  22  2         L    163  12  3
M    5    21  3         M    13   11  8         M    3    20  2
H    1    1   5         H    1    0   0         H    0    1   5

Table 10: Performance Results Based on Accuracy and F-Measure

                                      Accuracy (%)              F-Measure
Project     Level      Classifier     AM      PCA     RSA       AM     PCA    RSA
Project 1   Code Tab   GD             81.53   80.89   79.62     0.93   0.93   0.92
                       NM             82.80   79.62   80.89     0.94   0.92   0.93
                       LM             84.71   82.17   84.08     0.94   0.93   0.94
            POU        GD             91.23   85.96   85.96     0.97   0.95   0.95
                       NM             89.47   85.96   89.47     0.96   0.95   0.96
                       LM             92.98   87.72   91.23     0.98   0.96   0.97
Project 2   Code Tab   GD             80.38   78.47   80.38     0.92   0.92   0.92
                       NM             85.17   77.51   85.17     0.95   0.91   0.95
                       LM             89.00   77.99   89.95     0.96   0.91   0.96
            POU        GD             90.28   83.33   91.67     0.97   0.94   0.97
                       NM             93.06   88.89   91.67     0.98   0.96   0.97
                       LM             95.83   90.28   93.06     0.99   0.97   0.98

We consider an ANN with three different types of training methods to develop models to predict change-proneness. We use three different subsets of metrics on two different versions of the PLC projects with two different performance parameters, i.e., accuracy and F-measure. Hence, for each prediction technique a total of two sets (one for each performance parameter) are used, each with 12 data points [(2 feature selection methods + 1 considering all features) * 4 datasets]. The results of the t-test analysis for the different performance parameters are summarized in Table 11 and Table 12. Each table has two parts: one part shows the p-values and the other part shows the mean differences of the performance parameter. Table 12 reveals that for most of the cases there is a significant difference between the different approaches, as the p-value is less than 0.05. When the p-value is less than 0.05 we refer to the difference as statistically significant (at the 0.05 significance level) and we reject the null hypothesis. We observe that the p-value of the GD and NM combination is 0.05. According to the values of the mean differences, ANN with LM yields better results compared to the other classifiers.
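
The pairwise comparison can be reproduced with a paired t-test as sketched below; acc_a and acc_b are hypothetical arrays holding the matched accuracy (or F-measure) values of two approaches over the same data points.

```python
# Pairwise (paired) t-test between two approaches, as in Tables 11 and 12.
from scipy import stats

def compare(acc_a, acc_b, alpha=0.05):
    """Return the p-value, mean difference and a significance flag (p < alpha)."""
    t_stat, p_value = stats.ttest_rel(acc_a, acc_b)
    mean_difference = sum(acc_a) / len(acc_a) - sum(acc_b) / len(acc_b)
    return {"t": t_stat, "p": p_value,
            "mean_difference": mean_difference,
            "significant": p_value < alpha}
```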

9. THREATS TO VALIDITY

We believe that multiple experiments on multiple datasets can increase the confidence level of the results and confirm the findings.


Figure 3: Accuracy (%) [Code Tab Level]. (a) Project 1 (Code Tab Level); (b) Project 2 (Code Tab Level). Box plots of accuracy for the GD, NM and LM training algorithms under the ALL, PCA and RSA metric sets.

Table 11: t-test - Feature Selection Techniques

Accuracy
              P-value                    Mean Difference
       ALL     PCA     RSA       ALL      PCA     RSA
ALL    NaN     0.00    0.06      0.00     4.80    1.11
PCA    0.00    NaN     0.01      -4.80    0.00    -3.70
RSA    0.06    0.01    NaN       -1.11    3.70    0.00

F-Measure
              P-value                    Mean Difference
       ALL     PCA     RSA       ALL      PCA     RSA
ALL    NaN     0.00    0.05      0.00     0.02    0.00
PCA    0.00    NaN     0.007     -0.02    0.00    -0.02
RSA    0.05    0.01    NaN       0.00     0.02    0.00

We have applied one such method, the pairwise t-test, to statistically compare the multiple classifiers in terms of their accuracy. In order to remove bias, several other methods can be applied to compare the performance of the learning algorithms on multiple datasets. ANN training requires several parameters, and an optimal selection of the parameter values can significantly impact the accuracy of the classifier.

10. CONCLUSION

Our main conclusion is that it is possible to accurately predict the change-proneness of Structured Text programs from source code metrics by employing ANNs together with PCA and RSA based feature selection techniques. We conclude that an Artificial Neural Network trained with the Levenberg-Marquardt method results in better accuracy (highest median and maximum values of the performance parameters) in comparison to the other training methods.

Figure 4: Accuracy (%) [POU Level]. (a) Project 1 (POU Level); (b) Project 2 (POU Level). Box plots of accuracy for the GD, NM and LM training algorithms under the ALL, PCA and RSA metric sets.

Table 12: Classification Methods

Accuracy
              P-value                    Mean Difference
       GD      NM      LM        GD       NM      LM
GD     NaN     0.05    0.00      0.00     -1.66   -4.11
NM     0.05    NaN     0.00      1.66     0.00    -2.44
LM     0.00    0.00    NaN       4.11     2.44    0.00

F-Measure
              P-value                    Mean Difference
       GD      NM      LM        GD       NM      LM
GD     NaN     0.04    0.00      0.00     -0.00   -0.01
NM     0.04    NaN     0.00      0.00     0.00    -0.00
LM     0.00    0.00    NaN       0.016    0.00    0.00

The results obtained from our study indicate that it is possible to identify a reduced subset of source code metrics and attributes, based on feature extraction and selection techniques, for the task of change proneness prediction in the domain of Structured Text programmable logic controller programs. From the t-test analysis, it is evident that there is a significant difference between the models developed using the different sets of metrics, as the p-value is less than 0.05 in most cases. However, judging by the mean differences, all 10 metrics used as the feature set yield better results compared to the other approaches. Our results indicate that despite the different syntax and language semantics of domain specific languages like ST in comparison to general purpose languages, classical source code metrics are a good indicator of change proneness.

11. ACKNOWLEDGEMENTS

We acknowledge the support of our colleagues Raoul Jetley and Sreeja Nair in helping us get access to the experimental data and the base code on top of which we implemented our code.


Figure 5: F-Measure (Code Tab Level). (a) Project 1 (Code Tab Level); (b) Project 2 (Code Tab Level). Box plots of the F-measure for the GD, NM and LM training algorithms under the ALL, PCA and RSA metric sets.


References

[1] Janez Demsar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7(Jan):1–30, 2006.

[2] Farideh Fazayeli, Lipo Wang, and Jacek Mandziuk. Feature selection based on the rough set theory and expectation-maximization clustering algorithm. Rough Sets and Current Trends in Computing (RSCTC), pages 272–282, 2008.

[3] Karl-Heinz John and Michael Tiegelkamp. IEC 61131-3: Programming industrial automation systems: concepts and programming languages, requirements for programming systems, decision-making aids. Springer Science & Business Media, 2010.

[4] A. Gunes Koru and Hongfang Liu. Identifying and characterizing change-prone classes in two large-scale open-source products. Journal of Systems and Software, 80(1):63–73, 2007.

[5] Lov Kumar, Raoul Jetley, and Ashish Sureka. Source code metrics for programmable logic controller (PLC) ladder diagram (LD) visual programming language. In Workshop on Emerging Trends in Software Metrics (WETSoM), pages 15–21. ACM, 2016.

[6] Lov Kumar, Santanu Rath, and Ashish Sureka. Predicting quality of service (QoS) parameters using extreme learning machines with various kernel methods. In Workshop on Quantitative Approaches to Software Quality (QuASoQ 2016), co-located with APSEC 2016. CEUR, 2016.

Figure 6: F-Measure (POU Level). (a) Project 1 (POU Level); (b) Project 2 (POU Level). Box plots of the F-measure for the GD, NM and LM training algorithms under the ALL, PCA and RSA metric sets.

[7] Hongmin Lu, Yuming Zhou, Baowen Xu, Hareton Leung, and Lin Chen. The ability of object-oriented metrics to predict change-proneness: a meta-analysis. Empirical Software Engineering, 17(3):200–242, 2012.

[8] F.J. Malian, J.L.C.M. Barbancho, C. Leon, A. Malian, and A. Gomez. Using industrial standards on PLC programming learning. In Control & Automation, 2007 (MED'07), Mediterranean Conference on, pages 1–6. IEEE, 2007.

[9] A. Nair. Product metrics for IEC 61131-3 languages. In Conference on Emerging Technologies and Factory Automation (ETFA), pages 1–8, Sept 2012.

[10] Andreas Otto and Klas Hellmann. IEC 61131: A general overview and emerging trends. IEEE Industrial Electronics Magazine, 3(4):27–31, 2009.

[11] Herbert Prahofer, Florian Angerer, Rudolf Ramler, Hermann Lacheiner, and Friedrich Grillenberger. Opportunities and challenges of static code analysis of IEC 61131-3 programs. In ETFA, pages 1–8. IEEE, 2012.

[12] Daniele Romano and Martin Pinzger. Using source code metrics to predict change-prone Java interfaces. In Conference on Software Maintenance (ICSM), pages 303–312. IEEE, 2011.

[13] Nieke Roos. Programming PLCs using structured text. In International Multiconference on Computer Science and Information Technology, pages 20–22. Citeseer, 2008.
