
THE PERFORMANCE OF SOFT COMPUTING TECHNIQUES

ON CONTENT-BASED SMS SPAM FILTERING

WADDAH WAHEEB HASSAN SAEED

A thesis submitted in partial

fulfillment of the requirement for the award of the

Degree of Master of Computer Science (Soft Computing)

Faculty of Computer Science and Information Technology

Universiti Tun Hussein Onn Malaysia

FEBRUARY 2015


ABSTRACT

Content-based filtering is one of the most widely used methods to combat SMS (Short Message Service) spam. This method represents SMS text messages by a set of selected features extracted from data sets. Most of the available data sets suffer from an imbalanced class distribution. However, little attention has been paid to handling this problem, which affects the characteristics and size of the selected features and causes undesired performance. Soft computing approaches have been applied successfully in content-based spam filtering. To enhance soft computing performance, a suitable feature subset should be selected. Therefore, this research investigates how well suited three soft computing techniques, Fuzzy Similarity, Artificial Neural Network and Support Vector Machines (SVM), are for content-based SMS spam filtering using an appropriate number of features selected by the Gini Index metric, which has the ability to extract suitable features from imbalanced data sets. The data sets used in this research were taken from three sources: the UCI repository, Dublin Institute of Technology (DIT) and British English SMS. The performance of each technique was compared in terms of True Positive Rate against False Positive Rate, F1 score and Matthews Correlation Coefficient. The results showed that SVM with 150 features outperformed the other techniques on all comparison measures. The average time needed to classify an SMS text message is a fraction of a millisecond. A further test using the NUS SMS corpus was conducted to validate the SVM classifier with 150 features. The results again confirmed the efficiency of the SVM classifier with 150 features for SMS spam filtering, with an accuracy of about 99.2%.


ABSTRAK

Content-based filtering is one of the most widely used methods to overcome SMS (Short Message Service) spam. This method represents SMS text messages with a set of selected features extracted from data sets. Most of the available data sets have an imbalanced class distribution problem. However, little attention has been given to addressing this problem, which affects the characteristics and size of the selected features and causes undesired performance. Soft computing approaches have been applied successfully in content-based spam filtering. To improve the efficiency of soft computing, a suitable feature subset must be selected. Therefore, this study examines how suitable three soft computing techniques, Fuzzy Similarity, Artificial Neural Network and Support Vector Machines (SVM), are for content-based spam filtering using an appropriate number of features selected with the Gini Index measure, which has the ability to extract suitable features from imbalanced data sets. The data sets used in this study were taken from three sources: the UCI repository, Dublin Institute of Technology (DIT) and British English SMS. The performance of these techniques was compared in terms of True Positive Rate against False Positive Rate, F1 score and Matthews Correlation Coefficient. The findings showed that SVM with 150 features performed better than the other two techniques on all comparison measures. The average time required to classify an SMS text message is a fraction of a millisecond. To validate the SVM classifier with 150 features, a further test using the NUS SMS corpus was conducted. The findings confirmed the efficiency of the SVM classifier with 150 features in filtering SMS spam, with an accuracy of around 99.2%.


CONTENTS

TITLE
DECLARATION
ACKNOWLEDGEMENT
ABSTRACT
ABSTRAK
CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS AND ABBREVIATIONS
LIST OF APPENDICES

CHAPTER 1 INTRODUCTION
1.1 Overview
1.2 Problem Statement
1.3 Aim of Research
1.4 Objective of Research
1.5 Scope of Research
1.6 Significance of Research
1.7 Research Outline

CHAPTER 2 LITERATURE REVIEW
2.1 Introduction
2.2 Fuzzy Logic
2.2.1 Fuzzy Similarity
2.2.2 T-norms and S-norms
2.3 Artificial Neural Network
2.3.1 Neuronal Model
2.3.2 Activation Functions
2.3.3 Multilayer Perceptron
2.3.4 Back-propagation Algorithm
2.3.5 Scaled Conjugate Gradient
2.4 Support Vector Machine
2.4.1 Support Vector Classification
2.4.2 Radial Basis Function Kernel
2.5 Content-based Filtering for SMS Spam
2.6 Chapter Summary

CHAPTER 3 METHODOLOGY
3.1 Introduction
3.2 Data Collection
3.3 Data Preprocessing
3.3.1 Removal of Duplicated Messages
3.3.2 Removal of non-English Messages
3.3.3 Lower-case Conversion
3.3.4 Feature Abstraction Replacement
3.3.5 Tokenization
3.3.6 Stemming
3.4 Dimensionality Reduction
3.5 Data Partition
3.6 Training and Testing
3.6.1 SMS Text Message Representation
3.6.2 Training Fuzzy Similarity Model
3.6.3 Training Artificial Neural Network Model
3.6.3.1 Number of Input and Output Units
3.6.3.2 Activation Function
3.6.3.3 Inputs and Outputs Scaling
3.6.3.4 Number of Hidden Layers and Neurons
3.6.3.5 Parameters Setting
3.6.4 Training Support Vector Machines Model
3.6.4.1 SVM Kernel Parameters Searching
3.7 Model Selection
3.7.1 True Positive Rate against False Positive Rate
3.7.2 F1 Score
3.7.3 Matthews Correlation Coefficient
3.8 Chapter Summary

CHAPTER 4 DATA ANALYSIS AND RESULTS
4.1 Introduction
4.2 Experimental Design
4.3 True Positive Rate against False Positive Rate Analysis
4.4 F1 Score Analysis
4.5 Matthews Correlation Coefficient Analysis
4.6 Feature Characteristics Analysis
4.7 Classification Time Analysis
4.8 Additional Testing Using NUS SMS Corpus
4.9 Chapter Summary

CHAPTER 5 DISCUSSION AND CONCLUSIONS
5.1 Introduction
5.2 Novelty and Research Contribution
5.2.1 Selecting Feature Subsets Using Gini Index Metric
5.2.2 Applying Soft Computing Techniques for SMS Spam Filtering with Selected Feature Subsets
5.2.3 Evaluating Soft Computing Techniques Performance for SMS Spam Filtering Using Suitable Measures
5.3 Recommendations for Future Work
5.4 Chapter Summary

REFERENCES
VITAE

LIST OF TABLES

2.1 T-norms and s-norms operators
3.1 Class information of the collected data sets
3.2 Number of instances before and after removal of message duplicates
3.3 Number of instances before and after removal of non-English messages
3.4 List of replaced features
3.5 Two-way contingency table of a feature fi and a class Cj for binary classification
3.6 Number of hidden neurons with its number of features
3.7 C and γ values with its number of features
4.1 Number of instances in training and testing data sets
4.2 TPR against FPR classifiers comparison with its number of features [AUC]
4.3 Number of instances before and after data preprocessing steps in NUS SMS corpus
A.1 Hidden neurons searching with its number of features
B.1 C and γ values searching with its number of features
C.1 ANN simulation results with its number of features

LIST OF FIGURES

2.1 An example of ANN with one hidden layer
2.2 Neuronal model
2.3 Linearly separable training data
2.4 Possible separating hyperplanes with their associated margins
2.5 Support vectors
2.6 Two-class nonlinear separable problem
3.1 Research framework
3.2 Extract taken from the removal of message duplicates
3.3 Representation of vector space model
4.1 TPR against FPR classifiers comparison with 150 features
4.2 True positive rate classifiers comparison with 150 features
4.3 False positive rate classifiers comparison with 150 features
4.4 F1 classifiers comparison with its number of features
4.5 Precision classifiers comparison with 150 features
4.6 MCC classifiers comparison with its number of features
4.7 Degree of combination of positive and negative features


LIST OF SYMBOLS AND ABBREVIATIONS

ANN - Artificial Neural Network

AUC - Area Under Curve

FN - False Negative

FP - False Positive

FPR - False Positive Rate

MCC - Matthews Correlation Coefficient

MLP - Multilayer Perceptron

MSE - Mean Squared Error

RBF - Radial Basis Function

ROC - Receiver Operating Characteristic

SCG - Scaled Conjugate Gradient

SMS - Short Message Service

SVM - Support Vector Machine

TN - True Negative

TP - True Positive

TPR - True Positive Rate


LIST OF APPENDICES

APPENDIX TITLE
A Performance of ANN classifier using range of hidden neurons with its number of features
B Performance of SVM classifier using range of C and γ values with its number of features
C ANN simulation results with its number of features


CHAPTER 1

INTRODUCTION

1.1 Overview

SMS, which stands for "Short Message Service", is a service used to send short text messages from a mobile device or via the web, received by a mobile device. It is a very popular form of communication because of its ease of use, fast response and relatively low cost compared to telephone calls; in 2012, 7.5 trillion SMS messages were sent worldwide (GSMA, 2013). However, not all SMS messages are solicited: mobile device users receive legitimate messages as well as unwanted messages, which are called spam.

SMS spam forms 20 to 30% of all SMS traffic in some parts of Asia, such as China and India (GSMA, 2011). Several methods are used to combat SMS spam, such as black-and-white listing, traffic analysis and content-based filtering (Delany, Buckley & Greene, 2012). According to Delany et al. (2012), content-based filtering is required to counteract the increasing threat of SMS spam and to avoid the disadvantages of other filtering methods. Content-based filtering applies techniques that analyze the contents of an SMS text message to ascertain whether it is legitimate or spam.

Many studies on content-based SMS spam filtering selected some features (lexical or stylistic) to represent SMS text messages, and these selected features are extracted from SMS data sets with an imbalanced class distribution. However, not much attention has been paid to handling this problem, which could produce unsuitable features, or a huge number of features, for filtering SMS spam. Therefore, a suitable feature selection metric is required to select proper features from the imbalanced data sets in order to improve filtering performance. Besides a suitable feature selection metric, a suitable technique with a record in spam filtering is essential. Soft computing techniques are present in almost every domain (e.g. spam filtering) and their ability has been proven (El-Alfy & Al-Qunaieer, 2008; Guzella & Caminhas, 2009).

In this research, the main purpose is to find out how well suited three soft computing techniques, namely Fuzzy Similarity, Artificial Neural Network (ANN) and Support Vector Machines (SVM), are for content-based SMS spam filtering using appropriate features selected by the Gini Index metric.

1.2 Problem Statement

SMS spam is a growing problem. Mobile device users in the U.S. received 1.1 billion spam messages in 2007 (Hart, 2008) and 4.5 billion in 2011 (Kharif, 2012). SMS spam can be defined as unsolicited bulk electronic messages: unsolicited because the recipients receive unwanted messages without their consent, and bulk because the sender sends many identical messages to different recipients (Bueti, 2005).

Many factors motivate spammers to use this service and support the growth of this problem: mobile device users tend to read all received messages, the service is accessible from anywhere, and some countries lack laws and regulations to control the purchase of phone numbers and the handling of this problem (Liu & Yang, 2012). In addition, there is an increasing number of mobile device users who can be targeted (GSMA, 2013), limited availability of mobile applications for SMS spam filtering (Almeida, Hidalgo & Yamakami, 2011), a higher response rate for this service, and the availability of very cheap bulk pre-paid SMS packages in some Asian countries together with easy ways to send bulk messages (Delany et al., 2012). Mobile network operators also contribute to the problem by sending messages about their offers.

SMS spam causes mobile device users and mobile network operators a lot of problems. Spam messages irritate mobile users by filling their in-boxes and wasting their time reading and deleting the messages (Uysal et al., 2012). Some types of SMS spam try to bill mobile device users by tricking them into calling premium-rate numbers or subscribing to services, or trick the users into calling certain numbers in order to collect confidential information from them for other purposes, a practice called phishing (GSMA, 2011). Other types of SMS spam attack mobile device users to steal their money (GSMA, 2011), subject smart-phones to viruses (Murynets & Jover, 2012), harm mobile device operating systems, spread viruses to other mobile device users and violate privacy. Furthermore, in some countries mobile device users pay to receive their messages, which may include spam messages (Almeida et al., 2011). Mobile network operators also suffer from this problem. They are prone to lose subscribers because network performance is affected by the load that SMS spam generates, which in turn delays the reception of legitimate messages (Yadav et al., 2011). They may also lose revenue because they cannot bill the sender(s) a termination fee when SMS spam is sent from fraudulent addresses (Cisco, 2005).

Because of these problems, many methods have been used to prevent SMS spam, such as black-and-white listing, which is used by mobile applications such as Android applications (GooglePlay, n.d.), traffic analysis (GSMA, 2011), content-based filtering (Hidalgo, Bringas & Sánz, 2006; Almeida et al., 2011; Sohn et al., 2012) and a combination of black-and-white listing and content-based filtering (Deng & Peng, 2006; Mahmoud & Mahfouz, 2012). With black-and-white listing, the mobile device user saves the phone numbers of legitimate and spam message senders into two groups: a legitimate group (white list) and a spam group (black list). The disadvantage of this method is that if a phone number is not in the black list, the recipient will receive the spam message(s); in addition, it will discard legitimate messages that may be sent from a black-listed phone number (Uysal et al., 2012). Another anti-spam method uses traffic analysis to compare a subscriber's volume of sent messages against volume limits, but spammers evade it by sending low volumes of messages, observing the operator system's response and then inferring the operator's volume-limit policies (Delany et al., 2012). Content-based filtering uses techniques that analyze SMS text message content in order to decide whether a message is legitimate or spam. Spammers try to evade these filters with sophisticated message modifications (GSMA, 2011); nevertheless, content-based filtering avoids the weaknesses that spammers exploit in traffic analysis (Delany et al., 2012) and in black-and-white listing.

Many studies in the literature on content-based SMS spam filtering selected some features to represent SMS text messages, and these selected features are extracted from SMS data sets with an imbalanced class distribution. However, not much attention has been paid to handling this problem, which affects the characteristics and the size of the selected features and causes undesired performance. Therefore, in order to select suitable features from the imbalanced data sets, a suitable feature selection scheme is needed. The Gini Index (Shang et al., 2007) is a feature selection metric that can handle the class imbalance problem by selecting proper features (Ogura, Amano & Kondo, 2011), which will improve filtering performance. Besides a suitable feature selection metric, a suitable technique with a record in spam filtering is required. Soft computing techniques are present in almost every domain (e.g. spam filtering) and their ability has been proven (El-Alfy & Al-Qunaieer, 2008; Guzella & Caminhas, 2009).
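To make the feature-ranking step concrete, the following is an illustrative sketch only. It uses the improved Gini index formulation commonly attributed to Shang et al. (2007), GI(w) = Σc P(w|c)² · P(c|w)², estimated from document frequencies; the exact definition used in this thesis is the authoritative one, and the toy corpus below is hypothetical.

```python
from collections import Counter

def gini_index_ranking(docs, labels):
    """Rank features of a two-class corpus by the improved Gini index
    GI(w) = sum_c P(w|c)^2 * P(c|w)^2 (formulation attributed to
    Shang et al., 2007). `docs` is a list of token lists, `labels`
    the matching class labels. Returns features, best first."""
    classes = set(labels)
    docs_per_class = Counter(labels)
    # df[c][w]: number of class-c documents containing feature w
    df = {c: Counter() for c in classes}
    for tokens, c in zip(docs, labels):
        df[c].update(set(tokens))
    vocab = set().union(*(df[c] for c in classes))
    scores = {}
    for w in vocab:
        total_w = sum(df[c][w] for c in classes)  # docs containing w
        gi = 0.0
        for c in classes:
            p_w_given_c = df[c][w] / docs_per_class[c]
            p_c_given_w = df[c][w] / total_w
            gi += (p_w_given_c ** 2) * (p_c_given_w ** 2)
        scores[w] = gi
    return sorted(scores, key=scores.get, reverse=True)

# hypothetical toy corpus: "cash" occurs only in spam, "now" in both classes
docs = [["win", "cash", "now"], ["hi", "mum"],
        ["free", "cash"], ["see", "you", "now"]]
labels = ["spam", "ham", "spam", "ham"]
top = gini_index_ranking(docs, labels)
```

Features concentrated in one class (such as "cash" above) score highest, which is why this style of metric tolerates imbalanced data sets better than frequency-only criteria.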

Therefore, this research investigates the performance of three selected soft computing techniques, Fuzzy Similarity, Artificial Neural Network and Support Vector Machines, and whether they are suitable for content-based SMS spam filtering using an appropriate number of features selected by the Gini Index metric.


1.3 Aim of Research

The aim of this research is to filter SMS spam based on its contents using soft computing techniques, namely Fuzzy Similarity, Artificial Neural Network and Support Vector Machine, with appropriate features selected by the Gini Index metric.

1.4 Objective of Research

In order to achieve the above-mentioned aim, the research objectives are:

i To select feature subsets using the Gini Index metric to represent SMS text messages.

ii To apply the soft computing techniques Fuzzy Similarity, Artificial Neural Network and Support Vector Machine to SMS spam filtering with the feature subsets selected in (i).

iii To compare the performance of (ii) in terms of True Positive Rate (TPR) against False Positive Rate (FPR), F1 score and Matthews Correlation Coefficient (MCC).

1.5 Scope of Research

This research filters English SMS text messages into two classes, legitimate or spam, based on their contents. The data was taken from three sources: the UCI machine learning repository (Bache & Lichman, 2013), Dublin Institute of Technology (DIT) (Delany et al., 2012) and British English SMS (Nuruzzaman, Lee & Choi, 2011). Feature subsets were selected using the Gini Index metric (Shang et al., 2007). Three soft computing techniques were used to filter SMS spam: Fuzzy Similarity (Widyantoro & Yen, 2000), an Artificial Neural Network trained with the Scaled Conjugate Gradient algorithm (SCG) (Møller, 1993) and a Support Vector Machine with a Radial Basis Function (RBF) kernel (Chang & Lin, 2011). Results were compared in terms of True Positive Rate (TPR) against False Positive Rate (FPR), F1 score and Matthews Correlation Coefficient (MCC).
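The comparison measures named above can all be computed directly from confusion-matrix counts. A minimal sketch, assuming spam is treated as the positive class (the usual convention in spam filtering); the counts in the example call are made-up:

```python
import math

def metrics(tp, fp, tn, fn):
    """Confusion-matrix measures used to compare the classifiers.
    tp/fp/tn/fn are the true/false positive/negative counts,
    with spam taken as the positive class."""
    tpr = tp / (tp + fn)            # true positive rate (recall)
    fpr = fp / (fp + tn)            # false positive rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    # Matthews Correlation Coefficient: balanced even when classes are skewed
    mcc = ((tp * tn - fp * fn) /
           math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"TPR": tpr, "FPR": fpr, "F1": f1, "MCC": mcc}

# hypothetical counts: 100 spam and 900 legitimate test messages
m = metrics(tp=90, fp=5, tn=895, fn=10)
```

MCC is included precisely because accuracy alone can look high on imbalanced data even for a weak classifier.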

1.6 Significance of Research

This research examined the efficiency of soft computing techniques for SMS spam filtering with feature subsets selected by the Gini Index metric. It establishes a performance comparison between Fuzzy Similarity, Artificial Neural Network and Support Vector Machine to investigate whether they can provide better results based on the selected feature subsets. The outcome could help identify the best performance achievable with a small number of features for SMS spam filtering, and could also inform future work exploring other feature selection metrics with soft computing techniques for SMS spam filtering.

1.7 Research Outline

The remainder of this research is arranged in the following chapters. Chapter 2 covers the relevant background on using content-based filtering for SMS spam, and also highlights the soft computing techniques, namely Fuzzy Similarity, Artificial Neural Network and Support Vector Machine.

Chapter 3 describes the steps for using soft computing techniques for SMS spam filtering: data collection, data preprocessing, dimensionality reduction, data partition, training and testing, and selecting the best soft computing technique based on specified measures.

Simulation results with analysis evaluating the soft computing techniques are presented in Chapter 4. Feature subset characteristics and classification time are also analyzed, and the best soft computing technique with the best feature subsets is tested on another SMS corpus. To simplify the discussion, graphs summarizing the results are provided. Chapter 5 concludes the work and provides several recommendations to improve and validate the performance of the soft computing techniques for SMS spam filtering.


CHAPTER 2

LITERATURE REVIEW

2.1 Introduction

Many real-world problems cannot be solved using hard computing techniques, which deal with precision and certainty, because these problems are difficult to model mathematically, computationally expensive or demanding of huge amounts of memory (Shukla, Tiwari & Kala, 2012). In some cases, however, human experts can deal with such problems successfully, e.g. face recognition. According to Zadeh, soft computing is "an emerging approach to computing, which parallels the remarkable ability of the human mind to reason and learn in an environment of uncertainty and imprecision" (Zadeh, 1994). From this definition, it is clear that soft computing is inspired by natural processes, especially the human brain. Soft computing techniques are therefore needed to offer simple, reliable and low-cost solutions to these types of problems with good results.

The development of soft computing techniques has attracted the interest of researchers from different disciplines over the past two decades. Soft computing techniques are applied in various domains such as bioinformatics, biomedical systems, data mining, image processing, machine control, robotics, time series prediction and wireless networks (Shukla et al., 2012).

Classification is one of the three main categories of problems to which soft computing is applied (Shukla et al., 2012). A classification problem assigns an object, depending on its attributes, to a known group or class. If the classes differ substantially in their attributes, the classification problem is quite simple; if the classes are quite similar, it becomes rather difficult. Soft computing is therefore needed to offer solutions to such problems.

In this research, three soft computing techniques are used, namely Fuzzy Similarity, Artificial Neural Networks and Support Vector Machine, to classify SMS text messages into two classes, legitimate or spam. These techniques have been used for email spam filtering (El-Alfy & Al-Qunaieer, 2008; Guzella & Caminhas, 2009). This chapter therefore discusses each of them, and also reviews related work on the problem under study, content-based filtering for SMS spam.

2.2 Fuzzy Logic

The concept of fuzzy logic was introduced in 1965 by Zadeh as a new way to deal with problems in which imprecision arises from the absence of precisely defined criteria of class membership (Zadeh, 1965). Fuzzy logic gained acceptance in the second half of the 1970s after the success of its first practical application, fuzzy control. Since then, it has been applied in many mathematical and practical areas including clustering, optimization, operations research, control and expert systems, medicine, data mining and pattern recognition (Zimmermann, 2010).

Fuzzy logic deals with fuzzy sets, which extend the definition of crisp sets. Unlike the characteristic function of a crisp set, the characteristic (membership) function of a fuzzy set assigns a degree of relevance in the range [0,1]. This provides flexibility in dealing with uncertainty in systems such as spam filtering (El-Alfy & Al-Qunaieer, 2008). Fuzzy logic has not received much attention for SMS spam filtering, yet Fuzzy Similarity (Widyantoro & Yen, 2000) performs well in email spam filtering (El-Alfy & Al-Qunaieer, 2008). Thus, this research investigates the effectiveness of Fuzzy Similarity in content-based SMS spam filtering.

2.2.1 Fuzzy Similarity

Fuzzy Similarity is adapted from the Rocchio algorithm (Rocchio, 1971), in which a cluster center is created for each category from training samples and the similarity between each test sample and a category is measured using the cosine coefficient. In Fuzzy Similarity, proposed by Widyantoro & Yen (2000), a fuzzy term-category relation is developed whereby the Rocchio cluster is represented by a set of membership degrees of words to a particular category. Based on this relation, the similarity between a document and a category's cluster center is calculated using fuzzy conjunction and disjunction operators, and the calculated similarity represents the membership degree of the document to the category.

Fuzzy Similarity uses two finite sets, a set of terms T = {t1, t2, ..., tn} and a set of categories C = {c1, c2, ..., ck}, together with a fuzzy relation R : T × C → [0,1] whose membership value, denoted by µR(ti, cj), specifies the degree of relevance of term ti to category cj. The membership values of this relation are extracted from a training set.

Every training example in the training set is represented by a set of term-frequency pairs d = {(t1, o1), (t2, o2), ..., (tm, om)}, where oj is the occurrence frequency of term tj in the document. Given a set of training documents D, the membership value µR(ti, cj) is calculated as follows. First, all documents are grouped according to their category. Next, the occurrence frequency of each term in each category is obtained by summing the term frequencies of the individual documents in that category. Then µR(ti, cj) is the total number of occurrences of term ti in category cj divided by the total number of occurrences of ti in all categories, as expressed in Eq. (2.1).

$$\mu_R(t_i, c_j) \;=\; \frac{\sum_{\{w_i \in d_k \,\wedge\, d_k \in D \,\wedge\, c(d_k) = c_j\}} w_i}{\sum_{\{w_i \in d_k \,\wedge\, d_k \in D\}} w_i} \tag{2.1}$$

Now that the membership values of the fuzzy term-category relation are known, the similarity between a document and a category is given by Eq. (2.2):

$$\mathrm{Sim}(d, c_j) \;=\; \frac{\sum_{t \in d} \mu_R(t, c_j) \otimes \mu_d(t)}{\sum_{t \in d} \mu_R(t, c_j) \oplus \mu_d(t)} \tag{2.2}$$

in which µd(t) is the membership degree of term t in d for each term t in d, and ⊗ and ⊕ denote the fuzzy conjunction (t-norm) and fuzzy disjunction (s-norm) operators, respectively. The document is assigned to the category with the highest similarity measure.
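The training step of Eq. (2.1) and the classification step of Eq. (2.2) can be sketched as follows, using min as the t-norm and max as the s-norm (the operators selected later in Section 2.2.2). Taking µd(t) as the normalized term frequency is an assumption made for illustration, as is the toy data:

```python
from collections import Counter, defaultdict

def train_membership(docs, labels):
    """Fuzzy term-category relation of Eq. (2.1):
    mu_R(t, c) = occurrences of t in class c / occurrences of t overall."""
    per_class = defaultdict(Counter)
    total = Counter()
    for tokens, c in zip(docs, labels):
        per_class[c].update(tokens)
        total.update(tokens)
    return {c: {t: per_class[c][t] / total[t] for t in total}
            for c in per_class}

def classify(tokens, mu_R):
    """Eq. (2.2) with min as t-norm and max as s-norm; mu_d(t) is
    taken here as term frequency normalized by the maximum frequency."""
    freq = Counter(tokens)
    max_o = max(freq.values())
    mu_d = {t: o / max_o for t, o in freq.items()}
    best, best_sim = None, -1.0
    for c, mu in mu_R.items():
        num = sum(min(mu.get(t, 0.0), mu_d[t]) for t in mu_d)
        den = sum(max(mu.get(t, 0.0), mu_d[t]) for t in mu_d)
        sim = num / den if den else 0.0
        if sim > best_sim:
            best, best_sim = c, sim
    return best

# hypothetical two-message training set, then one test message
mu_R = train_membership(
    [["free", "cash", "now"], ["call", "me", "now"]], ["spam", "ham"])
label = classify(["free", "cash"], mu_R)
```

The min/max pair keeps every intermediate value interpretable as a membership degree, which is why the ratio in Eq. (2.2) stays within [0,1].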

2.2.2 T-norms and S-norms

Various t-norms and s-norms are frequently used in the literature. A binary operation must satisfy certain axioms to qualify as a t-norm or an s-norm. Given x, y, z ∈ [0,1], any binary operation t must satisfy the following axioms in order to be a t-norm:

Axiom 1. t(x, 1) = x (boundary condition)
Axiom 2. y ≤ z implies t(x, y) ≤ t(x, z) (monotonicity)
Axiom 3. t(x, y) = t(y, x) (commutativity)
Axiom 4. t(x, t(y, z)) = t(t(x, y), z) (associativity)

Almost the same axioms are defined for an s-norm, given x, y, z ∈ [0,1]:

Axiom 1. s(x, 0) = x (boundary condition)
Axiom 2. y ≤ z implies s(x, y) ≤ s(x, z) (monotonicity)
Axiom 3. s(x, y) = s(y, x) (commutativity)
Axiom 4. s(x, s(y, z)) = s(s(x, y), z) (associativity)

The boundary condition constrains the results to [0,1]. Monotonicity and commutativity ensure that a decrease in the degree of membership in set X or Y cannot produce an increase in the degree of membership in the intersection or union. Commutativity also makes the fuzzy intersection and fuzzy union symmetric, so operand order does not matter. The last axiom, associativity, allows taking the intersection of any number of sets in any order of pairwise grouping (Klir & Yuan, 1995).

Among the various t-norms and s-norms shown in Table 2.1, the standard fuzzy intersection (min operator) and the standard fuzzy union (max operator) have special features. One desirable feature is that they prevent the compounding of errors in the operands, which most alternative norms lack (Klir & Yuan, 1995). For example, if an error e is associated with the membership values µA(x) and µB(x), then the maximum error associated with the membership values µA∪B(x) and µA∩B(x) remains e (Klir & Yuan, 1995). For this reason, the min and max operators are selected in this research.

Table 2.1: T-norms and s-norms operators

  t-norms t(x,y)                             s-norms s(x,y)
  Standard intersection:                     Standard union:
    t(x,y) = min(x,y)                          s(x,y) = max(x,y)
  Algebraic product:                         Algebraic sum:
    t(x,y) = x · y                             s(x,y) = x + y − x · y
  Bounded difference:                        Bounded sum:
    t(x,y) = max(0, x + y − 1)                 s(x,y) = min(1, x + y)
  Drastic intersection:                      Drastic union:
    t(x,y) = x when y = 1,                     s(x,y) = x when y = 0,
             y when x = 1,                              y when x = 0,
             0 otherwise                                1 otherwise
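The standard operators above are simple to implement. The following sketch (with illustrative membership values, not data from this research) applies the min and max operators element-wise to the membership degrees of two fuzzy sets:

```python
def fuzzy_intersection(mu_a, mu_b):
    """Standard t-norm: t(x, y) = min(x, y), applied element-wise."""
    return [min(a, b) for a, b in zip(mu_a, mu_b)]

def fuzzy_union(mu_a, mu_b):
    """Standard s-norm: s(x, y) = max(x, y), applied element-wise."""
    return [max(a, b) for a, b in zip(mu_a, mu_b)]

# Membership degrees of the same three elements in fuzzy sets A and B.
mu_a = [0.2, 0.8, 0.5]
mu_b = [0.6, 0.3, 0.5]

print(fuzzy_intersection(mu_a, mu_b))  # [0.2, 0.3, 0.5]
print(fuzzy_union(mu_a, mu_b))         # [0.6, 0.8, 0.5]
```

Note that any error in a membership value passes through min and max unamplified, which is the error-preservation property discussed above.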


2.3 Artificial Neural Network

Artificial Neural Network (ANN) is inspired by biological nervous systems, such as

the human brain; it can learn and memorize sets of data and adjust its weight matrices

to build classifiers that can be used to classify unseen data.

ANN is a machine designed to model how the brain performs a particular task

or function of interest (Haykin, 2009). It resembles the human brain in two ways: its

knowledge is acquired through training process and this acquired knowledge is stored

within inter-neuron connection strengths known as synaptic weights (Haykin, 2009).

There are many advantages of ANN such as its high tolerance of noisy data (Han,

Kamber & Pei, 2011), its ability to produce a reasonable output for unseen data, its

non-linearity that allows it to model complex real world relationships, its input-output

mapping in which the model structure is determined from data and its ability to adapt

its weights to deal with changes in the environment by retraining (Haykin, 2009).

ANN generally consists of three sets of layers that are ordered in the network as

input layer, hidden layer(s) and output layer. Each layer consists of a number of nodes

or neurons (computation nodes) that are linked with neurons in the adjacent layers

through weighted connections, as shown in Figure 2.1. The information flow during

the training process is from the input layer to the output layer through the hidden

layer(s).

2.3.1 Neuronal Model

According to Haykin (2009), ANN consists of a large number of simple processing

units called nodes or neurons. The basic design for ANN is the neuronal model shown in Figure 2.2: a set of connecting links, each associated with a weight, that link the neurons; an adder that sums the weighted inputs; an activation function that limits the adder's output to a specified range; and a bias, which raises or lowers the net input of the activation function (Haykin, 2009).
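The neuronal model can be sketched directly from these components. The weights, bias and threshold activation below are hypothetical values chosen for illustration only:

```python
def neuron_output(inputs, weights, bias, activation):
    """Neuronal model: adder (weighted sum of inputs plus bias)
    followed by an activation function."""
    v = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(v)

# A simple threshold activation for illustration.
step = lambda v: 1.0 if v >= 0.0 else 0.0

# Net input: 0.8*0.5 + 0.2*(-1.0) + 0.1 = 0.3, so the neuron fires.
print(neuron_output([0.5, -1.0], [0.8, 0.2], 0.1, step))  # 1.0
```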


Figure 2.1: An example of ANN with one hidden layer (Haykin, 2009)

Figure 2.2: Neuronal model (Haykin, 2009)

2.3.2 Activation Functions

As mentioned before, activation functions limit the resulting value from the adder into

a specified range. Non-linearity and continuity are the most important characteristics of activation functions, as they allow ANN to work with complex non-linear domains. The non-linear property allows ANN to map inputs to outputs non-linearly, while continuity means that the outputs remain within a specified finite range (Samarasinghe, 2006). Figure 2.2 shows the activation function, which is denoted by f, and its input uk.


The most commonly used activation function to construct neural networks is

the sigmoid function (Samarasinghe, 2006; Haykin, 2009). It is defined in the range [0, 1] and its expression is given by Eq. (2.3):

f(v) = 1 / (1 + e^(−v)) (2.3)

in which v is the neuron's net input (the output of the adder).
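Eq. (2.3) can be written directly as a small function; a minimal sketch:

```python
import math

def sigmoid(v):
    """Logistic sigmoid of Eq. (2.3); maps any real v into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

print(sigmoid(0.0))  # 0.5, the midpoint of the range
```

Large positive inputs saturate toward 1 and large negative inputs toward 0, which is what bounds the neuron's output to the stated range.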

2.3.3 Multilayer Perceptron

Multilayer Perceptron (MLP) is one of the main ANN architecture classes (Haykin, 2009). It consists of an input layer, one or more hidden layers and an output layer.

Each layer consists of a number of nodes or neurons (computation nodes). Input nodes

in the input layer represent the problem’s input variables, the output layer’s neurons

represent the problem classes, while the hidden neurons in the hidden layer(s) help to

capture the non-linear relationship between inputs and outputs.

MLP performs complex classification tasks and is the most popular and widely used non-linear network for solving practical problems in many fields such as ecology, biology and engineering (Samarasinghe, 2006). The power of MLP

comes from its hidden layer which allows it to approximate any non-linear relationship

between inputs and outputs (Samarasinghe, 2006). MLP is trained to capture the rela-

tionship between inputs and outputs by supervised training algorithm which means for

each input the desired output must be presented to the network (Samarasinghe, 2006).

The most popular supervised training algorithm for training MLP is back-propagation

algorithm (Haykin, 2009). This algorithm is explained in the next section.


2.3.4 Back-propagation Algorithm

The ability to learn from the environment and improve its performance during the train-

ing (learning) process is the most significant property in ANN (Haykin, 2009). Train-

ing allows MLP to extract knowledge that is hidden in the samples. MLP is trained

by supervised algorithms which show the network the desired output for each input.

The training is conducted in two phases. In the first phase which is called the forward

phase, inputs are presented to the network at the input layer and are then passed to the

output layer via the hidden layer(s) to generate network output. In the second phase

which is called the backward phase, the difference between the generated network out-

put and the desired output is calculated and this is called the network error. If this error

is larger than an acceptable threshold, the error is back-propagated through the net-

work to adjust its weights using an appropriate training method to minimize the error.

In general, the training process is repeated until the optimum network performance is

achieved (Samarasinghe, 2006).
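The two phases can be sketched for a tiny fixed-size network. The code below is an illustrative implementation of one forward and one backward pass for a 2-input, 2-hidden-neuron, 1-output sigmoid network trained on the squared error; the starting weights and the training pattern are hypothetical, not the networks used in this research:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def train_step(x, target, w_hid, w_out, lr=0.5):
    """One forward and one backward pass; each weight row is [w1, w2, bias]."""
    # Forward phase: inputs flow through the hidden layer to the output.
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hid]
    y = sigmoid(w_out[0] * h[0] + w_out[1] * h[1] + w_out[2])
    # Backward phase: compute the error and propagate it back.
    err = target - y
    delta_out = err * y * (1.0 - y)          # output delta (sigmoid derivative)
    for j in range(2):                       # hidden deltas use the OLD output weights
        delta_h = delta_out * w_out[j] * h[j] * (1.0 - h[j])
        w_hid[j][0] += lr * delta_h * x[0]
        w_hid[j][1] += lr * delta_h * x[1]
        w_hid[j][2] += lr * delta_h          # bias
    w_out[0] += lr * delta_out * h[0]
    w_out[1] += lr * delta_out * h[1]
    w_out[2] += lr * delta_out               # bias
    return err ** 2

# Hypothetical starting weights and a single training pattern.
w_hid = [[0.1, -0.2, 0.05], [0.3, 0.2, -0.1]]
w_out = [0.2, -0.3, 0.1]
first = train_step([1.0, 0.0], 1.0, w_hid, w_out)
last = first
for _ in range(300):
    last = train_step([1.0, 0.0], 1.0, w_hid, w_out)
print(first, "->", last)  # the squared error shrinks as training repeats
```

Repeating the two phases until the error falls below an acceptable threshold is exactly the loop described above.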

2.3.5 Scaled Conjugate Gradient

The scaled conjugate gradient algorithm (SCG) can be used to train MLP. It uses second-order information from the ANN but requires only O(N) memory, where N is the number of the network's weights (Møller, 1993). Unlike the gradient descent algorithm, which relies on the user-dependent parameters learning rate and momentum, SCG is a fully

automated training algorithm which includes no critical user-dependent parameters

(Møller, 1993).

There are different types of conjugate gradient algorithms. Each algorithm requires a line search in each iteration (Demuth, Beale & Hagan, 2008). This line search is

computationally expensive, because it involves several calculations of either the global

error function or the derivative of the global error function (Møller, 1993). SCG avoids

line search by using the Levenberg-Marquardt approach to scale the step size.


The derivation of this algorithm is found in Møller (1993). Details of the SCG algorithm are as follows:

1. Initialize the weight vector at the first iteration, w1, and set the values of σ > 0, which represents the change in weight for the second-derivative approximation, and λ1 > 0, the parameter for regulating the indefiniteness of the Hessian, and set λ̄1 = 0. Set the initial conjugate direction, p1, and the steepest descent direction, r1, equal to the negative error-surface gradient, p1 = r1 = −E′(w1). Set k = 1 and success = true.

2. If success = true, calculate the second-order information sk:

σk = σ / |pk|, (2.4a)

sk = (E′(wk + σk pk) − E′(wk)) / σk, (2.4b)

δk = pkᵀ sk. (2.4c)

3. Scale sk and δk:

sk = sk + (λk − λ̄k) pk, (2.5a)

δk = δk + (λk − λ̄k) |pk|². (2.5b)

4. If δk ≤ 0, make the Hessian matrix positive definite:

sk = sk + (λk − 2δk/|pk|²) pk, (2.6a)

λ̄k = 2(λk − δk/|pk|²), (2.6b)

δk = −δk + λk |pk|², (2.6c)

λk = λ̄k. (2.6d)

5. Calculate the step size αk:

µk = pkᵀ rk, (2.7a)

αk = µk / δk. (2.7b)

6. Calculate the comparison parameter ∆k:

∆k = 2δk [E(wk) − E(wk + αk pk)] / µk². (2.8)

7. If ∆k ≥ 0, a successful reduction in error can be made:

wk+1 = wk + αk pk, (2.9a)

rk+1 = −E′(wk+1), (2.9b)

λ̄k = 0, (2.9c)

success = true. (2.9d)

7a. If k mod N = 0, restart the algorithm with pk+1 = rk+1; otherwise create a new conjugate direction:

βk = (|rk+1|² − rk+1ᵀ rk) / µk, (2.10a)

pk+1 = rk+1 + βk pk. (2.10b)

7b. If ∆k ≥ 0.75, reduce the scale parameter: λk = 0.5λk.

If ∆k < 0, a reduction in the error is not possible: λ̄k = λk, success = false.

8. If ∆k < 0.25, increase the scale parameter: λk = 4λk.

9. If the steepest descent direction rk ≠ 0, set k = k + 1 and go to step 2; otherwise terminate and return wk+1 as the desired weight vector.
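The steps above can be sketched in code. The following is an illustrative implementation of the procedure for a generic error function E with gradient dE (vectors as plain lists); it is a sketch of Møller's listing with hypothetical default constants, not the exact implementation used in this research:

```python
import math

def scg(E, dE, w, sigma=1e-4, lam=1e-6, max_iter=200):
    """Scaled conjugate gradient following the numbered steps above."""
    def dot(a, b): return sum(ai * bi for ai, bi in zip(a, b))
    def add(a, b, s=1.0): return [ai + s * bi for ai, bi in zip(a, b)]
    def norm(a): return math.sqrt(dot(a, a))

    N = len(w)
    lam_bar = 0.0
    r = [-g for g in dE(w)]          # steepest descent direction r1
    p = r[:]                         # initial conjugate direction p1
    success = True
    for k in range(1, max_iter + 1):
        if norm(r) < 1e-8:           # step 9: gradient vanished, terminate
            break
        if success:                  # step 2: second-order information
            sigma_k = sigma / norm(p)
            s = [(g1 - g0) / sigma_k
                 for g1, g0 in zip(dE(add(w, p, sigma_k)), dE(w))]
            delta = dot(p, s)
        delta += (lam - lam_bar) * dot(p, p)           # step 3: scale delta
        if delta <= 0:               # step 4: force positive definiteness
            lam_bar = 2.0 * (lam - delta / dot(p, p))
            delta = -delta + lam * dot(p, p)
            lam = lam_bar
        mu = dot(p, r)               # step 5: step size
        alpha = mu / delta
        Delta = 2.0 * delta * (E(w) - E(add(w, p, alpha))) / mu ** 2  # step 6
        if Delta >= 0:               # step 7: successful reduction
            w = add(w, p, alpha)
            r_new = [-g for g in dE(w)]
            lam_bar = 0.0
            success = True
            if k % N == 0:           # step 7a: restart
                p = r_new[:]
            else:                    # new conjugate direction
                beta = (dot(r_new, r_new) - dot(r_new, r)) / mu
                p = add(r_new, p, beta)
            r = r_new
            if Delta >= 0.75:        # step 7b: reduce the scale parameter
                lam = 0.5 * lam
        else:                        # no reduction possible this iteration
            lam_bar = lam
            success = False
        if Delta < 0.25:             # step 8: increase the scale parameter
            lam = 4.0 * lam
    return w

# Minimizing a hypothetical quadratic error surface as a smoke test.
E = lambda w: (w[0] - 3.0) ** 2 + 10.0 * (w[1] + 2.0) ** 2
dE = lambda w: [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 2.0)]
print(scg(E, dE, [0.0, 0.0]))  # converges near [3.0, -2.0]
```

On a network, E and dE would be the training error and its gradient with respect to the N weights; note that no learning rate or momentum appears anywhere in the loop.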

Generally, there are many algorithms which can be used in back-propagation learning. These training algorithms differ in how they adjust the weights of the network (including biases) (Haykin, 2009). It is difficult to know

which training algorithm will produce the best accuracy for a given problem. The

most widely used algorithm is the gradient descent algorithm. The standard gradient

descent algorithm (Rumelhart et al., 1986) is generally very slow because it requires

small learning rates for stable learning; momentum variation is introduced to improve

the convergence of the standard algorithm by increasing the learning rates while main-

taining stability, but it is still too slow for many practical applications (Demuth et al.,

2008). The Levenberg-Marquardt training algorithm, due to Levenberg (1944) and Marquardt (1963), is relatively fast for small and medium-sized networks (up to several hundred weights) but needs a considerable amount of memory (Demuth et al.,

2008). If memory is a problem, then the SCG algorithm is recommended (Demuth

et al., 2008). It is fast, has relatively modest memory requirements and performs well

over a wide variety of problems (Demuth et al., 2008). For these reasons, the SCG

algorithm is used to train multilayer perceptron in this research.

2.4 Support Vector Machine

Support Vector Machine (SVM) is a classification method for linear and nonlinear data.

It was presented by Boser, Guyon & Vapnik (1992) based on early work on statistical learning theory (Vapnik & Chervonenkis, 1971). SVMs have been applied successfully in many real-world problems such as handwritten character and digit recognition,

image classification, object recognition, speaker identification and text categorization

(Cristianini & Shawe-Taylor, 2000; Han et al., 2011). The general idea of SVM is

separating two sets of data with the maximum distance between them. The following subsections describe the SVM concepts.


Figure 2.3: Linearly separable training data (Han et al., 2011)

2.4.1 Support Vector Classification

Assume that a two-class classification problem is given as follows. Let D be the data set

which is given as (X1,y1), (X2,y2), ..., (X|D|,y|D|), where each Xi is a training sample with an associated class label yi ∈ {−1,+1}. It is also assumed that each training sample

is represented using two attributes A1 and A2, as shown in Figure 2.3. From the figure,

it can be seen that the training data are linearly separable, because a straight line can

be drawn to separate the two classes.

As shown in Figure 2.3, there are many straight lines that could be drawn to separate the two classes. However, the best straight line (or, more generally, the best hyperplane, regardless of the number of input attributes) is the one that has the

minimum classification error on unseen samples. Therefore, in order to find the best

hyperplane SVM searches for the maximum marginal hyperplane. In Figure 2.4, two

possible separating hyperplanes are shown with their associated margins. Both of these

hyperplanes can classify all the given data samples correctly, but the hyperplane with

the largest margin in Figure 2.4(b) is expected to generalize better because the largest margin gives the largest separation between the classes.

The shortest distance from the maximum marginal hyperplane to one side of its

margin is equal to the shortest distance from the maximum marginal hyperplane to the closest training sample of either class.

Figure 2.4: Possible separating hyperplanes with their associated margins (Han et al., 2011)

For linear classification, the two classes and the

maximum marginal hyperplane separating them can be identified by Eq. (2.11a) and

Eq. (2.11b),

wXi + b ≥ 1 for yi = +1, (2.11a)

wXi + b ≤ −1 for yi = −1, (2.11b)

in which w is a weight vector and b is a bias. By combining Eq. (2.11a) and Eq. (2.11b),

yi(wXi + b) ≥ 1, i = 1, ..., l (2.12)

in which l denotes the number of training samples. The samples that lie on the margin are called support vectors. Figure 2.5 shows one support vector from each class, each encircled with a thicker border.

Figure 2.5: Support vectors (Han et al., 2011)

The distance between the support vectors can be given by Eq. (2.13):

d = 2 / ‖w‖ (2.13)

where ‖w‖ is the Euclidean norm of w.

Better separation between the two classes can be achieved by maximizing d. In

order to maximize d while making sure that all the training samples are on the correct

side of the hyperplane, ‖w‖ must be minimized. This problem can be solved using the Lagrange function L = f − αg, where f is the function to be minimized, equal to (1/2)‖w‖² (a simplification of the weight-vector calculation), α is the Lagrange multiplier, and g is the constraint, equal to yi(wXi + b) − 1. Applying the Lagrange function yields:

L(w,b,α) = (1/2)‖w‖² − ∑_{i=1}^{l} αi [yi(wXi + b) − 1] (2.14)

Eq. (2.14) can be solved by minimizing with respect to w and b and maximizing with respect to the values αi ≥ 0; w is obtained as in Eq. (2.15):

w = ∑_{i=1}^{l} αi yi Xi, αi ≥ 0 (2.15)


Figure 2.6: Two-class nonlinearly separable problem

Given that ∑_{i=1}^{l} αi yi = 0 and αi ≥ 0, Eq. (2.15) can be substituted into the Lagrangian formula Eq. (2.14), which yields:

L(w,b,α) = ∑_{i=1}^{l} αi − (1/2) ∑_{i=1}^{l} ∑_{j=1}^{l} αi αj yi yj Xi · Xj, αi ≥ 0 (2.16)

An upper bound C is introduced for the Lagrange multipliers, where 0 ≤ αi ≤ C. C is called the penalty parameter of the error term and is best determined experimentally.

Next, a sample's class can be obtained by Eq. (2.17):

y(XT) = sign((∑_{i=1}^{l} αi yi Xi) · XT + b), 0 ≤ αi ≤ C (2.17)

in which XT is the sample to be classified.
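Eq. (2.17) can be sketched directly from a set of support vectors. The toy values below are hypothetical, chosen so that they satisfy ∑ αi yi = 0 and place each support vector exactly on its margin; they are not taken from this research:

```python
def svm_predict(support_vectors, labels, alphas, b, x):
    """Decision function of Eq. (2.17) for a linear SVM, given the support
    vectors, their labels y_i in {-1, +1} and multipliers alpha_i."""
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    score = sum(a * y * dot(sv, x)
                for a, y, sv in zip(alphas, labels, support_vectors)) + b
    return 1 if score >= 0 else -1

# Hypothetical toy values: one support vector on each side of the margin,
# giving w = [0.5, 0.5] via Eq. (2.15) and a boundary between the points.
svs    = [[1.0, 1.0], [3.0, 3.0]]
ys     = [-1, +1]
alphas = [0.25, 0.25]
b      = -2.0

print(svm_predict(svs, ys, alphas, b, [4.0, 4.0]))  # 1
print(svm_predict(svs, ys, alphas, b, [0.0, 0.0]))  # -1
```

Only the support vectors (samples with αi > 0) contribute to the sum, which is why the decision function is cheap even for a large training set.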

If the data is not linearly separable, as in Figure 2.6(a), it means that no straight

line can be drawn to separate the two classes. In such cases a simple conversion of

feature space is needed. Point Xi in the input space is mapped to a feature space with

a higher dimension using a nonlinear mapping function φ(.) then a linear separation is

retried in the new space as shown in Figure 2.6(b). Optimum separation is provided by

a nonlinear separating surface in the original space as shown in Figure 2.6(c).

The dot product involved in Eq. (2.16) and Eq. (2.17) is expensive to compute for the transformed data samples. Thus, a kernel function K(Xi,X) = φ(Xi) · φ(X) is used so that all calculations are made in the input space. Many kernels have

been proposed by researchers, such as the Linear, Polynomial, Sigmoid and Radial Basis Function kernels (Hsu, Chang & Lin, 2003).

2.4.2 Radial Basis Function Kernel

There are no rules to determine which kernel will produce the best accuracy, and there is often no large difference between the accuracies the kernels produce (Han et al., 2011). However, the Radial Basis Function (RBF) kernel is suggested to be the first

choice for several reasons (Hsu et al., 2003). Firstly, the RBF kernel can handle the nonlinear relation between class labels and attributes by mapping the samples nonlinearly into a higher-dimensional space. Furthermore, the linear and sigmoid kernels behave like special cases of the RBF kernel for certain parameters. Secondly, the

complexity of the RBF kernel is less than that of the polynomial kernel, because the polynomial kernel has more hyperparameters than the RBF kernel. Finally, there

are fewer numerical difficulties found in the RBF kernel as compared to the polyno-

mial and sigmoid kernels. However, if the number of features is very large, then the

linear kernel is a more suitable choice than the RBF kernel (Hsu et al., 2003). For these

reasons and because the number of features is small in this research, the RBF kernel

function is selected and it is given by:

K(Xi, X) = e^(−γ‖Xi − X‖²), γ > 0 (2.18)

in which γ controls the width of the bell-shaped function.
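Eq. (2.18) is straightforward to compute; a minimal sketch with illustrative points:

```python
import math

def rbf_kernel(xi, x, gamma):
    """RBF kernel of Eq. (2.18): exp(-gamma * ||xi - x||^2), gamma > 0."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, x))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5))  # 1.0 (identical points)
print(rbf_kernel([0.0, 0.0], [3.0, 4.0], gamma=0.1))  # exp(-2.5), about 0.082
```

The value decays from 1 toward 0 as the two points move apart; a larger γ narrows the bell, so only very close samples are considered similar.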

2.5 Content-based Filtering for SMS Spam

Numerous content-based filtering studies in the literature have proposed different types

of models such as the Bayesian variations, C4.5 variations, Graph-Based Model, L-

BFGS algorithm and SVM variations (Hidalgo et al., 2006; Maier & Ferens, 2009;

Khemapatapan, 2010; Almeida et al., 2011; Rafique & Abulaish, 2012; Sohn et al.,


REFERENCES

Almeida, T., Hidalgo, J.M.G. & Silva, T.P. (2013). Towards SMS spam filtering: Re-

sults under a new dataset. International Journal of Information Security Science,

2(1), pp. 1–18.

Almeida, T.A., Hidalgo, J.M.G. & Yamakami, A. (2011). Contributions to the study

of SMS spam filtering: new collection and results. In: Proceedings of the 11th ACM

symposium on Document engineering, ACM, pp. 259–262.

Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

Basheer, I. & Hajmeer, M. (2000). Artificial neural networks: fundamentals, comput-

ing, design, and application. Journal of microbiological methods, 43(1), pp. 3–31.

Bishop, C.M. (1995). Neural Networks for Pattern Recognition. New York, NY, USA:

Oxford University Press, Inc., ISBN 0198538642.

Boser, B.E., Guyon, I.M. & Vapnik, V.N. (1992). A training algorithm for optimal

margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational

Learning Theory, COLT ’92, New York, NY, USA: ACM, ISBN 0-89791-497-X,

pp. 144–152.

Bueti, S. (2005). International Telecommunication Union: WSIS Thematic Meeting

on Cybersecurity, ITU Survey on Anti-Spam Legislation Worldwide.


Chang, C.C. & Lin, C.J. (2011). LIBSVM: A library for support vector machines. ACM

Transactions on Intelligent Systems and Technology, 2, pp. 27:1–27:27, software

available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Chen, T. & Kan, M.Y. (2013). Creating a live, public short message service corpus:

The nus sms corpus. Language Resources and Evaluation, 47(2), pp. 299–335.

Cisco (2005). SMS Spam and Fraud Prevention.

Cristianini, N. & Shawe-Taylor, J. (2000). An introduction to support vector ma-

chines and other kernel-based learning methods. Cambridge university press, ISBN

9780521780193.

Delany, S.J., Buckley, M. & Greene, D. (2012). SMS spam filtering: methods and data.

Expert Systems with Applications, 39(10), pp. 9899–9908.

Demuth, H., Beale, M. & Hagan, M. (2008). Neural network toolbox 6. User’s guide.

Deng, W.W. & Peng, H. (2006). Research on a naive bayesian based short message fil-

tering system. In: Machine learning and cybernetics, 2006 international conference

on, IEEE, pp. 1233–1237.

El-Alfy, E.S. & Al-Qunaieer, F.S. (2008). A fuzzy similarity approach for auto-

mated spam filtering. In: Computer Systems and Applications, 2008. AICCSA 2008.

IEEE/ACS International Conference on, IEEE, pp. 544–550.

GooglePlay (n.d.). Retrieved on December 2, 2014 from https://play.google.com/store.

GSMA (2011). SMS Spam and Mobile Messaging Attacks: Introduction, Trends And

Examples. GSMA: Spam Reporting Service.

GSMA (2013). GSMA: The Mobile Economy 2013.

Guzella, T.S. & Caminhas, W.M. (2009). A review of machine learning approaches to

spam filtering. Expert Systems with Applications, 36(7), pp. 10206–10222.


Han, J., Kamber, M. & Pei, J. (2011). Data mining: concepts and techniques. Morgan Kaufmann.

Hart, K. (2008). Advertising Sent To Cellphones Opens New Front In

War on Spam. Washington Post. Retrieved on December 2, 2014 from http://articles.washingtonpost.com/2008-03-10/news/36788122_1_text-messages-spam-e-mail-block-texts.

Haykin, S. (2009). Neural networks and learning machines, volume 3. Pearson Edu-

cation Upper Saddle River.

Hidalgo, J., Bringas, G. & Sánz, E. (2006). Content based SMS spam filtering. In:

Proceedings of the 2006 ACM symposium on Document engineering, ACM, pp. 107–

114.

Hsu, C.W., Chang, C.C. & Lin, C.J. (2003). A practical guide to support vector classi-

fication.

Japkowicz, N. & Shah, M. (2011). Evaluating Learning Algorithms. Cambridge Uni-

versity Press.

Joe, I. & Shim, H. (2010). An sms spam filtering system using support vector machine.

In: Future Generation Information Technology, Springer, pp. 577–584.

Jurman, G., Riccadonna, S. & Furlanello, C. (2012). A comparison of mcc and cen

error measures in multi-class prediction. PloS one, 7(8), p. e41882.

Karami, A. & Zhou, L. (2014). Improving static sms spam detection by using new

content-based features.

Kharif, O. (2012). Mobile Spam Texts Hit 4.5 Billion Raising Consumer Ire. Bloomberg Businessweek News. Retrieved on December 2, 2014 from http://www.businessweek.com/news/2012-04-30/mobile-spam-texts-hit-4-dot-5-billion-raising-consumer-ire.


Khemapatapan, C. (2010). Thai-english spam sms filtering. In: Communications

(APCC), 2010 16th Asia-Pacific Conference on, IEEE, pp. 226–230.

Klir, G.J. & Yuan, B. (1995). Fuzzy Sets and Fuzzy Logic: Theory and Applications.

Upper Saddle River, NJ, USA: Prentice-Hall, Inc., ISBN 0-13-101171-5.

Kotsiantis, S., Kanellopoulos, D. & Pintelas, P. (2006). Data preprocessing for super-

vised leaning. International Journal of Computer Science, 1(2), pp. 111–117.

Levenberg, K. (1944). A method for the solution of certain problems in least squares.

Quarterly of applied mathematics, 2, pp. 164–168.

Liu, G. & Yang, F. (2012). The application of data mining in the classification of

spam messages. In: Computer Science and Information Processing (CSIP), 2012

International Conference on, IEEE, pp. 1315–1317.

Liu, Y., Loh, H.T. & Sun, A. (2009). Imbalanced text classification: A term weighting

approach. Expert systems with Applications, 36(1), pp. 690–701.

Mahmoud, T.M. & Mahfouz, A.M. (2012). SMS spam filtering technique based on

artificial immune system. IJCSI International Journal of Computer Science Issues,

9(1).

Maier, J. & Ferens, K. (2009). Classification of english phrases and sms text messages

using bayes and support vector machine classifiers. In: Electrical and Computer

Engineering, 2009. CCECE’09. Canadian Conference on, IEEE, pp. 415–418.

Marquardt, D.W. (1963). An algorithm for least-squares estimation of nonlinear pa-

rameters. Journal of the Society for Industrial & Applied Mathematics, 11(2), pp.

431–441.

Mathew, K. & Issac, B. (2011). Intelligent spam classification for mobile text mes-

sage. In: Computer Science and Network Technology (ICCSNT), 2011 International

Conference on, volume 1, IEEE, pp. 101–105.


Møller, M.F. (1993). A scaled conjugate gradient algorithm for fast supervised learn-

ing. Neural networks, 6(4), pp. 525–533.

Mujtaba, G. & Yasin, M. (2014). SMS Spam Detection Using Simple Message Content

Features.

Murynets, I. & Jover, R.P. (2012). Crime scene investigation: SMS spam data analysis, pp. 441–452.

Najadat, H., Abdulla, N., Abooraig, R. & Nawasrah, S. (2014). Mobile SMS Spam

Filtering based on Mixing Classifiers.

Nuruzzaman, M.T., Lee, C., bin Abdullah, M.F.A. & Choi, D. (2012). Simple SMS

spam filtering on independent mobile phone. Security and Communication Networks,

5(10), pp. 1209–1220.

Nuruzzaman, M.T., Lee, C. & Choi, D. (2011). Independent and personal SMS spam

filtering. In: Computer and Information Technology (CIT), 2011 IEEE 11th Interna-

tional Conference on, IEEE, pp. 429–435.

Ogura, H., Amano, H. & Kondo, M. (2011). Comparison of metrics for feature selec-

tion in imbalanced text classification. Expert Systems with Applications, 38(5), pp.

4978–4989.

Özkan, C. & Erbek, F.S. (2003). The comparison of activation functions for multi-

spectral landsat tm image classification. Photogrammetric Engineering & Remote

Sensing, 69(11), pp. 1225–1234.

Porter, M.F. (1980). An algorithm for suffix stripping. Program: electronic library and

information systems, 14(3), pp. 130–137.

Rafique, M.Z. & Abulaish, M. (2012). Graph-based learning model for detection of

SMS spam on smart phones. In: Wireless Communications and Mobile Computing

Conference (IWCMC), 2012 8th International, IEEE, pp. 1046–1051.

Rocchio, J.J. (1971). Relevance feedback in information retrieval.


Rumelhart, D.E., McClelland, J.L. & PDP Research Group, C. (1986). Parallel Dis-

tributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foun-

dations. Cambridge, MA, USA: MIT Press, ISBN 0-262-68053-X.

Salton, G., Wong, A. & Yang, C.S. (1975). A vector space model for automatic index-

ing. Communications of the ACM, 18(11), pp. 613–620.

Samarasinghe, S. (2006). Neural networks for applied sciences and engineering: from

fundamentals to complex pattern recognition. CRC Press.

Shahi, T.B. & Yadav, A. (2013). Mobile SMS Spam Filtering for Nepali Text Using

Naïve Bayesian and Support Vector Machine. International Journal of Intelligence

Science, 4, p. 24.

Shang, W., Huang, H., Zhu, H., Lin, Y., Qu, Y. & Wang, Z. (2007). A novel feature

selection algorithm for text categorization. Expert Systems with Applications, 33(1),

pp. 1–5.

Shukla, A., Tiwari, R. & Kala, R. (2012). Real life applications of soft computing.

CRC Press.

Sohn, D.N., Lee, J.T., Han, K.S. & Rim, H.C. (2012). Content-based mobile spam clas-

sification using stylistically motivated features. Pattern Recognition Letters, 33(3),

pp. 364–369.

Sohn, D.N., Lee, J.T. & Rim, H.C. (2009). The contribution of stylistic information

to content-based mobile spam filtering. In: Proceedings of the ACL-IJCNLP 2009

Conference Short Papers, Association for Computational Linguistics, pp. 321–324.

Tagg, C. (2009). A corpus linguistics study of SMS text messaging. Ph.D. thesis, Uni-

versity of Birmingham.

Uysal, A.K., Gunal, S., Ergin, S. & Gunal, E.S. (2012). A novel framework for SMS

spam filtering. In: Innovations in Intelligent Systems and Applications (INISTA),

2012 International Symposium on, IEEE, pp. 1–4.


Vapnik, V.N. & Chervonenkis, A.Y. (1971). On the uniform convergence of relative

frequencies of events to their probabilities. Theory of Probability & Its Applications,

16(2), pp. 264–280.

Widyantoro, D.H. & Yen, J. (2000). A fuzzy similarity approach in text classification

task. In: IEEE International conference on fuzzy systems, volume 2, pp. 653–658.

Yadav, K., Kumaraguru, P., Goyal, A., Gupta, A. & Naik, V. (2011). SMSAssassin:

Crowdsourcing driven mobile-based system for SMS spam filtering. In: Proceedings

of the 12th Workshop on Mobile Computing Systems and Applications, ACM, pp. 1–

6.

Zadeh, L.A. (1965). Fuzzy sets. Information and control, 8(3), pp. 338–353.

Zadeh, L.A. (1994). Fuzzy logic, neural networks, and soft computing. Communica-

tions of the ACM, 37(3), pp. 77–84.

Zhang, J., Li, X., Xu, W. & Li, C. (2011). Filtering algorithm of spam short messages

based on artificial immune system. In: Electrical and Control Engineering (ICECE),

2011 International Conference on, IEEE, pp. 195–198.

Zimmermann, H.J. (2010). Fuzzy set theory. Wiley Interdisciplinary Reviews: Compu-

tational Statistics, 2(3), pp. 317–332.

Zobel, J. & Moffat, A. (1998). Exploring the similarity space. In: ACM SIGIR Forum,

volume 32, ACM, pp. 18–34.