Mobile Phone Spam Image Detection based on Graph Partitioning with Pyramid Histogram of Visual Words Image Descriptor
SO YEON KIM, KYUNG-AH SOHN
DEPARTMENT OF INFORMATION AND COMPUTER ENGINEERING, AJOU UNIVERSITY
Contents
Introduction
• Motivation and Challenges
Methods
• Feature extraction
• Database construction
• Image classification and evaluation
Results
• Dataset
• Performance comparison in 5-fold cross validation
• Averaged performance comparison in optimal parameter
• Misclassified samples in best-performing cluster
Conclusion
• Summary and future works
Introduction
We want to detect spam images received on mobile phones.
• 66 spam images
• 405 non-spam images
• 377 images for training (80%)
• 94 images for test (20%)
This data set is too small to train a predictive model. However, some e-mail spam images have features similar to mobile phone spam images.
Methods Feature extraction
◦ RGB histogram
◦ Pyramid Histogram of Visual Words (PHOW) – color mode: gray / RGB / opponent
Pipeline (from the diagram):
◦ Input image → SIFT feature extraction → Bag of visual words → Concatenation of spatial histograms → Feature vector
◦ Input image → RGB histogram feature extraction → Feature vector
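The RGB histogram feature can be illustrated with a minimal NumPy sketch. The bin count, the normalization, and the function name `rgb_histogram` are assumptions made for illustration; the slides do not state them. The PHOW branch is not shown because it needs a dense SIFT implementation.

```python
import numpy as np

def rgb_histogram(image, bins=8):
    """Concatenate per-channel intensity histograms of an H x W x 3 uint8 image.

    The bin count (8 per channel) is an assumed value, not taken from the slides.
    """
    channels = []
    for c in range(3):
        counts, _ = np.histogram(image[..., c], bins=bins, range=(0, 256))
        channels.append(counts)
    feature = np.concatenate(channels).astype(np.float64)
    total = feature.sum()
    # Normalize so images of different sizes give comparable vectors.
    return feature / total if total > 0 else feature

# Toy input: a random 32 x 32 RGB image standing in for a real spam image.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
feature = rgb_histogram(image)
print(feature.shape)
```

Unlike PHOW, this feature carries no spatial information: two images with the same colors in different layouts map to the same vector.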
Methods Database construction
◦ K-means clustering
◦ Elkan k-means clustering algorithm
◦ K-means++ algorithm for initializing centroids
Pipeline (from the diagram):
◦ Mobile phone spam images and e-mail spam images → Feature vectors
◦ Euclidean distance matrix (e-mail images × phone images)
◦ K-means clustering → Most similar e-mail images
◦ Phone images + most similar e-mail images → Phone + Email Spam Image Dataset
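The clustering step can be sketched as follows. This is a rough illustration under stated assumptions: it uses plain Lloyd iterations in place of Elkan's accelerated variant (which yields the same clustering with fewer distance computations), and the rule "keep e-mail images that share a cluster with a phone image" is one plausible reading of the diagram, not a confirmed detail. The data and the names `kmeans_pp_init`, `kmeans`, and `selected` are hypothetical.

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: sample each new centroid with probability
    proportional to its squared distance from the nearest chosen centroid."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        chosen = np.array(centroids)
        d2 = ((X[:, None, :] - chosen[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        total = d2.sum()
        probs = d2 / total if total > 0 else np.full(len(X), 1.0 / len(X))
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.array(centroids)

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's algorithm with k-means++ seeding (not Elkan's variant)."""
    rng = np.random.default_rng(seed)
    centroids = kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy feature vectors: phone spam, similar e-mail spam, dissimilar e-mail spam.
rng = np.random.default_rng(1)
phone = rng.normal(0.0, 0.1, size=(5, 4))
email_near = rng.normal(0.0, 0.1, size=(6, 4))
email_far = rng.normal(5.0, 0.1, size=(6, 4))
X = np.vstack([phone, email_near, email_far])

labels, _ = kmeans(X, k=2)
phone_clusters = set(labels[:len(phone)].tolist())
email_labels = labels[len(phone):]
# Assumed selection rule: keep e-mail images clustered together with phone images.
selected = [i for i, lab in enumerate(email_labels) if lab in phone_clusters]
print(selected)
```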
Methods Database construction
◦ Spectral clustering
Pipeline (from the diagram):
◦ Mobile phone spam images and e-mail spam images → Feature vectors
◦ Euclidean distance matrix (e-mail images × phone images)
◦ … similarity matrix (e-mail images × e-mail images)
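The two matrices in the diagram can be sketched in NumPy. The slides do not say how distances are turned into similarities; a Gaussian kernel with width `sigma` is assumed here because σ values appear in the results, and that reading is a guess. The function names and toy data are hypothetical.

```python
import numpy as np

def pairwise_euclidean(A, B):
    """Euclidean distance between every row of A and every row of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def gaussian_similarity(D, sigma):
    """Assumed conversion from distance to similarity: exp(-d^2 / (2 sigma^2))."""
    return np.exp(-(D ** 2) / (2.0 * sigma ** 2))

# Toy feature vectors standing in for real image descriptors.
rng = np.random.default_rng(0)
email = rng.random((4, 8))
phone = rng.random((3, 8))

# E-mail x phone distance matrix, as in the diagram.
D_email_phone = pairwise_euclidean(email, phone)

# E-mail x e-mail similarity matrix; sigma=0.3 is one of the values on the slides.
W = gaussian_similarity(pairwise_euclidean(email, email), sigma=0.3)
print(D_email_phone.shape, W.shape)
```

A smaller `sigma` makes the similarity fall off faster with distance, so only very close images remain strongly connected.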
Methods Database construction
◦ Spectral clustering (normalized cut)
◦ Phone images + similar e-mail sub-set → Phone + Email Spam Image Dataset
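A two-way normalized cut can be sketched as below: a minimal example of the standard relaxation, which thresholds the second eigenvector of the normalized graph Laplacian. Whether the authors used a two-way cut or a multi-way variant is not stated, and the toy points, the kernel width, and the function name are assumptions.

```python
import numpy as np

def normalized_cut_bipartition(W):
    """Split a graph in two using the relaxed normalized cut.

    W is a symmetric similarity matrix with positive row sums.
    """
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # The second-smallest eigenvector, mapped back through D^{-1/2},
    # is the relaxed cut indicator.
    indicator = d_inv_sqrt * eigvecs[:, 1]
    return (indicator > 0).astype(int)

# Toy 1-D points forming two well-separated groups.
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
D = np.abs(points[:, None] - points[None, :])
sigma = 2.0  # toy kernel width, not a value from the slides
W = np.exp(-(D ** 2) / (2.0 * sigma ** 2))

labels = normalized_cut_bipartition(W)
print(labels)
```

The sign of an eigenvector is arbitrary, so the labels 0 and 1 may swap between runs or platforms; only the grouping is meaningful.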
Methods Image classification and evaluation
◦ SVM classification: spam / ham
◦ Phone + Email Spam Image Dataset → Training set / Test set (legend: e-mail, phone)
◦ 5-fold cross validation: 80% training, 20% test
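The 80% / 20% split can be sketched as a plain 5-fold partition of the 471 phone images. Two assumptions are made here: that only phone images are split into folds, and that the selected e-mail spam images are added to the training portion of every fold. The slides suggest this but do not confirm it. The classifier is left out; an SVM implementation such as scikit-learn's `SVC` would be trained inside the loop.

```python
import numpy as np

def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs covering every sample exactly once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, n_folds)
    for i in range(n_folds):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, test_idx

n_phone = 66 + 405  # phone spam + phone non-spam images from the slides
splits = list(five_fold_splits(n_phone))
for train_idx, test_idx in splits:
    # Assumed protocol: e-mail spam features would be appended to the
    # training rows here, and the test rows stay phone-only.
    print(len(train_idx), len(test_idx))
```

Since 471 is not divisible by 5, one fold holds 95 test images and the other four hold 94, which matches the roughly 377 / 94 split quoted in the introduction.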
Results Dataset
◦ Similar sub-set of e-mail spam images from Image Spam Hunter dataset.

                          Phone   E-mail   Total
Spam (RGB histogram)         66       12      78
Spam (PHOW-gray)             66      201     267
Spam (PHOW-RGB)              66       20      86
Spam (PHOW-opponent)         66      324     390
Non-spam                    405        -     405

E-mail column: similar sub-set from spectral clustering.
Results Performance comparison in 5-fold cross validation
◦ Evaluation measure

Confusion matrix
                    Predicted Spam   Predicted Non-spam
Actual Spam               TP                 FN
Actual Non-spam           FP                 TN
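The four measures reported in the results can be computed from the confusion matrix as sketched below. The slides do not show the formulas, so this uses the standard definitions and treats F-score as F1. The counts in the example are hypothetical, not the authors' results.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard measures from confusion-matrix counts (spam = positive class).

    Assumes every denominator is non-zero; no zero-division guards are included.
    """
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)   # recall on spam
    specificity = tn / (tn + fp)   # recall on non-spam
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
    }

# Hypothetical counts for one 94-image test fold.
metrics = classification_metrics(tp=12, fn=1, fp=4, tn=77)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With only 66 spam images against 405 non-spam images, accuracy alone can look high even when most spam is missed, which is why sensitivity and F-score are reported alongside it.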
Results Sample e-mail spam images
◦ These images are correctly grouped into the same cluster with the PHOW descriptor but fall into different clusters with the RGB histogram feature.
The PHOW descriptor considers not only the color distribution but also the geometric information of images.
Results Averaged performance comparison in optimal parameter
PHOW descriptors outperform the RGB histogram feature.
The color mode of the PHOW descriptor does not affect the performance significantly.
Results Averaged performance comparison in k-means clustering

              RGB histogram   PHOW (gray)   PHOW (RGB)   PHOW (opponent)   random (10%)
Accuracy      73.47%          95.12%        95.54%       94.27%            72.25%
Sensitivity   42.42%          92.42%        92.42%       87.91%            32.03%
Specificity   78.52%          95.56%        96.05%       95.31%            78.81%
F-score       30.73%          84.19%        85.49%       81.15%            24.14%
Results Averaged performance comparison in spectral clustering

              RGB histogram   PHOW (gray)   PHOW (RGB)   PHOW (opponent)   random (10%)
Accuracy      81.75%          96.39%        96.82%       96.39%            72.25%
Sensitivity   30.55%          95.45%        87.91%       84.95%            32.03%
Specificity   90.12%          96.54%        98.27%       98.27%            78.81%
F-score       32.31%          88.28%        88.48%       86.76%            24.14%

Parameter labels in the header: σ=0.3, σ=0.6 (column assignment unclear).
Results Misclassified samples in best-performing cluster
(a) False positives (FP) (b) False negatives (FN)
Conclusion
We proposed a mobile phone spam image filtering system that uses a large set of e-mail spam images to address the shortage of phone spam image data.
We compared phone spam image classification performance using the RGB histogram and the PHOW descriptor with various color modes (gray, RGB, opponent).
The PHOW descriptor, which considers both geometric and color information, can improve the performance.
An advanced clustering technique such as spectral clustering has a positive impact on performance.