CSc288 Term Project Data mining on predict Voice-over-IP Phones market ----- Huaqin Xu.

Page 1

CSc288 Term Project Data mining on predict Voice-over-IP Phones market

----- Huaqin Xu

Page 2

Agenda

  Abstract
  Introduction
  Methodology
  Result
  Conclusion
  Learning Experience
  References

Page 3

Abstract

This project is based on a VoIP survey data set. Classifiers from the Weka Explorer are used as the data mining tool to build models that predict potential customers of VoIP phones and identify the most important features and services of two VoIP phone models.

Page 4

Introduction

Background
  VoIP phones have a potential market opportunity given the wide use of Internet service.
  Two VoIP phone models: Basic and Deluxe

Data mining scope
  Customer
  Product features and services

Page 5

Methodology

Data mining tools
  C4.5/C5.0, Cubist
  Weka
  Microsoft SQL Server
  SPSS

Chose: Weka Explorer

Why? Free, easy, good interface, more choices

Page 6

Methodology

Explorer Vs KnowledgeFlow

Page 7

Methodology

Dataset: 94 instances in total

Page 8

Methodology

Preprocessing

Split table
  Customer: 17 attributes
  Basic-model: 14 attributes
  Deluxe-model: 10 attributes

Processing missing data
  Delete
  Replace with "?"

Transfer data type
  SPSS to Excel to Weka
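The last two preprocessing steps (marking missing values with "?" and moving the data into Weka) can be sketched as a small ARFF writer. This is only an illustration, not the procedure used in the project: the attribute names, the class label `Would_Not_Buy`, and the two sample rows are hypothetical, and only nominal attributes are handled.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text for nominal attributes.

    attributes: list of (name, [allowed nominal values]) pairs.
    rows: list of dicts; an absent, None, or empty value is written as "?".
    """
    lines = ["@relation %s" % relation, ""]
    for name, values in attributes:
        lines.append("@attribute %s {%s}" % (name, ",".join(values)))
    lines += ["", "@data"]
    for row in rows:
        cells = []
        for name, _ in attributes:
            value = row.get(name)
            # ARFF uses a single question mark for a missing value.
            cells.append("?" if value in (None, "") else str(value))
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"


# Hypothetical attributes and survey rows, for illustration only.
attributes = [
    ("gender", ["Male", "Female"]),
    ("know_voip", ["yes", "no"]),
    ("buy", ["Would_Buy", "Would_Not_Buy"]),
]
rows = [
    {"gender": "Male", "know_voip": "yes", "buy": "Would_Buy"},
    {"gender": "Female", "know_voip": "", "buy": "Would_Not_Buy"},
]
arff_text = to_arff("customer", attributes, rows)
print(arff_text)
```

The second row has no answer for `know_voip`, so it is written as `Female,?,Would_Not_Buy`, which Weka reads as a missing value.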

Page 9

Methodology

Algorithm selection
  Classification
  Clustering
  Association

Chose: NNge

Why?
  High accuracy rate
  Simple, clear rules

Algorithm          Correct instances (%)
NaiveBayes         63.82
DecisionStump      65.95
Id3                84.04
J48                75.53
NBTree             79.78
ConjunctiveRule    69.14
DecisionTable      80.85
NNge               87.23
OneR               71.27
PART               72.34
Prism              88.29
Ridor              71.27
JRip               74.46
ZeroR              63.83
AdaBoostM1         65.95
BayesNet           60.63
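To make the comparison easier to read, the accuracies above can be ranked and set against ZeroR, which always predicts the majority class and so serves as a baseline. The sketch below only rearranges the figures from the table. The conversion to instance counts assumes each percentage was measured on all 94 instances, which the slides do not state explicitly.

```python
# Accuracy figures copied from the table above (percent correct).
accuracy = {
    "NaiveBayes": 63.82, "DecisionStump": 65.95, "Id3": 84.04, "J48": 75.53,
    "NBTree": 79.78, "ConjunctiveRule": 69.14, "DecisionTable": 80.85,
    "NNge": 87.23, "OneR": 71.27, "PART": 72.34, "Prism": 88.29,
    "Ridor": 71.27, "JRip": 74.46, "ZeroR": 63.83, "AdaBoostM1": 65.95,
    "BayesNet": 60.63,
}
TOTAL_INSTANCES = 94  # assumption: every run used the full dataset

baseline = accuracy["ZeroR"]
ranked = sorted(accuracy.items(), key=lambda item: item[1], reverse=True)

# Approximate number of correctly classified instances per algorithm.
correct_counts = {
    name: round(pct * TOTAL_INSTANCES / 100) for name, pct in accuracy.items()
}

for name, pct in ranked:
    print(f"{name:<16}{pct:6.2f}  {pct - baseline:+6.2f}  {correct_counts[name]:3d}")
```

Read this way, Prism and NNge lead the table, separated by about one instance out of 94, and BayesNet falls below the ZeroR baseline.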

Page 10

Methodology

NNge classifier
  Nearest-neighbor-like algorithm using non-nested generalized exemplars.
  A rule-based classifier that builds a sort of "hypergeometric" model (the generalized exemplars are hyperrectangles that can be read as if-then rules).
  Shows promise as an ML method that performs well on a wide range of datasets.

Page 11

Result

Page 12

Result

Page 13

Result

Rules:

One of the customer rules:

class Would_Buy IF:
  cost in {10-20}
  ^ phone in {yes}
  ^ email in {yes}
  ^ fax in {no}
  ^ chat in {yes,no}
  ^ other in {no}
  ^ service type in {Phone_cards_only}
  ^ price in {Somewhat_Dissatisfied, Somewhat_Satisfied}
  ^ voice_quality in {Somewhat_Dissatisfied, Somewhat_Satisfied}
  ^ service in {Somewhat_Dissatisfied}
  ^ convenience in {Somewhat_Satisfied}
  ^ promotion in {Somewhat_Dissatisfied}
  ^ Know VoIP in {yes,no}
  ^ marital status in {Single}
  ^ gender in {Male}
  (11)
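A rule of this shape is a conjunction of "attribute in set" tests, so it can be checked against a survey response with a few lines of code. The sketch below encodes the rule above and counts how many attributes fall outside it. The mismatch count is a simplified, unweighted stand-in for the distance NNge uses to find the nearest exemplar, not Weka's actual computation, and the respondent is hypothetical.

```python
# The Would_Buy rule from the slide: attribute -> set of allowed values.
RULE = {
    "cost": {"10-20"},
    "phone": {"yes"},
    "email": {"yes"},
    "fax": {"no"},
    "chat": {"yes", "no"},
    "other": {"no"},
    "service type": {"Phone_cards_only"},
    "price": {"Somewhat_Dissatisfied", "Somewhat_Satisfied"},
    "voice_quality": {"Somewhat_Dissatisfied", "Somewhat_Satisfied"},
    "service": {"Somewhat_Dissatisfied"},
    "convenience": {"Somewhat_Satisfied"},
    "promotion": {"Somewhat_Dissatisfied"},
    "Know VoIP": {"yes", "no"},
    "marital status": {"Single"},
    "gender": {"Male"},
}


def mismatches(rule, instance):
    """Count attributes whose value lies outside the rule's allowed set."""
    return sum(1 for attr, allowed in rule.items()
               if instance.get(attr) not in allowed)


def covers(rule, instance):
    """A rule covers an instance when every attribute test passes."""
    return mismatches(rule, instance) == 0


# Hypothetical respondent, chosen so that every test passes.
respondent = {
    "cost": "10-20", "phone": "yes", "email": "yes", "fax": "no",
    "chat": "no", "other": "no", "service type": "Phone_cards_only",
    "price": "Somewhat_Satisfied", "voice_quality": "Somewhat_Satisfied",
    "service": "Somewhat_Dissatisfied", "convenience": "Somewhat_Satisfied",
    "promotion": "Somewhat_Dissatisfied", "Know VoIP": "yes",
    "marital status": "Single", "gender": "Male",
}
# Same respondent with one attribute changed.
other = dict(respondent, gender="Female")

print(covers(RULE, respondent), mismatches(RULE, other))
```

An instance that no rule covers exactly would, under this simplification, take the class of the rule with the fewest mismatches.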

Page 14

Result

Statistics:
  Classes allocation
  Feature weights

Page 15

Result

Basic-model & Deluxe-model

Scheme: meta.AttributeSelectedClassifier

Sub-scheme: rules.NNge

Selected attributes: 3,6,8,10,11,12 : 6

Why? To avoid overfitting
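The effect of attribute selection is that the classifier only sees the chosen columns. A minimal sketch of that projection follows, assuming the indices on the slide are 1-based (as Weka prints them) and apply to the 14-attribute Basic-model table. The placeholder values `a1` to `a14` are hypothetical.

```python
SELECTED = [3, 6, 8, 10, 11, 12]  # indices as listed on the slide


def project(instance, selected, one_based=True):
    """Keep only the selected attribute positions of one instance."""
    offset = 1 if one_based else 0
    return [instance[i - offset] for i in selected]


# Hypothetical 14-attribute instance; a1..a14 stand in for real values.
row = ["a%d" % i for i in range(1, 15)]
reduced = project(row, SELECTED)
print(reduced)
```

Training NNge on six attributes instead of all of them gives it fewer ways to fit noise in a 94-instance dataset, which is the overfitting concern named on the slide.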

Page 16

Result

Evaluation

Ten-fold cross-validation

Summary
  Correctly classified instances > 85%

Detailed accuracy by class
  TP, FP, precision, recall, F-measure

Confusion matrix
  Misclassified instances: 12 of 94
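The evaluation figures above are related by simple arithmetic, sketched below. The accuracy uses the 12 and 94 from the slide. The confusion matrix cells are hypothetical (the slides do not give them) and are chosen only so that they sum to 94 with 12 off-diagonal entries; a two-class problem is also an assumption.

```python
TOTAL, MISCLASSIFIED = 94, 12                   # from the slide
accuracy = (TOTAL - MISCLASSIFIED) / TOTAL      # 82 / 94


def per_class_metrics(matrix, k):
    """Metrics for class k, where matrix[i][j] counts instances of
    actual class i that were predicted as class j."""
    n = len(matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp
    fp = sum(matrix[i][k] for i in range(n)) - tp
    tn = sum(map(sum, matrix)) - tp - fn - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0     # same as the TP rate
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"tp_rate": recall, "fp_rate": fp_rate, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# HYPOTHETICAL confusion matrix: rows = actual class, columns = predicted.
matrix = [[55, 5],
          [7, 27]]
metrics = per_class_metrics(matrix, 0)

print(f"accuracy = {accuracy:.4f}")
for name, value in metrics.items():
    print(f"{name} = {value:.3f}")
```

With 12 of 94 instances misclassified, 82 are correct, which is about 87.23% and matches the NNge figure reported earlier.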

Page 17

Result

Page 18

Conclusion

Limitations
  Small datasets
  Incomplete data source

Models
  High accuracy rate
  Help further market analysis
  Help product design

Page 19

Learning Experience

Process a real data mining problem

Know classification algorithms better
  Numeric, nominal
  Missing data
  Overfitting

Know evaluation methods better
  How to compare algorithms
  Evaluation factors

Page 20

Learning Experience

Learn how to use Weka
  Future work: learn how to modify the source to perform better data mining

Learn from classmates

Page 21

References

"Data Mining: Concepts and Techniques" by Jiawei Han and Micheline Kamber, Morgan Kaufmann, 2001.

"Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations" by Ian H. Witten and Eibe Frank, Morgan Kaufmann, 2000.

Machine Learning: Weka home page, http://www.cs.waikato.ac.nz/~ml/index.html

"Marketing Research" by David A. Aaker, V. Kumar and George S. Day, eighth edition, Wiley, 2004.

Page 22

Thank you