Data Mining – analyse Bank Marketing Data Set

Data Mining – analyse Bank Marketing Data Set by WEKA. Exploratory project by Mateusz Brzoska, Middlesex University, 2015.


1

Data Mining – analyse Bank Marketing Data Set by WEKA

EXPLORATORY PROJECT BY MATEUSZ BRZOSKA

MIDDLESEX UNIVERSITY 2015

2

Abstract / Aims / Objectives

Aims

- To study techniques and methodologies in data mining
- To analyse a data set of interest for clustering, classification, learning dependencies and prediction
- To process the data and achieve a satisfactory final result

Objectives

- To study Knowledge Discovery in Databases (KDD)
- To understand the need for analysis of large, complex, information-rich data sets
- To provide essential information and demonstrate relevant algorithms and techniques

3

Bank Marketing Data Set

“The data come from marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact to the same client was required, in order to assess if the product (bank term deposit) would be ('yes') or not ('no') subscribed.”

41188 instances / 11 inputs

Goal: predict whether the client will subscribe to a term deposit (yes/no)

4

Knowledge Discovery in Databases

The KDD process consists of the following steps (see the picture):

Selection of data which are relevant to the analysis task

Preprocessing of these data, including tasks like data cleaning and data integration

Transformation of the data into forms appropriate for mining

Application of Data Mining algorithms for the extraction of patterns

Interpretation/evaluation of the generated patterns so as to identify those patterns that represent real knowledge, based on some interestingness measures.
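The five KDD steps listed above can be illustrated with a minimal Python sketch. It is not how WEKA implements them: the records, the `note` field, the age threshold of 60 and all function names are hypothetical and chosen only to make each step visible.

```python
from collections import Counter

# Hypothetical toy records; field names mirror the data set's attributes,
# but every value here is invented for illustration.
RAW = [
    {"age": "58", "job": "retired", "contact": "cellular", "y": "yes", "note": "a"},
    {"age": "35", "job": "admin.", "contact": "telephone", "y": "no", "note": "b"},
    {"age": "", "job": "services", "contact": "telephone", "y": "no", "note": "c"},
    {"age": "71", "job": "retired", "contact": "cellular", "y": "yes", "note": "d"},
    {"age": "42", "job": "admin.", "contact": "cellular", "y": "no", "note": "e"},
]

def select(records, fields):
    """Selection: keep only the fields relevant to the analysis task."""
    return [{f: r[f] for f in fields} for r in records]

def preprocess(records):
    """Preprocessing: drop records that contain a missing value."""
    return [r for r in records if all(v != "" for v in r.values())]

def transform(records):
    """Transformation: discretise age into two bands (threshold is an assumption)."""
    return [{**r, "age": "60+" if int(r["age"]) >= 60 else "<60"} for r in records]

def mine(records, attribute, target="y"):
    """Data mining: for each attribute value, find the most frequent class."""
    counts = {}
    for r in records:
        counts.setdefault(r[attribute], Counter())[r[target]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}

def evaluate(records, rules, attribute, target="y"):
    """Evaluation: fraction of records that the mined rules classify correctly."""
    hits = sum(rules[r[attribute]] == r[target] for r in records)
    return hits / len(records)

data = transform(preprocess(select(RAW, ["age", "job", "contact", "y"])))
rules = mine(data, "age")
accuracy = evaluate(data, rules, "age")
print(rules, accuracy)
```

Evaluating on the same records that were mined overstates accuracy; a real study would hold out test data or use cross-validation.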

5

Data Mining Overview

we "sink" in electronic data

data mining technology can extract knowledge from it

it lets us efficiently and rationally utilize the knowledge in the collected data

"a process of automatic discovery of non-trivial, previously unknown, potentially useful rules, dependencies, patterns, similarities and trends in large data repositories."

6

Data Mining Methods

Discovering association rules

methods of discovering interesting relationships or correlations

Classification and prediction

includes methods for discovering models (classifiers)

Grouping (cluster analysis, clustering)

finding a finite set of classes of objects with similar characteristics

7

WEKA Software

automatically makes predictions

helps people make decisions faster and more accurately

freely available for download

one of the most popular data mining systems

the tools can be used in many different data mining tasks

discovering knowledge from the Bank Marketing Data Set through:
- classification
- clustering
- association rules

8

Visualization of Data Set and Examining Data

You can visualize the attributes based on the selected class.

9

Data Mining – Classification (OneR, J48, Naive Bayes)

a method of data analysis

assigns an object (data) to one of the predefined classes based on a set of attributes that describe the object

the purpose of classification is prediction

the most popular classification algorithms: Decision Trees (J48), Naive Bayes, Bayesian Networks, OneR

10

Discovering potentially useful patterns from a data set - classification algorithms

OneR

OneR generates a one-level decision tree. The rules are simple to understand but also less accurate.

Deposit = YES (AGE)
- If 64.5 – 66.5
- If 75.5 – 80.5
- If more than 88.5

Deposit = NO (AGE)
- If less than 64.5
- If 66.5 – 75.5
- If 80.5 – 88.5
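The OneR idea can be sketched in a few lines of Python: build one rule per attribute (each value predicts its majority class) and keep the attribute whose rule makes the fewest errors. This is an illustrative sketch, not WEKA's implementation. It assumes the attributes are already nominal, so the numeric age intervals shown on the slide would have to be produced by a separate discretisation step. The rows and the `age_band` attribute are hypothetical.

```python
from collections import Counter, defaultdict

def one_r(rows, attributes, target):
    """Return (attribute, rule, errors) for the single best one-level rule."""
    best = None
    for attr in attributes:
        # Count the class distribution for every value of this attribute.
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[attr]][row[target]] += 1
        # Each value predicts its majority class.
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        # Errors are the records that fall outside the majority class.
        errors = sum(sum(c.values()) - max(c.values()) for c in by_value.values())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best

# Hypothetical, pre-discretised toy data.
rows = [
    {"age_band": "65+", "contact": "cellular", "y": "yes"},
    {"age_band": "65+", "contact": "telephone", "y": "yes"},
    {"age_band": "<65", "contact": "cellular", "y": "no"},
    {"age_band": "<65", "contact": "telephone", "y": "no"},
    {"age_band": "<65", "contact": "cellular", "y": "yes"},
    {"age_band": "<65", "contact": "telephone", "y": "no"},
]

attr, rule, errors = one_r(rows, ["age_band", "contact"], "y")
print(attr, rule, errors)
```

On this toy data the `age_band` rule misclassifies one record and the `contact` rule two, so `age_band` is selected.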

J48

J48 divides the original data set relative to each variable and creates many variants of the division.

Deposit = YES
- Age > 60
- Job = retired
- Education = basic.4y
- Marital = married
- Loan = no
- Housing = yes
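J48 is WEKA's implementation of the C4.5 decision tree learner, which compares candidate divisions using an entropy-based score, the gain ratio. The sketch below computes that score for nominal attributes only. It is an illustration of the splitting criterion, not a full tree builder, and the four rows are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain_ratio(rows, attr, target):
    """Information gain of splitting on attr, normalised by the split's own entropy."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    gain = entropy([r[target] for r in rows]) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy([r[attr] for r in rows])
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy data: job separates the classes perfectly, housing not at all.
rows = [
    {"job": "retired", "housing": "yes", "y": "yes"},
    {"job": "retired", "housing": "no", "y": "yes"},
    {"job": "admin.", "housing": "yes", "y": "no"},
    {"job": "admin.", "housing": "no", "y": "no"},
]

job_score = gain_ratio(rows, "job", "y")
housing_score = gain_ratio(rows, "housing", "y")
print(job_score, housing_score)
```

A tree learner of this kind would split on `job` first here, because it has the higher score, and then repeat the comparison inside each resulting subset.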

Naïve Bayes

Assigns a new case to one of the classes.

Attribute: NO / YES
AGE: 40 / 41
JOB: Admin
MARITAL: Married
EDUCATION: University degree
DEFAULT: No
HOUSING: Yes
LOAN: No
CONTACT: Cellular
MONTH: May
DAY OF WEEK: Monday / Thursday
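How Naive Bayes assigns a new case to a class can be sketched as follows: multiply the class prior by the conditional probability of each attribute value given that class, then pick the class with the largest product. This minimal sketch handles nominal attributes only and uses add-one (Laplace) smoothing, which is an assumption made here for illustration. The training rows are hypothetical.

```python
from collections import Counter, defaultdict

def train(rows, target):
    """Collect class counts and per-class value counts for each attribute."""
    class_counts = Counter(r[target] for r in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    values = defaultdict(set)            # attribute -> distinct values seen
    for r in rows:
        for attr, v in r.items():
            if attr != target:
                value_counts[(attr, r[target])][v] += 1
                values[attr].add(v)
    return class_counts, value_counts, values

def predict(model, case):
    """Return the most probable class and the unnormalised score of every class."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n in class_counts.items():
        score = n / total  # class prior
        for attr, v in case.items():
            # Add-one smoothing keeps unseen values from zeroing the product.
            score *= (value_counts[(attr, cls)][v] + 1) / (n + len(values[attr]))
        scores[cls] = score
    return max(scores, key=scores.get), scores

# Hypothetical toy training data.
rows = [
    {"contact": "cellular", "marital": "single", "y": "yes"},
    {"contact": "cellular", "marital": "married", "y": "yes"},
    {"contact": "telephone", "marital": "married", "y": "no"},
    {"contact": "telephone", "marital": "married", "y": "no"},
    {"contact": "cellular", "marital": "married", "y": "no"},
]

model = train(rows, "y")
label, scores = predict(model, {"contact": "cellular", "marital": "single"})
print(label, scores)
```

For this case the score for `yes` is 0.4 × 0.75 × 0.5 = 0.15 and for `no` it is 0.6 × 0.4 × 0.2 = 0.048, so the case is assigned to `yes`.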

11

Data Mining – Clustering (SimpleKMeans)

a process of grouping objects into classes called clusters

definitions of the concept of the cluster:
- a set of objects that are "similar"
- a set of objects such that the distance between any two objects belonging to the cluster is less than the distance between any object belonging to the cluster and any object outside it

the SimpleKMeans algorithm as an example in WEKA

12

Discovering potentially useful patterns from a data set - clustering algorithm

SimpleKMeans

Represent each group by the centroid of the documents that belong to it. Membership in a group is determined by finding the most similar group centroid for each document.
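The centroid idea behind k-means can be sketched in plain Python: assign every point to its nearest centroid, recompute each centroid as the mean of its members, and repeat until nothing changes. This is a minimal sketch for numeric data with squared Euclidean distance and fixed starting centroids. The points and the choice of initial centroids are hypothetical, and WEKA's SimpleKMeans offers more options than are shown here.

```python
def k_means(points, centroids, iterations=10):
    """Cluster numeric points around the given starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical two-dimensional points, e.g. (age, number of contacts).
points = [(25, 1), (30, 2), (28, 1), (62, 4), (70, 5), (66, 3)]
centroids, clusters = k_means(points, [points[0], points[3]])
print(centroids)
print(clusters)
```

The result depends on the starting centroids, which is why k-means is usually run with several different initialisations.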

13

Data Mining - Association (Rules Function|Apriori)

Association rule mining is an unsupervised data mining function

It finds rules associated with frequently co-occurring items

It gives rules that explain how items or events are associated with each other

The Apriori algorithm is used to discover co-occurring items.

14

Discovering potentially useful patterns from a data set - association algorithm

Apriori

Apriori finds rules with support greater than a specified minimum support and confidence greater than a specified minimum confidence.

1. marital=married contact=telephone month=may 5454 ==> y=no 5283 conf:(0.97)
2. marital=married loan=no contact=telephone month=may 4511 ==> y=no 4367 conf:(0.97)
3. contact=telephone month=may 8251 ==> y=no 7979 conf:(0.97)
4. loan=no contact=telephone month=may 6819 ==> y=no 6593 conf:(0.97)
5. default=no contact=telephone month=may 5726 ==> y=no 5533 conf:(0.97)
6. default=no loan=no contact=telephone month=may 4749 ==> y=no 4587 conf:(0.97)
7. month=aug y=no 5523 ==> contact=cellular 5290 conf:(0.96)
8. month=aug 6178 ==> contact=cellular 5909 conf:(0.96)
9. loan=no month=aug y=no 4562 ==> contact=cellular 4362 conf:(0.96)
10. loan=no month=aug 5120 ==> contact=cellular 4890 conf:(0.96)
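The two numbers printed with each rule are instance counts: the first is how many instances match the left-hand side, the second how many of those also match the right-hand side. Confidence is the second divided by the first, and support is the second divided by the size of the data set. The short Python check below recomputes both for three of the rules, using only the counts shown in the output and the 41188 instances stated for the data set. The variable names are illustrative.

```python
N_INSTANCES = 41188  # size of the data set, as stated on the data set slide

# (rule number, instances matching the left side, instances matching both sides)
rules = [
    (1, 5454, 5283),  # marital=married contact=telephone month=may ==> y=no
    (3, 8251, 7979),  # contact=telephone month=may ==> y=no
    (8, 6178, 5909),  # month=aug ==> contact=cellular
]

results = {}
for number, antecedent_count, rule_count in rules:
    confidence = rule_count / antecedent_count  # P(right side | left side)
    support = rule_count / N_INSTANCES          # share of all instances covered
    results[number] = (confidence, support)
    print(f"rule {number}: confidence={confidence:.2f} support={support:.3f}")
```

Rounded to two decimals, the recomputed confidences match the `conf` values in the listing (0.97, 0.97 and 0.96). Rule 1 covers about 12.8% of all instances.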

15

Conclusion

Analysis

- shows information about techniques and methodologies in data mining, and about Knowledge Discovery in Databases
- analyses a big data set
- provides essential information and demonstrates relevant algorithms and techniques

Results

- knowledge which is potentially useful
- the computer search engines already provide the best results in reaching specific goals
- WEKA helped to collect certain rules
- the data were processed and a satisfactory final result was achieved

16

Results

Will subscribe to a term deposit: YES

AGE: >65

JOB: services, blue-collar, technician, entrepreneur

MARITAL: married

EDUCATION: basic.9y, basic.6y, high.school

DEFAULT: unknown (has credit in default)

HOUSING: no (has housing loan)

LOAN: there is no big difference (has personal loan)

CONTACT: telephone

MONTH: may, jun, jul, aug, nov

DAY OF WEEK: mon, fri

Will subscribe to a term deposit: NO

AGE: <65

JOB: admin, student, unemployed, retired

MARITAL: single

EDUCATION: university degree, unknown

DEFAULT: no (has credit in default)

HOUSING: yes (has housing loan)

LOAN: there is no big difference (has personal loan)

CONTACT: cellular

MONTH: oct, sep, dec, mar, apr

DAY OF WEEK: tue, wed, thu

Who wants that data? Marketing companies / banking institutions

17

Thank you for listening