Data Mining – analyse Bank Marketing Data Set
1
Data Mining – Analysing the Bank Marketing Data Set with WEKA
EXPLORATORY PROJECT BY
MATEUSZ BRZOSKA
MIDDLESEX UNIVERSITY 2015
2
Abstract / Aims / Objectives
Aims
To study techniques and methodologies in data mining
To analyse a data set of interest for clustering, classification, learning dependencies and prediction
To process the data and achieve a satisfactory final result
Objectives
To study Knowledge Discovery in Databases (KDD)
To understand the need for analysing large, complex, information-rich data sets
To provide essential information and demonstrate the relevant algorithms and techniques
3
Bank Marketing Data Set
“The data comes from marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact to the same client was required, in order to assess if the product (bank term deposit) would be ('yes') or not ('no') subscribed.”
41188 instances / 11 inputs
Goal: predict whether the client will subscribe to a term deposit (yes/no)
4
Knowledge Discovery in Databases
The KDD process consists of the following steps (see the picture):
Selection of data which are relevant to the analysis task
Preprocessing of these data, including tasks like data cleaning and data integration
Transformation of the data into forms appropriate for mining
Application of Data Mining algorithms for the extraction of patterns
Interpretation/evaluation of the generated patterns so as to identify those patterns that represent real knowledge, based on some interestingness measures.
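The five steps above can be sketched as a chain of small functions. This is only an illustration on a handful of made-up records, not the WEKA workflow used in the project; the field names mirror the Bank Marketing attributes, while the age band and the `min_count` threshold are assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical toy records (not taken from the real data set).
RAW = [
    {"age": "58", "job": "retired", "contact": "cellular", "y": "yes"},
    {"age": "41", "job": "admin.", "contact": "telephone", "y": "no"},
    {"age": "", "job": "admin.", "contact": "telephone", "y": "no"},  # missing age
    {"age": "67", "job": "retired", "contact": "cellular", "y": "yes"},
    {"age": "35", "job": "services", "contact": "telephone", "y": "no"},
]

def select(rows, fields):
    """Selection: keep only the attributes relevant to the task."""
    return [{f: r[f] for f in fields} for r in rows]

def preprocess(rows):
    """Preprocessing: drop records with missing values (simple cleaning)."""
    return [r for r in rows if all(v != "" for v in r.values())]

def transform(rows):
    """Transformation: discretise age into a nominal band (assumed cut at 60)."""
    out = []
    for r in rows:
        band = "60+" if int(r["age"]) >= 60 else "under60"
        out.append({"age_band": band, "y": r["y"]})
    return out

def mine(rows):
    """Data mining: count how often each age band co-occurs with each class."""
    return Counter((r["age_band"], r["y"]) for r in rows)

def evaluate(counts, min_count=2):
    """Interpretation: keep only patterns seen at least min_count times."""
    return {k: v for k, v in counts.items() if v >= min_count}

patterns = evaluate(mine(transform(preprocess(select(RAW, ["age", "y"])))))
print(patterns)
```

Each function takes the output of the previous one, so a step can be swapped out (for example, a different discretisation) without touching the rest.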
5
Data Mining Overview
we "sink" in electronic data
data mining technology can extract knowledge from it
and make efficient, rational use of the collected data
"a process of automatic discovery of non-trivial, previously unknown, potentially useful rules, dependencies, patterns, similarities and trends in large data repositories."
6
Data Mining Methods
Discovering association rules
methods of discovering interesting relationships or correlations
Classification and prediction
includes methods for discovering models (classifiers)
Grouping (cluster analysis, clustering)
finding a finite set of classes of objects with similar characteristics
7
WEKA Software
automatically makes predictions
helps people make decisions faster and more accurately
freely available for download
one of the most popular data mining systems
the tools can be used for many different data mining tasks
discovering knowledge from the Bank Marketing Data Set through:
- classification
- clustering
- association rules
8
Visualization of Data Set and Examining Data
You can visualize the attributes based on the selected class.
9
Data Mining – Classification (OneR, J48, Naive Bayes)
a method of data analysis
assigns an object (data) to one of the predefined classes based on a set of attributes that describe the object
the purpose of classification is prediction
the most popular classification algorithms: Decision Trees (J48), Naive Bayes, Bayesian Networks, OneR
10
Discovering potentially useful patterns from a data set - classification algorithms
OneR
OneR generates a one-level decision tree. The rules are simple to understand but also less accurate.
Deposit = YES (AGE)
If 64.5 – 66.5
If 75.5 – 80.5
If more than 88.5
Deposit = NO (AGE)
If less than 64.5
If 66.5 – 75.5
If 80.5 – 88.5
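The idea behind OneR can be sketched in a few lines: for every attribute, build one rule per value that predicts the majority class, then keep the attribute whose rules make the fewest errors. This is a minimal sketch, not WEKA's implementation. It assumes attributes are already nominal (a numeric attribute such as age would first have to be cut into intervals like those listed above), and the toy rows and the `age_band` name are made up for illustration.

```python
from collections import Counter, defaultdict

def one_r(rows, target):
    """Return (best_attribute, rule, errors); rule maps value -> majority class."""
    best = None
    attributes = [a for a in rows[0] if a != target]
    for attr in attributes:
        # Class counts for every value of this attribute.
        counts = defaultdict(Counter)
        for r in rows:
            counts[r[attr]][r[target]] += 1
        # One rule per value: predict the majority class.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        # Errors are the rows that do not belong to the majority class.
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best

# Hypothetical toy rows (not taken from the real data set).
rows = [
    {"age_band": "65+", "contact": "cellular", "y": "yes"},
    {"age_band": "65+", "contact": "telephone", "y": "yes"},
    {"age_band": "under65", "contact": "cellular", "y": "no"},
    {"age_band": "under65", "contact": "telephone", "y": "no"},
    {"age_band": "under65", "contact": "cellular", "y": "yes"},
    {"age_band": "under65", "contact": "telephone", "y": "no"},
]

attr, rule, errors = one_r(rows, "y")
print(attr, rule, errors)
```

On these rows the age band misclassifies one record and the contact type two, so the age band is chosen as the single rule attribute.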
J48
Divides the original data set relative to each variable. Creates many variants of the division.
Deposit = YES
Age > 60
Job = retired
Education = basic.4y
Marital = married
Loan = no
Housing = yes
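J48 is WEKA's implementation of the C4.5 decision tree, which compares candidate divisions by their gain ratio. The sketch below shows how two candidate splits could be scored on nominal attributes. It is a simplified illustration only (no pruning, no numeric thresholds, no missing values), and the toy rows are made up.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, attr, target):
    """Information gain of splitting on attr, divided by the split information."""
    labels = [r[target] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    n = len(rows)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy([r[attr] for r in rows])
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical toy rows (not taken from the real data set).
rows = [
    {"job": "retired", "marital": "married", "y": "yes"},
    {"job": "retired", "marital": "single", "y": "yes"},
    {"job": "admin.", "marital": "married", "y": "no"},
    {"job": "admin.", "marital": "single", "y": "no"},
]

for attr in ("job", "marital"):
    print(attr, gain_ratio(rows, attr, "y"))
```

Here `job` separates the two classes perfectly while `marital` tells nothing about the class, so a tree builder would split on `job` first.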
Naïve Bayes
Assigns a new case to one of the classes.
Attribute: NO / YES
AGE: 40 / 41
JOB: Admin
MARITAL: Married
EDUCATION: University degree
DEFAULT: No
HOUSING: Yes
LOAN: No
CONTACT: Cellular
MONTH: May
DAY OF WEEK: Monday / Thursday
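How a new case is assigned to a class can be sketched as follows: multiply the prior probability of each class by the conditional probability of every attribute value, and pick the class with the highest score. This is a minimal sketch for nominal attributes only; the add-one (Laplace) smoothing is an assumption made for the example, and the toy rows are made up.

```python
from collections import Counter, defaultdict

def train(rows, target):
    """Count classes and, per (attribute, class), the attribute values seen."""
    class_counts = Counter(r[target] for r in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    values = defaultdict(set)            # attribute -> distinct values seen
    for r in rows:
        for a, v in r.items():
            if a != target:
                value_counts[(a, r[target])][v] += 1
                values[a].add(v)
    return class_counts, value_counts, values

def predict(model, case):
    """Return (best_class, scores) for a dict of attribute values."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for c, n in class_counts.items():
        score = n / total  # prior P(class)
        for a, v in case.items():
            # P(value | class) with add-one smoothing (an assumption here).
            score *= (value_counts[(a, c)][v] + 1) / (n + len(values[a]))
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical toy rows (not taken from the real data set).
rows = [
    {"contact": "cellular", "month": "mar", "y": "yes"},
    {"contact": "cellular", "month": "may", "y": "yes"},
    {"contact": "telephone", "month": "may", "y": "no"},
    {"contact": "telephone", "month": "may", "y": "no"},
    {"contact": "cellular", "month": "may", "y": "no"},
]

model = train(rows, "y")
label, scores = predict(model, {"contact": "telephone", "month": "may"})
print(label, scores)
```

The scores are unnormalised; dividing each by their sum would give class probabilities.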
11
Data Mining – Clustering (SimpleKMeans)
a process of grouping objects into classes called clusters
definitions of the concept of a cluster:
- a set of objects that are "similar"
- a set of objects such that the distance between any two objects belonging to the cluster is less than the distance between any object in the cluster and any object outside it
SimpleKMeans is used as the example algorithm in WEKA
12
Discovering potentially useful patterns from a data set - clustering algorithm
SimpleKMeans
Each group is represented by the centroid of the documents that belong to it. Membership in a group is determined by finding the most similar group centroid for each document.
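The centroid idea can be sketched as a plain k-means loop: assign every point to its nearest centroid, recompute each centroid as the mean of its members, and repeat until nothing changes. This is an illustration only, not WEKA's SimpleKMeans. It assumes numeric attributes, squared Euclidean distance, and a deterministic start from the first k points; the (age, number of contacts) pairs are made up.

```python
def kmeans(points, k, iterations=10):
    """Return (centroids, clusters) for a list of numeric tuples."""
    centroids = [points[i] for i in range(k)]  # assumed init: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, clusters

# Hypothetical (age, number of contacts) pairs, not from the real data set.
points = [(25, 1), (60, 4), (27, 2), (62, 5), (30, 1), (65, 3)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

Because age has a much larger range than the contact count, it dominates the distance here; in practice the attributes would usually be normalised first.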
13
Data Mining - Association (Rules Function | Apriori)
Association rule mining is an unsupervised data mining function
It finds rules associated with frequently co-occurring items
It gives rules that explain how items or events are associated with each other
The Apriori algorithm is used to discover co-occurring items.
14
Discovering potentially useful patterns from a data set - association algorithm
Apriori
Apriori finds rules with support greater than a specified minimum support and confidence greater than a specified minimum confidence.
1. marital=married contact=telephone month=may 5454 ==> y=no 5283 conf:(0.97)
2. marital=married loan=no contact=telephone month=may 4511 ==> y=no 4367 conf:(0.97)
3. contact=telephone month=may 8251 ==> y=no 7979 conf:(0.97)
4. loan=no contact=telephone month=may 6819 ==> y=no 6593 conf:(0.97)
5. default=no contact=telephone month=may 5726 ==> y=no 5533 conf:(0.97)
6. default=no loan=no contact=telephone month=may 4749 ==> y=no 4587 conf:(0.97)
7. month=aug y=no 5523 ==> contact=cellular 5290 conf:(0.96)
8. month=aug 6178 ==> contact=cellular 5909 conf:(0.96)
9. loan=no month=aug y=no 4562 ==> contact=cellular 4362 conf:(0.96)
10. loan=no month=aug 5120 ==> contact=cellular 4890 conf:(0.96)
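The confidence values in the listing can be reproduced from the two counts printed with each rule: the number of instances matching the left-hand side, and the number matching both sides. The short check below uses rules 1, 3 and 8. The support figure assumes the rules were mined over all 41188 instances, which the slides do not state explicitly.

```python
def confidence(antecedent_count, rule_count):
    """Share of instances matching the left-hand side that also match the right."""
    return rule_count / antecedent_count

def support(rule_count, total):
    """Share of all instances that match both sides of the rule."""
    return rule_count / total

TOTAL = 41188  # assumption: Apriori was run on the full data set

# (rule number, left-hand-side count, whole-rule count) copied from the listing
rules = [(1, 5454, 5283), (3, 8251, 7979), (8, 6178, 5909)]

for number, lhs, both in rules:
    print(number, round(confidence(lhs, both), 2), round(support(both, TOTAL), 3))
```

Rule 1, for example, gives 5283 / 5454 ≈ 0.97, matching the `conf:(0.97)` printed in the listing.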
15
Conclusion
The analysis presents techniques and methodologies in data mining and in Knowledge Discovery in Databases
analyses a large data set
provides essential information and demonstrates the relevant algorithms and techniques
Results
knowledge which is potentially useful;
computer-based search already provides the best results for reaching specific goals;
WEKA helped to extract certain rules;
the data was processed and a satisfactory final result was achieved
16
Results
Will subscribe to a term deposit: YES
AGE: >65
JOB: services, blue-collar, technician, entrepreneur
MARITAL: married
EDUCATION: basic.9y, basic.6y, high.school
DEFAULT: unknown (has credit in default)
HOUSING: no (has housing loan)
LOAN: there is no big difference (has personal loan)
CONTACT: telephone
MONTH: may, jun, jul, aug, nov
DAY OF WEEK: mon, fri
Will subscribe to a term deposit: NO
AGE: <65
JOB: admin, student, unemployed, retired
MARITAL: single
EDUCATION: university degree, unknown
DEFAULT: no (has credit in default)
HOUSING: yes (has housing loan)
LOAN: there is no big difference (has personal loan)
CONTACT: cellular
MONTH: oct, sep, dec, mar, apr
DAY OF WEEK: tue, wed, thu
Who wants this data? Marketing companies / banking institutions