Data Mining
Dr. Saed Sayad
University of Toronto
2010
http://chem-eng.utoronto.ca/~datamining/
Data Mining
Data mining is about explaining the past and predicting the future by means of data analysis.
Data Mining
[Diagram relating Data Mining to Statistics, AI & Machine Learning, and Database & DW]
Data Mining Applications
[Bar chart; axis scale from 0 to 60. Categories in extraction order:]
Gambling
Entertainment / Music
Investment / Stocks
Junk email / Anti-spam
Security / Anti-terrorism
Travel / Hospitality
Web
Biotech / Genomics
e-Commerce
Other
Government applications
Medical / Pharma
Health care / HR
Science
Manufacturing
Telecom
Insurance
Retail
Fraud Detection
Direct Marketing / Fundraising
Credit Scoring
Banking
CRM
Source: KDnuggets.com
Data mining activity in 2007 compared to 2006
much higher: 20%
somewhat higher: 30%
about the same: 41%
somewhat lower: 4%
much lower: 5%
Source: KDnuggets.com
Data Mining Steps
1. Problem Definition
2. Data Preparation
3. Data Exploration
4. Modeling
5. Evaluation
6. Deployment
CRISP-DM Process Model
CRoss-Industry Standard Process for Data Mining
Source: http://www.crisp-dm.org/Process/index.htm
1. Problem Definition
Understanding the project objectives and requirements from a business perspective and then converting this knowledge into a data mining problem definition with a preliminary plan designed to achieve the objectives.
Source: http://www.crisp-dm.org/Process/index.htm
2. Data Preparation
[Diagram: Data (Text) and Data (DSN) pass through ETL to produce the Modeling Data]
3. Data Exploration
Univariate Analysis: statistics (Average, StDev, Min, Max, ...) and charts (Bar, Line, Pie, ...)
Bivariate Analysis: Correlation, Z test, ... and Combination Charts
Data Exploration - Univariate
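The univariate statistics named on the Data Exploration slide (average, standard deviation, minimum, maximum) can be sketched with Python's standard library. The `ages` sample and the `univariate_summary` helper are hypothetical and only illustrate the idea; they are not taken from the slides.

```python
import statistics

# Hypothetical sample: ages of ten customers (illustrative values only).
ages = [23, 35, 31, 46, 52, 29, 38, 41, 35, 60]


def univariate_summary(values):
    """Return basic univariate statistics for one numeric variable."""
    return {
        "count": len(values),
        "average": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }


summary = univariate_summary(ages)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A bar or line chart of the same variable would complement these numbers; plotting is left out here to keep the sketch dependency-free.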
Data Exploration - Bivariate
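Correlation is one of the bivariate measures listed on the Data Exploration slide. A minimal sketch of the Pearson correlation coefficient follows; the `ages` and `incomes` values are made up for illustration, and `pearson_correlation` is a hypothetical helper name.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired observations: age and income (in thousands).
ages = [23, 35, 31, 46, 52, 29, 38, 41]
incomes = [28, 45, 40, 62, 70, 33, 51, 55]

print(f"correlation(age, income) = {pearson_correlation(ages, incomes):.3f}")
```

Values near +1 or -1 suggest a strong linear relationship, and values near 0 suggest little linear relationship.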
4. Modeling
Classification: Bayesian, Decision Tree, Logistic Regression, SVM
Regression: Linear Regression, Robust Regression, Neural Network
Clustering: Hierarchical, K-Means
Association: Apriori
Data Mining: Classification & Regression
Frequency Table: OneR, Bayesian, Decision Tree, Markov Chains, HMM
Covariance Matrix: Linear Regression, LDA (Z Score), PCA/PCR, Logistic Regression, Robust Regression
Similarity Functions: KNN
Neural Networks: Perceptron, Back Propagation, RBF
Others: SVM, GA
Scalable Methods
Modeling - Classification
f: Age → Responder (e.g., Y or N)
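One very simple way to learn a function from Age to a Y/N responder label is a single-threshold rule (a "decision stump"). The sketch below is only an illustration of classification, not the method used in the slides; the training data and the name `fit_age_stump` are hypothetical.

```python
def fit_age_stump(ages, labels):
    """Learn a one-threshold rule on age that misclassifies the fewest rows."""
    best = None  # (errors, threshold, label for age >= threshold)
    for threshold in sorted(set(ages)):
        for label_if_older in ("Y", "N"):
            label_if_younger = "N" if label_if_older == "Y" else "Y"
            errors = sum(
                (label_if_older if age >= threshold else label_if_younger) != label
                for age, label in zip(ages, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, label_if_older)

    _, threshold, label_if_older = best
    label_if_younger = "N" if label_if_older == "Y" else "Y"

    def f(age):
        """Predict the responder label for one age."""
        return label_if_older if age >= threshold else label_if_younger

    return f


# Hypothetical training data: customer age and whether they responded.
train_ages = [22, 25, 30, 35, 41, 48, 53, 60]
train_labels = ["N", "N", "N", "N", "Y", "Y", "Y", "Y"]

f = fit_age_stump(train_ages, train_labels)
print(f(45), f(28))
```

Real classifiers such as the decision trees or logistic regression listed earlier use many variables, but the output has the same shape: a category per input row.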
Modeling - Regression
f: Age → Amount Purchased (e.g., $350)
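Regression predicts a number rather than a category. As a sketch, assuming a straight-line relationship between age and amount purchased, ordinary least squares gives the slope and intercept below. The data values and the helper name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: customer age and amount purchased in dollars.
ages = [25, 30, 35, 40, 45, 50]
amounts = [210, 260, 300, 350, 410, 440]

slope, intercept = fit_line(ages, amounts)


def f(age):
    """Predicted amount purchased for a given age."""
    return slope * age + intercept


print(f"predicted amount at age 42: ${f(42):.2f}")
```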
Modeling - Clustering
[Scatter plot with axes Age and Income]
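Clustering groups rows without using a target variable. The sketch below runs a basic K-Means (one of the clustering methods listed under Modeling) on hypothetical (age, income) points. Starting from the first `k` points is a simplifying assumption made here for reproducibility; practical implementations usually randomize the start and rescale the variables first.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid for every point (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]


def k_means(points, k, iterations=20):
    """Basic K-Means; returns (centroids, cluster label per point)."""
    centroids = list(points[:k])  # deterministic start (assumption of this sketch)
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # New centroid is the per-axis mean of the cluster members.
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Hypothetical customers as (age, income in thousands).
points = [(22, 25), (25, 28), (27, 31), (48, 70), (52, 75), (55, 82)]

centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```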
Association Rules
Market Basket Analysis
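In market basket analysis, a rule such as "diapers → beer" is commonly scored by its support (how often the items occur together) and its confidence (how often the consequent appears when the antecedent does). The sketch below computes both on a made-up list of transactions; it does not implement the full Apriori search, and all item names are hypothetical.

```python
# Hypothetical shopping baskets (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by antecedent support."""
    base = support(antecedent, transactions)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return support(set(antecedent) | set(consequent), transactions) / base


rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
```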
5. Evaluation
Stats: Variables Contribution, Mean Square Error, Confusion Matrix
Charts: K-S Chart, Lift Chart, Gain Chart
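Of the evaluation statistics listed above, Mean Square Error is the simplest to compute for a regression model: the average of the squared differences between actual and predicted values. The sample values and the helper name below are hypothetical.

```python
def mean_square_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical amounts purchased versus a model's predictions.
actual = [350, 420, 290, 510]
predicted = [340, 400, 300, 530]

print(f"MSE = {mean_square_error(actual, predicted):.1f}")
```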
Evaluation - Confusion Matrix
| CM | Positive Cases | Negative Cases |
|---|---|---|
| Predicted Positive | True Positive | False Positive |
| Predicted Negative | False Negative | True Negative |
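The four cells of the confusion matrix can be counted directly from paired actual and predicted labels. This is a minimal sketch; the label values and the helper name `confusion_matrix` are assumptions made for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted, positive="Y"):
    """Count TP, FP, FN and TN for a binary classifier."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return {cell: counts[cell] for cell in ("TP", "FP", "FN", "TN")}


# Hypothetical responder labels and model predictions.
actual = ["Y", "Y", "N", "N", "Y", "N", "N", "Y"]
predicted = ["Y", "N", "N", "Y", "Y", "N", "N", "Y"]

cm = confusion_matrix(actual, predicted)
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
print(cm)
print(f"accuracy = {accuracy:.2f}")
```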
Evaluation - Gain Chart
[Gain chart: Responder% plotted against Population%; axis marks at 10%, 50%, 100% (Population%) and 10%, 45%, 100% (Responder%)]
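The points of a gain chart can be computed by ranking customers from highest to lowest model score and tracking what share of all responders has been captured at each share of the population. The sketch below assumes a list of scores and 0/1 response flags; the data and the name `cumulative_gain` are hypothetical.

```python
def cumulative_gain(scores, responded, steps=10):
    """Return (population share, responder share) pairs for a gain chart."""
    # Rank response flags by model score, best-scored customers first.
    ranked = [
        flag
        for _, flag in sorted(
            zip(scores, responded), key=lambda pair: pair[0], reverse=True
        )
    ]
    total = sum(ranked)
    if total == 0:
        raise ValueError("no responders in the data")
    n = len(ranked)
    points = []
    for step in range(1, steps + 1):
        cutoff = n * step // steps  # number of customers contacted so far
        points.append((step / steps, sum(ranked[:cutoff]) / total))
    return points


# Hypothetical model scores and actual responses (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

points = cumulative_gain(scores, responded)
for population_share, responder_share in points:
    print(f"{population_share:.0%} of population -> {responder_share:.0%} of responders")
```

A model with no predictive power would track the diagonal (10% of the population yields about 10% of the responders); the further the curve rises above it, the more useful the ranking.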
6. Deployment
SQL, VB, Java, HTML
Data Mining Team
Modeler
Analyst
DBA
Domain Expert
Data Mining Software Vendors
SAS
KXEN
KNIME
Angoss
SPSS
Case Study...