Data Preprocessing - University of Iowa (user.engineering.uiowa.edu/~coneng/lectures/Lecture_DM...2...)
The University of Iowa Intelligent Systems Laboratory
Data Mining (Knowledge Discovery in Databases)
What is Data Mining?
• Data Mining: "The nontrivial extraction of implicit, previously unknown,
and potentially useful information from data"
– William J Frawley, Gregory Piatetsky-Shapiro and Christopher J Matheus
• Data mining finds valuable information hidden in large volumes of data.
• The data mining process draws on:
– Databases
– Statistics
– Machine Learning
– High Performance Computing
– Visualization
– Mathematics
Data mining: Basic steps
[Figure: the basic steps, illustrated on a data table with one row per product and one column per feature]
Data --> Selection --> Selected Data --> Preprocessing --> Preprocessed Data --> Mining --> Knowledge
• Selection: data sampling
• Preprocessing: handling missing values, outlier removal, labeling output, feature selection
• Mining: classification, clustering, association analysis, etc.
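Two of the preprocessing steps named on this slide, handling missing values and outlier removal, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the lecture: the humidity readings are made up, missing values are filled with the median, and outliers are dropped with the common 1.5 x IQR rule. The function names `impute_missing` and `remove_outliers` are hypothetical.

```python
from statistics import median, quantiles


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def remove_outliers(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the data
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical humidity readings; None marks a missing value and
# 400 is an obviously implausible measurement.
humidity = [85, 90, None, 96, 80, 70, 65, 95, 70, 400]

filled = impute_missing(humidity)
cleaned = remove_outliers(filled)
print(cleaned)
```

The median is used for imputation here because, unlike the mean, it is not pulled upward by the implausible reading that the second step removes.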
Mining tasks
• Classification: predict a discrete class label (e.g., YES or NO)
– Example: patient diagnosis
• Regression: predict the actual (numeric) output value
– Example: weather forecasting
• Clustering: group similar items
– Example: recommender systems, e.g., Netflix
• Association rules: identify associations among input attributes
– Example: {Milk} --> {Coke}, {Diaper, Milk} --> {Beer}
• Anomaly detection: detect deviations from normal behavior
– Example: fraud detection, network intrusions
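A rule such as {Diaper, Milk} --> {Beer} is usually judged by its support (the fraction of transactions containing every item in the rule) and its confidence (among transactions containing the left-hand side, the fraction that also contain the right-hand side). The sketch below computes both in Python. The market baskets are hypothetical and are not data from the lecture.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule is unsupported
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical market baskets, one set of items per transaction.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

rule_support = support(transactions, {"Diaper", "Milk", "Beer"})
rule_confidence = confidence(transactions, {"Diaper", "Milk"}, {"Beer"})
print(rule_support, rule_confidence)
```

On these five baskets, two contain all three items and three contain both Diaper and Milk, so the rule has support 2/5 and confidence 2/3.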
Data Mining Software
• Enterprise-level (US $10,000 and more):
– Fair Isaac, IBM, Insightful, KXEN, Oracle, SAS, and SPSS
• Department-level (from $1,000 to $9,999):
– Angoss, CART/MARS/TreeNet/Random Forests, Equbits, GhostMiner, Gornik, Mineset, MATLAB, Megaputer, Microsoft SQL Server, Statsoft Statistica, ThinkAnalytics
• Personal-level (from $1 to $999):
– Excel, See5, MATLAB
• Free:
– C4.5, R, Weka, Xelopes
Polling results
Data Mining: WEKA
Outline
• Data preparation
• Preprocessing and "arff" files
• Filters, classifiers, and visualization
• Attribute selection
• Training and testing
• Quality measurements
• Interpretation of results
Data Mining Procedures
• Convert the data into the required format
• Preprocess the data if necessary
• Select algorithms based on the application or on domain expertise
• Evaluate the results and repeat the experiments if necessary
Data Sets
Arff file
@relation weather
@attribute No real
@attribute outlook {sunny,overcast,rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE,FALSE}
@attribute play {yes,no}
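The header above declares a relation name followed by one @attribute entry per column, either with a type such as real or with a list of nominal values in braces. To make that structure concrete, here is a minimal Python sketch that reads such a header. It is not Weka's own loader: the function name `parse_arff_header` is hypothetical, and the sketch assumes attribute names contain no spaces or quotes.

```python
def parse_arff_header(text):
    """Return (relation_name, [(attribute_name, type_or_values), ...])."""
    relation = None
    attributes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.lower()  # ARFF keywords are case-insensitive
        if keyword == "@relation":
            relation = rest.strip()
        elif keyword == "@attribute":
            name, _, type_spec = rest.strip().partition(" ")
            type_spec = type_spec.strip()
            if type_spec.startswith("{") and type_spec.endswith("}"):
                # Nominal attribute: keep the list of allowed values.
                values = [v.strip() for v in type_spec[1:-1].split(",")]
                attributes.append((name, values))
            else:
                attributes.append((name, type_spec.lower()))
        elif keyword == "@data":
            break  # the header ends where the data rows begin
    return relation, attributes


HEADER = """\
@relation weather
@attribute No real
@attribute outlook {sunny,overcast,rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE,FALSE}
@attribute play {yes,no}
"""

relation, attributes = parse_arff_header(HEADER)
print(relation)
for name, spec in attributes:
    print(name, spec)
```

In a complete ARFF file the header is followed by a @data line and then one comma-separated row per instance. The sketch stops at that line and does not read the rows.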
WEKA
WEKA Explorer
Parameter Distribution
Filters
Classifier
Classifiers
Decision Tree
Classifier: PART
Neural Networks
Association Rules
Visualization
Clustering
Test Data Analysis
UCI repository
• http://archive.ics.uci.edu/ml/index.html