Part II
Tools for Knowledge Discovery
Chapter 5
Knowledge Discovery in Databases
5.1 A KDD Process Model
Figure 5.1 A seven-step KDD process model
[Figure labels. Data sources: Transactional Database, Data Warehouse, Flat File. Steps and their outputs: Step 1: Goal Identification (Defined Goals); Step 2: Create Target Data (Target Data); Step 3: Data Preprocessing (Cleansed Data); Step 4: Data Transformation (Transformed Data); Step 5: Data Mining (Data Model); Step 6: Interpretation & Evaluation; Step 7: Taking Action.]
Figure 5.2 Applying the scientific method to data mining
The Scientific Method
• Define the Problem
• Formulate a Hypothesis
• Perform an Experiment
• Draw Conclusions
• Verify Conclusions
A KDD Process Model
• Identify the Goal
• Create Target Data, Data Preprocessing, Data Transformation, Data Mining
• Interpretation / Evaluation
• Take Action
Step 1: Goal Identification
• Define the Problem.
• Choose a Data Mining Tool.
• Estimate Project Cost.
• Estimate Project Completion Time.
• Address Legal Issues.
• Develop a Maintenance Plan.
Step 2: Creating a Target Dataset
Figure 5.3 The Acme credit card database
Step 3: Data Preprocessing
• Noisy Data
• Missing Data
Noisy Data
• Locate Duplicate Records.
• Locate Incorrect Attribute Values.
• Smooth Data.
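Two of the noisy-data tasks above can be sketched in Python. This is a minimal illustration, not the method used by any particular tool: the function names are hypothetical, records are assumed to be tuples of attribute values, and a centered moving average is assumed as the smoothing technique because the slides do not name one.

```python
def find_duplicates(records):
    """Return the indices of records that exactly repeat an earlier record."""
    seen = set()
    duplicates = []
    for i, rec in enumerate(records):
        key = tuple(rec)
        if key in seen:
            duplicates.append(i)
        else:
            seen.add(key)
    return duplicates


def smooth(values, window=3):
    """Centered moving average; the window shrinks at both ends of the list."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out
```

Locating incorrect attribute values usually needs domain rules (for example, a valid range per attribute), so it is left out of this sketch.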
Preprocessing Missing Data
• Discard Records With Missing Values.
• Replace Missing Real-valued Items With the Class Mean.
• Replace Missing Values With Values Found Within Highly Similar Instances.
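As an illustration of the second option, the sketch below replaces a missing real-valued item with the mean of that attribute within the instance's class. The function name, the use of dictionaries for instances, and the use of `None` to mark a missing value are assumptions made for the example.

```python
from collections import defaultdict


def impute_class_mean(rows, attr, class_attr):
    """Return copies of rows in which a missing (None) value of attr is
    replaced by the mean of attr over the rows of the same class."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        if row[attr] is not None:
            totals[row[class_attr]] += row[attr]
            counts[row[class_attr]] += 1

    filled = []
    for row in rows:
        new_row = dict(row)
        cls = new_row[class_attr]
        # Leave the value missing if the class has no known values at all.
        if new_row[attr] is None and counts[cls] > 0:
            new_row[attr] = totals[cls] / counts[cls]
        filled.append(new_row)
    return filled
```

Discarding records with missing values is the simpler alternative, at the cost of losing the information those records carry in their other attributes.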
Processing Missing Data While Learning
• Ignore Missing Values.
• Treat Missing Values As Equal Compares.
• Treat Missing Values As Unequal Compares.
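The three options differ only in how a comparison involving a missing value is scored. The sketch below shows this with a simple similarity measure, the fraction of attribute positions on which two instances agree. The measure, the function name, and the `policy` parameter are assumptions for illustration; the slides do not tie the options to a specific learning algorithm.

```python
def similarity(a, b, policy="ignore"):
    """Fraction of attribute positions on which instances a and b agree.

    None marks a missing value. policy controls comparisons that involve one:
      'ignore'  - skip the position entirely
      'equal'   - count the position as a match
      'unequal' - count the position as a mismatch
    """
    if policy not in ("ignore", "equal", "unequal"):
        raise ValueError("unknown policy: " + policy)
    matches = 0
    compared = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            if policy == "ignore":
                continue
            compared += 1
            if policy == "equal":
                matches += 1
        else:
            compared += 1
            if x == y:
                matches += 1
    return matches / compared if compared else 0.0
```

For the instances `("yes", None, "male")` and `("yes", "no", "female")`, the score is 1/2 when missing values are ignored, 2/3 when they count as equal, and 1/3 when they count as unequal.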
Step 4: Data Transformation
• Data Normalization
• Data Type Conversion
• Attribute and Instance Selection
Data Normalization
• Decimal Scaling
• Min-Max Normalization
• Normalization using Z-scores
• Logarithmic Normalization
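The four normalization methods listed above can be sketched as short Python functions. The function names are hypothetical, and a few choices are assumptions: the z-score uses the population standard deviation, and the logarithm defaults to base 2.

```python
import math


def decimal_scaling(values):
    """Divide by the smallest power of 10 that brings every |v| below 1."""
    max_abs = max(abs(v) for v in values)
    k = 0
    while max_abs / (10 ** k) >= 1:
        k += 1
    return [v / 10 ** k for v in values]


def min_max(values):
    """Map values linearly onto [0, 1]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_scores(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]


def log_normalize(values, base=2):
    """Replace each value by its logarithm; assumes all values are positive."""
    return [math.log(v, base) for v in values]
```

For example, `decimal_scaling([100, 250, 900])` divides by 1000, and `min_max([10, 20, 30])` maps the values to 0, 0.5, and 1.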
Attribute and Instance Selection
• Eliminating Attributes
• Creating Attributes
• Instance Selection
Table 5.1 • An Initial Population for Genetic Attribute Selection
Population  Income  Magazine   Watch      Credit Card
Element     Range   Promotion  Promotion  Insurance    Sex  Age
1           1       0          0          1            1    1
2           0       0          0          1            0    1
3           0       0          0          0            1    1
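The population in Table 5.1 can be represented as bit strings, one bit per attribute. The sketch below assumes that a 1 means the attribute is selected and a 0 means it is eliminated, and it uses single-point crossover as an example of how new population elements might be produced. Both the interpretation and the function names are assumptions; the table itself does not specify the genetic operators.

```python
ATTRIBUTES = ["Income Range", "Magazine Promotion", "Watch Promotion",
              "Credit Card Insurance", "Sex", "Age"]

# The three population elements from Table 5.1.
POPULATION = {
    1: [1, 0, 0, 1, 1, 1],
    2: [0, 0, 0, 1, 0, 1],
    3: [0, 0, 0, 0, 1, 1],
}


def selected_attributes(bits):
    """Names of the attributes whose bit is 1 (assumed to mean 'selected')."""
    return [name for name, bit in zip(ATTRIBUTES, bits) if bit == 1]


def crossover(parent_a, parent_b, point):
    """Single-point crossover: the two children swap tails after `point`."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b
```

Under these assumptions, element 2 selects Credit Card Insurance and Age, and crossing elements 1 and 3 at position 3 yields the children `[1, 0, 0, 0, 1, 1]` and `[0, 0, 0, 1, 1, 1]`.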
Step 5: Data Mining
1. Choose training and test data.
2. Designate a set of input attributes.
3. If learning is supervised, choose one or more output attributes.
4. Select learning parameter values.
5. Invoke the data mining tool.
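The five steps above can be walked through in a few lines of Python. This is only a sketch: the function names are hypothetical, instances are assumed to be dictionaries, and the "tool" is a stand-in that always predicts the most common training class, so that the example stays self-contained.

```python
import random
from collections import Counter


def split_train_test(instances, test_fraction=0.3, seed=0):
    """Step 1: shuffle a copy of the data and hold out a test portion."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def train_majority_model(train, input_attrs, output_attr, params):
    """Step 5 stand-in: a 'tool' that always predicts the most common
    training value of output_attr. It ignores input_attrs and params,
    which are accepted only to mirror steps 2 and 4."""
    majority = Counter(row[output_attr] for row in train).most_common(1)[0][0]
    return lambda row: majority
```

A call sequence would first split the data, then name the input attributes (step 2), the output attribute (step 3), and the parameter values (step 4), and finally pass all of them to the tool. A real tool would use the inputs and parameters to build a model rather than ignore them.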
Step 6: Interpretation and Evaluation
• Statistical analysis.
• Heuristic analysis.
• Experimental analysis.
• Human analysis.
Step 7: Taking Action
• Create a report.
• Relocate retail items.
• Mail promotional information.
• Detect fraud.
• Fund new research.
5.9 The CRISP-DM Process Model
1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment
5.10 Experimenting with ESX
A Four-Step Model for Knowledge Discovery
1. Identify the goal.
2. Prepare the data.
3. Apply data mining.
4. Interpret and evaluate the results.
Experiment 1: Attribute Evaluation
*Applying the Four-Step Process Model to the Credit Screening Dataset*
Table 5.2 • A Confusion Matrix for Credit Card Screening
          Computed  Computed
          Accept    Reject
Accept    115       38
Reject    35        152
Table 5.3 • Test Set Results for a Most Typical Training Model
          Computed  Computed
          Accept    Reject
Accept    98        55
Reject    25        162
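The two confusion matrices can be compared by their classification accuracy, the share of instances on the main diagonal. The sketch below assumes that rows hold the actual class and columns hold the computed class; the function and variable names are hypothetical.

```python
def accuracy(matrix):
    """Correctly classified instances (main diagonal) divided by all instances."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Rows: actual Accept, actual Reject. Columns: computed Accept, computed Reject.
table_5_2 = [[115, 38], [35, 152]]
table_5_3 = [[98, 55], [25, 162]]

print(f"Table 5.2 accuracy: {accuracy(table_5_2):.1%}")  # 267 of 340 correct
print(f"Table 5.3 accuracy: {accuracy(table_5_3):.1%}")  # 260 of 340 correct
```

Both matrices cover 340 instances. Table 5.2 has 267 correct classifications (about 78.5%) and Table 5.3 has 260 (about 76.5%).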
Experiment 2: Parameter Evaluation
*Applying the Four-Step Process Model to the Satellite Image Dataset*
Figure 5.4 Satellite image data