MAKING THE BUSINESS BETTER Presented By Mohammed Dwikat DATA MINING Presented to Faculty of IT MIS...
MAKING THE BUSINESS BETTER
Presented By
Mohammed Dwikat
DATA MINING
Presented to
Faculty of IT
MIS Department
An Najah National University
Mohammed Dwikat Data Mining
What is Data Mining
Exploration and analysis of large quantities of data in order to discover meaningful patterns
Extraction of useful information from data
Group together similar documents returned by a search engine
What is NOT Data Mining
Look up a phone number in a phone directory
Query a Web search engine for information about “Amazon”
Search for a customer name in a bank
Data Mining Tasks
Predictive Methods: use some variables to predict unknown or future values of other variables.
Descriptive Methods: find human-interpretable patterns that describe the data.
Data Mining Tasks
Clustering [Descriptive]
Association Rule Discovery [Descriptive]
Sequential Pattern Discovery [Descriptive]
Classification [Predictive]
Regression [Predictive]
Deviation Detection [Predictive]
Clustering Example
Euclidean distance-based clustering in 3-D space
Intracluster distances are minimized
Intercluster distances are maximized
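To make the intracluster/intercluster idea concrete, here is a minimal sketch of Euclidean distance-based clustering in 3-D. It uses k-means, which is one common choice and is not named on the slide. The six points and the starting centroids are made-up illustrative values.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, initial_centroids, max_iterations=100):
    """Assign each point to its nearest centroid, recompute, repeat until stable."""
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        new_centroids = [centroid(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters, centroids

# Hypothetical 3-D data: two visibly separated groups.
points = [(1.0, 1.0, 1.0), (1.5, 2.0, 1.0), (1.0, 1.5, 2.0),
          (8.0, 8.0, 8.0), (9.0, 8.5, 8.0), (8.0, 9.0, 9.5)]
clusters, centroids = k_means(points, initial_centroids=[points[0], points[3]])

# Intracluster: mean distance from each point to its own centroid.
intra = sum(euclidean(p, centroids[i])
            for i, c in enumerate(clusters) for p in c) / len(points)
# Intercluster: distance between the two centroids.
inter = euclidean(centroids[0], centroids[1])
print(f"mean intracluster distance: {intra:.2f}")
print(f"intercluster distance:      {inter:.2f}")
```

On data like this the intercluster distance comes out much larger than the mean intracluster distance, which is the property the slide describes.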
Other Clustering Examples
Market Segmentation: Goal: subdivide a market into distinct subsets of customers, where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix.
Document Clustering: Goal: find groups of documents that are similar to each other based on the important terms appearing in them.
Association Rule Example
Predict occurrence of an item based on occurrences of other items.
TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk
Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
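One way to see why these two rules stand out is to compute their support and confidence over the five transactions. Neither measure is named on the slide; they are the usual ones for association rules and are assumed here. Support is the fraction of transactions containing every item in the rule, and confidence is the fraction of transactions containing the left-hand side that also contain the right-hand side. A minimal sketch:

```python
# The five transactions from the table above.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(antecedent, consequent, transactions):
    """Fraction of all transactions containing both sides of the rule."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Among transactions with the antecedent, the fraction with the consequent."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)

rules = [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]
for lhs, rhs in rules:
    s = support(lhs, rhs, transactions)
    c = confidence(lhs, rhs, transactions)
    print(f"{sorted(lhs)} --> {sorted(rhs)}: support={s:.2f}, confidence={c:.2f}")
```

Under these definitions, {Milk} --> {Coke} holds in 3 of 5 transactions and in 3 of the 4 that contain Milk; {Diaper, Milk} --> {Beer} holds in 2 of 5 transactions and in 2 of the 3 that contain both Diaper and Milk.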
Other Association Rule Examples
Marketing and Sales Promotion
Supermarket shelf management
Inventory Management
Sequential Pattern Discovery Example
Find rules that predict strong sequential dependencies among different events.
(A B) (C) --> (D E)
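As a rough illustration of what it means for an event log to match a pattern with the elements (A B), (C), (D E), the sketch below checks whether those elements occur in that order within a sequence of event sets. The function name and the sample sequences are hypothetical, and timing constraints such as a maximum gap between elements are ignored.

```python
def contains_pattern(sequence, pattern):
    """Return True if each element of `pattern` is contained in some event set
    of `sequence`, with the matching event sets appearing in the same order."""
    position = 0
    for element in pattern:
        # Advance to the earliest remaining event set containing this element.
        while position < len(sequence) and not set(element) <= set(sequence[position]):
            position += 1
        if position == len(sequence):
            return False
        position += 1  # The next element must match a strictly later event set.
    return True

pattern = [{"A", "B"}, {"C"}, {"D", "E"}]

# Hypothetical sequences: each set holds events that occurred together.
in_order = [{"A", "B", "X"}, {"Y"}, {"C"}, {"D", "E"}]
out_of_order = [{"C"}, {"A", "B"}, {"D", "E"}]

print(contains_pattern(in_order, pattern))      # True
print(contains_pattern(out_of_order, pattern))  # False
```

Counting how many recorded sequences contain a candidate pattern is the basic building block for deciding which patterns are frequent enough to report.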
Other Sequential Pattern Discovery Examples
In telecommunications alarm logs:
(Inverter_Problem Excessive_Line_Current) (Rectifier_Alarm) --> (Fire_Alarm)
In point-of-sale transaction sequences:
Computer Bookstore: (Intro_To_Visual_C) (C++_Primer) --> (Perl_for_dummies,Tcl_Tk)
Athletic Apparel Store: (Shoes) (Racket, Racketball) --> (Sports_Jacket)
Classification Example
Given a collection of records (training set).
Each record contains a set of attributes; one of the attributes is the class.
Find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen records should be assigned a class as accurately as possible.
A test set is used to determine the accuracy of the model.
Classification Example
Training Set
Attribute types: Refund (categorical), Marital Status (categorical), Taxable Income (continuous), Cheat (class)
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
Test Set
Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?
Training Set --> Learn Classifier --> Model <-- Test Set
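The sketch below shows what a model for the Cheat attribute could look like. The rule in `predict` is a hand-written assumption chosen to be consistent with the ten training records. It is not a model learned by an algorithm and it does not come from the slides; the 80K income threshold in particular is illustrative.

```python
# Training records from the table: (refund, marital_status, income_in_K, cheat).
training_set = [
    ("Yes", "Single", 125, "No"),
    ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"),
    ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"),
    ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"),
    ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"),
    ("No", "Single", 90, "Yes"),
]

# Test records from the table; the class is unknown ("?").
test_set = [
    ("No", "Single", 75),
    ("Yes", "Married", 50),
    ("No", "Married", 150),
    ("Yes", "Divorced", 90),
    ("No", "Single", 40),
    ("No", "Married", 80),
]

def predict(refund, marital_status, income_k):
    """Hypothetical hand-written rule (an assumption, not a learned model)."""
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    # Single or divorced without a refund: decide on income.
    return "Yes" if income_k >= 80 else "No"

correct = sum(1 for r, m, i, c in training_set if predict(r, m, i) == c)
training_accuracy = correct / len(training_set)
predictions = [predict(r, m, i) for r, m, i in test_set]

print(f"training accuracy: {training_accuracy:.2f}")
for record, label in zip(test_set, predictions):
    print(record, "->", label)
```

Accuracy on the training set is an optimistic estimate. Because the test-set classes are unknown here, the true accuracy cannot be computed; with labeled test records, the same comparison would be run on the test set instead.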
Other Classification Examples
Direct Marketing: reduce the cost of mailing by targeting a set of consumers likely to buy a new cell-phone product.
Fraud Detection: predict fraudulent cases in credit card transactions.
Regression Examples
Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
Examples:
Predicting sales amounts based on advertising expenditure.
Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
Time series prediction of stock market indices.
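For the linear case, the first example (sales as a function of advertising expenditure) can be sketched with an ordinary least-squares fit of a straight line. The numbers below are invented for illustration and deliberately lie exactly on a line, so the fitted coefficients are easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising expenditure and resulting sales (same units).
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(advertising, sales)
predicted = slope * 6.0 + intercept  # Predicted sales for an expenditure of 6.

print(f"sales = {slope:.2f} * advertising + {intercept:.2f}")
print(f"predicted sales at advertising = 6: {predicted:.2f}")
```

Real data will not sit on a line, so the fit would leave residual error; a nonlinear model of dependency needs a different fitting procedure.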
Deviation/Anomaly Example
Detect significant deviations from normal behavior
Applications:
Credit Card Fraud Detection
Network Intrusion Detection
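One very simple way to flag a significant deviation is a z-score test: measure how many standard deviations each value lies from the mean and flag values beyond a threshold. This is only one possible technique and is not specified on the slide. The transaction amounts and the threshold of 2.0 are illustrative assumptions.

```python
import statistics

def flag_deviations(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # Population standard deviation.
    if spread == 0:
        return []  # All values identical: nothing deviates.
    return [v for v in values if abs(v - mean) / spread > threshold]

# Hypothetical card transaction amounts; one is far outside the usual range.
amounts = [20, 22, 19, 21, 23, 20, 500]
flagged = flag_deviations(amounts)
print(flagged)
```

A single extreme value inflates both the mean and the standard deviation, so on small samples a z-score test can miss outliers; practical fraud and intrusion detection systems use more robust methods.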
Prediction Measurement
Confusion Matrix
Example of confusion matrix:
            Predicted
Actual      Pass   Fail
Pass        9      3
Fail        1      7
True Positive vs. True Negative
False Positive vs. False Negative
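The four cell counts can be turned into summary measures. The sketch below assumes that rows are actual outcomes, columns are predicted outcomes, and Pass is the positive class. Accuracy, precision and recall are common choices that the slide does not name.

```python
# Cells of the example matrix under the stated assumptions.
true_positive = 9    # Actual Pass, predicted Pass.
false_negative = 3   # Actual Pass, predicted Fail.
false_positive = 1   # Actual Fail, predicted Pass.
true_negative = 7    # Actual Fail, predicted Fail.

total = true_positive + false_negative + false_positive + true_negative

# Share of all predictions that were correct.
accuracy = (true_positive + true_negative) / total
# Of the records predicted Pass, the share that actually passed.
precision = true_positive / (true_positive + false_positive)
# Of the records that actually passed, the share predicted Pass.
recall = true_positive / (true_positive + false_negative)

print(f"accuracy:  {accuracy:.2f}")
print(f"precision: {precision:.2f}")
print(f"recall:    {recall:.2f}")
```

Accuracy is 16 correct out of 20 whichever way the matrix is oriented. Precision and recall swap their values if the rows are in fact predictions and the columns actual outcomes.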
Challenges
Distributed Data
Dimensionality
Complex and Heterogeneous Data
Data Quality
Data Ownership and Distribution
Privacy Preservation
Data Mining Applications
WEKA: free, simple, limited
SAS Enterprise Miner: Data Miner, Text Miner
SPSS: regression, time series and more
Questions
Thank You