Data Mining- RICHA
richa-mishra -
8/6/2019 Data Mining- RICHA
1/23
Process of extracting hidden patterns from data.
Process of semi-automatically analysing large data warehouses to find patterns that are:
Valid
Novel
Useful
DATA MINING WORKS WITH WAREHOUSE DATA
Data mining is integrated with data warehouses.
Data mining can generate new business by providing:
Automated prediction of trends and behaviors: data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings, e.g. targeted marketing.
Automated discovery of previously unknown patterns: analysis of retail sales data to identify seemingly unrelated products that are often purchased together.
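The discovery of products that are often purchased together can be sketched as a simple pair count over sales baskets. This is a minimal illustration only: the baskets, item names and the function name pair_counts are invented for the example and do not come from the slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical retail baskets; the items are invented for illustration.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "diapers"},
    {"beer", "diapers", "bread"},
    {"milk", "butter"},
]

def pair_counts(transactions):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        # Sorting gives every pair one canonical (alphabetical) form.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
top_pairs = counts.most_common(3)  # the most frequently co-purchased pairs
```

Pairs with a high count relative to the number of baskets are candidates for cross-promotion; a real analysis would also need a threshold to separate genuine patterns from chance.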
In the business environment, certain models have been proposed as standard processes for data mining. CRISP-DM (Cross-Industry Standard Process for Data Mining) is one such model. According to this-
Also known as Knowledge Discovery in Databases (KDD)
Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature:
Massive data collection.
Powerful multiprocessor computers.
Data mining algorithms.
Data mining must be fully integrated with a data warehouse as well as flexible interactive business analysis tools.
The resulting analytic data warehouse can be applied to improve business processes throughout the organization, in areas such as promotional campaign management, fraud detection, and new product rollout.
The Data Mining Server must be integrated with the data warehouse and the OLAP server to embed ROI-focused business analysis directly into this infrastructure.
Data warehousing is a process of organizing the storage of large, multivariate data sets in a way that facilitates the retrieval of information for analytic purposes.
The ideal starting point is a data warehouse containing a combination of internal data tracking all customer contact coupled with external market data about competitor activity.
OLAP (On-Line Analytical Processing) refers to technology that allows users of multidimensional databases to generate on-line descriptive or comparative summaries of data and other analytic queries.
An OLAP server enables a more sophisticated end-user business model to be applied when navigating the data warehouse.
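The kind of descriptive summary an OLAP tool produces over a multidimensional data set can be sketched as a roll-up: the measure is summed over every dimension the user leaves out. The rows, the dimension names and the function roll_up below are hypothetical and only illustrate the idea; a real OLAP server works on pre-built cubes rather than a Python list.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales amount).
sales = [
    ("North", "Tea", "Q1", 120.0),
    ("North", "Tea", "Q2", 80.0),
    ("North", "Coffee", "Q1", 200.0),
    ("South", "Tea", "Q1", 50.0),
    ("South", "Coffee", "Q2", 150.0),
]
DIMENSIONS = ("region", "product", "quarter")

def roll_up(rows, keep):
    """Sum the sales measure over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # the last field is the measure
    return dict(totals)

by_region = roll_up(sales, ["region"])
by_region_product = roll_up(sales, ["region", "product"])
```

Moving from by_region to by_region_product corresponds to drilling down one level; the reverse direction is a roll-up.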
AN ARCHITECTURE FOR DATA MINING
BASIC TECHNIQUE:
The basic principle used in this is Modeling.
Modeling: the act of building a model in one situation where we know the answer and then applying it to other situations where the answer is unknown.
Different techniques under modeling are:
Neural networks
Clustering
Genetic algorithms
Decision trees
Support vector machine
STEPS IN KDD
STEPS IN KDD EXPLAINED
Step 1: Exploration
Includes preprocessing of data, data transformations, etc.
Preprocessing is the removal of noise from a large data set.
Transformation is the reduction of data into feature vectors (FV).
FV: a summarized version of raw data observations.
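The two parts of the exploration step can be sketched as follows. The noise rule (distance from the median) and the choice of summary statistics (mean, minimum, maximum) are assumptions made for this example; the slides do not say which preprocessing or transformation is used.

```python
from statistics import mean, median

def remove_noise(values, max_deviation):
    """Drop readings that lie further than max_deviation from the median."""
    centre = median(values)
    return [v for v in values if abs(v - centre) <= max_deviation]

def feature_vector(values):
    """Summarize a list of raw observations as (mean, minimum, maximum)."""
    return (mean(values), min(values), max(values))

raw = [10.0, 11.0, 9.0, 10.0, 95.0]  # 95.0 plays the role of noise
clean = remove_noise(raw, max_deviation=5.0)
fv = feature_vector(clean)
```

The five raw readings are reduced to a three-number feature vector, which is what the later modeling steps work on.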
Next, the feature vectors are divided into a training set and a test set.
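A minimal sketch of this split, assuming a random shuffle and a 25% test share (neither is specified in the slides; split_train_test and its parameters are hypothetical names):

```python
import random

def split_train_test(vectors, test_fraction=0.25, seed=0):
    """Shuffle a copy of the feature vectors and cut it into two disjoint sets."""
    shuffled = list(vectors)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Eight made-up feature vectors.
vectors = [(float(i), float(i * i)) for i in range(8)]
train_set, test_set = split_train_test(vectors)
```

The model is built on train_set only; test_set is held back so that its performance can be judged on data it has not seen.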
Step 2: Model building
This stage involves considering various models and choosing the best one based on their predictive performance.
The main concept is applying different models to the same data set and then comparing their performance to choose the best.
Model building uses two types of processes:
Predictive
Descriptive
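Comparing candidate models on the same data can be sketched like this. The two candidates (a constant mean predictor and a least-squares line), the mean squared error score and the numbers are all assumptions chosen to keep the example short.

```python
def fit_mean(xs, ys):
    """Candidate 1: always predict the mean of the training targets."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate 2: least-squares straight line y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def mse(model, xs, ys):
    """Mean squared prediction error of a model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up training and test data.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
test_x, test_y = [5.0, 6.0], [10.1, 11.9]

candidates = {"mean": fit_mean, "line": fit_line}
scores = {name: mse(fit(train_x, train_y), test_x, test_y)
          for name, fit in candidates.items()}
best_name = min(scores, key=scores.get)  # lowest test error wins
```

Each candidate is fitted on the training data and scored on the test data, so the comparison reflects predictive performance rather than how well a model memorizes its inputs.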
Step 3: Deployment
This stage involves using the model selected as best in the previous stage and applying it to new data in order to generate predictions or estimates of the expected outcome.
PREDICTIVE METHODS :
Regression (linear or any other polynomial):
a*x1 + b*x2 + c = Ci
Nearest neighbour
Decision tree classifier: divides the decision space into piecewise constant regions.
Probabilistic/generative models
Neural networks: partition by non-linear boundaries
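Of the predictive methods listed above, nearest neighbour is the shortest to sketch: a new observation receives the label of the closest known observation. The training points, the labels and the use of Euclidean distance are assumptions for this example.

```python
import math

def nearest_neighbour(train, query):
    """Return the label of the training point closest to `query` (Euclidean distance)."""
    _, label = min(train, key=lambda item: math.dist(item[0], query))
    return label

# Made-up labelled observations: ((x1, x2), label).
train = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 7.5), "high"),
]

label_a = nearest_neighbour(train, (2.0, 1.0))
label_b = nearest_neighbour(train, (8.5, 9.0))
```

No model is built in advance here; the training set itself acts as the model, and all the work happens at prediction time.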
NEURAL NETWORKS IN DEPTH:
Learning from existing data.
They are analytic techniques modeled after the processes of learning in the cognitive system and the neurological functions of the brain, and are capable of predicting new observations from other observations.
Architecture includes 3 layers-
1) Input layer
2) Hidden layer (may consist of many process layers)
3) Output layer
Layers are interconnected with weighted connectors.
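The three-layer architecture with weighted connections can be sketched as a forward pass. The layer sizes, the weight values and the sigmoid activation are arbitrary choices for illustration; the slides do not specify any of them.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum of the inputs, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)

# Arbitrary illustrative weights: 2 inputs -> 3 hidden units -> 1 output.
hidden_w = [[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[0.7, -0.2, 0.5]]
output_b = [0.05]

prediction = forward([1.0, 2.0], hidden_w, hidden_b, output_w, output_b)
```

Each row of a weight matrix holds the connection weights feeding one neuron, so the number of rows equals the number of neurons in that layer.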
The network is then subjected to the process of "training".
Neurons apply an iterative process to the number of inputs (variables) to adjust the weights of the network in order to optimally predict the sample data on which the "training" is performed.
After the phase of learning from an existing data set, the new network is ready and can then be used to generate predictions.
Useful for learning complex data like handwriting, speech and image recognition.
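The iterative weight adjustment can be sketched with the perceptron learning rule on a single neuron. This is a deliberately simplified stand-in: multi-layer networks are usually trained with backpropagation, which the slides do not describe, and the logical AND data set, the learning rate and the epoch count are assumptions for the example.

```python
def predict(weights, bias, x):
    """Step activation: output 1 when the weighted sum is positive."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train(samples, epochs=20, rate=1.0):
    """Repeatedly nudge the weights in the direction that reduces each error."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
            bias += rate * error
    return weights, bias

# Sample data: the logical AND of two inputs.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(samples)
outputs = [predict(weights, bias, x) for x, _ in samples]
```

After training, the same predict function is applied to inputs without a known answer, which is the prediction phase described above.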
DESCRIPTIVE METHODS:
Clustering / similarity matching:
Hierarchical clustering - agglomerative and divisive
Partitional clustering - k-means and EM
Association rules and variants
Deviation detection
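Of the descriptive methods listed above, k-means is sketched here as an example of partitional clustering. The points, the starting centroids and the fixed iteration count are assumptions; practical implementations usually pick starting centroids at random and stop when assignments no longer change.

```python
import math

def k_means(points, centroids, iterations=10):
    """Alternate assignment and centroid update for a fixed number of rounds."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Made-up 2-D points forming two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
```

Unlike the predictive methods, no labels are supplied: the groups emerge from the distances between the observations alone.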
DATA MINING IN USE
The US Government uses Data Mining to track fraud.
A supermarket becomes an information broker.
Basketball teams use it to track game strategy.
Cross Selling.
Target Marketing.
Holding on to Good Customers.
Weeding out Bad Customers.
APPLICATION AREAS:
Industry             Application
Finance              Credit card analysis
Telecommunication    Call record analysis
Consumer goods       Promotion analysis
Insurance            Claims, fraud analysis
CONCLUSION:
The concept of Data Mining is becoming increasingly popular as a business information management tool.
It is expected to reveal knowledge structures that can guide decisions in conditions of limited certainty.
Recently, there has been increased interest in developing new analytic techniques specifically designed to address the issues relevant to business Data Mining (e.g., Classification Trees).