Data Mining- RICHA

  • 8/6/2019 Data Mining- RICHA

    Process of extracting hidden patterns from data.

    Process of semi-automatically analysing large data warehouses to find patterns that are:

    Valid

    Novel

    Useful

    DATA MINING WORKS WITH WAREHOUSE DATA

    Data mining is integrated with data warehouses.

    Data mining can generate new business by providing:

    Automated prediction of trends and behaviors: Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. E.g., targeted marketing.

    Automated discovery of previously unknown patterns: Analysis of retail sales data to identify seemingly unrelated products that are often purchased together.
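
    The "purchased together" analysis can be illustrated with a minimal sketch that counts how often each pair of products shares a basket. The baskets, product names, and the threshold of 3 are made-up assumptions for illustration; they are not data from the slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is the content of one basket.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def pair_counts(transactions):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in transactions:
        # Sorting gives every pair one canonical (alphabetical) order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)
# Keep the pairs that appear in at least 3 baskets (assumed threshold).
frequent = {pair for pair, n in counts.items() if n >= 3}
print(sorted(frequent))
```

    Pairs that clear the threshold are candidates for cross-promotion; a real analysis would also compare each count with how often the products sell separately.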

    In the business environment, certain models have been proposed as standard processes for data mining. CRISP-DM (Cross-Industry Standard Process for Data Mining) is one such model.

    [Figure: CRISP-DM process diagram]

    Data mining is also known as Knowledge Discovery in Databases (KDD).

    Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature:

    Massive data collection.

    Powerful multiprocessor computers.

    Data mining algorithms.

    Data mining techniques must be fully integrated with a data warehouse as well as with flexible, interactive business analysis tools.

    The resulting analytic data warehouse can be applied to improve business processes throughout the organization, in areas such as promotional campaign management, fraud detection, and new product rollout.

    The Data Mining Server must be integrated with the data warehouse and the OLAP server to embed ROI-focused business analysis directly into this infrastructure.

    Data warehousing is a process of organizing the storage of large, multivariate data sets in a way that facilitates the retrieval of information for analytic purposes.

    The ideal starting point is a data warehouse containing a combination of internal data tracking all customer contact, coupled with external market data about competitor activity.

    OLAP (On-Line Analytical Processing) refers to technology that allows users of multidimensional databases to generate on-line descriptive or comparative summaries of data and other analytic queries.

    An OLAP server enables a more sophisticated end-user business model to be applied when navigating the data warehouse.

    AN ARCHITECTURE FOR DATA MINING

    BASIC TECHNIQUE:

    The basic principle used in data mining is modeling.

    Modeling: the act of building a model in one situation where we know the answer, and then applying it to other situations where the answer is unknown.

    Different techniques under modeling are:

    Neural networks

    Clustering

    Genetic algorithms

    Decision trees

    Support vector machines

    STEPS IN KDD

    STEPS IN KDD EXPLAINED

    Step 1: Exploration

    Exploration includes preprocessing of data, data transformations, etc. Preprocessing is the removal of noise from a large data set. Transformation is the reduction of data into feature vectors.

    Feature vector (FV): a summarized version of raw data observations.
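
    A minimal sketch of Step 1, assuming the raw observations are one customer's purchase amounts. The noise rule (drop missing and negative values) and the choice of summary features (count, mean, maximum) are illustrative assumptions, not something the slides prescribe.

```python
from statistics import mean


def to_feature_vector(amounts):
    """Summarize raw purchase amounts as [count, mean, max]."""
    # Preprocessing: treat missing (None) and negative amounts as noise.
    clean = [a for a in amounts if a is not None and a >= 0]
    if not clean:
        return [0, 0.0, 0.0]
    # Transformation: reduce the cleaned observations to a few summary numbers.
    return [len(clean), mean(clean), max(clean)]


raw = [20.0, None, 35.0, -1.0, 5.0]  # hypothetical raw observations
fv = to_feature_vector(raw)
print(fv)  # [3, 20.0, 35.0]
```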

    Next, the feature vectors are divided into a training set and a test set.
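
    The division into a training set and a test set can be sketched as a shuffled split. The function name, the 25% test fraction, and the fixed seed are assumptions chosen for illustration.

```python
import random


def train_test_split(vectors, test_fraction=0.25, seed=0):
    """Shuffle a copy of the feature vectors and split it into (train, test)."""
    shuffled = list(vectors)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


vectors = [[i, i * 2.0] for i in range(8)]  # hypothetical feature vectors
train_set, test_set = train_test_split(vectors)
print(len(train_set), len(test_set))  # 6 2
```

    The model is built on the training set only; the test set is held back to measure how well it predicts data it has not seen.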

    Step 2: Model building

    This stage involves considering various models and choosing the best one based on their predictive performance.

    The main concept is applying different models to the same data set and then comparing their performance to choose the best.
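
    As a minimal sketch of this comparison, two candidate models are fitted to the same training data and scored on the same held-out data by mean squared error. The data points, the two candidates, and the error measure are all illustrative assumptions.

```python
def mse(model, data):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


# Hypothetical training and test observations (x, y).
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]
test = [(5, 10.1), (6, 11.9)]

# Candidate 1: always predict the mean of the training targets.
mean_y = sum(y for _, y in train) / len(train)


def constant_model(x):
    return mean_y


# Candidate 2: a line through the origin, slope fitted by least squares.
slope = sum(x * y for x, y in train) / sum(x * x for x, _ in train)


def linear_model(x):
    return slope * x


candidates = {"constant": constant_model, "linear": linear_model}
scores = {name: mse(model, test) for name, model in candidates.items()}
best = min(scores, key=scores.get)  # lowest test error wins
print(best)  # linear
```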

    Model building uses two types of processes:

    Predictive

    Descriptive

    Step 3: Deployment

    Deployment involves taking the model selected as best in the previous step and applying it to new data in order to generate predictions or estimates of the expected outcome.

    PREDICTIVE METHODS:

    Regression: (linear or any other polynomial)

    a*x1 + b*x2 + c = Ci.

    Nearest neighbour

    Decision tree classifier: divides the decision space into piecewise constant regions.

    Probabilistic/generative models

    Neural networks: partition by non-linear boundaries
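
    Of the predictive methods listed above, nearest neighbour is the simplest to sketch: a new observation takes the label of the closest known observation. The training points, the labels, and the use of Euclidean distance are illustrative assumptions.

```python
import math


def nearest_neighbour(train, query):
    """Return the label of the training point closest to query (Euclidean)."""
    _, label = min(train, key=lambda item: math.dist(item[0], query))
    return label


# Hypothetical labelled feature vectors: ((x1, x2), label).
train = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((5.0, 5.0), "high"),
    ((6.0, 5.5), "high"),
]

print(nearest_neighbour(train, (1.2, 1.1)))  # low
print(nearest_neighbour(train, (5.5, 5.0)))  # high
```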

    NEURAL NETWORKS IN DEPTH:

    Learning from existing data.

    Neural networks are analytic techniques modeled after the processes of learning in the cognitive system and the neurological functions of the brain. They are capable of predicting new observations from other observations.

    Architecture includes 3 layers-

    1) Input layer

    2) Hidden layer (may consist of several processing layers)

    3) Output layer

    Layers are interconnected with weighted connectors.
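
    The three-layer architecture can be sketched as a forward pass in which each layer computes weighted sums of its inputs. The sigmoid activation, the layer sizes (2 inputs, 2 hidden neurons, 1 output), and the weight values are assumptions for illustration only.

```python
import math


def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of all inputs plus a bias."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(x, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)


# Hypothetical weights: one row of connection weights per neuron.
hidden_w = [[0.5, -0.4], [0.3, 0.8]]
hidden_b = [0.1, -0.2]
out_w = [[1.2, -0.7]]
out_b = [0.05]

output = forward([1.0, 2.0], hidden_w, hidden_b, out_w, out_b)
print(output)
```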

    The network is then subjected to the process of "training".

    During training, the neurons apply an iterative process to the inputs (variables) to adjust the weights of the network in order to optimally predict the sample data on which the training is performed.

    After the phase of learning from an existing data set, the new network is ready and can then be used to generate predictions.
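
    The iterative weight adjustment can be illustrated with a single neuron trained by the perceptron rule, which is a much simpler stand-in for the training of a full multi-layer network. The task (learning logical AND), the learning rate, and the number of passes are assumptions for illustration.

```python
def predict(weights, bias, x):
    """Step activation: output 1 when the weighted sum is positive."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


def train(samples, epochs=20, rate=1):
    """Repeatedly nudge the weights toward the targets (perceptron rule)."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            # A wrong prediction shifts each weight in proportion to its input.
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
            bias += rate * error
    return weights, bias


# Training data: the logical AND of two binary inputs.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(and_samples)

# Once trained, the neuron is used to generate predictions.
predictions = [predict(weights, bias, x) for x, _ in and_samples]
print(predictions)  # [0, 0, 0, 1]
```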

    Useful for learning complex data, as in handwriting, speech, and image recognition.

    DESCRIPTIVE METHODS:

    Clustering / similarity matching:

    Hierarchical clustering: agglomerative and divisive

    Partitional clustering: k-means and EM

    Association rules and variants

    Deviation detection
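
    Of the descriptive methods listed above, k-means clustering is sketched here in minimal form: points are assigned to the nearest centroid, and each centroid is then moved to the mean of its points. The sample points, the starting centroids, and the fixed iteration count are illustrative assumptions.

```python
import math


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical 2-D observations forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(clusters[0])  # [(1, 1), (1, 2), (2, 1)]
print(clusters[1])  # [(8, 8), (9, 8), (8, 9)]
```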

    DATA MINING IN USE

    The US Government uses Data Mining to track fraud.

    A supermarket becomes an information broker.

    Basketball teams use it to track game strategy.

    Cross Selling.

    Target Marketing.

    Holding on to Good Customers.

    Weeding out Bad Customers.

    APPLICATION AREAS:

    Industry             Application

    Finance              Credit card analysis
    Telecommunication    Call record analysis
    Consumer goods       Promotion analysis
    Insurance            Claims, fraud analysis

    CONCLUSION:

    The concept of Data Mining is becoming increasingly popular as a business information management tool.

    It is expected to reveal knowledge structures that can guide decisions in conditions of limited certainty.

    Recently, there has been increased interest in developing new analytic techniques specifically designed to address the issues relevant to business Data Mining (e.g., Classification Trees).
