amit vaghela

Transcript of amit vaghela

  • 8/9/2019 amit vaghela

    • Data mining is the process of sorting through large amounts of data and picking out relevant information.

    • In other words, data mining is the extraction of interesting patterns or knowledge from large amounts of data.

    • The knowledge discovery process (KDP), also called knowledge discovery in databases, seeks new knowledge in some application domain.

    • It is defined as the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.

    1. Developing an understanding of the application domain:
    This step includes learning the relevant prior knowledge and the goals of the end user of the discovered knowledge.

    2. Creating a target data set:
    This step usually includes querying the existing data to select the desired subset.

    3. Data cleaning and preprocessing:
    This step consists of dealing with noise and missing values in the data, and accounting for time sequence information and known changes.

    4. Data reduction and projection:
    This step consists of finding useful attributes by applying dimension reduction and transformation methods.

    5. Choosing the data mining task:
    Here the data miner matches the goals defined in Step 1 with a particular DM method.

    6. Choosing the data mining algorithm:
    The data miner selects methods to search for patterns in the data and decides which models and parameters of the methods used may be appropriate.

    7. Data mining:
    This step generates patterns in a particular representational form, such as classification rules, decision trees, etc.

    8. Interpreting mined patterns:
    Here the analyst performs visualization of the extracted patterns and models.

    9. Consolidating discovered knowledge:
    The final step consists of incorporating the discovered knowledge into the performance system, and documenting and reporting it.
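
The data-handling steps above (2, 3, 4, and 7) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from the slides: the toy records, the column names, the helper function names, the choice of mean imputation for missing values, and the single-threshold classification rule used as the mining step.

```python
from statistics import mean

# Hypothetical toy records (not from the slides); None marks a missing value.
records = [
    {"id": 1, "region": "north", "income": 52.0, "buys": "yes"},
    {"id": 2, "region": "north", "income": None, "buys": "yes"},
    {"id": 3, "region": "south", "income": 31.0, "buys": "no"},
    {"id": 4, "region": "north", "income": 78.0, "buys": "yes"},
    {"id": 5, "region": "north", "income": 30.0, "buys": "no"},
    {"id": 6, "region": "north", "income": 35.0, "buys": "no"},
]


def create_target_set(rows, region):
    """Step 2: query the existing data to select the desired subset."""
    return [dict(r) for r in rows if r["region"] == region]


def clean(rows, column):
    """Step 3: fill missing values in `column` with the mean of the known values."""
    fill = mean(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def project(rows, keep):
    """Step 4: keep only the attributes judged useful for the task."""
    return [{k: r[k] for k in keep} for r in rows]


def mine_threshold_rule(rows, feature, label, positive):
    """Step 7: find the rule `feature >= t -> positive` with the best training accuracy."""
    values = sorted({r[feature] for r in rows})
    # Candidate thresholds are the midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        hits = sum((r[feature] >= t) == (r[label] == positive) for r in rows)
        accuracy = hits / len(rows)
        if best is None or accuracy > best[1]:
            best = (t, accuracy)
    return best


target = create_target_set(records, "north")
cleaned = clean(target, "income")
reduced = project(cleaned, ["income", "buys"])
threshold, accuracy = mine_threshold_rule(reduced, "income", "buys", "yes")
print(f"Rule: income >= {threshold} -> buys = yes (training accuracy {accuracy:.0%})")
```

Steps 1, 5, 6, 8, and 9 are judgment and communication tasks, so they appear here only as the choices baked into the script (which subset, which attributes, which kind of rule).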

    Three Facets of the Data Mining Community:

    • Client (3):
      Mfg & Supplier of Hospital in North America (Innovator).

    • Developer (3):
      A software firm. Creator of award-winning data mining software.

    • Consulting Firm (3):
      This firm has earned a position as a niche data mining consulting firm.

    Task Domain Identification is fundamental to the effectiveness of all later phases.

    • Client:
      Because the time invested in any KD process reflects both the direct and opportunity costs of a knowledge-seeking firm, the starting conditions are important.

    • Client representatives prefer to spend additional time early on defining and specifying the scope of each task and its related data requirements.

    • Developer:
      The data sets initially used or provided by clients may be incomplete or inappropriate.

    • Consulting firm:
      The consulting firm reviewed the initial data through the use of baseline summary statistics and checked for redundancy. If necessary, cleaning and aggregation techniques were applied.
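
A review of this kind (baseline summary statistics, a redundancy check, then aggregation) might look roughly like the sketch below. The sample rows, the field layout, and the helper names are hypothetical; the slides do not say which statistics or tools the Consulting firm actually used.

```python
from statistics import mean, stdev

# Hypothetical initial data set: (customer, amount); None marks a missing amount.
rows = [("A", 10.0), ("B", 12.5), ("A", 10.0), ("C", None), ("D", 40.0)]


def summarize(values):
    """Baseline summary statistics for one numeric column, ignoring missing values."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
        "stdev": stdev(present),
    }


def drop_duplicates(records):
    """Redundancy check: keep the first occurrence of each identical record."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


def aggregate(records):
    """Aggregation: total amount per customer, skipping missing amounts."""
    totals = {}
    for customer, amount in records:
        if amount is not None:
            totals[customer] = totals.get(customer, 0.0) + amount
    return totals


summary = summarize([amount for _, amount in rows])
unique_rows = drop_duplicates(rows)
duplicates_found = len(rows) - len(unique_rows)
totals = aggregate(unique_rows)
print(summary, duplicates_found, totals)
```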

    • Once the task domain has been partially structured, the specification and application of effective KD strategies can be considered.

    • Data mining techniques are described as either directed or undirected searches.

    • Fully directed techniques require the a priori specification of inputs, outputs, and models.

    • Less directed techniques often utilize step-wise and self-organizing approaches.
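
The contrast between directed and less directed searches can be sketched in a few lines. The two techniques below are stand-ins chosen for illustration, not methods named in the slides: a least-squares line fit plays the fully directed case (input, output, and model form are all fixed in advance), and a one-dimensional two-means clustering plays the less directed, self-organizing case (no output variable is specified). The data values are hypothetical.

```python
def fit_line(xs, ys):
    """Directed: input xs, output ys, and the model y = slope * x + intercept are given a priori."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def two_means(values, iterations=10):
    """Less directed: the data organizes itself into two groups; no output is specified."""
    centers = [min(values), max(values)]
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
centers, groups = two_means([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
print(slope, intercept, centers, groups)
```

In the first call the analyst has already decided what predicts what; in the second, the grouping is discovered from the values alone.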

    • Client:
      The Client firm suggested that time constraints and a desire to provide manageable results tend to force a streamlining of the analysis whenever possible.

    • Consulting firm:
      Because complex analyses are time-consuming, the Consulting firm suggested arriving at an entirely structured reduction approach through an adequate understanding of the problem and the available data.

    • Developer:
      The Development firm proposed alternate approaches. The consideration of simple metrics, such as correlations, was joined with the consideration of more complex techniques, and the firm also agreed on the need for frequent data reduction.
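
As one possible reading of how a simple metric such as correlation can support data reduction, the sketch below computes the Pearson correlation between attributes and drops any attribute that is almost perfectly correlated with one already kept. The attribute names, the sample values, and the 0.95 cutoff are assumptions for illustration, not details from the Development firm's approach.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Assumes neither sequence is constant (a constant column has zero variance).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def drop_redundant(columns, threshold=0.95):
    """Keep an attribute only if it is not strongly correlated with one already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical attributes: `visits_doubled` carries the same information as `visits`.
columns = {
    "visits": [1, 2, 3, 4],
    "visits_doubled": [2, 4, 6, 8],
    "spend": [4, 1, 3, 2],
}
kept = drop_redundant(columns)
print(kept)
```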

    • The final phase of the knowledge discovery process involved the interpretation of the results provided by the analyst-specified algorithmic search.

    • Client:
      A representative from the Client firm claimed that, as subsequent evaluations and iterations occurred, the result was based upon the total available knowledge.

    • Development firm:
      Development firm representatives insisted that the discovery process was a cycle of trial and error.

    • Consulting firm:
      As evident from the formality of alternation schemes proposed by the Consulting and Development firms, the relevancy of such issues seems apparent. As such, the timing of such alternations may have a profound impact on the efficacy of the process.
