Data Mining

DataMining By Guan Hang Su CS157A section 2 fall 2005
Date posted: 19-Oct-2014

Transcript of Data Mining

Page 1: Date Mining

Data Mining

By

Guan Hang Su

CS157A section 2 fall 2005

Page 2: Date Mining

Outline

Overview

---- Define Data Mining

---- Foundation of Data Mining

---- Scope of Data Mining

---- Techniques in data mining

---- Applications

Page 3: Date Mining

What is Data Mining?

Discovering “hidden value” in your data warehouse

Page 4: Date Mining

Define Data Mining

The automated extraction of hidden predictive information from (large) databases

Three key words: Automated, Hidden, Predictive

A statistical methodology is implicit.

Data mining lets you be proactive: prospective rather than retrospective.

Page 5: Date Mining

The Foundations of Data Mining

Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery.

Page 6: Date Mining

The Foundations of Data Mining (cont..)

Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature:

Massive data collection

Powerful multiprocessor computers

Data mining algorithms

Page 7: Date Mining

The Scope of Data Mining

Data mining derives its name from the similarities between searching for valuable business information in a large database (for example, finding linked products in gigabytes of store scanner data) and mining a mountain for a vein of valuable ore.

Both processes require either sifting through an immense amount of material or intelligently probing it to find exactly where the value resides.

Page 8: Date Mining

The Scope of Data Mining (cont..)

Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities:

Automated prediction of trends and behaviors

Automated discovery of previously unknown patterns

Page 9: Date Mining

The Scope of Data Mining (cont..)

Automated prediction of trends and behaviors

---- Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered directly from the data.

Typical examples of predictive problems:

1) Targeted marketing

2) Forecasting bankruptcy

Page 10: Date Mining

The Scope of Data Mining (cont..)

Automated discovery of previously unknown patterns

---- Data mining tools sweep through databases and identify previously hidden patterns in one step.

Example of pattern discovery: the analysis of retail sales data to identify seemingly unrelated products that are often purchased together.

Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.
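The retail example above can be sketched in a few lines of Python. This is only a minimal illustration, assuming each sale is recorded as a list of product names; the function names `pair_counts` and `frequent_pairs`, the `min_support` threshold, and the sample baskets are all hypothetical and not taken from the slides.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) drops duplicates and gives each pair a stable order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def frequent_pairs(baskets, min_support):
    """Return pairs whose share of all baskets is at least min_support."""
    total = len(baskets)
    return {
        pair: n / total
        for pair, n in pair_counts(baskets).items()
        if n / total >= min_support
    }


# Hypothetical scanner data: each inner list is one checkout basket.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
]
print(frequent_pairs(baskets, min_support=0.75))
```

Counting every pair is only practical for small product catalogs; real tools prune the search, but the idea of measuring how often items co-occur is the same.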

Page 11: Date Mining

Techniques in data mining

The most commonly used techniques in data mining:

Artificial neural networks

Decision trees

Genetic algorithms

Nearest neighbor method

Rule induction

Page 12: Date Mining

Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.

Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi-Square Automatic Interaction Detection (CHAID).

Page 13: Date Mining

Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.

Nearest neighbor method: A technique that predicts a value for a record by looking up the known values of the records most similar to it (its near neighbors).

Page 14: Date Mining

Rule induction: The extraction of useful if-then rules from data based on statistical significance.

Other techniques:

Bayesian networks

---- Naïve Bayes

Support vector machines

Many more...

Page 15: Date Mining

Decision Trees

Nearest Neighbor classification

Neural Networks

Rule Induction

K-means Clustering

Page 16: Date Mining

Example of Neural Network

Difficult interpretation

Tends to ‘overfit’ the data

Extensive amount of training time

A lot of data preparation

Works with all data types

[Figure: neural network diagram with an input layer, a hidden layer, and an output]
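The three-part structure named on this slide (input layer, hidden layer, output) can be illustrated with a single forward pass. This is a rough sketch, not a trained model: the sigmoid activation, the function names, and every weight value are assumptions chosen for illustration, and the training step that would actually learn the weights is omitted.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: input layer -> hidden layer -> single output."""
    # Each hidden unit takes a weighted sum of all inputs, then a non-linearity.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output unit does the same over the hidden activations.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical network: 2 inputs, 2 hidden units, 1 output.
score = forward(
    inputs=[1.0, 2.0],
    hidden_weights=[[0.5, -0.2], [0.3, 0.8]],
    hidden_biases=[0.0, -0.1],
    output_weights=[1.2, -0.7],
    output_bias=0.05,
)
print(score)
```

The weights carry no obvious meaning on their own, which is one way to read the "difficult interpretation" point above.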

Page 17: Date Mining

Example of Rule Induction

Description: Produces decision trees:

income < $40K
    job > 5 yrs then good risk
    job < 5 yrs then bad risk

income > $40K
    high debt then bad risk
    low debt then good risk

Or rule sets:

Rule #1 for good risk: if income > $40K and low debt

Rule #2 for good risk: if income < $40K and job > 5 years
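The tree on this slide translates directly into nested conditions. The sketch below is one possible encoding; the function name `credit_risk` and its parameters are hypothetical, and because the slide does not say what happens at exactly $40K income or exactly 5 years in a job, the code assumes those boundary cases fall on the lower branch.

```python
def credit_risk(income, years_in_job, high_debt):
    """Classify an applicant as 'good' or 'bad' risk using the slide's tree."""
    if income > 40_000:
        # Higher income branch: debt level decides.
        return "bad" if high_debt else "good"
    # Lower income branch: job tenure decides.
    # Assumption: income of exactly $40K and tenure of exactly 5 years
    # are treated as the lower case, since the slide leaves them open.
    return "good" if years_in_job > 5 else "bad"


print(credit_risk(income=55_000, years_in_job=2, high_debt=False))
print(credit_risk(income=30_000, years_in_job=3, high_debt=False))
```

The two "good risk" rules in the rule set are simply the two paths through this function that return "good".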

Page 18: Date Mining

K-Nearest-Neighbor (kNN) Models

Use the entire training database as the model.

Find the nearest data point and do the same thing as you did for that record.

Very easy to implement. More difficult to use in production.

Disadvantage: huge models

[Figure: scatter plot with axes Doses (0 to 1000) and Age (up to 100)]
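A minimal kNN sketch, assuming records are numeric tuples and that plain Euclidean distance is an acceptable notion of "nearest". The function name `knn_predict`, the labels, and the (age, dose) training records are hypothetical. Note that the "model" really is just the stored training list, which is why the slide calls these models huge.

```python
import math
from collections import Counter


def knn_predict(training, query, k=1):
    """Predict a label for query by majority vote among the k nearest records.

    training is a list of (features, label) pairs with numeric feature tuples.
    """
    # Sort the whole training set by distance to the query and keep the k closest.
    nearest = sorted(training, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical records: (age, dose) -> observed response.
training = [
    ((25, 100), "low"),
    ((30, 150), "low"),
    ((60, 800), "high"),
    ((65, 900), "high"),
]
print(knn_predict(training, (62, 850), k=1))
print(knn_predict(training, (28, 120), k=3))
```

With unscaled features the dose axis dominates the distance here; in practice the attributes would usually be rescaled first.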

Page 19: Date Mining

Example of Decision Trees

Page 20: Date Mining

How Data Mining Works

How exactly is data mining able to tell you important things that you didn't know or what is going to happen next? The technique that is used to perform these feats in data mining is called modeling.

Modeling is simply the act of building a model in one situation where you know the answer and then applying it to another situation that you don't.

Page 21: Date Mining

Computers are loaded up with lots of information about a variety of situations where an answer is known. The data mining software then runs through that data and distills the characteristics that should go into the model.

Once the model is built, it can be used in similar situations where you don't know the answer.

Page 22: Date Mining

Some results of Data Mining

Forecasting what may happen in the future.

Classifying people or things into groups by recognizing patterns.

Clustering people or things into groups based on their attributes.

Sequencing what events are likely to lead to later events.
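Clustering by attributes can be illustrated with k-means, which the slides list by name earlier. The sketch below is deliberately simplified to one numeric attribute; the function name `kmeans_1d`, the starting centers, and the sample values are assumptions made for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group numbers around k centers by alternating assignment and averaging."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical attribute values (for example, monthly spend) and starting centers.
centers, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0, 5])
print(centers, clusters)
```

Unlike classification, no known answers are supplied: the groups emerge from the attribute values alone.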

Page 23: Date Mining

Example

For example, say that you are the director of marketing for a telecommunications company and you'd like to acquire some new long distance phone customers. You could:

1) Randomly mail out coupons to the general population,

2) or use the business experience stored in your database to build a model, then choose the right targets.

Page 24: Date Mining

Cont..

As the marketing director you have access to a lot of information about all of your customers: their age, sex, credit history and long distance calling usage.

The problem is that you don't know the long distance calling usage of these prospects (since they are most likely now customers of your competition).

Page 25: Date Mining

We'd like to concentrate on those prospects who have large amounts of long distance usage. We can accomplish this by building a model.

                                                       Customers   Prospects
General information (e.g. demographic data)            Known       Known
Proprietary information (e.g. customer transactions)   Known       Target

Page 26: Date Mining

For instance, a simple model for a telecommunications company might be:

98% of my customers who make more than $60,000/year spend more than $80/month on long distance.

With this model in hand, new customers can be selectively targeted.
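A rule of this kind can be measured on existing customers (where usage is known) and then applied to prospects (where only general information is known). The sketch below assumes simple dictionary records; the field names `income`, `ld_spend`, and `name`, both function names, and all sample records are hypothetical, and the thresholds are taken from the example rule.

```python
def rule_confidence(customers, min_income, min_spend):
    """Share of customers above min_income who also spend more than min_spend."""
    high_income = [c for c in customers if c["income"] > min_income]
    if not high_income:
        return 0.0
    matching = sum(1 for c in high_income if c["ld_spend"] > min_spend)
    return matching / len(high_income)


def select_prospects(prospects, min_income):
    """Pick prospects the rule predicts to be heavy long distance users."""
    return [p["name"] for p in prospects if p["income"] > min_income]


# Hypothetical customers: income and monthly long distance spend are both known.
customers = [
    {"income": 75_000, "ld_spend": 95},
    {"income": 82_000, "ld_spend": 110},
    {"income": 64_000, "ld_spend": 60},
    {"income": 38_000, "ld_spend": 20},
]
# Hypothetical prospects: only general information (income) is known.
prospects = [
    {"name": "A", "income": 90_000},
    {"name": "B", "income": 35_000},
]
print(rule_confidence(customers, min_income=60_000, min_spend=80))
print(select_prospects(prospects, min_income=60_000))
```

The confidence computed on customers plays the role of the "98%" figure in the example; only a rule with high confidence is worth using to pick targets.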

Page 27: Date Mining

Architecture for Data Mining

To best apply these advanced techniques, they must be fully integrated with a data warehouse as well as flexible interactive business analysis tools.

Many data mining tools currently operate outside of the warehouse, requiring extra steps for extracting, importing, and analyzing the data. Furthermore, when new insights require operational implementation, integration with the warehouse simplifies the application of results from data mining.

Page 28: Date Mining

[Figure: an architecture for advanced analysis in a large data warehouse]

Page 29: Date Mining

Data Mining Applications

The US Drug Enforcement Administration needed to be more effective in its drug “busts”.

It analyzed suspects’ cell phone usage to focus investigations.

Page 30: Date Mining

HSBC needed to cross-sell more effectively by identifying customer profiles that would be interested in higher-yielding investments.

Result: reduced direct mail costs by 30% while garnering 95% of the campaign’s revenue.
