Data Mining Techniques Cluster Analysis Induction Neural Networks OLAP Data Visualization.


Data Mining Techniques

• Cluster Analysis

• Induction

• Neural Networks

• OLAP

• Data Visualization


Association Rule

• An association rule is a rule that implies certain association relationships among a set of objects in a database (such as "occur together" or "one implies the other").

• Given a set of transactions, where each transaction is a set of literals (called items), an association rule is an expression of the form X => Y, where X and Y are sets of items.

• The intuitive meaning of such a rule is that transactions in the database that contain X tend to also contain Y.


Support

• The support of an itemset S is the percentage of the transactions in the transaction set T that contain S.

• If U is the set of all transactions that contain all items in S, then support(S) = (|U| / |T|) * 100%, where |U| and |T| are the number of elements in U and T, respectively.
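The support definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `support` and the transaction data in `T` are assumptions made up for the example.

```python
def support(itemset, transactions):
    """Return the support of `itemset` as a percentage of `transactions`."""
    itemset = set(itemset)
    # U: the transactions that contain every item in S
    matching = [t for t in transactions if itemset <= set(t)]
    return 100.0 * len(matching) / len(transactions)


# Hypothetical transaction database T (illustrative data only)
T = [
    {"bread", "butter"},
    {"bread", "butter", "cheese"},
    {"bread", "ham"},
    {"cheese", "coke"},
    {"bread", "butter", "salt"},
]

print(support({"bread", "butter"}, T))  # 60.0
print(support({"cheese"}, T))           # 40.0
```

Three of the five hypothetical transactions contain both bread and butter, which gives the 60% figure.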


Confidence

• The confidence of a candidate rule X => Y is calculated as support(X ∪ Y) / support(X).

• The confidence of the rule X => Y represents the percentage of transactions containing the items in X that also contain the items in Y.
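As a rough sketch of the confidence calculation, the ratio can be computed from raw transaction counts, because the |T| in the numerator and denominator supports cancels. The helper names `support_count` and `confidence` and the data in `T` are hypothetical.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def confidence(x, y, transactions):
    """confidence(X => Y) = support(X union Y) / support(X).

    Assumes X occurs in at least one transaction; otherwise the
    division below raises ZeroDivisionError.
    """
    both = support_count(set(x) | set(y), transactions)
    return both / support_count(x, transactions)


# Hypothetical transaction database (illustrative data only)
T = [
    {"bread", "butter"},
    {"bread", "butter", "cheese"},
    {"bread", "ham"},
    {"cheese", "coke"},
    {"bread", "butter", "salt"},
]

print(confidence({"bread"}, {"butter"}, T))  # 0.75
print(confidence({"butter"}, {"bread"}, T))  # 1.0
```

The two results differ because confidence is not symmetric: every butter buyer in this data also bought bread, but only three of the four bread buyers bought butter.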


Example: Association Rule

• In a store we might have the item set I = {cheese, ham, bread, butter, salt, coke}.

• A transaction could look like t = {bread, butter} for a customer who bought bread and butter.

• An association rule could look like bread => butter with support 60% and confidence 80%: 60% of all transactions contain both bread and butter, and 80% of the customers who bought bread also bought butter.


Apriori Algorithm

• Find all combinations of items that have transaction support above minimum support. Call those combinations frequent itemsets.

• Use the frequent itemsets to generate the desired rules.
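The second step, turning frequent itemsets into rules, might look like the sketch below. It assumes the first step has already produced a dictionary of frequent itemsets with their support counts; the name `generate_rules`, the `min_confidence` parameter, and the counts in `frequent` are all hypothetical.

```python
from itertools import combinations


def generate_rules(frequent, min_confidence):
    """Derive rules X => Y from frequent itemsets.

    `frequent` maps frozenset itemsets to support counts. Every non-empty
    proper subset X of a frequent itemset Z gives a candidate rule
    X => Z - X, which is kept if its confidence reaches `min_confidence`.
    """
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                x = frozenset(antecedent)
                # Subsets of a frequent itemset are frequent, so x is a key.
                conf = count / frequent[x]
                if conf >= min_confidence:
                    rules.append((x, itemset - x, count, conf))
    return rules


# Hypothetical result of step 1 (support counts, illustrative only)
frequent = {
    frozenset({"bread"}): 4,
    frozenset({"butter"}): 3,
    frozenset({"bread", "butter"}): 3,
}

for x, y, count, conf in generate_rules(frequent, min_confidence=0.8):
    print(sorted(x), "=>", sorted(y), conf)  # ['butter'] => ['bread'] 1.0
```

With a threshold of 0.8 the rule bread => butter (confidence 0.75) is dropped, while butter => bread (confidence 1.0) is kept.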


Apriori Algorithm (cont’d)

Pass 1

1. Generate the candidate itemsets in C1

2. Save the frequent itemsets in L1

Pass k

1. Generate the candidate itemsets in Ck from the frequent itemsets in Lk-1:

   a. Join Lk-1 with Lk-1, as follows: insert into Ck select p.item1, p.item2, . . . , p.itemk-1, q.itemk-1 from Lk-1 p, Lk-1 q where p.item1 = q.item1, . . . , p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1

   b. Generate all (k-1)-subsets from the candidate itemsets in Ck

   c. Prune all candidate itemsets from Ck where some (k-1)-subset of the candidate itemset is not in the frequent itemsets Lk-1

2. Scan the transaction database to determine the support for each candidate itemset in Ck

3. Save the frequent itemsets in Lk
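The passes above can be sketched compactly in Python. This is an illustrative sketch under stated assumptions, not an optimized implementation: it takes `min_support` as a transaction count rather than a percentage, and the function name `apriori` and the sample data `T` are made up for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support count} for all frequent itemsets.

    `min_support` is a minimum number of transactions (an assumption of
    this sketch; a percentage threshold would work the same way).
    """
    transactions = [frozenset(t) for t in transactions]

    def count_and_filter(candidates):
        # Scan the database and keep the candidates that meet min_support
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Pass 1: C1 is every single item; L1 is the frequent part of C1
    c1 = {frozenset([item]) for t in transactions for item in t}
    level = count_and_filter(c1)
    frequent = dict(level)

    k = 2
    while level:
        prev = sorted(tuple(sorted(s)) for s in level)
        candidates = set()
        # Join: pair itemsets that share their first k-2 items
        for i, p in enumerate(prev):
            for q in prev[i + 1:]:
                if p[:-1] == q[:-1]:
                    cand = frozenset(p + (q[-1],))
                    # Prune: every (k-1)-subset must be in L(k-1)
                    if all(frozenset(s) in level
                           for s in combinations(sorted(cand), k - 1)):
                        candidates.add(cand)
        level = count_and_filter(candidates)  # Lk
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical transaction database (illustrative data only)
T = [
    {"bread", "butter"},
    {"bread", "butter", "cheese"},
    {"bread", "ham"},
    {"cheese", "coke"},
    {"bread", "butter", "salt"},
]

for itemset, count in apriori(T, min_support=2).items():
    print(sorted(itemset), count)
```

Because the itemsets in `prev` are sorted tuples, two itemsets with the same first k-2 items automatically satisfy the p.itemk-1 < q.itemk-1 condition of the join when q comes after p.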


Smart Web Search Agents

• Data Search Engines >> Information Search Agents

- Traditional searching on the Web is done using one of the following three:

  - Directories (Yahoo, Lycos, etc.)

  - Search Engines (AltaVista, NorthernLight, etc.)

  - Metasearch Engines (MetaCrawler, SavvySearch, AskJeeves, etc.)

All of these involve keyword searches. Drawbacks: they are not easily personalized and return too many results (although many give relevancy factors).


- local cache databases (containing frequently asked queries/results; possibly updated periodically - nightly!)

- local cache information base (containing mined information and discovered knowledge for efficient personal use)

- domain-based agents (e.g. Job Search; Sports-NBA Stats, Bibliography-Digital Libraries)


Intelligent Tools for E-Business

• Computational Intelligence, Neural Networks, Fuzzy Logic, Genetic Algorithms, Hybrid Systems

• Learning Algorithms, Heuristic Searching

• Data Analysis and Modeling, Data Fusion and Mining, Knowledge Discovery

• Prediction & Time Series Analysis

• Information Retrieval, Intelligent User Interface

• Intelligent Agents, Distributed IA and Multi-Agents, Cooperative Knowledge-based Systems


Enhancing E-Business Process Through Data Mining

• Quality of discovered knowledge

– Having right data

– Having appropriate data mining tools!!!

[Figure: Data Mining (Knowledge Discovery); Data Warehouse; Failure Patterns; Success Patterns]

• Traditional Data Mining Tools

– Simple query and reporting

– Visualization-driven data exploration tools, OLAP

– Discovery process is user-driven


Intelligent Data Mining Tools

• Automate the process of discovering patterns/knowledge in data

• Require hypothesis, exploration

• Derive business knowledge (patterns) from data

• Combine business knowledge of users with results of discovery algorithms

[Figure: Data Warehouse; Failure Patterns; Success Patterns]


Intelligent Information Agents

• The Data Mining Problem:

– Clustering / Classification

– Association

– Sequencing

• Viewed as an Optimization Problem

• Tools: Genetic Algorithms


Fuzzy Rules Discovery

• Rules discovery: the discovery of associations between business events, e.g. which items are purchased together

• To support flexible querying and intelligent searching, fuzzy queries were developed to uncover potentially valuable knowledge

• A fuzzy query uses fuzzy terms like tall, small, and near to define linguistic concepts and formulate a query

• The automated search for fuzzy rules is carried out by discovering fuzzy clusters or segments in the data
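The fuzzy query idea above can be illustrated with a small sketch. Everything specific here is an assumption for the example: the linear membership functions, their breakpoints (160 to 190 cm for "tall", 0 to 10 km for "near"), the record fields, and the use of the minimum as the fuzzy AND.

```python
def tall(height_cm):
    """Membership degree in the fuzzy set 'tall' (assumed 160-190 cm ramp)."""
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30


def near(distance_km):
    """Membership degree in 'near' (assumed: 1.0 at 0 km, 0.0 from 10 km)."""
    return max(0.0, min(1.0, 1 - distance_km / 10))


def fuzzy_query(records, threshold):
    """Return (degree, name) pairs for records that are 'tall AND near'.

    AND is modeled as the minimum of the two membership degrees,
    which is one common choice among several.
    """
    results = []
    for r in records:
        degree = min(tall(r["height_cm"]), near(r["distance_km"]))
        if degree >= threshold:
            results.append((degree, r["name"]))
    return sorted(results, reverse=True)


# Hypothetical records (illustrative data only)
people = [
    {"name": "Ana", "height_cm": 190, "distance_km": 2.5},
    {"name": "Ben", "height_cm": 175, "distance_km": 0},
    {"name": "Cy", "height_cm": 160, "distance_km": 1},
]

print(fuzzy_query(people, threshold=0.5))  # [(0.75, 'Ana'), (0.5, 'Ben')]
```

Unlike a crisp query such as "height > 180", each record matches to a degree between 0 and 1, so results can be ranked rather than only filtered.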