
Page 1: Direct Mining of Discriminative Patterns for Classifying ...dbgroup.cs.tsinghua.edu.cn/chuancong/publications/kdd10...Direct Mining of Discriminative Patterns for Classifying Uncertain

Direct Mining of Discriminative Patterns for Classifying Uncertain Data

Chuancong Gao, Jianyong Wang

Database Laboratory, Department of Computer Science and Technology

Tsinghua University, Beijing, China

C. Gao, J. Wang (Tsinghua Univ.) Direct Mining of Discriminative Patterns for Classifying Uncertain Data 1 / 26


Classification on Certain Data

Example:

A toy example of a certain categorical dataset with 4 classes, given by the Evaluation column:

Evaluation    Price  Looking  Tech. Spec.  Quality
Unacceptable  +      -        /            -
Acceptable    /      -        /            /
Good          -      +        /            /
Very Good     /      +        +            +

(+: Good, /: Medium, -: Bad)

Many Methods:

- Decision trees: C4.5, etc.
- Rule-based classifiers: Ripper, etc.
- Associative classification (pattern-based classification): CBA, RCBT, HARMONY, DDPMine, MbT, etc. (better performance)
- etc.


Associative Classification

Two-Step Framework:

- Mine a set of frequent patterns.
- Select a subset of the most discriminative patterns from the mined patterns.

One-Step Framework:

- Directly mine a set of the most discriminative frequent patterns.

After Obtaining the Discriminative Patterns:

- Convert each pattern into a binary feature: whether the instance contains the pattern.
- Train a classifier on the feature data converted from the training data.
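A minimal sketch of the conversion step, assuming each instance is represented as a set of attribute=value items and each pattern as a frozenset of such items (a representation chosen here for illustration, not taken from the slides). `to_binary_features` is an illustrative name, and the two patterns in the example are hypothetical.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]  # e.g. frozenset({"Price=+", "Looking=-"})


def to_binary_features(instances: Sequence[Itemset],
                       patterns: Sequence[Itemset]) -> List[List[int]]:
    """Feature k of an instance is 1 iff the instance contains pattern k."""
    return [[1 if pattern <= instance else 0 for pattern in patterns]
            for instance in instances]


# The certain toy dataset, one itemset per instance (class labels kept apart).
toy = [
    frozenset({"Price=+", "Looking=-", "TechSpec=/", "Quality=-"}),  # Unacceptable
    frozenset({"Price=/", "Looking=-", "TechSpec=/", "Quality=/"}),  # Acceptable
    frozenset({"Price=-", "Looking=+", "TechSpec=/", "Quality=/"}),  # Good
    frozenset({"Price=/", "Looking=+", "TechSpec=+", "Quality=+"}),  # Very Good
]
# Two hypothetical mined patterns.
patterns = [frozenset({"Looking=+"}), frozenset({"Price=/", "Quality=/"})]

features = to_binary_features(toy, patterns)
print(features)  # [[0, 0], [0, 1], [1, 0], [1, 0]]
```

Each feature row, paired with its class label, can then be handed to any standard classifier, for example an SVM as in DDPMine and MbT.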


Associative Classification

Main Differences between Algorithms:

- Types of mined patterns: all, closed, generators, etc. (two-step framework)
- Discriminative measures: confidence (CBA, HARMONY), information gain (DDPMine) (better performance), Fisher score, etc.
- Instance covering strategies: sequential covering (DDPMine), top-k (RCBT), search-tree-based partitioning (MbT), etc.
- Classification models: rule-based (RCBT, HARMONY), SVM (DDPMine, MbT) (better performance), Naïve Bayes, etc.
- Feature types: binary, numeric (new; NDPMine)
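For the discriminative measures listed above, a minimal sketch on certain data: confidence and the standard information gain of the binary pattern feature, both computed from a per-instance containment vector. The function names and the convention that confidence is 0 for a pattern that occurs nowhere are assumptions of this sketch; the containment vector corresponds to the hypothetical pattern {Looking=+} on the certain toy dataset.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a multiset of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def confidence(contains: Sequence[bool], labels: Sequence[Hashable], c: Hashable) -> float:
    """sup(x, c) / sup(x); taken as 0 when the pattern occurs nowhere."""
    sup = sum(contains)
    sup_c = sum(1 for has, y in zip(contains, labels) if has and y == c)
    return sup_c / sup if sup else 0.0


def information_gain(contains: Sequence[bool], labels: Sequence[Hashable]) -> float:
    """IG(C | X) = H(C) - H(C | X) for the binary pattern feature X."""
    n = len(labels)
    inside = [y for has, y in zip(contains, labels) if has]
    outside = [y for has, y in zip(contains, labels) if not has]
    conditional = (len(inside) / n) * entropy(inside) + (len(outside) / n) * entropy(outside)
    return entropy(labels) - conditional


labels = ["Unacceptable", "Acceptable", "Good", "Very Good"]
looking_good = [False, False, True, True]  # does each toy instance contain {Looking=+}?
print(confidence(looking_good, labels, "Good"))  # 0.5
print(information_gain(looking_good, labels))    # 1.0
```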


Classification on Uncertain Data

Example:

A toy example of an uncertain categorical dataset. Uncertainty is usually caused by noise, limited measurement precision, etc.

Evaluation    Price  Looking  Tech. Spec.  Quality
Unacceptable  +      -        /            {-: 0.8, /: 0.1, +: 0.1}
Acceptable    /      -        /            {-: 0.1, /: 0.8, +: 0.1}
Good          -      +        /            {-: 0.1, /: 0.8, +: 0.1}
Very Good     /      +        +            {-: 0.1, /: 0.1, +: 0.8}

(+: Good, /: Medium, -: Bad)

Very Few Methods:

- Uncertain decision tree: C4.5-based DTU
- Uncertain rule-based classifier: Ripper-based uRule


Our Solution

A new associative classification algorithm working on uncertain data.

Difference from Certain Datasets: A pattern involving uncertain attributes appears in each instance only with some probability.

Challenges:

- How to represent frequency information? Use the expected value of support, which is easy to calculate: sum the probabilities of the pattern appearing in the different instances.
- How to represent discriminative information?
- How to cover instances?
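A minimal sketch of expected support on the uncertain toy dataset. Beyond what the slides state, it assumes that the values of different attributes within one instance are independent, so that P(x ⊆ t) is a product of value probabilities; the dictionary representation and the helper names are hypothetical. Summing over instances needs no independence across instances, since it follows from linearity of expectation.

```python
from typing import Dict, Mapping, Sequence

# attribute -> {value: probability}; a certain attribute has one value with probability 1.
UncertainInstance = Mapping[str, Mapping[str, float]]
Pattern = Mapping[str, str]  # attribute -> required value


def containment_probability(instance: UncertainInstance, pattern: Pattern) -> float:
    """P(x ⊆ t), assuming attribute values are independent within one instance."""
    prob = 1.0
    for attr, value in pattern.items():
        prob *= instance.get(attr, {}).get(value, 0.0)
    return prob


def expected_support(instances: Sequence[UncertainInstance], pattern: Pattern) -> float:
    """E(sup_x): the sum over instances of P(x ⊆ t), by linearity of expectation."""
    return sum(containment_probability(t, pattern) for t in instances)


def certain(**values: str) -> Dict[str, Dict[str, float]]:
    """Encode certain attributes as single-value distributions."""
    return {attr: {v: 1.0} for attr, v in values.items()}


# The uncertain toy dataset: only Quality is uncertain.
toy = [
    {**certain(Price="+", Looking="-", TechSpec="/"), "Quality": {"-": 0.8, "/": 0.1, "+": 0.1}},
    {**certain(Price="/", Looking="-", TechSpec="/"), "Quality": {"-": 0.1, "/": 0.8, "+": 0.1}},
    {**certain(Price="-", Looking="+", TechSpec="/"), "Quality": {"-": 0.1, "/": 0.8, "+": 0.1}},
    {**certain(Price="/", Looking="+", TechSpec="+"), "Quality": {"-": 0.1, "/": 0.1, "+": 0.8}},
]

print(round(expected_support(toy, {"Quality": "/"}), 6))                  # 1.8
print(round(expected_support(toy, {"Looking": "+", "Quality": "/"}), 6))  # 0.9
```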


Discriminative Measures on Uncertain Data

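The expected value of confidence is chosen as the discriminative measure. Unlike expected support, expected confidence is hard to calculate.

Definition of Expected Confidence:

Given a set of transactions T and the set of possible worlds W w.r.t. T, the expected confidence of an itemset x on class c is

$$E(\mathrm{conf}_x^c) = \sum_{w_i \in W} \mathrm{conf}_{x,w_i}^c \times P(w_i) = \sum_{w_i \in W} \frac{\mathrm{sup}_{x,w_i}^c}{\mathrm{sup}_{x,w_i}} \times P(w_i)$$

where $P(w_i)$ is the probability of world $w_i$, $\mathrm{conf}_{x,w_i}^c$ is the respective confidence of x on class c in world $w_i$, and $\mathrm{sup}_{x,w_i}$ ($\mathrm{sup}_{x,w_i}^c$) is the respective support of x (on class c) in world $w_i$.

Since every uncertain attribute of every transaction can take any value of its domain, there are $O(\prod_{A_k \in A_u} |\mathrm{dom}_{A_k}|^{|T|})$ possible worlds, where $A_u$ is the set of uncertain attributes, which is why direct enumeration is expensive.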
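To make the definition concrete, here is a brute-force sketch that enumerates every possible world of the uncertain toy dataset and sums confidence times world probability. It assumes attribute values are independent within and across instances, and that confidence is 0 in worlds where x occurs nowhere (a convention the slides do not state). Only the uncertain attribute Quality is kept, since certain attributes contribute a factor of 1; with one three-valued uncertain attribute and four transactions there are 3^4 = 81 worlds. The function names are hypothetical.

```python
import itertools
import math
from typing import Dict, Hashable, Iterator, Mapping, Sequence, Tuple

UncertainInstance = Mapping[str, Mapping[str, float]]  # attribute -> {value: probability}


def instance_outcomes(instance: UncertainInstance) -> Iterator[Tuple[Dict[str, str], float]]:
    """Yield every certain version of one instance with its probability
    (attribute values assumed independent within the instance)."""
    attrs = list(instance)
    for combo in itertools.product(*(instance[a].items() for a in attrs)):
        values = {a: v for a, (v, _) in zip(attrs, combo)}
        yield values, math.prod(p for _, p in combo)


def expected_confidence_bruteforce(instances: Sequence[UncertainInstance],
                                   labels: Sequence[Hashable],
                                   pattern: Mapping[str, str], c: Hashable) -> float:
    """Sum of conf_{x,w}^c * P(w) over all possible worlds w (exponential in |T|)."""
    per_instance = [list(instance_outcomes(t)) for t in instances]
    total = 0.0
    for world in itertools.product(*per_instance):  # instances assumed independent
        p_world, sup, sup_c = 1.0, 0, 0
        for (values, p), label in zip(world, labels):
            p_world *= p
            if all(values.get(a) == v for a, v in pattern.items()):
                sup += 1
                sup_c += 1 if label == c else 0
        if sup > 0:  # assumption: conf is 0 in worlds where x occurs nowhere
            total += sup_c / sup * p_world
    return total


# Uncertain toy dataset, keeping only the uncertain attribute Quality.
toy = [{"Quality": {"-": 0.8, "/": 0.1, "+": 0.1}},
       {"Quality": {"-": 0.1, "/": 0.8, "+": 0.1}},
       {"Quality": {"-": 0.1, "/": 0.8, "+": 0.1}},
       {"Quality": {"-": 0.1, "/": 0.1, "+": 0.8}}]
labels = ["Unacceptable", "Acceptable", "Good", "Very Good"]
e_conf = expected_confidence_bruteforce(toy, labels, {"Quality": "/"}, "Good")
print(round(e_conf, 6))  # 0.443733
```

Summing the result over all four classes gives the probability that x occurs at least once, 1 - 0.9 × 0.2 × 0.2 × 0.9 = 0.9676, which is a convenient sanity check.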


Efficient Computation of Expected Confidence

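Lemma: Since $0 \le \mathrm{sup}_x^c \le \mathrm{sup}_x \le |T|$, we have:

$$E(\mathrm{conf}_x^c) = \sum_{w_i \in W} \mathrm{conf}_{x,w_i}^c \times P(w_i) = \sum_{i=0}^{|T|} \sum_{j=0}^{i} \frac{j}{i} \times P(\mathrm{sup}_x = i \wedge \mathrm{sup}_x^c = j) = \sum_{i=0}^{|T|} \frac{E_i(\mathrm{sup}_x^c)}{i} = \sum_{i=0}^{|T|} E_i(\mathrm{conf}_x^c)$$

where $E_i(\mathrm{sup}_x^c) = \sum_{j=0}^{i} j \times P(\mathrm{sup}_x = i \wedge \mathrm{sup}_x^c = j)$ and $E_i(\mathrm{conf}_x^c)$ denote the parts of the expected support and expected confidence of itemset x on class c contributed by the worlds in which $\mathrm{sup}_x = i$. The $i = 0$ term is taken as 0, since $\mathrm{sup}_x^c = 0$ whenever $\mathrm{sup}_x = 0$.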
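The lemma can be checked with a sketch that computes the joint distribution $P(\mathrm{sup}_x = i \wedge \mathrm{sup}_x^c = j)$ directly and then sums $E_i(\mathrm{sup}_x^c)/i$. It assumes the events x ⊆ t_n are independent across transactions and are summarized by p_n = P(x ⊆ t_n); the values p = [0.1, 0.8, 0.8, 0.1] are the probabilities of x = {Quality=/} in the uncertain toy dataset. This naive version takes O(|T|^3) time and O(|T|^2) space, which the recurrences on the following slides improve on. The function names are hypothetical.

```python
from typing import Hashable, List, Sequence


def joint_support_distribution(p: Sequence[float], labels: Sequence[Hashable],
                               c: Hashable) -> List[List[float]]:
    """dist[i][j] = P(sup_x = i and sup_x^c = j), where p[n] = P(x ⊆ t_{n+1})
    and containment events are assumed independent across transactions."""
    m = len(p)
    dist = [[0.0] * (m + 1) for _ in range(m + 1)]
    dist[0][0] = 1.0
    for n, (pn, label) in enumerate(zip(p, labels)):
        step = 1 if label == c else 0  # does this transaction add to sup_x^c when it contains x?
        new = [[0.0] * (m + 1) for _ in range(m + 1)]
        for i in range(n + 1):          # after n transactions, sup_x <= n
            for j in range(i + 1):      # and sup_x^c <= sup_x
                q = dist[i][j]
                new[i][j] += (1 - pn) * q
                new[i + 1][j + step] += pn * q
        dist = new
    return dist


def expected_confidence_via_lemma(p: Sequence[float], labels: Sequence[Hashable],
                                  c: Hashable) -> float:
    """E(conf_x^c) = sum over i >= 1 of E_i(sup_x^c) / i; the i = 0 term is 0."""
    dist = joint_support_distribution(p, labels, c)
    total = 0.0
    for i in range(1, len(p) + 1):
        e_i = sum(j * dist[i][j] for j in range(i + 1))  # E_i(sup_x^c)
        total += e_i / i
    return total


p = [0.1, 0.8, 0.8, 0.1]  # P(x ⊆ t_n) for x = {Quality=/} on the uncertain toy dataset
labels = ["Unacceptable", "Acceptable", "Good", "Very Good"]
print(round(expected_confidence_via_lemma(p, labels, "Good"), 6))  # 0.443733
```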


Efficient Computation of Expected Confidence

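Given 0 ≤ n ≤ |T|, define $E_n(\mathrm{sup}_x^c) = \sum_{i=0}^{|T|} E_{i,n}(\mathrm{sup}_x^c)$ as the expected support of x on class c on the first n transactions of T, and $E_{i,n}(\mathrm{sup}_x^c)$ as the part of that expected support contributed by the cases in which x has support i on the first n transactions of T.

Denoting $P(x \subseteq t_k)$ as $p_k$ for each transaction $t_k \in T$, and writing $c_n$ for the class label of $t_n$, we have

$$E_{i,n}(\mathrm{sup}_x^c) = p_n \times E_{i-1,n-1}(\mathrm{sup}_x^c) + (1 - p_n) \times E_{i,n-1}(\mathrm{sup}_x^c)$$

when $c_n \ne c$, and

$$E_{i,n}(\mathrm{sup}_x^c) = p_n \times E_{i-1,n-1}(\mathrm{sup}_x^c + 1) + (1 - p_n) \times E_{i,n-1}(\mathrm{sup}_x^c)$$

when $c_n = c$, where $1 \le i \le n \le |T|$. In addition,

$$E_{i,n}(\mathrm{sup}_x^c) = 0$$

for all n where i = 0, or where n < i.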


Efficient Computation of Expected Confidence

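Defining $P_{i,n}$ as the probability of x having support i on the first n transactions of T, we have

$$E_{i,n}(\mathrm{sup}_x^c) = p_n \times E_{i-1,n-1}(\mathrm{sup}_x^c + 1) + (1 - p_n) \times E_{i,n-1}(\mathrm{sup}_x^c) = p_n \times \left(E_{i-1,n-1}(\mathrm{sup}_x^c) + P_{i-1,n-1}\right) + (1 - p_n) \times E_{i,n-1}(\mathrm{sup}_x^c)$$

when $c_n = c$, since adding 1 to the class support in every case where x has support i − 1 adds exactly the probability of those cases:

$$E_{i-1,n-1}(\mathrm{sup}_x^c + 1) = E_{i-1,n-1}(\mathrm{sup}_x^c) + P_{i-1,n-1}$$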
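Combining the recurrences, a sketch that fills the full $E_{i,n}(\mathrm{sup}_x^c)$ and $P_{i,n}$ tables. It also uses the recurrence for $P_{i,n}$ given on the next slide, assumes the events x ⊆ t_n are independent across transactions, and stores p_n as p[n - 1]. The example uses p = [0.1, 0.8, 0.8, 0.1], the probabilities of x = {Quality=/} in the uncertain toy dataset. `class_support_tables` is a hypothetical name.

```python
from typing import Hashable, List, Sequence, Tuple

Table = List[List[float]]


def class_support_tables(p: Sequence[float], labels: Sequence[Hashable],
                         c: Hashable) -> Tuple[Table, Table]:
    """Return (E, P) with E[i][n] = E_{i,n}(sup_x^c) and P[i][n] = P_{i,n}.

    p[n - 1] plays the role of p_n = P(x ⊆ t_n); containment events are
    assumed independent across transactions.
    """
    m = len(p)
    E = [[0.0] * (m + 1) for _ in range(m + 1)]  # E_{0,n} = 0 and E_{i,n} = 0 for n < i
    P = [[0.0] * (m + 1) for _ in range(m + 1)]  # P_{i,n} = 0 for n < i
    P[0][0] = 1.0
    for n in range(1, m + 1):
        pn, same_class = p[n - 1], labels[n - 1] == c
        P[0][n] = P[0][n - 1] * (1 - pn)
        for i in range(1, n + 1):
            P[i][n] = pn * P[i - 1][n - 1] + (1 - pn) * P[i][n - 1]
            # E_{i-1,n-1}(sup_x^c + 1) = E_{i-1,n-1}(sup_x^c) + P_{i-1,n-1} when c_n = c
            carried = E[i - 1][n - 1] + (P[i - 1][n - 1] if same_class else 0.0)
            E[i][n] = pn * carried + (1 - pn) * E[i][n - 1]
    return E, P


p = [0.1, 0.8, 0.8, 0.1]  # P(x ⊆ t_n) for x = {Quality=/} on the uncertain toy dataset
labels = ["Unacceptable", "Acceptable", "Good", "Very Good"]
E, P = class_support_tables(p, labels, "Good")
m = len(p)
print(round(sum(E[i][m] for i in range(m + 1)), 6))         # 0.8, i.e. E(sup_x^c)
print(round(sum(E[i][m] / i for i in range(1, m + 1)), 6))  # 0.443733, i.e. E(conf_x^c)
```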


Efficient Computation of Expected Confidence

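Denoting $P(x \subseteq t_k)$ as $p_k$ for each transaction $t_k \in T$, we have

$$P_{i,n} = p_n \times P_{i-1,n-1} + (1 - p_n) \times P_{i,n-1}$$

where $1 \le i \le n \le |T|$. For i = 0,

$$P_{0,n} = \begin{cases} 1 & \text{for } n = 0 \\ P_{0,n-1} \times (1 - p_n) & \text{for } 1 \le n \le |T| \end{cases}$$

and $P_{i,n} = 0$ where n < i.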
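A sketch of the $P_{i,n}$ recurrence that keeps one column n at a time. Under the assumption that the events x ⊆ t_n are independent, the resulting support distribution is the Poisson binomial distribution with parameters p_1 to p_{|T|}. `support_distribution` is a hypothetical name, and p[n - 1] stands for p_n.

```python
from typing import List, Sequence


def support_distribution(p: Sequence[float]) -> List[float]:
    """Return [P_{0,|T|}, ..., P_{|T|,|T|}] via the recurrence for P_{i,n},
    keeping only one column n at a time (p[n - 1] plays the role of p_n)."""
    m = len(p)
    col = [1.0] + [0.0] * m          # n = 0: P_{0,0} = 1, P_{i,0} = 0 for i > 0
    for n in range(1, m + 1):
        pn = p[n - 1]
        new = [0.0] * (m + 1)        # entries with i > n stay 0
        new[0] = col[0] * (1 - pn)   # P_{0,n} = P_{0,n-1} * (1 - p_n)
        for i in range(1, n + 1):
            new[i] = pn * col[i - 1] + (1 - pn) * col[i]
        col = new
    return col


print(support_distribution([0.5, 0.5]))  # [0.25, 0.5, 0.25]
dist = support_distribution([0.1, 0.8, 0.8, 0.1])
print(round(sum(i * q for i, q in enumerate(dist)), 6))  # 1.8, the expected support
```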


Efficient Computation of Expected Confidence

Since $E(\mathrm{conf}_x^c) = E_{|T|}(\mathrm{conf}_x^c) = \sum_{i=0}^{|T|} E_{i,|T|}(\mathrm{conf}_x^c)$, the computation is divided into |T| + 1 steps, with $E_{i,|T|}(\mathrm{conf}_x^c) = E_{i,|T|}(\mathrm{sup}_x^c)/i$ $(0 \le i \le |T|)$ computed in the i-th step (the i = 0 term is 0, since $E_{0,|T|}(\mathrm{sup}_x^c) = 0$).

[Figure: computation grid over #Transaction (n = 0, 1, …, |T|) and Support (i = 0, 1, …, |T|), marking the cells computed in one step and the start of the next step.]

Time Complexity: $O(|T|^2)$

Space Complexity: $O(|T|)$
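A sketch of the step-wise scheme: step i fills support row i of both $P_{i,n}$ and $E_{i,n}(\mathrm{sup}_x^c)$ for all n from row i − 1, so only two rows of each are kept, which gives O(|T|^2) time and O(|T|) space. It assumes the events x ⊆ t_n are independent across transactions, with p_n stored as p[n - 1]; it illustrates the recurrences and is not the authors' implementation.

```python
from typing import Hashable, Sequence


def expected_confidence(p: Sequence[float], labels: Sequence[Hashable], c: Hashable) -> float:
    """E(conf_x^c) = sum over i of E_{i,|T|}(sup_x^c) / i, one support row per step.

    p[n - 1] plays the role of p_n = P(x ⊆ t_n); containment events are
    assumed independent across transactions. O(|T|^2) time, O(|T|) space.
    """
    m = len(p)
    # Step 0 (row i = 0): P_{0,n} = prod over k <= n of (1 - p_k), and E_{0,n} = 0.
    P_prev = [1.0] * (m + 1)
    for n in range(1, m + 1):
        P_prev[n] = P_prev[n - 1] * (1 - p[n - 1])
    E_prev = [0.0] * (m + 1)
    total = 0.0                                   # the i = 0 term contributes 0
    for i in range(1, m + 1):                     # step i fills row i from row i - 1
        P_cur = [0.0] * (m + 1)                   # P_{i,n} = E_{i,n} = 0 for n < i
        E_cur = [0.0] * (m + 1)
        for n in range(i, m + 1):
            pn = p[n - 1]
            P_cur[n] = pn * P_prev[n - 1] + (1 - pn) * P_cur[n - 1]
            carried = E_prev[n - 1] + (P_prev[n - 1] if labels[n - 1] == c else 0.0)
            E_cur[n] = pn * carried + (1 - pn) * E_cur[n - 1]
        total += E_cur[m] / i                     # E_{i,|T|}(conf_x^c)
        P_prev, E_prev = P_cur, E_cur
    return total


labels = ["Unacceptable", "Acceptable", "Good", "Very Good"]
# p = P(x ⊆ t_n) for x = {Quality=/} on the uncertain toy dataset.
print(round(expected_confidence([0.1, 0.8, 0.8, 0.1], labels, "Good"), 6))  # 0.443733
```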


Upper Bounds of Expected Confidence

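For every $i$ ($1 \le i \le |T|$), we have

$$
\begin{aligned}
E(conf_x^c) &= E_{|T|}(conf_x^c) \\
&= \sum_{k=0}^{i-1} \frac{E_{k,|T|}(sup_x^c)}{k} + \sum_{k=i}^{|T|} \frac{E_{k,|T|}(sup_x^c)}{k} \\
&\le \sum_{k=0}^{i-1} \frac{E_{k,|T|}(sup_x^c)}{k} + \sum_{k=i}^{|T|} \frac{E_{k,|T|}(sup_x^c)}{i} \\
&= \sum_{k=0}^{i-1} \frac{E_{k,|T|}(sup_x^c)}{k} + \sum_{k=0}^{|T|} \frac{E_{k,|T|}(sup_x^c)}{i} - \sum_{k=0}^{i-1} \frac{E_{k,|T|}(sup_x^c)}{i} \\
&= \sum_{k=0}^{i-1} E_{k,|T|}(sup_x^c) \times \left(\frac{1}{k} - \frac{1}{i}\right) + \frac{E(sup_x^c)}{i} \\
&= bound_i(conf_x^c)
\end{aligned}
$$

The inequality holds because $k \ge i$ in the second sum; the last steps use $\sum_{k=0}^{|T|} E_{k,|T|}(sup_x^c) = E(sup_x^c)$, and the $k = 0$ terms are zero because $E_{0,|T|}(sup_x^c) = 0$.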

C. Gao, J. Wang (Tsinghua Univ.) Direct Mining of Discriminative Patterns for Classifying Uncertain Data 13 / 26


Upper Bounds of Expected Confidence

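For $1 \le i \le |T|$, we have:

$$E(sup_x^c) = bound_1(conf_x^c) \ge \cdots \ge bound_i(conf_x^c) \ge \cdots \ge bound_{|T|}(conf_x^c) = E(conf_x^c)$$

Since

$$bound_i(conf_x^c) = bound_{i-1}(conf_x^c) - \left(\frac{1}{i-1} - \frac{1}{i}\right) \times \left(E(sup_x^c) - \sum_{k=0}^{i-1} E_{k,|T|}(sup_x^c)\right)$$

for $2 \le i \le |T|$, $bound_i(conf_x^c)$ can be computed from $bound_{i-1}(conf_x^c)$ by keeping a running sum of $E_{k,|T|}(sup_x^c)$. Both factors of the subtracted term are non-negative, so the bounds never increase as $i$ grows.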

C. Gao, J. Wang (Tsinghua Univ.) Direct Mining of Discriminative Patterns for Classifying Uncertain Data 14 / 26
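The recurrence and the two endpoints above follow directly from the definition of $bound_i(conf_x^c)$ on the previous slide. A short derivation (our own working, not taken from the slides), writing $E_k$ for $E_{k,|T|}(sup_x^c)$ and using $E_0 = 0$ so that all $k = 0$ terms vanish:

$$
\begin{aligned}
bound_i - bound_{i-1}
&= \sum_{k=0}^{i-1} E_k\left(\frac{1}{k}-\frac{1}{i}\right) - \sum_{k=0}^{i-2} E_k\left(\frac{1}{k}-\frac{1}{i-1}\right) + E(sup_x^c)\left(\frac{1}{i}-\frac{1}{i-1}\right) \\
&= \left(\frac{1}{i-1}-\frac{1}{i}\right)\sum_{k=0}^{i-1} E_k - \left(\frac{1}{i-1}-\frac{1}{i}\right)E(sup_x^c) \\
&= -\left(\frac{1}{i-1}-\frac{1}{i}\right)\left(E(sup_x^c) - \sum_{k=0}^{i-1} E_k\right), \\[4pt]
bound_1 &= E_0\left(\frac{1}{0}-1\right) + E(sup_x^c) = E(sup_x^c), \\
bound_{|T|} &= \sum_{k=0}^{|T|-1} \frac{E_k}{k} + \frac{E(sup_x^c) - \sum_{k=0}^{|T|-1} E_k}{|T|} = \sum_{k=0}^{|T|-1} \frac{E_k}{k} + \frac{E_{|T|}}{|T|} = E(conf_x^c).
\end{aligned}
$$

In the second line, the $k = i-1$ term of the first sum contributes $E_{i-1}\left(\frac{1}{i-1}-\frac{1}{i}\right)$, and for each $k \le i-2$ the $1/k$ parts cancel, leaving the same factor.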


Upper Bounds of Expected Confidence

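[Figure: computation grid over #Transaction (n) and Support (i), each ranging from 0 to |T|, with the cells $E_{i,|T|}(conf_x^c)$, $E_{i-1,|T|}(conf_x^c)$ and $E_{i-1,|T|-1}(conf_x^c)$ marked; legend: Skipped, Computation in One Step, Start of Next Step, Explanation.]

Stop Condition:

$$bound_i(conf_x^c) \le conf^{c}_{max,cur\_db}$$

Because $E(conf_x^c) \le bound_i(conf_x^c)$, once this holds the pattern's expected confidence cannot exceed $conf^{c}_{max,cur\_db}$, so the remaining steps are skipped.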

C. Gao, J. Wang (Tsinghua Univ.) Direct Mining of Discriminative Patterns for Classifying Uncertain Data 15 / 26


Upper Bounds of Expected Confidence

Running Example:

[Figure: $bound_i(conf_x^c)$ (log scale, 0.1 to 10) plotted against Support $i$ (10 to 100), together with $conf^{c}_{max,cur\_db}$ and $conf_x^c$; the bounds after the stopping point are marked as skipped. Stop when $bound_i(conf_x^c) \le conf^{c}_{max,cur\_db}$.]

C. Gao, J. Wang (Tsinghua Univ.) Direct Mining of Discriminative Patterns for Classifying Uncertain Data 16 / 26
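A sketch of how the upper bound and the stop condition could be folded into a support-by-support computation of the expected confidence. Assumptions not stated on the slides: each transaction contains the pattern independently with probability `probs[t]`; the column recurrence for $P_{i,n}$ and $E_{i,n}(sup_x^c)$ is our own reconstruction; `conf_max` stands in for $conf^{c}_{max,cur\_db}$, whose maintenance is not shown; the function name is hypothetical. $bound_i$ is checked before column $i$ is built, so a stop skips all remaining columns.

```python
from typing import List, Sequence, Tuple


def expected_confidence_with_bound(
    probs: Sequence[float], in_class: Sequence[bool], conf_max: float
) -> Tuple[bool, float, List[float]]:
    """Support-by-support E(conf_x^c) with the bound_i(conf_x^c) <= conf_max stop condition.

    Returns (stopped, value, bounds). If stopped, value is the bound_i that fell to conf_max
    (still an upper bound on E(conf_x^c)); otherwise value is E(conf_x^c) itself.
    bounds lists bound_1, bound_2, ... as far as they were computed.
    """
    n_tx = len(probs)
    exp_sup_c = sum(p for p, c in zip(probs, in_class) if c)  # E(sup_x^c)
    prev_p = [1.0] * (n_tx + 1)  # column 0: P_{0,n}
    for n in range(1, n_tx + 1):
        prev_p[n] = prev_p[n - 1] * (1.0 - probs[n - 1])
    prev_e = [0.0] * (n_tx + 1)  # column 0: E_{0,n}(sup_x^c) = 0

    bound = exp_sup_c  # bound_1 = E(sup_x^c)
    prefix = 0.0       # running sum_{k=0}^{i-1} E_{k,|T|}(sup_x^c)
    exact = 0.0        # running sum_{k=1}^{i-1} E_{k,|T|}(sup_x^c) / k
    bounds: List[float] = []
    for i in range(1, n_tx + 1):
        if i >= 2:  # incremental update from bound_{i-1}
            bound -= (1.0 / (i - 1) - 1.0 / i) * (exp_sup_c - prefix)
        bounds.append(bound)
        if bound <= conf_max:
            return True, bound, bounds  # steps i..|T| are skipped
        cur_p = [0.0] * (n_tx + 1)
        cur_e = [0.0] * (n_tx + 1)
        for n in range(1, n_tx + 1):
            p = probs[n - 1]
            c = 1.0 if in_class[n - 1] else 0.0
            cur_p[n] = (1.0 - p) * cur_p[n - 1] + p * prev_p[n - 1]
            cur_e[n] = (1.0 - p) * cur_e[n - 1] + p * (prev_e[n - 1] + c * prev_p[n - 1])
        prefix += cur_e[n_tx]
        exact += cur_e[n_tx] / i
        prev_p, prev_e = cur_p, cur_e
    return False, exact, bounds


probs = [0.9, 0.5, 0.8, 0.3, 0.6]
in_class = [True, False, True, True, False]
print(expected_confidence_with_bound(probs, in_class, conf_max=-1.0))  # never stops
print(expected_confidence_with_bound(probs, in_class, conf_max=1.0))   # stops early
```

When the threshold is high relative to the pattern's confidence, the loop exits after only a few columns, which is the saving the running example illustrates.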


Algorithm Framework

1. Calculate the (expected) confidence of the current prefix pattern.

2. If the confidence value is larger than the previous maximal value, update the covered instances.

3. If at least one instance is covered, select the prefix.

4. Continue growing the current prefix, and go to Step 1.

All uncertain attributes need to be sorted after the certain attributes, which helps shrink the current projected database.

C. Gao, J. Wang (Tsinghua Univ.) Direct Mining of Discriminative Patterns for Classifying Uncertain Data 17 / 26
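To make the four steps concrete, here is a deliberately simplified sketch on certain data. It grows prefixes depth-first over item sets and keeps, per instance, the highest confidence seen so far. It is not uHARMONY itself: Step 1 uses plain confidence instead of expected confidence, and the bound-based pruning, the minimum cover probability and the certain-before-uncertain attribute ordering are omitted (items are simply taken in sorted order). All names are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List, Sequence, Tuple

Selected = List[Tuple[Tuple[str, ...], str]]


def mine_covering_patterns(
    transactions: Sequence[FrozenSet[str]], labels: Sequence[str], min_sup: int
) -> Tuple[Selected, List[float]]:
    best_conf = [0.0] * len(transactions)  # previous maximal confidence per instance
    selected: Selected = []                # (pattern, class) pairs kept as rules
    items = sorted({item for t in transactions for item in t})

    def grow(prefix: Tuple[str, ...], tids: List[int], candidates: List[str]) -> None:
        for pos, item in enumerate(candidates):
            proj = [t for t in tids if item in transactions[t]]  # projected database
            if len(proj) < min_sup:
                continue
            pattern = prefix + (item,)
            # Step 1: confidence of the pattern on each class.
            class_counts = Counter(labels[t] for t in proj)
            improved = set()
            for t in proj:
                conf = class_counts[labels[t]] / len(proj)
                # Step 2: update an instance whose previous maximum is beaten.
                if conf > best_conf[t]:
                    best_conf[t] = conf
                    improved.add(labels[t])
            # Step 3: select the prefix (per class) if at least one instance was covered.
            for cls in sorted(improved):
                selected.append((pattern, cls))
            # Step 4: continue growing the current prefix.
            grow(pattern, proj, candidates[pos + 1:])

    grow((), list(range(len(transactions))), items)
    return selected, best_conf


data = [frozenset("ab"), frozenset("a"), frozenset("bc"), frozenset("ac")]
rules, best = mine_covering_patterns(data, ["x", "x", "y", "y"], min_sup=1)
print(rules)
print(best)
```

In this toy run the pattern `("c",)` is never selected: by the time it is reached, both instances it covers already hold a rule with confidence 1.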


Instance Covering Strategy

Previous Strategy in HARMONY:

For each instance, find just one most discriminative covering pattern, the one with the highest confidence. On uncertain data, the probability that the instance is actually covered by that pattern could be very low.

Our method: Apply a threshold of minimum cover probability, coverProb_min. Ensure that the probability of each instance not being covered by any pattern is less than 1 − coverProb_min, by maintaining a list that stores the confidence values of the covering patterns on class c in descending order.

C. Gao, J. Wang (Tsinghua Univ.) Direct Mining of Discriminative Patterns for Classifying Uncertain Data 18 / 26
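One possible reading of this strategy, as a sketch. The slide only says that a descending list of covering-pattern confidences is kept; everything else here is an assumption. Each entry also stores the probability that the instance contains the pattern. Containment events of different patterns are treated as independent, so the probability of not being covered is the product of $1 - p$ over the entries. The low-confidence tail is dropped once the entries above it already reach coverProb_min. The class and method names are hypothetical.

```python
from typing import List, Tuple


class CoverList:
    """Hypothetical per-instance cover list for one class c.

    Entries are (confidence, containment probability), in descending confidence order.
    ASSUMPTION: containment events of different patterns are independent, so
    P(not covered) = prod(1 - p_j) over the kept entries.
    """

    def __init__(self, cover_prob_min: float) -> None:
        self.cover_prob_min = cover_prob_min
        self.entries: List[Tuple[float, float]] = []

    def uncovered_prob(self) -> float:
        q = 1.0
        for _, p in self.entries:
            q *= 1.0 - p
        return q

    def is_covered(self) -> bool:
        return self.uncovered_prob() < 1.0 - self.cover_prob_min

    def add(self, conf: float, prob: float) -> None:
        # Insert after all entries with confidence >= conf, keeping descending order.
        pos = 0
        while pos < len(self.entries) and self.entries[pos][0] >= conf:
            pos += 1
        self.entries.insert(pos, (conf, prob))
        # Drop the low-confidence tail once a prefix already meets the target.
        q = 1.0
        for idx, (_, p) in enumerate(self.entries):
            q *= 1.0 - p
            if q < 1.0 - self.cover_prob_min:
                del self.entries[idx + 1:]
                break

    def threshold(self) -> float:
        """Confidence a new pattern must exceed to be useful for this instance."""
        return self.entries[-1][0] if self.is_covered() else 0.0


cover = CoverList(cover_prob_min=0.9)
for conf, prob in [(0.8, 0.5), (0.9, 0.7), (0.6, 0.6), (0.95, 0.95)]:
    cover.add(conf, prob)
    print(cover.entries, cover.threshold())
```

With coverProb_min = 0, any single covering pattern with nonzero containment probability suffices, so the list keeps only its highest-confidence entry. This mirrors HARMONY's single-pattern strategy.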


Used Classifiers

SVM Classifier: Convert each pattern into a binary feature indicating whether the instance contains it.

Rule-based Classifier (From HARMONY)

For each test instance and each class, sum the products of each pattern's confidence on that class and the probability that the instance contains the pattern. The class with the largest sum is the predicted class of the instance.

C. Gao, J. Wang (Tsinghua Univ.) Direct Mining of Discriminative Patterns for Classifying Uncertain Data 19 / 26
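The rule-based scoring can be sketched as below. Two assumptions are not on the slide. First, an uncertain instance is a mapping from each attribute to a probability distribution over its values. Second, attributes are independent, so the probability of containing a pattern is the product of the matching value probabilities. The rules, confidences and names are made-up illustrations, loosely styled after the toy example's attributes.

```python
from math import prod
from typing import Dict, Mapping, Sequence, Tuple

# An uncertain instance maps each attribute to a distribution over its values.
UncertainInstance = Mapping[str, Mapping[str, float]]
# A pattern is a set of attribute = value conditions.
Pattern = Mapping[str, str]


def containment_prob(instance: UncertainInstance, pattern: Pattern) -> float:
    """P(instance contains pattern), ASSUMING attributes are independent."""
    return prod(instance.get(attr, {}).get(val, 0.0) for attr, val in pattern.items())


def predict(
    instance: UncertainInstance,
    rules: Sequence[Tuple[Pattern, Mapping[str, float]]],
    classes: Sequence[str],
) -> Tuple[str, Dict[str, float]]:
    """Score each class by the sum of conf(pattern, class) * P(instance contains pattern)."""
    scores = {c: 0.0 for c in classes}
    for pattern, conf_by_class in rules:
        p = containment_prob(instance, pattern)
        for c in classes:
            scores[c] += conf_by_class.get(c, 0.0) * p
    return max(scores, key=scores.get), scores


instance = {"Price": {"+": 0.8, "/": 0.2}, "Quality": {"-": 1.0}}
rules = [
    ({"Price": "+"}, {"Good": 0.9, "Unacceptable": 0.1}),
    ({"Quality": "-"}, {"Good": 0.2, "Unacceptable": 0.8}),
]
label, scores = predict(instance, rules, ["Good", "Unacceptable"])
print(label, scores)
```

Here "Good" wins with a score of about 0.92 against 0.88, because the first rule's high confidence is weighted by the 0.8 probability that Price is "+".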


Used Datasets

Dataset         #Instance  #Attribute  #Class  Area
australian      690        14          2       Financial
balance         635        4           3       Social
bands           539        38          2       Physical
breast          699        9           2       Life
bridges-v1      106        11          6       N/A
bridges-v2      106        10          6       N/A
car             1728       6           4       N/A
contraceptive   1473       9           3       Life
credit          690        15          2       Financial
echocardiogram  131        12          2       Life
flag            194        28          8       N/A
german          1000       19          2       Financial
heart           920        13          5       Life
hepatitis       155        19          2       Life
horse           368        27          2       Life
monks-1         556        6           2       N/A
monks-2         601        6           2       N/A
monks-3         554        6           2       N/A
mushroom        8124       22          2       Life
pima            768        8           2       Life
postoperative   90         8           3       Life
promoters       106        57          2       Life
spect           267        22          2       Life
survival        306        3           2       Life
ta eval         151        5           3       N/A
tic-tac-toe     958        9           2       Game
vehicle         846        18          4       N/A
voting          435        16          2       Social
wine            178        13          3       Physical
zoo             101        16          7       Life

30 public UCI certain datasets.

Real values have been discretized.

C. Gao, J. Wang (Tsinghua Univ.) Direct Mining of Discriminative Patterns for Classifying Uncertain Data 20 / 26


Convert to Uncertain Datasets

Two parameters:

Uncertain Attribute Number: the number of attributes converted into uncertain attributes. The attributes with the highest information gain values are selected.

Uncertain Degree (0 to 1): the probability that an attribute takes a value other than its original value.

A setting is denoted Ux@y, where x is the uncertain degree (in percent) and y is the uncertain attribute number; for example, U10@1 makes the single highest-information-gain attribute uncertain with degree 0.1.

C. Gao, J. Wang (Tsinghua Univ.) Direct Mining of Discriminative Patterns for Classifying Uncertain Data 21 / 26
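The conversion can be sketched as follows. Two details are assumptions, since the slide does not give them: information gain is the standard entropy-based measure computed on the certain data, and the uncertain degree (the probability mass taken away from the original value) is spread uniformly over the attribute's other observed values. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return -sum(k / n * math.log2(k / n) for k in Counter(labels).values())


def information_gain(column: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    n = len(labels)
    conditional = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def make_uncertain(
    rows: Sequence[Sequence[Hashable]], labels: Sequence[Hashable], degree: float, num_attrs: int
) -> Tuple[List[List[Dict[Hashable, float]]], List[int]]:
    """Build the Ux@y version: degree = x / 100 (e.g. 0.1 for U10), num_attrs = y."""
    n_attrs = len(rows[0])
    gains = [information_gain([r[a] for r in rows], labels) for a in range(n_attrs)]
    chosen = sorted(range(n_attrs), key=lambda a: gains[a], reverse=True)[:num_attrs]
    domains = {a: sorted({r[a] for r in rows}) for a in chosen}
    uncertain_rows = []
    for r in rows:
        new_row = []
        for a, value in enumerate(r):
            others = [w for w in domains[a] if w != value] if a in domains else []
            if others:
                # Keep 1 - degree on the original value; ASSUMPTION: spread the rest uniformly.
                dist = {value: 1.0 - degree}
                dist.update({w: degree / len(others) for w in others})
            else:
                dist = {value: 1.0}  # certain attribute (or single-valued domain)
            new_row.append(dist)
        uncertain_rows.append(new_row)
    return uncertain_rows, chosen


rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = [1, 1, 0, 0]
converted, chosen = make_uncertain(rows, labels, degree=0.1, num_attrs=1)  # U10@1
print(chosen, converted[0])
```

In this toy data only the first attribute separates the classes, so it is the one made uncertain.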


Accuracy Evaluation

Average accuracies (in %) over the 30 datasets. For the accuracy on each dataset, refer to our paper.

Using SVM Classifier:

Setting  uHARMONY  DTU      uRule
U10@1    79.0138   74.8738  75.2111
U10@2    78.6970   73.1629  73.4107
U10@4    77.9657   72.2670  69.4649
U20@1    78.9537   74.6577  74.6287
U20@2    78.6073   72.5642  72.5460
U20@4    77.8352   69.9157  68.2066

Using Rule-based Classifier:

Setting  uHARMONYrule  DTU      uRule
U10@4    73.2517       72.2670  69.4649

C. Gao, J. Wang (Tsinghua Univ.) Direct Mining of Discriminative Patterns for Classifying Uncertain Data 22 / 26


Sensitivity Test

[Figure: Accuracy Evaluation of U10@1 w.r.t. Minimum Support. Panels: (a) breast, (b) wine. Accuracy (in %) versus Minimum Support (in %), at 1.25, 2.5, 5, 10, 20 for breast and 0.25, 0.5, 1, 2, 4 for wine.]

C. Gao, J. Wang (Tsinghua Univ.) Direct Mining of Discriminative Patterns for Classifying Uncertain Data 23 / 26


Sensitivity Test

[Figure: Accuracy Evaluation of U10@1 w.r.t. Minimum Cover Prob. Panels: (a) breast, (b) wine. Accuracy (in %) versus Minimum Cover Prob. (in %), at 0, 10, 50, 90, 100.]

C. Gao, J. Wang (Tsinghua Univ.) Direct Mining of Discriminative Patterns for Classifying Uncertain Data 24 / 26


Runtime Efficiency

[Figure: Classification Efficiency Evaluation of U10@1. (a) Running Time (in sec, log scale) of uHARMONY, DTU and uRule on the datasets breast, car, contraceptive, heart, pima and wine.]

C. Gao, J. Wang (Tsinghua Univ.) Direct Mining of Discriminative Patterns for Classifying Uncertain Data 25 / 26


Runtime Efficiency

[Figure: Classification Efficiency Evaluation of U10@1. (b) Memory Use (in MB, log scale) of uHARMONY, DTU and uRule on the datasets breast, car, contraceptive, heart, pima and wine.]

C. Gao, J. Wang (Tsinghua Univ.) Direct Mining of Discriminative Patterns for Classifying Uncertain Data 26 / 26


Effectiveness of the Expected Confidence Upper Bound

[Figure: Running Time Evaluation of U10@4, with and without the expected confidence bound. Panels: (a) car, (b) heart. Running Time (in sec) versus Minimum Support (in %, 0.2 to 1.2).]

C. Gao, J. Wang (Tsinghua Univ.) Direct Mining of Discriminative Patterns for Classifying Uncertain Data 27 / 26


Scalability Test

[Figure: Scalability Evaluation (U10@1, Running Time), with and without the expected confidence bound. Panels: (a) car ($sup_{min} = 0.01$), (b) heart ($sup_{min} = 0.01$). Running Time (in sec) versus Dataset Duplication Ratio (1, 2, 4, 8, 16).]

C. Gao, J. Wang (Tsinghua Univ.) Direct Mining of Discriminative Patterns for Classifying Uncertain Data 28 / 26


Conclusions

- Proposed the first associative classification algorithm on uncertain data.

- Proposed an efficient method for computing the expected confidence value, together with upper bounds that allow the computation to stop early.

- Proposed a new instance covering strategy and showed it to be effective.

- Conducted an extensive evaluation on 30 public real datasets under varying uncertainty parameters, with significant accuracy improvements over two other state-of-the-art algorithms (e.g., 79.01% average accuracy for uHARMONY under U10@1, versus 74.87% for DTU and 75.21% for uRule).

- Evaluated the runtime efficiency and demonstrated the effectiveness of using upper bounds.

C. Gao, J. Wang (Tsinghua Univ.) Direct Mining of Discriminative Patterns for Classifying Uncertain Data 29 / 26


The End

Thank you for listening!

Questions or Comments?

C. Gao, J. Wang (Tsinghua Univ.) Direct Mining of Discriminative Patterns for Classifying Uncertain Data 30 / 26