A Simple Probabilistic Approach to Learning from Positive and Unlabeled Examples
Dell Zhang (BBK) and Wee Sun Lee (NUS)
Problem
Supervised Learning
Problem
Semi-Supervised Learning
Problem
PU Learning
Problem
Unlabeled Examples Help
Problem
PU Learning
To distinguish the interesting instances (the positive class C+) from other instances (the negative class C-) by learning a classifier from a set of positive examples P and a set of unlabeled examples U.
There are no labeled negative examples!
Applications
- To automatically filter web pages according to a user's preference
  - the browsed or bookmarked pages can be used as positive examples, while unlabeled examples can be easily collected from the web
- To automatically find machine learning literature
  - the ICML papers can be used as positive examples, while unlabeled examples can be easily collected from the ACM or IEEE digital library
- To automatically identify cancer patients
  - the patients known to have cancers can be used as positive examples, while unlabeled examples can be easily collected from the patient database
- To automatically discover future customers for direct marketing
  - the current customers of the company can be used as positive examples, while unlabeled examples can be purchased at a low cost compared with obtaining negative examples
- ……
Approaches
Existing Approaches
- PNB (Denis et al. 2002); PNCT (Denis et al. 2003)
- S-EM (Liu et al. 2002); RC-SVM (Li & Liu 2003)
- PEBL (Yu et al. 2004); SVMC (Yu 2005)
- PN-SVM (Fung et al. 2005)
- W-LR (Lee & Liu 2003); B-SVM (Liu et al. 2003)
Our Proposed Approach
- B-Pr
Our Approach
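A Probabilistic Model
An example $\mathbf{x} \in C_+$ is placed in P with probability $1-p$ and in U with probability $p$; an example $\mathbf{x} \in C_-$ is placed in U with probability 1.
$$\Pr[P \mid \mathbf{x}] = \Pr[C_+ \mid \mathbf{x}]\,(1-p)$$
$$\Pr[U \mid \mathbf{x}] = \Pr[C_+ \mid \mathbf{x}]\,p + \Pr[C_- \mid \mathbf{x}]$$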
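The labeling process behind this model can be checked with a small simulation. This is only a sketch under stated assumptions: it reads the slide as saying that a positive example is left unlabeled (placed in U) with probability p and that negatives always land in U; the function name `simulate_labeling` and the class prior `prior_pos` are hypothetical and not from the slides. It ignores x and checks only the marginal frequencies.

```python
import random

def simulate_labeling(n, prior_pos, p, seed=0):
    """Draw n examples and return (true_class, observed_set) pairs.

    true_class is +1 (C+) or -1 (C-); observed_set is "P" or "U".
    Assumed model: a C+ example goes to U with probability p and to P
    with probability 1 - p; a C- example always goes to U.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        cls = 1 if rng.random() < prior_pos else -1
        if cls == 1 and rng.random() >= p:
            out.append((cls, "P"))
        else:
            out.append((cls, "U"))
    return out

sample = simulate_labeling(100_000, prior_pos=0.4, p=0.3)
frac_P = sum(s == "P" for _, s in sample) / len(sample)
frac_U = 1.0 - frac_P

# Values implied by the model after marginalizing over x:
#   Pr[P] = Pr[C+] * (1 - p)       = 0.4 * 0.7       = 0.28
#   Pr[U] = Pr[C+] * p + Pr[C-]    = 0.4 * 0.3 + 0.6 = 0.72
print(f"empirical Pr[P] = {frac_P:.3f}, empirical Pr[U] = {frac_U:.3f}")
```

With a large sample the empirical fractions should sit close to 0.28 and 0.72, and every example in P is a true positive, which is the property the approach relies on.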
Our Approach
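$$\Pr[C_+ \mid \mathbf{x}] - \Pr[C_- \mid \mathbf{x}] = \frac{1+p}{1-p}\,\Pr[P \mid \mathbf{x}] - \Pr[U \mid \mathbf{x}]$$
$$f(\mathbf{x}) = \operatorname{sgn}\left(\Pr[C_+ \mid \mathbf{x}] - \Pr[C_- \mid \mathbf{x}]\right)$$
$$f(\mathbf{x}) = \operatorname{sgn}\left(b\,\Pr[P \mid \mathbf{x}] - \Pr[U \mid \mathbf{x}]\right)$$
$$b = \frac{1+p}{1-p}$$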
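The step from the probabilistic model to this decision rule can be written out explicitly. The derivation below is a sketch that assumes the model equations are $\Pr[P \mid \mathbf{x}] = \Pr[C_+ \mid \mathbf{x}](1-p)$ and $\Pr[U \mid \mathbf{x}] = \Pr[C_+ \mid \mathbf{x}]\,p + \Pr[C_- \mid \mathbf{x}]$, with $0 \le p < 1$.

```latex
\begin{align*}
\Pr[C_+ \mid \mathbf{x}] &= \frac{\Pr[P \mid \mathbf{x}]}{1-p} \\
\Pr[C_- \mid \mathbf{x}] &= \Pr[U \mid \mathbf{x}] - p\,\Pr[C_+ \mid \mathbf{x}]
  = \Pr[U \mid \mathbf{x}] - \frac{p}{1-p}\,\Pr[P \mid \mathbf{x}] \\
\Pr[C_+ \mid \mathbf{x}] - \Pr[C_- \mid \mathbf{x}]
  &= \frac{1}{1-p}\,\Pr[P \mid \mathbf{x}] + \frac{p}{1-p}\,\Pr[P \mid \mathbf{x}] - \Pr[U \mid \mathbf{x}]
  = \frac{1+p}{1-p}\,\Pr[P \mid \mathbf{x}] - \Pr[U \mid \mathbf{x}]
\end{align*}
```

As a quick numeric check, $p = 0.5$ gives $b = 1.5 / 0.5 = 3$, so an example is classified as positive when $3\,\Pr[P \mid \mathbf{x}] > \Pr[U \mid \mathbf{x}]$.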
Our Approach
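Biased PrTFIDF (B-Pr)
Estimate $\Pr[P \mid \mathbf{x}]$ and $\Pr[U \mid \mathbf{x}]$
PrTFIDF (Joachims 1997)
Estimate $b$
Maximize $\frac{pr}{\Pr[C_+]} = \frac{r^2}{\Pr[f(\mathbf{x}) = 1]}$ on a held-out validation set (Lee & Liu 2003), where in this expression $p$ and $r$ denote precision and recall (this $p$ is not the labeling probability $p$ of the model).
Linear Time Complexity!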
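The selection of b can be sketched as a simple search over candidate values. The code below is a minimal illustration, not the authors' implementation: it assumes the estimates of Pr[P|x] and Pr[U|x] are already available as pairs of floats (PrTFIDF itself is not implemented here), that recall r is estimated on held-out positive examples, and that Pr[f(x)=1] is estimated on the whole validation set. The function names, the candidate grid, the toy scores, and the tie-breaking choice sgn(0) = -1 are all assumptions.

```python
def classify(pr_P, pr_U, b):
    """Decision rule f(x) = sgn(b * Pr[P|x] - Pr[U|x]); a tie is treated as -1."""
    return 1 if b * pr_P - pr_U > 0 else -1

def criterion(val_pos, val_all, b):
    """Estimate r^2 / Pr[f(x) = 1] for a given b.

    val_pos: (Pr[P|x], Pr[U|x]) pairs for held-out positive examples.
    val_all: the same pairs for every validation example (positives + unlabeled).
    """
    recall = sum(classify(pp, pu, b) == 1 for pp, pu in val_pos) / len(val_pos)
    frac_pred_pos = sum(classify(pp, pu, b) == 1 for pp, pu in val_all) / len(val_all)
    if frac_pred_pos == 0:
        return 0.0  # nothing predicted positive, so recall is 0 as well
    return recall ** 2 / frac_pred_pos

def select_b(val_pos, val_all, candidates):
    """Return the candidate b with the highest criterion value."""
    return max(candidates, key=lambda b: criterion(val_pos, val_all, b))

# Toy validation scores (made up for illustration only).
val_pos = [(0.6, 0.4), (0.5, 0.5), (0.3, 0.7)]
val_unl = [(0.3, 0.7), (0.1, 0.9), (0.05, 0.95), (0.2, 0.8)]
val_all = val_pos + val_unl

best_b = select_b(val_pos, val_all, candidates=[1.0, 2.0, 3.0, 5.0, 10.0])
print("selected b:", best_b)
```

Each candidate costs one pass over the validation set, so the search stays linear in the number of validation examples for a fixed grid. If b is read as (1+p)/(1-p), a selected value can be mapped back to p = (b-1)/(b+1).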
Experiments
Reuters-21578
B-Pr>RC-SVM>PEBL (p=0.55)
RC-SVM>B-Pr>PEBL (p=0.85)
Experiments
20NewsGroups
B-Pr>W-LR>S-EM (p=0.3)
B-Pr>W-LR>S-EM (p=0.7)
Conclusion
A New Approach to Learning from Positive and Unlabeled Examples
- As effective as the state-of-the-art approaches
- Yet simpler and faster
Thank you
Questions? Comments? Suggestions? ……