Presenter: Russell Greiner. 2 Helping the world understand data … and make informed decisions....

Learning to Predict

Presenter: Russell Greiner





2

Vision Statement

Helping the world understand data
… and make informed decisions.

Single decision: determine
- the class label of an instance
- the set of labels of a set of pixels, …
- the value of a property of an instance, …


3

Motivation for Training a Predictor

Temp.  Press.  Sore-Throat  …  Color
32     90      N            …  Pale    →  Predictor  →  treatX: Ok

Need to know the “label” of an instance to determine the appropriate action:
PredictorMed( patient#2 ) =? “treatX is Ok”

Unfortunately, Predictor(·) is not known a priori.

But there are many examples of ⟨patient, treatX⟩.


4

Motivation for Training a Predictor

Machine learning provides algorithms for mapping { ⟨patient, treatX⟩ } examples to a Predictor(·) function:

Temp.  Press.  Sore Throat  …  Colour  treatX
35     95      Y            …  Pale    No
22     110     N            …  Clear   Ok
:      :       :               :       :
10     87      N            …  Pale    No

→ Learner → Predictor

Temp.  Press.  Sore-Throat  …  Color
32     90      N            …  Pale    →  Predictor  →  treatX: Ok


5

Motivation for Training a Predictor

Need to learn the predictor (rather than program it in) when it is …
- not known
- not expressible
- changing
- user dependent

Temp.  Press.  Sore Throat  …  Colour  treatX
35     95      Y            …  Pale    No
22     110     N            …  Clear   Ok
:      :       :               :       :
10     87      N            …  Pale    No

→ Learner → Predictor

Temp.  Press.  Sore-Throat  …  Color
32     90      N            …  Pale    →  Predictor  →  treatX: No


6

Personnel

PI synergy: Greiner, Schuurmans, Holte, Sutton, Szepesvari, Goebel
- 5 postdocs
- 16 grad students (5 MSc, 11 PhD)
- 5 supporting technical staff

+ personnel for Bioinformatics thrust


7

Partners/Collaborators

- 4 UofA CS profs
- 1 UofAlberta Math/Stat
- Non-UofA collaborators: Google, Yahoo!, Electronic Arts, UofMontreal, UofWaterloo, UofNebraska, NICTA, NRC-IIT, …

+ Bioinformatics thrust collaborators


8

Additional Resources

Grants:
- $225K CFI
- $100K MITACS
- $100K Google

Hardware:
- 68-processor, 2TB Opteron cluster
- 54-processor, dual-core, 1.5TB Opteron cluster

+ funds/data for Bioinformatics thrust


9

Highlights

- IJCAI 2005 – Distinguished Paper Prize
- UM 2003 – Best Student Paper Prize
- WebIC technology is the foundation for a start-up company
- Significant advances in extending SVMs to use unsupervised/semi-supervised data, and to structured data

+ Highlights from Bioinformatics thrust


10

Learning to Predict: Challenges

Simplifying assumptions re: training data:
- IID / unstructured
- Lots of instances
- Low dimensions
- Complete features
- Completely labeled
- Balanced data
- A simple learner is sufficient

Temp.  Press.  Sore Throat  …  Colour  treatX
35     95      Y            …  Pale    No
22     110     N            …  Clear   Ok
:      :       :               :       :
10     87      N            …  Pale    No

→ Learner → Predictor

Temp.  Press.  Sore-Throat  …  Color
32     90      N            …  Pale    →  Predictor  →  treatX: No


11

Learning to Predict: Challenges

Simplifying assumptions re: training data:
- IID / unstructured ?
- Lots of instances
- Low dimensions
- Complete features
- Completely labeled
- Balanced data
- A simple learner is sufficient

Segmenting Brain Tumors

Extensions to Conditional Random Fields, …


12

Learning to Predict: Challenges

Simplifying assumptions re: training data:
- IID / unstructured
- Lots of instances ?
- Low dimensions ?
- Complete features
- Completely labeled
- Balanced data
- A simple learner is sufficient

m ≈ 1000s of instances (rows), N ≈ 10s of features (columns):

7.3   2.1   5.0  …  1.1  Y
22.1  6.03  3.1  …  3.0  Y
22.1  6.03  3.1  …  3.0  Y
22.1  6.03  3.1  …  3.0  Y
22.1  6.03  2.2  …  3.0  Y
22.1  6.03  12   …  3.0  Y
22.1  6.03  5    …  3.0  Y
:     :     :       :    :
32.0  1.9   5.8  …  2.8  N


13

Learning to Predict: Challenges

Simplifying assumptions re: training data:
- IID / unstructured
- Lots of instances ?
- Low dimensions ?
- Complete features
- Completely labeled
- Balanced data
- A simple learner is sufficient

g1    g2    g3    …  gN   disease
7.3   2.1   55.0  …  1.1  Y
22.1  6.03  29.1  …  3.0  Y
:     :     :        :    :
32.0  1.9   15.8  …  2.8  N

m ≈ 100 instances, N ≈ 20,000 features (Microarray, SNP Chips, …)

Dimensionality Reduction … L2 Model: Component Discovery; BiCluster Coding


14

Learning to Predict: Challenges

Simplifying assumptions re: training data:
- IID / unstructured
- Lots of instances
- Low dimensions
- Complete features ?
- Completely labeled
- Balanced data
- A simple learner is sufficient

g1    g2    g3    …  gN   diseaseX
7.3   2.1   55.0  …  1.1  Y
22.1  6.03  29.1  …  3.0  Y
:     :     :        :    :
32.0  1.9   15.8  …  2.8  N

Budgeted Learning

g1    g2    g3    …  gN   diseaseX
7.3   2.1   55.0  …  1.1  Y
22.1              …  3.0  Y
:     :     :        :    :
      1.9         …       N

g1    g2    g3    …  gN   diseaseX
                          Y
                          Y
                          :
                          N


15

Learning to Predict: Challenges

Simplifying assumptions re: training data:
- IID / unstructured
- Lots of instances
- Low dimensions
- Complete features
- Completely labeled ?
- Balanced data
- A simple learner is sufficient

g1    g2    g3    …  gN   treatX
7.3   2.1   55.0  …  1.1  Y
22.1  6.03  29.1  …  3.0  Y
20.7  6.03  29.1  …  3.0  N
22.1  8.73  20.1  …  5.0  N
123   6.03  17.1  …  7.0  Y
:     :     :        :    :
32.0  1.9   15.8  …  2.8  N

Semi-Supervised Learning; Active Learning

treatX
Y
Y


16

Learning to Predict: Challenges

Simplifying assumptions re: training data:
- IID / unstructured
- Lots of instances
- Low dimensions
- Complete features
- Completely labeled
- Balanced data ?
- A simple learner is sufficient

Cost Curves (analysis)
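Cost curves address the "balanced data" assumption by showing how a classifier's cost changes as the class distribution and misclassification costs change. The sketch below is a minimal illustration, assuming the standard definitions from Drummond and Holte's cost-curve work. The x-axis is the probability cost PC(+). A classifier with false-negative rate FNR and false-positive rate FPR then becomes the straight line from (0, FPR) to (1, FNR). The function names and operating points are hypothetical.

```python
def probability_cost(p_pos, cost_fn, cost_fp):
    """PC(+): prior of the positive class, weighted by the cost of missing a positive."""
    weighted_pos = p_pos * cost_fn
    weighted_neg = (1.0 - p_pos) * cost_fp
    return weighted_pos / (weighted_pos + weighted_neg)


def normalized_expected_cost(fnr, fpr, pc_pos):
    """Height of a classifier's cost line at x = PC(+)."""
    return fnr * pc_pos + fpr * (1.0 - pc_pos)


def lower_envelope(operating_points, pc_pos):
    """Lowest normalized cost reachable at PC(+) by any (fnr, fpr) operating point."""
    return min(normalized_expected_cost(fnr, fpr, pc_pos) for fnr, fpr in operating_points)


if __name__ == "__main__":
    # Hypothetical operating points (FNR, FPR), plus the two trivial classifiers
    # "always negative" (1, 0) and "always positive" (0, 1).
    points = [(0.10, 0.30), (0.40, 0.05), (1.0, 0.0), (0.0, 1.0)]
    # Skewed data: only 10% positives, but a missed positive costs 9x a false alarm.
    pc = probability_cost(p_pos=0.1, cost_fn=9.0, cost_fp=1.0)
    print(pc, lower_envelope(points, pc))
```

In the usage example, the class imbalance is exactly offset by the cost ratio, so PC(+) comes out at 0.5. This shows why the raw class proportion alone does not determine which classifier is preferable.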


17

Learning to Predict: Challenges

Simplifying assumptions re: training data:
- IID / unstructured
- Lots of instances
- Low dimensions
- Complete features
- Completely labeled
- Balanced data
- A simple learner is sufficient ?

- Robust SVM
- Mixture Using Variance
- Large Margin Bayes Net
- Coordinated Classifiers
- …


18

Projects and Status

- IID / unstructured → Structured Prediction: Random Fields, Parsing, Unsupervised M3N
- Lots of instances, Low dimensions → Dimensional Reduction (L2 Model: Component Discovery)
- Complete features → Budgeted Learning
- Completely labeled → Semi-Supervised Learning (large-margin (SVM), probabilistic (CRF), graph-based transduction); Active Learning
- Balanced data → Cost Curves (Poster #26)
- Beyond simple learners → Robust SVM, Coordinated Classifiers, Mixture Using Variance, Large Margin Bayes Net


Budgeted Learning: Technical Details


20

Typical Supervised Learning

                          Response
Person 1:   b  0  5  b    1
Person 2:   b  1  3  a    0
            a  1  1  a    0
            b  1  1  a    0
            a  0  3  a    1

→ Learner → Predictor
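The slide leaves the Learner abstract. As one concrete, hypothetical instance of "labeled rows in, Predictor out", the sketch below trains a categorical naive Bayes classifier with add-one (Laplace) smoothing on the five rows shown. Every feature value is treated as a discrete symbol. Naive Bayes is an assumption here, since the slide does not name a learner, and the function names are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class labels and (feature, class, value) co-occurrences."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> Counter of values
    values_seen = [set() for _ in rows[0]]  # distinct values observed per feature
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            values_seen[j].add(value)
    return class_counts, value_counts, values_seen


def predict_proba(model, row):
    """Posterior P(label | row) with add-one smoothing, computed in log space."""
    class_counts, value_counts, values_seen = model
    n_total = sum(class_counts.values())
    log_scores = {}
    for label, n_label in class_counts.items():
        score = math.log(n_label / n_total)
        for j, value in enumerate(row):
            count = value_counts[(j, label)][value]
            score += math.log((count + 1) / (n_label + len(values_seen[j])))
        log_scores[label] = score
    top = max(log_scores.values())
    unnormalized = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {label: p / z for label, p in unnormalized.items()}


if __name__ == "__main__":
    # The five training rows from the slide; the first two are Person 1 and Person 2.
    rows = [("b", 0, 5, "b"), ("b", 1, 3, "a"), ("a", 1, 1, "a"),
            ("b", 1, 1, "a"), ("a", 0, 3, "a")]
    responses = [1, 0, 0, 0, 1]
    predictor = train_naive_bayes(rows, responses)
    print(predict_proba(predictor, ("a", 0, 5, "b")))  # a new, hypothetical instance
```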


21

Active Learning

                          Response
Person 1:   b  0  5  b    ?
Person 2:   b  1  3  a    ?
            a  1  1  a    ?
            b  1  1  a    ?
            a  0  3  a    ?

→ Learner → Predictor

User is able to PURCHASE labels, at some cost … but for which instances?

Purchased responses: ?  +  ?  +  −


22

Budgeted Learning

                          Response
Person 1:   ?  ?  ?  ?    1
Person 2:   ?  ?  ?  ?    0
            ?  ?  ?  ?    0
            ?  ?  ?  ?    0
            ?  ?  ?  ?    1

→ Learner → Predictor

User is able to PURCHASE values of features, at some cost … but which features for which instances?

Purchased feature values:
1  5  +  t
0  5  +  f
?  ?  ?  ?
?  ?  ?  ?
?  ?  ?  ?


23

Budgeted Learning

                          Response
Person 1:   ?  ?  ?  ?    1
Person 2:   ?  ?  ?  ?    0
            ?  ?  ?  ?    0
            ?  ?  ?  ?    0
            ?  ?  ?  ?    1

→ Learner → Predictor

Significantly different from ACTIVE learning: correlations between feature values

?  5  ?  ?
?  ?  +  ?
0  ?  +  ?
?  9  −  ?
?  ?  ?  ?

User is able to PURCHASE values of features, at some cost … but which features for which instances?


24

[Figure: budgeted-learning policies round-robin, random, greedy, allocational, lookahead and biased-robin compared over time (x-axis: # features purchased, 0 to 45); n = 10 tests ($1/test), Budget = $40, Beta(10,1).]


25

Budgeted Learning … so far

- Defined framework:
  - ability to purchase individual feature values
  - fixed LEARNING / CLASSIFICATION budget
- Theoretical results:
  - NP-hard in general
  - standard algorithms are not even approximately optimal!
- Empirical results show:
  - avoid Round Robin
  - try clever algorithms: Biased Robin, Randomized Single Feature Lookahead

[Lizotte, Madani, Greiner: UAI’03], [Madani, Lizotte, Greiner: UAI’04], [Kapoor, Greiner: ECML’05]
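The empirical advice above can be explored with a small simulation: avoid round robin, try biased robin. This is a minimal sketch under loud assumptions:

- each of the n tests is modelled as a coin with an unknown success probability;
- every probe costs $1;
- after the budget is spent, the learner picks the coin with the highest posterior mean under a Beta(1, 1) prior;
- "biased robin" is taken to mean: keep probing the current coin after a success, and move to the next coin after a failure.

The policy names, the prior, and the uniform draw of the true probabilities are hypothetical choices. They are not the setup behind the results reported on the slides.

```python
import random


def run_policy(policy, true_p, budget, rng, prior=(1.0, 1.0)):
    """Spend `budget` unit-cost probes, then return (chosen coin, heads, tails)."""
    n = len(true_p)
    heads = [0] * n
    tails = [0] * n
    current = 0  # coin that biased robin is currently probing
    for t in range(budget):
        if policy == "round_robin":
            i = t % n
        elif policy == "biased_robin":
            i = current
        else:
            raise ValueError(f"unknown policy: {policy}")
        if rng.random() < true_p[i]:
            heads[i] += 1
        else:
            tails[i] += 1
            if policy == "biased_robin":
                current = (current + 1) % n  # move on only after a failure
    a, b = prior
    posterior_mean = [(heads[i] + a) / (heads[i] + tails[i] + a + b) for i in range(n)]
    chosen = max(range(n), key=posterior_mean.__getitem__)
    return chosen, heads, tails


def average_regret(policy, n_coins=10, budget=40, trials=2000, seed=0):
    """Mean gap between the best true probability and that of the chosen coin."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        true_p = [rng.random() for _ in range(n_coins)]  # hypothetical: uniform biases
        chosen, _, _ = run_policy(policy, true_p, budget, rng)
        total += max(true_p) - true_p[chosen]
    return total / trials


if __name__ == "__main__":
    for name in ("round_robin", "biased_robin"):
        print(name, average_regret(name))
```

The defaults of 10 coins and a budget of 40 only mirror the figure's "10 tests, Budget = $40" labels. Swapping in other priors or policies, such as a single-feature lookahead, only requires a new branch in `run_policy`.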


26

Future Work #1

                          Response
Person 1:   ?  ?  ?  ?    1    ?
Person 2:   b  ?  ?  ?    0    ?
            ?  ?  ?  ?    0    ?
            ?  ?  ?  ?    0    ?
            ?  ?  ?  ?    1    ?

→ Learner → Classifier


27

Future Work #2

- Sample complexity of Budgeted Learning: how many (Ij, Xi) “probes” are required to PAC-learn?
- Develop policies with guarantees on learning performance
- Complex cost model … bundling tests, …
- Allow the learner to perform more powerful probes, e.g. purchase X3 in an instance where X7 = 0 & Y = 1
- More complex classifiers?


28

Future Work #3

                          Response
Person 1:   ?  ?  ?  ?    ?    ?
Person 2:   ?  ?  ?  ?    ?    ?
            ?  ?  ?  ?    ?    ?
            ?  ?  ?  ?    ?    ?
            ?  ?  ?  ?    ?    ?

Goal: find
$$\theta^* = \arg\max_{\theta} P_{\theta}(D)$$

Learning Generative Model


29

Projects and Status

- Structured Prediction (ongoing)
- Dimensional Reduction (ongoing; RoBiC: Poster #8)
- Budgeted Learning (ongoing)
- Semi-Supervised Learning (ongoing)
- Active Learning (ongoing)
- Cost Curves (complete; Poster #26)

[Figure: binary data matrices MTrain and MTest with Labels; FindBiClusters; BiCluster Membership; Learner; Classifier]


30


Using Variance Estimates to Combine Bayesian Classifiers: Technical Details


32

Motivation

Suppose we have many different classifiers … For each instance, we want each classifier to
“know what it knows” … and shout LOUDEST when it knows best …

“Loudness” ∝ 1/Variance!

[Figure: four classifiers C1, C2, C3, C4 over the same + / o data; two query points marked * and §.]
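Inverse-variance weighting is one simple way to turn "loudness ∝ 1/variance" into a combination rule. Each base classifier j reports an estimate q_j of P(+ | x) together with a variance v_j. The combined estimate weights each q_j by 1/v_j. This is a minimal sketch of that generic rule and not necessarily the exact combination used by MUV. The combined variance 1/Σ(1/v_j) additionally assumes the estimates are independent. Names and numbers are hypothetical.

```python
def combine_inverse_variance(estimates):
    """Combine (estimate, variance) pairs, weighting each estimate by 1/variance.

    Returns (combined_estimate, combined_variance); the variance formula assumes
    the individual estimates are independent.
    """
    if not estimates:
        raise ValueError("need at least one (estimate, variance) pair")
    weights = [1.0 / var for _, var in estimates]
    total_weight = sum(weights)
    combined = sum(w * q for w, (q, _) in zip(weights, estimates)) / total_weight
    return combined, 1.0 / total_weight


def classify(estimates, threshold=0.5):
    """Predict '+' if the variance-weighted estimate of P(+ | x) exceeds threshold."""
    q, _ = combine_inverse_variance(estimates)
    return "+" if q > threshold else "o"


if __name__ == "__main__":
    # Hypothetical query: C1 is confident (+); C2 and C3 lean "o" but are unsure.
    votes = [(0.9, 0.01), (0.3, 0.04), (0.4, 0.09)]
    print(combine_inverse_variance(votes), classify(votes))
```

In the usage example, a plain majority vote would answer "o" because two of the three estimates are below 0.5. The weighted rule instead follows the one classifier that "knows what it knows".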


33

Mixture Using Variance

Given a belief net classifier with
- a fixed (correct) structure
- parameters estimated from a (random) data sample,

the response to a query “P(+c | −e, +w)” is … asymptotically normal, with … (asymptotic) variance.

The variance is easy to compute … for simple structures (Naïve Bayes, TAN) … and for complete queries.

[Equation: closed-form (asymptotic) variance of the query response, a 1/n-scaled sum over the class C and the CPtable parameters θ_{d|f}, involving squared derivatives q′ of the query]
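The asymptotic-normality claim rests on the delta method. Suppose a single multinomial parameter vector θ is estimated from n samples. Then Cov(θ̂) ≈ (diag(θ) − θθᵀ)/n, so a smooth query q(θ) has variance ≈ (Σ_d θ_d g_d² − (Σ_d θ_d g_d)²)/n, where g = ∇q(θ). The sketch below computes exactly that, using a central-difference gradient. It is a generic illustration of the mechanism, not the slide's formula. A belief net has one such multinomial per CPtable row, which this sketch does not attempt to model. The function name and the example query are hypothetical.

```python
def delta_method_variance(query, theta, n, h=1e-6):
    """Asymptotic variance of query(theta_hat), theta_hat = multinomial MLE from n draws.

    Uses Cov(theta_hat) ~= (diag(theta) - theta theta^T) / n and a central-difference
    gradient g of `query`, giving (sum_d theta_d g_d^2 - (sum_d theta_d g_d)^2) / n.
    """
    grad = []
    for d in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[d] += h
        down[d] -= h
        grad.append((query(up) - query(down)) / (2.0 * h))
    first = sum(t * g for t, g in zip(theta, grad))
    second = sum(t * g * g for t, g in zip(theta, grad))
    return (second - first ** 2) / n


def ratio_query(th):
    """Hypothetical conditional-style query: P(value 0 | value in {0, 1})."""
    return th[0] / (th[0] + th[1])


if __name__ == "__main__":
    theta = [0.2, 0.3, 0.5]  # hypothetical 3-valued parameter vector
    print(delta_method_variance(ratio_query, theta, n=200))
```

A quick sanity check on the design: for `ratio_query`, the result agrees with p(1 − p)/(n(θ₀ + θ₁)). That is the binomial variance of a proportion estimated from the roughly n(θ₀ + θ₁) samples that fall in {0, 1}.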


34

Experiment #4b: MUV(kNB, AdaBoost, js) vs. AdaBoost(NB)

MUV significantly outperforms AdaBoost, even when using the base classifiers that AdaBoost generated!

MUV(kNB, AdaBoost, js) is better than AdaBoost[NB] with p < 0.023.


35

MUV Results

- Sound statistical foundation
- Very effective classifier … across many real datasets
- MUV(NB) better than AdaBoost(NB)!

C. Lee, S. Wang and R. Greiner; ICML’06


36

Mixture Using Variance … next steps?

- Other structures (beyond NB, TAN)
- Beyond just tabular CPtables for discrete variables: Noisy-or, Gaussians
- Learn different base classifiers from different subsets of features
- Scaling up to many MANY features: overfitting characteristics?


37

Confidence in Classifier

Confidence of prediction?
- Fit each (μ_j, σ_j²) to Beta(a_j, b_j)
- Compute the area CDF_Beta(a_j, b_j)(0.5)
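The two steps on this slide can be sketched as follows, assuming the fit is done by matching moments. A Beta(a, b) with mean μ and variance σ² has a + b = μ(1 − μ)/σ² − 1, a = μ(a + b) and b = (1 − μ)(a + b), which requires σ² < μ(1 − μ). CDF_Beta(a,b)(0.5) is then the mass below 0.5, i.e. the belief that the true P(+ | x) is under one half. The regularized incomplete beta function is evaluated with the standard continued-fraction (Lentz) method, so the block needs only the Python standard library. Moment matching and all function names are assumptions.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta I_x(a, b), i.e. the CDF of Beta(a, b) at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def fit_beta_by_moments(mean, var):
    """Beta(a, b) with the given mean and variance (needs 0 < var < mean*(1-mean))."""
    if not (0.0 < mean < 1.0) or not (0.0 < var < mean * (1.0 - mean)):
        raise ValueError("no Beta distribution has this mean and variance")
    total = mean * (1.0 - mean) / var - 1.0
    return mean * total, (1.0 - mean) * total


def prob_below_half(mean, var):
    """Area of the fitted Beta below 0.5: belief that the true P(+ | x) is < 0.5."""
    a, b = fit_beta_by_moments(mean, var)
    return beta_cdf(0.5, a, b)
```

For example, a classifier reporting mean 0.8 and variance 0.01 is fitted to Beta(12, 3). Only about 0.65% of that distribution lies below 0.5, so its "+" prediction would be held with high confidence.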


38

Semi-Supervised Learning

Temp.  BP.  Sore Throat  …  Colour  diseaseX
35     95   Y            …  Pale    No
22     110  N            …  Clear   Yes
:      :    :
10     87   N            …  Pale

Temp.  Press.  Sore-Throat  …  Color
32     90      N            …  Pale    →  Classifier  →  diseaseX: No

→ Learner → Classifier

Temp.  BP.  Sore Throat  …  Colour  diseaseX
35     95   Y            …  Pale    No
22     110  N            …  Clear   Yes
10     87   N            …  Pale    Yes
17     82   Y            …  Red     No
33     82   N            …  Blue    No
:      :    :               :
4      87   N            …  Pale    No

Labeled Training Data; Unlabeled Training Data


39

Approaches

- Ignore the unlabeled data: great if we have LOTS of labeled data
- Use the unlabeled data, as is: “Semi-Supervised Learning” … based on
  - large margin (SVM)
  - graph
  - probabilistic model
- Pay to get labels for SOME unlabeled data: “Active Learning”


40

Semi-supervised Multi-class SVM

Approach: find a labeling that would yield an optimal SVM classifier on the resulting training data.
- Hard, but semi-definite relaxations can approximate this objective surprisingly well
- Training procedures are computationally intensive, but produce high-quality generalization results

L. Xu, J. Neufeld, B. Larson, D. Schuurmans. Maximum margin clustering. NIPS-04.

L. Xu and D. Schuurmans. Unsupervised and semi-supervised multi-class SVMs. AAAI-05.


41

Probabilistic Approach to Semi-Supervised Learning

- Probabilistic model: P(y|x)
- Context: non-IID data
  - language modelling
  - segmenting brain tumors from MR images
- Use unlabeled data as a regularizer
- Future: other applications …

C-H. Lee, S. Wang, F. Jiao, D. Schuurmans and R. Greiner. Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields. NIPS06.

F. Jiao, S. Wang, C. Lee, R. Greiner, and D. Schuurmans. Semi-supervised conditional random fields for improved sequence segmentation and labeling. COLING/ACL06.
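One common way to "use unlabeled data as a regularizer" is minimum-entropy regularization. The objective maximizes the conditional log-likelihood of the labeled data minus γ times the entropy of the model's predictions on the unlabeled data. This rewards parameters that make confident predictions on unlabeled points, pushing the decision boundary away from dense unlabeled regions. The sketch below writes down that objective for a plain logistic model P(y = 1 | x) = σ(w·x + b), not for the random-field models in the cited papers. γ, the data and all names are hypothetical, and the objective would be handed to any gradient-based optimizer.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    """P(y = 1 | x) under a logistic model with weights w and bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def entropy(p, eps=1e-12):
    """Binary entropy in nats, clamped away from log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def semi_supervised_objective(w, b, labeled, unlabeled, gamma):
    """Labeled log-likelihood minus gamma * total prediction entropy on unlabeled x."""
    log_lik = 0.0
    for x, y in labeled:
        p = predict(w, b, x)
        log_lik += math.log(max(p if y == 1 else 1.0 - p, 1e-300))
    ent = sum(entropy(predict(w, b, x)) for x in unlabeled)
    return log_lik - gamma * ent


if __name__ == "__main__":
    labeled = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]
    unlabeled = [[0.5, 0.5], [2.0, -1.0], [0.0, 0.0]]
    for w in ([0.0, 0.0], [3.0, -3.0]):
        print(w, semi_supervised_objective(w, 0.0, labeled, unlabeled, gamma=0.5))
```

With γ = 0 this reduces to ordinary supervised logistic regression. Larger γ gives the unlabeled points more say in where the boundary sits.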


42

Active Learning

Pay for the label of the query x_i that … maximizes conditional mutual information about the unlabeled data.

How to determine y_i? Take the EXPECTATION wrt Y_i?

$$i^* = \arg\min_{i \in U} \sum_{y} P(y \mid x_i, L) \sum_{u \in U} H\big(Y_u \mid x_u, L \cup \{(x_i, y)\}\big)$$

Or use an OPTIMISTIC guess wrt Y_i?

$$i^* = \arg\min_{i \in U} \min_{y} \sum_{u \in U} H\big(Y_u \mid x_u, L \cup \{(x_i, y)\}\big)$$
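Both selection rules above can be sketched for any probabilistic classifier that can be refit cheaply. Here the model is a hypothetical stand-in: a one-dimensional logistic regression fit by a few hundred steps of L2-regularized gradient ascent. For each candidate x_i, the sketch refits with each possible label y and sums the prediction entropies over the remaining unlabeled points (excluding x_i itself). It then either averages over y with P(y | x_i, L) (expectation) or takes the minimum over y (optimism). The model, step sizes and data are assumptions, not the setup of Guo and Greiner's experiments.

```python
import math


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(data, steps=300, lr=0.5, l2=0.1):
    """Fit P(y=1|x) = sigmoid(w*x + b) on (x, y) pairs by L2-penalized gradient ascent."""
    w = b = 0.0
    for _ in range(steps):
        grad_w, grad_b = -l2 * w, 0.0
        for x, y in data:
            err = y - sigmoid(w * x + b)
            grad_w += err * x
            grad_b += err
        w += lr * grad_w / len(data)
        b += lr * grad_b / len(data)
    return w, b


def prob_pos(model, x):
    w, b = model
    return sigmoid(w * x + b)


def entropy(p, eps=1e-12):
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def select_query(labeled, unlabeled, optimistic=True):
    """Index of the unlabeled point whose label should be purchased next."""
    base = fit_logistic(labeled)
    best_index, best_score = None, math.inf
    for i, x_i in enumerate(unlabeled):
        rest = unlabeled[:i] + unlabeled[i + 1:]
        # Total entropy over the remaining pool after adding (x_i, y), for y = 0 and 1.
        scores = []
        for y in (0, 1):
            model = fit_logistic(labeled + [(x_i, y)])
            scores.append(sum(entropy(prob_pos(model, x_u)) for x_u in rest))
        if optimistic:
            score = min(scores)  # optimistic guess for y_i
        else:
            p1 = prob_pos(base, x_i)  # expectation w.r.t. P(y_i | x_i, L)
            score = (1.0 - p1) * scores[0] + p1 * scores[1]
        if score < best_score:
            best_index, best_score = i, score
    return best_index


if __name__ == "__main__":
    labeled = [(-2.0, 0), (2.0, 1)]
    pool = [-1.5, -0.3, 0.1, 0.8, 1.7]
    print(select_query(labeled, pool, optimistic=True),
          select_query(labeled, pool, optimistic=False))
```

Each candidate costs two refits, so this brute-force form scales as O(|U|) refits per query. The same loop structure applies to any model with a `fit` and a `P(y | x)`.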


43

Optimistic Active Learning using Mutual Information

- Need optimism
- Need “on-line adjustment”
- Better than just MostUncertain, …

[Figure: results on the pima and breast datasets]

Y. Guo and R. Greiner. Optimistic active learning using mutual information. IJCAI’07


44

Future Work on Active Learning

- Understand WHY “optimism” works …
  + other applications of optimism
- Extend framework to deal with
  - non-iid data
  - different qualities of labelers …