Machine Learning

Active Learning Shrey Malik 0901CS32 [email protected]


Page 1: Machine Learning

Active Learning

Shrey [email protected]

Page 2: Machine Learning

What is it ?

● Machine Learning

● Making a program curious!
● Teach it to decide on its own.
● Give it some intelligence.

Example: make a program that labels documents according to their contents:

Sports, Technology, History, Geography, Politics, etc.

Page 3: Machine Learning

What is it ?

Step 1. Download a lot of documents from the web.

Step 2. Label them!

Labeling is quite a painful task. Somehow our program should be able to distinguish between the various categories.

Teach the program using examples (the training set) and make sure it makes intelligent decisions in real-world situations.

Question! How many examples, and which ones?

Page 4: Machine Learning

Example

All kinds of “unlabeled” data

Page 5: Machine Learning

Example

One way: select a few data points at random, label them, give the input-output set to the program … and let it “learn” from these examples.

Supervised learning.

Page 6: Machine Learning

Example

One way: select a few data points at random, label them, give the input-output set to the program … and let it “learn” from these examples.

BUT! Keep in mind (memory) the location of the other, unlabeled points.

Semi-supervised & active learning!

Page 7: Machine Learning

Example

Got a better generalization this time! Didn't we?

Page 8: Machine Learning

Active Learning

Somehow make the set of training examples smaller and the results more accurate.

Page 9: Machine Learning

So how do we make the training set smaller & smarter?

Select the training examples the program is most uncertain about … instead of picking them at random.

The program sends queries to the “Oracle” in the form of unlabeled instances to be labeled.

In this way, the active learner aims to achieve high accuracy using as few labeled instances as possible, thereby minimizing the cost of obtaining labeled data.

E.g. query the unlabeled point that is:

● closest to the boundary, OR
● most uncertain, OR
● most likely to decrease overall uncertainty,
● etc.

Page 10: Machine Learning

How does the learner ask queries ?

There are several different problem scenarios in which the learner may be able to ask queries.

For example:

Page 11: Machine Learning

Membership Query Synthesis

The learner may request labels for any unlabeled instance in the input space, including (and typically assuming) queries that the learner generates de novo, rather than those sampled from some underlying natural distribution.

BUT sometimes the synthesized queries are quite awkward for the oracle to label!

*De novo means anew, from scratch: the learner makes up the instance itself.
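
A minimal sketch of query synthesis, assuming a made-up one-dimensional problem in which the label is 1 at or above a hidden threshold. The names `oracle` and `synthesize_queries` and the threshold value 0.6 are illustrative assumptions, not part of the slides. The learner never looks at real data: it invents each query point itself.

```python
def oracle(x, true_threshold=0.6):
    """Hypothetical annotator: label is 1 at or above a hidden threshold."""
    return 1 if x >= true_threshold else 0


def synthesize_queries(n_queries, low=0.0, high=1.0):
    """Locate the decision boundary by generating query points de novo."""
    asked = []
    for _ in range(n_queries):
        x = (low + high) / 2.0   # a point the learner makes up itself
        y = oracle(x)
        asked.append((x, y))
        if y == 1:
            high = x             # boundary is at or below x
        else:
            low = x              # boundary is above x
    return (low + high) / 2.0, asked


estimate, asked = synthesize_queries(10)
print(f"estimated boundary: {estimate:.4f} after {len(asked)} queries")
```

In one dimension every synthesized point is easy to label. With images or text, a made-up instance may look like nothing a human annotator can recognize.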

Page 12: Machine Learning

Stream-Based Selective Sampling

Obtain an unlabeled instance, sampled from the actual distribution.

Now the learner decides whether to request its label or not.
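
A minimal sketch of selective sampling on a stream, again assuming a toy one-dimensional threshold problem (the `oracle`, the threshold 0.6 and the uniform stream are illustrative assumptions). Each instance arrives one at a time, and the learner asks for a label only when the instance falls in the region where the label is still unknown.

```python
import random


def oracle(x, true_threshold=0.6):
    """Hypothetical annotator: label is 1 at or above a hidden threshold."""
    return 1 if x >= true_threshold else 0


def selective_sampling(stream):
    """Query an instance only if its label cannot be inferred yet."""
    low, high = 0.0, 1.0          # boundary is known to lie in (low, high]
    n_queries = 0
    for x in stream:
        if low < x < high:        # uncertain region: ask the oracle
            n_queries += 1
            if oracle(x) == 1:
                high = x
            else:
                low = x
        # otherwise the label is already implied, so the instance is skipped
    return (low + high) / 2.0, n_queries


rng = random.Random(0)
stream = [rng.random() for _ in range(1000)]
estimate, n_queries = selective_sampling(stream)
print(f"boundary ~ {estimate:.3f} using {n_queries} labels out of {len(stream)}")
```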

Page 13: Machine Learning

Pool-Based Sampling

For many real-world learning problems, large collections of unlabeled data can be gathered at once.

[Figure: the learner selects the best query from a large pool of instances.]

Page 14: Machine Learning

… and how does the program select the best Query ??

Uncertainty Sampling

Query the instance for which the learner is least confident.

$x^* = \arg\max_x \left( 1 - P(y' \mid x) \right)$

where $y' = \arg\max_y P(y \mid x)$

$x^*$ = the best query
$P(y \mid x)$ = conditional probability …
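
A small sketch of the least-confident rule, assuming the model already gives a list of class probabilities P(y | x) for each unlabeled instance. The function name `least_confident` and the numbers are made up for illustration.

```python
def least_confident(pool_probs):
    """Return the index of the instance whose top label is least probable.

    pool_probs: one list of class probabilities P(y | x) per unlabeled instance.
    """
    def uncertainty(probs):
        return 1.0 - max(probs)   # 1 - P(y' | x), with y' the most likely label

    return max(range(len(pool_probs)), key=lambda i: uncertainty(pool_probs[i]))


pool_probs = [
    [0.90, 0.05, 0.05],   # the model is confident here
    [0.40, 0.35, 0.25],   # the model is least confident here
    [0.60, 0.30, 0.10],
]
best = least_confident(pool_probs)
print(best)
```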

Page 15: Machine Learning

…and how does the program select the best Query ??

Query-By-Committee

Maintain a committee of models, all trained on the same labeled data, and let each of them label the unlabeled instances … Now select the queries on which they disagree the most!

Page 16: Machine Learning

For measuring the level of disagreement:

y_i :: ranges over all possible labelings.
V(y_i) :: number of “votes” that a label receives from among the committee members’ predictions.
C :: committee size.

… and then there are a lot of other algorithms too!
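
The formula itself did not survive in this transcript. The symbols match the vote entropy measure described in Burr Settles' active learning survey, so, assuming that is what the slide showed, the query is the instance that maximizes −Σ_i (V(y_i)/C) · log(V(y_i)/C). A small sketch under that assumption, with made-up document names and votes:

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Disagreement of a committee on one instance, given one vote per member."""
    committee_size = len(votes)                 # C
    entropy = 0.0
    for count in Counter(votes).values():       # count = V(y_i)
        share = count / committee_size          # V(y_i) / C
        entropy -= share * math.log(share)
    return entropy


committee_votes = {
    "doc_a": ["sports", "sports", "sports", "sports"],      # full agreement
    "doc_b": ["sports", "politics", "history", "sports"],   # strong disagreement
    "doc_c": ["sports", "sports", "politics", "sports"],    # mild disagreement
}
query = max(committee_votes, key=lambda d: vote_entropy(committee_votes[d]))
print(query)
```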

Page 17: Machine Learning

The Algorithm…

1. Start with a large pool of unlabeled data.

2. Select the single most informative instance to be labeled by the oracle.

3. Add the labeled query to the training set.

4. Re-train using this newly acquired knowledge.

5. Go to 1.
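
The steps above can be sketched as a loop. This is a toy version under stated assumptions: a one-dimensional pool, a hidden threshold of 0.6 standing in for the oracle, and "most informative" taken to mean "closest to the current boundary". All names are illustrative.

```python
import random


def oracle(x, true_threshold=0.6):
    """Hypothetical annotator: label is 1 at or above a hidden threshold."""
    return 1 if x >= true_threshold else 0


def fit_boundary(labeled):
    """Re-train: place the boundary midway between the two classes."""
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    low = max(zeros) if zeros else 0.0
    high = min(ones) if ones else 1.0
    return (low + high) / 2.0


def active_learning(pool, n_rounds):
    pool = list(pool)                       # 1. large pool of unlabeled data
    labeled = []
    boundary = fit_boundary(labeled)
    for _ in range(n_rounds):
        if not pool:
            break
        # 2. most informative instance = closest to the current boundary
        x = min(pool, key=lambda p: abs(p - boundary))
        pool.remove(x)
        labeled.append((x, oracle(x)))      # 3. add the labeled query
        boundary = fit_boundary(labeled)    # 4. re-train
    return boundary, labeled                # 5. the for-loop is the "go to"


rng = random.Random(1)
pool = [rng.random() for _ in range(500)]
boundary, labeled = active_learning(pool, n_rounds=15)
print(f"boundary ~ {boundary:.3f} from {len(labeled)} labels")
```

Only 15 of the 500 points get labeled, and almost all of them sit near the true boundary.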

Page 18: Machine Learning

Is Active learning 'The' thing ?

Assumptions

1. The annotator (the Oracle) is always right.
2. If the annotator is wrong, see rule one!

3. Labeling is sooo expensive … is it?

So can my machine learn more economically if it is allowed to ask questions?

Are you from Delhi?

Seen the Qutub Minar?

Used the metro?

Page 19: Machine Learning

Suggested Improvements in it ...

Dr. Burr Settles ...

The oracle has to wait while the learner “re-trains” after each label he or she provides. The learner should instead ask for a batch of queries to be labeled at once … querying in BATCHES.
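
A minimal sketch of batch querying, assuming the simplest possible rule: take the k least-confident instances in one go. The function name `select_batch` and the probabilities are made up. This naive top-k rule ignores the fact that the chosen instances may be near-duplicates of each other, which more careful batch methods try to avoid.

```python
def select_batch(pool_probs, batch_size):
    """Return indices of the batch_size least-confident instances."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: max(pool_probs[i]),   # low top probability = high uncertainty
    )
    return ranked[:batch_size]


pool_probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
    [0.85, 0.15],
]
batch = select_batch(pool_probs, batch_size=2)
print(batch)
```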

Page 20: Machine Learning

Suggested Improvements in it ...

Dr. Burr Settles ...

Oracles are not always right …

● They can be fatigued.
● There can be errors in instruments, etc.

Crowdsourcing on the web

You just played a fun game:

Tag as many rockstars in the pic as you can in one minute.

Challenge your friends. Like on Facebook.

... meanwhile the learner was learning from your labels … thank you, Oracle!

Page 21: Machine Learning

Suggested Improvements in it ...

Dr. Burr Settles ...

Goal: to minimize the overall cost of training an accurate model.

Simply reducing the number of labeled instances won't necessarily help, because some instances cost more to label than others.

Cost-sensitive active learning approaches explicitly account for varying labeling costs while selecting queries.

E.g. Kapoor et al. proposed a decision-theoretic approach. It takes into account both labeling and misclassification costs. Assumption: the cost of labeling is proportional to length.
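
A rough sketch of the idea, not Kapoor et al.'s decision-theoretic method: a much simpler heuristic that ranks candidates by uncertainty per unit of labeling cost, keeping only the assumption that cost is proportional to length. The function name, the `cost_per_word` parameter and the example documents are all made up.

```python
def pick_query(candidates, cost_per_word=1.0):
    """Pick the candidate with the most uncertainty per unit of labeling cost.

    candidates: list of (doc_id, top_label_probability, length_in_words).
    """
    def value_per_cost(candidate):
        _, top_prob, length = candidate
        uncertainty = 1.0 - top_prob
        cost = cost_per_word * length       # assumption: cost grows with length
        return uncertainty / cost

    return max(candidates, key=value_per_cost)[0]


candidates = [
    ("long_report", 0.50, 2000),   # very uncertain but expensive to read
    ("short_note", 0.60, 100),     # slightly less uncertain, far cheaper
    ("tweet", 0.99, 20),           # cheap, but the model is already confident
]
choice = pick_query(candidates)
print(choice)
```

Plain uncertainty sampling would pick the long report here. Once cost is counted, the short note is the better buy.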

Page 22: Machine Learning

Suggested Improvements in it ...

Dr. Burr Settles ...

If the labeling cost is not known, try to predict the real, unknown annotation cost based on a few simple “meta features” of the instances.

Research has shown that these learned cost models are significantly better than simpler cost heuristics (e.g., a linear function of length).

Page 23: Machine Learning

Active Learning :: Practical Examples

Drug Design

Unlabeled points :: a large (really large) pool of chemical compounds.
Label :: active (binds to a target) or not.
Getting a label :: the experiment.

Page 24: Machine Learning

Active Learning :: Practical Examples

Pedestrian Detection

Page 25: Machine Learning

Conclusion

Machines should be able to do all the things we hate … & machine learning will play a big role in achieving this goal.

And to make machine learning faster and cheaper … active learning is the key !

Machine/Active learning is a very good area for research !

Machines will become intelligent and wage a war against humanity!

Page 26: Machine Learning

Thank You :)

Do Check out http://en.akinator.com