
Active Learning and Selective Sensing
Can’t Learn Without You

[Figure: “Closing the Loop”: a feedback loop between sensing and computing]

Learning to Discover
Sequential approach: select new samples/experiments that are predicted to be maximally informative in discriminating hypotheses.
[Figure: loop: select sensing action → sample/sense → observe/infer]

Discovery!
Laplace decided to make new astronomical measurements when “the discrepancy between prediction and observation [was] large enough to give a high probability that there is something new to be found.”
Jaynes (1986)


Learning a decision hyperplane
[Figure: labeled + and − examples separated by a hyperplane]
Selective sampling yields exponential speed-up in learning!
Y. Freund, H. S. Seung, E. Shamir, and N. Tishby. Selective sampling using the query by committee algorithm. Machine Learning, 28(2-3):133–168, 1997.

Now you see it, now you don’t!
Weak signals/patterns are imperceptible without selective sensing!
[Figure: a sparse signal buried in noise]
J. Haupt, R. Castro, and R. Nowak, “Distilled sensing: selective sampling for sparse signal recovery,” in Proceedings of AISTATS 2009, pp. 216–223.

Outline

1. Active Learning: selective sampling for binary prediction problems

2. Distilled Sensing: selective sensing for sparse signal recovery

Common theme: feedback between data analysis and data collection can be crucial for effective learning and inference.

[Figure: a hypothesis space of candidates, narrowed down by yes/no questions]
“Does the person have blue eyes?”
“Is the person wearing a hat?”

Binary Search

“Binary Search” works very well in simple conditions.
Where is it shady vs. sunny?

Binary Search and Threshold Functions

[Figure: shade/sun labels y ∈ {0, 1} as a function of position x along a line]
Where is it shady vs. sunny?
Bisection queries reveal the binary expansion of the threshold, one bit per query:
1/3 = 0…
1/3 = 01…
1/3 = 010…
1/3 = 0101…

Binary Search and Threshold Functions

[Figure: bisection queries at 1/2, 1/4, 3/8, 5/16, … home in on the threshold; the sampled points are labeled 0 or 1]

active learning: sequentially select points for labeling
n samples yield n bits (each query halves the set of candidate thresholds)
passive learning: all points are labeled
n samples yield effectively only log n bits about the threshold location
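To make the n-bits versus log n-bits contrast concrete, here is a minimal noiseless sketch (my own illustration, not taken from the talk): a bisection learner locates a threshold on an n-point grid with about log2(n) label queries, while a passive learner that labels every grid point spends n labels for the same resolution.

```python
import math

def active_bisection(label, n):
    """Locate a threshold on the grid {0, ..., n-1} by bisection.

    `label(i)` returns 0 to the left of the threshold and 1 at or right of it.
    Returns (estimated threshold index, number of label queries used).
    """
    lo, hi, queries = 0, n - 1, 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if label(mid) == 0:
            lo = mid + 1   # threshold lies strictly to the right of mid
        else:
            hi = mid       # threshold is at mid or to the left
    return lo, queries

n, true_t = 1024, 317
label = lambda i: int(i >= true_t)

est, q_active = active_bisection(label, n)
q_passive = n                       # passive learning labels every grid point
print(est, q_active, math.ceil(math.log2(n)), q_passive)
# -> 317 10 10 1024: ten queries versus 1024 labels for the same resolution
```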


Bounded and Unbounded Noise

“bounded noise”: the label is strictly more (or less) probably 1 at every location, i.e. the probability of label 1 stays bounded away from 1/2 (more probably 0 on one side of the threshold, more probably 1 on the other)
“unbounded noise”: like the toss of a fair coin at the threshold, where the probability of label 1 approaches 1/2
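One way to see why bounded noise keeps the binary-search structure usable is to repeat each query and take a majority vote. The sketch below is only an illustration of that idea under an assumed constant flip probability; it is not from the talk and is not necessarily the algorithm behind the theorem that follows.

```python
import random

def noisy_label(i, true_t, flip_prob):
    """Bounded-noise oracle: the correct label is flipped with probability flip_prob < 1/2."""
    correct = int(i >= true_t)
    return 1 - correct if random.random() < flip_prob else correct

def majority_query(i, true_t, flip_prob, repeats):
    """Ask the noisy oracle `repeats` times and return the majority answer."""
    votes = sum(noisy_label(i, true_t, flip_prob) for _ in range(repeats))
    return int(2 * votes > repeats)

def noisy_bisection(n, true_t, flip_prob=0.3, repeats=25):
    """Bisection on {0, ..., n-1} driven by majority-vote answers to noisy queries."""
    lo, hi = 0, n - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if majority_query(mid, true_t, flip_prob, repeats) == 0:
            lo = mid + 1   # majority says 0: threshold lies to the right
        else:
            hi = mid       # majority says 1: threshold is at mid or to the left
    return lo

print(noisy_bisection(n=1024, true_t=317))  # usually 317 when the flip probability stays below 1/2
```

Under unbounded noise the flip probability approaches 1/2 near the threshold, so no fixed number of repeats suffices there, which is why the achievable rates in the two noise regimes differ.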

Learning Rates for Multidimensional Thresholds

Active Learning: Theorem (R. Castro and RN ’07)
Compare with passive learning.

[Figure: a Learner interacting with an oracle over a query space and a hypothesis space]
1. consider the viable hypotheses
2. select a sample/query that is highly discriminative
3. query the oracle
4. eliminate or discount inconsistent hypotheses
“With every mistake, we must surely be learning.” G. Harrison

Generalized Binary Search (aka Splitting Algorithm)
Selects a query for which disagreement among the viable hypotheses is maximal.
[Figure: Learner, hypothesis space, query space, and oracle]
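A minimal sketch of the splitting idea (the function and variable names are mine, not the talk’s): keep the hypotheses still consistent with every answer so far, score each candidate query by how evenly the viable hypotheses disagree on it, ask the most balanced query, and discard the hypotheses it contradicts.

```python
def generalized_binary_search(hypotheses, queries, oracle):
    """Noiseless generalized binary search (splitting algorithm).

    hypotheses : list of functions h(x) -> {-1, +1}
    queries    : list of candidate query points x
    oracle     : function x -> {-1, +1} giving the true label
    Returns (hypotheses consistent with every answer, number of queries used).
    """
    viable = list(hypotheses)
    n_queries = 0
    while len(viable) > 1:
        # 0 means a perfect 50/50 split of the viable hypotheses;
        # len(viable) means they all agree on that query.
        def imbalance(x):
            return abs(sum(h(x) for h in viable))
        x = min(queries, key=imbalance)
        if imbalance(x) == len(viable):
            break  # no remaining query can distinguish the survivors
        y = oracle(x)
        n_queries += 1
        viable = [h for h in viable if h(x) == y]
    return viable, n_queries
```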

Example: Two-Dimensional Thresholds

[Figure: the plane split into a −1 region and a +1 region by a linear boundary]
How well does GBS work?

Geometry of GBS

Classic Binary Search is possible because the hypotheses are ordered with respect to the queries. We need a similar structure for more general hypothesis spaces.
To that end, note that the hypotheses induce a partition of the query space into equivalence sets.
[Figure: the query space partitioned into equivalence sets A, A′]

Geometric Condition for GBS Convergence

Classic Binary Search Revisited
[Figure: points along a line labeled 0 0 0 0 0 0 0 0 1 1 1 1 1 1, with the unknown correct threshold at i*/n]

Theorem 1 Proof Sketch:
[Figure: the ‘good’ situation versus the ‘bad’ situations, shown as ±c neighborhoods around the query points x and x′]

Ex. Linear Thresholds in Two Dimensions
[Figure: maximally informative queries]

Example

Suppose we have a sensor network observing a binary activation pattern with a linear boundary. How many sensors must be queried to determine the pattern?
[Figures: number of hypotheses vs. queries; log number of hypotheses vs. queries]
100 sensors, 9900 possible linear boundaries.
Correct boundary determined after querying 12 sensors.
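A hedged reconstruction of this experiment, reusing the generalized_binary_search sketch from earlier; the grid, the coefficient ranges, and the resulting hypothesis count are my own stand-ins rather than the 9900-boundary setup on the slide.

```python
import itertools
import random

# assumes generalized_binary_search from the earlier sketch is in scope

# 100 "sensors" on a 10 x 10 grid
sensors = [(i, j) for i in range(10) for j in range(10)]

def linear_threshold(a, b, c):
    """Hypothesis: sensor (x, y) reads +1 iff a*x + b*y + c > 0, else -1."""
    return lambda p: 1 if a * p[0] + b * p[1] + c > 0 else -1

# A modest family of candidate linear boundaries (a stand-in for the talk's 9900)
hypotheses = [linear_threshold(a, b, c)
              for a, b, c in itertools.product(range(-3, 4), repeat=3)
              if (a, b) != (0, 0)]

random.seed(0)
truth = random.choice(hypotheses)   # the unknown activation pattern
viable, used = generalized_binary_search(hypotheses, sensors, truth)
print(len(hypotheses), used)        # far fewer sensor queries than the 100 sensors
```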

Conclusions

Minimax Bounds: selective sampling/querying can accelerate the learning of threshold functions.
Generalized Binary Search: multidimensional threshold functions can be learned at the optimal rate by selecting maximally discriminative queries.
R. Castro and RN, “Minimax Bounds for Active Learning,” IEEE Trans. Info. Theory, pp. 2339–2353, 2008.
RN, “Generalized Binary Search,” Proceedings of the Forty-Sixth Annual Allerton Conference on Communication, 2008.

Detection/Estimation of Sparse Signals
How reliably can we determine sparsity patterns?

Distilled Sensing

Detection/Estimation of Sparse Signals
fMRI, Astrophysics, Genomics
How reliably can we infer sparse patterns?

Sparse Signal Model

Signal x = (x_1, …, x_n) with signal support set I_s = {i : x_i ≠ 0}; sparsity means |I_s| ≪ n.
Example: a signal with a small number of non-zero components.
In this talk we will assume x_i = μ > 0 for all i ∈ I_s.

Noisy Observation Model

Observations: y_i = x_i + ε_i, with ε_i i.i.d. N(0, 1), i = 1, …, n.
Suppose we want to locate just one signal component: because of noise, even if no signal is present, max_i y_i ≈ √(2 log n).
How small can μ be so that we can still reliably locate the signal components from the observations?
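A quick numerical check of that noise floor (a sketch assuming, as is standard for this model, i.i.d. N(0, 1) noise): even with no signal present at all, the largest of n noise observations sits near √(2 log n), so a single pass cannot reliably flag components much weaker than that.

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (10_000, 1_000_000):
    y = rng.standard_normal(n)      # observations when no signal is present
    print(n, round(float(y.max()), 2), round(float(np.sqrt(2 * np.log(n))), 2))
# the largest pure-noise observation grows like sqrt(2 log n),
# so components with amplitude below that level are hidden in the noise
```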

Sparse Support Recovery

When testing a large number of hypotheses simultaneously we are bound to make errors.
Approaches:
Control the probability of perfect localization of the support (Bonferroni correction) – very conservative
False Discovery Rate control: adaptive control of the relative proportion of errors (Benjamini & Hochberg ’95)

Recall the definition of the signal support set

Goal:Estimate the support as well as possible. Let

be the outcome of a support estimation procedure.

# falsely discovered com

ponents

# discovered com

ponents

# missed com

ponents

# true components

False

Discovery

Proportion

Non D

iscovery

Proportion

Since nis typically very large it makes sense to

study asymptoticperform

ance, as n→∞

.
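The two error proportions in code (a small sketch; the variable names are mine):

```python
def fdp(estimated, true_support):
    """False Discovery Proportion: fraction of discovered components that are false."""
    estimated, true_support = set(estimated), set(true_support)
    return len(estimated - true_support) / max(len(estimated), 1)

def ndp(estimated, true_support):
    """Non-Discovery Proportion: fraction of true components that were missed."""
    estimated, true_support = set(estimated), set(true_support)
    return len(true_support - estimated) / max(len(true_support), 1)

true_support = {3, 17, 42, 99}   # I_s
estimated = {3, 17, 58}          # outcome of some support estimation procedure
print(fdp(estimated, true_support), ndp(estimated, true_support))  # 0.333..., 0.5
```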

Known Results (Jin & Donoho ’03)
Assume the signal is very sparse: |I_s| = n^(1−β), β ∈ (1/2, 1) (the number of signal components).
Example: β = 3/4
n = 10000 ⇒ |I_s| = 10
n = 1000000 ⇒ |I_s| ≈ 32

Known Results (Jin & Donoho ’03)
[Figure: phase diagram of signal strength vs. sparsity, separating the estimable and non-estimable regions]
These asymptotic results tell us how strong the signals need to be for reliable signal localization.

A Generalization of the Sensing Model

Allow multiple observations y_i^(t) = √(φ_i^(t)) · x_i + ε_i^(t), t = 1, …, k, …
… subject to a sampling energy budget Σ_t Σ_i φ_i^(t) ≤ n.
The vectors φ^(t) are called the sensing vectors.
(Note: in the previous work a single observation was considered, where φ_i ≡ 1 for all i.)

Distilled Sensing

At each step, observe the currently retained coordinates, discard those whose observations are not positive, and reallocate the sensing energy to the survivors. Proceeding in this fashion, we gradually focus on the signal subspace.
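A minimal simulation sketch of the distillation loop (the step count, amplitude, and even split of the energy budget across steps are my own choices; the paper’s energy allocation differs): observe the currently retained indices, keep only those with positive observations, and spread the next step’s budget over the survivors.

```python
import numpy as np

rng = np.random.default_rng(1)

n, mu, k_steps = 100_000, 3.0, 5
support = rng.choice(n, size=100, replace=False)   # |I_s| = 100 signal components
x = np.zeros(n)
x[support] = mu
is_signal = np.zeros(n, dtype=bool)
is_signal[support] = True

retained = np.arange(n)
for t in range(k_steps):
    phi = (n / k_steps) / retained.size            # this step's energy, spread over survivors
    y = np.sqrt(phi) * x[retained] + rng.standard_normal(retained.size)
    retained = retained[y > 0]                     # distillation: keep positive observations
    print(t + 1, retained.size, int(is_signal[retained].sum()))
# the noise-only indices are roughly halved at every step,
# while almost all of the 100 signal indices survive
```

Because the retained set roughly halves while each step’s budget does not shrink, the per-index precision φ on the surviving signal indices roughly doubles from step to step.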

Enhanced Sensitivity by Selectivity

Theorem 2 (J. Haupt, R. Castro and RN ’08)
Furthermore, if one does not allow active sensing, then the previous results (equivalent to k = 0) cannot be improved.

Distilled Sensing Example
[Figures: original signal (~0.8% non-zero components); noisy version of the image (k = 0); noisy versions of the image (k = 5); passive sensing recovery (FDR = 0.01); distilled sensing recovery (FDR = 0.01)]

Thresholds of Perceptibility

[Figure: phase diagram of signal strength vs. sparsity, showing the region where recovery is possible from passive sensing]
These results suggest we might be able to estimate signals with amplitudes growing more slowly than the passive-sensing threshold.

Universal Perceptibility

Theorem 3 (J. Haupt, R. Castro and RN ’09)
Corollary

Proof Sketch (Theorem 2)

With high probability, each distillation keeps almost all the non-zero components and rejects about half of the non-signal components.
Energy in the signal subspace doubles at each step.
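In symbols, under the same i.i.d. N(0, 1) noise model, the one-sided test y_i > 0 used in each distillation step satisfies
P(y_i > 0 | i ∉ I_s) = 1/2 and P(y_i > 0 | i ∈ I_s) = Φ(√(φ_i) · μ) → 1,
so each step rejects about half of the noise-only coordinates while keeping the signal coordinates with probability approaching one; since the retained set roughly halves while the budget does not, the per-coordinate energy on the signal roughly doubles, as stated above.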

Now you see it, now you don’t!
Weak signals/patterns are imperceptible without selective sensing!
[Figure: a sparse signal buried in noise]
J. Haupt, R. Castro and RN, “Distilled Sensing: Selective Sampling for Sparse Signal Recovery,” AISTATS 2009.