
Active Learning and Selective Sensing
Can’t Learn Without You

[Figure: “Closing the Loop”: a feedback loop between sensing and computing]

Learning to Discover
Sequential approach: select new samples/experiments that are predicted to be maximally informative in discriminating hypotheses.
[Figure: loop: select sensing action → sample/sense → observe/infer]

Discovery!
Laplace decided to make new astronomical measurements when “the discrepancy between prediction and observation [was] large enough to give a high probability that there is something new to be found.”
Jaynes (1986)


Learning a decision hyperplane
[Figure: labeled + and − examples separated by a hyperplane]
Selective sampling yields exponential speed-up in learning!
Y. Freund, H. S. Seung, E. Shamir, and N. Tishby. Selective sampling using the query by committee algorithm. Machine Learning, 28(2-3):133–168, 1997.

Now you see it, now you don’t!
Weak signals/patterns are imperceptible without selective sensing!
[Figure: a sparse signal buried in noise]
J. Haupt, R. Castro, and R. Nowak, “Distilled sensing: selective sampling for sparse signal recovery,” in Proceedings of AISTATS 2009, pp. 216–223.

Outline

1. Active Learning: selective sampling for binary prediction problems

2. Distilled Sensing: selective sensing for sparse signal recovery

Common theme: feedback between data analysis and data collection can be crucial for effective learning and inference.

[Figure: a hypothesis space of candidates, narrowed down by yes/no questions]
“Does the person have blue eyes?”
“Is the person wearing a hat?”

Binary Search

“Binary Search” works very well in simple conditions.
Where is it shady vs. sunny?

Binary Search and Threshold Functions

[Figure: shade/sun labels y ∈ {0, 1} as a function of position x along a line]
Where is it shady vs. sunny?
Bisection queries reveal the binary expansion of the threshold, one bit per query:
1/3 = 0…
1/3 = 01…
1/3 = 010…
1/3 = 0101…

Binary Search and Threshold Functions

[Figure: bisection queries at 1/2, 1/4, 3/8, 5/16, … home in on the threshold; the sampled points are labeled 0 or 1]

active learning: sequentially select points for labeling
n samples yield n bits (each query halves the set of candidate thresholds)
passive learning: all points are labeled
n samples yield effectively only log n bits about the threshold location
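To make the n-bits versus log n-bits contrast concrete, here is a minimal noiseless sketch (my own illustration, not taken from the talk): a bisection learner locates a threshold on an n-point grid with about log2(n) label queries, while a passive learner that labels every grid point spends n labels for the same resolution.

```python
import math

def active_bisection(label, n):
    """Locate a threshold on the grid {0, ..., n-1} by bisection.

    `label(i)` returns 0 to the left of the threshold and 1 at or right of it.
    Returns (estimated threshold index, number of label queries used).
    """
    lo, hi, queries = 0, n - 1, 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if label(mid) == 0:
            lo = mid + 1   # threshold lies strictly to the right of mid
        else:
            hi = mid       # threshold is at mid or to the left
    return lo, queries

n, true_t = 1024, 317
label = lambda i: int(i >= true_t)

est, q_active = active_bisection(label, n)
q_passive = n                       # passive learning labels every grid point
print(est, q_active, math.ceil(math.log2(n)), q_passive)
# -> 317 10 10 1024: ten queries versus 1024 labels for the same resolution
```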


Bounded and Unbounded Noise

“bounded noise”: the label is strictly more (or less) probably 1 at every location, i.e. the probability of label 1 stays bounded away from 1/2 (more probably 0 on one side of the threshold, more probably 1 on the other)
“unbounded noise”: like the toss of a fair coin at the threshold, where the probability of label 1 approaches 1/2
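One way to see why bounded noise keeps the binary-search structure usable is to repeat each query and take a majority vote. The sketch below is only an illustration of that idea under an assumed constant flip probability; it is not from the talk and is not necessarily the algorithm behind the theorem that follows.

```python
import random

def noisy_label(i, true_t, flip_prob):
    """Bounded-noise oracle: the correct label is flipped with probability flip_prob < 1/2."""
    correct = int(i >= true_t)
    return 1 - correct if random.random() < flip_prob else correct

def majority_query(i, true_t, flip_prob, repeats):
    """Ask the noisy oracle `repeats` times and return the majority answer."""
    votes = sum(noisy_label(i, true_t, flip_prob) for _ in range(repeats))
    return int(2 * votes > repeats)

def noisy_bisection(n, true_t, flip_prob=0.3, repeats=25):
    """Bisection on {0, ..., n-1} driven by majority-vote answers to noisy queries."""
    lo, hi = 0, n - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if majority_query(mid, true_t, flip_prob, repeats) == 0:
            lo = mid + 1   # majority says 0: threshold lies to the right
        else:
            hi = mid       # majority says 1: threshold is at mid or to the left
    return lo

print(noisy_bisection(n=1024, true_t=317))  # usually 317 when the flip probability stays below 1/2
```

Under unbounded noise the flip probability approaches 1/2 near the threshold, so no fixed number of repeats suffices there, which is why the achievable rates in the two noise regimes differ.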

Learning Rates for Multidimensional Thresholds

Active Learning: Theorem (R. Castro and RN ’07)
Compare with passive learning.

[Figure: a Learner interacting with an oracle over a query space and a hypothesis space]
1. consider the viable hypotheses
2. select a sample/query that is highly discriminative
3. query the oracle
4. eliminate or discount inconsistent hypotheses
“With every mistake, we must surely be learning.” G. Harrison

Generalized Binary Search (aka Splitting Algorithm)
Selects a query for which disagreement among the viable hypotheses is maximal.
[Figure: Learner, hypothesis space, query space, and oracle]
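A minimal sketch of the splitting idea (the function and variable names are mine, not the talk’s): keep the hypotheses still consistent with every answer so far, score each candidate query by how evenly the viable hypotheses disagree on it, ask the most balanced query, and discard the hypotheses it contradicts.

```python
def generalized_binary_search(hypotheses, queries, oracle):
    """Noiseless generalized binary search (splitting algorithm).

    hypotheses : list of functions h(x) -> {-1, +1}
    queries    : list of candidate query points x
    oracle     : function x -> {-1, +1} giving the true label
    Returns (hypotheses consistent with every answer, number of queries used).
    """
    viable = list(hypotheses)
    n_queries = 0
    while len(viable) > 1:
        # 0 means a perfect 50/50 split of the viable hypotheses;
        # len(viable) means they all agree on that query.
        def imbalance(x):
            return abs(sum(h(x) for h in viable))
        x = min(queries, key=imbalance)
        if imbalance(x) == len(viable):
            break  # no remaining query can distinguish the survivors
        y = oracle(x)
        n_queries += 1
        viable = [h for h in viable if h(x) == y]
    return viable, n_queries
```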

Example: Two-Dimensional Thresholds

[Figure: the plane split into a −1 region and a +1 region by a linear boundary]
How well does GBS work?

Geometry of GBS

Classic Binary Search is possible because the hypotheses are ordered with respect to the queries. We need a similar structure for more general hypothesis spaces.
To that end, note that the hypotheses induce a partition of the query space into equivalence sets.
[Figure: the query space partitioned into equivalence sets A, A′]

Geometric Condition for GBS Convergence

Classic Binary Search Revisited
[Figure: points along a line labeled 0 0 0 0 0 0 0 0 1 1 1 1 1 1, with the unknown correct threshold at i*/n]

Theorem 1 Proof Sketch:
[Figure: the ‘good’ situation versus the ‘bad’ situations, shown as ±c neighborhoods around the query points x and x′]

Ex. Linear Thresholds in Two Dimensions
[Figure: maximally informative queries]

Example

Suppose we have a sensor network observing a binary activation pattern with a linear boundary. How many sensors must be queried to determine the pattern?
[Figures: number of hypotheses vs. queries; log number of hypotheses vs. queries]
100 sensors, 9900 possible linear boundaries.
Correct boundary determined after querying 12 sensors.
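A hedged reconstruction of this experiment, reusing the generalized_binary_search sketch from earlier; the grid, the coefficient ranges, and the resulting hypothesis count are my own stand-ins rather than the 9900-boundary setup on the slide.

```python
import itertools
import random

# assumes generalized_binary_search from the earlier sketch is in scope

# 100 "sensors" on a 10 x 10 grid
sensors = [(i, j) for i in range(10) for j in range(10)]

def linear_threshold(a, b, c):
    """Hypothesis: sensor (x, y) reads +1 iff a*x + b*y + c > 0, else -1."""
    return lambda p: 1 if a * p[0] + b * p[1] + c > 0 else -1

# A modest family of candidate linear boundaries (a stand-in for the talk's 9900)
hypotheses = [linear_threshold(a, b, c)
              for a, b, c in itertools.product(range(-3, 4), repeat=3)
              if (a, b) != (0, 0)]

random.seed(0)
truth = random.choice(hypotheses)   # the unknown activation pattern
viable, used = generalized_binary_search(hypotheses, sensors, truth)
print(len(hypotheses), used)        # far fewer sensor queries than the 100 sensors
```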

Conclusions

Minimax Bounds: selective sampling/querying can accelerate the learning of threshold functions.
Generalized Binary Search: multidimensional threshold functions can be learned at the optimal rate by selecting maximally discriminative queries.
R. Castro and RN, “Minimax Bounds for Active Learning,” IEEE Trans. Info. Theory, pp. 2339–2353, 2008.
RN, “Generalized Binary Search,” Proceedings of the Forty-Sixth Annual Allerton Conference on Communication, 2008.

Detection/Estimation of Sparse Signals
How reliably can we determine sparsity patterns?

Distilled Sensing

Detection/Estimation of Sparse Signals
fMRI, Astrophysics, Genomics
How reliably can we infer sparse patterns?

Sparse Signal Model

Signal x = (x_1, …, x_n) with signal support set I_s = {i : x_i ≠ 0}; sparsity means |I_s| ≪ n.
Example: a signal with a small number of non-zero components.
In this talk we will assume x_i = μ > 0 for all i ∈ I_s.

Noisy Observation Model

Observations: y_i = x_i + ε_i, with ε_i i.i.d. N(0, 1), i = 1, …, n.
Suppose we want to locate just one signal component: because of noise, even if no signal is present, max_i y_i ≈ √(2 log n).
How small can μ be so that we can still reliably locate the signal components from the observations?
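A quick numerical check of that noise floor (a sketch assuming, as is standard for this model, i.i.d. N(0, 1) noise): even with no signal present at all, the largest of n noise observations sits near √(2 log n), so a single pass cannot reliably flag components much weaker than that.

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (10_000, 1_000_000):
    y = rng.standard_normal(n)      # observations when no signal is present
    print(n, round(float(y.max()), 2), round(float(np.sqrt(2 * np.log(n))), 2))
# the largest pure-noise observation grows like sqrt(2 log n),
# so components with amplitude below that level are hidden in the noise
```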

Sparse Support Recovery

When testing a large number of hypotheses simultaneously we are bound to make errors.
Approaches:
Control the probability of perfect localization of the support (Bonferroni correction) – very conservative
False Discovery Rate control: adaptive control of the relative proportion of errors (Benjamini & Hochberg ’95)

Recall the definition of the signal support set

Goal:Estimate the support as well as possible. Let

be the outcome of a support estimation procedure.

# falsely discovered com

ponents

# discovered com

ponents

# missed com

ponents

# true components

False

Discovery

Proportion

Non D

iscovery

Proportion

Since nis typically very large it makes sense to

study asymptoticperform

ance, as n→∞

.
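The two error proportions in code (a small sketch; the variable names are mine):

```python
def fdp(estimated, true_support):
    """False Discovery Proportion: fraction of discovered components that are false."""
    estimated, true_support = set(estimated), set(true_support)
    return len(estimated - true_support) / max(len(estimated), 1)

def ndp(estimated, true_support):
    """Non-Discovery Proportion: fraction of true components that were missed."""
    estimated, true_support = set(estimated), set(true_support)
    return len(true_support - estimated) / max(len(true_support), 1)

true_support = {3, 17, 42, 99}   # I_s
estimated = {3, 17, 58}          # outcome of some support estimation procedure
print(fdp(estimated, true_support), ndp(estimated, true_support))  # 0.333..., 0.5
```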

Known Results (Jin & Donoho ’03)
Assume the signal is very sparse: |I_s| = n^(1−β), β ∈ (1/2, 1) (the number of signal components).
Example: β = 3/4
n = 10000 ⇒ |I_s| = 10
n = 1000000 ⇒ |I_s| ≈ 32

Known Results (Jin & Donoho ’03)
[Figure: phase diagram of signal strength vs. sparsity, separating the estimable and non-estimable regions]
These asymptotic results tell us how strong the signals need to be for reliable signal localization.

A Generalization of the Sensing Model

Allow multiple observations y_i^(t) = √(φ_i^(t)) · x_i + ε_i^(t), t = 1, …, k, …
… subject to a sampling energy budget Σ_t Σ_i φ_i^(t) ≤ n.
The vectors φ^(t) are called the sensing vectors.
(Note: in the previous work a single observation was considered, where φ_i ≡ 1 for all i.)

Distilled Sensing

At each step, observe the currently retained coordinates, discard those whose observations are not positive, and reallocate the sensing energy to the survivors. Proceeding in this fashion, we gradually focus on the signal subspace.
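A minimal simulation sketch of the distillation loop (the step count, amplitude, and even split of the energy budget across steps are my own choices; the paper’s energy allocation differs): observe the currently retained indices, keep only those with positive observations, and spread the next step’s budget over the survivors.

```python
import numpy as np

rng = np.random.default_rng(1)

n, mu, k_steps = 100_000, 3.0, 5
support = rng.choice(n, size=100, replace=False)   # |I_s| = 100 signal components
x = np.zeros(n)
x[support] = mu
is_signal = np.zeros(n, dtype=bool)
is_signal[support] = True

retained = np.arange(n)
for t in range(k_steps):
    phi = (n / k_steps) / retained.size            # this step's energy, spread over survivors
    y = np.sqrt(phi) * x[retained] + rng.standard_normal(retained.size)
    retained = retained[y > 0]                     # distillation: keep positive observations
    print(t + 1, retained.size, int(is_signal[retained].sum()))
# the noise-only indices are roughly halved at every step,
# while almost all of the 100 signal indices survive
```

Because the retained set roughly halves while each step’s budget does not shrink, the per-index precision φ on the surviving signal indices roughly doubles from step to step.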

Enhanced Sensitivity by Selectivity

Theorem 2 (J. Haupt, R. Castro and RN ’08)
Furthermore, if one does not allow active sensing, then the previous results (equivalent to k = 0) cannot be improved.

Distilled Sensing Example
[Figures: original signal (~0.8% non-zero components); noisy version of the image (k = 0); noisy versions of the image (k = 5); passive sensing recovery (FDR = 0.01); distilled sensing recovery (FDR = 0.01)]

Thresholds of Perceptibility

[Figure: phase diagram of signal strength vs. sparsity, showing the region where recovery is possible from passive sensing]
These results suggest we might be able to estimate signals with amplitudes growing more slowly than the passive-sensing threshold.

Universal Perceptibility

Theorem 3 (J. Haupt, R. Castro and RN ’09)
Corollary

Proof Sketch (Theorem 2)

With high probability, each distillation keeps almost all the non-zero components and rejects about half of the non-signal components.
Energy in the signal subspace doubles at each step.
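In symbols, under the same i.i.d. N(0, 1) noise model, the one-sided test y_i > 0 used in each distillation step satisfies
P(y_i > 0 | i ∉ I_s) = 1/2 and P(y_i > 0 | i ∈ I_s) = Φ(√(φ_i) · μ) → 1,
so each step rejects about half of the noise-only coordinates while keeping the signal coordinates with probability approaching one; since the retained set roughly halves while the budget does not, the per-coordinate energy on the signal roughly doubles, as stated above.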

Now you see it, now you don’t!
Weak signals/patterns are imperceptible without selective sensing!
[Figure: a sparse signal buried in noise]
J. Haupt, R. Castro and RN, “Distilled Sensing: Selective Sampling for Sparse Signal Recovery,” AISTATS 2009.