Carnegie Mellon
AI, Sensing, and Optimized Information Gathering:
Trends and Directions
Carlos Guestrin
joint work with Anupam Gupta, Jon Kleinberg, Brendan McMahan, Ajit Singh, and others…
Monitoring algal blooms
Algal blooms threaten freshwater:
4 million people without water; 1,300 factories shut down; $14.5 billion to clean up.
Other occurrences in Australia, Japan, Canada, Brazil, Mexico, Great Britain, Portugal, Germany, …
Growth processes are still unclear [Carmichael]. Need to characterize growth in the lakes, not in the lab!
[Photo: Tai Lake, China, 10/07, MSNBC]
Can only make a limited number of measurements!
[Figure: measurement locations over depth and location across the lake]
Monitoring rivers and lakes
Need to monitor large spatial phenomena: temperature, nutrient distribution, fluorescence, …
Predict at unobserved locations.
NIMS [Kaiser et al., UCLA]
[Figure: color indicates actual temperature vs. predicted temperature]
Use robotic sensors to cover large areas.
Where should we sense to get the most accurate predictions?
[Singh, Krause, G., Kaiser ’07]
Water distribution networks
Water distribution in a city is a very complex system.
Pathogens in water can affect thousands (or millions) of people.
Currently: add chlorine at the source and hope for the best.
[Figure: EPA water-network simulator; chlorine added at the source; an ATTACK could deliberately introduce a pathogen]
Monitoring water networks [Krause, Leskovec, G., Faloutsos, VanBriesen ’08]
Contamination of drinking water could affect millions of people.
Place sensors to detect contaminations: the “Battle of the Water Sensor Networks” competition.
Where should we place sensors to detect contaminations quickly?
[Figure: EPA simulator; Hach sensor, ~$14K each]
Sensing problems
Want to learn something about the state of the world: detect outbreaks, predict algal blooms, …
We can choose (partial) observations: place sensors, make measurements, …
… but they are expensive/limited: hardware cost, power consumption, measurement time, …
Want to cost-effectively get the most useful information!
Fundamental problem: what information should I use to learn?
Related work
Sensing problems are considered in experimental design (Lindley ’56, Robbins ’52, …), spatial statistics (Cressie ’91, …), machine learning (MacKay ’92, …), robotics (Sim & Roy ’05, …), sensor networks (Zhao et al. ’04, …), and operations research (Nemhauser ’78, …).
Existing algorithms typically either use heuristics (no guarantees; can do arbitrarily badly) or compute optimal solutions via mixed integer programming or POMDPs (very difficult to scale to bigger problems).
This talk
Theoretical: Approximation algorithms that have theoretical guarantees and scale to large problems
Applied: Empirical studies with real deployments and large datasets
Model-based sensing
Model predicts the impact of contaminations.
For water networks: water flow simulator from EPA.
For lake monitoring: learn probabilistic models from data (later).
For each subset A ⊆ V, compute a “sensing quality” F(A), where V is the set of all network junctions.
[Figure: the model predicts high-, medium-, and low-impact locations; a sensor reduces impact through early detection. A placement {S1, …, S4} near the contamination has high sensing quality F(A) = 0.9; a poor placement has low sensing quality F(A) = 0.01]
Optimizing sensing / Outline
Sensing locations, sensing quality, sensing budget, sensing cost
Robust sensing, complex constraints, sequential sensing
Sensor placement
Given: finite set V of locations, sensing quality F
Want: A* ⊆ V such that A* = argmax_{|A| ≤ k} F(A)
Typically NP-hard!
Greedy algorithm:
  Start with A = ∅
  For i = 1 to k:
    s* := argmax_s F(A ∪ {s})
    A := A ∪ {s*}
How well can this simple heuristic do?
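The loop above can be sketched directly in Python. This is a minimal illustration, not the talk’s actual implementation: the toy sensing quality F below (a coverage count over hypothetical contamination events) is an assumption for demonstration, while the real F comes from the EPA simulator.

```python
# Minimal sketch of the greedy algorithm above. The toy sensing quality F
# (coverage of hypothetical contamination events) is an illustrative
# assumption; the talk's real F comes from the EPA water-flow simulator.

def greedy(V, F, k):
    """Greedily pick k sensor locations from V maximizing F."""
    A = []
    for _ in range(k):
        # s* := argmax_s F(A ∪ {s})
        s_star = max((s for s in V if s not in A), key=lambda s: F(A + [s]))
        A.append(s_star)
    return A

# Toy model: each location detects a set of contamination events,
# and F(A) counts how many distinct events A detects.
detects = {"S1": {"e1", "e2"}, "S2": {"e2", "e3"}, "S3": {"e4"}}

def F(A):
    covered = set()
    for s in A:
        covered |= detects[s]
    return len(covered)

A = greedy(["S1", "S2", "S3"], F, k=2)
print(A, F(A))  # two locations covering 3 of the 4 events
```

Each iteration costs one evaluation of F per remaining candidate, which is exactly why fast evaluation of F matters later in the talk.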
Performance of greedy algorithm
[Plot: population protected (higher is better) vs. number of sensors placed, on a small subset of the water networks data; greedy nearly matches optimal]
Greedy score empirically close to optimal. Why?
Key property: Diminishing returns
[Figure: placement A = {S1, S2} vs. placement B = {S1, S2, S3, S4}; adding a new sensor S’ to A helps a lot (large improvement), while adding S’ to B doesn’t help much (small improvement)]
Submodularity: for A ⊆ B, F(A ∪ {S’}) − F(A) ≥ F(B ∪ {S’}) − F(B)
Theorem [Krause, Leskovec, G., Faloutsos, VanBriesen ’08]: the sensing quality F(A) in water networks is submodular!
One reason submodularity is useful
Theorem [Nemhauser et al. ’78]: the greedy algorithm gives a constant-factor approximation:
F(A_greedy) ≥ (1 − 1/e) F(A_opt), i.e. ~63% of optimal.
Greedy gives a near-optimal solution! This guarantee is the best possible unless P = NP.
Many more reasons; sit back and relax…
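The (1 − 1/e) ≈ 63% bound can be sanity-checked numerically on small coverage instances, comparing greedy against exhaustive search. The random set-cover instances below are assumptions for illustration; this is a check, not a proof.

```python
# Numeric sanity check (not a proof) of the Nemhauser bound on random
# set-cover-style instances: greedy >= (1 - 1/e) * optimal.
import itertools
import math
import random

random.seed(0)

def coverage(A):
    """F(A) = size of the union of the chosen sets (a submodular function)."""
    covered = set()
    for s in A:
        covered |= s
    return len(covered)

def greedy(sets, k):
    chosen, remaining = [], list(range(len(sets)))
    for _ in range(k):
        i = max(remaining, key=lambda j: coverage(chosen + [sets[j]]))
        chosen.append(sets[i])
        remaining.remove(i)
    return coverage(chosen)

def optimal(sets, k):
    return max(coverage([sets[j] for j in c])
               for c in itertools.combinations(range(len(sets)), k))

for _ in range(20):
    sets = [frozenset(random.sample(range(8), random.randint(1, 5)))
            for _ in range(6)]
    assert greedy(sets, 3) >= (1 - 1 / math.e) * optimal(sets, 3)
print("bound held on all random instances")
```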
People sit a lot
Activity recognition in assistive technologies; seating pressure as a user interface.
Equipped with 1 sensor per cm²; costs $16,000!
82% accuracy on 10 postures (lean forward, slouch, lean left, …) [Zhu et al.]
Can we get similar accuracy with fewer, cheaper sensors?
Building a Sensing Chair [Mutlu, Krause, Forlizzi, G., Hodgins ’07]
How to place sensors on a chair?
Model the sensor readings at locations V as random variables; predict posture Y using a probabilistic model P(Y, V); pick sensor locations A* ⊆ V to minimize the entropy of the posture given the selected readings.
Theorem: information gain is submodular!* [UAI ’05]
*See store for details
Placed sensors, did a user study:
         Accuracy   Cost
Before   82%        $16,000
After    79%        $100
Similar accuracy at <1% of the cost!
Battle of the Water Sensor Networks Competition
Real metropolitan area network (12,527 nodes); water flow simulator provided by EPA; 3.6 million contamination events; multiple objectives: detection time, affected population, …
Place sensors that detect well “on average.”
BWSN Competition results
13 participants; performance measured on 30 different criteria.
[Chart: total score (higher is better). Our approach, followed by Berry et al., Dorini et al., Wu & Walski, Ostfeld & Salomons, Propato & Piller, Eliades & Polycarpou, Huang et al., Guan et al., Ghimire & Barkdoll, Trachtman, Gueli, Preis & Ostfeld. Methods: G = genetic algorithm, H = other heuristic, D = domain knowledge, E = “exact” method (MIP)]
24% better performance than the runner-up!
Very slow evaluation of F(A): all 3.6M contaminations simulated in 2 weeks on 40 processors; 152 GB of data on disk, 16 GB in main memory (compressed). Very accurate sensing quality.
Naive greedy: 30 hours/20 sensors; 6 weeks for all 30 settings.
Using “lazy evaluations”: 1 hour/20 sensors; done after 2 days!
Advantage through theory and engineering!
[Plot: running time in minutes (lower is better) vs. number of sensors selected, for exhaustive search (all subsets), naive greedy, and fast greedy]
What was the trick? Submodularity to the rescue.
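The “lazy evaluation” trick exploits submodularity: a location’s marginal gain can only shrink as sensors are added, so stale gains cached in a priority queue remain valid upper bounds, and most re-evaluations of the expensive F can be skipped. A sketch (the toy coverage F is an illustrative assumption, not the simulator-based F from the talk):

```python
# Sketch of lazy greedy (CELF-style). Submodularity guarantees cached
# marginal gains are upper bounds, so we only re-evaluate the top element.
import heapq

def lazy_greedy(V, F, k):
    A, fA = [], F([])
    # Max-heap of (-gain, element); stored gains may be stale upper bounds.
    heap = [(-(F([s]) - fA), s) for s in V]
    heapq.heapify(heap)
    while len(A) < k and heap:
        neg_gain, s = heapq.heappop(heap)
        gain = F(A + [s]) - fA  # refresh only this one candidate
        if not heap or gain >= -heap[0][0]:
            A.append(s)  # still beats every (upper-bounded) rival
            fA += gain
        else:
            heapq.heappush(heap, (-gain, s))  # demote and retry
    return A

# Toy model: F(A) counts distinct contamination events detected.
detects = {"S1": {"e1", "e2"}, "S2": {"e2", "e3"}, "S3": {"e4"}}

def F(A):
    covered = set()
    for s in A:
        covered |= detects[s]
    return len(covered)

print(lazy_greedy(["S1", "S2", "S3"], F, 2))
```

Because F only grows more slowly as A grows, the element at the top of the heap after a refresh is provably the true argmax, which is what makes the 30x speedup sound.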
Robustness against adversaries [Krause, McMahan, G., Gupta ’07]
Unified view: robustness to change in parameters, robust experimental design, robustness to adversaries.
SATURATE: a simple but very effective algorithm for robust sensor placement.
What about the worst case? If the sensor locations are known, an adversary can attack vulnerable locations.
[Figure: a placement {S1, …, S4} detects well on “average-case” (accidental) contamination, but knowing the sensor locations, an adversary contaminates elsewhere. Two placements can have very different average-case scores yet the same worst-case score]
Where should we place sensors to quickly detect in the worst case?
Optimizing for the worst case
Associate a separate utility function Fi with each contamination i: Fi(A) = impact reduction by sensors A for contamination i.
[Example: sensors A score high on contamination at node s (Fs(A) high) but low at node r (Fr(A) low); sensors B are the opposite; sensors C score high on both]
Want to solve: A* = argmax_{|A| ≤ k} min_i Fi(A)
Each Fi is submodular, but unfortunately min_i Fi is not submodular! How can we solve this robust sensing problem?
How does the greedy algorithm do?
Theorem [NIPS ’07]: the problem max_{|A| ≤ k} min_i Fi(A) does not admit any approximation unless P = NP.
Example: V = {a, b, c}, can only buy k = 2.
Set A    F1   F2   min_i Fi
{a}      1    0    0
{b}      0    2    0
{c}      ε    ε    ε
{a,b}    1    2    1
Greedy picks c first (score ε); then it can choose only a or b, so the greedy score stays ε, while the optimal score is 1 (pick a and b).
Greedy does arbitrarily badly. Is there something better?
Hence we can’t find any approximation algorithm. Or can we?
Alternative formulation
If somebody told us the optimal value c, could we recover the optimal solution A*?
Need to find A such that min_i Fi(A) ≥ c and |A| ≤ k.
Is this any easier? Yes, if we relax the constraint |A| ≤ k.
Solving the alternative problem
Trick: for each Fi and c, define the truncation F’i,c(A) = min(Fi(A), c).
[Figure: Fi(A) grows with |A|; F’i,c(A) is capped at the value c]
Problem 1 (last slide): find A with min_i Fi(A) ≥ c. Non-submodular; don’t know how to solve.
Problem 2: find A with F’avg,c(A) = (1/m) Σ_i F’i,c(A) = c. Submodular! Can use greedy!
Same optimal solutions: solving one solves the other.
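The truncation can be written directly; the key identity is that min_i Fi(A) ≥ c holds exactly when the truncated average reaches c. A small sketch (the toy coverage functions are illustrative assumptions):

```python
# Sketch of the truncation trick: F'_{i,c}(A) = min(F_i(A), c), averaged.
# Truncation preserves submodularity, and the average reaches c exactly
# when every F_i reaches c, i.e. when min_i F_i(A) >= c.

def truncated_avg(Fs, c):
    return lambda A: sum(min(F(A), c) for F in Fs) / len(Fs)

# Toy functions: F1 rewards covering items {1, 2}, F2 rewards item {3}.
F1 = lambda A: len(set(A) & {1, 2})
F2 = lambda A: len(set(A) & {3})
Favg = truncated_avg([F1, F2], c=1)

print(Favg({1, 2}))  # 0.5: F1 saturates at c=1, but F2 is still 0
print(Favg({1, 3}))  # 1.0: both reach c, so min_i F_i >= c
```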
Back to our example: guess c = 1.
Set A    F1   F2   min_i Fi   F’avg,1
{a}      1    0    0          ½
{b}      0    2    0          ½
{c}      ε    ε    ε          ε
{a,c}                         (1+ε)/2
{b,c}                         (1+ε)/2
{a,b}    1    2    1          1
Greedy on F’avg,1 first picks a, then picks b: the optimal solution!
How do we find c? Do binary search!
Saturate Algorithm [NIPS ’07]
Given: set V, integer k, and submodular functions F1, …, Fm
Initialize cmin = 0, cmax = min_i Fi(V)
Do binary search: c := (cmin + cmax)/2
  Greedily find AG such that F’avg,c(AG) = c
  If |AG| ≤ αk: increase cmin (c is achievable)
  If |AG| > αk: decrease cmax
until convergence
[Figure: truncation threshold c indicated by color]
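Putting the pieces together, the algorithm can be sketched as follows. This is a simplified version under stated assumptions: the toy detection functions are illustrative, and for brevity the sketch checks against the budget k directly rather than the αk blowup from the guarantee.

```python
# Simplified sketch of Saturate: binary search on the target value c,
# greedily covering the truncated average F'_avg,c up to c each round.
# (The full algorithm allows |A| <= alpha*k; this sketch uses k directly.)

def saturate(V, Fs, k, eps=1e-3):
    def favg(A, c):
        return sum(min(F(A), c) for F in Fs) / len(Fs)

    c_lo, c_hi = 0.0, min(F(set(V)) for F in Fs)
    best = set()
    while c_hi - c_lo > eps:
        c = (c_lo + c_hi) / 2
        A = set()
        while favg(A, c) < c - 1e-12 and len(A) < len(V):
            s = max((x for x in V if x not in A),
                    key=lambda x: favg(A | {x}, c))
            if favg(A | {s}, c) <= favg(A, c):
                break  # greedy can make no more progress
            A.add(s)
        if len(A) <= k and favg(A, c) >= c - 1e-12:
            c_lo, best = c, A  # c is achievable within budget
        else:
            c_hi = c
    return best

# Toy robust placement: F1 detects contamination 1, F2 contamination 2.
F1 = lambda A: 1.0 if "x" in A else 0.0
F2 = lambda A: 1.0 if "y" in A else 0.0
A = saturate(["x", "y", "z"], [F1, F2], k=2)
print(A)  # picks x and y, so the worst-case detection min_i Fi is 1.0
```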
Theoretical guarantees
Theorem: The problem max_{|A| ≤ k} min_i Fi(A) does not admit any approximation unless P = NP.
Theorem: Saturate finds a solution AS such that
  min_i Fi(AS) ≥ OPT_k and |AS| ≤ αk,
where OPT_k = max_{|A| ≤ k} min_i Fi(A) and α = 1 + log max_s Σ_i Fi({s}).
Theorem: If there were a polytime algorithm with a better factor β < α, then NP ⊆ DTIME(n^{log log n}).
Example: Lake monitoring
Monitor pH values using a robotic sensor along a transect.
[Plot: pH value vs. position s along the transect; true (hidden) pH values and predictions at unobserved locations]
Where should we sense to minimize our maximum error?
Use a probabilistic model (Gaussian processes) to estimate the prediction error Var(s | A) given observations A; this criterion is (often) submodular [Das & Kempe ’08].
Robust sensing problem!
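The error criterion here is the Gaussian process posterior variance at an unobserved location. A sketch for a 1-D transect (the RBF kernel, length scale, and observation points are illustrative assumptions):

```python
# Sketch of GP posterior variance Var(s | A) along a 1-D transect,
# with an RBF kernel (kernel choice and length scale are assumptions).
import numpy as np

def kern(x, y, ell=1.0):
    return np.exp(-0.5 * (x - y) ** 2 / ell ** 2)

def var_given(s, A, noise=1e-6):
    """Var(s | A) = k(s,s) - k(s,A) [K(A,A) + noise*I]^{-1} k(A,s)."""
    if not A:
        return 1.0  # prior variance k(s, s) for the RBF kernel
    A = np.asarray(A, dtype=float)
    K = kern(A[:, None], A[None, :]) + noise * np.eye(len(A))
    ks = kern(A, s)
    return float(1.0 - ks @ np.linalg.solve(K, ks))

# Observations only ever reduce the predictive variance:
v0, v1, v2 = (var_given(0.5, obs) for obs in ([], [0.0], [0.0, 1.0]))
print(v0, v1, v2)
assert v2 <= v1 <= v0 == 1.0
```

Minimizing the maximum of Var(s | A) over locations s is exactly the min-max structure that Saturate addresses, with one Fi per prediction location.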
Comparison with state of the art
Algorithm used in geostatistics: simulated annealing [Sacks & Schiller ’88, van Groeningen & Stein ’98, Wiens ’05, …], with 7 parameters that need to be fine-tuned.
Environmental monitoring
[Plot: maximum marginal variance (lower is better) vs. number of sensors, for greedy, simulated annealing, and Saturate]
Precipitation data
[Plot: maximum marginal variance vs. number of sensors, for greedy, simulated annealing, and Saturate]
Saturate is competitive and 10x faster, with no parameters to tune!
Results on water networks
60% lower worst-case detection time!
[Plot: maximum detection time in minutes (lower is better) vs. number of sensors, for greedy, simulated annealing, and Saturate; greedy and simulated annealing show no decrease until all contaminations are detected]
Summary so far
Submodularity in sensing optimization: greedy is near-optimal.
Robust sensing: greedy fails badly; Saturate is near-optimal.
Path planning / communication constraints: constrained submodular optimization; pSPIEL gives strong guarantees.
Sequential sensing: exploration-exploitation analysis.
All these applications involve physical sensing
Now for something completely different
Let’s jump from water…
… to the Web!
You have 10 minutes each day for reading blogs / news.
Which of the million blogs should you read?
Information Cascades [Leskovec, Krause, G., Faloutsos, VanBriesen ’07]
[Figure: an information cascade spreading over time; blogs downstream learn about the story after us]
Which blogs should we read to learn about big cascades early?
Water vs. Web
In both problems we are given a graph with nodes (junctions / blogs) and edges (pipes / links), and cascades spreading dynamically over the graph (contamination / citations).
Want to pick nodes to detect big cascades early: placing sensors in water networks vs. selecting informative blogs.
In both applications, the utility functions are submodular.
Performance on blog selection
Outperforms state-of-the-art heuristics; 700x speedup using submodularity!
[Plot: running time in seconds (lower is better) vs. number of blogs selected, for exhaustive search (all subsets), naive greedy, and fast greedy]
Blog selection (~45k blogs)
[Plot: fraction of cascades captured (higher is better) vs. number of blogs, for greedy, in-links, all out-links, # posts, and random]
Naive approach: just pick the 10 best blogs. This selects big, well-known blogs (Instapundit, etc.), which contain many posts and take long to read!
Taking “attention” into account
[Plot: cascades captured vs. number of posts (time) allowed, for cost/benefit analysis vs. ignoring cost]
Cost-benefit optimization picks summarizer blogs!
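Cost-benefit greedy picks the blog with the best marginal gain per unit reading cost within the attention budget. A sketch under stated assumptions: the blogs, costs, and cascades below are toys, and note that in the literature the ratio rule alone can fail badly, so it is typically combined with the uniform-cost greedy solution (taking the better of the two):

```python
# Sketch of cost-benefit greedy: best marginal gain per unit cost,
# subject to a reading-time budget. Toy blogs/costs are assumptions.

def cost_benefit_greedy(V, F, cost, budget):
    A, spent = [], 0.0
    while True:
        affordable = [s for s in V if s not in A and spent + cost[s] <= budget]
        if not affordable:
            return A
        s = max(affordable, key=lambda s: (F(A + [s]) - F(A)) / cost[s])
        if F(A + [s]) <= F(A):
            return A  # no remaining blog adds coverage
        A.append(s)
        spent += cost[s]

# One big blog covers everything but costs too much attention;
# two cheap "summarizer" blogs cover the same cascades together.
cascades = {"big": {"c1", "c2", "c3"}, "s1": {"c1", "c2"}, "s2": {"c3"}}
cost = {"big": 10.0, "s1": 1.0, "s2": 1.0}

def F(A):
    covered = set()
    for b in A:
        covered |= cascades[b]
    return len(covered)

A = cost_benefit_greedy(["big", "s1", "s2"], F, cost, budget=2.0)
print(A, F(A))  # the two cheap blogs capture all 3 cascades
```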
Predicting the “hot” blogs
Want blogs that will be informative in the future. Split the data set: train on historic data, test on the future.
[Plots: #detections per month, Jan to May, for greedy and Saturate; greedy trained on historic data detects well on the training period but poorly afterwards]
[Plot: cascades captured vs. number of posts (time) allowed: greedy trained on the future (“cheating”) vs. greedy trained on historic data, tested on the future]
Poor generalization! Why is that? Blog selection “overfits” to the training data! Let’s see what goes wrong, and find blogs that continue to do well.
Robust optimization
Define Fi(A) = detections in time interval i, and optimize the worst case over intervals.
[Figure: the “overfit” blog selection A has F1(A)=.5, F2(A)=.8, F3(A)=.6, but F4(A)=.01 and F5(A)=.02; the “robust” selection A* optimizes the worst case]
[Plots: #detections per month, Jan to May, using greedy vs. Saturate]
Robust optimization acts as regularization!
Predicting the “hot” blogs
[Plot: sensing quality vs. number of posts (time) allowed: greedy on historic tested on future, robust solution tested on future, and greedy on future (“cheating”)]
50% better generalization!
Summary
Submodularity in sensing optimization: greedy is near-optimal.
Robust sensing: greedy fails badly; Saturate is near-optimal.
Path planning / communication constraints: constrained submodular optimization; pSPIEL gives strong guarantees.
Sequential sensing: exploration-exploitation analysis.
Constrained optimization gives better use of “attention”; robust optimization gives better generalization.
AI-complete dream
Robot that saves the world; robot that cleans your room. It’s definitely useful, but really narrow; hardware is a real issue; it will take a while.
What’s an “AI-complete” problem that will be useful to a huge number of people in the next 5-10 years? What’s a problem accessible to a large part of the AI community?
What makes a good AI-complete problem?
A complete AI system needs:
Sensing: gathering information from the world.
Reasoning: making high-level conclusions from information.
Acting: making decisions that affect the dynamics of the world and/or the interaction with the user.
But also: hugely complex; can get access to real data; can scale up and layer up; can make progress; very cool and exciting.
Data gathering can lead to good, accessible, and cool AI-complete problems.
Example: Factcheck.org. Take a statement, collect information from multiple sources, evaluate the quality of the sources, connect them, then make a conclusion AND provide an analysis.
Automated fact checking: given a query (“fact or fiction?”), produce a conclusion and justification from the Web, models, and inference, with active user feedback on the sources and the proof.
This can lead to a very cool “AI-complete” problem that is useful, and where we can make progress in the short term!
Conclusions
Sensing and information acquisition problems are important and ubiquitous.
Can exploit structure to find provably good solutions: algorithms with strong guarantees that perform well on real-world problems.
Could help focus on a cool “AI-complete” problem.