Optimizing Sensing from Water to the Web

60
Optimizing Sensing from Water to the Web Andreas Krause rsrg@caltech ..where theory and practice collide

description

Optimizing Sensing from Water to the Web. Andreas Krause. rsrg @caltech. ..where theory and practice collide. TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A A A A A A A A A. Monitoring algal blooms. Algal blooms threaten freshwater - PowerPoint PPT Presentation

Transcript of Optimizing Sensing from Water to the Web

Page 1: Optimizing Sensing from Water to the Web

Optimizing Sensingfrom Water to the Web

Andreas Krause

[email protected] theory and practice collide

Page 2: Optimizing Sensing from Water to the Web

2

Monitoring algal blooms

Algal blooms threaten freshwater4 million people without water1300 factories shut down$14.5 billion to clean upOther occurrences in Australia, Japan, Canada, Brazil, Mexico, Great Britain, Portugal, Germany …

Growth processes still unclearNeed to characterize growth in the lakes, not in the lab!

Tai Lake China10/07 MSNBC

Page 3: Optimizing Sensing from Water to the Web

3

Can only make a limited number of measurements!

Dep

th

Location across lake

Monitoring rivers and lakes [IJCAI ‘07]Need to monitor large spatial phenomena

Temperature, nutrient distribution, fluorescence, …

Predict atunobserved

locations

NIMSKaiseret.al.

(UCLA)

Color indicates actual temperature Predicted temperatureUse robotic sensors tocover large areas

Where should we sense to get most accurate predictions?

Page 4: Optimizing Sensing from Water to the Web

4

Monitoring water networks[J Wat Res Mgt 2008]

Contamination of drinking watercould affect millions of people

Contamination

Place sensors to detect contaminations“Battle of the Water Sensor Networks” competition

Where should we place sensors to quickly detect contamination?

Sensors

Simulator from EPA Hach Sensor

~$14K

Page 5: Optimizing Sensing from Water to the Web

5

Sensing problemsWant to learn something about the state of the world

Estimate water quality in a geographic region, detect outbreaks, …

We can choose (partial) observations…Make measurements, place sensors, choose experimental parameters …

… but they are expensive / limitedhardware cost, power consumption, grad student time …

Want to cost-effectively get most useful information!Fundamental problem in Machine Learning:

How do we decide what to learn from?

Page 6: Optimizing Sensing from Water to the Web

6

Related workSensing problems considered in

Experimental design (Lindley ’56, Robbins ’52…), Spatial statistics

(Cressie ’91, …), Machine Learning (MacKay ’92, …), Robotics (Sim&Roy ’05, …), Sensor Networks (Zhao et al ’04, …), Operations Research (Nemhauser ’78, …)

Existing algorithms typicallyHeuristics: No guarantees! Can do arbitrarily badly.Find optimal solutions (Mixed integer programming, POMDPs):

Very difficult to scale to bigger problems.

Page 7: Optimizing Sensing from Water to the Web

7

Contributions

Theoretical: Approximation algorithms that have theoretical guarantees and scale to large problems

Applied: Empirical studies with real deployments and large datasets

Page 8: Optimizing Sensing from Water to the Web

8

Model-based sensingUtility of placing sensors based on model of the world

For lake monitoring: Learn probabilistic models from data (later)For water networks: Water flow simulator from EPA

For each subset A µ V compute “sensing quality” F(A)

S2

S3

S4S1 S2

S3

S4

S1

High sensing quality F(A) = 0.9 Low sensing quality F(A)=0.01

Model predictsHigh impact

Medium impactlocation

Low impactlocation

Sensor reducesimpact throughearly detection!

S1

Contamination

Set V of all network junctions

Page 9: Optimizing Sensing from Water to the Web

9

Robust sensing Complex constraints

Sequential sensing

Optimizing sensing / OutlineSensing locationsSensing quality

Sensing budgetSensing cost

Sensor placement

Page 10: Optimizing Sensing from Water to the Web

10

Sensor placementGiven: finite set V of locations, sensing quality FWant: A*µ V such that

Typically NP-hard!

How well can this simple heuristic do?

S1

S2

S3

S4

S5

S6Greedy algorithm:Start with A = ;For i = 1 to k

s* := argmaxs F(A [ {s})

A := A [ {s*}

Page 11: Optimizing Sensing from Water to the Web

11

2 4 6 8 100.5

0.6

0.7

0.8

0.9

Number of sensors placed

Pop

ulat

ion

affe

cted

Performance of greedy algorithm

Greedy score empirically close to optimal. Why?

Small subset of Water networks

data

2 4 6 8 100.5

0.6

0.7

0.8

0.9

Number of sensors placed

Pop

ulat

ion

affe

cted

Greedy

Optimal

Pop

ula

tion

pro

tect

ed

(h

igh

er

is b

ett

er)

Number of sensors placed

Page 12: Optimizing Sensing from Water to the Web

12

S’

S2S3

S4S1

Key property: Diminishing returns

S2

S1

S’

Placement A = {S1, S2} Placement B = {S1, S2, S3, S4}

Adding S’ will help a lot!

Adding S’ doesn’t help muchNew

sensor S’B A

S’

+

+

Large improvement

Small improvement

For Aµ B, F(A [ {S’}) – F(A) ¸ F(B [ {S’}) – F(B)

Submodularity:

Theorem [Krause et al., J Wat Res Mgt ’08]:Sensing quality F(A) in water networks is submodular!

Page 13: Optimizing Sensing from Water to the Web

13

One reason submodularity is useful

Theorem [Nemhauser et al ‘78]Greedy algorithm gives constant factor approximation

F(Agreedy) ¸ (1-1/e) F(Aopt)

Greedy algorithm gives near-optimal solution!Guarantees best possible unless P = NP!

Many more reasons, sit back and relax…

~63%

Page 14: Optimizing Sensing from Water to the Web

14

People sit a lotActivity recognition inassistive technologiesSeating pressure as user interface

Equipped with 1 sensor per cm2!

Costs $16,000!

Can we get similar accuracy with fewer,

cheaper sensors?

Leanforward

SlouchLeanleft

82% accuracy on 10 postures! [Tan et al]

Building a Sensing Chair [UIST ‘07]

Page 15: Optimizing Sensing from Water to the Web

15

How to place sensors on a chair?Sensor readings at locations V as random variablesPredict posture Y using probabilistic model P(Y,V)Pick sensor locations A* µ V to minimize entropy:

Possible locations V

Theorem: Information gainis submodular!* [UAI’05]

*See store for details

Accuracy CostBefore 82% $16,000 After 79% $100

Placed sensors, did a user study:

Similar accuracy at <1% of the cost!

Page 16: Optimizing Sensing from Water to the Web

16

Battle of the Water Sensor Networks CompetitionReal metropolitan area network (12,527 nodes)Water flow simulator provided by EPA3.6 million contamination eventsMultiple objectives: Detection time, affected population, …Place sensors that detect well “on average”

Page 17: Optimizing Sensing from Water to the Web

17

BWSN Competition results [Ostfeld et al., J Wat Res Mgt 2008]

13 participantsPerformance measured in 30 different criteria

0

5

10

15

20

25

30

Tota

l S

core

Hig

her

is b

ett

er

Our

app

roac

h

Berry

et. a

l.

Dorin

i et.

al.

Wu

& Wal

ski

Ostfe

ld &

Sal

omon

s

Prop

ato

& Piller

Elia

des & P

olyc

arpo

u

Huang

et.

al.

Guan

et. a

l.

Ghim

ire &

Bar

kdol

l

Trac

htm

an

Gueli

Prei

s & O

stfe

ld

EE

D DG

GG

GG

H

H

H

G: Genetic algorithmH: Other heuristic

D: Domain knowledgeE: “Exact” method (MIP)

24% better performance than runner-up!

Page 18: Optimizing Sensing from Water to the Web

18

Simulated all on 2 weeks / 40 processors152 GB data on disk Very accurate sensing quality

Using “lazy evaluations”:1 hour/20 sensorsDone after 2 days!

, 16 GB in main memory (compressed)

Advantage through theory and engineering!

Low

er

is b

ett

er 30 hours/20 sensors

6 weeks for all30 settings

3.6M contaminations

Very slow evaluation of F(A)

1 2 3 4 5 6 7 8 9 100

100

200

300

Number of sensors selected

Runn

ing

time

(min

utes

)

Exhaustive search(All subsets)

Naivegreedy

Fast greedy ubmodularity to the rescue:

What was the trick?

Page 19: Optimizing Sensing from Water to the Web

19

What about worst-case?

S2

S3

S4S1

Knowing the sensor locations, an adversary contaminates here!

Where should we place sensors to quickly detect in the worst case?

Very different average-case impact,Same worst-case impact

S2

S3

S4

S1

Placement detects well on “average-case”

(accidental) contamination

Page 20: Optimizing Sensing from Water to the Web

20

Optimizing for the worst case

Contamination at node s Sensors AFs(A) is high

Contamination at node rFr(A) is low

Fs(B) is low

Fr(B) is high

Sensors B

Fr(C) is high

Fs(C) is highSensors C

Separate utility function Fi with each contamination i

Fi(A) = impact reduction by sensors A for contamination i

Want to solve

Each of the Fi is submodular

Unfortunately, mini Fi not submodular!

How can we solve this robust sensing problem?

Page 21: Optimizing Sensing from Water to the Web

21

Robust sensing

Outline

Sensor placement

Sequential sensing

Complex constraints

Page 22: Optimizing Sensing from Water to the Web

22

How does the greedy algorithm do?

Theorem [NIPS ’07]: The problem max|A|· k mini Fi(A) does not admit any approximation unless P=NP

Optimalsolution

Greedy picks first

Then, canchoose only

or

Greedy does arbitrarily badly. Is there something better?

V={ , , }Can only buy k=2

Greedy score: Optimal score: 1

Set A F1 F2 mini Fi

1 0 00 2 0

1

2

1 2 1

Hence we can’t find any approximation algorithm.

Or can we?

Page 23: Optimizing Sensing from Water to the Web

23

Alternative formulationIf somebody told us the optimal value,

can we recover the optimal solution A*?

Need to find

Is this any easier?

Yes, if we relax the constraint |A| · k

Page 24: Optimizing Sensing from Water to the Web

24

Solving the alternative problemTrick: For each Fi and c, define truncation

c

|A|

Fi(A)

F’i,c(A)

Same optimal solutions!Solving one solves the other

Non-submodular Don’t know how to solve

Submodular!Can use greedy!

Problem 1 (last slide) Problem 2

Page 25: Optimizing Sensing from Water to the Web

25

Back to our example

Guess c=1First pick Then pick

Optimal solution!

How do we find c?Do binary search!

Set A F1 F2 mini Fi F’avg,1

1 0 0 ½0 2 0 ½

1 (1+)/2

2 (1+)/2

1 2 1 1

Page 26: Optimizing Sensing from Water to the Web

26

Truncationthreshold(color)

Saturate Algorithm [NIPS ‘07]Given: set V, integer k and submodular functions F1,…,Fm

Initialize cmin=0, cmax = mini Fi(V)

Do binary search: c = (cmin+cmax)/2Greedily find AG such that F’avg,c(AG) = c

If |AG| · k: increase cmin

If |AG| > k: decrease cmax

until convergence

Page 27: Optimizing Sensing from Water to the Web

27

Theoretical guarantees

Theorem: If there were a polytime algorithm with better factor < , then NP µ DTIME(nlog log n)

Theorem: Saturate finds a solution AS such that

mini Fi(AS) ¸ OPTk and |AS| · k

where OPTk = max|A|·k mini Fi(A)

= 1 + log maxs i Fi({s})

Theorem: The problem max|A|· k mini Fi(A) does not admit any approximation unless P=NP

Page 28: Optimizing Sensing from Water to the Web

28

Example: Lake monitoringMonitor pH values using robotic sensor

Position s along transect

pH v

alue

Observations A

True (hidden) pH values

Prediction at unobservedlocations

transect

Where should we sense to minimize our maximum error?

Use probabilistic model(Gaussian processes)

to estimate prediction error

(often) submodular[Das & Kempe ’08]

Var(s | A)

Robust sensing problem!

Page 29: Optimizing Sensing from Water to the Web

29

Comparison with state of the artAlgorithm used in geostatistics: Simulated Annealing

[Sacks & Schiller ’88, van Groeningen & Stein ’98, Wiens ’05,…]7 parameters that need to be fine-tuned

Environmental monitoring

bett

er

0 20 40 600

0.05

0.1

0.15

0.2

0.25

Number of sensors

Max

imum

mar

gina

l var

ianc

e

Greedy

0 20 40 600

0.05

0.1

0.15

0.2

0.25

Number of sensors

Max

imum

mar

gina

l var

ianc

e

Greedy

SimulatedAnnealing

0 20 40 600

0.05

0.1

0.15

0.2

0.25

Number of sensors

Max

imum

mar

gina

l var

ianc

e

Greedy

SimulatedAnnealing

Saturate

Precipitation data

0 20 40 60 80 1000.5

1

1.5

2

2.5

Number of sensors

Maxim

um

marg

inal vari

ance

Greedy

Saturate

SimulatedAnnealing

Saturate is competitive & 10x fasterNo parameters to tune!

Page 30: Optimizing Sensing from Water to the Web

30

Saturate

Results on water networks

60% lower worst-case detection time!

Water networks

500

1000

1500

2000

2500

3000

Number of sensors

Max

imum

det

ectio

n tim

e (m

inut

es)

Low

er

is b

ett

er

No decreaseuntil allcontaminationsdetected!

0 10 200

Greedy

SimulatedAnnealing

Page 31: Optimizing Sensing from Water to the Web

31

Outline

Sensor placement

Sequential sensing

Complex constraintsRobust sensing

Page 32: Optimizing Sensing from Water to the Web

32

Other aspects: Complex constraintsmaxA F(A) or maxA mini Fi(A) subject to

So far: |A| · kIn practice, more complex constraints:

Locations need to be connected by paths

[IJCAI ’07, AAAI ’07]

Lake monitoring

Sensors need to communicate (form a routing tree)

Building monitoring

Page 33: Optimizing Sensing from Water to the Web

33

Naïve approach: Greedy-connectSimple heuristic: Greedily optimize sensing quality F(A)Then add nodes to minimize communication cost C(A)

Want to find optimal tradeoffbetween information and communication cost

SERVER

LAB

KITCHEN

COPYELEC

PHONEQUIET

STORAGE

CONFERENCE

OFFICEOFFICE50

51

52 53

54

46

48

49

47

43

45

44

42 41

3739

38 36

33

3

6

10

11

12

13 14

1516

17

19

2021

22

242526283032

31

2729

23

18

9

5

8

7

4

34

1

2

3540

No communicationpossible!C(A) = 1

relay node

relay node

C(A) = 10

2 2

2

2 2Second

most informative

Communication cost = Expected # of trials(learned using Gaussian Processes)

F(A) = 4C(A) =

10SERVER

LAB

KITCHEN

COPYELEC

PHONEQUIET

STORAGE

CONFERENCE

OFFICEOFFICE50

51

52 53

54

46

48

49

47

43

45

44

42 41

3739

38 36

33

3

6

10

11

12

13 14

1516

17

19

2021

22

242526283032

31

2729

23

18

9

5

8

7

4

34

1

2

3540

relay node

relay node

2 2

2

2 2

F(A) = 4

SERVER

LAB

KITCHEN

COPYELEC

PHONEQUIET

STORAGE

CONFERENCE

OFFICEOFFICE50

51

52 53

54

46

48

49

47

43

45

44

42 41

3739

38 36

33

3

6

10

11

12

13 14

1516

17

19

2021

22

242526283032

31

2729

23

18

9

5

8

7

4

34

1

2

3540

1

11

F(A) = 0.2

C(A) = 3

efficientcommunication!

Not veryinformative

Most informative

Very informative,High communication

cost!

SERVER

LAB

KITCHEN

COPYELEC

PHONEQUIET

STORAGE

CONFERENCE

OFFICEOFFICE50

51

52 53

54

46

48

49

47

43

45

44

42 41

3739

38 36

33

3

6

10

11

12

13 14

1516

17

19

2021

22

242526283032

31

2729

23

18

9

5

8

7

4

34

1

2

3540

1.5

1

F(A) = 3.5

C(A) = 3.5

1

Page 34: Optimizing Sensing from Water to the Web

34

Theorem: pSPIEL finds a placement A with sensing quality F(A) ¸ (1) OPTF

communication cost C(A) · O(log |V|) OPTC

44% lower error and 19% lower communication costthan placement by domain expert!

Real deployment at CMUArchitecture department

The pSPIEL Algorithm [IPSN ’06 – Best Paper Award] padded Sensor Placements at Informative and cost-Effective Locations

pSPIEL:

Page 35: Optimizing Sensing from Water to the Web

35

Outline

Sensor placement

Sequential sensing

Complex constraintsRobust sensing

Page 36: Optimizing Sensing from Water to the Web

36

Other aspects: Sequential sensingYou walk up to a lake, but don’t have a modelWant sensing locations to learn the model (explore) and predict well (exploit)Chosen locations depend on measurements![ICML ’07] Exploration—Exploitation analysis

for Active Learning in Gaussian Processes

Page 37: Optimizing Sensing from Water to the Web

37

Summary so far

Submodularity in sensing optimization

Greedy is near-optimal[UAI ’05, JMLR ’07, KDD ’07]

Robust sensingGreedy fails badly

Saturate is near-optimal[NIPS ’07]

Path planningCommunication constraintsConstrained submodular optimization

pSPIEL gives strong guarantees[IJCAI ’07, IPSN ’06, AAAI ‘07]

Sequential sensingExploration Exploitation Analysis [ICML ’07, IJCAI ‘05]

All these applications involve physical sensing.

Now for something completely different.Let’s jump from water…

Page 38: Optimizing Sensing from Water to the Web

38

… to the Web!

You have 10 minutes each day for reading blogs / news.

Which of the million blogs should you read?

Page 39: Optimizing Sensing from Water to the Web

39

Tim

e

Information cascade

Cascades in the Blogosphere[KDD 07 – Best Paper Award]

Which blogs should we read to learn about big cascades early?

Learn aboutstory after us!

Page 40: Optimizing Sensing from Water to the Web

40

Water vs. Web

In both problems we are givenGraph with nodes (junctions / blogs) and edges (pipes / links)Cascades spreading dynamically over the graph (contamination / citations)

Want to pick nodes to detect big cascades early

Placing sensors inwater networks

Selectinginformative blogsvs.

In both applications, utility functions submodular [Generalizes Kempe et al, KDD ’03]

Page 41: Optimizing Sensing from Water to the Web

41

Performance on Blog selection

Outperforms state-of-the-art heuristics700x speedup using submodularity!

Blog selection

Low

er

is b

ett

er

1 2 3 4 5 6 7 8 9 100

100

200

300

400

Number of blogs selected

Runn

ing

time

(sec

onds

)

Exhaustive search(All subsets)

Naivegreedy

Fast greedy

Blog selection~45k blogs

Hig

her

is b

ett

er

Number of blogs

Casc

ades

cap

ture

d

0 20 40 60 80 1000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Greedy

In-linksAll outlinks

# Posts

Random

Page 42: Optimizing Sensing from Water to the Web

42

Naïve approach: Just pick 10 best blogsSelects big, well known blogs (Instapundit, etc.)These contain many posts, take long to read!

0 1 2 3 4 5

x 104

0

0.2

0.4

0.6

Taking “attention” into account

0 1 2 3 4 5

x 104

0

0.2

0.4

0.6

Casc

ades

cap

ture

d

Number of posts (time) allowed

x 104

Cost/benefitanalysis

Ignoring cost

Cost-benefit optimization picks summarizer blogs!

Page 43: Optimizing Sensing from Water to the Web

43

Predicting the “hot” blogs

Jan Feb Mar Apr May0

200

#det

ectio

ns

Greedy

Jan Feb Mar Apr May0

200#d

etec

tions

Saturate

Detects on training set

Greedy on historicTest on future

Poor generalization!Why’s that?

0 1000 2000 3000 40000

0.05

0.1

0.15

0.2

0.25Greedy on futureTest on future

“Cheating”

Casc

ades

cap

ture

d

Number of posts (time) allowed

Detect wellhere!

Detect poorlyhere!

Want blogs that will be informative in the futureSplit data set; train on historic, test on future

Blog selection “overfits”to training data!

Let’s see whatgoes wrong here.

Want blogs thatcontinue to do well!

Page 44: Optimizing Sensing from Water to the Web

44

Robust optimization

Jan Feb Mar Apr May0

200

#det

ectio

ns

Greedy

Jan Feb Mar Apr May0

200

#det

ectio

ns

Saturate

Jan Feb Mar Apr May0

200#d

etec

tions

Greedy

Jan Feb Mar Apr May0

200

#det

ectio

ns

Saturate

Detections using Saturate

F1(A)=.5

F2 (A)=.8

F3 (A)=.6

F4(A)=.01

F5 (A)=.02

Optimizeworst-case

Fi(A) = detections in interval i

“Overfit” blog selection A

“Robust” blog selection A*

Robust optimization Regularization!

Page 45: Optimizing Sensing from Water to the Web

45

Predicting the “hot” blogs

Greedy on historicTest on future

Robust solutionTest on future

0 1000 2000 3000 40000

0.05

0.1

0.15

0.2

0.25Greedy on futureTest on future

“Cheating”Se

nsin

g qu

ality

Number of posts (time) allowed

50% better generalization!

Page 46: Optimizing Sensing from Water to the Web

46

Future work

Submodularity in sensing optimization

Greedy is near-optimal[UAI ’05, JMLR ’07, KDD ’07]

Robust sensingGreedy fails badly

Saturate is near-optimal[NIPS ’07]

Path planningCommunication constraintsConstrained submodular optimization

pSPIEL gives strong guarantees[IJCAI ’07, IPSN ’06]

Sequential sensingExploration Exploitation Analysis [ICML ’07]

Thesis in a box

Constrained optimization better use of “attention”

Robust optimization better generalization

Page 47: Optimizing Sensing from Water to the Web

47

Future work

Thesis in a box

Algorithmic and statistical machine learning

UAI ‘05ICML ‘06 Sensys ‘05 ISWC ‘06

Trans MobComp ‘06

Fundamental questions in Information Acquisition

and Interpretation

Machine learningand sensing in

natural sciencesand engineering

Machine learning and optimization

on the web

Page 48: Optimizing Sensing from Water to the Web

48

Structure in ML / AI problems

Structural insights help us solve challenging problemsShameless plug: Tutorial at ICML 2008

ML last 10 years:

Convexity

Kernel machines SVMs, GPs, MLE…

ML “next 10 years:”

Submodularity

New structural properties

Page 49: Optimizing Sensing from Water to the Web

49

Future work

Fundamental questions in Information Acquisition

and Interpretation

Machine learningand sensing in

natural sciencesand engineering

Machine learning and optimization

on the web

Thesis in a box

Algorithmic and statistical machine learning

Page 50: Optimizing Sensing from Water to the Web

50

Fundamental bottlenecks in information acquisition

WebUser

Attention

Privacy

Preliminary resultsUtility/Privacy in Personalized search. Submodularity is key!

Planned study: data from 2 million blogs from spinn3r.com

extremely scarce,non-renewable!

Page 51: Optimizing Sensing from Water to the Web

51

www.blogcascades.org

5 months, 2 million blogs, 55 million posts

Page 52: Optimizing Sensing from Water to the Web

52

Future work

Fundamental questions in Information Acquisition

and Interpretation

Machine learningand sensing in

natural sciencesand engineering

Machine learning and optimization

on the web

Thesis in a box

Algorithmic and statistical machine learning

Page 53: Optimizing Sensing from Water to the Web

53

Machine learning for natural sciencesLarge, distributed research efforts

Multiple research teamsMultiple sensing modalities

Need to integrate experimental resultsReason about noisy measurements + scientific hypothesesKey problem in AI / ML:

Integrate noisy sensor data and symbolic, domain knowledge

New approaches toward autonomous science

[Oßwald et al, Lasser et al]

[MacIntyre et al, Singh et al]

Page 54: Optimizing Sensing from Water to the Web

54

Future work

Fundamental questions in Information Acquisition

and Interpretation

Machine learningand sensing in

natural sciencesand engineering

Machine learning and optimization

on the web

Thesis in a box

Many opportunities for collaboration!

Algorithmic and statistical machine learning

Page 55: Optimizing Sensing from Water to the Web

55

ConclusionsSensing and information acquisition problems are important and ubiquitousCan exploit structure to find provably good solutionsPresented algorithms with strong guaranteesPerform well on real world problems

Thanks to: Carlos Guestrin, Anupam Gupta, Jon Kleinberg, Eric Horvitz,

Jure Leskovec, Ajit Singh, Amarjeet Singh, Vipul Singhvi, William Kaiser, Jeanne VanBriesen, Christos Faloutsos, Brendan McMahan, Natalie Glance

Page 56: Optimizing Sensing from Water to the Web

56

Page 57: Optimizing Sensing from Water to the Web

57

Worst- vs. average caseGiven: Possible locations V, submodular functions F1,…,Fm

Average-case score Worst-case score

Strong assumptions! Very pessimistic!

Want to optimize both average- and worst-case score!

Can modify Saturate to solve this problem!

Page 58: Optimizing Sensing from Water to the Web

58

Worst- vs. average case

0 50 100 150 200 250 300 3500

1000

2000

3000

4000

5000

6000

7000

Knee intradeoff curve

Only optimize foraverage case

Only optimize for worst case

Tradeoffs(Saturate)

Wors

t case

im

pact

lo

wer

is b

ett

er

Average case impactlower is better

Waternetworks

data

Can find good compromise between average- and worst-case score!

Page 59: Optimizing Sensing from Water to the Web

59

Other aspects: Sequential sensing

expected utility over outcome of observations

X5=?

X3 =? X2 =?

X7 =?

F(X5=17, X3=16, X7=19) = 3.4

X5=17X5=21

X3 =16

X7 =19 X12=? X23 =?

F(…) = 2.1 F(…) = 2.4

Sensingpolicy

F() = 3.1

Page 60: Optimizing Sensing from Water to the Web

60

Questions we ask1) When can we find the optimal sequential policy? [IJCAI ’05]

2) How much better is sequential selection when compared to the best subset? [ICML ’07]

X1 X2 X3

X1 X2

X3

X4 X5

For chains (HMMs, etc.)Best policy in polytime

For trees:NPPP hard

Best policy Best set

For Gaussian processes: Have bounds on GAP that lead to efficient algorithms