Optimizing Sensing from Water to the Web
Transcript of Optimizing Sensing from Water to the Web
Andreas Krause
[email protected]
Where theory and practice collide
2
Monitoring algal blooms
Algal blooms threaten freshwater: 4 million people without water, 1,300 factories shut down, $14.5 billion to clean up. Other occurrences in Australia, Japan, Canada, Brazil, Mexico, Great Britain, Portugal, Germany, …
Growth processes still unclear. Need to characterize growth in the lakes, not in the lab!
Tai Lake, China, 10/07, MSNBC
3
Monitoring rivers and lakes [IJCAI '07]
Need to monitor large spatial phenomena: temperature, nutrient distribution, fluorescence, …
Can only make a limited number of measurements! Predict at unobserved locations.
[Figure: depth vs. location across lake; color indicates actual temperature vs. predicted temperature]
NIMS, Kaiser et al. (UCLA). Use robotic sensors to cover large areas.
Where should we sense to get the most accurate predictions?
4
Monitoring water networks [J Wat Res Mgt 2008]
Contamination of drinking water could affect millions of people.
Place sensors to detect contaminations: "Battle of the Water Sensor Networks" competition.
Where should we place sensors to quickly detect contamination?
Simulator from EPA; Hach sensors, ~$14K each.
5
Sensing problems
Want to learn something about the state of the world: estimate water quality in a geographic region, detect outbreaks, …
We can choose (partial) observations: make measurements, place sensors, choose experimental parameters, …
… but they are expensive / limited: hardware cost, power consumption, grad student time, …
Want to cost-effectively get the most useful information!
Fundamental problem in machine learning: how do we decide what to learn from?
6
Related work
Sensing problems considered in: experimental design (Lindley '56, Robbins '52, …), spatial statistics (Cressie '91, …), machine learning (MacKay '92, …), robotics (Sim & Roy '05, …), sensor networks (Zhao et al. '04, …), operations research (Nemhauser '78, …).
Existing algorithms typically:
Heuristics: no guarantees! Can do arbitrarily badly.
Find optimal solutions (mixed integer programming, POMDPs): very difficult to scale to bigger problems.
7
Contributions
Theoretical: Approximation algorithms that have theoretical guarantees and scale to large problems
Applied: Empirical studies with real deployments and large datasets
8
Model-based sensing
Utility of placing sensors is based on a model of the world.
For lake monitoring: learn probabilistic models from data (later). For water networks: water flow simulator from EPA.
For each subset A ⊆ V, compute the "sensing quality" F(A), where V is the set of all network junctions.
[Figure: model predicts high-, medium-, and low-impact locations for a contamination; a sensor reduces impact through early detection! A placement {S1, S2, S3, S4} covering the contamination has high sensing quality F(A) = 0.9; a placement missing it has low sensing quality F(A) = 0.01.]
9
Optimizing sensing / Outline
Sensing locations, sensing quality, sensing budget, sensing cost
Sensor placement
Robust sensing
Complex constraints
Sequential sensing
10
Sensor placement
Given: finite set V of locations, sensing quality F.
Want: A* ⊆ V such that A* = argmax_{|A| ≤ k} F(A).
Typically NP-hard!
Greedy algorithm:
  Start with A = ∅
  For i = 1 to k:
    s* := argmax_s F(A ∪ {s})
    A := A ∪ {s*}
How well can this simple heuristic do?
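The greedy loop above can be sketched in a few lines of Python. The coverage-style sensing quality F below is a hypothetical stand-in for the water-network objective, just to make the sketch runnable:

```python
def greedy(V, F, k):
    """Greedy algorithm from the slide: start with A = {} and, for k rounds,
    add the element s* maximizing F(A | {s})."""
    A = set()
    for _ in range(k):
        s_star = max((s for s in V if s not in A), key=lambda s: F(A | {s}))
        A.add(s_star)
    return A

# Toy submodular sensing quality: number of events covered by chosen sensors.
covers = {1: {'a', 'b'}, 2: {'b', 'c'}, 3: {'d'}}
F = lambda A: len(set().union(*(covers[s] for s in A))) if A else 0

A = greedy(covers, F, k=2)   # iterating the dict yields its keys
```

Each round costs one pass over the remaining candidates, so the naive version makes O(|V|·k) evaluations of F.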
11
[Figure: performance of the greedy algorithm on a small subset of the water networks data. Population protected (higher is better) vs. number of sensors placed; greedy is nearly indistinguishable from optimal.]
Greedy score empirically close to optimal. Why?
12
Key property: diminishing returns
Placement A = {S1, S2}: adding a new sensor S' will help a lot — large improvement.
Placement B = {S1, S2, S3, S4} ⊇ A: adding S' doesn't help much — small improvement.
Submodularity: for A ⊆ B, F(A ∪ {S'}) − F(A) ≥ F(B ∪ {S'}) − F(B)
Theorem [Krause et al., J Wat Res Mgt '08]: Sensing quality F(A) in water networks is submodular!
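On a small ground set the diminishing-returns condition can be verified exhaustively. A brute-force sketch (the coverage function is a hypothetical example, not the water-network F):

```python
from itertools import chain, combinations

def is_submodular(V, F, tol=1e-12):
    """Check F(A | {s}) - F(A) >= F(B | {s}) - F(B) for all A subset of B, s not in B."""
    V = list(V)
    subsets = [set(c) for c in chain.from_iterable(
        combinations(V, r) for r in range(len(V) + 1))]
    for B in subsets:
        for A in subsets:
            if not A <= B:
                continue
            for s in V:
                if s in B:
                    continue
                if F(A | {s}) - F(A) < F(B | {s}) - F(B) - tol:
                    return False
    return True

covers = {1: {'a', 'b'}, 2: {'b', 'c'}, 3: {'d'}}
coverage = lambda A: len(set().union(*(covers[s] for s in A))) if A else 0
```

This is exponential in |V|, so it is only a sanity check for toy examples; real sensing objectives are shown submodular analytically, as in the theorem above.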
13
One reason submodularity is useful
Theorem [Nemhauser et al. '78]: The greedy algorithm gives a constant-factor approximation:
F(A_greedy) ≥ (1 − 1/e) F(A_opt)   (~63%)
Greedy algorithm gives a near-optimal solution! This guarantee is the best possible unless P = NP!
Many more reasons, sit back and relax…
14
Building a Sensing Chair [UIST '07]
People sit a lot. Activity recognition in assistive technologies; seating pressure as a user interface.
Equipped with 1 sensor per cm²! Costs $16,000!
82% accuracy on 10 postures (lean forward, slouch, lean left, …)! [Tan et al.]
Can we get similar accuracy with fewer, cheaper sensors?
15
How to place sensors on a chair?
Model sensor readings at locations V as random variables. Predict posture Y using a probabilistic model P(Y, V). Pick sensor locations A* ⊆ V to minimize the posterior entropy of Y given the chosen readings.
Possible locations V.
Theorem: Information gain is submodular!* [UAI '05]   (*see store for details)
Placed sensors, did a user study:
         Accuracy   Cost
Before   82%        $16,000
After    79%        $100
Similar accuracy at <1% of the cost!
16
Battle of the Water Sensor Networks competition
Real metropolitan area network (12,527 nodes). Water flow simulator provided by EPA. 3.6 million contamination events. Multiple objectives: detection time, affected population, … Place sensors that detect well "on average".
17
BWSN competition results [Ostfeld et al., J Wat Res Mgt 2008]
13 participants; performance measured in 30 different criteria.
[Figure: total score (higher is better) by participant: our approach, Berry et al., Dorini et al., Wu & Walski, Ostfeld & Salomons, Propato & Piller, Eliades & Polycarpou, Huang et al., Guan et al., Ghimire & Barkdoll, Trachtman, Gueli, Preis & Ostfeld. Methods: E = "exact" method (MIP), D = domain knowledge, G = genetic algorithm, H = other heuristic.]
24% better performance than runner-up!
18
What was the trick?
3.6M contaminations → very slow evaluation of F(A). Simulated all events in 2 weeks on 40 processors: 152 GB of data on disk, 16 GB in main memory (compressed) → very accurate sensing quality.
[Figure: running time in minutes (lower is better) vs. number of sensors selected. Exhaustive search (all subsets) is infeasible. Naive greedy: 30 hours per 20 sensors, 6 weeks for all 30 settings. Fast greedy: 1 hour per 20 sensors — done after 2 days!]
Submodularity to the rescue: fast greedy uses "lazy evaluations".
Advantage through theory and engineering!
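The "lazy evaluations" idea exploits submodularity directly: an element's marginal gain can only shrink as A grows, so stale gains kept in a max-heap remain valid upper bounds. A sketch of the lazy (CELF-style) greedy, with a hypothetical toy objective:

```python
import heapq

def lazy_greedy(V, F, k):
    """Lazy greedy: re-evaluate an element only when its stale upper bound
    tops the heap; by submodularity this never misses the best pick."""
    A, fA = set(), F(set())
    heap = [(-(F({s}) - fA), s) for s in V]   # (negated gain bound, element)
    heapq.heapify(heap)
    for _ in range(k):
        while True:
            neg_bound, s = heapq.heappop(heap)
            gain = F(A | {s}) - fA            # fresh marginal gain
            if not heap or gain >= -heap[0][0] - 1e-12:
                A.add(s)                      # still beats every stale bound
                fA += gain
                break
            heapq.heappush(heap, (-gain, s))  # push back with updated bound
    return A

covers = {1: {'a', 'b'}, 2: {'b', 'c'}, 3: {'d'}}
F = lambda A: len(set().union(*(covers[s] for s in A))) if A else 0
```

In practice most elements are never re-evaluated after the first round, which is where speedups like the 30x reported above come from.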
19
What about worst-case?
A placement {S1, S2, S3, S4} may detect well on "average-case" (accidental) contamination, but knowing the sensor locations, an adversary contaminates where coverage is weakest!
Placements can have very different average-case impact yet the same worst-case impact.
Where should we place sensors to quickly detect in the worst case?
20
Optimizing for the worst case
Associate a separate utility function F_i with each contamination i: F_i(A) = impact reduction by sensors A for contamination i.
Example: contaminations at nodes s and r. Sensors A: F_s(A) is high, F_r(A) is low. Sensors B: F_s(B) is low, F_r(B) is high. Sensors C: F_s(C) is high, F_r(C) is high.
Want to solve: A* = argmax_{|A| ≤ k} min_i F_i(A)
Each of the F_i is submodular. Unfortunately, min_i F_i is not submodular!
How can we solve this robust sensing problem?
21
Outline
Sensor placement
Robust sensing
Complex constraints
Sequential sensing
22
How does the greedy algorithm do?
Theorem [NIPS '07]: The problem max_{|A| ≤ k} min_i F_i(A) does not admit any approximation unless P = NP.
Example: V = {s1, s2, s3}; can only buy k = 2.
Set A      F1   F2   min_i F_i
{s1}       1    0    0
{s2}       0    2    0
{s1, s3}   1    ε    ε
{s2, s3}   ε    2    ε
{s1, s2}   1    2    1
Greedy picks s3 first; then it can choose only s1 or s2. Greedy score: ε. Optimal solution: {s1, s2}, score 1.
Greedy does arbitrarily badly. Is there something better?
Hence we can't find any approximation algorithm. Or can we?
23
Alternative formulation
If somebody told us the optimal value c, could we recover the optimal solution A*?
Need to find A ⊆ V with min_i F_i(A) ≥ c and |A| ≤ k.
Is this any easier? Yes, if we relax the constraint |A| ≤ k.
24
Solving the alternative problem
Trick: for each F_i and c, define the truncation F'_i,c(A) = min(F_i(A), c).
[Figure: F_i(A) vs. |A|, capped at the value c.]
Problem 1 (last slide): require min_i F_i(A) ≥ c — non-submodular, don't know how to solve.
Problem 2: require F'_avg,c(A) = (1/m) Σ_i min(F_i(A), c) ≥ c — submodular! Can use greedy!
Same optimal solutions! Solving one solves the other.
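The truncation trick is tiny in code. A sketch (the function names are mine, not from the talk):

```python
def truncated(F, c):
    """F'_c(A) = min(F(A), c): truncation at c preserves monotone submodularity."""
    return lambda A: min(F(A), c)

def avg_truncated(Fs, c):
    """F'_avg,c(A) = (1/m) * sum_i min(F_i(A), c): the averaged truncated
    objective from the slide. Since every term is capped at c, the average
    reaches c only when every F_i(A) >= c, certifying min_i F_i(A) >= c."""
    return lambda A: sum(min(F(A), c) for F in Fs) / len(Fs)
```

The certification property in the comment is exactly why optimizing Problem 2 solves Problem 1.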
25
Back to our example
Guess c = 1. Under the truncated average F'_avg,1, greedy first picks s1, then picks s2: the optimal solution!
Set A      F1   F2   min_i F_i   F'_avg,1
{s1}       1    0    0           ½
{s2}       0    2    0           ½
{s1, s3}   1    ε    ε           (1+ε)/2
{s2, s3}   ε    2    ε           (1+ε)/2
{s1, s2}   1    2    1           1
How do we find c? Do binary search!
26
Saturate algorithm [NIPS '07]
Given: set V, integer k, and submodular functions F1, …, Fm.
Initialize c_min = 0, c_max = min_i F_i(V).
Do binary search: c = (c_min + c_max) / 2
  Greedily find A_G such that F'_avg,c(A_G) = c
  If |A_G| ≤ αk: increase c_min
  If |A_G| > αk: decrease c_max
until convergence.
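Putting the pieces together, a minimal Saturate sketch. The two toy objectives are hypothetical, and α is taken as a plain parameter rather than computed from the F_i:

```python
def saturate(V, Fs, k, alpha=1.0, eps=1e-6):
    """Binary search on the target value c. For each guess, run greedy
    partial cover on F'_avg,c(A) = (1/m) sum_i min(F_i(A), c) until it hits c.
    If that takes <= alpha*k elements, c is feasible: raise c_min; else lower c_max."""
    V = list(V)
    m = len(Fs)
    c_min, c_max = 0.0, min(F(set(V)) for F in Fs)
    best = set()
    while c_max - c_min > eps:
        c = (c_min + c_max) / 2.0
        Fbar = lambda A, c=c: sum(min(F(A), c) for F in Fs) / m
        A = set()
        while Fbar(A) < c - 1e-12:
            gains = {s: Fbar(A | {s}) for s in V if s not in A}
            if not gains:
                break
            s = max(gains, key=gains.get)
            if gains[s] <= Fbar(A) + 1e-12:
                break                      # no element makes progress
            A.add(s)
        if Fbar(A) >= c - 1e-12 and len(A) <= alpha * k:
            c_min, best = c, A             # feasible: try a larger c
        else:
            c_max = c                      # infeasible: try a smaller c
    return best

# Two hypothetical objectives: each is "covered" only by its own sensor.
F1 = lambda A: 1.0 if 1 in A else 0.0
F2 = lambda A: 1.0 if 2 in A else 0.0
```

On this toy instance, only the placement {1, 2} makes both objectives positive, so Saturate must return it for k = 2.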
27
Theoretical guarantees
Theorem: The problem max_{|A| ≤ k} min_i F_i(A) does not admit any approximation unless P = NP.
Theorem: Saturate finds a solution A_S such that min_i F_i(A_S) ≥ OPT_k and |A_S| ≤ αk,
where OPT_k = max_{|A| ≤ k} min_i F_i(A) and α = 1 + log max_s Σ_i F_i({s}).
Theorem: If there were a polytime algorithm with a better factor β < α, then NP ⊆ DTIME(n^(log log n)).
28
Example: lake monitoring
Monitor pH values using a robotic sensor.
[Figure: pH value vs. position s along transect; observations A, true (hidden) pH values, and predictions at unobserved locations.]
Use a probabilistic model (Gaussian processes) to estimate the prediction error Var(s | A); (often) submodular [Das & Kempe '08].
Where should we sense to minimize our maximum error? A robust sensing problem!
29
Comparison with state of the art
Algorithm used in geostatistics: simulated annealing [Sacks & Schiller '88, van Groeningen & Stein '98, Wiens '05, …], with 7 parameters that need to be fine-tuned.
[Figures: maximum marginal variance (lower is better) vs. number of sensors, on environmental monitoring and precipitation data: Saturate matches or beats both greedy and simulated annealing.]
Saturate is competitive & 10x faster. No parameters to tune!
30
Results on water networks
[Figure: maximum detection time in minutes (lower is better) vs. number of sensors, water networks data: greedy and simulated annealing show no decrease until all contaminations are detected; Saturate does far better.]
60% lower worst-case detection time!
31
Outline
Sensor placement
Robust sensing
Complex constraints
Sequential sensing
32
Other aspects: complex constraints
max_A F(A) or max_A min_i F_i(A) subject to constraints.
So far: |A| ≤ k. In practice, more complex constraints arise:
Locations need to be connected by paths [IJCAI '07, AAAI '07] — lake monitoring.
Sensors need to communicate (form a routing tree) — building monitoring.
33
Naïve approach: Greedy-connect
Simple heuristic: greedily optimize sensing quality F(A), then add nodes to minimize communication cost C(A).
Communication cost = expected number of transmission trials (learned using Gaussian processes).
[Figure: 54-node sensor network floorplan (server room, lab, kitchen, offices, …), shown four times:
– Most informative placement: F(A) = 4, but no communication possible!
– Adding relay nodes: F(A) = 4, C(A) = 10. Very informative, but high communication cost!
– Most efficient communication: C(A) = 3, but not very informative: F(A) = 0.2.
– Second most informative placement, with relay nodes: F(A) = 3.5, C(A) = 3.5.]
Want to find the optimal tradeoff between information and communication cost.
34
The pSPIEL algorithm [IPSN '06 – Best Paper Award]: padded Sensor Placements at Informative and cost-Effective Locations.
Theorem: pSPIEL finds a placement A with sensing quality F(A) ≥ Ω(1) OPT_F and communication cost C(A) ≤ O(log |V|) OPT_C.
Real deployment at the CMU Architecture department: 44% lower error and 19% lower communication cost than a placement by a domain expert!
35
Outline
Sensor placement
Robust sensing
Complex constraints
Sequential sensing
36
Other aspects: sequential sensing
You walk up to a lake, but don't have a model. Want sensing locations that learn the model (explore) and predict well (exploit). Chosen locations depend on measurements!
[ICML '07] Exploration–exploitation analysis for active learning in Gaussian processes.
37
Summary so far
Sensor placement: submodularity in sensing optimization; greedy is near-optimal [UAI '05, JMLR '07, KDD '07].
Robust sensing: greedy fails badly; Saturate is near-optimal [NIPS '07].
Path planning and communication constraints: constrained submodular optimization; pSPIEL gives strong guarantees [IJCAI '07, IPSN '06, AAAI '07].
Sequential sensing: exploration–exploitation analysis [ICML '07, IJCAI '05].
All these applications involve physical sensing. Now for something completely different. Let's jump from water…
38
… to the Web!
You have 10 minutes each day for reading blogs / news.
Which of the million blogs should you read?
39
Cascades in the blogosphere [KDD '07 – Best Paper Award]
[Figure: an information cascade spreading through blogs over time; blogs further down the cascade learn about the story after us!]
Which blogs should we read to learn about big cascades early?
40
Water vs. Web
Placing sensors in water networks vs. selecting informative blogs: in both problems we are given a graph with nodes (junctions / blogs) and edges (pipes / links), and cascades spreading dynamically over the graph (contamination / citations). Want to pick nodes to detect big cascades early.
In both applications, the utility functions are submodular! [Generalizes Kempe et al., KDD '03]
41
Performance on blog selection (~45k blogs)
[Figure: cascades captured (higher is better) vs. number of blogs: greedy outperforms heuristics based on in-links, all out-links, number of posts, and random selection.]
[Figure: running time in seconds (lower is better) vs. number of blogs selected: exhaustive search (all subsets) is infeasible; fast greedy far outpaces naive greedy.]
Outperforms state-of-the-art heuristics; 700x speedup using submodularity!
42
Taking "attention" into account
Naïve approach: just pick the 10 best blogs. This selects big, well-known blogs (Instapundit, etc.), which contain many posts and take long to read!
[Figure: cascades captured vs. number of posts (time) allowed: cost/benefit analysis dominates the selection that ignores cost.]
Cost-benefit optimization picks summarizer blogs!
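The cost/benefit analysis can be sketched as greedy selection by marginal gain per unit cost under a reading budget — a common heuristic for submodular maximization with knapsack constraints. The blog data below is entirely hypothetical:

```python
def greedy_cost_benefit(V, F, cost, budget):
    """Pick elements by marginal-gain-per-cost until the budget is spent."""
    A, spent = set(), 0.0
    while True:
        cands = [s for s in V if s not in A and spent + cost[s] <= budget]
        if not cands:
            return A
        s = max(cands, key=lambda s: (F(A | {s}) - F(A)) / cost[s])
        if F(A | {s}) <= F(A):
            return A                      # nothing left adds coverage
        A.add(s)
        spent += cost[s]

# Hypothetical data: cascades each blog captures; cost = number of posts.
captures = {'summarizer': {1, 2, 3}, 'bigblog': {1, 2, 3, 4}, 'niche': {5}}
posts = {'summarizer': 10, 'bigblog': 100, 'niche': 5}
F = lambda A: len(set().union(*(captures[b] for b in A))) if A else 0
```

With a budget of 20 posts, the ratio rule skips the expensive big blog and picks the cheap summarizer plus the niche blog — the effect described above.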
43
Predicting the "hot" blogs
Want blogs that will be informative in the future. Split the data set: train on historic data, test on future data.
[Figure: cascades captured vs. number of posts (time) allowed, plus monthly detection counts (Jan–May): "cheating" (greedy on future, tested on future) vs. greedy on historic data tested on the future. The historic selection detects well on the training period but poorly afterwards.]
Poor generalization! Why's that? Blog selection "overfits" to the training data. Let's see what goes wrong here — we want blogs that continue to do well!
44
Robust optimization
Let F_i(A) = detections in time interval i. An "overfit" blog selection A can score, say, F1(A) = .5, F2(A) = .8, F3(A) = .6, F4(A) = .01, F5(A) = .02 — good on some intervals, terrible on others. Instead, optimize the worst case over intervals to get a "robust" blog selection A*.
[Figure: detections per month (Jan–May) using greedy vs. Saturate.]
Robust optimization acts as regularization!
45
Predicting the "hot" blogs
[Figure: sensing quality vs. number of posts (time) allowed: "cheating" (greedy on future, tested on future), robust solution tested on future, and greedy on historic data tested on future.]
50% better generalization!
46
Future work / Thesis in a box
Submodularity in sensing optimization: greedy is near-optimal [UAI '05, JMLR '07, KDD '07].
Robust sensing: greedy fails badly; Saturate is near-optimal [NIPS '07].
Path planning and communication constraints: constrained submodular optimization; pSPIEL gives strong guarantees [IJCAI '07, IPSN '06].
Sequential sensing: exploration–exploitation analysis [ICML '07].
Constrained optimization → better use of "attention". Robust optimization → better generalization.
47
Future work
Thesis in a box: algorithmic and statistical machine learning [UAI '05, ICML '06, SenSys '05, ISWC '06, Trans. Mob. Comp. '06].
Three directions: fundamental questions in information acquisition and interpretation; machine learning and sensing in the natural sciences and engineering; machine learning and optimization on the web.
48
Structure in ML / AI problems
ML, last 10 years: convexity — kernel machines, SVMs, GPs, MLE, …
ML, "next 10 years": submodularity and other new structural properties.
Structural insights help us solve challenging problems. Shameless plug: tutorial at ICML 2008.
49
Future work
50
Fundamental bottlenecks in information acquisition
On the web, user attention and privacy are extremely scarce, non-renewable resources!
Preliminary results: utility/privacy tradeoffs in personalized search — submodularity is key!
Planned study: data from 2 million blogs from spinn3r.com.
51
www.blogcascades.org
5 months, 2 million blogs, 55 million posts
52
Future work
53
Machine learning for the natural sciences
Large, distributed research efforts: multiple research teams, multiple sensing modalities [Oßwald et al., Lasser et al., MacIntyre et al., Singh et al.].
Need to integrate experimental results and reason about noisy measurements together with scientific hypotheses. Key problem in AI / ML: integrate noisy sensor data with symbolic domain knowledge.
New approaches toward autonomous science.
54
Future work
Many opportunities for collaboration!
55
Conclusions
Sensing and information acquisition problems are important and ubiquitous. We can exploit structure to find provably good solutions. Presented algorithms with strong guarantees that perform well on real-world problems.
Thanks to: Carlos Guestrin, Anupam Gupta, Jon Kleinberg, Eric Horvitz, Jure Leskovec, Ajit Singh, Amarjeet Singh, Vipul Singhvi, William Kaiser, Jeanne VanBriesen, Christos Faloutsos, Brendan McMahan, Natalie Glance
56
57
Worst- vs. average case
Given: possible locations V, submodular functions F1, …, Fm.
Average-case score (1/m) Σ_i F_i(A): strong assumptions! Worst-case score min_i F_i(A): very pessimistic!
Want to optimize both average- and worst-case score. Can modify Saturate to solve this problem!
58
Worst- vs. average case
[Figure: worst-case impact vs. average-case impact (both lower is better), water networks data: optimizing only for the average case or only for the worst case is dominated by tradeoff solutions (Saturate); there is a knee in the tradeoff curve.]
Can find a good compromise between average- and worst-case score!
59
Other aspects: sequential sensing
Choose a sensing policy maximizing expected utility over the outcomes of observations.
[Figure: a policy tree — observe X5; if X5 = 17, observe X3 = 16 and then X7 = 19, giving F(X5=17, X3=16, X7=19) = 3.4; if X5 = 21, observe X12 or X23, giving F(…) = 2.1 and F(…) = 2.4; overall policy value 3.1.]
60
Questions we ask
1) When can we find the optimal sequential policy? [IJCAI '05] For chains (HMMs, etc.): best policy in polytime. For trees: NP^PP-hard.
2) How much better is sequential selection compared to the best subset? [ICML '07] For Gaussian processes: have bounds on the gap between the best policy and the best set that lead to efficient algorithms.