Optimizing Sensing from Water to the Web
Transcript of Optimizing Sensing from Water to the Web
Andreas Krause
[email protected]
Where theory and practice collide
2
Monitoring algal blooms
Algal blooms threaten freshwater: 4 million people without water, 1,300 factories shut down, $14.5 billion to clean up. Other occurrences in Australia, Japan, Canada, Brazil, Mexico, Great Britain, Portugal, Germany, …
Growth processes still unclear. Need to characterize growth in the lakes, not in the lab!
Tai Lake, China, 10/07, MSNBC
3
Monitoring rivers and lakes [IJCAI '07]
Need to monitor large spatial phenomena: temperature, nutrient distribution, fluorescence, …
Can only make a limited number of measurements! Predict at unobserved locations.
[Figure: depth vs. location across lake; color indicates actual temperature vs. predicted temperature]
NIMS, Kaiser et al. (UCLA). Use robotic sensors to cover large areas.
Where should we sense to get the most accurate predictions?
4
Monitoring water networks [J Wat Res Mgt 2008]
Contamination of drinking water could affect millions of people.
Place sensors to detect contaminations: "Battle of the Water Sensor Networks" competition.
Where should we place sensors to quickly detect contamination?
Simulator from EPA; Hach sensors, ~$14K each.
5
Sensing problems
Want to learn something about the state of the world: estimate water quality in a geographic region, detect outbreaks, …
We can choose (partial) observations: make measurements, place sensors, choose experimental parameters, …
… but they are expensive / limited: hardware cost, power consumption, grad student time, …
Want to cost-effectively get the most useful information!
Fundamental problem in machine learning: how do we decide what to learn from?
6
Related work
Sensing problems considered in: experimental design (Lindley '56, Robbins '52, …), spatial statistics (Cressie '91, …), machine learning (MacKay '92, …), robotics (Sim & Roy '05, …), sensor networks (Zhao et al. '04, …), operations research (Nemhauser '78, …).
Existing algorithms typically:
Heuristics: no guarantees! Can do arbitrarily badly.
Find optimal solutions (mixed integer programming, POMDPs): very difficult to scale to bigger problems.
7
Contributions
Theoretical: Approximation algorithms that have theoretical guarantees and scale to large problems
Applied: Empirical studies with real deployments and large datasets
8
Model-based sensing
Utility of placing sensors is based on a model of the world.
For lake monitoring: learn probabilistic models from data (later). For water networks: water flow simulator from EPA.
For each subset A ⊆ V, compute the "sensing quality" F(A), where V is the set of all network junctions.
[Figure: model predicts high-, medium-, and low-impact locations for a contamination; a sensor reduces impact through early detection! A placement {S1, S2, S3, S4} covering the contamination has high sensing quality F(A) = 0.9; a placement missing it has low sensing quality F(A) = 0.01.]
9
Optimizing sensing / Outline
Sensing locations, sensing quality, sensing budget, sensing cost
Sensor placement
Robust sensing
Complex constraints
Sequential sensing
10
Sensor placement
Given: finite set V of locations, sensing quality F.
Want: A* ⊆ V such that A* = argmax_{|A| ≤ k} F(A).
Typically NP-hard!
Greedy algorithm:
  Start with A = ∅
  For i = 1 to k:
    s* := argmax_s F(A ∪ {s})
    A := A ∪ {s*}
How well can this simple heuristic do?
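The greedy loop above can be sketched in a few lines of Python. The coverage-style sensing quality F below is a hypothetical stand-in for the water-network objective, just to make the sketch runnable:

```python
def greedy(V, F, k):
    """Greedy algorithm from the slide: start with A = {} and, for k rounds,
    add the element s* maximizing F(A | {s})."""
    A = set()
    for _ in range(k):
        s_star = max((s for s in V if s not in A), key=lambda s: F(A | {s}))
        A.add(s_star)
    return A

# Toy submodular sensing quality: number of events covered by chosen sensors.
covers = {1: {'a', 'b'}, 2: {'b', 'c'}, 3: {'d'}}
F = lambda A: len(set().union(*(covers[s] for s in A))) if A else 0

A = greedy(covers, F, k=2)   # iterating the dict yields its keys
```

Each round costs one pass over the remaining candidates, so the naive version makes O(|V|·k) evaluations of F.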
11
[Figure: performance of the greedy algorithm on a small subset of the water networks data. Population protected (higher is better) vs. number of sensors placed; greedy is nearly indistinguishable from optimal.]
Greedy score empirically close to optimal. Why?
12
Key property: diminishing returns
Placement A = {S1, S2}: adding a new sensor S' will help a lot — large improvement.
Placement B = {S1, S2, S3, S4} ⊇ A: adding S' doesn't help much — small improvement.
Submodularity: for A ⊆ B, F(A ∪ {S'}) − F(A) ≥ F(B ∪ {S'}) − F(B)
Theorem [Krause et al., J Wat Res Mgt '08]: Sensing quality F(A) in water networks is submodular!
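On a small ground set the diminishing-returns condition can be verified exhaustively. A brute-force sketch (the coverage function is a hypothetical example, not the water-network F):

```python
from itertools import chain, combinations

def is_submodular(V, F, tol=1e-12):
    """Check F(A | {s}) - F(A) >= F(B | {s}) - F(B) for all A subset of B, s not in B."""
    V = list(V)
    subsets = [set(c) for c in chain.from_iterable(
        combinations(V, r) for r in range(len(V) + 1))]
    for B in subsets:
        for A in subsets:
            if not A <= B:
                continue
            for s in V:
                if s in B:
                    continue
                if F(A | {s}) - F(A) < F(B | {s}) - F(B) - tol:
                    return False
    return True

covers = {1: {'a', 'b'}, 2: {'b', 'c'}, 3: {'d'}}
coverage = lambda A: len(set().union(*(covers[s] for s in A))) if A else 0
```

This is exponential in |V|, so it is only a sanity check for toy examples; real sensing objectives are shown submodular analytically, as in the theorem above.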
13
One reason submodularity is useful
Theorem [Nemhauser et al. '78]: The greedy algorithm gives a constant-factor approximation:
F(A_greedy) ≥ (1 − 1/e) F(A_opt)   (~63%)
Greedy algorithm gives a near-optimal solution! This guarantee is the best possible unless P = NP!
Many more reasons, sit back and relax…
14
Building a Sensing Chair [UIST '07]
People sit a lot. Activity recognition in assistive technologies; seating pressure as a user interface.
Equipped with 1 sensor per cm²! Costs $16,000!
82% accuracy on 10 postures (lean forward, slouch, lean left, …)! [Tan et al.]
Can we get similar accuracy with fewer, cheaper sensors?
15
How to place sensors on a chair?
Model sensor readings at locations V as random variables. Predict posture Y using a probabilistic model P(Y, V). Pick sensor locations A* ⊆ V to minimize the posterior entropy of Y given the chosen readings.
Possible locations V.
Theorem: Information gain is submodular!* [UAI '05]   (*see store for details)
Placed sensors, did a user study:
         Accuracy   Cost
Before   82%        $16,000
After    79%        $100
Similar accuracy at <1% of the cost!
16
Battle of the Water Sensor Networks competition
Real metropolitan area network (12,527 nodes). Water flow simulator provided by EPA. 3.6 million contamination events. Multiple objectives: detection time, affected population, … Place sensors that detect well "on average".
17
BWSN competition results [Ostfeld et al., J Wat Res Mgt 2008]
13 participants; performance measured in 30 different criteria.
[Figure: total score (higher is better) by participant: our approach, Berry et al., Dorini et al., Wu & Walski, Ostfeld & Salomons, Propato & Piller, Eliades & Polycarpou, Huang et al., Guan et al., Ghimire & Barkdoll, Trachtman, Gueli, Preis & Ostfeld. Methods: E = "exact" method (MIP), D = domain knowledge, G = genetic algorithm, H = other heuristic.]
24% better performance than runner-up!
18
What was the trick?
3.6M contaminations → very slow evaluation of F(A). Simulated all events in 2 weeks on 40 processors: 152 GB of data on disk, 16 GB in main memory (compressed) → very accurate sensing quality.
[Figure: running time in minutes (lower is better) vs. number of sensors selected. Exhaustive search (all subsets) is infeasible. Naive greedy: 30 hours per 20 sensors, 6 weeks for all 30 settings. Fast greedy: 1 hour per 20 sensors — done after 2 days!]
Submodularity to the rescue: fast greedy uses "lazy evaluations".
Advantage through theory and engineering!
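The "lazy evaluations" idea exploits submodularity directly: an element's marginal gain can only shrink as A grows, so stale gains kept in a max-heap remain valid upper bounds. A sketch of the lazy (CELF-style) greedy, with a hypothetical toy objective:

```python
import heapq

def lazy_greedy(V, F, k):
    """Lazy greedy: re-evaluate an element only when its stale upper bound
    tops the heap; by submodularity this never misses the best pick."""
    A, fA = set(), F(set())
    heap = [(-(F({s}) - fA), s) for s in V]   # (negated gain bound, element)
    heapq.heapify(heap)
    for _ in range(k):
        while True:
            neg_bound, s = heapq.heappop(heap)
            gain = F(A | {s}) - fA            # fresh marginal gain
            if not heap or gain >= -heap[0][0] - 1e-12:
                A.add(s)                      # still beats every stale bound
                fA += gain
                break
            heapq.heappush(heap, (-gain, s))  # push back with updated bound
    return A

covers = {1: {'a', 'b'}, 2: {'b', 'c'}, 3: {'d'}}
F = lambda A: len(set().union(*(covers[s] for s in A))) if A else 0
```

In practice most elements are never re-evaluated after the first round, which is where speedups like the 30x reported above come from.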
19
What about worst-case?
A placement {S1, S2, S3, S4} may detect well on "average-case" (accidental) contamination, but knowing the sensor locations, an adversary contaminates where coverage is weakest!
Placements can have very different average-case impact yet the same worst-case impact.
Where should we place sensors to quickly detect in the worst case?
20
Optimizing for the worst case
Associate a separate utility function F_i with each contamination i: F_i(A) = impact reduction by sensors A for contamination i.
Example: contaminations at nodes s and r. Sensors A: F_s(A) is high, F_r(A) is low. Sensors B: F_s(B) is low, F_r(B) is high. Sensors C: F_s(C) is high, F_r(C) is high.
Want to solve: A* = argmax_{|A| ≤ k} min_i F_i(A)
Each of the F_i is submodular. Unfortunately, min_i F_i is not submodular!
How can we solve this robust sensing problem?
21
Outline
Sensor placement
Robust sensing
Complex constraints
Sequential sensing
22
How does the greedy algorithm do?
Theorem [NIPS '07]: The problem max_{|A| ≤ k} min_i F_i(A) does not admit any approximation unless P = NP.
Example: V = {s1, s2, s3}; can only buy k = 2.
Set A      F1   F2   min_i F_i
{s1}       1    0    0
{s2}       0    2    0
{s1, s3}   1    ε    ε
{s2, s3}   ε    2    ε
{s1, s2}   1    2    1
Greedy picks s3 first; then it can choose only s1 or s2. Greedy score: ε. Optimal solution: {s1, s2}, score 1.
Greedy does arbitrarily badly. Is there something better?
Hence we can't find any approximation algorithm. Or can we?
23
Alternative formulation
If somebody told us the optimal value c, could we recover the optimal solution A*?
Need to find A ⊆ V with min_i F_i(A) ≥ c and |A| ≤ k.
Is this any easier? Yes, if we relax the constraint |A| ≤ k.
24
Solving the alternative problem
Trick: for each F_i and c, define the truncation F'_i,c(A) = min(F_i(A), c).
[Figure: F_i(A) vs. |A|, capped at the value c.]
Problem 1 (last slide): require min_i F_i(A) ≥ c — non-submodular, don't know how to solve.
Problem 2: require F'_avg,c(A) = (1/m) Σ_i min(F_i(A), c) ≥ c — submodular! Can use greedy!
Same optimal solutions! Solving one solves the other.
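The truncation trick is tiny in code. A sketch (the function names are mine, not from the talk):

```python
def truncated(F, c):
    """F'_c(A) = min(F(A), c): truncation at c preserves monotone submodularity."""
    return lambda A: min(F(A), c)

def avg_truncated(Fs, c):
    """F'_avg,c(A) = (1/m) * sum_i min(F_i(A), c): the averaged truncated
    objective from the slide. Since every term is capped at c, the average
    reaches c only when every F_i(A) >= c, certifying min_i F_i(A) >= c."""
    return lambda A: sum(min(F(A), c) for F in Fs) / len(Fs)
```

The certification property in the comment is exactly why optimizing Problem 2 solves Problem 1.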
25
Back to our example
Guess c = 1. Under the truncated average F'_avg,1, greedy first picks s1, then picks s2: the optimal solution!
Set A      F1   F2   min_i F_i   F'_avg,1
{s1}       1    0    0           ½
{s2}       0    2    0           ½
{s1, s3}   1    ε    ε           (1+ε)/2
{s2, s3}   ε    2    ε           (1+ε)/2
{s1, s2}   1    2    1           1
How do we find c? Do binary search!
26
Saturate algorithm [NIPS '07]
Given: set V, integer k, and submodular functions F1, …, Fm.
Initialize c_min = 0, c_max = min_i F_i(V).
Do binary search: c = (c_min + c_max) / 2
  Greedily find A_G such that F'_avg,c(A_G) = c
  If |A_G| ≤ αk: increase c_min
  If |A_G| > αk: decrease c_max
until convergence.
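Putting the pieces together, a minimal Saturate sketch. The two toy objectives are hypothetical, and α is taken as a plain parameter rather than computed from the F_i:

```python
def saturate(V, Fs, k, alpha=1.0, eps=1e-6):
    """Binary search on the target value c. For each guess, run greedy
    partial cover on F'_avg,c(A) = (1/m) sum_i min(F_i(A), c) until it hits c.
    If that takes <= alpha*k elements, c is feasible: raise c_min; else lower c_max."""
    V = list(V)
    m = len(Fs)
    c_min, c_max = 0.0, min(F(set(V)) for F in Fs)
    best = set()
    while c_max - c_min > eps:
        c = (c_min + c_max) / 2.0
        Fbar = lambda A, c=c: sum(min(F(A), c) for F in Fs) / m
        A = set()
        while Fbar(A) < c - 1e-12:
            gains = {s: Fbar(A | {s}) for s in V if s not in A}
            if not gains:
                break
            s = max(gains, key=gains.get)
            if gains[s] <= Fbar(A) + 1e-12:
                break                      # no element makes progress
            A.add(s)
        if Fbar(A) >= c - 1e-12 and len(A) <= alpha * k:
            c_min, best = c, A             # feasible: try a larger c
        else:
            c_max = c                      # infeasible: try a smaller c
    return best

# Two hypothetical objectives: each is "covered" only by its own sensor.
F1 = lambda A: 1.0 if 1 in A else 0.0
F2 = lambda A: 1.0 if 2 in A else 0.0
```

On this toy instance, only the placement {1, 2} makes both objectives positive, so Saturate must return it for k = 2.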
27
Theoretical guarantees
Theorem: The problem max_{|A| ≤ k} min_i F_i(A) does not admit any approximation unless P = NP.
Theorem: Saturate finds a solution A_S such that min_i F_i(A_S) ≥ OPT_k and |A_S| ≤ αk,
where OPT_k = max_{|A| ≤ k} min_i F_i(A) and α = 1 + log max_s Σ_i F_i({s}).
Theorem: If there were a polytime algorithm with a better factor β < α, then NP ⊆ DTIME(n^(log log n)).
28
Example: lake monitoring
Monitor pH values using a robotic sensor.
[Figure: pH value vs. position s along transect; observations A, true (hidden) pH values, and predictions at unobserved locations.]
Use a probabilistic model (Gaussian processes) to estimate the prediction error Var(s | A); (often) submodular [Das & Kempe '08].
Where should we sense to minimize our maximum error? A robust sensing problem!
29
Comparison with state of the art
Algorithm used in geostatistics: simulated annealing [Sacks & Schiller '88, van Groeningen & Stein '98, Wiens '05, …], with 7 parameters that need to be fine-tuned.
[Figures: maximum marginal variance (lower is better) vs. number of sensors, on environmental monitoring and precipitation data: Saturate matches or beats both greedy and simulated annealing.]
Saturate is competitive & 10x faster. No parameters to tune!
30
Results on water networks
[Figure: maximum detection time in minutes (lower is better) vs. number of sensors, water networks data: greedy and simulated annealing show no decrease until all contaminations are detected; Saturate does far better.]
60% lower worst-case detection time!
31
Outline
Sensor placement
Robust sensing
Complex constraints
Sequential sensing
32
Other aspects: complex constraints
max_A F(A) or max_A min_i F_i(A) subject to constraints.
So far: |A| ≤ k. In practice, more complex constraints arise:
Locations need to be connected by paths [IJCAI '07, AAAI '07] — lake monitoring.
Sensors need to communicate (form a routing tree) — building monitoring.
33
Naïve approach: Greedy-connect
Simple heuristic: greedily optimize sensing quality F(A), then add nodes to minimize communication cost C(A).
Communication cost = expected number of transmission trials (learned using Gaussian processes).
[Figure: 54-node sensor network floorplan (server room, lab, kitchen, offices, …), shown four times:
– Most informative placement: F(A) = 4, but no communication possible!
– Adding relay nodes: F(A) = 4, C(A) = 10. Very informative, but high communication cost!
– Most efficient communication: C(A) = 3, but not very informative: F(A) = 0.2.
– Second most informative placement, with relay nodes: F(A) = 3.5, C(A) = 3.5.]
Want to find the optimal tradeoff between information and communication cost.
34
The pSPIEL algorithm [IPSN '06 – Best Paper Award]: padded Sensor Placements at Informative and cost-Effective Locations.
Theorem: pSPIEL finds a placement A with sensing quality F(A) ≥ Ω(1) OPT_F and communication cost C(A) ≤ O(log |V|) OPT_C.
Real deployment at the CMU Architecture department: 44% lower error and 19% lower communication cost than a placement by a domain expert!
35
Outline
Sensor placement
Robust sensing
Complex constraints
Sequential sensing
36
Other aspects: sequential sensing
You walk up to a lake, but don't have a model. Want sensing locations that learn the model (explore) and predict well (exploit). Chosen locations depend on measurements!
[ICML '07] Exploration–exploitation analysis for active learning in Gaussian processes.
37
Summary so far
Sensor placement: submodularity in sensing optimization; greedy is near-optimal [UAI '05, JMLR '07, KDD '07].
Robust sensing: greedy fails badly; Saturate is near-optimal [NIPS '07].
Path planning and communication constraints: constrained submodular optimization; pSPIEL gives strong guarantees [IJCAI '07, IPSN '06, AAAI '07].
Sequential sensing: exploration–exploitation analysis [ICML '07, IJCAI '05].
All these applications involve physical sensing. Now for something completely different. Let's jump from water…
38
… to the Web!
You have 10 minutes each day for reading blogs / news.
Which of the million blogs should you read?
39
Cascades in the blogosphere [KDD '07 – Best Paper Award]
[Figure: an information cascade spreading through blogs over time; blogs further down the cascade learn about the story after us!]
Which blogs should we read to learn about big cascades early?
40
Water vs. Web
Placing sensors in water networks vs. selecting informative blogs: in both problems we are given a graph with nodes (junctions / blogs) and edges (pipes / links), and cascades spreading dynamically over the graph (contamination / citations). Want to pick nodes to detect big cascades early.
In both applications, the utility functions are submodular! [Generalizes Kempe et al., KDD '03]
41
Performance on blog selection (~45k blogs)
[Figure: cascades captured (higher is better) vs. number of blogs: greedy outperforms heuristics based on in-links, all out-links, number of posts, and random selection.]
[Figure: running time in seconds (lower is better) vs. number of blogs selected: exhaustive search (all subsets) is infeasible; fast greedy far outpaces naive greedy.]
Outperforms state-of-the-art heuristics; 700x speedup using submodularity!
42
Taking "attention" into account
Naïve approach: just pick the 10 best blogs. This selects big, well-known blogs (Instapundit, etc.), which contain many posts and take long to read!
[Figure: cascades captured vs. number of posts (time) allowed: cost/benefit analysis dominates the selection that ignores cost.]
Cost-benefit optimization picks summarizer blogs!
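The cost/benefit analysis can be sketched as greedy selection by marginal gain per unit cost under a reading budget — a common heuristic for submodular maximization with knapsack constraints. The blog data below is entirely hypothetical:

```python
def greedy_cost_benefit(V, F, cost, budget):
    """Pick elements by marginal-gain-per-cost until the budget is spent."""
    A, spent = set(), 0.0
    while True:
        cands = [s for s in V if s not in A and spent + cost[s] <= budget]
        if not cands:
            return A
        s = max(cands, key=lambda s: (F(A | {s}) - F(A)) / cost[s])
        if F(A | {s}) <= F(A):
            return A                      # nothing left adds coverage
        A.add(s)
        spent += cost[s]

# Hypothetical data: cascades each blog captures; cost = number of posts.
captures = {'summarizer': {1, 2, 3}, 'bigblog': {1, 2, 3, 4}, 'niche': {5}}
posts = {'summarizer': 10, 'bigblog': 100, 'niche': 5}
F = lambda A: len(set().union(*(captures[b] for b in A))) if A else 0
```

With a budget of 20 posts, the ratio rule skips the expensive big blog and picks the cheap summarizer plus the niche blog — the effect described above.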
43
Predicting the "hot" blogs
Want blogs that will be informative in the future. Split the data set: train on historic data, test on future data.
[Figure: cascades captured vs. number of posts (time) allowed, plus monthly detection counts (Jan–May): "cheating" (greedy on future, tested on future) vs. greedy on historic data tested on the future. The historic selection detects well on the training period but poorly afterwards.]
Poor generalization! Why's that? Blog selection "overfits" to the training data. Let's see what goes wrong here — we want blogs that continue to do well!
44
Robust optimization
Let F_i(A) = detections in time interval i. An "overfit" blog selection A can score, say, F1(A) = .5, F2(A) = .8, F3(A) = .6, F4(A) = .01, F5(A) = .02 — good on some intervals, terrible on others. Instead, optimize the worst case over intervals to get a "robust" blog selection A*.
[Figure: detections per month (Jan–May) using greedy vs. Saturate.]
Robust optimization acts as regularization!
45
Predicting the "hot" blogs
[Figure: sensing quality vs. number of posts (time) allowed: "cheating" (greedy on future, tested on future), robust solution tested on future, and greedy on historic data tested on future.]
50% better generalization!
46
Future work / Thesis in a box
Submodularity in sensing optimization: greedy is near-optimal [UAI '05, JMLR '07, KDD '07].
Robust sensing: greedy fails badly; Saturate is near-optimal [NIPS '07].
Path planning and communication constraints: constrained submodular optimization; pSPIEL gives strong guarantees [IJCAI '07, IPSN '06].
Sequential sensing: exploration–exploitation analysis [ICML '07].
Constrained optimization → better use of "attention". Robust optimization → better generalization.
47
Future work
Thesis in a box: algorithmic and statistical machine learning [UAI '05, ICML '06, SenSys '05, ISWC '06, Trans. Mob. Comp. '06].
Three directions: fundamental questions in information acquisition and interpretation; machine learning and sensing in the natural sciences and engineering; machine learning and optimization on the web.
48
Structure in ML / AI problems
ML, last 10 years: convexity — kernel machines, SVMs, GPs, MLE, …
ML, "next 10 years": submodularity and other new structural properties.
Structural insights help us solve challenging problems. Shameless plug: tutorial at ICML 2008.
49
Future work
50
Fundamental bottlenecks in information acquisition
On the web, user attention and privacy are extremely scarce, non-renewable resources!
Preliminary results: utility/privacy tradeoffs in personalized search — submodularity is key!
Planned study: data from 2 million blogs from spinn3r.com.
51
www.blogcascades.org
5 months, 2 million blogs, 55 million posts
52
Future work
53
Machine learning for the natural sciences
Large, distributed research efforts: multiple research teams, multiple sensing modalities [Oßwald et al., Lasser et al., MacIntyre et al., Singh et al.].
Need to integrate experimental results and reason about noisy measurements together with scientific hypotheses. Key problem in AI / ML: integrate noisy sensor data with symbolic domain knowledge.
New approaches toward autonomous science.
54
Future work
Many opportunities for collaboration!
55
Conclusions
Sensing and information acquisition problems are important and ubiquitous. We can exploit structure to find provably good solutions. Presented algorithms with strong guarantees that perform well on real-world problems.
Thanks to: Carlos Guestrin, Anupam Gupta, Jon Kleinberg, Eric Horvitz, Jure Leskovec, Ajit Singh, Amarjeet Singh, Vipul Singhvi, William Kaiser, Jeanne VanBriesen, Christos Faloutsos, Brendan McMahan, Natalie Glance
56
57
Worst- vs. average case
Given: possible locations V, submodular functions F1, …, Fm.
Average-case score (1/m) Σ_i F_i(A): strong assumptions! Worst-case score min_i F_i(A): very pessimistic!
Want to optimize both average- and worst-case score. Can modify Saturate to solve this problem!
58
Worst- vs. average case
[Figure: worst-case impact vs. average-case impact (both lower is better), water networks data: optimizing only for the average case or only for the worst case is dominated by tradeoff solutions (Saturate); there is a knee in the tradeoff curve.]
Can find a good compromise between average- and worst-case score!
59
Other aspects: sequential sensing
Choose a sensing policy maximizing expected utility over the outcomes of observations.
[Figure: a policy tree — observe X5; if X5 = 17, observe X3 = 16 and then X7 = 19, giving F(X5=17, X3=16, X7=19) = 3.4; if X5 = 21, observe X12 or X23, giving F(…) = 2.1 and F(…) = 2.4; overall policy value 3.1.]
60
Questions we ask
1) When can we find the optimal sequential policy? [IJCAI '05] For chains (HMMs, etc.): best policy in polytime. For trees: NP^PP-hard.
2) How much better is sequential selection compared to the best subset? [ICML '07] For Gaussian processes: have bounds on the gap between the best policy and the best set that lead to efficient algorithms.