Algorithmic aspects of large scale optimal experimental design
Guillaume SAGNOL
Zuse Institut Berlin (ZIB)
February 8, 2012
Outline
1 Optimal Design of Experiments
2 Application to Network Monitoring
3 Algorithms for Optimal Experimental Design
  Classic algorithms
  Recent developments
  New search directions
Optimal Design of Experiments

We want to estimate the vector of parameters θ = [θ_1, ..., θ_m]^T.
We have s experiments available:

  y_1 = A_1 θ + ε_1,   y_2 = A_2 θ + ε_2,   ...,   y_s = A_s θ + ε_s.

GOAL:
1 Choose how many times to perform each experiment.
2 Continuous relaxation on the unit s-simplex: distribute the experimental effort w_i (with ∑_{i=1}^s w_i = 1).
Minimizing the variance of the best estimator

Under classical assumptions (e.g. the noises have unit variance and are independent):

Theorem [Gauss-Markov]
For every linear unbiased estimator θ̂ of the unknown parameter θ, we have:

  Var[θ̂] ⪰ ( ∑_{i=1}^s w_i A_i^T A_i )^{-1}.

Moreover, this lower bound is attained by the least-squares estimator.
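The bound can be checked numerically. The sketch below (with illustrative matrices of our own choosing, not from the talk) verifies that the least-squares covariance equals M(w)^{-1}, and that any other linear unbiased estimator only adds a positive semidefinite gap:

```python
# Numerical check of the Gauss-Markov bound: with independent
# unit-variance noise, the least-squares estimator from the weighted
# stacked observations has covariance exactly (sum_i w_i A_i^T A_i)^{-1}.
# The matrices below are illustrative, not from the talk.
import numpy as np

A = [np.array([[1.0, 0.0], [0.0, 2.0]]),   # experiment 1
     np.array([[0.0, 1.0]]),               # experiment 2
     np.array([[1.0, 1.0]])]               # experiment 3
w = np.array([0.5, 0.25, 0.25])            # design weights on the simplex

# Information matrix M(w) = sum_i w_i A_i^T A_i
M = sum(wi * Ai.T @ Ai for wi, Ai in zip(w, A))

# Stack the weighted observation matrices: rows sqrt(w_i) * A_i.
# The least-squares estimator is (B^T B)^{-1} B^T y, whose covariance
# under unit-variance independent noise is (B^T B)^{-1} = M(w)^{-1}.
B = np.vstack([np.sqrt(wi) * Ai for wi, Ai in zip(w, A)])
cov_ls = np.linalg.inv(B.T @ B)
assert np.allclose(cov_ls, np.linalg.inv(M))

# Any other linear unbiased estimator L y (with L B = I) has
# covariance L L^T, which dominates M(w)^{-1} in the Loewner order:
# here L = pinv(B) + Z with Z B = 0, so L L^T = M^{-1} + Z Z^T.
Bp = np.linalg.pinv(B)
U, _, _ = np.linalg.svd(B)
Z = np.array([[0.3, -0.1], [0.2, 0.4]]) @ U[:, 2:].T   # Z @ B = 0
L = Bp + Z
assert np.allclose(L @ B, np.eye(2))                   # still unbiased
gap = L @ L.T - np.linalg.inv(M)
assert np.min(np.linalg.eigvalsh(gap)) >= -1e-9        # gap is PSD
```

The perturbation Z is any matrix annihilating B; the identity L L^T = M^{-1} + Z Z^T is exactly the Gauss-Markov gap.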
Two common criteria

Definition: Information matrix
M(w) := ∑_{i=1}^s w_i A_i^T A_i is the information matrix of w.

Common approach: minimize a scalar function of M(w)^{-1}.

Two standard criteria:

c-optimality: given c ∈ R^m,

  min { c^T M(w)^{-1} c : w ∈ R^s_+, ∑_i w_i = 1 }

relevant when we want to estimate c^T θ.

A-optimality:

  min { trace M(w)^{-1} : w ∈ R^s_+, ∑_i w_i = 1 }
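Both criteria are easy to evaluate for a given design. A small sketch (the two scalar experiments are our own illustrative choice) evaluates them and locates the A-optimal weights by grid search:

```python
# Evaluating the two criteria on a toy design with two scalar
# experiments a_1 = (1,0), a_2 = (0,1), and c = (1,1).
import numpy as np

a1 = np.array([[1.0, 0.0]])    # A_1 = a_1^T (scalar observation)
a2 = np.array([[0.0, 1.0]])    # A_2 = a_2^T
c = np.array([1.0, 1.0])

def M(w1):
    return w1 * a1.T @ a1 + (1.0 - w1) * a2.T @ a2

def a_crit(w1):
    return np.trace(np.linalg.inv(M(w1)))      # A-optimality

def c_crit(w1):
    return c @ np.linalg.inv(M(w1)) @ c        # c-optimality

grid = np.linspace(0.01, 0.99, 99)
best = grid[np.argmin([a_crit(w) for w in grid])]
# Here M(w) = diag(w1, 1-w1), so trace M(w)^{-1} = 1/w1 + 1/(1-w1),
# minimized at w1 = 1/2 with value 4.
assert abs(best - 0.5) < 1e-9
assert abs(a_crit(0.5) - 4.0) < 1e-9
```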
Geometric interpretation

Given an eigenvalue decomposition

  M(w) = ∑_i λ_i u_i u_i^T,

the confidence ellipsoids of θ̂ are centered in θ, and have semi-axes of length proportional to 1/√λ_i.

[Figure: a confidence ellipsoid centered in θ, with semi-axes of length 1/√λ_1 along u_1 and 1/√λ_2 along u_2.]

A-optimal design (Φ_A): min 1/λ_1 + 1/λ_2 (minimize the diagonal of the bounding box).
c-optimal design (Φ_c): minimize the shadow along vector c.
D-optimal design (Φ_D): max λ_1 λ_2 (minimize the volume).
E-optimal design (Φ_E): max λ_1 (minimize the largest semi-axis).
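The eigenvalue picture matches the matrix criteria exactly: a quick numerical check (on an arbitrary positive definite matrix of our own) that ∑_i 1/λ_i = trace M^{-1}, ∏_i λ_i = det M (the volume criterion), and the largest semi-axis is 1/√λ_min:

```python
# Linking the eigenvalue picture to the matrix criteria for a PSD
# information matrix M. The matrix below is an arbitrary positive
# definite example, not from the talk.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
M = X.T @ X + 0.1 * np.eye(3)          # positive definite 3x3

lam = np.linalg.eigvalsh(M)            # eigenvalues, ascending order
# A-criterion: sum of inverse eigenvalues equals trace M^{-1}
assert np.isclose(np.sum(1.0 / lam), np.trace(np.linalg.inv(M)))
# D-criterion: product of eigenvalues equals det M (ellipsoid volume)
assert np.isclose(np.prod(lam), np.linalg.det(M))
# E-criterion: largest semi-axis is 1/sqrt(smallest eigenvalue)
assert 1.0 / np.sqrt(lam[0]) == max(1.0 / np.sqrt(lam))
```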
Summary

Given observation matrices A_1, ..., A_s, solve

  min_w Φ( ∑_{i=1}^s w_i A_i^T A_i )
  s.t. w ≥ 0, ∑_{i=1}^s w_i = 1.

Φ is a function mapping X ∈ S_m^+ to R that is nonincreasing with respect to the Löwner ordering ⪰, e.g.

  Φ(X) = trace X^{-1} (A-optimality)
  Φ(X) = c^T X^{-1} c (c-optimality)
Outline
1 Optimal Design of Experiments
2 Application to Network Monitoring
3 Algorithms for Optimal Experimental Design
  Classic algorithms
  Recent developments
  New search directions
Optimal Network Monitoring

We want to monitor certain quantities on a network, e.g.:
  volume of the flows
  performance indicators
  nature of the traffic

General Problem
How many monitors should we place on each Node/Edge/OD Pair of the network?
example: OD-flows monitoring

We place a monitor on the blue edge:
=⇒ we collect information about 4 OD flows:
  1→4, 1→5, 2→4, 2→5
example: OD-flows monitoring (continued)

          1→2  1→4  1→5  ···  2→4  2→5  ···
A_{2→4} = (  0    1    0    0    0    0    0
             0    0    1    0    0    0    0
             0    0    0    0    1    0    0
             0    0    0    0    0    1    0  )

In some applications, monitors can't determine the source of the flows:

          1→2  1→4  1→5  ···  2→4  2→5  ···
A_{2→4} = (  0    1    0    0    1    0    0
             0    0    1    0    0    1    0  )
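Such observation matrices can be built mechanically from the routing. The sketch below uses an illustrative flow indexing of our own (not the talk's network data) for a monitor seeing four flows, in both the source-aware and the source-blind cases:

```python
# A sketch of building OD-flow observation matrices: a monitor on an
# edge sees every OD flow routed through that edge. Flow indexing and
# routing below are illustrative assumptions, not the talk's data.
import numpy as np

flows = ["1->2", "1->4", "1->5", "2->3", "2->4", "2->5", "3->5"]
observed = ["1->4", "1->5", "2->4", "2->5"]   # flows crossing the edge

# Source-aware monitor: one row (one measurement) per observed flow.
A_aware = np.zeros((len(observed), len(flows)))
for r, f in enumerate(observed):
    A_aware[r, flows.index(f)] = 1.0

# Source-blind monitor: flows with the same destination are merged,
# so each row is the sum of the corresponding source-aware rows.
dests = sorted({f.split("->")[1] for f in observed})
A_blind = np.zeros((len(dests), len(flows)))
for r, d in enumerate(dests):
    for f in observed:
        if f.endswith("->" + d):
            A_blind[r, flows.index(f)] = 1.0

assert A_aware.shape == (4, 7) and A_blind.shape == (2, 7)
# the blind rows aggregate exactly the aware rows
assert np.allclose(A_blind.sum(axis=0), A_aware.sum(axis=0))
```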
BAG Project: Truck Toll in Germany

A Huge System
Introduced in 2005
Tax based on: distance and truck category (avg price: 0.17 euros/km)
Total revenue: around 3 billion euros/year

Control Forces
Automatic: 300 toll checker gantries (Kontrollbrücke)
Manual: 300 vehicles, 540 officers from BAG (Federal Office of Freight)

Complexity of BAG's management
Schedule of each officer
Choose where and when to control

TODO: show NRW and Berlin, CPU time, what about whole Germany?
Outline
1 Optimal Design of Experiments
2 Application to Network Monitoring
3 Algorithms for Optimal Experimental Design
  Classic algorithms
  Recent developments
  New search directions
Wynn-Fedorov exchange algorithm

  min { trace M(w)^{-1} : w ∈ R^s_+, ∑_i w_i = 1, M(w) = ∑_i w_i A_i^T A_i }

Algorithm (Wynn 70, Fedorov 72)
This is a feasible descent algorithm:

  At step k, find the atomic design e_{i_k} such that the directional derivative is maximum:

    i_k ← argmax_{i ∈ [s]} ‖M(w^(k))^{-1} A_i‖_F

  For a step of length α_k, move in the direction of e_{i_k}:

    w^(k+1) ← (1 − α_k) w^(k) + α_k e_{i_k}.
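A minimal sketch of the exchange algorithm for the A-criterion, on illustrative matrices of our own; the diminishing step size α_k = 1/(k+2) is our assumption, since the talk does not fix a step-size rule:

```python
# Wynn-Fedorov exchange algorithm for min trace M(w)^{-1}: at each
# step, move toward the atom e_{i_k} with the steepest directional
# derivative, here measured by ||A_i M(w)^{-1}||_F.
import numpy as np

def wynn_fedorov(A_list, iters=200):
    s = len(A_list)
    w = np.full(s, 1.0 / s)                  # start at the uniform design
    for k in range(iters):
        M = sum(wi * Ai.T @ Ai for wi, Ai in zip(w, A_list))
        Minv = np.linalg.inv(M)
        d = [np.linalg.norm(Ai @ Minv) for Ai in A_list]
        i_k = int(np.argmax(d))              # best atomic design
        alpha = 1.0 / (k + 2)                # assumed step-size rule
        w = (1.0 - alpha) * w                # move toward e_{i_k}
        w[i_k] += alpha
    return w

A_list = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]),
          np.array([[1.0, 1.0]])]
w = wynn_fedorov(A_list)
obj = np.trace(np.linalg.inv(sum(wi * Ai.T @ Ai
                                 for wi, Ai in zip(w, A_list))))
uniform = np.trace(np.linalg.inv(sum(Ai.T @ Ai for Ai in A_list) / 3))
assert abs(w.sum() - 1.0) < 1e-9             # stays on the simplex
assert obj <= uniform + 1e-9                 # no worse than the start
```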
Titterington multiplicative algorithm

  min { trace M(w)^{-1} : w ∈ R^s_+, ∑_i w_i = 1, M(w) = ∑_i w_i A_i^T A_i }

Algorithm (Titterington 76)
At step k:

  For all i ∈ [s], compute the directional derivative

    d_i ← ‖M(w^(k))^{-1} A_i‖_F

  For a parameter λ, make the multiplicative update:

    w_i^(k+1) ← w_i^(k) d_i^λ / Γ,

  where Γ is a normalizing constant.
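A sketch of the multiplicative update on the same kind of toy instance; the talk leaves λ as a parameter, and λ = 2 is our choice here (it weights each w_i by the full quantity d_i², the usual fixed-point form for the A-criterion):

```python
# Titterington-style multiplicative algorithm for min trace M(w)^{-1}:
# w_i <- w_i * d_i^lambda / Gamma, with d_i = ||A_i M(w)^{-1}||_F and
# Gamma the normalizing constant. lambda = 2 is our assumed choice.
import numpy as np

def titterington(A_list, lam=2.0, iters=300):
    s = len(A_list)
    w = np.full(s, 1.0 / s)
    for _ in range(iters):
        M = sum(wi * Ai.T @ Ai for wi, Ai in zip(w, A_list))
        Minv = np.linalg.inv(M)
        d = np.array([np.linalg.norm(Ai @ Minv) for Ai in A_list])
        w = w * d ** lam
        w /= w.sum()                  # Gamma: renormalize to the simplex
    return w

A_list = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]),
          np.array([[1.0, 1.0]])]
w = titterington(A_list)

def a_obj(w):
    M = sum(wi * Ai.T @ Ai for wi, Ai in zip(w, A_list))
    return np.trace(np.linalg.inv(M))

assert abs(w.sum() - 1.0) < 1e-9
assert a_obj(w) <= a_obj(np.full(3, 1.0 / 3)) + 1e-9
```

At a fixed point the d_i are equal on the support of w, which is the first-order optimality condition of the design problem.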
Semidefinite Programming

  min { trace M(w)^{-1} : w ∈ R^s_+, ∑_i w_i = 1, M(w) = ∑_i w_i A_i^T A_i }

A-optimality SDP (Vandenberghe, Boyd, Wu 98)

  min_{w, Y ∈ S_m} trace Y

  s.t. ( M(w)  I
           I   Y ) ⪰ 0,

       ∑_{i=1}^s w_i = 1,   w_i ≥ 0 for all i ∈ [s].
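The SDP rests on the Schur complement: the block matrix is PSD iff Y ⪰ M(w)^{-1}, so minimizing trace Y recovers trace M(w)^{-1}. A quick numerical check of that equivalence, on an arbitrary positive definite M of our own:

```python
# Schur-complement fact behind the A-optimality SDP:
# [[M, I], [I, Y]] is PSD iff Y >= M^{-1} in the Loewner order.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))
M = X.T @ X + 0.1 * np.eye(3)          # positive definite 3x3
Minv = np.linalg.inv(M)
I = np.eye(3)

def is_psd(S):
    return np.min(np.linalg.eigvalsh(S)) >= -1e-9

# Y = M^{-1} is feasible: the block matrix is PSD (on the boundary) ...
assert is_psd(np.block([[M, I], [I, Minv]]))
# ... and shrinking Y below M^{-1} breaks positive semidefiniteness,
# so any feasible Y must dominate M^{-1}.
Y_bad = Minv - 0.05 * np.eye(3)
assert not is_psd(np.block([[M, I], [I, Y_bad]]))
```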
Elfving's Theorem

In the presence of scalar observations (A_i = a_i^T), we have:

Theorem [Elfving, 53]
The design w is c-optimal iff there exists t > 0 such that

  t c = ∑_k w_k a_k ∈ ∂( conv{ ±a_i, i = 1, ..., s } ).

Equivalently:

  max_{λ, t} t
  s.t. t c = ∑_k λ_k a_k,
       ∑_k |λ_k| ≤ 1,   (with w_k = |λ_k|)

[Figure: the Elfving set conv{±a_1, ..., ±a_4} in the plane (θ_1, θ_2); the ray along c exits the set at t*c = (3/4) a_3 + (1/4)(−a_4).]
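The theorem can be verified numerically on a toy example of our own (not the talk's figure): with a_1 = (1,0), a_2 = (0,1) and c = (1,1), the ray t c exits conv{±a_1, ±a_2} at t* = 1/2 via (1/2)a_1 + (1/2)a_2, and the predicted c-optimal variance is t*^{-2} = 4:

```python
# Checking Elfving's theorem on a toy scalar-observation example:
# the brute-forced c-optimal design matches the geometric prediction
# w = (1/2, 1/2), with optimal variance t*^{-2}.
import numpy as np

a = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
c = np.array([1.0, 1.0])

def c_var(w1):
    M = w1 * np.outer(a[0], a[0]) + (1 - w1) * np.outer(a[1], a[1])
    return c @ np.linalg.inv(M) @ c

# brute-force the c-optimal design over the simplex
grid = np.linspace(0.01, 0.99, 99)
vals = [c_var(w) for w in grid]
w_star = grid[int(np.argmin(vals))]

t_star = 0.5                               # (1/2)a_1 + (1/2)a_2 = t*c
assert abs(w_star - 0.5) < 1e-9            # optimal weights (1/2, 1/2)
assert abs(min(vals) - t_star ** -2) < 1e-9
```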
General case

Theorem [DHL09], [Sag09,10]
The design w is c-optimal iff there exist t > 0 and unit vectors ε_k such that

  t c = ∑_k w_k A_k^T ε_k ∈ ∂( conv{ A_i^T ε, i = 1, ..., s, ‖ε‖ ≤ 1 } ).

[Figure: the generalized Elfving set in the plane, the convex hull of regions E_1, ..., E_4 with marked points a_ij; the optimal point t*c lies on the boundary, along direction c.]
An illustration in 3D

        x    y   z
A1 = (  1    0   0
        0    2   0  )

A2 = (  0   0.5  2  )

A3 = (  0  -0.5  2  )

[Figure: the uppermost half of the generalized Elfving set in (x, y, z) space, with points a1(1), a1(2), a2 and a3.]

If we want to estimate (x + 2z) −→ barycenter of a1(1), a2 and a3.
If we want to estimate (x + y + z) −→ barycenter of x1 and a2.
SOCP formulations

Corollary: SOCP for c-optimality [Sag09,10]
If μ* is the optimal dual variable associated to the constraints of

  max_{z ∈ R^m} c^T z
  s.t. ‖A_i z‖ ≤ 1 for all i ∈ [s],

then w := μ* / ∑_{k=1}^s μ*_k is a c-optimal design.

SOCP for A-optimality [Sag10]

  max_{U ∈ R^{m×m}} trace U
  s.t. ‖A_i U‖_F ≤ 1 for all i ∈ [s]
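The primal of the c-optimality program can be sketched with a general-purpose NLP solver; this is only an illustration on a toy example of our own, since scipy's SLSQP does not expose the dual variables μ* needed to recover w as in the corollary (a real SOCP solver would):

```python
# Solving max c^T z s.t. ||A_i z|| <= 1 on a toy scalar example with
# scipy's SLSQP. With A_1 = (1,0), A_2 = (0,1), c = (1,1), the optimum
# is z* = (1,1) with value 2.
import numpy as np
from scipy.optimize import minimize

A_list = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]
c = np.array([1.0, 1.0])

# one second-order constraint ||A_i z|| <= 1 per experiment,
# written as 1 - ||A_i z||^2 >= 0
cons = [{"type": "ineq",
         "fun": (lambda z, A=A: 1.0 - float((A @ z) @ (A @ z)))}
        for A in A_list]

res = minimize(lambda z: -(c @ z), x0=np.array([0.1, 0.1]),
               method="SLSQP", constraints=cons)
z_star = res.x
val = c @ z_star
assert np.allclose(z_star, [1.0, 1.0], atol=1e-4)
# for this example the c-optimal variance is val**2 = 4, matching the
# direct minimization of c^T M(w)^{-1} c, attained at w = (1/2, 1/2)
assert abs(val ** 2 - 4.0) < 1e-3
```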
The End
Thank you for your attention