Algorithmic aspects of large scale optimal experimental design
Guillaume SAGNOL
Zuse Institut Berlin (ZIB)
February 8, 2012
Outline
1 Optimal Design of Experiments
2 Application to Network Monitoring
3 Algorithms for Optimal Experimental Design
  Classic algorithms
  Recent developments
  New search directions
Optimal Design of Experiments

We want to estimate the vector of parameters θ = [θ_1, ..., θ_m]^T.
We have s experiments available:

  y_1 = A_1 θ + ε_1,   y_2 = A_2 θ + ε_2,   ...,   y_s = A_s θ + ε_s.

GOAL:
1 Choose how many times to perform each experiment.
2 Continuous relaxation on the unit s-simplex: distribute the experimental effort w_i (with ∑_{i=1}^s w_i = 1).
Minimizing the variance of the best estimator

Under classical assumptions (e.g. the noises have unit variance and are independent):

Theorem [Gauss-Markov]
For every linear unbiased estimator θ̂ of the unknown parameter θ, we have:

  Var[θ̂] ⪰ ( ∑_{i=1}^s w_i A_i^T A_i )^{-1}.

Moreover, this lower bound is attained by the least-squares estimator.
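The bound can be checked numerically. The sketch below (with illustrative matrices of our own choosing, not from the talk) verifies that the least-squares covariance equals M(w)^{-1}, and that any other linear unbiased estimator only adds a positive semidefinite gap:

```python
# Numerical check of the Gauss-Markov bound: with independent
# unit-variance noise, the least-squares estimator from the weighted
# stacked observations has covariance exactly (sum_i w_i A_i^T A_i)^{-1}.
# The matrices below are illustrative, not from the talk.
import numpy as np

A = [np.array([[1.0, 0.0], [0.0, 2.0]]),   # experiment 1
     np.array([[0.0, 1.0]]),               # experiment 2
     np.array([[1.0, 1.0]])]               # experiment 3
w = np.array([0.5, 0.25, 0.25])            # design weights on the simplex

# Information matrix M(w) = sum_i w_i A_i^T A_i
M = sum(wi * Ai.T @ Ai for wi, Ai in zip(w, A))

# Stack the weighted observation matrices: rows sqrt(w_i) * A_i.
# The least-squares estimator is (B^T B)^{-1} B^T y, whose covariance
# under unit-variance independent noise is (B^T B)^{-1} = M(w)^{-1}.
B = np.vstack([np.sqrt(wi) * Ai for wi, Ai in zip(w, A)])
cov_ls = np.linalg.inv(B.T @ B)
assert np.allclose(cov_ls, np.linalg.inv(M))

# Any other linear unbiased estimator L y (with L B = I) has
# covariance L L^T, which dominates M(w)^{-1} in the Loewner order:
# here L = pinv(B) + Z with Z B = 0, so L L^T = M^{-1} + Z Z^T.
Bp = np.linalg.pinv(B)
U, _, _ = np.linalg.svd(B)
Z = np.array([[0.3, -0.1], [0.2, 0.4]]) @ U[:, 2:].T   # Z @ B = 0
L = Bp + Z
assert np.allclose(L @ B, np.eye(2))                   # still unbiased
gap = L @ L.T - np.linalg.inv(M)
assert np.min(np.linalg.eigvalsh(gap)) >= -1e-9        # gap is PSD
```

The perturbation Z is any matrix annihilating B; the identity L L^T = M^{-1} + Z Z^T is exactly the Gauss-Markov gap.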
Two common criteria

Definition: Information matrix
M(w) := ∑_{i=1}^s w_i A_i^T A_i is the information matrix of w.

Common approach: minimize a scalar function of M(w)^{-1}.

Two standard criteria:

c-optimality: given c ∈ R^m,

  min { c^T M(w)^{-1} c : w ∈ R^s_+, ∑_i w_i = 1 }

relevant when we want to estimate c^T θ.

A-optimality:

  min { trace M(w)^{-1} : w ∈ R^s_+, ∑_i w_i = 1 }
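Both criteria are easy to evaluate for a given design. A small sketch (the two scalar experiments are our own illustrative choice) evaluates them and locates the A-optimal weights by grid search:

```python
# Evaluating the two criteria on a toy design with two scalar
# experiments a_1 = (1,0), a_2 = (0,1), and c = (1,1).
import numpy as np

a1 = np.array([[1.0, 0.0]])    # A_1 = a_1^T (scalar observation)
a2 = np.array([[0.0, 1.0]])    # A_2 = a_2^T
c = np.array([1.0, 1.0])

def M(w1):
    return w1 * a1.T @ a1 + (1.0 - w1) * a2.T @ a2

def a_crit(w1):
    return np.trace(np.linalg.inv(M(w1)))      # A-optimality

def c_crit(w1):
    return c @ np.linalg.inv(M(w1)) @ c        # c-optimality

grid = np.linspace(0.01, 0.99, 99)
best = grid[np.argmin([a_crit(w) for w in grid])]
# Here M(w) = diag(w1, 1-w1), so trace M(w)^{-1} = 1/w1 + 1/(1-w1),
# minimized at w1 = 1/2 with value 4.
assert abs(best - 0.5) < 1e-9
assert abs(a_crit(0.5) - 4.0) < 1e-9
```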
Geometric interpretation

Given an eigenvalue decomposition

  M(w) = ∑_i λ_i u_i u_i^T,

the confidence ellipsoids of θ̂ are centered in θ, and have semi-axes of length proportional to 1/√λ_i.

[Figure: a confidence ellipsoid centered in θ, with semi-axes of length 1/√λ_1 along u_1 and 1/√λ_2 along u_2.]

A-optimal design (Φ_A): min 1/λ_1 + 1/λ_2 (minimize the diagonal of the bounding box).
c-optimal design (Φ_c): minimize the shadow along vector c.
D-optimal design (Φ_D): max λ_1 λ_2 (minimize the volume).
E-optimal design (Φ_E): max λ_1 (minimize the largest semi-axis).
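The eigenvalue picture matches the matrix criteria exactly: a quick numerical check (on an arbitrary positive definite matrix of our own) that ∑_i 1/λ_i = trace M^{-1}, ∏_i λ_i = det M (the volume criterion), and the largest semi-axis is 1/√λ_min:

```python
# Linking the eigenvalue picture to the matrix criteria for a PSD
# information matrix M. The matrix below is an arbitrary positive
# definite example, not from the talk.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
M = X.T @ X + 0.1 * np.eye(3)          # positive definite 3x3

lam = np.linalg.eigvalsh(M)            # eigenvalues, ascending order
# A-criterion: sum of inverse eigenvalues equals trace M^{-1}
assert np.isclose(np.sum(1.0 / lam), np.trace(np.linalg.inv(M)))
# D-criterion: product of eigenvalues equals det M (ellipsoid volume)
assert np.isclose(np.prod(lam), np.linalg.det(M))
# E-criterion: largest semi-axis is 1/sqrt(smallest eigenvalue)
assert 1.0 / np.sqrt(lam[0]) == max(1.0 / np.sqrt(lam))
```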
Summary

Given observation matrices A_1, ..., A_s, solve

  min_w Φ( ∑_{i=1}^s w_i A_i^T A_i )
  s.t. w ≥ 0, ∑_{i=1}^s w_i = 1.

Φ is a function mapping X ∈ S_m^+ to R that is nonincreasing with respect to the Löwner ordering ⪰, e.g.

  Φ(X) = trace X^{-1} (A-optimality)
  Φ(X) = c^T X^{-1} c (c-optimality)
Outline
1 Optimal Design of Experiments
2 Application to Network Monitoring
3 Algorithms for Optimal Experimental Design
  Classic algorithms
  Recent developments
  New search directions
Optimal Network Monitoring

We want to monitor certain quantities on a network, e.g.:
  volume of the flows
  performance indicators
  nature of the traffic

General Problem
How many monitors should we place on each Node/Edge/OD Pair of the network?
example: OD-flows monitoring

We place a monitor on the blue edge:
=⇒ we collect information about 4 OD flows:
  1→4, 1→5, 2→4, 2→5
example: OD-flows monitoring (continued)

          1→2  1→4  1→5  ···  2→4  2→5  ···
A_{2→4} = (  0    1    0    0    0    0    0
             0    0    1    0    0    0    0
             0    0    0    0    1    0    0
             0    0    0    0    0    1    0  )

In some applications, monitors can't determine the source of the flows:

          1→2  1→4  1→5  ···  2→4  2→5  ···
A_{2→4} = (  0    1    0    0    1    0    0
             0    0    1    0    0    1    0  )
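Such observation matrices can be built mechanically from the routing. The sketch below uses an illustrative flow indexing of our own (not the talk's network data) for a monitor seeing four flows, in both the source-aware and the source-blind cases:

```python
# A sketch of building OD-flow observation matrices: a monitor on an
# edge sees every OD flow routed through that edge. Flow indexing and
# routing below are illustrative assumptions, not the talk's data.
import numpy as np

flows = ["1->2", "1->4", "1->5", "2->3", "2->4", "2->5", "3->5"]
observed = ["1->4", "1->5", "2->4", "2->5"]   # flows crossing the edge

# Source-aware monitor: one row (one measurement) per observed flow.
A_aware = np.zeros((len(observed), len(flows)))
for r, f in enumerate(observed):
    A_aware[r, flows.index(f)] = 1.0

# Source-blind monitor: flows with the same destination are merged,
# so each row is the sum of the corresponding source-aware rows.
dests = sorted({f.split("->")[1] for f in observed})
A_blind = np.zeros((len(dests), len(flows)))
for r, d in enumerate(dests):
    for f in observed:
        if f.endswith("->" + d):
            A_blind[r, flows.index(f)] = 1.0

assert A_aware.shape == (4, 7) and A_blind.shape == (2, 7)
# the blind rows aggregate exactly the aware rows
assert np.allclose(A_blind.sum(axis=0), A_aware.sum(axis=0))
```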
BAG Project: Truck Toll in Germany

A Huge System
Introduced in 2005
Tax based on: distance and truck category (avg price: 0.17 euros/km)
Total revenue: around 3 billion euros/year

Control Forces
Automatic: 300 toll checker gantries (Kontrollbrücke)
Manual: 300 vehicles, 540 officers from BAG (Federal Office of Freight)

Complexity of BAG's management
Schedule of each officer
Choose where and when to control

TODO: show NRW and Berlin, CPU time, what about whole Germany?
Outline
1 Optimal Design of Experiments
2 Application to Network Monitoring
3 Algorithms for Optimal Experimental Design
  Classic algorithms
  Recent developments
  New search directions
Wynn-Fedorov exchange algorithm

  min { trace M(w)^{-1} : w ∈ R^s_+, ∑_i w_i = 1, M(w) = ∑_i w_i A_i^T A_i }

Algorithm (Wynn 70, Fedorov 72)
This is a feasible descent algorithm:

  At step k, find the atomic design e_{i_k} such that the directional derivative is maximum:

    i_k ← argmax_{i ∈ [s]} ‖M(w^(k))^{-1} A_i‖_F

  For a step of length α_k, move in the direction of e_{i_k}:

    w^(k+1) ← (1 − α_k) w^(k) + α_k e_{i_k}.
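A minimal sketch of the exchange algorithm for the A-criterion, on illustrative matrices of our own; the diminishing step size α_k = 1/(k+2) is our assumption, since the talk does not fix a step-size rule:

```python
# Wynn-Fedorov exchange algorithm for min trace M(w)^{-1}: at each
# step, move toward the atom e_{i_k} with the steepest directional
# derivative, here measured by ||A_i M(w)^{-1}||_F.
import numpy as np

def wynn_fedorov(A_list, iters=200):
    s = len(A_list)
    w = np.full(s, 1.0 / s)                  # start at the uniform design
    for k in range(iters):
        M = sum(wi * Ai.T @ Ai for wi, Ai in zip(w, A_list))
        Minv = np.linalg.inv(M)
        d = [np.linalg.norm(Ai @ Minv) for Ai in A_list]
        i_k = int(np.argmax(d))              # best atomic design
        alpha = 1.0 / (k + 2)                # assumed step-size rule
        w = (1.0 - alpha) * w                # move toward e_{i_k}
        w[i_k] += alpha
    return w

A_list = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]),
          np.array([[1.0, 1.0]])]
w = wynn_fedorov(A_list)
obj = np.trace(np.linalg.inv(sum(wi * Ai.T @ Ai
                                 for wi, Ai in zip(w, A_list))))
uniform = np.trace(np.linalg.inv(sum(Ai.T @ Ai for Ai in A_list) / 3))
assert abs(w.sum() - 1.0) < 1e-9             # stays on the simplex
assert obj <= uniform + 1e-9                 # no worse than the start
```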
Titterington multiplicative algorithm

  min { trace M(w)^{-1} : w ∈ R^s_+, ∑_i w_i = 1, M(w) = ∑_i w_i A_i^T A_i }

Algorithm (Titterington 76)
At step k:

  For all i ∈ [s], compute the directional derivative

    d_i ← ‖M(w^(k))^{-1} A_i‖_F

  For a parameter λ, make the multiplicative update:

    w_i^(k+1) ← w_i^(k) d_i^λ / Γ,

  where Γ is a normalizing constant.
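A sketch of the multiplicative update on the same kind of toy instance; the talk leaves λ as a parameter, and λ = 2 is our choice here (it weights each w_i by the full quantity d_i², the usual fixed-point form for the A-criterion):

```python
# Titterington-style multiplicative algorithm for min trace M(w)^{-1}:
# w_i <- w_i * d_i^lambda / Gamma, with d_i = ||A_i M(w)^{-1}||_F and
# Gamma the normalizing constant. lambda = 2 is our assumed choice.
import numpy as np

def titterington(A_list, lam=2.0, iters=300):
    s = len(A_list)
    w = np.full(s, 1.0 / s)
    for _ in range(iters):
        M = sum(wi * Ai.T @ Ai for wi, Ai in zip(w, A_list))
        Minv = np.linalg.inv(M)
        d = np.array([np.linalg.norm(Ai @ Minv) for Ai in A_list])
        w = w * d ** lam
        w /= w.sum()                  # Gamma: renormalize to the simplex
    return w

A_list = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]),
          np.array([[1.0, 1.0]])]
w = titterington(A_list)

def a_obj(w):
    M = sum(wi * Ai.T @ Ai for wi, Ai in zip(w, A_list))
    return np.trace(np.linalg.inv(M))

assert abs(w.sum() - 1.0) < 1e-9
assert a_obj(w) <= a_obj(np.full(3, 1.0 / 3)) + 1e-9
```

At a fixed point the d_i are equal on the support of w, which is the first-order optimality condition of the design problem.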
Semidefinite Programming

  min { trace M(w)^{-1} : w ∈ R^s_+, ∑_i w_i = 1, M(w) = ∑_i w_i A_i^T A_i }

A-optimality SDP (Vandenberghe, Boyd, Wu 98)

  min_{w, Y ∈ S_m} trace Y

  s.t. ( M(w)  I
           I   Y ) ⪰ 0,

       ∑_{i=1}^s w_i = 1,   w_i ≥ 0 for all i ∈ [s].
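The SDP rests on the Schur complement: the block matrix is PSD iff Y ⪰ M(w)^{-1}, so minimizing trace Y recovers trace M(w)^{-1}. A quick numerical check of that equivalence, on an arbitrary positive definite M of our own:

```python
# Schur-complement fact behind the A-optimality SDP:
# [[M, I], [I, Y]] is PSD iff Y >= M^{-1} in the Loewner order.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))
M = X.T @ X + 0.1 * np.eye(3)          # positive definite 3x3
Minv = np.linalg.inv(M)
I = np.eye(3)

def is_psd(S):
    return np.min(np.linalg.eigvalsh(S)) >= -1e-9

# Y = M^{-1} is feasible: the block matrix is PSD (on the boundary) ...
assert is_psd(np.block([[M, I], [I, Minv]]))
# ... and shrinking Y below M^{-1} breaks positive semidefiniteness,
# so any feasible Y must dominate M^{-1}.
Y_bad = Minv - 0.05 * np.eye(3)
assert not is_psd(np.block([[M, I], [I, Y_bad]]))
```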
Elfving's Theorem

In the presence of scalar observations (A_i = a_i^T), we have:

Theorem [Elfving, 53]
The design w is c-optimal iff there exists t > 0 such that

  t c = ∑_k w_k a_k ∈ ∂( conv{ ±a_i, i = 1, ..., s } ).

Equivalently:

  max_{λ, t} t
  s.t. t c = ∑_k λ_k a_k,
       ∑_k |λ_k| ≤ 1,   (with w_k = |λ_k|)

[Figure: the Elfving set conv{±a_1, ..., ±a_4} in the plane (θ_1, θ_2); the ray along c exits the set at t*c = (3/4) a_3 + (1/4)(−a_4).]
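The theorem can be verified numerically on a toy example of our own (not the talk's figure): with a_1 = (1,0), a_2 = (0,1) and c = (1,1), the ray t c exits conv{±a_1, ±a_2} at t* = 1/2 via (1/2)a_1 + (1/2)a_2, and the predicted c-optimal variance is t*^{-2} = 4:

```python
# Checking Elfving's theorem on a toy scalar-observation example:
# the brute-forced c-optimal design matches the geometric prediction
# w = (1/2, 1/2), with optimal variance t*^{-2}.
import numpy as np

a = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
c = np.array([1.0, 1.0])

def c_var(w1):
    M = w1 * np.outer(a[0], a[0]) + (1 - w1) * np.outer(a[1], a[1])
    return c @ np.linalg.inv(M) @ c

# brute-force the c-optimal design over the simplex
grid = np.linspace(0.01, 0.99, 99)
vals = [c_var(w) for w in grid]
w_star = grid[int(np.argmin(vals))]

t_star = 0.5                               # (1/2)a_1 + (1/2)a_2 = t*c
assert abs(w_star - 0.5) < 1e-9            # optimal weights (1/2, 1/2)
assert abs(min(vals) - t_star ** -2) < 1e-9
```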
General case

Theorem [DHL09], [Sag09,10]
The design w is c-optimal iff there exist t > 0 and unit vectors ε_k such that

  t c = ∑_k w_k A_k^T ε_k ∈ ∂( conv{ A_i^T ε, i = 1, ..., s, ‖ε‖ ≤ 1 } ).

[Figure: the generalized Elfving set in the plane, the convex hull of regions E_1, ..., E_4 with marked points a_ij; the optimal point t*c lies on the boundary, along direction c.]
An illustration in 3D

        x    y   z
A1 = (  1    0   0
        0    2   0  )

A2 = (  0   0.5  2  )

A3 = (  0  -0.5  2  )

[Figure: the uppermost half of the generalized Elfving set in (x, y, z) space, with points a1(1), a1(2), a2 and a3.]

If we want to estimate (x + 2z) −→ barycenter of a1(1), a2 and a3.
If we want to estimate (x + y + z) −→ barycenter of x1 and a2.
SOCP formulations

Corollary: SOCP for c-optimality [Sag09,10]
If μ* is the optimal dual variable associated to the constraints of

  max_{z ∈ R^m} c^T z
  s.t. ‖A_i z‖ ≤ 1 for all i ∈ [s],

then w := μ* / ∑_{k=1}^s μ*_k is a c-optimal design.

SOCP for A-optimality [Sag10]

  max_{U ∈ R^{m×m}} trace U
  s.t. ‖A_i U‖_F ≤ 1 for all i ∈ [s]
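The primal of the c-optimality program can be sketched with a general-purpose NLP solver; this is only an illustration on a toy example of our own, since scipy's SLSQP does not expose the dual variables μ* needed to recover w as in the corollary (a real SOCP solver would):

```python
# Solving max c^T z s.t. ||A_i z|| <= 1 on a toy scalar example with
# scipy's SLSQP. With A_1 = (1,0), A_2 = (0,1), c = (1,1), the optimum
# is z* = (1,1) with value 2.
import numpy as np
from scipy.optimize import minimize

A_list = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]
c = np.array([1.0, 1.0])

# one second-order constraint ||A_i z|| <= 1 per experiment,
# written as 1 - ||A_i z||^2 >= 0
cons = [{"type": "ineq",
         "fun": (lambda z, A=A: 1.0 - float((A @ z) @ (A @ z)))}
        for A in A_list]

res = minimize(lambda z: -(c @ z), x0=np.array([0.1, 0.1]),
               method="SLSQP", constraints=cons)
z_star = res.x
val = c @ z_star
assert np.allclose(z_star, [1.0, 1.0], atol=1e-4)
# for this example the c-optimal variance is val**2 = 4, matching the
# direct minimization of c^T M(w)^{-1} c, attained at w = (1/2, 1/2)
assert abs(val ** 2 - 4.0) < 1e-3
```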
The End
Thank you for your attention