Algorithmic aspects of large scale optimal experimental design

Guillaume SAGNOL, Zuse Institut Berlin (ZIB), February 8, 2012


Source: page.math.tu-berlin.de/~sagnol/papers/Ergo_Edinburgh.pdf



Outline

1 Optimal Design of Experiments

2 Application to Network Monitoring

3 Algorithms for Optimal Experimental Design
- Classic algorithms
- Recent developments
- New search directions


Optimal Design of Experiments

We want to estimate the vector of parameters $\theta = [\theta_1, \ldots, \theta_m]^T$.

We have $s$ experiments available:

$$y_i = A_i \theta + \varepsilon_i, \qquad i = 1, \ldots, s.$$

GOAL:

1 Choose how many times to perform each experiment.

2 Continuous relaxation on the unit s-simplex: distribute the experimental effort $w_i$ (with $\sum_{i=1}^{s} w_i = 1$).


Minimizing the variance of the best estimator

Under classical assumptions (e.g. noises have unit variance and are independent),

Theorem [Gauss-Markov]

For every linear unbiased estimator $\hat\theta$ of the unknown parameter $\theta$, we have:

$$\operatorname{Var}[\hat\theta] \succeq \left( \sum_{i=1}^{s} w_i A_i^T A_i \right)^{-1}.$$

Moreover, this lower bound is attained by the least-squares estimator.
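The bound can be checked numerically on a toy instance. The sketch below is illustrative only: the three scalar experiments, the repetition counts, and the weighted least-squares competitor are all hypothetical choices, and numpy is assumed to be available. It stacks the repeated experiments into one design matrix X, so that the information matrix is X^T X (the unnormalized counterpart of the weighted sum above), and compares the covariance of two linear unbiased estimators.

```python
import numpy as np

def estimator_covariance(B):
    """Covariance of theta_hat = B @ y when the noise covariance is the identity."""
    return B @ B.T

# Hypothetical data: 3 scalar experiments on m = 2 parameters, repeated n_i times.
A = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[1.0, 1.0]])]
counts = [2, 1, 3]
X = np.vstack([A_i for A_i, n_i in zip(A, counts) for _ in range(n_i)])

# Least-squares estimator: B_ls = (X^T X)^{-1} X^T.
info = X.T @ X                          # equals sum_i n_i A_i^T A_i
B_ls = np.linalg.solve(info, X.T)
cov_ls = estimator_covariance(B_ls)     # should equal info^{-1}

# Another linear unbiased estimator: weighted least squares with arbitrary weights.
D = np.diag(np.linspace(1.0, 4.0, X.shape[0]))
B_wls = np.linalg.solve(X.T @ D @ X, X.T @ D)
cov_wls = estimator_covariance(B_wls)

# Gauss-Markov: cov_wls - cov_ls is positive semidefinite (Loewner ordering).
gap_eigenvalues = np.linalg.eigvalsh(cov_wls - cov_ls)
print(gap_eigenvalues)
```

The eigenvalues of the gap should all be nonnegative up to rounding, which is what the Löwner inequality in the theorem asserts for this particular competitor.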


Two common criteria

Definition (information matrix): $M(w) := \sum_{i=1}^{s} w_i A_i^T A_i$ is the information matrix of $w$.

Common approach: minimize a scalar function of $M(w)^{-1}$.

Two standard criteria

c-optimality: given $c \in \mathbb{R}^m$,

$$\min \left\{ c^T M(w)^{-1} c \;:\; w \in \mathbb{R}^s_+,\ \sum_i w_i = 1 \right\}$$

(relevant when we want to estimate $c^T \theta$).

A-optimality:

$$\min \left\{ \operatorname{trace} M(w)^{-1} \;:\; w \in \mathbb{R}^s_+,\ \sum_i w_i = 1 \right\}$$
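As a minimal sketch of these definitions (assuming numpy; the three observation matrices and the two candidate designs are made-up examples, not data from the talk), the information matrix and both criteria can be evaluated directly:

```python
import numpy as np

def information_matrix(A, w):
    """M(w) = sum_i w_i A_i^T A_i for observation matrices A_i (each l_i x m)."""
    return sum(w_i * A_i.T @ A_i for A_i, w_i in zip(A, w))

def c_criterion(A, w, c):
    """c^T M(w)^{-1} c, the quantity minimized by a c-optimal design."""
    return float(c @ np.linalg.solve(information_matrix(A, w), c))

def a_criterion(A, w):
    """trace M(w)^{-1}, the quantity minimized by an A-optimal design."""
    return float(np.trace(np.linalg.inv(information_matrix(A, w))))

# Hypothetical instance: s = 3 scalar experiments, m = 2 parameters.
A = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[1.0, 1.0]])]
c = np.array([1.0, 1.0])

uniform = np.full(3, 1.0 / 3.0)
mostly_third = np.array([0.05, 0.05, 0.90])

for name, w in [("uniform", uniform), ("mostly_third", mostly_third)]:
    print(name, c_criterion(A, w, c), a_criterion(A, w))
```

On this instance the design concentrated on the third experiment does better for estimating c^T θ but worse for the trace criterion, which illustrates that the two criteria can rank designs differently.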


Geometric interpretation

Given an eigenvalue decomposition

$$M(w) = \sum_i \lambda_i u_i u_i^T,$$

the confidence ellipsoids of $\hat\theta$ are centered at $\theta$ and have semi-axes of length proportional to $1/\sqrt{\lambda_i}$.

[Figure: confidence ellipse centered at θ, with semi-axes along u_1 and u_2 of lengths 1/√λ_1 and 1/√λ_2.]

A-optimal design, $\Phi_A(w)$: $\min\ \frac{1}{\lambda_1} + \frac{1}{\lambda_2}$; minimize the diagonal of the bounding box.

c-optimal design, $\Phi_c(w)$: minimize the shadow along the vector $c$.

D-optimal design, $\Phi_D(w)$: $\max\ \lambda_1 \lambda_2$; minimize the volume.

E-optimal design, $\Phi_E(w)$: $\max\ \lambda_1$, with $\lambda_1$ taken as the smallest eigenvalue; minimize the largest semi-axis.
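The eigenvalue reading of the criteria can be sketched as follows (assuming numpy; the 2 x 2 matrix M is a hypothetical information matrix, and the dictionary keys are names chosen for this illustration):

```python
import numpy as np

def criteria_from_eigenvalues(M):
    """Read the A-, D- and E-criteria off the spectrum of a symmetric matrix M."""
    lam = np.linalg.eigvalsh(M)              # eigenvalues in ascending order
    return {
        "A": float(np.sum(1.0 / lam)),       # trace M^{-1} = sum_i 1 / lambda_i
        "D": float(np.prod(lam)),            # det M = prod_i lambda_i
        "E": float(lam[0]),                  # smallest eigenvalue
        "semi_axes": 1.0 / np.sqrt(lam),     # proportional to the ellipsoid semi-axes
    }

# Hypothetical information matrix with eigenvalues 1/3 and 1.
M = np.array([[2.0, 1.0], [1.0, 2.0]]) / 3.0
crit = criteria_from_eigenvalues(M)
print(crit)
```

Because `eigvalsh` returns eigenvalues in ascending order, the first semi-axis in the output is the longest one, the quantity an E-optimal design tries to shrink.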


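Summary

Given observation matrices $A_1, \ldots, A_s$, solve

$$\min_{w}\ \Phi\!\left( \sum_{i=1}^{s} w_i A_i^T A_i \right) \quad \text{s.t.}\quad w \ge 0,\ \sum_{i=1}^{s} w_i = 1.$$

$\Phi$ is a function mapping $X \in \mathbb{S}_m^+$ to $\mathbb{R}$ that is nonincreasing with respect to the Löwner ordering $\succeq$, e.g.

$\Phi(X) = \operatorname{trace} X^{-1}$ (A-optimality)

$\Phi(X) = c^T X^{-1} c$ (c-optimality)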


Optimal Network Monitoring

We want to monitor certain quantities on a network, e.g.
- volume of the flows
- performance indicator
- nature of the traffic

General Problem: How many monitors should we place on each Node/Edge/OD Pair of the network?


Example: OD-flows monitoring

We place a monitor on the blue edge ⟹ we collect information about 4 OD flows: 1→4, 1→5, 2→4, 2→5.


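Example: OD-flows monitoring (continued)

With columns indexed by the OD pairs (1→2, 1→4, 1→5, ···, 2→4, 2→5, ···):

$$A_{2\to 4} = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$

In some applications, monitors can't determine the source of the flows:

$$A_{2\to 4} = \begin{pmatrix} 0 & 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 \end{pmatrix}$$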
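The construction of such observation matrices can be sketched in a few lines. This is an illustration under stated assumptions: the function name `observation_matrix`, its `aggregate_by_destination` flag, and the five-column list of OD pairs (which drops the elided "···" columns) are all hypothetical, and numpy is assumed to be available.

```python
import numpy as np

def observation_matrix(od_pairs, observed, aggregate_by_destination=False):
    """0/1 observation matrix of a monitor that sees the OD pairs in `observed`.

    Without aggregation there is one row per observed OD flow; with aggregation,
    flows sharing a destination are summed into a single row (source unknown).
    """
    column = {od: j for j, od in enumerate(od_pairs)}
    if aggregate_by_destination:
        destinations = sorted({dest for _, dest in observed})
        groups = [[od for od in observed if od[1] == dest] for dest in destinations]
    else:
        groups = [[od] for od in observed]
    A = np.zeros((len(groups), len(od_pairs)), dtype=int)
    for row, group in enumerate(groups):
        for od in group:
            A[row, column[od]] = 1
    return A

# Hypothetical column order; the elided columns of the slide are left out.
od_pairs = [(1, 2), (1, 4), (1, 5), (2, 4), (2, 5)]
observed = [(1, 4), (1, 5), (2, 4), (2, 5)]

A_full = observation_matrix(od_pairs, observed)
A_agg = observation_matrix(od_pairs, observed, aggregate_by_destination=True)
print(A_full)
print(A_agg)
```

Each row of the aggregated matrix is the sum of the rows of the full matrix that share a destination, matching the second matrix above up to the omitted columns.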


BAG Project: Truck Toll in Germany

A Huge System
- Introduced in 2005
- Tax based on distance and truck category (avg price: 0.17 Euros/km)
- Total revenue around 3 billion Euros / year

Control Forces
- Automatic: 300 toll checker gantries (Kontrollbrücke)
- Manual: 300 vehicles, 540 officers from BAG (Federal Office of Freight)

Complexity of BAG's management
- Schedule of each officer
- Choose where and when to control


TODO: show NRW and Berlin, CPU time, what about whole Germany?


Wynn-Fedorov exchange algorithm

$$\min \left\{ \operatorname{trace} M(w)^{-1} \;:\; w \in \mathbb{R}^s_+,\ \sum_i w_i = 1,\ M(w) = \sum_i w_i A_i^T A_i \right\}$$

Algorithm (Wynn 70, Fedorov 72). This is a feasible descent algorithm:

At step $k$, find the atomic design $e_{i_k}$ such that the directional derivative is maximum:

$$i_k \leftarrow \operatorname*{argmax}_{i \in [s]} \ \| M(w^{(k)})^{-1} A_i^T \|_F$$

For a step of length $\alpha_k$, move in the direction of $e_{i_k}$:

$$w^{(k+1)} \leftarrow (1 - \alpha_k)\, w^{(k)} + \alpha_k e_{i_k}.$$
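The two steps above can be sketched as follows. This is a minimal illustration rather than a tuned solver: the step-size rule α_k = 1/(k+2), the uniform starting design, and the three-experiment instance are assumptions, since the slide leaves them unspecified; numpy is assumed to be available.

```python
import numpy as np

def information_matrix(A, w):
    """M(w) = sum_i w_i A_i^T A_i."""
    return sum(w_i * A_i.T @ A_i for A_i, w_i in zip(A, w))

def wynn_fedorov_a_optimal(A, n_iter=2000):
    """Vertex-direction descent on trace M(w)^{-1} (sketch with assumed step sizes)."""
    s = len(A)
    w = np.full(s, 1.0 / s)              # assumes the uniform design is nonsingular
    for k in range(n_iter):
        M_inv = np.linalg.inv(information_matrix(A, w))
        # The vertex e_i with the largest ||A_i M^{-1}||_F^2 is the steepest
        # descent direction for trace M(w)^{-1}.
        d = np.array([np.linalg.norm(A_i @ M_inv) ** 2 for A_i in A])
        i_k = int(np.argmax(d))
        alpha = 1.0 / (k + 2)            # assumed step-size rule (not from the slide)
        w = (1.0 - alpha) * w            # shrink all weights ...
        w[i_k] += alpha                  # ... and move mass towards e_{i_k}
    return w

# Hypothetical instance: 3 scalar experiments, 2 parameters.
A = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[1.0, 1.0]])]
w_wf = wynn_fedorov_a_optimal(A)
value_wf = float(np.trace(np.linalg.inv(information_matrix(A, w_wf))))
value_uniform = float(np.trace(np.linalg.inv(information_matrix(A, np.full(3, 1 / 3)))))
print(w_wf, value_wf, value_uniform)
```

Every iterate stays on the simplex because each update is a convex combination of the current design and a vertex. A line search for α_k would be a natural alternative to the fixed rule used here.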


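Titterington multiplicative algorithm

$$\min \left\{ \operatorname{trace} M(w)^{-1} \;:\; w \in \mathbb{R}^s_+,\ \sum_i w_i = 1,\ M(w) = \sum_i w_i A_i^T A_i \right\}$$

Algorithm (Titterington 76). At step $k$,

For all $i \in [s]$, compute the directional derivative

$$d_i \leftarrow \| M(w^{(k)})^{-1} A_i^T \|_F$$

For a parameter $\lambda$, make the multiplicative updates:

$$w_i^{(k+1)} \leftarrow \frac{w_i^{(k)}\, d_i^{\lambda}}{\Gamma},$$

where $\Gamma$ is a normalizing constant.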
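A minimal sketch of the multiplicative update, under assumptions: λ = 1 as the default exponent, a uniform starting design, Γ chosen so that the weights sum to one, and a made-up three-experiment instance (numpy assumed available).

```python
import numpy as np

def information_matrix(A, w):
    """M(w) = sum_i w_i A_i^T A_i."""
    return sum(w_i * A_i.T @ A_i for A_i, w_i in zip(A, w))

def multiplicative_a_optimal(A, lam=1.0, n_iter=500):
    """Multiplicative weight updates for trace M(w)^{-1} (sketch, assumed defaults)."""
    s = len(A)
    w = np.full(s, 1.0 / s)              # strictly positive starting design
    for _ in range(n_iter):
        M_inv = np.linalg.inv(information_matrix(A, w))
        d = np.array([np.linalg.norm(A_i @ M_inv) for A_i in A])  # ||A_i M^{-1}||_F
        w = w * d ** lam                 # multiplicative update
        w = w / w.sum()                  # division by the normalizing constant Gamma
    return w

# Hypothetical instance: 3 scalar experiments, 2 parameters.
A = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[1.0, 1.0]])]
w_mult = multiplicative_a_optimal(A)
value_mult = float(np.trace(np.linalg.inv(information_matrix(A, w_mult))))
print(w_mult, value_mult)
```

On this small instance the optimal value can be computed by hand as 2 + √3 (by symmetry the first two weights are equal, which reduces the problem to one variable), and the iterates should approach it. Unlike the vertex-direction scheme, this update changes all weights at once but can never revive a weight that has reached zero.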


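Semidefinite Programming

$$\min \left\{ \operatorname{trace} M(w)^{-1} \;:\; w \in \mathbb{R}^s_+,\ \sum_i w_i = 1,\ M(w) = \sum_i w_i A_i^T A_i \right\}$$

A-optimality SDP (Vandenberghe, Boyd, Wu 98)

$$\min_{w,\ Y \in \mathbb{S}_m}\ \operatorname{trace} Y \quad \text{s.t.}\quad \begin{pmatrix} M(w) & I \\ I & Y \end{pmatrix} \succeq 0,\quad \sum_{i=1}^{s} w_i = 1,\quad \forall i \in [s],\ w_i \ge 0.$$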
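The reason this SDP is equivalent to the original problem can be sketched with a Schur complement argument; the derivation below assumes $M(w) \succ 0$, which is needed for $M(w)^{-1}$ to exist in the first place.

```latex
% Schur complement of the block M(w), assuming M(w) \succ 0:
\begin{pmatrix} M(w) & I \\ I & Y \end{pmatrix} \succeq 0
\iff Y - I\, M(w)^{-1} I \succeq 0
\iff Y \succeq M(w)^{-1}.

% The trace is monotone for the Loewner ordering, so for a fixed w
% the best choice is Y = M(w)^{-1}:
\min_{Y} \left\{ \operatorname{trace} Y : Y \succeq M(w)^{-1} \right\}
  = \operatorname{trace} M(w)^{-1}.
```

Minimizing over $w$ as well then recovers the A-optimality problem, with the nonlinear inverse replaced by a linear matrix inequality in $(w, Y)$.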


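Elfving's Theorem

In the presence of scalar observations ($A_i = a_i^T$), we have:

Theorem [Elfving, 53]. The design $w$ is c-optimal iff there exist $t > 0$ and signs $\varepsilon_k \in \{-1, +1\}$ such that

$$tc = \sum_k w_k \varepsilon_k a_k \in \partial\left( \operatorname{conv}\{ \pm a_i,\ i = 1, \ldots, s \} \right).$$

Equivalently, solve the linear program

$$\max_{\lambda,\, t}\ t \quad \text{s.t.}\quad tc = \sum_k \lambda_k a_k,\quad \sum_k |\lambda_k| \le 1,$$

and set $w_k = |\lambda_k|$.

[Figure: the Elfving set conv{±a_1, ..., ±a_4} in the (θ_1, θ_2) plane; the ray along c leaves the set at t*c = (3/4) a_3 + (1/4)(−a_4).]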
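A tiny numerical check of the theorem, on a hypothetical instance chosen so that the geometry is easy to verify by hand: with a_1 and a_2 the canonical basis vectors of R^2, the Elfving set is the unit l1-ball, so the ray along c leaves it at t* = 1/||c||_1. The sketch (assuming numpy) recovers the design from that boundary point and compares its c-criterion with 1/t*^2, the optimal value that Elfving's theorem is known to give.

```python
import numpy as np

# Hypothetical instance: a_1 = e_1, a_2 = e_2, so conv{±a_1, ±a_2} is the l1-ball.
a = np.array([[1.0, 0.0], [0.0, 1.0]])
c = np.array([1.0, 1.0])

t_star = 1.0 / np.abs(c).sum()          # boundary point of the l1-ball along c
lam = t_star * c                        # t* c = sum_k lam_k a_k (a_k = e_k here)
w = np.abs(lam)                         # design weights w_k = |lam_k|

def c_value(weights):
    """c^T M(w)^{-1} c for scalar observations a_k^T theta."""
    M = sum(w_k * np.outer(a_k, a_k) for w_k, a_k in zip(weights, a))
    return float(c @ np.linalg.solve(M, c))

value = c_value(w)

# Brute-force comparison over a grid of other designs (w_1, 1 - w_1).
grid = np.linspace(0.01, 0.99, 99)
best_on_grid = min(c_value(np.array([w1, 1.0 - w1])) for w1 in grid)
print(w, value, 1.0 / t_star ** 2, best_on_grid)
```

Here the design is w = (1/2, 1/2), and no design on the grid does better, as the theorem predicts.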


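General case

Theorem [DHL09], [Sag09,10]. The design $w$ is c-optimal iff there exist $t > 0$ and unit vectors $\varepsilon_k$ such that

$$tc = \sum_k w_k A_k^T \varepsilon_k \in \partial\left( \operatorname{conv}\{ A_i^T \varepsilon,\ i = 1, \ldots, s,\ \|\varepsilon\| \le 1 \} \right).$$

[Figure: the ellipsoids E_1, ..., E_4 generated by the observation matrices, their convex hull, the vector c, and the boundary point t*c.]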


An illustration in 3D

With columns indexed by $(x, y, z)$:

$$A_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \end{pmatrix}, \qquad A_2 = \begin{pmatrix} 0 & 0.5 & 2 \end{pmatrix}, \qquad A_3 = \begin{pmatrix} 0 & -0.5 & 2 \end{pmatrix}$$

[Figure: the uppermost half of the generalized Elfving set in (x, y, z) space, with the points a_1^(1), a_1^(2), a_2 and a_3.]

If we want to estimate $x + 2z$ ⟶ barycenter of $a_1^{(1)}$, $a_2$ and $a_3$.

If we want to estimate $x + y + z$ ⟶ barycenter of $x_1$ and $a_2$.


SOCP formulations

Corollary: SOCP for c-optimality [Sag09,10]. If $\mu^*$ is the optimal dual variable associated with the constraints of

$$\max_{z \in \mathbb{R}^m}\ c^T z \quad \text{s.t.}\quad \forall i \in [s],\ \|A_i z\| \le 1,$$

then $w := \dfrac{\mu^*}{\sum_{k=1}^{s} \mu_k^*}$ is a c-optimal design.

SOCP for A-optimality [Sag10]

$$\max_{U \in \mathbb{R}^{m \times m}}\ \operatorname{trace} U \quad \text{s.t.}\quad \forall i \in [s],\ \|A_i U\|_F \le 1$$
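A worked instance of the c-optimality corollary may help fix ideas. The data below are a hypothetical toy case (two scalar observations in the plane), and the dual variables are obtained from the stationarity condition of the Lagrangian rather than from a solver.

```latex
% Hypothetical instance: A_1 = (1 \;\; 0), \quad A_2 = (0 \;\; 1), \quad c = (1, 1)^T.
\max_{z \in \mathbb{R}^2} \; z_1 + z_2
\quad \text{s.t.} \quad |z_1| \le 1, \; |z_2| \le 1
\qquad \Longrightarrow \qquad z^* = (1, 1)^T, \quad c^T z^* = 2.

% Stationarity of L(z, \mu) = c^T z + \sum_i \mu_i \left( 1 - \|A_i z\| \right) at z^*:
c = \sum_i \mu_i^* \, \frac{A_i^T A_i z^*}{\|A_i z^*\|}
  = \mu_1^* \begin{pmatrix} 1 \\ 0 \end{pmatrix}
  + \mu_2^* \begin{pmatrix} 0 \\ 1 \end{pmatrix}
\qquad \Longrightarrow \qquad \mu^* = (1, 1)^T.

% Normalizing the dual variables gives the design:
w = \frac{\mu^*}{\mu_1^* + \mu_2^*} = \left( \tfrac12, \tfrac12 \right)^T,
\qquad
c^T M(w)^{-1} c = c^T \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} c = 4 = \left( c^T z^* \right)^2.
```

In this instance the c-criterion of the recovered design equals the square of the SOCP optimal value, and a direct computation confirms that $w = (1/2, 1/2)$ minimizes $1/w_1 + 1/w_2$ over the simplex.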


The End

Thank you for your attention