Rare event simulation
A Point Process interpretation with application in probability and quantile estimation and metamodel based algorithms
Séminaire S3 | Clément WALTER
March 13th 2015
Introduction
Problem setting:
- X: a random vector with known distribution µX
- g: a "black-box" function representing a computer code, g : R^d → R
- Y = g(X): the real-valued random variable describing the state of the system; its distribution µY is unknown

Uncertainty Quantification: F = {x ∈ R^d | g(x) > q}
- find p = P[X ∈ F] = µX(F) for a given q
- find q for a given p

Issues
- p = µX(F) = µY([q, +∞)) ≪ 1
- getting F or µY requires calls to g, which is time-costly
- the Monte Carlo estimator has a squared coefficient of variation δ² ≈ 1/(Np) ⇒ N ≫ 1/p
Séminaire S3 | March 13th 2015 | PAGE 1/26
Introduction
Two main directions to overcome this issue:
- learn a metamodel of g
- use variance-reduction techniques to estimate p: importance sampling, multilevel splitting

Multilevel Splitting (Subset Simulation)
Write the sought probability p as a product of larger probabilities and estimate them with MCMC: let (qi)i=0..m be an increasing sequence with q0 = −∞ and qm = q:

p = P[g(X) > q] = P[g(X) > qm | g(X) > qm−1] × P[g(X) > qm−1]
  = P[g(X) > qm | g(X) > qm−1] × ··· × P[g(X) > q1]

⇒ How to choose (qi)i: beforehand or on-the-go? Optimality? Bias?
Introduction
A typical MS algorithm works as follows:
1. Sample a Monte Carlo population (Xi)i of size N; y = (g(X1), ..., g(XN)); j = 0
2. Estimate the conditional probability P[g(X) > qj+1 | g(X) > qj]
3. Resample the Xi such that g(Xi) ≤ qj+1 conditionally on being greater than qj+1 (the other ones do not change)
4. j ← j + 1 and repeat until j = m

⇒ Parallel computation at each iteration in the resampling step
Minimal variance when all conditional probabilities are equal [4]

Adaptive Multilevel Splitting: qj+1 ← y(p0·N) or qj+1 ← y(k)
- empirical quantile of order p0 ∈ (0, 1) ⇒ bias [4, 1]; the number of subsets converges toward the constant log p / log p0
- empirical quantile of order k/N ⇒ no bias and a CLT [3, 2]
- minimal variance with k = 1 (Last Particle Algorithm [5, 6]); the number of subsets then follows a Poisson law with parameter −N log p ⇒ disables parallel computation
Outline
1 Increasing random walk
2 Probability estimation
3 Quantile estimation
4 Design points
5 Conclusion
Increasing random walk
Definition

Definition
Let Y be a real-valued random variable with distribution µY and cdf F (assumed to be continuous). One considers the Markov chain (with Y0 = −∞) such that:

∀n ∈ N: P[Yn+1 ∈ A | Y0, ..., Yn] = µY(A ∩ (Yn, +∞)) / µY((Yn, +∞))

i.e. Yn+1 is randomly greater than Yn: Yn+1 ∼ µY(· | Y > Yn)

- The Tn = −log(P(Y > Yn)) are distributed as the arrival times of a Poisson process with parameter 1 [5, 7]
- Time in the Poisson process is linked with rarity in probability
- The number of events My before y ∈ R is related to P[Y > y] = py:

My = M_{t = −log py} ∼ P(−log py)
Increasing random walk
Definition

First consequence: the number of simulations needed to get a realisation of a random variable above a given threshold (an event with probability p) follows a Poisson law P(log 1/p) instead of a Geometric law G(p).
Figure: Comparison of Poisson and Geometric densities with p = 0.0228
Increasing random walk
Example

Y ∼ N(0, 1); p = P[Y > 2] ≈ 2.28 × 10⁻²; 1/p ≈ 43.96; −log p ≈ 3.78
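The example above is easy to reproduce by simulation. A minimal sketch in Python (our own illustration, not the deck's code), using exact conditional sampling through the inverse cdf of the standard normal:

```python
import random
from statistics import NormalDist

nd = NormalDist()  # Y ~ N(0, 1)

def count_events(q, rng):
    """M_q: number of events of one increasing random walk strictly below q."""
    m, y = 0, float("-inf")  # Y_0 = -inf
    while True:
        # exact conditional sampling via the inverse cdf: Y_{n+1} ~ Y | Y > y
        tail = 1.0 if y == float("-inf") else 1.0 - nd.cdf(y)
        y = nd.inv_cdf(1.0 - rng.random() * tail)
        if y >= q:
            return m
        m += 1

rng = random.Random(0)
counts = [count_events(2.0, rng) for _ in range(2000)]
mean = sum(counts) / len(counts)
print(mean)  # close to -log p ~ 3.78 (a Poisson mean, not 1/p ~ 44)
```

The empirical mean of the counts sits near −log p ≈ 3.78, illustrating the Poisson law P(log 1/p) of the first consequence above.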
Probability estimation
Definition of the estimator

Concept
The number of events before time t = −log(P(Y > q)) follows a Poisson law with parameter t ⇒ estimate the parameter of a Poisson law.

Let (Mi)i=1..N be N iid copies of the number of events at time t = −log p: Mi ∼ P(−log p); Mq = Σ_{i=1}^{N} Mi ∼ P(−N log p)

The estimator of −log p is (1/N) Σ_{i=1}^{N} Mi = Mq/N, which leads to p̂ = (1 − 1/N)^{Mq}

⇒ Last Particle estimator, but with a parallel implementation
⇒ One has indeed an estimator of P[Y > q0] for all q0 ≤ q:

FN(y) = 1 − (1 − 1/N)^{My} → F(y) almost surely as N → ∞
Probability estimation
Example

Y ∼ N(0, 1); p = P[Y > 2] ≈ 2.28 × 10⁻²; 1/p ≈ 43.96; −log p ≈ 3.78
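The estimator p̂ = (1 − 1/N)^{Mq} can be sketched in a few lines on this example (our own toy illustration, assuming exact conditional sampling of the standard normal):

```python
import random
from statistics import NormalDist

nd = NormalDist()  # Y ~ N(0, 1)

def count_events(q, rng):
    """One walk's event count M_i below q (exact conditional sampling)."""
    m, y = 0, float("-inf")
    while True:
        tail = 1.0 if y == float("-inf") else 1.0 - nd.cdf(y)
        y = nd.inv_cdf(1.0 - rng.random() * tail)  # Y | Y > y
        if y >= q:
            return m
        m += 1

N, q = 2000, 2.0
rng = random.Random(1)
M_q = sum(count_events(q, rng) for _ in range(N))  # ~ P(-N log p)
p_hat = (1.0 - 1.0 / N) ** M_q
print(p_hat)  # close to p = P[Y > 2] ~ 0.0228
```

The N walks are independent, so the Mi can be generated in parallel, which is the point of the construction.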
Probability estimation
Practical implementation

- Ideally, one knows how to generate the Markov chain exactly
- In practice, Y = g(X) and one can use Markov chain sampling (e.g. Metropolis-Hastings or Gibbs algorithms)
- This requires working with a population to get starting points
⇒ batches of k random walks are generated together

Generating k random walks
Require: k, q
Generate k copies (Xi)i=1..k according to µX; Y ← (g(X1), ..., g(Xk)); M ← (0, ..., 0)
while min Y < q do
    ind ← which(Y < q)
    for i in ind do
        Mi ← Mi + 1
        Generate X* ∼ µX(· | g(X) > g(Xi))
        Xi ← X*; Yi ← g(X*)
    end for
end while
Return M, (Xi)i=1..k, (Yi)i=1..k

⇒ each sample is resampled according to its own level
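When only the black-box g is available, the conditional step in the pseudocode above must itself be approximated by Metropolis-Hastings. A minimal sketch in Python (the toy g, the Gaussian random-walk kernel, its step size and the burn-in T are our illustrative choices, not prescribed by the slides):

```python
import math
import random

def g(x):
    """Toy black-box code: with X ~ N(0,1)^2, Y = g(X) = x1 is N(0, 1)."""
    return x[0]

def mh_above(x, level, T, rng, step=1.0):
    """T Metropolis-Hastings steps targeting mu_X restricted to {g > level}."""
    for _ in range(T):
        cand = [xi + step * rng.gauss(0.0, 1.0) for xi in x]
        if g(cand) <= level:
            continue  # outside the conditional support: reject
        # standard-normal target: accept with the usual density ratio
        log_ratio = 0.5 * (sum(v * v for v in x) - sum(v * v for v in cand))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = cand
    return x

def batch_walks(k, q, T, rng):
    """One batch of k increasing random walks, as in the pseudocode above."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(k)]
    Y = [g(x) for x in X]
    M = [0] * k
    while min(Y) < q:
        for i in range(k):
            if Y[i] < q:
                M[i] += 1
                X[i] = mh_above(X[i], Y[i], T, rng)  # resample above its own level
                Y[i] = g(X[i])
    return M

rng = random.Random(2)
N, k = 500, 50
M_q = sum(sum(batch_walks(k, 2.0, 20, rng)) for _ in range(N // k))
p_hat = (1.0 - 1.0 / N) ** M_q
print(p_hat)  # roughly P[Y > 2] ~ 0.0228, up to MCMC error
```

Note the caveat of any MCMC variant: if every proposal is rejected the walk keeps its current state and simply retries at the next event, which introduces a small correlation absent from the exact-sampling analysis.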
Probability estimation
Practical implementation

Convergence of the Markov chain (for conditional sampling) is improved when the starting point already follows the targeted distribution; 2 possibilities:
- store each state (Xi)i and its corresponding level
- re-draw only the smallest Xi (⇒ Last Particle Algorithm)
⇒ LPA is only one possible implementation of this estimator

Computing time with the LPA implementation
Let tpar be the random time needed to generate N random walks by batches of size k = N/nc (nc standing for the number of cores) with burn-in T:

tpar = max of nc RV ∼ P(−k log p)

E[tpar] ≈ (T (log p)²)/(nc δ²) × (1 + √(nc δ²/(log p)²) √(2 log nc) + 1/(T log 1/p))
Probability estimation
Comparison

Mean computing time against coefficient of variation: the cost of an algorithm is the number of samples generated in a row by one core. We assume nc ≥ 1 cores and a burn-in T for Metropolis-Hastings.

Algorithm   | Time                          | Coef. of var. δ²              | Time vs δ
Monte Carlo | N/nc                          | 1/(Np)                        | 1/(nc p δ²)
AMS         | T (log p/log p0) N(1−p0)/nc   | (log p/log p0)(1−p0)/(N p0)   | [(1−p0)²/(p0 (log p0)²)] T(log p)²/(nc δ²)
LPA         | −T N log p                    | −log p/N                      | T(log p)²/δ²
Random walk | −T N log p/nc                 | −log p/N                      | T(log p)²/(nc δ²)

- best AMS when p0 → 1
- LPA achieves the theoretically best AMS limit but is not parallel
- the random walk approach amounts to taking p0 → 1 while keeping the parallel implementation
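The "Time vs δ" column can be checked numerically; a small sketch with our own helper names:

```python
import math

def cost_mc(p, delta2, nc):
    """Monte Carlo: N/nc with N = 1/(p * delta2)."""
    return 1.0 / (nc * p * delta2)

def cost_ams(p, delta2, nc, T, p0):
    """Adaptive multilevel splitting with quantile order p0."""
    lp, lp0 = math.log(p), math.log(p0)
    return (1.0 - p0) ** 2 / (p0 * lp0 ** 2) * T * lp ** 2 / (nc * delta2)

def cost_lpa(p, delta2, T):
    """Last Particle Algorithm: sequential, no nc speed-up."""
    return T * math.log(p) ** 2 / delta2

def cost_rw(p, delta2, nc, T):
    """Batched increasing random walks: LPA cost divided by nc."""
    return T * math.log(p) ** 2 / (nc * delta2)

p, delta2, nc, T = 1e-6, 0.01, 8, 20
# AMS cost decreases toward the random-walk cost as p0 -> 1
print(cost_ams(p, delta2, nc, T, 0.1), cost_ams(p, delta2, nc, T, 0.99), cost_rw(p, delta2, nc, T))
```

The prefactor (1−p0)²/(p0 (log p0)²) tends to 1 as p0 → 1, which is the "best AMS" limit the slide refers to.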
Quantile estimation
Definition

Concept
Approximate a time t = −log p with the times of a Poisson process; the higher the rate, the denser the discretisation of [0, +∞).

The center of the interval [T_{Mt}, T_{Mt+1}] converges toward a random variable centred at t with a symmetric pdf:

q̂ = (1/2)(Ym + Ym+1) with m = ⌊E[Mq]⌋ = ⌊−N log p⌋

CLT: √N (q̂ − q) → N(0, −p² log p / f(q)²) in distribution as m → ∞

Bounds on the bias in O(1/N)
Quantile estimation
Example

Y ∼ N(0, 1); p = P[Y > 2] ≈ 2.28 × 10⁻²; 1/p ≈ 43.96; −log p ≈ 3.78
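On this example the quantile estimator can be sketched with a last-particle style simulation: N particles whose successive minima are the events of a rate-N Poisson process (our own toy illustration, with exact conditional resampling):

```python
import math
import random
from statistics import NormalDist

nd = NormalDist()  # Y ~ N(0, 1)

def quantile_hat(p, N, rng):
    """q_hat = (Y_m + Y_{m+1}) / 2 with m = floor(-N log p)."""
    m = math.floor(-N * math.log(p))
    particles = [rng.gauss(0.0, 1.0) for _ in range(N)]
    events = []  # Y_1 <= Y_2 <= ...: successive minima of the population
    for _ in range(m + 1):
        i = min(range(N), key=particles.__getitem__)
        y = particles[i]
        events.append(y)
        # exact conditional resampling of the minimum: Y | Y > y
        tail = 1.0 - nd.cdf(y)
        particles[i] = nd.inv_cdf(1.0 - rng.random() * tail)
    return 0.5 * (events[m - 1] + events[m])

q_hat = quantile_hat(0.0228, 100, random.Random(3))
print(q_hat)  # close to q = 2
```

With N = 100 and p ≈ 0.0228, m = ⌊−N log p⌋ = 378 events are needed, and the CLT above gives a standard deviation of roughly √(−p² log p)/f(q)/√N ≈ 0.08 for q̂.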
Design points
Algorithm

No need for exact sampling if the goal is only to get failing samples ⇒ use a metamodel for conditional sampling.

Getting Nfail failing samples
Sample a minimal-sized DoE
Learn a first metamodel with trend = failure
for Nfail times do                          ▷ simulate the random walks one after the other
    Sample X1 ∼ µX; y1 = g(X1); m = 1; train the metamodel
    while ym < q do
        Xm+1 = Xm; ym+1 = ym
        for T times do                      ▷ pseudo burn-in
            X* ∼ K(Xm+1, ·); y* = g̃(X*)     ▷ K is a kernel for Markov chain sampling; g̃ is the metamodel
            if y* > ym+1 then ym+1 = y*; Xm+1 = X*
        end for
        ym+1 = g(Xm+1); train the metamodel
        if ym+1 < ym then Xm+1 = Xm; ym+1 = ym
        m = m + 1
    end while
end for
Design points
Example

Parabolic limit-state function: g : x ∈ R² ↦ 5 − x2 − 0.5(x1 − 0.1)²
Design points
Example

A two-dimensional four-branch serial system:

g : x ∈ R² ↦ min( 3 + (x1 − x2)²/10 − |x1 + x2|/√2 , 7/√2 − |x1 − x2| )
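Both example limit-state functions above are cheap to code, which is what makes them convenient benchmarks; a direct transcription in Python:

```python
import math

def g_parabolic(x):
    """Parabolic limit-state function on R^2."""
    return 5.0 - x[1] - 0.5 * (x[0] - 0.1) ** 2

def g_four_branch(x):
    """Four-branch serial system (two expressions thanks to absolute values)."""
    return min(
        3.0 + (x[0] - x[1]) ** 2 / 10.0 - abs(x[0] + x[1]) / math.sqrt(2.0),
        7.0 / math.sqrt(2.0) - abs(x[0] - x[1]),
    )

print(g_parabolic((0.1, 0.0)))    # 5.0
print(g_four_branch((0.0, 0.0)))  # 3.0
```

At the origin the four-branch system takes the value min(3, 7/√2 ≈ 4.95) = 3, and moving far along the diagonal drives it negative, which is where the failing samples live.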
Conclusion

Conclusion
- One considers Markov chains instead of samples ⇒ N is a number of processes
- This lets us define parallel estimators for probabilities and quantiles (and moments [8])
- These are twins of the Monte Carlo estimators with a "log attribute": similar statistical properties, but the 1/p factor becomes log 1/p:

var[p̂MC] ≈ p²/(Np)  →  var[p̂] ≈ p² log(1/p)/N

var[q̂MC] ≈ p²/(N p f(q)²)  →  var[q̂] ≈ p² log(1/p)/(N f(q)²)
Conclusion

Perspectives
- Adaptation of the quantile estimator to optimisation problems (min or max)
- The problem of conditional simulations (Metropolis-Hastings)
- Best use of a metamodel
- Adaptation to discontinuous random variables
Bibliography

[1] S-K Au and J L Beck. Estimation of small failure probabilities in high dimensions by subset simulation. Probabilistic Engineering Mechanics, 16(4):263–277, 2001.
[2] Charles-Edouard Bréhier, Ludovic Goudenege, and Loic Tudela. Central limit theorem for adaptative multilevel splitting estimators in an idealized setting. arXiv preprint arXiv:1501.01399, 2015.
[3] Charles-Edouard Bréhier, Tony Lelievre, and Mathias Rousset. Analysis of adaptive multilevel splitting algorithms in an idealized case. arXiv preprint arXiv:1405.1352, 2014.
[4] F Cérou, P Del Moral, T Furon, and A Guyader. Sequential Monte Carlo for rare event estimation. Statistics and Computing, 22(3):795–808, 2012.
[5] A Guyader, N Hengartner, and E Matzner-Løber. Simulation and estimation of extreme quantiles and extreme probabilities. Applied Mathematics & Optimization, 64(2):171–196, 2011.
[6] Eric Simonnet. Combinatorial analysis of the adaptive last particle method. Statistics and Computing, pages 1–20, 2014.
[7] Clément Walter. Moving Particles: a parallel optimal Multilevel Splitting method with application in quantiles estimation and meta-model based algorithms. To appear in Structural Safety, 2014.
[8] Clément Walter. Point process-based estimation of kth-order moment. arXiv preprint arXiv:1412.6368, 2014.
Thank you!
Commissariat à l'énergie atomique et aux énergies alternatives
CEA, DAM, DIF, F-91297 Arpajon, France
Établissement public à caractère industriel et commercial | RCS Paris B 775 685 019