
  • A Parallel Ensemble Kalman Filter Implementation Based On Modified Cholesky Decomposition

    Elias D. Nino-Ruiz and Adrian Sandu
    Computational Science Laboratory,
    Virginia Polytechnic Institute and State University, Blacksburg, VA 24060, USA

    [email protected], [email protected]

    November 16, 2015 (http://csl.cs.vt.edu)

  • Table of contents

    Problem Formulation

    Local Ensemble Transform Kalman Filter

    EnKF Based on Modified Cholesky Decomposition
      Estimating B^{-1}
      EnKF implementations based on MC
      Domain decomposition

    Experimental Settings

    Conclusions

  • Problem Formulation I

    - We want to estimate the true state x^* ∈ R^{n×1} of a dynamical system,

      x^*_k = M_{t_{k-1} → t_k}(x^*_{k-1}),

      where n is the model dimension.

    - Based on (assuming Gaussian errors):
      - A prior estimate (the best estimate prior to the measurements):

        x^b = x^* + ξ, with ξ ∼ N(0_n, B),

        where B ∈ R^{n×n} is unknown.
      - A noisy observation:

        y = H(x^*) + ε, with ε ∼ N(0_m, R),

        where m is the number of observed model components, R ∈ R^{m×m} is the data error covariance matrix, and H : R^{n×1} → R^{m×1} is the observation operator, with m ≪ n or m < n.

  • Problem Formulation II

    - Bayesian approximation (posterior state):

      x^a = x^b + B · H^T · [R + H · B · H^T]^{-1} · d ∈ R^{n×1},

      where d = y − H(x^b) ∈ R^{m×1} and H ∈ R^{m×n} is a linear approximation of H.

    - How can we estimate B?

    - Background error statistics of any model state x ∈ R^{n×1}:

      x ∼ N(x^b, B).
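  • The posterior update above can be sketched in a few lines of NumPy (a minimal illustration with a synthetic covariance B, error covariance R, and a linear observation operator H; all names and values here are illustrative, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 3                                      # model and observation dimensions

# Synthetic SPD background covariance B, observation error covariance R,
# linear observation operator H, true state x*, prior xb, and observation y.
A = rng.standard_normal((n, n))
B = A @ A.T + n * np.eye(n)
R = 0.5 * np.eye(m)
H = rng.standard_normal((m, n))
xstar = rng.standard_normal(n)
xb = xstar + rng.multivariate_normal(np.zeros(n), B)
y = H @ xstar + rng.multivariate_normal(np.zeros(m), R)

# x^a = x^b + B H^T [R + H B H^T]^{-1} (y - H x^b)
d = y - H @ xb                                   # innovation
K = B @ H.T @ np.linalg.inv(R + H @ B @ H.T)     # gain matrix
xa = xb + K @ d
```

Since the update is linear, the posterior innovation satisfies y − H x^a = (I − H K) d exactly, which is a convenient sanity check.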

  • Problem Formulation III

    - Empirical moments of an ensemble of N members:

      X^b = [x^{b[1]}, x^{b[2]}, …, x^{b[N]}] ∈ R^{n×N},

      x^b ≈ x̄^b = (1/N) Σ_{i=1}^{N} x^{b[i]} ∈ R^{n×1},   B ≈ P^b = S · S^T ∈ R^{n×n},

      where S = (1/√(N−1)) · [X^b − x̄^b ⊗ 1_N^T] ∈ R^{n×N}.
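  • The empirical moments can be computed directly from the ensemble matrix (a minimal NumPy sketch with a synthetic ensemble; dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 10, 5                                      # model dimension, ensemble size

Xb = rng.standard_normal((n, N))                  # columns are the members x^{b[i]}

xb_mean = Xb.mean(axis=1)                         # x̄^b = (1/N) Σ x^{b[i]}
S = (Xb - xb_mean[:, None]) / np.sqrt(N - 1)      # scaled perturbation matrix
Pb = S @ S.T                                      # P^b = S S^T ≈ B
```

Note that rank(P^b) ≤ N − 1 because the centered columns of S sum to zero, which is exactly the low-rank issue discussed on the next slides.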

  • Problem Formulation IV

    - The posterior (analysis) ensemble:

      X^a = X^b + P^b · H^T · [R + H · P^b · H^T]^{-1} · D ∈ R^{n×N},

      where the i-th column of D ∈ R^{m×N} reads

      d^{[i]} = y + ε^{[i]} − H · x^{b[i]} ∈ R^{m×1}, for 1 ≤ i ≤ N,

      with (stochastic version of the filter):

      ε^{[i]} ∼ N(0_m, R).
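  • The stochastic (perturbed-observations) ensemble update can be sketched as follows (a NumPy illustration with synthetic Xb, H, R, and y; not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, N = 10, 4, 5

Xb = rng.standard_normal((n, N))                  # background ensemble
H = rng.standard_normal((m, n))                   # linear observation operator
R = 0.25 * np.eye(m)
y = rng.standard_normal(m)

# Empirical covariance P^b = S S^T from the ensemble perturbations.
xb_mean = Xb.mean(axis=1)
S = (Xb - xb_mean[:, None]) / np.sqrt(N - 1)
Pb = S @ S.T

# d^{[i]} = y + eps^{[i]} - H x^{b[i]},  eps^{[i]} ~ N(0, R)
eps = rng.multivariate_normal(np.zeros(m), R, size=N).T   # m x N
D = y[:, None] + eps - H @ Xb

# X^a = X^b + P^b H^T [R + H P^b H^T]^{-1} D
Xa = Xb + Pb @ H.T @ np.linalg.solve(R + H @ Pb @ H.T, D)
```

Using `np.linalg.solve` on the m×m innovation covariance avoids forming its explicit inverse.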

  • Problem Formulation V

    - Unfortunately, the number of samples N is much smaller than the model dimension: n ≫ N.

    - P^b is low-rank, which leads to spurious correlations.

    - X^a is computed in the ensemble space (few degrees of freedom).

    - Model dimensions are on the order of billions, while ensemble sizes are on the order of hundreds.

    - Model propagations are computationally expensive.

    - The computational effort of the analysis is high.

    - We need HPC not only to speed up the computations, but also to have enough memory to represent the ensemble members and to perform the linear algebra computations.

  • Local Ensemble Transform Kalman Filter (LETKF) I

    - One of the best parallel ensemble-based implementations.

    - Analysis equations:
      - Perturbations: U = X^b − x̄^b ⊗ 1_N^T ∈ R^{n×N}.
      - Optimality in the ensemble space: Q = H · U ∈ R^{m×N},

        P̃^a = [(N − 1) · I_{N×N} + Q^T · R^{-1} · Q]^{-1} ∈ R^{N×N},

        w^a = P̃^a · Q^T · R^{-1} · [y − H · x̄^b] ∈ R^{N×1},

        W = w^a ⊗ 1_N^T + W^a ∈ R^{N×N},   W^a = [(N − 1) · P̃^a]^{1/2} ∈ R^{N×N}.

    - Analysis ensemble:

      X^a = x̄^b ⊗ 1_N^T + U · W ∈ R^{n×N}.

    - Domain localization [OHS+04, Kep00].
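  • The LETKF transform above can be sketched in NumPy (synthetic ensemble and observations; the symmetric square root is taken via an eigendecomposition, one common choice, though implementations may differ):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, N = 10, 4, 5

Xb = rng.standard_normal((n, N))
H = rng.standard_normal((m, n))
R = 0.25 * np.eye(m)
y = rng.standard_normal(m)

xb_mean = Xb.mean(axis=1)
U = Xb - xb_mean[:, None]                 # perturbations, n x N
Q = H @ U                                 # observed perturbations, m x N

# P̃^a = [(N-1) I + Q^T R^{-1} Q]^{-1}, computed in the N-dimensional space
Pa_tilde = np.linalg.inv((N - 1) * np.eye(N) + Q.T @ np.linalg.solve(R, Q))

# w^a = P̃^a Q^T R^{-1} [y - H x̄^b]
wa = Pa_tilde @ Q.T @ np.linalg.solve(R, y - H @ xb_mean)

# W^a = [(N-1) P̃^a]^{1/2} via symmetric eigendecomposition
lam, V = np.linalg.eigh((N - 1) * Pa_tilde)
Wa = V @ np.diag(np.sqrt(lam)) @ V.T

W = wa[:, None] + Wa                      # w^a ⊗ 1_N^T + W^a
Xa = xb_mean[:, None] + U @ W             # analysis ensemble
```

All inverses here live in the N×N ensemble space, which is what makes the method cheap when N ≪ n.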

  • Local Ensemble Transform Kalman Filter (LETKF) II

    [Figure: domain localization on an 11 × 11 grid; panels (d) r = 1 and (e) r = 3 show the local regions for two radii of influence.]

  • Estimating B^{-1} I

    - When two model components m_i and m_j are conditionally independent, the corresponding entry of the inverse covariance vanishes: {C^{-1}}_{m_i, m_j} = 0.

    - We want to estimate B^{-1}:
      - Recall U = X^b − x̄^b ⊗ 1_N^T ∈ R^{n×N}. Thus, u^{[i]} ∼ N(0_n, B), for 1 ≤ i ≤ N.
      - Let x^{[i]} ∈ R^{N×1} be the vector holding the i-th row of U across all columns, for 1 ≤ i ≤ n.
      - Then, the approximation of B^{-1} arises from the regressions:

        x^{[i]} = Σ_{j=1}^{i−1} x^{[j]} · β_{i,j} + ξ^{[i]} ∈ R^{N×1}.

  • Estimating B^{-1} II

    - By the modified Cholesky (MC) decomposition for inverse covariance matrix estimation:

      B^{-1} ≈ B̂^{-1} = T^T · D^{-1} · T ∈ R^{n×n},

      B ≈ B̂ = T^{-1} · D · T^{-T} ∈ R^{n×n},

      where T ∈ R^{n×n} is a unit lower-triangular matrix with {T}_{i,j} = −β_{i,j}, for 1 ≤ j < i ≤ n, and D ∈ R^{n×n} is a diagonal matrix with {D}_{i,i} = var(ξ^{[i]}).

    - B̂^{-1} can be sparse; B̂ is not necessarily sparse. The structure of B̂^{-1} depends on T.
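  • The two slides above translate into a short estimator (a NumPy sketch; here the predecessors of row i are simply the two previous rows, mimicking a 1-D localization radius r = 2 — in the actual method the predecessor sets come from the grid geometry):

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 12, 30

U = rng.standard_normal((n, N))          # stand-in for centered perturbations
X = U                                    # row i of U is x^{[i]}, an N-vector

T = np.eye(n)                            # unit lower-triangular factor
d = np.empty(n)                          # diag(D): residual variances

for i in range(n):
    preds = list(range(max(0, i - 2), i))        # illustrative predecessor set
    if preds:
        # least-squares fit x^{[i]} ≈ Σ_j β_{i,j} x^{[j]}
        A = X[preds].T                           # N x |preds|
        beta, *_ = np.linalg.lstsq(A, X[i], rcond=None)
        T[i, preds] = -beta                      # {T}_{i,j} = -β_{i,j}
        resid = X[i] - A @ beta
    else:
        resid = X[i]
    d[i] = resid.var(ddof=1)                     # {D}_{i,i} = var(ξ^{[i]})

B_inv_hat = T.T @ np.diag(1.0 / d) @ T           # B̂^{-1} = T^T D^{-1} T
```

With this predecessor choice B̂^{-1} is banded (bandwidth 2) and symmetric positive definite by construction, illustrating how the radius r controls the sparsity of the estimate.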

  • Choosing the predecessors

    [Figure: predecessor choices on the grid; panels (f) row-major ordering, (g) column-major ordering, (h) r = 1, (i) resulting predecessor set.]

  • [Figure: local neighborhoods on an 11 × 11 grid for radii (a) r = 1, (b) r = 2, (c) r = 3, and the corresponding sparsity patterns of B̂^{-1}: (d) nz = 1021 for r = 1, (e) nz = 2689 for r = 2, (f) nz = 4993 for r = 3.]

  • EnKF formulations based on modified Cholesky decomposition for inverse background error estimation

    - Primal:

      X^a = X^b + [B̂^{-1} + H^T · R^{-1} · H]^{-1} · H^T · R^{-1} · [Y^s − H · X^b].

    - Dual:

      X^a = X^b + X · V^T · [R + V · V^T]^{-1} · [Y^s − H · X^b],

      where T · X = D^{1/2} ∈ R^{n×n} and V = H · X ∈ R^{m×n}.

    - Efficient implementations [NRSA14, NRS15].
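  • The primal formulation can be sketched numerically (a NumPy illustration; the modified-Cholesky estimate B̂^{-1} is replaced here by an arbitrary synthetic SPD matrix, and Y^s holds perturbed observations as in the stochastic filter):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, N = 12, 5, 30

Xb = rng.standard_normal((n, N))
H = rng.standard_normal((m, n))
R = 0.5 * np.eye(m)
y = rng.standard_normal(m)

# Stand-in for B̂^{-1}; any SPD matrix works for this illustration.
A = rng.standard_normal((n, n))
B_inv_hat = A @ A.T / n + np.eye(n)

# Perturbed observations Y^s, one column per member.
eps = rng.multivariate_normal(np.zeros(m), R, size=N).T
Ys = y[:, None] + eps

# Primal: X^a = X^b + [B̂^{-1} + H^T R^{-1} H]^{-1} H^T R^{-1} [Y^s - H X^b]
Rinv = np.linalg.inv(R)
A_post = B_inv_hat + H.T @ Rinv @ H              # posterior precision matrix
Xa = Xb + np.linalg.solve(A_post, H.T @ Rinv @ (Ys - H @ Xb))
```

By the Sherman-Morrison-Woodbury identity this update coincides with the covariance (gain) form, which is a useful consistency check between the primal and the classical update.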

  • Domain Decomposition


  • Boundary Information


  • AT-GCM - SPEEDY (Numerical Model)

    - SPEEDY is a simplified GCM developed at ICTP by Franco Molteni and Fred Kucharski.

    - Nicknamed SPEEDY, for "Simplified Parameterizations, primitivE-Equation DYnamics".

    - It is a hydrostatic, σ-coordinate, spectral-transform model in the vorticity-divergence form, with semi-implicit treatment of gravity waves.

    - 8 layers; prognostic variables u, v, T, and sh (specific humidity).

    - T-63 resolution (96 × 192 grid).

  • BlueRidge Supercomputer @ VT

    - BlueRidge is a 408-node Cray CS-300 cluster.

    - Each node is outfitted with two octa-core Intel Sandy Bridge CPUs and 64 GB of memory.

    - Total of 6,528 cores and 27.3 TB of memory system-wide.

    - Eighteen nodes have 128 GB of memory.

    - In addition, 130 nodes are outfitted with two Intel MIC (Xeon Phi) coprocessors.

  • Experimental settings

    - Number of ensemble members: 96.

    - Three radii of influence are considered: 3, 4, and 5.

    - The model is propagated for 2 days and then observations are assimilated.

    - Number of processors: from 6 computing nodes (96 processors) up to 128 computing nodes (2048 processors).

    - Fortran 90 and 77, MPI, LAPACK, and BLAS.

    - 3 different observational networks.

  • Observational networks

    (a) H^{[1]}, p ∼ 12%   (b) H^{[2]}, p ∼ 6%   (c) H^{[3]}, p ∼ 4%

    Figure: Sparse observational networks. Observed components in black. p denotes the percentage of observed model components.

  • RMSE for some configurations.

    [Figure: RMSE over 25 days of assimilation for the Background, EnKF-MC, and LETKF; panels (a) r = 3, p ∼ 12%, zonal wind component u (m/s); (b) r = 4, p ∼ 12%, meridional wind component v (m/s); (c) r = 5, p ∼ 6%, zonal wind component u (m/s).]

  • Initial snapshots for r = 5 and p ∼ 4% for v

    [Figure: panels (a) Reference, (b) Background, (c) EnKF-MC, (d) LETKF.]

  • Initial snapshots for r = 5 and p ∼ 4% for u

    [Figure: panels (a) Reference, (b) Background, (c) EnKF-MC, (d) LETKF.]

  • RMSE for different variables and # of computing nodes.

    [Figure: RMSE over 30 days for r = 5 and p ∼ 4%, comparing the Background against EnKF-MC and LETKF run on 6, 16, 32, 48, 64, 96, and 128 computing nodes; panels (a) zonal wind component u (m/s), (b) meridional wind component v (m/s), (c) specific humidity sh (g/kg).]

  • Elapsed time.

    [Figure: elapsed analysis time (s) versus number of computing nodes (× 16 processors each), for EnKF-MC and for LETKF without covariance estimation.]

  • Conclusions

    - The proposed implementation outperforms the LETKF under the RMSE metric.

    - Parallel resources and domain decomposition can be exploited in order to speed up the assimilation process.

    - Localization is implicit; the domain decomposition is used purely for computational reasons.

    - The computational effort of the proposed method makes it attractive for use under realistic scenarios.

  • Thank You.

    (John 3:16) For God so loved the world that he gave his one and only Son, that whoever believes in him shall not perish but have eternal life.

  • [Kep00] Christian L. Keppenne. Data Assimilation into a Primitive-Equation Model with a Parallel Ensemble Kalman Filter. Monthly Weather Review, 128(6):1971–1981, 2000.

    [NRS15] Elias D. Nino-Ruiz and Adrian Sandu. Ensemble Kalman Filter Implementations Based on Shrinkage Covariance Matrix Estimation. Ocean Dynamics, 65(11):1423–1439, 2015.

    [NRSA14] Elias D. Nino Ruiz, Adrian Sandu, and Jeffrey Anderson. An Efficient Implementation of the Ensemble Kalman Filter Based on an Iterative Sherman-Morrison Formula. Statistics and Computing, pages 1–17, 2014.

    [OHS+04] Edward Ott, Brian R. Hunt, Istvan Szunyogh, Aleksey V. Zimin, Eric J. Kostelich, Matteo Corazza, Eugenia Kalnay, D. J. Patil, and James A. Yorke. A Local Ensemble Kalman Filter for Atmospheric Data Assimilation. Tellus A, 56(5):415–428, 2004.

  • [B̂^{-1} + H^T · R^{-1} · H]^{-1} = [X^{-T} · X^{-1} + H^T · R^{-1} · H]^{-1}

    = {X^{-T} · [I_{n×n} + Q · Q^T] · X^{-1}}^{-1}

    = X · [I_{n×n} + Q · Q^T]^{-1} · X^T,

    where X^{-T} · Q = H^T · R^{-1/2} and, as in the dual formulation, T · X = D^{1/2}.
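  • This inverse identity is easy to check numerically (a NumPy sketch with a synthetic unit lower-triangular T, positive diagonal D, and a diagonal R^{-1/2}; all values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 8, 3

# Synthetic factors: T unit lower-triangular, D positive diagonal.
T = np.eye(n) + np.tril(rng.standard_normal((n, n)), -1)
D = np.diag(rng.uniform(0.5, 2.0, n))
H = rng.standard_normal((m, n))
R_inv_half = np.diag(rng.uniform(0.5, 2.0, m))      # R^{-1/2}

B_inv_hat = T.T @ np.linalg.inv(D) @ T              # B̂^{-1} = T^T D^{-1} T
X = np.linalg.solve(T, np.sqrt(D))                  # T X = D^{1/2}
Q = X.T @ H.T @ R_inv_half                          # so that X^{-T} Q = H^T R^{-1/2}

lhs = np.linalg.inv(B_inv_hat + H.T @ R_inv_half @ R_inv_half @ H)
rhs = X @ np.linalg.inv(np.eye(n) + Q @ Q.T) @ X.T
```

The point of the identity is that the n×n posterior inverse reduces to triangular solves against T plus an inversion whose difficulty is governed by the rank structure of Q · Q^T.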
