
  • A Parallel Ensemble Kalman Filter Implementation Based On Modified Cholesky Decomposition

    Elias D. Nino-Ruiz and Adrian Sandu
    Computational Science Laboratory,
    Virginia Polytechnic Institute and State University, Blacksburg, VA 24060, USA

    [email protected], [email protected]

    November 16, 2015 (http://csl.cs.vt.edu)

  • Table of contents

    Problem Formulation

    Local Ensemble Transform Kalman Filter

    EnKF Based on Modified Cholesky Decomposition
      Estimating B^{-1}
      EnKF implementations based on MC
      Domain decomposition

    Experimental Settings

    Conclusions

  • Problem Formulation I

    - We want to estimate the true state x^* ∈ R^{n×1} of a dynamical system,

      x^*_k = M_{t_{k-1} → t_k}(x^*_{k-1}),

      where n is the model dimension.

    - Based on (assuming Gaussian errors):
      - A prior estimate (the best estimate prior to the measurements):

        x^b = x^* + ξ, with ξ ∼ N(0_n, B),

        where B ∈ R^{n×n} is unknown.
      - A noisy observation:

        y = H(x^*) + ε, with ε ∼ N(0_m, R),

        where m is the number of observed model components, R ∈ R^{m×m} is the data error covariance matrix, and H : R^{n×1} → R^{m×1} is the observation operator, with m ≪ n or m < n.

  • Problem Formulation II

    - Bayesian approximation (posterior state):

      x^a = x^b + B · H^T · [R + H · B · H^T]^{-1} · d ∈ R^{n×1},

      where d = y − H(x^b) ∈ R^{m×1} and H ∈ R^{m×n} is a linear approximation of H.

    - How can we estimate B?

    - Background error statistics of any model state x ∈ R^{n×1}:

      x ∼ N(x^b, B).
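  • The posterior update above can be sketched in a few lines of NumPy (a minimal illustration with a synthetic covariance B, error covariance R, and a linear observation operator H; all names and values here are illustrative, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 3                                      # model and observation dimensions

# Synthetic SPD background covariance B, observation error covariance R,
# linear observation operator H, true state x*, prior xb, and observation y.
A = rng.standard_normal((n, n))
B = A @ A.T + n * np.eye(n)
R = 0.5 * np.eye(m)
H = rng.standard_normal((m, n))
xstar = rng.standard_normal(n)
xb = xstar + rng.multivariate_normal(np.zeros(n), B)
y = H @ xstar + rng.multivariate_normal(np.zeros(m), R)

# x^a = x^b + B H^T [R + H B H^T]^{-1} (y - H x^b)
d = y - H @ xb                                   # innovation
K = B @ H.T @ np.linalg.inv(R + H @ B @ H.T)     # gain matrix
xa = xb + K @ d
```

Since the update is linear, the posterior innovation satisfies y − H x^a = (I − H K) d exactly, which is a convenient sanity check.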

  • Problem Formulation III

    - Empirical moments of an ensemble of N members:

      X^b = [x^{b[1]}, x^{b[2]}, …, x^{b[N]}] ∈ R^{n×N},

      x^b ≈ x̄^b = (1/N) Σ_{i=1}^{N} x^{b[i]} ∈ R^{n×1},   B ≈ P^b = S · S^T ∈ R^{n×n},

      where S = (1/√(N−1)) · [X^b − x̄^b ⊗ 1_N^T] ∈ R^{n×N}.
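  • The empirical moments can be computed directly from the ensemble matrix (a minimal NumPy sketch with a synthetic ensemble; dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 10, 5                                      # model dimension, ensemble size

Xb = rng.standard_normal((n, N))                  # columns are the members x^{b[i]}

xb_mean = Xb.mean(axis=1)                         # x̄^b = (1/N) Σ x^{b[i]}
S = (Xb - xb_mean[:, None]) / np.sqrt(N - 1)      # scaled perturbation matrix
Pb = S @ S.T                                      # P^b = S S^T ≈ B
```

Note that rank(P^b) ≤ N − 1 because the centered columns of S sum to zero, which is exactly the low-rank issue discussed on the next slides.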

  • Problem Formulation IV

    - The posterior (analysis) ensemble:

      X^a = X^b + P^b · H^T · [R + H · P^b · H^T]^{-1} · D ∈ R^{n×N},

      where the i-th column of D ∈ R^{m×N} reads

      d^{[i]} = y + ε^{[i]} − H · x^{b[i]} ∈ R^{m×1}, for 1 ≤ i ≤ N,

      with (stochastic version of the filter):

      ε^{[i]} ∼ N(0_m, R).
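  • The stochastic (perturbed-observations) ensemble update can be sketched as follows (a NumPy illustration with synthetic Xb, H, R, and y; not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, N = 10, 4, 5

Xb = rng.standard_normal((n, N))                  # background ensemble
H = rng.standard_normal((m, n))                   # linear observation operator
R = 0.25 * np.eye(m)
y = rng.standard_normal(m)

# Empirical covariance P^b = S S^T from the ensemble perturbations.
xb_mean = Xb.mean(axis=1)
S = (Xb - xb_mean[:, None]) / np.sqrt(N - 1)
Pb = S @ S.T

# d^{[i]} = y + eps^{[i]} - H x^{b[i]},  eps^{[i]} ~ N(0, R)
eps = rng.multivariate_normal(np.zeros(m), R, size=N).T   # m x N
D = y[:, None] + eps - H @ Xb

# X^a = X^b + P^b H^T [R + H P^b H^T]^{-1} D
Xa = Xb + Pb @ H.T @ np.linalg.solve(R + H @ Pb @ H.T, D)
```

Using `np.linalg.solve` on the m×m innovation covariance avoids forming its explicit inverse.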

  • Problem Formulation V

    - Unfortunately, the number of samples N is much smaller than the model dimension: n ≫ N.

    - P^b is low-rank, which leads to spurious correlations.

    - X^a is computed in the ensemble space (few degrees of freedom).

    - Model dimensions are on the order of billions, while ensemble sizes are on the order of hundreds.

    - Model propagations are computationally expensive.

    - The computational effort of the analysis is high.

    - We need HPC not only to speed up the computations, but also to have enough memory to represent the ensemble members and to perform the linear algebra computations.

  • Local Ensemble Transform Kalman Filter (LETKF) I

    - One of the best parallel ensemble-based implementations.

    - Analysis equations:
      - Perturbations: U = X^b − x̄^b ⊗ 1_N^T ∈ R^{n×N}.
      - Optimality in the ensemble space: Q = H · U ∈ R^{m×N},

        P̃^a = [(N − 1) · I_{N×N} + Q^T · R^{-1} · Q]^{-1} ∈ R^{N×N},

        w^a = P̃^a · Q^T · R^{-1} · [y − H · x̄^b] ∈ R^{N×1},

        W = w^a ⊗ 1_N^T + W^a ∈ R^{N×N},   W^a = [(N − 1) · P̃^a]^{1/2} ∈ R^{N×N}.

    - Analysis ensemble:

      X^a = x̄^b ⊗ 1_N^T + U · W ∈ R^{n×N}.

    - Domain localization [OHS+04, Kep00].
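  • The LETKF transform above can be sketched in NumPy (synthetic ensemble and observations; the symmetric square root is taken via an eigendecomposition, one common choice, though implementations may differ):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, N = 10, 4, 5

Xb = rng.standard_normal((n, N))
H = rng.standard_normal((m, n))
R = 0.25 * np.eye(m)
y = rng.standard_normal(m)

xb_mean = Xb.mean(axis=1)
U = Xb - xb_mean[:, None]                 # perturbations, n x N
Q = H @ U                                 # observed perturbations, m x N

# P̃^a = [(N-1) I + Q^T R^{-1} Q]^{-1}, computed in the N-dimensional space
Pa_tilde = np.linalg.inv((N - 1) * np.eye(N) + Q.T @ np.linalg.solve(R, Q))

# w^a = P̃^a Q^T R^{-1} [y - H x̄^b]
wa = Pa_tilde @ Q.T @ np.linalg.solve(R, y - H @ xb_mean)

# W^a = [(N-1) P̃^a]^{1/2} via symmetric eigendecomposition
lam, V = np.linalg.eigh((N - 1) * Pa_tilde)
Wa = V @ np.diag(np.sqrt(lam)) @ V.T

W = wa[:, None] + Wa                      # w^a ⊗ 1_N^T + W^a
Xa = xb_mean[:, None] + U @ W             # analysis ensemble
```

All inverses here live in the N×N ensemble space, which is what makes the method cheap when N ≪ n.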

  • Local Ensemble Transform Kalman Filter (LETKF) II

    [Figure: domain localization on an 11 × 11 grid; panels (d) r = 1 and (e) r = 3 show the local regions for two radii of influence.]

  • Estimating B^{-1} I

    - When two model components m_i and m_j are conditionally independent, the corresponding entry of the inverse covariance vanishes: {C^{-1}}_{m_i, m_j} = 0.

    - We want to estimate B^{-1}:
      - Recall U = X^b − x̄^b ⊗ 1_N^T ∈ R^{n×N}. Thus, u^{[i]} ∼ N(0_n, B), for 1 ≤ i ≤ N.
      - Let x^{[i]} ∈ R^{N×1} be the vector holding the i-th row of U across all columns, for 1 ≤ i ≤ n.
      - Then, the approximation of B^{-1} arises from the regressions:

        x^{[i]} = Σ_{j=1}^{i−1} x^{[j]} · β_{i,j} + ξ^{[i]} ∈ R^{N×1}.

  • Estimating B^{-1} II

    - By the modified Cholesky (MC) decomposition for inverse covariance matrix estimation:

      B^{-1} ≈ B̂^{-1} = T^T · D^{-1} · T ∈ R^{n×n},

      B ≈ B̂ = T^{-1} · D · T^{-T} ∈ R^{n×n},

      where T ∈ R^{n×n} is a unit lower-triangular matrix with {T}_{i,j} = −β_{i,j}, for 1 ≤ j < i ≤ n, and D ∈ R^{n×n} is a diagonal matrix with {D}_{i,i} = var(ξ^{[i]}).

    - B̂^{-1} can be sparse; B̂ is not necessarily sparse. The structure of B̂^{-1} depends on T.
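  • The two slides above translate into a short estimator (a NumPy sketch; here the predecessors of row i are simply the two previous rows, mimicking a 1-D localization radius r = 2 — in the actual method the predecessor sets come from the grid geometry):

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 12, 30

U = rng.standard_normal((n, N))          # stand-in for centered perturbations
X = U                                    # row i of U is x^{[i]}, an N-vector

T = np.eye(n)                            # unit lower-triangular factor
d = np.empty(n)                          # diag(D): residual variances

for i in range(n):
    preds = list(range(max(0, i - 2), i))        # illustrative predecessor set
    if preds:
        # least-squares fit x^{[i]} ≈ Σ_j β_{i,j} x^{[j]}
        A = X[preds].T                           # N x |preds|
        beta, *_ = np.linalg.lstsq(A, X[i], rcond=None)
        T[i, preds] = -beta                      # {T}_{i,j} = -β_{i,j}
        resid = X[i] - A @ beta
    else:
        resid = X[i]
    d[i] = resid.var(ddof=1)                     # {D}_{i,i} = var(ξ^{[i]})

B_inv_hat = T.T @ np.diag(1.0 / d) @ T           # B̂^{-1} = T^T D^{-1} T
```

With this predecessor choice B̂^{-1} is banded (bandwidth 2) and symmetric positive definite by construction, illustrating how the radius r controls the sparsity of the estimate.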

  • Choosing the predecessors

    [Figure: predecessor choices on the grid; panels (f) row-major ordering, (g) column-major ordering, (h) r = 1, (i) resulting predecessor set.]

  • [Figure: local neighborhoods on an 11 × 11 grid for radii (a) r = 1, (b) r = 2, (c) r = 3, and the corresponding sparsity patterns of B̂^{-1}: (d) nz = 1021 for r = 1, (e) nz = 2689 for r = 2, (f) nz = 4993 for r = 3.]

  • EnKF formulations based on modified Cholesky decomposition for inverse background error estimation

    - Primal:

      X^a = X^b + [B̂^{-1} + H^T · R^{-1} · H]^{-1} · H^T · R^{-1} · [Y^s − H · X^b].

    - Dual:

      X^a = X^b + X · V^T · [R + V · V^T]^{-1} · [Y^s − H · X^b],

      where T · X = D^{1/2} ∈ R^{n×n} and V = H · X ∈ R^{m×n}.

    - Efficient implementations [NRSA14, NRS15].
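  • The primal formulation can be sketched numerically (a NumPy illustration; the modified-Cholesky estimate B̂^{-1} is replaced here by an arbitrary synthetic SPD matrix, and Y^s holds perturbed observations as in the stochastic filter):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, N = 12, 5, 30

Xb = rng.standard_normal((n, N))
H = rng.standard_normal((m, n))
R = 0.5 * np.eye(m)
y = rng.standard_normal(m)

# Stand-in for B̂^{-1}; any SPD matrix works for this illustration.
A = rng.standard_normal((n, n))
B_inv_hat = A @ A.T / n + np.eye(n)

# Perturbed observations Y^s, one column per member.
eps = rng.multivariate_normal(np.zeros(m), R, size=N).T
Ys = y[:, None] + eps

# Primal: X^a = X^b + [B̂^{-1} + H^T R^{-1} H]^{-1} H^T R^{-1} [Y^s - H X^b]
Rinv = np.linalg.inv(R)
A_post = B_inv_hat + H.T @ Rinv @ H              # posterior precision matrix
Xa = Xb + np.linalg.solve(A_post, H.T @ Rinv @ (Ys - H @ Xb))
```

By the Sherman-Morrison-Woodbury identity this update coincides with the covariance (gain) form, which is a useful consistency check between the primal and the classical update.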

  • Domain Decomposition


  • Boundary Information


  • AT-GCM - SPEEDY (Numerical Model)

    - SPEEDY is a simplified GCM developed at ICTP by Franco Molteni and Fred Kucharski.

    - Nicknamed SPEEDY, for "Simplified Parameterizations, primitivE-Equation DYnamics".

    - It is a hydrostatic, σ-coordinate, spectral-transform model in the vorticity-divergence form, with semi-implicit treatment of gravity waves.

    - 8 layers; prognostic variables u, v, T, and sh (specific humidity).

    - T-63 resolution (96 × 192 grid).

  • BlueRidge Supercomputer @ VT

    - BlueRidge is a 408-node Cray CS-300 cluster.

    - Each node is outfitted with two octa-core Intel Sandy Bridge CPUs and 64 GB of memory.

    - Total of 6,528 cores and 27.3 TB of memory system-wide.

    - Eighteen nodes have 128 GB of memory.

    - In addition, 130 nodes are outfitted with two Intel MIC (Xeon Phi) coprocessors.

  • Experimental settings

    - Number of ensemble members: 96.

    - Three radii of influence are considered: 3, 4, and 5.

    - The model is propagated for 2 days and then observations are assimilated.

    - Number of processors: from 6 computing nodes (96 processors) up to 128 computing nodes (2048 processors).

    - Fortran 90 and 77, MPI, LAPACK, and BLAS.

    - 3 different observational networks.

  • Observational networks

    (a) H^{[1]}, p ∼ 12%   (b) H^{[2]}, p ∼ 6%   (c) H^{[3]}, p ∼ 4%

    Figure: Sparse observational networks. Observed components in black. p denotes the percentage of observed model components.

  • RMSE for some configurations.

    [Figure: RMSE over 25 days of assimilation for the Background, EnKF-MC, and LETKF; panels (a) r = 3, p ∼ 12%, zonal wind component u (m/s); (b) r = 4, p ∼ 12%, meridional wind component v (m/s); (c) r = 5, p ∼ 6%, zonal wind component u (m/s).]

  • Initial snapshots for r = 5 and p ∼ 4% for v

    [Figure: panels (a) Reference, (b) Background, (c) EnKF-MC, (d) LETKF.]

  • Initial snapshots for r = 5 and p ∼ 4% for u

    [Figure: panels (a) Reference, (b) Background, (c) EnKF-MC, (d) LETKF.]

  • RMSE for different variables and # of computing nodes.

    [Figure: RMSE over 30 days for r = 5 and p ∼ 4%, comparing the Background against EnKF-MC and LETKF run on 6, 16, 32, 48, 64, 96, and 128 computing nodes; panels (a) zonal wind component u (m/s), (b) meridional wind component v (m/s), (c) specific humidity sh (g/kg).]

  • Elapsed time.

    [Figure: elapsed analysis time (s) versus number of computing nodes (× 16 processors each), for EnKF-MC and for LETKF without covariance estimation.]

  • Conclusions

    - The proposed implementation outperforms the LETKF under the RMSE metric.

    - Parallel resources and domain decomposition can be exploited in order to speed up the assimilation process.

    - Localization is implicit; the domain decomposition is used purely for computational reasons.

    - The computational effort of the proposed method makes it attractive for use under realistic scenarios.

  • Thank You.

    (John 3:16) For God so loved the world that he gave his one and only Son, that whoever believes in him shall not perish but have eternal life.

  • [Kep00] Christian L. Keppenne. Data Assimilation into a Primitive-Equation Model with a Parallel Ensemble Kalman Filter. Monthly Weather Review, 128(6):1971–1981, 2000.

    [NRS15] Elias D. Nino-Ruiz and Adrian Sandu. Ensemble Kalman Filter Implementations Based on Shrinkage Covariance Matrix Estimation. Ocean Dynamics, 65(11):1423–1439, 2015.

    [NRSA14] Elias D. Nino Ruiz, Adrian Sandu, and Jeffrey Anderson. An Efficient Implementation of the Ensemble Kalman Filter Based on an Iterative Sherman-Morrison Formula. Statistics and Computing, pages 1–17, 2014.

    [OHS+04] Edward Ott, Brian R. Hunt, Istvan Szunyogh, Aleksey V. Zimin, Eric J. Kostelich, Matteo Corazza, Eugenia Kalnay, D. J. Patil, and James A. Yorke. A Local Ensemble Kalman Filter for Atmospheric Data Assimilation. Tellus A, 56(5):415–428, 2004.

  • [B̂^{-1} + H^T · R^{-1} · H]^{-1} = [X^{-T} · X^{-1} + H^T · R^{-1} · H]^{-1}

    = {X^{-T} · [I_{n×n} + Q · Q^T] · X^{-1}}^{-1}

    = X · [I_{n×n} + Q · Q^T]^{-1} · X^T,

    where X^{-T} · Q = H^T · R^{-1/2} and, as in the dual formulation, T · X = D^{1/2}.
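  • This inverse identity is easy to check numerically (a NumPy sketch with a synthetic unit lower-triangular T, positive diagonal D, and a diagonal R^{-1/2}; all values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 8, 3

# Synthetic factors: T unit lower-triangular, D positive diagonal.
T = np.eye(n) + np.tril(rng.standard_normal((n, n)), -1)
D = np.diag(rng.uniform(0.5, 2.0, n))
H = rng.standard_normal((m, n))
R_inv_half = np.diag(rng.uniform(0.5, 2.0, m))      # R^{-1/2}

B_inv_hat = T.T @ np.linalg.inv(D) @ T              # B̂^{-1} = T^T D^{-1} T
X = np.linalg.solve(T, np.sqrt(D))                  # T X = D^{1/2}
Q = X.T @ H.T @ R_inv_half                          # so that X^{-T} Q = H^T R^{-1/2}

lhs = np.linalg.inv(B_inv_hat + H.T @ R_inv_half @ R_inv_half @ H)
rhs = X @ np.linalg.inv(np.eye(n) + Q @ Q.T) @ X.T
```

The point of the identity is that the n×n posterior inverse reduces to triangular solves against T plus an inversion whose difficulty is governed by the rank structure of Q · Q^T.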
