
Applications of Information Theory in Ensemble Data Assimilation

Dusanka Zupanski1, Arthur Y. Hou2, Sara Q. Zhang2, Milija Zupanski1,

Christian D. Kummerow1, and Samson H. Cheung3

1Colorado State University, Fort Collins, Colorado

2NASA Goddard Space Flight Center, Greenbelt, Maryland

3University of California, Davis, California

Manuscript submitted to Quart. J. Roy. Meteor. Soc.

May 6, 2007

(2 tables, 7 figures)

Corresponding author address:

Dusanka Zupanski, Cooperative Institute for Research in the Atmosphere/Colorado State

University, Fort Collins, Colorado, 80523-1375; E-mail: [email protected]


SUMMARY

We apply information theory within an ensemble-based data assimilation approach and

define an information matrix in ensemble subspace. The information matrix in ensemble subspace

employs a flow-dependent forecast error covariance and is of relatively small dimension

(equal to the ensemble size). The information matrix in ensemble subspace can be directly linked

to the information matrix typically used in non-ensemble-based data assimilation methods, such as the Kalman Filter (KF) and the 3-dimensional variational (3d-var) methods, which provides a

framework for consistent comparisons of information measures between different data

assimilation methods.

We evaluate information measures, such as degrees of freedom for signal, within the

Maximum Likelihood Ensemble Filter (MLEF) data assimilation approach and compare them

with those obtained using the KF approach and the 3d-var approach. We assimilate model-

simulated observations and use the Goddard Earth Observing System Single Column Model

(GEOS-5 SCM) as a dynamical forecast model.

The experimental results demonstrate that the proposed framework is useful for

comparing information measures obtained in different data assimilation approaches. These

comparisons indicate that using a flow-dependent forecast error covariance matrix (e.g., as in the

KF and the MLEF experiments) is fundamentally important for adequately describing prior

knowledge about the true model state when calculating information measures of assimilated

observations. We also demonstrate that data assimilation results obtained using the KF and the

MLEF approach (when the ensemble size is larger than 10 members) are superior to the

results of the 3d-var approach.


Keywords: Ensemble data assimilation, Information theory,

Maximum Likelihood Ensemble Filter, Kalman filter, 3d-var


1. INTRODUCTION

It has been recognized that information theory (e.g., Shannon and Weaver 1949; Rodgers

2000) and predictability are inherently related (e.g., Schneider and Griffies 1999; Kleeman 2002;

Roulston and Smith 2002; DelSole 2004; Abramov et al. 2005). Information theory has also

attracted attention in data assimilation, where it has been used to calculate the information

content of various observations (e.g., Wahba 1985; Purser and Huang 1993; Wahba et al. 1995;

Rodgers 2000; Rabier et al. 2002; Fisher 2003; Johnson 2003; Engelen and Stephens 2004;

L’Ecuyer et al. 2006). Information content of observations can potentially have many

applications, including planning measurement missions, designing observational systems and

defining targeted observations and data selection strategies. These applications have so far been underutilized, having mainly been oriented towards data selection strategies (e.g.,

Rabier et al. 2002 and references therein). Nevertheless, progress in data assimilation methods

should foster applications of information theory in many different areas.

Ensemble-based data assimilation methods, often referred to as Ensemble Kalman Filter

(EnKF) methods, are novel data assimilation techniques that have rapidly progressed since the

pioneering work of Evensen (1994) appeared. As a result of this progress, many different

variants of the EnKF have evolved (e.g., Pham et al. 1997, 1998; Houtekamer and Mitchell

1998; Lermusiaux and Robinson 1999; Hamill and Snyder 2000; Keppenne 2000; Mitchell and

Houtekamer 2000; Anderson 2001; Bishop et al. 2001; van Leeuwen 2001; Pham 2001; Reichle

et al. 2002a,b; Whitaker and Hamill 2002; Hoteit et al. 2002, 2003; Tippett et al. 2003; Zhang et

al. 2004; Ott et al. 2005; Szunyogh et al. 2005; Peters et al. 2005; Zupanski 2005; Zupanski and

Zupanski 2006, to mention just a few). While there are notable differences between the different


variants of the EnKF, they are all closely related in data assimilation problems involving Gaussian Probability Density Functions (PDFs) and linear dynamical forecast models. In such cases, the

EnKFs share the common property of being rank-reduced approximations to the theoretically

optimal, full-rank KF solution. Under more general conditions, involving highly non-linear

dynamical models and non-Gaussian PDFs, the differences between different EnKFs could be

more significant (e.g., Fletcher and Zupanski 2006).

Even though the EnKF methods have advanced considerably, this progress has not been matched

by applications of information theory within these methods. In fact, information theory has

primarily been applied within other data assimilation methods (e.g., variational, KF), while its

application to ensemble data assimilation has been rather limited so far. Some of the pioneering

studies in this area are as follows. Wang and Bishop (2003) examined the eigenvalues and

eigenvectors of the Ensemble Transform Kalman Filter (ETKF, Bishop et al. 2001 and Wang and

Bishop 2003) transformation matrix and demonstrated that these eigenvalues and eigenvectors

define the amount and the direction of the maximum forecast error reduction due to information

from the observations. Patil et al. (2001), Oczkowski et al. (2005), and Wei et al. (2006) used the

eigenvalues of the ETKF transformation matrix to define measures of information, referred to as

“bred dimension”, “effective degrees of freedom”, and “E dimension”, respectively. These

studies have recognized that ensemble-based methods have a potential to improve measures of

information due to the use of a flow-dependent forecast error covariance matrix, especially in

applications to adaptive observations. A recent study by Uzunoglu et al. (2007) described a novel

application of information measures in ensemble data assimilation: ensemble-size reduction or inflation.


Building upon the previous studies, and recognizing that there is similarity between the

ETKF and the MLEF approach, we link the MLEF transformation matrix with the so-called

information or observability matrix, defined in ensemble subspace. We also demonstrate how the

information matrix can be used to define standard measures of information theory, such as

Degrees of Freedom (DOF) for signal and Shannon entropy reduction (e.g., Rodgers 2000). Thus,

we propose a general framework to link together ensemble data assimilation and information

theory in a similar manner as in variational and KF methods. This framework can be used for

comparing information measures of different data assimilation approaches. Additionally, as

demonstrated in Zupanski et al. (2007), the information measures in ensemble subspace can be

employed to define a flow-dependent “distance” function for covariance localization. We

evaluate this framework within an ensemble-based data assimilation method, using a single

column precipitation model and simulated observations. We also evaluate the results of the KF

and the 3-dimensional variational (3d-var) approaches, defined as special applications of the

proposed framework.

The paper is organized as follows. In section 2 the general framework is described. The

experimental design is explained in section 3, and experimental results are presented in section 4.

Finally, in section 5, the conclusions are summarized and their relevance for future research is

discussed.

2. GENERAL FRAMEWORK

In this study we employ an ensemble data assimilation approach referred to as Maximum

Likelihood Ensemble Filter (MLEF, Zupanski 2005; Zupanski and Zupanski 2006; Zupanski et


al. 2006). Here we briefly describe the MLEF. The MLEF seeks a maximum likelihood state

solution employing an iterative minimization of a cost function. The solution for a state vector x

(also referred to as control variable), of dimension Nstate, is obtained by minimizing the cost

function J defined as

J(x) = (1/2) [x - x_b]^T P_f^{-1} [x - x_b] + (1/2) [y - H(x)]^T R^{-1} [y - H(x)] ,   (1)

where y is an observation vector of dimension equal to the number of observations (Nobs), and H

is, in general, a non-linear observation operator. Subscript b denotes a background (i.e., prior)

estimate of x, and superscript T denotes a transpose. The Nobs ×Nobs matrix R is a prescribed

observation error covariance, and it includes instrumental and representativeness errors (e.g.,

Cohn 1997). The matrix Pf of dimension Nstate×Nstate is the forecast error covariance. As in many

other ensemble-based methods, we do not use the full matrix Pf explicitly, but we employ the

rank-reduced square-root formulation

P_f = P_f^{1/2} (P_f^{1/2})^T , where P_f^{1/2} is an Nstate × Nens square-root matrix (Nens being the ensemble size).

Uncertainties of the optimal estimate of the state x are defined as square roots of the analysis error covariance (P_a^{1/2}) and the forecast error covariance (P_f^{1/2}), both defined in ensemble subspace. The square root of the analysis error covariance is obtained as (e.g., Zupanski 2005)

P_a^{1/2} = [p_a^1, p_a^2, ..., p_a^{Nens}] = P_f^{1/2} (I_ens + C)^{-1/2} ,   (2)

where I_ens is an identity matrix of dimension Nens × Nens, and p_a^i are column vectors representing analysis perturbations in ensemble subspace. The square root in (2) is calculated via eigenvalue


decomposition of C. It is defined as a symmetric positive semi-definite square root, and therefore

it is unique (e.g., Horn and Johnson 1985, Theorem 7.2.6).

Matrix C has dimensions Nens×Nens and is defined by

C = Z^T Z ;   z^i = R^{-1/2} H(x + p_f^i) - R^{-1/2} H(x) ,   (3)

where the vectors z^i are the columns of the matrix Z of dimension Nobs × Nens. Note that, when calculating z^i, the nonlinear operator H is applied to both the perturbed and unperturbed states x. The vectors p_f^i are columns of the square root of the background (forecast) error covariance matrix and are obtained

via ensemble forecasting employing a non-linear forecast model M:

P_f^{1/2} = [p_f^1, p_f^2, ..., p_f^{Nens}] ;   p_f^i = M(x + p_a^i) - M(x) .   (4)

Equations (1)-(3) are solved iteratively in each data assimilation cycle, while Eq. (4) is used to propagate in time the columns of the forecast error covariance square root P_f^{1/2}.
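As an illustration, the analysis step of Eqs. (2)-(3) can be sketched in a few lines of linear algebra. This is a minimal sketch, not the authors' code; the function name `analysis_sqrt` and the NumPy setting are our own:

```python
import numpy as np

def analysis_sqrt(Pf12, x, H, Rinv12):
    """Eqs. (2)-(3): Pa^{1/2} = Pf^{1/2} (I_ens + C)^{-1/2}, with C = Z^T Z
    and z_i = R^{-1/2} H(x + p_f^i) - R^{-1/2} H(x).

    Pf12   : (Nstate, Nens) square root of the forecast error covariance
    x      : (Nstate,) current state estimate
    H      : callable, possibly non-linear observation operator
    Rinv12 : (Nobs, Nobs) inverse square root of the observation error covariance
    """
    Nens = Pf12.shape[1]
    Hx = Rinv12 @ H(x)
    # Columns of Z: observation-space ensemble perturbations, Eq. (3)
    Z = np.column_stack([Rinv12 @ H(x + Pf12[:, i]) - Hx for i in range(Nens)])
    C = Z.T @ Z                                  # Nens x Nens information matrix
    # Unique symmetric positive semi-definite square root via eigendecomposition
    lam, U = np.linalg.eigh(np.eye(Nens) + C)
    return Pf12 @ (U @ np.diag(lam ** -0.5) @ U.T)   # Eq. (2)
```

For a linear H and Nens = Nstate, the product P_a^{1/2} (P_a^{1/2})^T reproduces the full-rank analysis covariance (P_f^{-1} + H^T R^{-1} H)^{-1}.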

An information measure referred to as the DOF for signal is often used in information

theory (e.g., Rodgers 2000). In data assimilation applications, DOF for signal (here denoted ds) is

commonly defined in terms of the analysis and forecast error covariances, P_a and P_f (e.g., Wahba

1985; Purser and Huang 1993; Wahba et al. 1995; Rodgers 2000; Rabier et al. 2002; Fisher

2003; Johnson 2003; Engelen and Stephens 2004) as

d_s = tr[I_state - P_a P_f^{-1}] ,   (5a)


where tr denotes the trace, and I_state is an identity matrix of dimension Nstate × Nstate. The quantity d_s counts the number of new pieces of information brought to the analysis by the observations, with respect to what was already known, as expressed by P_f. Being dependent on the ratio between the analysis and forecast error covariances (P_a P_f^{-1}), d_s measures the forecast error reduction due

to new information from the observations. Wahba et al. (1995) define d_s in terms of the so-called influence matrix A as

d_s = tr[R^{-1/2} H P_a H^T R^{-1/2}] = tr[A] ,   (5b)

which is equivalent to (5a), as pointed out by Fisher (2003).
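The equivalence of (5a) and (5b) is easy to verify numerically for a linear observation operator, assuming the standard Gaussian analysis relation P_a = (P_f^{-1} + H^T R^{-1} H)^{-1}. A sketch (the function name `dof_signal_full` and the diagonal-R assumption are ours):

```python
import numpy as np

def dof_signal_full(Pf, Hm, R):
    """DOF for signal from full-rank covariances: Eq. (5a), tr[I - Pa Pf^{-1}],
    and Eq. (5b), tr[A]; they coincide for Pa = (Pf^{-1} + H^T R^{-1} H)^{-1}."""
    Nstate = Pf.shape[0]
    Pa = np.linalg.inv(np.linalg.inv(Pf) + Hm.T @ np.linalg.inv(R) @ Hm)
    ds_5a = np.trace(np.eye(Nstate) - Pa @ np.linalg.inv(Pf))   # Eq. (5a)
    Rm12 = np.diag(np.diag(R) ** -0.5)                          # diagonal R assumed
    ds_5b = np.trace(Rm12 @ Hm @ Pa @ Hm.T @ Rm12)              # Eq. (5b), tr[A]
    return ds_5a, ds_5b
```

Since the eigenvalues of the influence matrix A lie between 0 and 1, d_s is bounded by the number of observations.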

Employing the definition of P_a in ensemble subspace (2) and using tr[x x^T] = tr[x^T x], we can write (5b) in ensemble subspace as

d_s = tr[(I_ens + C)^{-1} (P_f^{1/2})^T H^T (R^{-1/2})^T R^{-1/2} H P_f^{1/2}] .   (6)

Assuming that the linear operator H is the first derivative (Jacobian) of the weakly non-linear observation operator H at the point x, we can write the following approximate equation for the columns r^i of the matrix R^{-1/2} H P_f^{1/2}:

r^i ≈ R^{-1/2} H(x + p_f^i) - R^{-1/2} H(x) .   (7)

Finally, by combining (3), (6), and (7) we have


d_s = tr[(I_ens + C)^{-1} Z^T Z] = tr[(I_ens + C)^{-1} C] .   (8)

Definition (8) is essentially the same as Eq. (2.61) of Rodgers (2000). The only difference is that

the trace is obtained employing matrix C of dimension Nens×Nens, while in the formulation of

Rodgers (2000), the trace is obtained employing an information matrix of dimensions Nstate×Nstate

(the full-rank information matrix). We will denote matrix C as the information matrix in

ensemble subspace.

By introducing information matrix C, we have defined a link between information theory

and ensemble data assimilation. Having this link is of special importance for the following

reasons. When calculating information content measures such as ds, a flow-dependent

Pf obtained directly from ensemble data assimilation is used. In addition, eigen-decomposition

of C is easily accomplished due to the relatively small size of this matrix (Nens×Nens) compared to

the typical number of observations (Nobs) used in applications to complex forecast models with

large state vectors (of dimension Nstate). A possible disadvantage of this ensemble-based

approach, as of any ensemble-based approach, is that a small ensemble size might not be

sufficient to adequately describe the variability of the full-rank forecast error covariance matrix.

In such cases, the information measures would still measure the amount of information brought

by the observations with respect to what was already known, however, the quality of the analysis

could be poor. One of the main focuses of this study is to evaluate the impact of ensemble size

on the information measures.


Once the information matrix C is available, various information measures can be

calculated. It is especially useful to define these measures in terms of the eigenvalues λ_i^2 of C. Thus, as in Rodgers (2000), we can define (8) in terms of λ_i^2 and calculate d_s as

d_s = Σ_i λ_i^2 / (1 + λ_i^2) .   (9)

Equations (3) and (7) indicate that the eigenvalues λ_i^2 depend on the ratio between the forecast error covariance and the observation error covariance, both defined at the observation locations. Thus, for forecast errors larger than the observation errors we have λ_i^2 ≥ 1 (signal), and for forecast errors smaller than the observation errors we have λ_i^2 < 1 (noise). Using the eigenvalues λ_i^2 one can also calculate other information measures, such as Shannon information

content defined as the reduction of entropy due to added information from the observations

(Shannon and Weaver 1949; Rodgers 2000). Since this measure is quite similar to DOF for

signal, it will not be examined in this study.
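A minimal numerical sketch of Eqs. (8) and (9) follows; the function name is illustrative, and a random Z stands in for the observation-space perturbations of Eq. (3):

```python
import numpy as np

def dof_signal(C):
    """DOF for signal in ensemble subspace: Eq. (8) as a trace and Eq. (9) as
    a sum over the eigenvalues lambda_i^2 of C; the two are identical."""
    Nens = C.shape[0]
    ds_trace = np.trace(np.linalg.solve(np.eye(Nens) + C, C))   # Eq. (8)
    lam2 = np.linalg.eigvalsh(C)                                # lambda_i^2
    ds_eigen = np.sum(lam2 / (1.0 + lam2))                      # Eq. (9)
    return ds_trace, ds_eigen
```

Because each term λ_i^2/(1 + λ_i^2) is below 1, d_s computed in ensemble subspace is bounded by the ensemble size.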

An important characteristic of the MLEF approach is that it can be made identical to KF

or variational methods, under special conditions that are explained below. This provides an

opportunity to compare information measures obtained using different data assimilation

approaches.

(a) Connection to KF

A linear version of the full-rank MLEF is identical to the classical linear KF when using

Gaussian PDFs, linear models M, and linear observation operators H. The full-rank MLEF


solution is obtained by setting Nens=Nstate. Under these conditions, the solution that minimizes (1)

can be explicitly calculated using (e.g., Zupanski 2005, Appendix A, Eq. A7)

x = x_b + α P_f H^T (H P_f H^T + R)^{-1} [y - H(x_b)] .   (10)

If both the KF and the MLEF are initialized with the same forecast error covariance, the MLEF solution after the first data assimilation cycle (Eq. 10) will be identical to the KF solution, because the minimization step-size α is equal to 1 for quadratic cost functions (Gill et al. 1981). The MLEF solution will remain identical to the KF solution throughout all data assimilation cycles, since the linear version of the forecast error covariance update (Eq. 4) is the same as the KF update equation. Thus, we can conclude that the full-rank MLEF (Nens = Nstate) is identical to the full-rank KF under the above assumptions. Under the same assumptions, the reduced-rank MLEF (Nens < Nstate) could be interpreted as a variant of a reduced-rank KF, since the same equations are solved in both approaches, only with a reduced-rank P_f. Since different variants of the reduced-rank KF would produce different solutions, due to different ways of defining a reduced-rank P_f, the link between the reduced-rank MLEF and reduced-rank KFs is not uniquely defined.
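For linear H and M, the analysis step (10) with α = 1 can be sketched as a standard Kalman update and checked against the direct minimizer of the quadratic cost (1). This is an illustrative sketch, not the MLEF code:

```python
import numpy as np

def kf_analysis(xb, Pf, Hm, R, y):
    """Linear analysis update of Eq. (10) with step-size alpha = 1: the
    full-rank MLEF step, which coincides with the Kalman filter analysis."""
    K = Pf @ Hm.T @ np.linalg.inv(Hm @ Pf @ Hm.T + R)   # Kalman gain
    return xb + K @ (y - Hm @ xb)
```

The result agrees with the minimizer x_a = (P_f^{-1} + H^T R^{-1} H)^{-1} (P_f^{-1} x_b + H^T R^{-1} y) of the quadratic cost function.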

(b) Connection to 3d-var

As explained before, the solution obtained by the MLEF is a maximum likelihood one,

and, in general, a non-linear one. These characteristics are shared with variational methods, thus

there is a connection to these methods as well. The full-rank non-linear MLEF solution without

the update of the forecast error covariance [i.e., using a prescribed covariance instead of Eq. (4)]


is identical to the 3d-var solution, since the same cost function (1) is minimized. To obtain

identical results, one can employ the same minimization method with the same preconditioning

in both the MLEF and the 3d-var (e.g., Zupanski 2005). In this study we employ Hessian

preconditioning, which may not be always feasible in variational methods due to large

dimensions of the full-rank covariance matrices.

In summary, the general framework proposed here should be directly applicable not only

to EnKF methods, but also to KF and 3d-var methods, as long as it remains practical to evaluate

full-rank covariance matrices. There are, however, some restrictions to the proposed general

framework. For example, when deriving information measures (e.g., DOF for signal and entropy

reduction) we have assumed, as in Rodgers (2000), that all errors are Gaussian. Therefore, we

have implicitly assumed weak nonlinearity in M and H, even though ensemble-based and

variational methods do not necessarily require this assumption. Consequently, the information

measures obtained in highly non-linear data assimilation problems, and also for variables that are

typically non-Gaussian (e.g., humidity and cloud microphysical variables) could be incorrect, or

only approximately correct. A theoretical framework for information measures employing non-

Gaussian ensembles is proposed in Majda et al. (2002) and Abramov and Majda (2004). They

have employed a different approach, based on the moment constraint optimization, to estimate

the so-called “predictive utility”, which is an information measure derived from the Shannon

entropy. As shown in Abramov and Majda (2004), higher-order moments, up to the first four moments, would be required for non-Gaussian information measures in typical atmospheric

applications. The framework proposed here could be further generalized following Majda et al.

(2002) and Abramov and Majda (2004). An extension of the MLEF to account for log-normally

distributed observations has already been developed by Fletcher and Zupanski (2006) and could


be used as a starting point for defining non-Gaussian information measures within the MLEF. As

indicated in Fletcher and Zupanski (2006), the cost function should include an additional term in

order to account for log-normally distributed observations.

3. EXPERIMENTAL DESIGN

(a) Forecast model

A single column version of the GEOS-5 Atmospheric General Circulation Model

(AGCM) is used in this study. We refer to this model as GEOS-5 SCM (Single Column Model).

Previous experience employing column versions of the GEOS-series within a 1-dimensional

variational data assimilation technique indicated that the 1-dimensional framework could

produce useful data assimilation results, especially in applications to rainfall assimilation (Hou et

al. 2000, 2001, 2004).

The GEOS-5 SCM consists of the model physics components of the GEOS-5 AGCM:

moist processes (Relaxed Arakawa-Schubert convection and prognostic large-scale cloud

condensation), turbulence, radiation, land surface, and chemistry. The dynamic advection is

driven by prescribed forcing time series. The column model is capable of updating all the

prognostic state variables and evaluating a suite of additional observable quantities such as precipitation and cloud properties. The GEOS-5 SCM retains most of the non-linear complexity and interactions between physical processes of the full AGCM. At the same time, it has the advantage of reduced dimensions when used in ensemble data assimilation research experiments.


(b) Control variable, observations

In the applications of this paper, we focus on assimilating simulated observations of two state variables: vertical profiles of temperature (T) and specific humidity (q). These are also

the control variables for data assimilation. In the experiments presented, 40 model levels are

used. Thus, the dimension of the control vector is 80. The column model only updates

temperature and specific humidity during a data assimilation interval. The remaining state variables,

along with the advection forcing, are prescribed by the Atmospheric Radiation Measurement

(ARM) data time series. The Tropical Western Pacific site (130E, 15N) of the ARM observation

program is chosen for the application discussed in this paper. The assimilation experiments cover

the period from 7 May 1998 to 24 May 1998 (17 days).

A data assimilation interval of 6 hours is used in the experiments, and simulated

observations of temperature and specific humidity are assimilated at the end of each data

assimilation interval. Simulated observations are created by adding Gaussian white noise to the “true” state defined by the GEOS-5 SCM. Thus, the observation

error covariance matrix R is assumed diagonal and constant in time. We use the same version of

the model to perform data assimilation and to create observations, thus we assume that the model

is perfect. In experiments with real observations the perfect model assumption might not be

justified. In order to relax this assumption one can use some of the recently proposed model error

estimation approaches (e.g., Heemink et al. 2001; Mitchell et al. 2002; Reichle et al. 2002a;

Zupanski and Zupanski 2006).


Observations are created assuming an instrument error of 0.2 K for T at all model levels (R_inst^{1/2} = 0.2 K). Instrument errors for q vary between R_inst^{1/2} = 6.1×10^{-8} and R_inst^{1/2} = 7.9×10^{-4}; the errors are defined to decrease from the lowest to the highest model level. The total observation errors are defined as R^{1/2} = α R_inst^{1/2}, where an empirical parameter α > 1 is employed to

approximately account for representativeness errors. Here, the representativeness error

approximately accounts for the mismatch between the observed and modeled scales (which is a

common definition of representativeness error), and also for inadequate scales in the forecast

error covariance due to the small ensemble size. To approximately account for both parts of the

representativeness error we require that for the largest ensemble size the parameter α is greater

than 1, and we let it increase with decreasing ensemble size. The values of the parameter α are

tuned to the ensemble size to approximately satisfy the expected chi-square innovation statistic,

calculated for optimized innovations and normalized by the analysis error (e.g., Dee et al. 1995; Menard et al. 2000; Zupanski 2005). Instrument errors and the values of the parameter α

used in data assimilation experiments of this study are listed in Table 1.
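The observation-simulation procedure described above can be sketched as follows. This is an illustrative sketch (truth plus Gaussian white noise, with total error R^{1/2} = α R_inst^{1/2}); the function name is ours:

```python
import numpy as np

def simulate_observations(x_true, r_inst_sqrt, alpha, rng):
    """Simulated observations: 'truth' plus Gaussian white noise, with a
    diagonal, time-constant R and total error R^{1/2} = alpha * R_inst^{1/2}
    (alpha > 1 approximately accounting for representativeness errors)."""
    r_total_sqrt = alpha * np.asarray(r_inst_sqrt, dtype=float)
    return x_true + r_total_sqrt * rng.standard_normal(np.shape(x_true))
```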

Initial conditions for T and q at the beginning of the first data assimilation cycle are taken from ARM observations of T and q at 0000 UTC 07 May 1998, interpolated from the observation levels to the model levels. With this configuration, errors in the initial conditions are simulated by the difference between the ARM observations and the “true” states defined by the model simulation (started from 1800 UTC 06 May 1998 and integrated for 6 hours to 0000 UTC 07 May 1998). This has resulted in Root Mean Square (RMS) errors of 0.46 K for T_b and 4.8×10^{-4} for q_b in the first data assimilation cycle (recall that subscript b denotes


background values). In all subsequent cycles, the 6-h forecast of T and q from the previous cycle

is used to define the background for the current cycle.

(c) Ensemble perturbations

The square-root forecast error covariance P_f^{1/2} is initialized in the first data assimilation cycle using prescribed perturbations p_f^i (cold start); in the subsequent cycles the data assimilation scheme updates p_f^i according to Eqs. (2)-(4). The cold-start ensemble perturbations

are defined using Gaussian white noise with prescribed standard deviation of comparable

magnitude to the observation errors. A compactly supported second-order correlation function

of Gaspari and Cohn (1999), with decorrelation length of 3 vertical layers, is applied to the

random perturbations to define a correlated random noise (e.g., Zupanski et al. 2006). The

decorrelation length of 3 vertical layers was determined empirically, based on the overall best

data assimilation performance.
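Correlated cold-start noise of this kind can be sketched as below. Two caveats: as a stand-in for the second-order function used in the paper, the sketch uses the widely cited fifth-order piecewise rational Gaspari-Cohn function, and sampling via an eigendecomposition of the correlation matrix is also our own choice:

```python
import numpy as np

def gaspari_cohn(z):
    """Compactly supported Gaspari and Cohn (1999) correlation function
    (fifth-order piecewise rational form); z = separation / decorrelation
    length, with support on |z| < 2."""
    z = np.abs(np.asarray(z, dtype=float))
    c = np.zeros_like(z)
    m = z <= 1.0
    c[m] = (-0.25 * z[m]**5 + 0.5 * z[m]**4 + 0.625 * z[m]**3
            - 5.0 / 3.0 * z[m]**2 + 1.0)
    m = (z > 1.0) & (z < 2.0)
    c[m] = (z[m]**5 / 12.0 - 0.5 * z[m]**4 + 0.625 * z[m]**3
            + 5.0 / 3.0 * z[m]**2 - 5.0 * z[m] + 4.0 - 2.0 / (3.0 * z[m]))
    return c

def cold_start_perturbation(nlev, sigma, length, rng):
    """One correlated cold-start perturbation: Gaussian white noise filtered
    through a compactly supported vertical correlation matrix."""
    levels = np.arange(nlev)
    corr = gaspari_cohn((levels[:, None] - levels[None, :]) / length)
    lam, V = np.linalg.eigh(corr)
    L = V @ np.diag(np.sqrt(np.clip(lam, 0.0, None)))   # corr = L @ L.T
    return sigma * (L @ rng.standard_normal(nlev))
```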

(d) Minimization

A conjugate gradient minimization algorithm (e.g., Luenberger 1984), with the line-

search technique as in Navon et al. (1992) and with Hessian preconditioning as in Zupanski

(2005), is used in the experiments of this paper. In all data assimilation experiments, only a

single iteration of the minimization is performed, which is sufficient for linear observation

operators (Zupanski 2005). Note that non-linearity of the forecast model M, even though it

influences the final data assimilation results, does not influence the minimization results within a


filter formulation. This would, however, be different for a smoother application, since the non-linear model would be included in the cost function.

(e) Covariance localization

Covariance localization is often used in ensemble data assimilation applications to better

constrain the data assimilation problems with either insufficient observations or insufficient

ensemble size (e.g., Houtekamer and Mitchell 1998; Hamill et al. 2001; Whitaker and Hamill

2002). Localization was also found beneficial in full-rank KF applications due to

spurious loss of variance in the discrete KF covariance evolution equation (e.g., Menard et al.

2000). Since covariance localizations are typically achieved by employing arbitrary covariance

functions (e.g., Gaspari and Cohn 1999) it is important to evaluate how such localizations impact

the information measures.

We use a localization technique based on Schur (element-wise) product between the

forecast error covariance matrix and a compactly supported covariance function (e.g.,

Houtekamer and Mitchell 1998; Hamill et al. 2001; Whitaker and Hamill 2002). Since the

dimensions of the full forecast error covariance are small (80×80), we evaluate the full

covariance Pf and multiply it, element-wise, with the localization function (the second-order

correlation function of Gaspari and Cohn 1999 with decorrelation length of 3 vertical layers). As

a result, we obtain a localized Pf . We then perform the eigenvalue decomposition of the

localized covariance and keep only the Nens leading eigenvalues and eigenvectors in data

assimilation. Note, however, that covariance localization could be achieved in a different (i.e.,

approximate) way in applications to large-size problems (e.g., Whitaker and Hamill 2002; Ott et

al. 2005).
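The localization procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's code: a simple triangular taper stands in for the second-order Gaspari and Cohn (1999) correlation function, and all names are chosen for illustration.

```python
import numpy as np

def localize_and_truncate(Pf, c, n_ens):
    # Schur-product localization of the full forecast error covariance,
    # followed by rank reduction to the leading n_ens eigenpairs.
    n = Pf.shape[0]
    levels = np.arange(n)
    dist = np.abs(levels[:, None] - levels[None, :])
    # Triangular taper with compact support (zero beyond c levels);
    # a stand-in for the second-order Gaspari-Cohn correlation function.
    taper = np.maximum(0.0, 1.0 - dist / c)
    Pf_loc = Pf * taper                # element-wise (Schur) product
    w, V = np.linalg.eigh(Pf_loc)      # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:n_ens]  # keep the n_ens leading modes
    # Reduced-rank square root spanning the retained subspace
    sqrtP = V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
    return Pf_loc, sqrtP
```

With an 80×80 covariance, a decorrelation length of 3 vertical layers, and 10 retained modes, this mirrors the setup of the 10ens_40obs_loc experiment; because the taper is itself a valid correlation function, the Schur product preserves positive semi-definiteness.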

4. RESULTS

(a) Verification summary

Verifications of data assimilation experiments listed in Table 1 are performed in terms of

analysis and background errors and the chi-square innovation statistic tests (e.g., Dee 1995; Menard et al. 2000; Zupanski 2005). The verification summary is given in Table 2. The

RMS errors of the analysis and the 6-h forecast (background) are calculated with respect to the

truth as mean values over 70 consecutive data assimilation cycles. The mean values and the

standard deviations of the chi-square statistic are calculated over 70 data assimilation cycles

from the chi-square statistic values obtained in the individual data assimilation cycles. Note that

an ergodic hypothesis was invoked when calculating the mean chi-square values: the sample mean was replaced by the time mean, calculated over 70 data assimilation cycles.

The results in Table 2 indicate superior performance of the KF approach, and also good

performance of the MLEF approach, with the RMS errors decreasing as the ensemble size

increases, and as the number of observations increases, as expected. In

comparison to the 3d-var experiment, the MLEF errors are generally smaller for larger ensemble

sizes (20 and 40 members) and larger for the smallest ensemble size (10 members). The analysis

errors of the MLEF experiments with 80 observations are within the estimated total observation

errors (note that the total observation errors also include empirical representativeness errors). The

analysis and background errors of all experiments are smaller than the errors of the experiment

without data assimilation (no_obs), thus indicating a positive impact of data assimilation. Table 2

also indicates that covariance localization, which is applied in the experiment with 10 ensemble

members and 40 observations, reduces analysis and background errors (compare experiments

10ens_40obs and 10ens_40obs_loc).

Mean values of the chi-square statistic indicate that the experiments are generally within 20% of the expected value of 1, with standard deviations of 15%-34%, with the

exception of the 3d-var experiment. In the 3d-var experiment there are larger fluctuations of the

chi-square statistic from one data assimilation cycle to another (the standard deviation is 78%),

which is a consequence of using a constant forecast error covariance in all data assimilation

cycles. Note that the chi-square values larger (smaller) than 1 indicate an underestimation

(overestimation) of the forecast error variance. One should, however, expect departures from the

expected chi-square statistic, since the Gaussian assumption is not strictly valid due to non-

linearity of the forecast model. The chi-square values calculated in individual data assimilation

cycles indicated no time increasing or decreasing trends, meaning that all data assimilation

experiments had stable filter performance.
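The chi-square diagnostic used above can be sketched as follows. This is a minimal illustration assuming the standard innovation-covariance form of the statistic (e.g., Dee 1995; Menard et al. 2000); the function and variable names are chosen for illustration.

```python
import numpy as np

def chi_square_statistic(innov, H, Pf, R):
    # Normalized chi-square innovation statistic:
    #   chi2 = d^T (H Pf H^T + R)^{-1} d / Nobs,
    # where d is the innovation (observation minus background in
    # observation space). For a correctly specified system its expected
    # value is 1; values above (below) 1 suggest under- (over-)
    # estimated forecast error variance, as noted in the text.
    S = H @ Pf @ H.T + R           # innovation covariance
    z = np.linalg.solve(S, innov)  # S^{-1} d without forming the inverse
    return float(innov @ z) / innov.size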

(b) Impact of ensemble size

Let us now examine the impact of ensemble size on DOF for signal. We calculate DOF

for signal ds in data assimilation experiments with 80 and 40 observations and plot it as a

function of data assimilation cycles in Figs. 1a and 1b, respectively. Results from the reduced-

rank MLEF experiments (with 10 and 40 ensemble members) are shown along with the full-rank

KF and 3d-var experiments. Recall that, since the observation number and the observation errors

did not change from one data assimilation cycle to another, the time variability of ds reflects the

time variability of the forecast error covariance matrix. Comparing the results obtained using the

same ensemble size in Figs. 1a and 1b, we can notice generally higher values of ds in the

experiments with 80 observations in the first few data assimilation cycles; the differences

between 80 and 40 observations diminish in the later cycles. This is an indication that the KF and

also the MLEF have learning capabilities, since they recognize that previously assimilated

observations had an impact on reducing the initially prescribed forecast errors; thus, additional observations are less beneficial in the later cycles than in the earlier ones. This learning

capability is not present in the 3d-var approach since the forecast error covariance is kept

constant at all times. Consequently, the 3d-var approach could not recognize that the previously

assimilated observations had an impact on reducing the forecast uncertainty.

As seen in Figs. 1a and 1b, the experiments with larger ensemble size typically have

larger values of ds, and vice versa. The smaller (larger) values of ds are a consequence of using a forecast error covariance matrix of smaller (larger) rank. The important observation is that the

KF experiment and all reduced-rank experiments show similar time variability of the information

measures. Assuming that the full-rank KF experiment produces the best analysis solution and the

best estimate of the flow-dependent forecast error covariance, these results indicate that the

forecast error covariance is also realistically described in the reduced-rank experiments. We will

examine this issue further in the section “Temporal evolution of the information measures”.
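The computation of DOF for signal from the ensemble-subspace information matrix can be sketched as follows, assuming the standard eigenvalue form ds = Σ μi/(1 + μi), with μi the eigenvalues of the information matrix C (cf. Rodgers 2000; Fisher 2003); the names below are illustrative.

```python
import numpy as np

def dof_for_signal(Z):
    # Z has one column per ensemble member: the i-th column is the i-th
    # forecast perturbation mapped to observation space and scaled by
    # R^{-1/2}. C = Z^T Z is then the information matrix in ensemble
    # subspace, with at most Nens non-zero eigenvalues, so ds is bounded
    # by the ensemble size.
    C = Z.T @ Z                  # Nens x Nens information matrix
    mu = np.linalg.eigvalsh(C)
    mu = np.maximum(mu, 0.0)     # guard against tiny negative round-off
    return float(np.sum(mu / (1.0 + mu)))
```

Because each term μi/(1 + μi) is below 1, ds cannot exceed the number of columns of Z; this is why an insufficient ensemble can saturate at ds = Nens in every cycle, masking any sensitivity to the true state.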

(c) Impact of covariance localization

In this subsection we evaluate the impact of covariance localization on the information

measures, focusing on the experiment with a small ensemble size (10 ensemble members) and the smaller number of observations (40 observations). In Fig. 2, DOF for signal, obtained in the experiments with and without localization, is plotted as a function of data assimilation cycles.

The figure indicates that the covariance localization generally increases the amount of

information. This is not surprising, since covariance localization introduces extra DOF to the

data assimilation system (e.g., Hamill et al. 2001), but the total number of DOF cannot exceed

the ensemble size (Nens), since the information matrix C can have up to Nens non-zero

eigenvalues. An important observation is that the localization does not change the essential

character of the information measures (the lines with and without covariance localization are

approximately parallel). There is, however, a notable departure between the two lines around

cycle 56. Note, however, that because we use a single-column model, maxima and minima are likely to be shifted by a single point in time, even under similar experimental conditions.

(d) Temporal evolution of the information measures

As observed in Figs. 1 and 2, the information measures reach a maximum in the first data

assimilation cycle. There are also two pronounced local maxima around cycles 40 and 50 (the

exact locations of the maxima vary between different experiments). In the following text, we

examine whether there is a correlation between the information measures in Figs. 1 and 2 and the true

model state evolution.

The true model state evolution is shown in Figs. 3a, b, c and d, where true temperature,

true specific humidity, observed temperature, and observed specific humidity are plotted as

functions of data assimilation cycles and model vertical levels. One can observe rapid, front-like,

time-tilted changes in both temperature and humidity around cycles 40 and 50. Figs. 1 and 2

indicate the two local maxima in the information measures around the same data assimilation

cycles. One can also observe correlations between additional smaller local maxima in Figs. 1 and

2 and rapid changes in Fig. 3, though the rapid changes are more pronounced in the humidity

field than in the temperature field. It is, therefore, evident that the time evolution of the

information measures is correlated with the true model state time evolution. Since the

information measures employ a flow-dependent forecast error covariance, this confirms that the

flow-dependency of the forecast error covariance is reasonably correct. Note, however, that a

“flow-dependent” forecast error covariance does not always imply that the information measures

are flow-dependent. For example, the experiments with an insufficient ensemble size would

commonly produce ds=Nens in all data assimilation cycles, thus indicating that ds is not sensitive

to the changes in the true model state, even though the forecast error covariance is “flow-

dependent”. Thus, having a correct flow-dependent forecast error covariance matrix is of

fundamental importance for describing the prior knowledge about the truth when calculating

information measures.

One can also observe more variability in the observations than in the corresponding

“true” fields, especially for the specific humidity field (Figs. 3b and 3d). This is a manifestation

of representativeness error, introduced by randomly perturbing the model state variables when

creating simulated observations. Recall that we have approximately accounted for the impact of

the representativeness error through the empirical parameter α (Table 1).

(e) Trace of Pf

Let us now examine the magnitude of the forecast error covariance. As an example, we

present the trace of the forecast error covariance matrix as a function of data assimilation cycles

obtained in the KF and 3d-var experiments with 80 observations (Fig. 4). In Fig. 4a the

temperature component (of the total trace) is given and in Fig. 4b the specific humidity

component is shown. As expected, Fig. 4 indicates time varying magnitudes of Pf for the KF

experiment and constant magnitudes of Pf for the 3d-var experiment. The figure also implies that

the largest values of ds obtained in the first data assimilation cycle (Figs. 1 and 2) are not a

simple consequence of using large initial Pf, since much larger magnitudes of Pf are obtained in

later data assimilation cycles (e.g., in the KF experiment around cycles 40 and 50). Thus, we can

conclude that the reason for large values of ds in the first data assimilation cycle is an inadequate Pf, not necessarily a large Pf. The results in Fig. 4 also suggest that the large values of the

information measures obtained in the 3d-var experiment are a consequence of the shape of Pf,

not necessarily of the large magnitude of Pf. One can, however, appropriately tune the Pf used in

the 3d-var experiment in order to reduce or increase the information measures.
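Splitting the trace of Pf into its temperature and specific-humidity components, as plotted in Figs. 4a and 4b, can be sketched as follows; this assumes, purely for illustration, a state vector that stacks the temperature levels first and the specific-humidity levels second.

```python
import numpy as np

def trace_components(Pf, n_levels):
    # Partition tr(Pf) into the T and q contributions, assuming the
    # state vector is ordered [T_1..T_n, q_1..q_n] (an illustrative
    # assumption about the state ordering, not stated in the text).
    d = np.diag(Pf)
    return float(d[:n_levels].sum()), float(d[n_levels:].sum())
```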

(f) Temporal evolution of the analysis and forecast errors

Let us conclude this section by examining the temporal evolution of the analysis and forecast RMS errors of different data assimilation experiments, calculated with respect to the “truth”. We

show the RMS errors obtained in the KF, the 3d-var and the experiment with 10 ensemble

members with covariance localization. The errors of the three experiments are shown as

functions of model vertical levels and data assimilation cycles for temperature (Fig. 5) and for

specific humidity (Fig. 6). For reference, we also show temporal evolution of the errors of

temperature and specific humidity obtained in the experiment without data assimilation (Figs. 7a

and 7b). Examination of Figs. 5, 6 and 7 indicates that the errors of the experiment without data

assimilation (no_obs, shown in Fig. 7) are the largest, and that they are at a maximum around

cycles 40 and 50. Around the same cycles maxima in ds (Figs. 1 and 2) and the abrupt changes in

the “true” model state (Fig. 3) were also observed. These largest errors are reduced by the

greatest amount, but still not completely eliminated, in the KF experiment, as shown in Figs. 5

and 6. This is an expected result, which indicates a highly efficient use of the observed

information in the KF, owing to the use of the full-rank flow-dependent forecast error

covariance. The other two experiments, the 3d-var (Figs. 5c, 5d, 6c and 6d) and the 10-ensemble-member experiment (Figs. 5e, 5f, 6e and 6f), also indicate considerable, but much smaller, error

reductions with respect to the experiment without data assimilation. Comparisons of the RMS

errors of the 3d-var experiment, which uses a full-rank but constant forecast error covariance,

with the MLEF experiment with 10 ensemble members, which uses a flow-dependent forecast

error covariance but with a considerably reduced rank, indicate generally slightly better

performance of the 3d-var experiment. As shown in Table 2, the analysis and the background errors

obtained using larger ensemble sizes (e.g., 20 and 40 ensemble members) are generally smaller than the 3d-var

errors. Thus, there is a trade-off regarding the quality of the analysis, depending on how many

ensemble members are feasible to employ.

5. CONCLUSIONS

In this study, we have applied information theory within an ensemble-based data

assimilation approach and defined an information matrix in ensemble subspace. We have shown

that the information matrix in ensemble subspace can be directly linked to the information matrix

typically used in non-ensemble based data assimilation methods, such as the KF and the 3d-var

methods, which provides a framework for consistent comparisons of information measures

between different data assimilation methods.

We have evaluated this framework in application to the GEOS-5 SCM and simulated

observations, employing ARM observations as forcing. We have compared three different data

assimilation approaches, the KF, the MLEF and the 3d-var, focusing on the impact of ensemble

size, covariance localization, and the temporal evolution of the “true” model state on the

information measures.

Experimental results indicated that the essential character of the information measures

was similar in all experiments using a flow-dependent forecast error covariance matrix (the KF

and the MLEF experiments with varying ensemble sizes), showing similar trends of increase

or decrease with time. The temporal evolution of the information measures was correlated with

the true model state evolution, which was an indication that the flow-dependent forecast error

covariance was reasonable. The 3d-var based information measures were insensitive to the

changes in the true model state, since the forecast error covariance was (inadequately)

prescribed. These results indicated that it is fundamentally important to use a flow-dependent

forecast error covariance in order to adequately describe the prior knowledge about the truth

when calculating information measures.

As expected, the impact of covariance localization was to improve the data assimilation results and to increase the values of the information measures. The temporal evolution of the information measures remained sensitive to the major changes in the true model state in a similar way as in the experiments without localization.

Comparisons of the three different data assimilation approaches indicated superior

performance of the KF approach, owing to the use of the full-rank flow-dependent forecast error

covariance matrix. Comparisons of the reduced-rank MLEF and the 3d-var approach indicated

superior MLEF results when the ensemble size was greater than 10, and comparable or slightly

worse MLEF results for smaller ensemble sizes (without covariance localization).

The results of this study indicated the effectiveness of the proposed framework in

applications to different data assimilation approaches. Although the results were very

encouraging, further evaluations of the proposed framework are still necessary, especially in

applications to data assimilation problems with numerous observations and atmospheric models

with many degrees of freedom.

Acknowledgements

The first author would like to thank Graeme Stephens, Christine Johnson, Louie

Grasso, and Stephane Vannitsem for inspiring discussions regarding information content

measures. This research was supported by NASA grants: 621-15-45-78, NAG5-12105, and

NNG04GI25G. We also acknowledge computational resources provided by the Explore

computer system at NASA’s Goddard Space Flight Center.

References:

Abramov, R., and A. Majda, 2004: Quantifying uncertainty for non-Gaussian ensembles in

complex systems. SIAM J. Sci. Stat. Comp., 26, 411-447.

Abramov, R., A. Majda and R. Kleeman, 2005: Information theory and predictability for low-

frequency variability. J. Atmos. Sci., 62, 65–87.

Anderson, J. L., 2001: An ensemble adjustment filter for data assimilation. Mon. Wea. Rev., 129,

2884–2903.

Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420–436.

Bishop, C. H., and Z. Toth, 1999: Ensemble transformation and adaptive observations. J. Atmos.

Sci., 56, 1748–1765.

Cohn, S. E., 1997: An introduction to estimation theory. J. Meteor. Soc. Japan, 75, 257–288.

Daley, R., 1991: Atmospheric Data Analysis. Cambridge University Press, 457 pp.

Dee, D., 1995: On-line estimation of error covariance parameters for atmospheric data

assimilation. Mon. Wea. Rev., 123, 1128–1145.

DelSole, T., 2004: Predictability and information theory. Part I: Measures of predictability. J.

Atmos. Sci., 61, 2425–2440.

Engelen, R. J., and G. L. Stephens, 2004: Information Content of Infrared Satellite Sounding

Measurements with Respect to CO2. J. Appl. Meteor. 43, 373–378.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using

Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99, (C5), 10143-

10162.

Fisher, M., 2003: Estimation of entropy reduction and degrees of freedom for signal for large

variational analysis systems. ECMWF Tech. Memo. No. 397. 18 pp.

Fletcher, S. J., and M. Zupanski, 2006: A data assimilation method for lognormally distributed observational errors. Quart. J. Roy. Meteor. Soc., 132, 2505-2520.

Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three

dimensions. Quart. J. Roy. Meteor. Soc., 125, 723–757.

Gill, P. E., W. Murray, and M. H. Wright, 1981: Practical Optimization. Academic Press, 401

pp.

Golub, G. H., and C. F. van Loan, 1989: Matrix Computations. 2d ed. The Johns Hopkins

University Press, 642 pp.

Hamill, T. M., and C. Snyder, 2000: A hybrid ensemble Kalman filter/3D-variational analysis

scheme. Mon. Wea. Rev., 128, 2905–2919.

Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background

error covariance estimates in an ensemble Kalman filter. Mon. Wea. Rev., 129, 2776–

2790.

Heemink, A. W., M. Verlaan, and A. J. Segers, 2001: Variance reduced ensemble Kalman

filtering. Mon. Wea. Rev., 129, 1718–1728.

Horn, R. A., and C. R. Johnson, 1985: Matrix Analysis. Cambridge University Press, 561 pp.

Hoteit, I., D.-T. Pham, and J. Blum, 2002: A simplified reduced-order Kalman filtering and

application to altimetric data assimilation in tropical Pacific. J. Mar. Sys., 36, 101-127.

Hoteit, I., D.-T. Pham, and J. Blum, 2003: A semi-evolutive filter with partially local correction

basis for data assimilation in oceanography. Oceanologica Acta, 26, 511-524.

Hou, A. Y., S. Q. Zhang, A. da Silva and W. Olson, 2000: Improving assimilated global datasets

using TMI rainfall and columnar moisture observations. J. Climate., 13, 4180–4195.

Hou, A. Y., S. Q. Zhang, A. da Silva, W. Olson, C. Kummerow, and J. Simpson, 2001:

Improving global analysis and short-range forecast using rainfall and moisture

observations derived from TRMM and SSM/I passive microwave sensors. Bull. Amer.

Meteor. Soc., 81, 659–679.

Hou, A. Y., S. Q. Zhang, and O. Reale, 2004: Variational continuous assimilation of TMI and

SSM/I rain rates: Impact on GEOS-3 hurricane analyses and forecasts. Mon. Wea. Rev.,

132, 2094–2109.

Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter

technique. Mon. Wea. Rev., 126, 796–811.

Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for

atmospheric data assimilation. Mon. Wea. Rev., 129, 123–137.

Jazwinski, A. H., 1970: Stochastic Processes and Filtering Theory. Academic Press, 376 pp.

Johnson, C., 2003: Information content of observations in variational data assimilation. Ph.D.

thesis, Department of Meteorology, University of Reading, 218 pp. [Available from

University of Reading, Whiteknights, P.O. Box 220, Reading, RG6 2AX, United

Kingdom.]

Keppenne, C., 2000: Data assimilation into a primitive-equation model with a parallel ensemble

Kalman filter. Mon. Wea. Rev., 128, 1971–1981.

Kleeman, R., 2002: Measuring dynamical prediction utility using relative entropy. J. Atmos. Sci.,

59, 2057–2072.

Lermusiaux, P. F. J., and A. R. Robinson, 1999: Data assimilation via error subspace statistical

estimation. Part I: Theory and schemes. Mon. Wea. Rev., 127, 1385–1407.

L’Ecuyer, T. S., P. Gabriel, K. Leesman, S. J. Cooper, and G. L. Stephens, 2006: Objective

assessment of the information content of visible and infrared radiance measurements for

cloud microphysical property retrievals over the global oceans. Part I: Liquid clouds. J.

Appl. Meteor. Climat., 45, 20–41.

Lorenc, A. C., 1986: Analysis methods for numerical weather prediction. Quart. J. Roy. Meteor.

Soc., 112, 1177–1194.

Luenberger, D. L., 1984: Linear and Non-linear Programming. 2d ed. Addison-Wesley, 491 pp.

Menard, R., S. E. Cohn, L.-P. Chang, and P. M. Lyster, 2000: Assimilation of stratospheric

chemical tracer observations using a Kalman filter. Part I: Formulation. Mon. Wea. Rev.,

128, 2654–2671.

Mitchell, H. L., and P. L. Houtekamer, 2000: An adaptive ensemble Kalman filter. Mon. Wea.

Rev., 128, 416–433.

Mitchell, H. L., P. L. Houtekamer, and G. Pellerin, 2002: Ensemble size, balance, and model-

error representation in an ensemble Kalman filter. Mon. Wea. Rev., 130, 2791–2808.

Navon, I. M., X. Zou, J. Derber, and J. Sela, 1992: Variational data assimilation with an

adiabatic version of the NMC spectral model. Mon. Wea. Rev., 120, 1433–1446.

Oczkowski, M., I. Szunyogh, and D. J. Patil, 2005: Mechanism for the development of locally

low-dimensional atmospheric dynamics. J. Atmos. Sci., 62, 1135-1156.

Ott, E., B. R. Hunt, I. Szunyogh, A. V. Zimin, E. J. Kostelich, M. Corazza, E. Kalnay, D. J. Patil, and J. A. Yorke, 2004: A local ensemble Kalman filter for atmospheric data assimilation. Tellus, 56A, 273-277.

Pham, D. T., 2001: Stochastic methods for sequential data assimilation in strongly nonlinear systems. Mon. Wea. Rev., 129, 1194–1207.

Pham, D. T., J. Verron, and M. C. Roubaud, 1997: Singular evolutive Kalman filter with EOF initialization for data assimilation in oceanography. J. Mar. Syst., 16, 323–340.

Pham, D. T., J. Verron, and M. C. Roubaud, 1998: A singular evolutive extended Kalman filter for

data assimilation in oceanography. J. Mar. Syst., 16, 323–340.

Patil, D. J., B. R. Hunt, E. Kalnay, J. A. Yorke, and E. Ott, 2001: Local low dimensionality of

atmospheric dynamics. Phys. Rev. Lett., 86, 5878-5881.

Peters, W., J.B. Miller, J. Whitaker, A.S. Denning, A. Hirsch, M.C. Krol, D. Zupanski, L.

Bruhwiler, and P.P. Tans, 2005: An ensemble data assimilation system to estimate

CO2 surface fluxes from atmospheric trace gas observations. J. Geophys. Res. 110,

D24304, doi:10.1029/2005JD006157.

Purser, R.J., and H.-L. Huang, 1993: Estimating effective data density in a satellite retrieval or an

objective analysis. J. Appl. Meteorol., 32, 1092–1107.

Rabier, F., N. Fourrie, C. Djalil, and P. Prunet, 2002: Channel selection methods for Infrared

Atmospheric Sounding Interferometer radiances. Quart. J. Roy. Meteor. Soc., 128, 1011–

1027.

Reichle, R. H., D. B. McLaughlin, and D. Entekhabi, 2002a: Hydrologic data assimilation with the ensemble Kalman filter. Mon. Wea. Rev., 130, 103–114.

Reichle, R. H., J. P. Walker, R. D. Koster, and P. R. Houser, 2002b: Extended versus ensemble Kalman filtering for land data assimilation. J. Hydrometeorology, 3, 728-740.

Rodgers, C. D., 2000: Inverse Methods for Atmospheric Sounding: Theory and Practice. World

Scientific, 238 pp.

Roulston, M., and L. Smith, 2002: Evaluating probabilistic forecasts using information theory.

Mon. Wea. Rev., 130, 1653–1660.

Schneider, T., and S. Griffies, 1999: A conceptual framework for predictability studies. J.

Climate., 12, 3133–3155.

Shannon, C. E., and W. Weaver, 1949: The Mathematical Theory of Communication. University

of Illinois Press, 144 pp.

Szunyogh, I., E. J. Kostelich, G. Gyarmati, D. J. Patil, B. R. Hunt, E. Kalnay, E. Ott, and J. A.

Yorke, 2005: Assessing a local ensemble Kalman filter: Perfect model experiments with

the NCEP global model. Tellus, 57A, 528-545.

Tippett, M., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble

square-root filters. Mon. Wea. Rev., 131, 1485–1490.

Uzunoglu, B., S. J. Fletcher, M. Zupanski, and I. M. Navon, 2007: Adaptive ensemble member

size reduction and inflation. Quart. J. Roy. Meteor. Soc., (in press).

van Leeuwen, P. J., 2001: An ensemble smoother with error estimates. Mon. Wea. Rev., 129,

709–728.

Wahba, G., 1985: Design criteria and eigensequence plots for satellite-computed tomography. J.

Atmos. Oceanic Technol., 2, 125–132.

Wahba, G., D. R. Johnson, F. Gao, and J. Gong, 1995: Adaptive tuning of numerical weather

prediction models: Randomized GCV in three- and four-dimensional data assimilation.

Mon. Wea. Rev., 123, 3358–3370.

Wang, X., and C. H. Bishop, 2003: A comparison of breeding and ensemble transform Kalman

filter ensemble forecast schemes. J. Atmos. Sci., 60, 1140–1158.

Wei, M., Z. Toth, R. Wobus, Y. Zhu, C. H. Bishop, and X. Wang, 2006: Ensemble transform Kalman filter-based ensemble perturbations in an operational global prediction system at NCEP. Tellus, 58A, 28-44.

Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed

observations. Mon. Wea. Rev., 130, 1913–1924.

Zhang, F., Z. Meng, and A. Aksoy, 2006: Tests of an ensemble Kalman filter for mesoscale and

regional-scale data assimilation. Part I: perfect model experiments. Mon. Wea. Rev., 134,

722–736.

Zhang, F., C. Snyder, and J. Sun, 2004: Impacts of initial estimate and observation availability on convective-scale data assimilation with an ensemble Kalman filter. Mon. Wea. Rev., 132, 1238–1253.

Zupanski, D., A. S. Denning, M. Uliasz, M. Zupanski, A. E. Schuh, P. J. Rayner, W. Peters and

K. D. Corbin, 2007: Carbon flux bias estimation employing Maximum Likelihood

Ensemble Filter (MLEF). J. Geophys. Res., (accepted with revisions).

Zupanski, D., and M. Zupanski, 2006: Model error estimation employing an ensemble data

assimilation approach. Mon. Wea. Rev., 134, 1337-1354.

Zupanski, M., 2005: Maximum Likelihood Ensemble Filter: Theoretical Aspects. Mon. Wea.

Rev., 133, 1710–1726.

Zupanski, M., S. J. Fletcher, I. M. Navon, B. Uzunoglu, R. P. Heikes, D. A. Randall, T. D. Ringler, and D. Daescu, 2006: Initiation of ensemble data assimilation. Tellus,

58A, 159-170.

Table Captions List

Table 1. List of data assimilation experiments discussed in this paper. Nobs indicates the number

of observations per data assimilation cycle. The empirical parameter α, varying with ensemble

size, is employed to approximately account for an unknown representativeness error. Prefixes

“KF” and “3dv” indicate Kalman Filter and 3d-var experiments, respectively. Suffix “loc”

indicates that localization is applied to the forecast error covariance. Experiment denoted no_obs

is an experiment without data assimilation.

Table 2. Total RMS errors of the analysis and the background solutions, calculated with respect

to the truth over 70 data assimilation cycles, for the experiments listed in Table 1. The RMS

analysis and background errors are shown for temperature (denoted RMS Ta and RMS Tb) and

for specific humidity (denoted RMS qa and RMS qb). The RMS errors are smallest for the KF

experiment with 80 observations and are largest for the experiment without data assimilation

(no_obs). The smallest RMS errors are highlighted in bold, and the largest RMS errors are

highlighted in bold italic. Also shown are the mean values and standard deviations of the chi-

square statistic, calculated over 70 data assimilation cycles.

Table 1. List of data assimilation experiments discussed in this paper. Nobs indicates the number of observations per data assimilation cycle. The empirical parameter α, varying with ensemble size, is employed to approximately account for an unknown representativeness error. Prefixes "KF" and "3dv" indicate Kalman Filter and 3d-var experiments, respectively. Suffix "loc" indicates that localization is applied to the forecast error covariance. The experiment denoted no_obs is an experiment without data assimilation.

*Covariance localization was not applied in the 3d-var experiments; however, the 3d-var covariance is localized by definition [defined using the Gaspari and Cohn (1999) correlation function].

Experiment       Nens         Nobs        Rinst^1/2      Rinst^1/2 for q       Parameter  Localization
                 (T and q     (T and q    for T          in kg/kg              α
                 estimated)   observed)   (degrees K)    (Min; Max errors)
---------------  -----------  ----------  -------------  --------------------  ---------  ------------
10ens_80obs      10           80          0.2            6.1*10-8; 7.9*10-4    2.1        NO
20ens_80obs      20           80          0.2            6.1*10-8; 7.9*10-4    1.7        NO
40ens_80obs      40           80          0.2            6.1*10-8; 7.9*10-4    1.4        NO
KF_80obs         80           80          0.2            6.1*10-8; 7.9*10-4    1.15       NO
10ens_40obs      10           40          0.2            6.1*10-8; 7.9*10-4    2.1        NO
20ens_40obs      20           40          0.2            6.1*10-8; 7.9*10-4    1.7        NO
40ens_40obs      40           40          0.2            6.1*10-8; 7.9*10-4    1.4        NO
KF_40obs         80           40          0.2            6.1*10-8; 7.9*10-4    1.15       NO
10ens_40obs_loc  10           40          0.2            6.1*10-8; 7.9*10-4    2.1        YES
3dv_40obs        80           40          0.2            6.1*10-8; 7.9*10-4    1.15       NO*
no_obs           -            0           -              -                     -          -


Experiment       RMS Ta  RMS Tb  RMS qa     RMS qb     Chi-square  Chi-square
                 (K)     (K)     (kg/kg)    (kg/kg)    (mean)      (stddev)
---------------  ------  ------  ---------  ---------  ----------  ----------
10ens_80obs      0.45    0.49    3.77*10-4  3.97*10-4  1.11        0.27
20ens_80obs      0.28    0.35    2.65*10-4  3.08*10-4  0.95        0.20
40ens_80obs      0.23    0.32    2.26*10-4  2.91*10-4  0.92        0.15
KF_80obs         0.21    0.31    2.04*10-4  2.57*10-4  1.06        0.20
10ens_40obs      0.64    0.68    4.93*10-4  5.08*10-4  1.16        0.31
20ens_40obs      0.54    0.57    4.07*10-4  4.27*10-4  1.03        0.31
40ens_40obs      0.51    0.55    3.74*10-4  4.14*10-4  0.84        0.22
KF_40obs         0.38    0.40    3.38*10-4  3.42*10-4  0.81        0.20
10ens_40obs_loc  0.57    0.58    4.35*10-4  4.51*10-4  1.21        0.34
3dv_40obs        0.51    0.61    4.34*10-4  4.43*10-4  0.84        0.78
no_obs           0.82    0.82    6.56*10-4  6.56*10-4  -           -

Table 2. Total RMS errors of the analysis and the background solutions, calculated with respect

to the truth over 70 data assimilation cycles, for the experiments listed in Table 1. The RMS

analysis and background errors are shown for temperature (denoted RMS Ta and RMS Tb) and

for specific humidity (denoted RMS qa and RMS qb). The RMS errors are smallest for the KF

experiment with 80 observations and are largest for the experiment without data assimilation

(no_obs). The smallest RMS errors are highlighted in bold, and the largest RMS errors are

highlighted in bold italic. Also shown are the mean values and standard deviations of the chi-square statistic, calculated over 70 data assimilation cycles.
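The two diagnostics reported in Table 2 are standard: the total RMS error measures distance from the truth, while the innovation-based chi-square statistic, chi2 = d^T (H Pf H^T + R)^{-1} d / Nobs, has an expected mean of 1 when the covariances are consistent. A minimal sketch of both in a toy linear setting (all arrays and the linear observation operator H below are hypothetical placeholders, not the paper's actual model or operators):

```python
import numpy as np

def total_rms(x, x_truth):
    """Total RMS error of a state estimate with respect to the truth."""
    return np.sqrt(np.mean((x - x_truth) ** 2))

def chi_square(innovation, H, P_f, R):
    """Innovation chi-square, normalized by the number of observations:
    chi2 = d^T (H P_f H^T + R)^{-1} d / N_obs (expected mean 1 when
    forecast and observation error covariances are consistent)."""
    S = H @ P_f @ H.T + R                     # innovation covariance
    return float(innovation @ np.linalg.solve(S, innovation)) / innovation.size

# toy example with hypothetical sizes and values
rng = np.random.default_rng(0)
n, m = 8, 4
H = np.eye(m, n)                              # observe first m state components
P_f = 0.5 * np.eye(n)
R = 0.2 * np.eye(m)
x_truth = rng.standard_normal(n)
x_b = x_truth + 0.5 * rng.standard_normal(n)  # background with errors
y = H @ x_truth + np.sqrt(0.2) * rng.standard_normal(m)
d = y - H @ x_b                               # innovation vector
print(total_rms(x_b, x_truth), chi_square(d, H, P_f, R))
```

Averaging chi2 over many cycles, as in Table 2, is what allows a meaningful comparison with its theoretical mean of 1.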


Figure Captions List

Fig. 1. Degrees of Freedom (DOF) for signal (ds), obtained in the experiments with (a) 80

observations and (b) 40 observations per data assimilation cycle. Note that ds is constant in time

in the 3d-var experiment, which is a consequence of a constant forecast error covariance.

Fig. 2. Values of DOF for signal (ds), obtained in the experiments with 10 ensemble members

and 40 observations (with and without covariance localization), plotted as functions of data

assimilation cycles.

Fig. 3. (a) True temperature, (b) true specific humidity, (c) observed temperature, and (d)

observed specific humidity, shown as functions of data assimilation cycles and model vertical

levels. Observations defined at each grid point (80 observations) are shown in (c) and (d). Units for

temperature are K, and for specific humidity g kg-1. Note rapid time-tilted changes in both

temperature and humidity around cycles 40 and 50.

Fig. 4. Trace of Pf, shown as a function of data assimilation cycles. Results from the KF and the

3d-var experiments with 80 observations are plotted. The temperature component of the total

trace is given in (a) in units of K2, and the specific humidity component of the total trace is given

in (b) in units of kg2 kg-2. Trace of Pf is constant in all cycles for the 3d-var experiment. It is

equal to 1.6 K2, for temperature, and 9.2*10-5 kg2 kg-2, for specific humidity.


Fig. 5. Analysis and background errors of temperature obtained in three different data

assimilation experiments with 40 observations: KF_40obs, 3dv_40obs and 10ens_40obs_loc.

The errors are calculated with respect to the “truth” and are shown as functions of data

assimilation cycles and model vertical levels. The results from the KF experiment are shown in

(a), for the analysis, and in (b), for the background. The results of the 3d-var experiment are

shown in (c), for the analysis, and (d), for the background. The results of the experiment with 10

ensemble members, which also includes covariance localization, are given in (e), for the analysis

and in (f), for the background. The numbers in the upper right corners are total RMS errors from

Table 2. The units are K for both the plots and the total RMS errors.

Fig. 6. As in Fig. 5, but for specific humidity in g kg-1. The numbers in the upper right corners

are total RMS errors from Table 2, given in kg kg-1.

Fig. 7. Analysis errors of the experiment without data assimilation (no_obs), calculated with

respect to the “truth”. The results are plotted in (a), for temperature in K, and in (b), for specific

humidity in g kg-1. The RMS errors in the upper right corners are in units of K, for

temperature, and in kg kg-1, for specific humidity.


Fig. 1. Degrees of Freedom (DOF) for signal (ds), obtained in the experiments with (a) 80 observations and (b) 40 observations per data assimilation cycle. Note that ds is constant in time in the 3d-var experiment, which is a consequence of a constant forecast error covariance.

[Fig. 1 panels (a) and (b): images not reproduced]
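In the ensemble framework, the DOF for signal plotted in Fig. 1 can be obtained from the eigenvalues lam_i^2 of the information matrix C = Z^T Z in ensemble subspace as ds = sum_i lam_i^2 / (1 + lam_i^2), where the columns of Z are the ensemble forecast perturbations mapped to observation space and normalized by R^{-1/2}. A minimal sketch, with a hypothetical random Z standing in for the real perturbation matrix:

```python
import numpy as np

def dof_for_signal(Z):
    """DOF for signal: ds = sum_i lam_i^2 / (1 + lam_i^2), where lam_i^2
    are the eigenvalues of the information matrix C = Z^T Z in ensemble
    subspace (Z: one column per normalized observation-space perturbation)."""
    lam2 = np.linalg.eigvalsh(Z.T @ Z)        # eigenvalues of C, all >= 0
    return float(np.sum(lam2 / (1.0 + lam2)))

# toy example: 40 observations, 10 ensemble members (hypothetical values)
rng = np.random.default_rng(1)
Z = rng.standard_normal((40, 10))
ds = dof_for_signal(Z)
print(ds)                                     # bounded above by Nens = 10
```

Because each term lam_i^2 / (1 + lam_i^2) is below 1, ds can never exceed the ensemble size, which is why the small-ensemble experiments in Fig. 1 saturate at lower ds values.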


Fig. 2. Values of DOF for signal (ds), obtained in the experiments with 10 ensemble members and

40 observations (with and without covariance localization), plotted as functions of data assimilation

cycles.


Fig. 3. (a) True temperature, (b) true specific humidity, (c) observed temperature, and (d) observed

specific humidity, shown as functions of data assimilation cycles and model vertical levels.

Observations defined at each grid point (80 observations) are shown in (c) and (d). Units for

temperature are K, and for specific humidity g kg-1. Note rapid time-tilted changes in both

temperature and humidity around cycles 40 and 50.

[Fig. 3 panels: (a) T true, (b) q true, (c) T obs, (d) q obs; x-axis: data assimilation cycles, y-axis: vertical levels]


Fig. 4. Trace of Pf, shown as a function of data assimilation cycles. Results from the KF and the 3d-

var experiments with 80 observations are plotted. The temperature component of the total trace is

given in (a) in units of K2, and the specific humidity component of the total trace is given in (b) in

units of kg2 kg-2. Trace of Pf is constant in all cycles for the 3d-var experiment. It is equal to 1.6 K2,

for temperature, and 9.2*10-5 kg2 kg-2, for specific humidity.
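With an ensemble representation Pf = X' X'^T / (Nens - 1), the trace plotted in Fig. 4 reduces to a sum of squared ensemble perturbations and never requires forming Pf explicitly. A minimal sketch, with a hypothetical random ensemble standing in for the model states:

```python
import numpy as np

def trace_pf(ensemble):
    """Trace of the ensemble forecast error covariance
    P_f = X' X'^T / (N_ens - 1), computed without forming P_f.
    `ensemble` has shape (n_state, n_ens)."""
    n_ens = ensemble.shape[1]
    pert = ensemble - ensemble.mean(axis=1, keepdims=True)   # X'
    return float(np.sum(pert ** 2)) / (n_ens - 1)

# toy check against the explicit covariance (hypothetical ensemble)
rng = np.random.default_rng(2)
X = rng.standard_normal((80, 10))             # 80 state variables, 10 members
Pf = np.cov(X)                                # np.cov uses the same 1/(N-1) factor
print(abs(trace_pf(X) - np.trace(Pf)))        # agrees to round-off
```

The separate temperature and humidity traces in Fig. 4a,b would simply be partial sums of this expression over the corresponding components of the state vector.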



[Fig. 5 panels: (a) RMS=0.38 K, (b) RMS=0.40 K, (c) RMS=0.51 K, (d) RMS=0.61 K, (e) RMS=0.57 K, (f) RMS=0.58 K; x-axis: data assimilation cycles, y-axis: vertical levels]


Fig. 5. Analysis and background errors of temperature obtained in three different data assimilation

experiments with 40 observations: KF_40obs, 3dv_40obs and 10ens_40obs_loc. The errors are

calculated with respect to the “truth” and are shown as functions of data assimilation cycles and

model vertical levels. The results from the KF experiment are shown in (a), for the analysis, and in

(b), for the background. The results of the 3d-var experiment are shown in (c), for the analysis, and

(d), for the background. The results of the experiment with 10 ensemble members, which also

includes covariance localization, are given in (e), for the analysis and in (f), for the background.

The numbers in the upper right corners are total RMS errors from Table 2. The units are K for both the plots and the total RMS errors.


[Fig. 6 panels: (a) RMS=3.38*10-4 kg kg-1, (b) RMS=3.42*10-4 kg kg-1, (c) RMS=4.34*10-4 kg kg-1, (d) RMS=4.43*10-4 kg kg-1, (e) RMS=4.35*10-4 kg kg-1, (f) RMS=4.51*10-4 kg kg-1; x-axis: data assimilation cycles, y-axis: vertical levels]


Fig. 6. As in Fig. 5, but for specific humidity in g kg-1. The numbers in the upper right corners are

total RMS errors from Table 2, given in kg kg-1.


Fig. 7. Analysis errors of the experiment without data assimilation (no_obs), calculated with respect

to the “truth”. The results are plotted in (a), for temperature in K, and in (b), for specific humidity

in g kg-1. The RMS errors in the upper right corners are in units of K, for temperature, and

in kg kg-1, for specific humidity.

[Fig. 7 panels: (a) temperature, RMS=0.82 K; (b) specific humidity, RMS=6.56*10-4 kg kg-1; x-axis: data assimilation cycles, y-axis: vertical levels]