A Statistical Theory of the Kalman...

A Statistical Theory of the Kalman Filter

Vivak PatelDepartment of Statistics, University of Chicago SIAM UQ, Lausanne, Switzerland April 8, 2016

Outline

I. Kalman Filter

II. Single State Estimation

III. Selective Assimilation

IV. Incremental Estimator

V. Numerical Experiment

VI. Conclusions

Kalman FilterLet be a sequence of states in related by the stochastic difference equation

where and .

Our objective is to estimate given and a sequence of noisy observations related to

where and

The Kalman Filter estimates and its covariance from estimates and using:

{χt : t ∈ N} Rd

χt+1 = Dtχt + ηt

Dt ∈ Rd×d ηt ∼ N (0, QM )

χt Dt {Zt : t ∈ N} χt

Zt = Otχt + ξt

Ot ∈ Rd×d′ξt ∼ N (0, QO)

χt+1 Ct+1 χ̂t Ct

χ̂t+1 = arg min {∥Zt+1 − Ot+1χ∥2Q−1

O+ ∥χ − Dtχ̂t∥

2(DtCtD

Tt

+QM )−1}C −1

t+1 = (DtCtDTt + QM )−1 + OT

t+1Q−10 Ot+1

Kalman FilterObservability/Controllability

Kalman, 1960Kalman & Bucy, 1961

Exponential Convergence (Fixed State, Deterministic Observations)

Johnstone, Johnson, Bitmead & Anderson, 1982Bittanti, Bolzern & Campi, 1990Parkum, Poulsen & Holst, 1992Cao & Schwarz, 2003

Nonexpansive in Specialized Metrics

Bougerol, 1993Atar & Zeitouni, 1997Le Gland, et. al., 2004Carli & Sepulchre, 2015

Kalman FilterQuestions important to statisticians:

1. Q: How many observations are needed (with stochastic dynamics and noisy observations) to estimate a single state "well"? A: With probability approaching 1, objective decays like .

2. Q: Does the covariance estimate actually estimate the asymptotic covariance of the parameter estimate? A: Yes, it does.

3. Q: How important are the apriori parameter and covariance estimate? A: ...

d/k

Kalman FilterQuestions important to numerical optimizer:

1. Q: Can we do "better" than the Kalman Filter? A: No.

2. Q: What is the rate of convergence of the Kalman Filter? A: Sublinear to a single state. Presumably, we cannot do better.

3. Q: Do we need to carry around the covariance estimate from state to state? A: ...

4. Q: Can we parallelize computations over relevant dimensions? A: ...

Kalman FilterQuestions important to UQ community (not in other categories):

1. Q: How well does the KF "perform" when the dynamics are just stable (e.v. of 1) or unstable (e.v. > 1)? A: ...

2. Q: How well does the KF "perform" when deterministic dynamics are misspecified? A: ...

3. Q: How well does the KF "perform" when model noise is misspecified/correlated from state to state? A: ...

4. Q: How well does the (E)KF "perform" when the observation function is misspecified?

5. Q: How well does the KF "perform" when the observation model noise is misspecified/correlated from state to state?

Single State EstimationTo get to statistical notation:

Initialization:

Observations, Errors, Covariates:

True Parameter, Data Model:

β0 = Dtχ̂t M0 = DtCtDTt + QM

Yk is the kth component of Zt

ϵk is the kth component of ξt

Xk is the kth row of Ot

β∗ = χt+1 Yk = XTk

β∗ + ϵk

Selective AssimilationQuestion: Suppose is the estimator of after observations are assimilated. How quickly does converge to ?

General Assumptions:

are independent, and explore the entire space regularly

(Iterative) Batch Estimator:

If and and conditioned on then almost surely. However:

are never known a priori. Requires iteration.Batch estimator must be recomputed if is incremented.

β̂k β∗ k β̂k β∗

ϵk E [ϵk|Xk] = 0 supk E [ϵ2k|Xk] < ∞

Xk Rd

β̂k = arg min { k

∑j=1

(Yj − XTj β)2 + ∥β − β0∥2

M −10

}1

σ2j

β0 ∈ Rd M0 ≻ 0 X1, … , Xk β̂k → β∗

σ2j

k

Incremental EstimatorKalman Filter (Single State)

Assumptions:

are independent, and are independent, identically distributed with finite second moment.

for all s.t. .

β̂k+1 = arg min { (Yk+1 − XTk+1β)2 + ∥∥β − β̂k

∥∥2

M −1k

}M −1

k+1 = M −1k + Xk+1XT

k+1

1

γ2k

1

γ2k

ϵk E [ϵk|Xk] = 0 supk E [ϵ2k|Xk] < ∞

X1, X2, …P [|XT

1 v| = 0] < 1 v ∈ Rd ∥v∥2 = 1

Incremental EstimatorTuning Parameter Requirements:

Theorem 1: Conditioned on , almost surely.

Theorem 2: Let

If then for all asymptotically almost surely

0 < δ2 ≤ infk

γ2k

≤ supk

γ2k

≤ Δ2 < ∞

β0 is arbitrary and M0 ≻ 0

X1, X2, … , Xk ∥βk − β∗∥ → 0

Mk = E [(β̂k − β∗) (β̂k − β∗)T ∣∣∣X1, … , Xk]

σ2j = σ2 ∀j ∈ N ϵ > 0

Mk ⪯ Mk ⪯ Mk

1 − ϵ

Δ2

1σ2

1 + ϵ

δ2

Numerical ExperimentData Set

Source: Public Use File from Center of Medicare and Medicaid Services.Described health care expenses, type of visit, patient demographics.Contained million records (too big to fit in my computer's 8GB memory).

Linear Model

Response: Cost of visit.Predictors: Patient's gender, Type of Facility, Patient's age.Intercept term was included, giving unknown variables.

Tuning Parameter

N = 2.8

p = 31

γ21 (k) = 1

k

γ2 = 37000γ3 = 0.001

Numerical ExperimentConvergence of Estimated Covariance.

Recall: .

Estimated covariance tracks well with mean residuals squared.

Mk ≺ Mk ≺ Mk1−ϵ

maxk γ2k

1σ2

1+ϵ

mink γ2k

Conclusions1. Q: How many observations are needed (with stochastic dynamics and noisy observations) to estimate

a single state "well"? A: (The trace of) is sufficient for determining convergence.

2. Q: Does the covariance estimate actually estimate the asymptotic covariance of the parameter estimate? A: Yes.

3. Q: Can we converge faster than the Kalman Filter? A: For a single state and given that is estimating , no. We will incrementally invert the hessian of the objective. In fact, we show that the conditioning has no impact on the rate of convergence.

Mk

Mk ≺ Mk ≺ Mk1−ϵ

maxk γ2k

1σ2

1+ϵ

mink γ2k

Mk Mk

Thank YouAcknowledgements

Mihai AnitescuGabriel Terejanu & Haiyan ChengUniversity of Chicago, Department of Statistics & SIAM Travel Award

Slides This slide deck can be found at galton.uchicago.edu/~vpatel.

Reference

Patel, Vivak. "Kalmanbased Stochastic Gradient Method with Stop Condition and Insensitivity toConditioning." arXiv preprint arXiv:1512.01139 (2015).

https://galton.uchicago.edu/~vpatel

A Statistical Theory of the Kalman...

Documents

Transcript of A Statistical Theory of the Kalman...