A Statistical Theory of the Kalman...

16
A Statistical Theory of the Kalman Filter Vivak Patel Department of Statistics, University of Chicago SIAM UQ, Lausanne, Switzerland April 8, 2016

Transcript of A Statistical Theory of the Kalman...

Page 1: A Statistical Theory of the Kalman Filterpages.cs.wisc.edu/~vrpatel6/resources/Presentations/pdf/A... · 2018-09-08 · This slide deck can be found at galton.uchicago.edu/~vpatel.

A Statistical Theory of  the Kalman Filter

Vivak PatelDepartment of Statistics, University of Chicago  SIAM UQ, Lausanne, Switzerland  April 8, 2016  

Page 2: A Statistical Theory of the Kalman Filterpages.cs.wisc.edu/~vrpatel6/resources/Presentations/pdf/A... · 2018-09-08 · This slide deck can be found at galton.uchicago.edu/~vpatel.

Outline

I. Kalman Filter  

II. Single State Estimation  

III. Selective Assimilation  

IV. Incremental Estimator  

V. Numerical Experiment  

Page 3: A Statistical Theory of the Kalman Filterpages.cs.wisc.edu/~vrpatel6/resources/Presentations/pdf/A... · 2018-09-08 · This slide deck can be found at galton.uchicago.edu/~vpatel.

VI. Conclusions

Page 4: A Statistical Theory of the Kalman Filterpages.cs.wisc.edu/~vrpatel6/resources/Presentations/pdf/A... · 2018-09-08 · This slide deck can be found at galton.uchicago.edu/~vpatel.

Kalman FilterLet   be a sequence of states in   related by the stochastic difference equation

where   and  .

Our objective is to estimate   given   and a sequence of noisy observations   related to 

where   and 

The Kalman Filter estimates   and its covariance   from estimates   and   using:

{χt : t ∈ N} Rd

χt+1 = Dtχt + ηt

Dt ∈ Rd×d ηt ∼ N (0, QM )

χt Dt {Zt : t ∈ N} χt

Zt = Otχt + ξt

Ot ∈ Rd×d′ξt ∼ N (0, QO)

χt+1 Ct+1 χ̂t Ct

χ̂t+1 = arg min {∥Zt+1 − Ot+1χ∥2Q−1

O+ ∥χ − Dtχ̂t∥

2(DtCtD

Tt

+QM )−1}C −1

t+1 = (DtCtDTt + QM )−1 + OT

t+1Q−10 Ot+1

Page 5: A Statistical Theory of the Kalman Filterpages.cs.wisc.edu/~vrpatel6/resources/Presentations/pdf/A... · 2018-09-08 · This slide deck can be found at galton.uchicago.edu/~vpatel.

Kalman FilterObservability/Controllability

Kalman, 1960Kalman & Bucy, 1961

Exponential Convergence (Fixed State, Deterministic Observations)

Johnstone, Johnson, Bitmead & Anderson, 1982Bittanti, Bolzern & Campi, 1990Parkum, Poulsen & Holst, 1992Cao & Schwarz, 2003

Nonexpansive in Specialized Metrics

Bougerol, 1993Atar & Zeitouni, 1997Le Gland, et. al., 2004Carli & Sepulchre, 2015

Page 6: A Statistical Theory of the Kalman Filterpages.cs.wisc.edu/~vrpatel6/resources/Presentations/pdf/A... · 2018-09-08 · This slide deck can be found at galton.uchicago.edu/~vpatel.

Kalman FilterQuestions important to statisticians:

1.  Q: How many observations are needed (with stochastic dynamics and noisy observations) to estimate a single state "well"?  A: With probability approaching 1, objective decays like  . 

2.  Q: Does the covariance estimate actually estimate the asymptotic covariance of the parameter estimate? A: Yes, it does. 

3.  Q: How important are the apriori parameter and covariance estimate? A: ...  

d/k

Page 7: A Statistical Theory of the Kalman Filterpages.cs.wisc.edu/~vrpatel6/resources/Presentations/pdf/A... · 2018-09-08 · This slide deck can be found at galton.uchicago.edu/~vpatel.

Kalman FilterQuestions important to numerical optimizer:

1.  Q: Can we do "better" than the Kalman Filter? A: No. 

2.  Q: What is the rate of convergence of the Kalman Filter? A: Sublinear to a single state. Presumably, we cannot do better. 

3.  Q: Do we need to carry around the covariance estimate from state to state? A: ...  

4.  Q: Can we parallelize computations over relevant dimensions? A: ...  

Page 8: A Statistical Theory of the Kalman Filterpages.cs.wisc.edu/~vrpatel6/resources/Presentations/pdf/A... · 2018-09-08 · This slide deck can be found at galton.uchicago.edu/~vpatel.

Kalman FilterQuestions important to UQ community (not in other categories):

1.  Q: How well does the KF "perform" when the dynamics are just stable (e.v. of 1) or unstable (e.v. > 1)?  A: ...  

2.  Q: How well does the KF "perform" when deterministic dynamics are misspecified?  A: ...  

3.  Q: How well does the KF "perform" when model noise is misspecified/correlated from state to state?  A: ...  

4.  Q: How well does the (E)KF "perform" when the observation function is misspecified?  

5.  Q: How well does the KF "perform" when the observation model noise is misspecified/correlated from state to state?

Page 9: A Statistical Theory of the Kalman Filterpages.cs.wisc.edu/~vrpatel6/resources/Presentations/pdf/A... · 2018-09-08 · This slide deck can be found at galton.uchicago.edu/~vpatel.

Single State EstimationTo get to statistical notation:

Initialization:

Observations, Errors, Covariates:

True Parameter, Data Model:

β0 = Dtχ̂t M0 = DtCtDTt + QM

Yk is the kth component of Zt

ϵk is the kth component of ξt

Xk is the kth row of Ot

β∗ = χt+1 Yk = XTk

β∗ + ϵk

Page 10: A Statistical Theory of the Kalman Filterpages.cs.wisc.edu/~vrpatel6/resources/Presentations/pdf/A... · 2018-09-08 · This slide deck can be found at galton.uchicago.edu/~vpatel.

Selective AssimilationQuestion: Suppose   is the estimator of   after   observations are assimilated. How quickly does   converge to  ?

General Assumptions:

 are independent,   and  explore the entire space   regularly

(Iterative) Batch Estimator:

If   and   and conditioned on   then   almost surely. However:

 are never known a priori. Requires iteration.Batch estimator must be recomputed if   is incremented.

β̂k β∗ k β̂k β∗

ϵk E [ϵk|Xk] = 0 supk E [ϵ2k|Xk] < ∞

Xk Rd

β̂k = arg min { k

∑j=1

(Yj − XTj β)2 + ∥β − β0∥2

M −10

}1

σ2j

β0 ∈ Rd M0 ≻ 0 X1, … , Xk β̂k → β∗

σ2j

k

Page 11: A Statistical Theory of the Kalman Filterpages.cs.wisc.edu/~vrpatel6/resources/Presentations/pdf/A... · 2018-09-08 · This slide deck can be found at galton.uchicago.edu/~vpatel.

Incremental EstimatorKalman Filter (Single State)

Assumptions:

 are independent,   and  are independent, identically distributed with finite second moment.

 for all   s.t.  .

β̂k+1 = arg min { (Yk+1 − XTk+1β)2 + ∥∥β − β̂k

∥∥2

M −1k

}M −1

k+1 = M −1k + Xk+1XT

k+1

1

γ2k

1

γ2k

ϵk E [ϵk|Xk] = 0 supk E [ϵ2k|Xk] < ∞

X1, X2, …P [|XT

1 v| = 0] < 1 v ∈ Rd ∥v∥2 = 1

Page 12: A Statistical Theory of the Kalman Filterpages.cs.wisc.edu/~vrpatel6/resources/Presentations/pdf/A... · 2018-09-08 · This slide deck can be found at galton.uchicago.edu/~vpatel.

Incremental EstimatorTuning Parameter Requirements:

Theorem 1: Conditioned on  ,   almost surely.

Theorem 2: Let

If   then for all   asymptotically almost surely

0 < δ2 ≤ infk

γ2k

≤ supk

γ2k

≤ Δ2 < ∞

β0 is arbitrary and M0 ≻ 0

X1, X2, … , Xk ∥βk − β∗∥ → 0

Mk = E [(β̂k − β∗) (β̂k − β∗)T ∣∣∣X1, … , Xk]

σ2j = σ2 ∀j ∈ N ϵ > 0

Mk ⪯ Mk ⪯ Mk

1 − ϵ

Δ2

1σ2

1 + ϵ

δ2

Page 13: A Statistical Theory of the Kalman Filterpages.cs.wisc.edu/~vrpatel6/resources/Presentations/pdf/A... · 2018-09-08 · This slide deck can be found at galton.uchicago.edu/~vpatel.

Numerical ExperimentData Set

Source: Public Use File from Center of Medicare and Medicaid Services.Described health care expenses, type of visit, patient demographics.Contained   million records (too big to fit in my computer's 8GB memory).

Linear Model

Response: Cost of visit.Predictors: Patient's gender, Type of Facility, Patient's age.Intercept term was included, giving   unknown variables.

Tuning Parameter

N = 2.8

p = 31

γ21 (k) = 1

k

γ2 = 37000γ3 = 0.001

Page 14: A Statistical Theory of the Kalman Filterpages.cs.wisc.edu/~vrpatel6/resources/Presentations/pdf/A... · 2018-09-08 · This slide deck can be found at galton.uchicago.edu/~vpatel.

Numerical ExperimentConvergence of Estimated Covariance.

Recall:  .

Estimated covariance tracks well with mean residuals squared.

Mk ≺ Mk ≺ Mk1−ϵ

maxk γ2k

1σ2

1+ϵ

mink γ2k

Page 15: A Statistical Theory of the Kalman Filterpages.cs.wisc.edu/~vrpatel6/resources/Presentations/pdf/A... · 2018-09-08 · This slide deck can be found at galton.uchicago.edu/~vpatel.

Conclusions1.  Q: How many observations are needed (with stochastic dynamics and noisy observations) to estimate 

a single state "well"?  A: (The trace of)   is sufficient for determining convergence.  

2.  Q: Does the covariance estimate actually estimate the asymptotic covariance of the parameter estimate? A: Yes.    

3.  Q: Can we converge faster than the Kalman Filter? A: For a single state and given that   is estimating  , no. We will incrementally invert the hessian of the objective. In fact, we show that the conditioning has no impact on the rate of convergence. 

Mk

Mk ≺ Mk ≺ Mk1−ϵ

maxk γ2k

1σ2

1+ϵ

mink γ2k

Mk Mk

Page 16: A Statistical Theory of the Kalman Filterpages.cs.wisc.edu/~vrpatel6/resources/Presentations/pdf/A... · 2018-09-08 · This slide deck can be found at galton.uchicago.edu/~vpatel.

Thank YouAcknowledgements

Mihai AnitescuGabriel Terejanu & Haiyan ChengUniversity of Chicago, Department of Statistics & SIAM Travel Award

Slides  This slide deck can be found at galton.uchicago.edu/~vpatel.

Reference 

Patel, Vivak. "Kalman­based Stochastic Gradient Method with Stop Condition and Insensitivity toConditioning." arXiv preprint arXiv:1512.01139 (2015).