3 Kalman Filters
3.1 Introduction
The Kalman filter can be used for
- state estimation—this is the direct filtering (or smoothing) problem
- parameter estimation—this is the inverse problem based on filtering with a pseudo-time
Kalman filters are either linear (and Gaussian) or nonlinear. Among the nonlinear approaches, the Ensemble Kalman filter (EnKF) provides a derivative-free approach to the solution of inverse problems.
3.2 Kalman filter problem - General formulation
We present a very general formulation that will later be convenient for joint state and parameter estimation problems.
We consider a discrete-time dynamical system with noisy state transitions and noisy observations.
Dynamics:
\[v_{j+1} = \Psi(v_j) + \xi_j, \quad j \in \mathbb{Z}^+\]
Observations:
\[y_{j+1} = h (v_{j+1}) + \eta_{j+1}, \quad j \in \mathbb{Z}^+\]
Probability (densities):
\[v_0 \sim \mathcal{N}(m_0,C_0), \quad \xi_j \sim \mathcal{N}(0,\Sigma), \quad \eta_j \sim \mathcal{N}(0,\Gamma)\]
Probability (independence):
\[v_0 \perp \{\xi_j\} \perp \{\eta_j\}\]
Operators:
\[\begin{eqnarray} \Psi \colon \mathcal{H}_s &\to \mathcal{H}_s, \\ h \colon \mathcal{H}_s &\to \mathcal{H}_o, \end{eqnarray}\] where \(v_j \in \mathcal{H}_s,\) \(y_j \in \mathcal{H}_o,\) and \(\mathcal{H}_s, \mathcal{H}_o\) are finite-dimensional Hilbert spaces.
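To fix ideas, here is a minimal Python sketch that simulates this model; the particular maps `Psi` and `h`, and all numerical values, are hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 2, 1                        # dim(H_s), dim(H_o)
m0, C0 = np.zeros(n), np.eye(n)    # prior mean and covariance
Sigma = 0.01 * np.eye(n)           # process-noise covariance
Gamma = 0.1 * np.eye(p)            # observation-noise covariance

def Psi(v):                        # hypothetical dynamics map H_s -> H_s
    return 0.9 * v

def h(v):                          # hypothetical observation map H_s -> H_o
    return v[:1]                   # observe the first component only

v = rng.multivariate_normal(m0, C0)                            # v_0 ~ N(m_0, C_0)
ys = []
for j in range(50):
    v = Psi(v) + rng.multivariate_normal(np.zeros(n), Sigma)   # dynamics step
    ys.append(h(v) + rng.multivariate_normal(np.zeros(p), Gamma))  # noisy observation
```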
Filtering problem: estimate (optimally) the state \(v_j\) of the dynamical system at time \(j,\) given the data \(Y_j = \{y_i\}_{i=1}^{j}\) up to time \(j.\) This is achieved by using a two-step predictor-corrector method. We will use the more general notation of (Law, Stuart, and Zygalakis 2015) instead of the usual, classical state-space formulation that is used in (Asch, Bocquet, and Nodet 2016) and (Asch 2022).
The objective is to update the filtering distribution \(\mathbb{P}(v_j \vert Y_j)\) from time \(j\) to time \(j+1\) in the linear, Gaussian case, where
- \(\Psi\) and \(h\) are linear maps,
- all distributions are Gaussian.
Suppose \[\begin{eqnarray} \Psi(v) &= Mv \\ h(v) &= Hv, \end{eqnarray}\] where the matrices \(M \in \mathbb{R}^{n \times n},\) \(H \in \mathbb{R}^{m \times n},\) with \(m \le n\) and \(\mathrm{rank}(H)=m.\)
- Let \((m_j, C_j)\) denote the mean and covariance of \(v_j \vert Y_j\) and note that these entirely characterize the random variable since it is Gaussian.
- Let \((\hat{m}_{j+1}, \hat{C}_{j+1})\) denote the mean and covariance of \(v_{j+1} \vert Y_j\) and note that these entirely characterize the random variable since it is Gaussian.
- Derive the map \((m_j, C_j) \mapsto (m_{j+1}, C_{j+1})\) using the previous step.
3.2.1 Prediction/Forecast
\[ \mathbb{P}(v_j \vert Y_j) \mapsto \mathbb{P}(v_{j+1} \vert Y_j) \]
P0: initialize \((m_0, C_0)\) and draw \(v_0 \sim \mathcal{N}(m_0, C_0)\)
P1: predict the state and the measurement
\[\begin{align} v_{j+1} &= M v_j + \xi_j \\ y_{j+1} &= H v_{j+1} + \eta_{j+1} \end{align}\]
P2: predict the mean and covariance
\[\begin{align} \hat{m}_{j+1} &= M m_j \\ \hat{C}_{j+1} &= M C_j M^{\mathrm{T}} + \Sigma \end{align}\]
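In code, step P2 translates directly; the following NumPy sketch (the name `kf_predict` is our own choice) assumes `M`, `Sigma`, and the current `(m_j, C_j)` are given as arrays:

```python
import numpy as np

def kf_predict(m, C, M, Sigma):
    """P2: map (m_j, C_j) to the forecast (m_hat_{j+1}, C_hat_{j+1})."""
    m_hat = M @ m                   # \hat{m}_{j+1} = M m_j
    C_hat = M @ C @ M.T + Sigma     # \hat{C}_{j+1} = M C_j M^T + Sigma
    return m_hat, C_hat
```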
3.2.2 Correction/Analysis
\[ \mathbb{P}(v_{j+1} \vert Y_j) \mapsto \mathbb{P}(v_{j+1} \vert Y_{j+1}) \]
C1: compute the innovation
\[ d_{j+1} = y_{j+1} - H \hat{m}_{j+1} \]
C2: compute the measurement covariance
\[ S_{j+1} = H \hat{C}_{j+1} H^{\mathrm{T}} + \Gamma \]
C3: compute the (optimal) Kalman gain
\[ K_{j+1} = \hat{C}_{j+1} H^{\mathrm{T}} S_{j+1}^{-1} \]
C4: update/correct the mean and covariance
\[\begin{align} {m}_{j+1} &= \hat{m}_{j+1} + K_{j+1} d_{j+1}, \\ {C}_{j+1} &= \hat{C}_{j+1} - K_{j+1} S_{j+1} K_{j+1}^{\mathrm{T}}. \end{align}\]
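Steps C1 to C4 translate just as directly; in this sketch (the name `kf_correct` is ours) the gain is obtained by solving a linear system rather than forming \(S_{j+1}^{-1}\) explicitly:

```python
import numpy as np

def kf_correct(m_hat, C_hat, y, H, Gamma):
    """C1-C4: map the forecast and the datum y_{j+1} to (m_{j+1}, C_{j+1})."""
    d = y - H @ m_hat                             # C1: innovation
    S = H @ C_hat @ H.T + Gamma                   # C2: innovation covariance
    K = np.linalg.solve(S.T, (C_hat @ H.T).T).T   # C3: K = C_hat H^T S^{-1}
    m = m_hat + K @ d                             # C4: mean update
    C = C_hat - K @ S @ K.T                       # C4: covariance update
    return m, C
```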
3.2.3 Loop over time
- set \(j = j+1\)
- go to step P1
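A sketch of the complete loop, assuming the `kf_predict` and `kf_correct` functions above are in scope; the model matrices and simulated data below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative linear-Gaussian model in R^2 with 1-d observations.
M = np.array([[1.0, 0.1], [0.0, 1.0]])   # dynamics matrix
H = np.array([[1.0, 0.0]])               # observation matrix
Sigma = 0.01 * np.eye(2)                 # process-noise covariance
Gamma = 0.1 * np.eye(1)                  # observation-noise covariance

# Simulate a truth v and observations y_1, ..., y_J.
v, ys = rng.multivariate_normal(np.zeros(2), np.eye(2)), []
for _ in range(50):
    v = M @ v + rng.multivariate_normal(np.zeros(2), Sigma)
    ys.append(H @ v + rng.multivariate_normal(np.zeros(1), Gamma))

# P0: initialize, then alternate P1-P2 and C1-C4 over time.
m, C = np.zeros(2), np.eye(2)
for y in ys:
    m_hat, C_hat = kf_predict(m, C, M, Sigma)      # prediction/forecast
    m, C = kf_correct(m_hat, C_hat, y, H, Gamma)   # correction/analysis
```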
3.3 State-space formulation
In classical (linear) filtering theory, a state-space formulation is typically used.
\[\begin{eqnarray} && x_{k+1} = F x_k + B u_k + w_k \\ && y_{k+1} = H x_{k+1} + v_{k+1}, \end{eqnarray}\] where \(u_k\) is a control input, \(w_k \sim \mathcal{N}(0,Q),\) and \(v_k \sim \mathcal{N}(0,R).\) Moreover, \(F\) is the dynamics matrix and \(H\) the observation operator.
The 2-step filter:
Initialization
\[ x_0, \quad P_0 \]
1. Prediction
\[\begin{eqnarray} && x_{k+1}^- = F x_k + B u_k \\ && P_{k+1}^- = F P_k F^{\mathrm{T}} + Q \end{eqnarray}\]
2. Correction
\[\begin{eqnarray} K_{k+1} && = P_{k+1}^{-} H^{\mathrm{T}} ( H P_{k+1}^{-} H^{\mathrm{T}} + R )^{-1} \quad (= P_{k+1}^- H^{\mathrm{T}} S_{k+1}^{-1})\\ x_{k+1} &&= x_{k+1}^{-} + K_{k+1} (y_{k+1} - H x_{k+1}^- ) \\ P_{k+1} &&= (I - K_{k+1} H ) P_{k+1}^- \quad (= P_{k+1}^- - K_{k+1} S_{k+1} K^{\mathrm{T}}_{k+1}) \end{eqnarray}\]
Loop
Set \(k = k+1\) and go to step 1.
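For completeness, a compact sketch of one predict-correct cycle in this state-space notation, including the control term \(B u_k\) (the function name and the use of a linear solve instead of an explicit inverse are our choices):

```python
import numpy as np

def kf_step(x, P, y, u, F, B, H, Q, R):
    """One cycle: (x_k, P_k), input u_k, datum y_{k+1} -> (x_{k+1}, P_{k+1})."""
    # 1. Prediction
    x_m = F @ x + B @ u                          # x_{k+1}^-
    P_m = F @ P @ F.T + Q                        # P_{k+1}^-
    # 2. Correction
    S = H @ P_m @ H.T + R                        # innovation covariance
    K = np.linalg.solve(S.T, (P_m @ H.T).T).T    # K_{k+1} = P^- H^T S^{-1}
    x_new = x_m + K @ (y - H @ x_m)              # state update
    P_new = (np.eye(len(x)) - K @ H) @ P_m       # covariance update
    return x_new, P_new
```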
In some cases, the superscripts \(\mathrm{f}\) and \(\mathrm{a}\) are used to denote the forecast/prediction and analysis/correction variables, respectively. We avoid this, for clarity of notation.
3.4 Passage from Continuous to Discrete for Random Dynamic Systems
Most often, in the use of Kalman filtering, we deal with a continuous dynamic system that is modelled by a system of ODEs, with some random process noise. The passage from the continuous to a discrete-time formulation, which is needed for the KF, requires some attention. Following (Särkkä and Svensson 2023), we will derive the associated discretization of the process matrix and the process noise covariance matrix.
Suppose we have a linear, time-invariant (LTI) system of ODEs with an additive random noise term (this is the engineering approach that avoids the use of SDEs and Itô calculus—see (Asch 2022)):
\[ \frac{\mathrm{d}\mathbf{x}}{\mathrm{d}t} = F \mathbf{x}(t) + L \mathbf{w}(t) , \quad \mathbf{x}(0) = \mathbf{x}_0, \]
where \(\mathbf{w}(t)\) is a white, Gaussian noise process with expectation and covariance
\[\begin{align} & \mathrm{E}[\mathbf{w}(t)] = 0, \\ & \mathrm{Cov}[\mathbf{w}(\tau_1), \mathbf{w}(\tau_2)] = \delta(\tau_1 - \tau_2) Q^c, \end{align}\]
with \(Q^c\) the spectral density matrix of the white noise process, which is the continuous-time analogue of a covariance matrix, and in the scalar case is just the noise variance. Usually
\[ Q^c = \mathrm{diag}(q^c_1, \ldots, q^c_n), \]
where \(q^c_1, \ldots, q^c_n\) are the spectral densities of \(w_1(t), \ldots, w_n(t),\) the components of \(\mathbf{w}(t).\)
We now proceed to convert this continuous LTI system into its discrete-time counterpart, needed for the KF algorithm,
\[ \mathbf{x}_{k+1 } = A_k \mathbf{x}_{k} + \mathbf{q}_{k}, \]
where \(A_k\) is the discretized process/dynamics transition matrix, and \(\mathbf{q}_{k} \sim \mathcal{N} (0, Q_k)\) is the Gaussian process noise.
To obtain expressions for \(A_k\) and \(Q_k,\) we need to solve the random ODE system. Doing this, one can show that
\[ A_k = \exp (F \Delta t_{k}) = \sum_{n=0}^{\infty} \frac{(F \Delta t_{k})^n}{n !} \]
and
\[ Q_k = \int_0^{\Delta t_{k}} \exp (F s) L Q^c L^{\mathrm{T}} \exp (F s)^{\mathrm{T}} \mathrm{d}s. \]
Usually, \(F\) is nilpotent to some low order, so the computation of the matrix exponential is relatively easy to do. This will be illustrated in the examples.
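As a first such illustration (assuming NumPy and SciPy are available), the following sketch computes \(A_k\) with `scipy.linalg.expm` and evaluates the integral for \(Q_k\) with Van Loan's block-matrix method, then checks both against the closed forms for the constant-velocity model, whose \(F\) satisfies \(F^2 = 0\):

```python
import numpy as np
from scipy.linalg import expm

def discretize_lti(F, L, Qc, dt):
    """Return A_k = expm(F dt) and Q_k, via Van Loan's block-matrix method."""
    n = F.shape[0]
    # The exponential of [[-F, L Qc L^T], [0, F^T]] * dt encodes both A_k
    # and the noise integral Q_k in its blocks.
    M = np.block([[-F, L @ Qc @ L.T],
                  [np.zeros((n, n)), F.T]]) * dt
    E = expm(M)
    A = E[n:, n:].T        # A_k = expm(F dt)
    Q = A @ E[:n, n:]      # Q_k = int_0^dt expm(F s) L Qc L^T expm(F s)^T ds
    return A, Q

# Constant-velocity model: F is nilpotent (F^2 = 0), scalar density q = 1.
F = np.array([[0.0, 1.0], [0.0, 0.0]])
L = np.array([[0.0], [1.0]])
Qc = np.array([[1.0]])
dt = 0.5
A, Q = discretize_lti(F, L, Qc, dt)
# Closed forms: A = [[1, dt], [0, 1]] and Q = q [[dt^3/3, dt^2/2], [dt^2/2, dt]].
assert np.allclose(A, [[1.0, dt], [0.0, 1.0]])
assert np.allclose(Q, [[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]])
```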