8 Nonlinear Kalman Filters
8.1 Introduction
Recall that the Kalman filter can be used for
- state estimation—this is the direct filtering (or smoothing) problem
- parameter estimation—this is the inverse problem based on filtering with a pseudo-time
Kalman filters come in linear (Gaussian) and nonlinear variants. Here we formulate the basic nonlinear filter, known as the extended Kalman filter (EKF).
8.2 Recall: Kalman filter problem - general formulation
We present a very general formulation that will later be convenient for joint state and parameter estimation problems. Consider a discrete-time dynamical system with noisy state transitions and noisy observations, where both the transition and observation maps are nonlinear.
Dynamics:
\[v_{j+1} = \Psi(v_j) + \xi_j, \quad j \in \mathbb{Z}^+\]
Observations:
\[y_{j+1} = h (v_{j+1}) + \eta_{j+1}, \quad j \in \mathbb{Z}^+\]
Probability (densities):
\[v_0 \sim \mathcal{N}(m_0,C_0), \quad \xi_j \sim \mathcal{N}(0,\Sigma), \quad \eta_j \sim \mathcal{N}(0,\Gamma)\]
Probability (independence):
\[v_0 \perp \{\xi_j\} \perp \{\eta_j\}\]
Operators:
\[\begin{eqnarray} \Psi \colon \mathcal{H}_s &\to \mathcal{H}_s, \\ h \colon \mathcal{H}_s &\to \mathcal{H}_o, \end{eqnarray}\] where \(v_j \in \mathcal{H}_s,\) \(y_j \in \mathcal{H}_o,\) and \(\mathcal{H}_s, \mathcal{H}_o\) are finite-dimensional Hilbert spaces.
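To make this formulation concrete, the following is a minimal sketch in Python/NumPy of a scalar system of this form. The particular choices \(\Psi(v) = \alpha \sin v,\) \(h(v) = v^2/2,\) and the noise levels are illustrative assumptions, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy model (for illustration only): scalar state and observation.
alpha = 2.5
Psi = lambda v: alpha * np.sin(v)   # nonlinear state transition Psi
h   = lambda v: 0.5 * v**2          # nonlinear observation operator h

Sigma = np.array([[0.05]])          # model-noise covariance  (xi_j  ~ N(0, Sigma))
Gamma = np.array([[0.10]])          # obs-noise covariance    (eta_j ~ N(0, Gamma))
m0    = np.array([1.0])             # prior mean
C0    = np.array([[0.5]])           # prior covariance

# Simulate a true trajectory and noisy observations y_1, ..., y_J.
J = 50
v_true = np.zeros((J + 1, 1))
y_obs  = np.zeros((J + 1, 1))
v_true[0] = rng.multivariate_normal(m0, C0)
for j in range(J):
    v_true[j + 1] = Psi(v_true[j]) + rng.multivariate_normal([0.0], Sigma)
    y_obs[j + 1]  = h(v_true[j + 1]) + rng.multivariate_normal([0.0], Gamma)
```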
Filtering problem:
Estimate (optimally) the state \(v_j\) of the dynamical system at time \(j,\) given the data \(Y_j = \{y_i\}_{i=1}^{j}\) up to time \(j.\) This is achieved by using a two-step predictor-corrector method. We will use the more general notation of (Law, Stuart, and Zygalakis 2015) instead of the usual, classical state-space formulation that is used in (Asch, Bocquet, and Nodet 2016) and (Asch 2022).
The objective here is to update the filtering distribution \(\mathbb{P}(v_j \vert Y_j),\) from time \(j\) to time \(j+1,\) in the nonlinear, Gaussian case, where
- \(\Psi\) and \(h\) are nonlinear functions,
- all noise and prior distributions are Gaussian, so that the filtering distribution can be approximated by a Gaussian.
Suppose the Jacobian matrices of \(\Psi\) and \(h\) exist; evaluated at a point \(m,\) they are denoted by
\[\begin{eqnarray} \Psi_x(m) &= \left[ \frac{\partial \Psi}{\partial x} \right]_{x=m},\\ h_x(m) &= \left[ \frac{\partial h}{\partial x} \right]_{x=m}. \end{eqnarray}\]
In the EKF, \(\Psi_x\) is evaluated at the current mean \(m_j\) and \(h_x\) at the predicted mean \(\hat{m}_{j+1}.\)
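Where analytic Jacobians are inconvenient, they can be approximated by finite differences. The helper below (the name `numerical_jacobian` and the step size are assumptions) continues the toy sketch above and works for any NumPy-compatible map.

```python
import numpy as np

def numerical_jacobian(F, m, eps=1e-6):
    """Finite-difference Jacobian of the map F, evaluated at the point m."""
    m = np.atleast_1d(np.asarray(m, dtype=float))
    F0 = np.atleast_1d(F(m))
    J = np.zeros((F0.size, m.size))
    for k in range(m.size):
        dm = np.zeros_like(m)
        dm[k] = eps
        J[:, k] = (np.atleast_1d(F(m + dm)) - F0) / eps
    return J

# For the toy model this can be checked against the analytic Jacobians
# Psi_x(m) = alpha*cos(m) and h_x(m) = m:
#   Psi_x = numerical_jacobian(Psi, m0);  h_x = numerical_jacobian(h, m0)
```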
- Let \((m_j, C_j)\) denote the mean and covariance of \(v_j \vert Y_j\) and note that these entirely characterize the random variable under the Gaussian approximation.
- Let \((\hat{m}_{j+1}, \hat{C}_{j+1})\) denote the mean and covariance of \(v_{j+1} \vert Y_j\) and note that these entirely characterize the random variable under the Gaussian approximation.
- Derive the map \((m_j, C_j) \mapsto (m_{j+1}, C_{j+1})\) using the previous step.
8.2.1 Prediction/Forecast
\[ \mathbb{P}(v_j \vert y_1, \ldots, y_j) \mapsto \mathbb{P}(v_{j+1} \vert y_1, \ldots, y_j) \]
P0: initialize the mean and covariance \((m_0, C_0)\) of the initial state \(v_0 \sim \mathcal{N}(m_0, C_0)\)
P1: predict the state and the measurement
\[\begin{align} v_{j+1} &= \Psi (v_j) + \xi_j \\ y_{j+1} &= h (v_{j+1}) + \eta_{j+1} \end{align}\]
P2: predict the mean and covariance
\[\begin{align} \hat{m}_{j+1} &= \Psi (m_j) \\ \hat{C}_{j+1} &= \Psi_x C_j \Psi_x^{\mathrm{T}} + \Sigma \end{align}\]
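As a sketch, steps P1–P2 can be wrapped in a single function; `Psi`, `Sigma`, and `numerical_jacobian` are assumed from the sketches above.

```python
def ekf_predict(m, C, Psi, Sigma, Psi_jac=None):
    """EKF prediction: (m_j, C_j) -> (m_hat_{j+1}, C_hat_{j+1})."""
    A = Psi_jac(m) if Psi_jac is not None else numerical_jacobian(Psi, m)
    m_hat = np.atleast_1d(Psi(m))        # m_hat = Psi(m_j)
    C_hat = A @ C @ A.T + Sigma          # C_hat = Psi_x C_j Psi_x^T + Sigma
    return m_hat, C_hat
```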
8.2.2 Correction/Analysis
\[ \mathbb{P}(v_{j+1} \vert y_1, \ldots, y_j) \mapsto \mathbb{P}(v_{j+1} \vert y_1, \ldots, y_{j+1}) \]
C1: compute the innovation
\[ d_{j+1} = y_{j+1} - h (\hat{m}_{j+1}) \]
C2: compute the measurement covariance
\[ S_{j+1} = h_x \hat{C}_{j+1} h_x^{\mathrm{T}} + \Gamma \]
C3: compute the (optimal) Kalman gain
\[ K_{j+1} = \hat{C}_{j+1} h_x^{\mathrm{T}} S_{j+1}^{-1} \]
C4: update/correct the mean and covariance
\[\begin{align} {m}_{j+1} &= \hat{m}_{j+1} + K_{j+1} d_{j+1}, \\ {C}_{j+1} &= \hat{C}_{j+1} - K_{j+1} S_{j+1} K_{j+1}^{\mathrm{T}}. \end{align}\]
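Similarly, steps C1–C4 can be sketched as one function, again reusing the assumed helpers above.

```python
def ekf_correct(m_hat, C_hat, y, h, Gamma, h_jac=None):
    """EKF correction: (m_hat_{j+1}, C_hat_{j+1}, y_{j+1}) -> (m_{j+1}, C_{j+1})."""
    H = h_jac(m_hat) if h_jac is not None else numerical_jacobian(h, m_hat)
    d = np.atleast_1d(y) - np.atleast_1d(h(m_hat))   # innovation            (C1)
    S = H @ C_hat @ H.T + Gamma                      # innovation covariance (C2)
    K = C_hat @ H.T @ np.linalg.inv(S)               # Kalman gain           (C3)
    m = m_hat + K @ d                                # mean update           (C4)
    C = C_hat - K @ S @ K.T                          # covariance update     (C4)
    return m, C
```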
8.2.3 Loop over time
- set \(j = j+1\)
- go to step P1
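Putting the two sketched functions together gives the filtering loop over the simulated observations from the first sketch.

```python
# Filtering loop (sketch, using the assumed toy model and helpers above).
m, C = m0.copy(), C0.copy()
means = [m]
for j in range(J):
    m_hat, C_hat = ekf_predict(m, C, Psi, Sigma)               # steps P1-P2
    m, C = ekf_correct(m_hat, C_hat, y_obs[j + 1], h, Gamma)   # steps C1-C4
    means.append(m)
means = np.array(means)   # EKF mean estimates of v_0, ..., v_J
```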
8.3 State-space formulation
In classical filtering theory, a state-space formulation is usually employed:
\[\begin{eqnarray} && x_{k+1} = f (x_k) + w_k \\ && y_{k+1} = h (x_{k+1}) + v_{k+1}, \end{eqnarray}\]
where \(f\) and \(h\) are nonlinear, differentiable functions with Jacobian matrices \(D_f\) and \(D_h\) respectively, \(w_k \sim \mathcal{N}(0,Q),\) \(v_k \sim \mathcal{N}(0,R).\)
The 2-step filter:
8.3.1 Initialization
\[ x_0, \quad P_0 \]
8.3.2 1. Prediction
\[\begin{eqnarray} && x_{k+1}^- = f (x_k) \\ && P_{k+1}^- = D_f P_k D_f^{\mathrm{T}} + Q \end{eqnarray}\]
8.3.3 2. Correction
\[\begin{eqnarray} K_{k+1} && = P_{k+1}^{-} D_h^{\mathrm{T}} ( D_h P_{k+1}^{-} D_h^{\mathrm{T}} + R )^{-1} \quad (= P_{k+1}^- D_h^{\mathrm{T}} S^{-1}) \\ x_{k+1} &&= x_{k+1}^{-} + K_{k+1} (y_{k+1} - h (x_{k+1}^-) ) \\ P_{k+1} &&= (I - K_{k+1} D_h ) P_{k+1}^- \quad (= P_{k+1}^- - K_{k+1} S K^{\mathrm{T}}_{k+1}) \end{eqnarray}\]
8.3.4 Loop
Set \(k = k+1\) and go to step 1.
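For completeness, here is the same cycle sketched in the classical notation, simply renaming the quantities of the previous sections (\(f \leftrightarrow \Psi,\) \(Q \leftrightarrow \Sigma,\) \(R \leftrightarrow \Gamma,\) \(P \leftrightarrow C\)); it reuses the `numerical_jacobian` helper above.

```python
def ekf_step(x, P, y_next, f, h, Q, R):
    """One classical EKF cycle: (x_k, P_k, y_{k+1}) -> (x_{k+1}, P_{k+1})."""
    Df = numerical_jacobian(f, x)
    x_minus = np.atleast_1d(f(x))            # prediction
    P_minus = Df @ P @ Df.T + Q
    Dh = numerical_jacobian(h, x_minus)
    S = Dh @ P_minus @ Dh.T + R              # correction
    K = P_minus @ Dh.T @ np.linalg.inv(S)
    x_new = x_minus + K @ (np.atleast_1d(y_next) - np.atleast_1d(h(x_minus)))
    P_new = (np.eye(P.shape[0]) - K @ Dh) @ P_minus
    return x_new, P_new
```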
8.4 Other nonlinear filters
- unscented Kalman filter
- particle filter
For details, please consult the references.