18 Ensemble Kalman Inversion (EKI)
Estimating the statistics of the state of a dynamical system from partial, sparse, and noisy measurements is both mathematically challenging and the subject of numerous real-life applications. These applications can have great societal impact, as in numerical weather forecasting and epidemic modelling, to name just two.
Ensemble Kalman filtering methodology is known to be effective in high dimensions and in nonlinear settings, where the classical Kalman filter is either impractical or not applicable. Moreover, ensemble Kalman filters can be used to solve quite general inverse problems.
The approach, known as EKI, was originally formulated in (Iglesias, Law, and Stuart 2013) and surveyed and refined in the review (Calvello, Reich, and Stuart 2022) (to appear in Acta Numerica 2025). A shorter treatment can be found in the preprint (Huang et al. 2022), where extensive numerical tests are performed.
18.1 Properties of EKI
- it is a derivative-free (and hence gradient-free) optimization method;
- it provides full uncertainty quantification;
- it requires relatively few evaluations of the (possibly expensive) forward model;
- it is robust to noise in the evaluation of the model.
18.2 Formulation
Recall the inverse problem: recover an unknown function \(\theta\) from known data \(y\) related by
\[ y = \mathcal{G}(\theta) + \eta, \] where \(\mathcal{G}\) denotes the forward model from inputs to observables, \(\eta \sim \mathcal{N}(0, \Sigma_{\eta})\) represents the model error and observation noise, with \(\Sigma_{\eta}\) a positive definite noise covariance matrix.
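To make this setup concrete, here is a minimal sketch in Python. The quadratic forward model `G`, the dimensions, and the noise level are purely illustrative assumptions, not part of the method:

```python
# Toy instance of the inverse problem y = G(theta) + eta.
# G, the dimensions, and Sigma_eta below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N_theta, N_y = 2, 3                          # parameter and data dimensions

def G(theta):
    """Hypothetical nonlinear forward model from parameters to observables."""
    return np.array([theta[0] + theta[1], theta[0] * theta[1], theta[0] ** 2])

theta_true = np.array([1.0, 2.0])            # ground truth (for synthetic data)
Sigma_eta = 0.01 * np.eye(N_y)               # positive definite noise covariance
eta = rng.multivariate_normal(np.zeros(N_y), Sigma_eta)
y = G(theta_true) + eta                      # observed data
```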
Conceptually, the idea of EKI is simple: apply the standard EnKF to an augmented system of state plus parameters, for a given number of iterations, in order to invert for \(\theta.\) We begin by pairing the parameter-to-data map with a dynamical system for the parameter, and then employ filtering techniques to estimate the parameter given the data.
The system for EKI is written as,
\[\begin{align*} \theta_{n+1} &= \theta_n\\ y_{n+1} &= \mathcal{G}(\theta_{n+1}) + \eta_{n+1}, \end{align*}\]
where the operator \(\mathcal{G}\) contains both the system dynamics and the observation function.
Consider the stochastic dynamical system
\[\begin{align} &\textrm{evolution:} && \theta_{n+1} = \theta_{n} + \omega_{n+1}, &&\omega_{n+1} \sim \mathcal{N}(0,\Sigma_{\omega}),\\ &\textrm{observation:} && x_{n+1} = \mathcal{F}(\theta_{n+1}) + \nu_{n+1}, &&\nu_{n+1} \sim \mathcal{N}(0,\Sigma_{\nu}). \end{align}\]
We seek the best Gaussian approximation of the posterior distribution of \(\theta\) for ill-posed inverse problems, where the prior is a Gaussian, \(\theta_0 \sim \mathcal{N}(r_0, \Sigma_0).\)
Consider the case:
- \(\Sigma_{\omega} = \frac{\Delta t}{1 - \Delta t} C_{n}\), where \(C_{n}\) is the covariance estimate at the current step.
- \(x_{n+1} = \begin{bmatrix} y \\ r_0 \end{bmatrix}, \quad \mathcal{F}(\theta) = \begin{bmatrix} \mathcal{G}(\theta) \\ \theta \end{bmatrix},\quad \textrm{and}\quad \Sigma_{\nu} = \frac{1}{\Delta t} \begin{bmatrix} \Sigma_{\eta} & 0 \\ 0 & \Sigma_0\end{bmatrix}\)
where \(r_0\) and \(\Sigma_0\) are prior mean and covariance, and the hyperparameter \(0 < \Delta t < 1\) is set to \(1/2.\)
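As a sketch of this construction (reusing the toy `G`, `y`, and `Sigma_eta` from above; the prior values `r0` and `Sigma_0` are illustrative assumptions):

```python
# Augmented data x = [y; r0], forward map F(theta) = [G(theta); theta],
# and noise covariance Sigma_nu = (1/dt) * blockdiag(Sigma_eta, Sigma_0).
from scipy.linalg import block_diag

dt = 0.5                                     # hyperparameter Delta t = 1/2
r0 = np.zeros(N_theta)                       # prior mean (assumed)
Sigma_0 = np.eye(N_theta)                    # prior covariance (assumed)

x_data = np.concatenate([y, r0])             # augmented data vector

def F(theta):
    """Augmented forward map F(theta) = [G(theta); theta]."""
    return np.concatenate([G(theta), theta])

Sigma_nu = block_diag(Sigma_eta, Sigma_0) / dt
# Sigma_omega = dt / (1 - dt) * C_n is recomputed at every step from the
# current ensemble covariance C_n.
```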
Linear Analysis
In the linear setting,
\[ \mathcal{G}(\theta) = G\cdot \theta \qquad F = \begin{bmatrix} G \\ I \end{bmatrix} \]
The update equations become
\[\begin{align*} \hat{m}_{n+1} &= m_n\\ \hat{C}_{n+1} &= \frac{1}{1 - \Delta t} C_{n} \end{align*}\]
and
\[\begin{align*} m_{n+1} &= m_{n} + \hat{C}_{n+1} F^T (F \hat{C}_{n+1} F^T + \Sigma_{\nu,n+1})^{-1} (x_{n+1} - F m_{n}) \\ C_{n+1}&= \hat{C}_{n+1} - \hat{C}_{n+1} F^T(F \hat{C}_{n+1} F^T + \Sigma_{\nu,n+1})^{-1} F \hat{C}_{n+1}. \end{align*}\]
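In this linear setting the recursion can be implemented directly on the mean and covariance; a minimal sketch (the function name and default values are illustrative):

```python
# Mean/covariance recursion for the linear forward model G(theta) = G @ theta,
# with augmented map F = [G; I]: predict (inflate C by 1/(1 - dt)), then apply
# the standard Kalman analysis update.
import numpy as np
from scipy.linalg import block_diag

def linear_kalman_inversion(Gmat, y, Sigma_eta, r0, Sigma_0, dt=0.5, n_iter=50):
    F = np.vstack([Gmat, np.eye(r0.size)])    # augmented linear map [G; I]
    x = np.concatenate([y, r0])               # augmented data [y; r0]
    Sigma_nu = block_diag(Sigma_eta, Sigma_0) / dt
    m, C = r0.copy(), Sigma_0.copy()          # initialize at the prior
    for _ in range(n_iter):
        C_hat = C / (1.0 - dt)                # prediction: covariance inflation
        K = C_hat @ F.T @ np.linalg.inv(F @ C_hat @ F.T + Sigma_nu)
        m = m + K @ (x - F @ m)               # analysis mean update
        C = C_hat - K @ F @ C_hat             # analysis covariance update
    return m, C
```

The convergence result stated next guarantees that `m` and `C` approach the posterior mean and covariance exponentially fast.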
We have the following convergence result for the algorithm in the setting of a linear forward model: the mean and covariance produced by the iteration converge exponentially fast to the posterior mean and covariance.
Reference
Please see Efficient Derivative-free Bayesian Inference for Large-Scale Inverse Problems, the arXiv version of (Huang et al. 2022).
Note
This result has been recently extended to near-Gaussian, nonlinear cases in (Carrillo et al. 2024a) and (Carrillo et al. 2024b).
18.3 Algorithms: EKI, ETKI
We will formulate the inversion for the stochastic ensemble filter (EKI, see Chapter 11 and Section 15.2) and then for the ensemble transform filter (ETKI, see Section 15.3), the latter having better convergence behavior.
A few remarks (combined in the code sketch after this list).
- The covariance update equation, \[ C_{n+1} = \hat{C}_{n+1} - \hat{C}_{n+1}^{\theta x} \left( \hat{C}_{n+1}^{x x} \right)^{-1} \left(\hat{C}_{n+1}^{\theta x}\right)^\mathrm{T} \] does not hold exactly here, because of the randomness introduced by the perturbed observations. This will be remedied by the ETKI.
- The factor \(\sqrt{1/(1-\Delta t)}\) produces covariance inflation and prevents filter collapse.
- In the analysis step, noise is added in the \(\theta\) update instead of in the \(\hat{x}\) step to ensure symmetry and positive-definiteness of \(\hat{C}_{n+1}^{xx}.\)
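Putting these remarks together, one step of the stochastic EKI ensemble update might look as follows; this is a sketch under the assumptions spelled out in the comments, not a reference implementation:

```python
# One stochastic EKI step for an ensemble of J parameter vectors (rows of
# thetas). Prediction inflates deviations by sqrt(1/(1 - dt)); the analysis
# adds Sigma_nu to C_hat^{xx} analytically (keeping it symmetric positive
# definite) and injects the observation noise in the theta update.
import numpy as np

def eki_step(thetas, F, x_data, Sigma_nu, dt=0.5, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    J = thetas.shape[0]
    m = thetas.mean(axis=0)
    thetas_hat = m + (thetas - m) / np.sqrt(1.0 - dt)   # inflated prediction
    xs_hat = np.array([F(th) for th in thetas_hat])     # predicted observables
    dTheta = (thetas_hat - thetas_hat.mean(axis=0)) / np.sqrt(J - 1)
    dX = (xs_hat - xs_hat.mean(axis=0)) / np.sqrt(J - 1)
    C_tx = dTheta.T @ dX                                # C_hat^{theta x}
    C_xx = dX.T @ dX + Sigma_nu                         # C_hat^{xx}
    K = C_tx @ np.linalg.inv(C_xx)                      # Kalman-type gain
    nu = rng.multivariate_normal(np.zeros(x_data.size), Sigma_nu, size=J)
    return thetas_hat + (x_data + nu - xs_hat) @ K.T    # noisy theta update

# Usage: iterate from a prior ensemble, e.g.
# thetas = rng.multivariate_normal(r0, Sigma_0, size=50)
# for _ in range(30): thetas = eki_step(thetas, F, x_data, Sigma_nu)
```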
For the ensemble transform filter (see Section 15.3), we need to define matrix square roots of the covariance matrices. We denote by \(\hat{Z}_{n+1},\, Z_{n+1} \in \mathbb{R}^{N_{\theta}\times J}\) the matrix square roots of \(\hat{C}_{n+1}\) and \(C_{n+1}\), and by \(\hat{\mathcal{Y}}_{n+1}\) the matrix square root of \(\hat{C}_{n+1}^{x x}\) (so that, in particular, \(\hat{C}_{n+1}^{\theta x} = \hat{Z}_{n+1}\hat{\mathcal{Y}}_{n+1}^\mathrm{T}\)), and define,
\[\begin{align*} \hat{Z}_{n+1} &= \frac{1}{\sqrt{J-1}}\Big(\hat{\theta}_{n+1}^{1} - \hat{m}_{n+1}\quad \hat{\theta}_{n+1}^{2} - \hat{m}_{n+1}\quad...\quad\hat{\theta}_{n+1}^{J} - \hat{m}_{n+1} \Big),\\ Z_{n+1} &= \frac{1}{\sqrt{J-1}}\Big(\theta_{n+1}^{1} - m_{n+1}\quad \theta_{n+1}^{2} - m_{n+1}\quad...\quad\theta_{n+1}^{J} - m_{n+1} \Big),\\ \hat{\mathcal{Y}}_{n+1} &= \frac{1}{\sqrt{J-1}}\Big(\hat{x}_{n+1}^{1} - \hat{x}_{n+1}\quad \hat{x}_{n+1}^{2} - \hat{x}_{n+1}\quad...\quad\hat{x}_{n+1}^{J} - \hat{x}_{n+1} \Big). \end{align*}\]
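These square roots are just scaled matrices of ensemble deviations; a small helper makes this explicit (a sketch; `ensemble_sqrt` is our name, not established notation):

```python
# Ensemble square roots: columns are scaled deviations from the ensemble
# mean, so that Z_hat @ Z_hat.T recovers the empirical covariance C_hat_{n+1}
# and Y_hat @ Y_hat.T recovers C_hat^{xx}_{n+1}.
import numpy as np

def ensemble_sqrt(members):
    """members has J rows; returns the N x J matrix of scaled deviations."""
    J = members.shape[0]
    return (members - members.mean(axis=0)).T / np.sqrt(J - 1)

# Z_hat = ensemble_sqrt(thetas_hat)   # square root of C_hat_{n+1}
# Y_hat = ensemble_sqrt(xs_hat)       # square root of C_hat^{xx}_{n+1}
```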
18.4 Conclusions
Kalman-based inversion has been widely used to construct derivative-free optimization and sampling methods for nonlinear inverse problems. In (Huang et al. 2022), the authors develop new Kalman-based inversion methods for Bayesian inference and uncertainty quantification, building on work in both optimization and sampling. They propose a method for Bayesian inference based on filtering a novel mean-field dynamical system, subject to partial noisy observations, which depends on the law of its own filtering distribution, together with application of the Kalman methodology. Theoretical guarantees are provided: for linear inverse problems, the mean and covariance obtained by the method converge exponentially fast to the posterior mean and covariance. For nonlinear inverse problems, numerical studies indicate that the method delivers an excellent approximation of the posterior distribution for problems that are not too far from Gaussian.
In terms of performance:
- The methods are shown to be superior to existing coupling/transport methods, collectively known as iterative Kalman methods.
- Deterministic implementations of the Kalman methodology, such as the ETKF, are found to be preferable to stochastic ones.