In a recent talk I attempted to review the state-of-the-art in the use of neural networks for solving direct and inverse problems based on PDEs. This topic is treated in Chapter 13 of the book, but is evolving at lightning speed, so I will need to update this post regularly.

As always, my objective is to see how these approaches can be used in concrete digital twins. The potential is, IMHO, enormous. They represent, in theory and in practice, an ideal coupling of data-driven and model-based methods. Please consult the post on AI for Science for a more general viewpoint.


The theoretical foundations of these approaches are strong:

  1. The universal approximation property.
  2. The “unreasonable” efficiency of automatic differentiation.

We will come back to these two, after formulating the problem.

Recall that, for digital twins, we essentially want to model a mapping \(f\) between input variables (design parameters) \(\theta\) and output data (observations, measurements, simulation results) \(y,\)

\[y = f(x; \theta),\]

where \(x\) represents the independent variables (space, time). For given \(\theta\) (and \(f\)), this is the direct problem. If we know \(y\) and seek \(\theta,\) then we need to solve an inverse problem. In real contexts there will be an additional term \(\xi\) to model the noise, or uncertainty,

\[y = f(x; \theta) + \xi .\]
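
To fix ideas, here is a toy sketch in Python with JAX. The forward model \(f(x;\theta) = \theta_1 e^{-\theta_2 x}\) is a hypothetical stand-in for a real (PDE-based) simulator; the sketch only serves to make the notation, and the distinction between the direct and inverse problems, concrete.

```python
# Toy illustration of the direct and inverse problems. The forward model
# f(x; theta) = theta_1 * exp(-theta_2 * x) is a hypothetical stand-in for
# a real (PDE-based) simulator.
import jax
import jax.numpy as jnp

def forward(x, theta):
    """Direct problem: evaluate y = f(x; theta) for given parameters."""
    return theta[0] * jnp.exp(-theta[1] * x)

x = jnp.linspace(0.0, 2.0, 200)
theta_true = jnp.array([2.0, 1.5])

# Observations are corrupted by noise: y = f(x; theta) + xi
key = jax.random.PRNGKey(0)
y_obs = forward(x, theta_true) + 0.05 * jax.random.normal(key, x.shape)

# Inverse problem: recover theta from the noisy observations,
# here by plain gradient descent on the least-squares misfit.
def misfit(theta):
    return jnp.mean((forward(x, theta) - y_obs) ** 2)

theta = jnp.array([1.0, 1.0])                 # initial guess
for _ in range(2000):
    theta = theta - 0.1 * jax.grad(misfit)(theta)
```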

Solving the direct and inverse problems can be difficult, time-consuming, and unreliable, in particular in the presence of noise, which is always present though rarely taken into account. Inverse problems are notoriously hard and, for concrete problems, impossible to solve in reasonable time and with reasonable accuracy unless one has immense computational power at one’s disposal, e.g., for daily weather forecasting.

So, the challenge is: if we have (a lot of) measurements, and if we have reliable models, can we combine the two, using machine learning, to go beyond what each one can achieve separately? Let me reassure you, the answer is “Yes, but…”: the recent advances are impressive, though there is “no free lunch”. Meaning, one has to work on one’s particular context, process, and problem in order to obtain the promised/desired improvements.

Let us now return to the two foundations.

The first foundation justifies the use of simple neural networks for the approximation of a complex, nonlinear map between inputs and outputs. There are solid theorems guaranteeing that NNs can approximate virtually any complex functional relationship. We can then build machine learning models based directly on these theorems, just as we have always done for model-based approximations and their computational modeling.
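
As a toy illustration of this first foundation, here is a minimal JAX sketch of fitting a small one-hidden-layer network to samples of a nonlinear map; the target function and the network width are arbitrary choices of mine, used only to show the pattern.

```python
# Toy illustration of the universal approximation property: a small
# one-hidden-layer network fitted to samples of a nonlinear map. The target
# function and the network width (64) are arbitrary, illustrative choices.
import jax
import jax.numpy as jnp

def target(x):
    """A 'complicated' nonlinear map that we pretend is unknown."""
    return jnp.sin(3.0 * x) * jnp.exp(-x ** 2)

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
# A single hidden layer already enjoys the universal approximation property.
W1, b1 = 0.5 * jax.random.normal(k1, (1, 64)), jnp.zeros(64)
W2, b2 = 0.5 * jax.random.normal(k2, (64, 1)), jnp.zeros(1)
params = (W1, b1, W2, b2)

def mlp(params, x):
    W1, b1, W2, b2 = params
    return jnp.tanh(x[:, None] @ W1 + b1) @ W2 + b2

x_train = jax.random.uniform(k3, (256,), minval=-2.0, maxval=2.0)

def loss(params):
    return jnp.mean((mlp(params, x_train)[:, 0] - target(x_train)) ** 2)

grads = jax.grad(loss)(params)   # feed these to any gradient-based optimizer
```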

The second foundation provides the means to ensure that the data-driven model also respects the underlying physics/chemistry/biology/etc. This is essential if we are to base decision-making on the outcomes of our DTs. Theoretically, this respect of the “physics” is extremely simple to implement. Let us now explain this simplicity.

Suppose that the ML model minimizes a loss function \(\mathcal{L}_M(w),\) where \(w\) are the parameters (weights and biases) of the neural network. Now, just add terms to the loss function that enforce the PDE and its initial/boundary conditions. That is, we seek to minimize the composite loss function

\[\mathcal{L}(w,\theta) = \mathcal{L}_M(w) + \mathcal{L}_P(w,\theta),\]

where \(\theta\) now denotes the coefficients of the PDE. This formulation is valid both for

  • the direct problem, where \(\theta\) is known and only \(w\) is trained, and
  • the inverse problem, where \(\theta\) is an additional unknown, learned jointly with \(w.\)
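
To make \(\mathcal{L}_P\) concrete, here is one standard choice, written in a generic form (an illustration, not the exact formulation of the references below). Write the PDE abstractly as \(\mathcal{N}[u;\theta] = 0\) on a domain \(\Omega,\) with initial/boundary conditions \(\mathcal{B}[u] = g\) on \(\partial\Omega,\) and let \(u_w\) denote the neural network approximation of the solution. The physics loss is then the mean-squared residual over collocation points,

\[\mathcal{L}_P(w,\theta) = \frac{1}{N_r}\sum_{i=1}^{N_r}\bigl|\mathcal{N}[u_w;\theta](x_i)\bigr|^2 + \frac{1}{N_b}\sum_{j=1}^{N_b}\bigl|\mathcal{B}[u_w](x_j) - g(x_j)\bigr|^2,\]

where the \(x_i\) are sampled inside \(\Omega\) and the \(x_j\) on \(\partial\Omega.\)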

The actual evaluation of \(\mathcal{L}_P(w,\theta)\) is obtained using AD (automatic differentiation), just as it is used in the stochastic gradient algorithm for finding the best NN coefficients. It’s as “easy” as that! Ok, “easy” is in quotes, since we obtain a horrible optimization problem that usually requires intense hyperparameter tuning, or ensembles of initializations, to converge reasonably well. But when it does converge, we are in the SUMO position (see this blog post): once we have completed the expensive, offline learning stage, we have an inference tool that can provide speedups of \(10^3\) or \(10^4.\) This is not a 2X or 3X speedup, but literally three to four orders of magnitude! For example, a 10-hour computation can be performed in \(0.01\) hours, or \(100\) such computations can be performed in one hour.
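
To give an idea of how little code this requires, here is a minimal JAX sketch of such a composite loss. The PDE (a 1D diffusion equation \(-\theta\, u''(x) = f(x)\) on \([0,1]\) with homogeneous Dirichlet conditions), the network size, and the source term are illustrative assumptions of mine, not taken from the references; the point is only that the PDE residual is obtained by differentiating the network itself with AD.

```python
# Minimal JAX sketch of a composite (data + physics) loss for a PINN.
# Illustrative assumptions: the PDE is -theta * u''(x) = f(x) on [0, 1]
# with u(0) = u(1) = 0, u is approximated by a small MLP u_w(x), and the
# diffusivity theta is treated as a trainable coefficient (inverse problem).
import jax
import jax.numpy as jnp

def init_mlp(key, sizes=(1, 32, 32, 1)):
    """Random initialization of a small fully connected network."""
    params = []
    for m, n in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (m, n)) / jnp.sqrt(m), jnp.zeros(n)))
    return params

def u_net(params, x):
    """Network approximation u_w(x) for a scalar input x."""
    h = jnp.atleast_1d(x)
    for W, b in params[:-1]:
        h = jnp.tanh(h @ W + b)
    W, b = params[-1]
    return (h @ W + b)[0]

def f_source(x):
    """Assumed known source term of the PDE."""
    return jnp.sin(jnp.pi * x)

def pde_residual(params, theta, x):
    """Residual -theta * u'' - f at one collocation point; u'' comes from AD."""
    u_xx = jax.grad(jax.grad(u_net, argnums=1), argnums=1)(params, x)
    return -theta * u_xx - f_source(x)

def composite_loss(params, theta, x_data, y_data, x_colloc):
    """L(w, theta) = data misfit + PDE residual + boundary conditions."""
    u_pred = jax.vmap(lambda x: u_net(params, x))(x_data)
    loss_data = jnp.mean((u_pred - y_data) ** 2)                        # L_M(w)
    res = jax.vmap(lambda x: pde_residual(params, theta, x))(x_colloc)
    loss_pde = jnp.mean(res ** 2)                                       # L_P(w, theta)
    loss_bc = u_net(params, 0.0) ** 2 + u_net(params, 1.0) ** 2
    return loss_data + loss_pde + loss_bc

# Gradients with respect to both the network weights w and the PDE
# coefficient theta, ready to feed to any stochastic gradient optimizer.
grad_fn = jax.jit(jax.grad(composite_loss, argnums=(0, 1)))
```

For the direct problem, \(\theta\) is simply held fixed and only the gradient with respect to the weights is used; for the inverse problem, both are updated.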

For all the details and numerous examples, the following references are recommended.

  1. Lu, Karniadakis. SIAM Review, 2021.
  2. Wang, Wang, Bhouri, Perdikaris. arXiv:2103.10974v1, arXiv:2106.05384, arXiv:2110.01654, arXiv:2110.13297
  3. Mishra, Molinaro. arXiv:2006.16144v2
  4. Bhattacharya, Hosseini, Kovachki, Stuart. arXiv:2005.03180
  5. Lu, Meng, Cai, Mai, Goswami, Zhang, Karniadakis. arXiv:2111.05512

My talk is a personal summary of the above references, with a view to applying the approach to a digital twin of a sperm whale that we are developing within the framework of the national AI chair ADSIL. A description of the whale’s digital twin can be found in the twins section.