Adaptive step size ODE filters

ProbNum implements the step size adaptation scheme from Schober et al., 2019.

In this notebook we demonstrate that for the KF-IBM(1) combination the scheme works quite well (which is what is claimed in the paper), but also that it does not quite generalize to other filters.
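
A scheme of this kind compares a local error estimate against the tolerance after each step and rescales the step size accordingly. The cell below is an illustrative sketch of such a proportional control rule, with made-up safety factor and bounds; it is not ProbNum's internal code.

[ ]:
def propose_next_step(h, error_estimate, tol, order, safety=0.9):
    """Illustrative proportional step-size control (not ProbNum's internal code).

    A step is accepted if the local error estimate is below the tolerance;
    otherwise it is retried with the smaller proposed step size.
    """
    if error_estimate == 0.0:
        factor = 10.0
    else:
        factor = safety * (tol / error_estimate) ** (1.0 / (order + 1))
    # Keep the change per step within (made-up) bounds to avoid erratic behaviour.
    factor = min(max(factor, 0.2), 10.0)
    return h * factor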

[1]:
# Make inline plots vector graphics instead of raster graphics
%matplotlib inline
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('pdf', 'svg')

# Plotting
import matplotlib.pyplot as plt
plt.style.use('../probnum.mplstyle')
[2]:
import matplotlib.pyplot as plt
import numpy as np
[3]:
from probnum import random_variables as rvs
from probnum.diffeq import IVP, lotkavolterra, probsolve_ivp

IBM(1) with KF

We start by showing how solving an IVP with adaptive steps works.

[4]:
initrv = rvs.Constant(np.array([20, 20]))
ivp = lotkavolterra([0.0, 20.0], initrv)
sol = probsolve_ivp(ivp, tol=1e-1)

means = sol.y.mean

plt.plot(sol.t, means[:, 0], label="Prey")
plt.plot(sol.t, means[:, 1], label="Predators")
plt.legend(loc="upper right")
plt.show()
../_images/tutorials_adaptive_steps_odefilter_5_0.svg

Let's visualise the individual steps.

[5]:
plt.plot(sol.t, means[:, 0], label="Prey")
plt.plot(sol.t, means[:, 1], label="Predators")
for t in sol.t:
    plt.axvline(t, linewidth=0.15, color="gray")
plt.legend(loc="upper right")
plt.show()
../_images/tutorials_adaptive_steps_odefilter_7_0.svg

Note how more steps are taken at the inclines than at the declines.
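
We can also look at the accepted step sizes directly, as the differences of the time grid. This reuses the solution object from above; nothing here is ProbNum-specific.

[ ]:
steps = np.diff(sol.t)

plt.semilogy(sol.t[:-1], steps, ".")
plt.xlabel("$t$")
plt.ylabel("Step size")
plt.show()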

Adaptive steps in a very noisy setting and q=2 priors

We investigate how different priors and different filters fare when the ODE evaluations are corrupted by noise. That is, we add measurement variance \(R=0.5\) to the measurement model at each step.

[6]:
priors = ["ibm2", "ioup2", "matern52"]
filters = ["ekf0", "ekf1", "ukf"]

initrv = rvs.Constant(np.array([20.0, 20.0]))
t0, tmax = 0.0, 16
ivp = lotkavolterra([t0, tmax], initrv)

fig, ax = plt.subplots(
    nrows=len(priors), ncols=len(filters), sharex=True, sharey=True, figsize=(12, 12)
)

for idx in range(len(priors)):  # priors in rows
    for jdx in range(len(filters)):  # filters in cols
        sol = probsolve_ivp(
            ivp,
            tol=0.25,
            firststep=None,
            evlvar=5e-1,
            which_prior=priors[idx],
            method=filters[jdx],
        )
        ts, means, stds = sol.t, sol.y.mean, sol.y.std
        ax[idx][jdx].plot(ts, means)
        ax[idx][jdx].fill_between(
            ts, means[:, 0] - 3 * stds[:, 0], means[:, 0] + 3 * stds[:, 0], alpha=0.25
        )
        ax[idx][jdx].fill_between(
            ts, means[:, 1] - 3 * stds[:, 1], means[:, 1] + 3 * stds[:, 1], alpha=0.25
        )
        for t in ts:
            ax[idx][jdx].axvline(t, linewidth=0.1, color="black")
        ax[idx][jdx].set_title(priors[idx] + " - " + filters[jdx])
        ax[idx][jdx].set_xlim((t0, tmax))
        ax[idx][jdx].set_ylim((-10, 75))
plt.show()
../_images/tutorials_adaptive_steps_odefilter_10_0.svg

Note how the Matern prior goes to sleep straight away. The IOUP and IBM priors seem to recover some sort of periodicity, especially with the KF method. The EKF and UKF are doing really badly, but at least their uncertainty explodes, which is something.

It seems that IOUP requires more steps than IBM, and that Matern requires many more steps. This is probably due to the different whitenings in the error estimation: the innovation term is much smaller for Matern and IOUP, hence the error estimate is larger than for IBM. It is not clear which of IBM and IOUP is the better choice; however, it is evident that the innovation term for the Matern prior is a bit overconfident.
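
Roughly speaking, whitening here means scaling the raw residual by the innovation covariance before measuring its size. The snippet below is an illustrative sketch of such a whitened error estimate, not ProbNum's internal code.

[ ]:
def whitened_error(residual, innovation_cov):
    """Illustrative whitened error estimate (not ProbNum's internal code).

    The raw residual is scaled by the Cholesky factor of the innovation
    covariance. A prior with a smaller predicted innovation covariance thus
    reports a larger whitened error for the same raw residual, which in
    turn forces smaller steps.
    """
    cholesky_factor = np.linalg.cholesky(innovation_cov)
    whitened_residual = np.linalg.solve(cholesky_factor, residual)
    return np.linalg.norm(whitened_residual) / np.sqrt(len(residual))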

The same, in a “nicer” setting

Here, “nicer” means a much smaller evaluation variance (1e-6 instead of 5e-1). This is a well-posed problem and, as such, should be solved well with adaptive step sizes.

[7]:
priors = ["ibm2", "ioup2", "matern52"]
filters = ["ekf0", "ekf1", "ukf"]

initrv = rvs.Constant(np.array([20, 20]))
t0, tmax = 0.0, 16
ivp = lotkavolterra([t0, tmax], initrv)

fig, ax = plt.subplots(
    nrows=len(priors), ncols=len(filters), sharex=True, sharey=True, figsize=(12, 12)
)

for idx in range(len(priors)):  # priors in rows
    for jdx in range(len(filters)):  # filters in cols
        sol = probsolve_ivp(
            ivp,
            tol=0.25,
            firststep=None,
            evlvar=1e-06,
            which_prior=priors[idx],
            method=filters[jdx],
        )
        ts, means, stds = sol.t, sol.y.mean, sol.y.std
        ax[idx][jdx].plot(ts, means)
        ax[idx][jdx].fill_between(
            ts, means[:, 0] - 3 * stds[:, 0], means[:, 0] + 3 * stds[:, 0], alpha=0.25
        )
        ax[idx][jdx].fill_between(
            ts, means[:, 1] - 3 * stds[:, 1], means[:, 1] + 3 * stds[:, 1], alpha=0.25
        )
        for t in ts:
            ax[idx][jdx].axvline(t, linewidth=0.05, color="black")
        ax[idx][jdx].set_title(priors[idx] + " - " + filters[jdx])
        ax[idx][jdx].set_xlim((t0, tmax))
        ax[idx][jdx].set_ylim((-10, 35))
plt.show()
../_images/tutorials_adaptive_steps_odefilter_13_0.svg

They all seem to catch the true solution. Note how the Matern prior again takes (probably way) too many steps, but at least it catches the true solution. The EKF and UKF seem unstable, although using them in conjunction with IOUP seems to lessen the problem. At least for IBM, their uncertainties are very large in the right places. In general, IOUP seems to fit the Lotka-Volterra equations a bit better, but that is a rather vague statement and needs more investigation.
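
To make the step-count comparison concrete, we can count the accepted time points per prior, reusing the solver interface from the cell above (here only with the EKF0 method).

[ ]:
# Compare the number of accepted steps per prior (EKF0, same settings as above).
for prior in priors:
    sol = probsolve_ivp(
        ivp, tol=0.25, firststep=None, evlvar=1e-06, which_prior=prior, method="ekf0"
    )
    print(prior, len(sol.t), "steps")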

Could it have been the hyperparameters?

It seems that the “too many steps” phenomenon was due to poor hyperparameter choices.

[8]:
priors = ["ibm2", "ioup2", "matern52"]
filters = ["ekf0", "ekf1", "ukf"]

initrv = rvs.Constant(np.array([20, 20]))
t0, tmax = 0.0, 16
ivp = lotkavolterra([t0, tmax], initrv)

fig, ax = plt.subplots(
    nrows=len(priors), ncols=len(filters), sharex=True, sharey=True, figsize=(12, 12)
)

for idx in range(len(priors)):  # priors in rows
    for jdx in range(len(filters)):  # filters in cols
        sol = probsolve_ivp(
            ivp,
            tol=0.125,
            firststep=None,
            evlvar=1e-04,
            lengthscale=8,
            which_prior=priors[idx],
            method=filters[jdx],
        )
        ts, means, stds = sol.t, sol.y.mean, sol.y.std
        ax[idx][jdx].plot(ts, means)
        ax[idx][jdx].fill_between(
            ts, means[:, 0] - 3 * stds[:, 0], means[:, 0] + 3 * stds[:, 0], alpha=0.25
        )
        ax[idx][jdx].fill_between(
            ts, means[:, 1] - 3 * stds[:, 1], means[:, 1] + 3 * stds[:, 1], alpha=0.25
        )
        for t in ts:
            ax[idx][jdx].axvline(t, linewidth=0.05, color="black")
        ax[idx][jdx].set_title(priors[idx] + " - " + filters[jdx])
        ax[idx][jdx].set_xlim((t0, tmax))
        ax[idx][jdx].set_ylim((-10, 35))
plt.show()
../_images/tutorials_adaptive_steps_odefilter_16_0.svg

Unfortunately we have to adjust the evaluation variance in order to guarantee positive definiteness of the innovation term.

The IBM prior essentially has no hyperparameters (except for the diffusion constant, which is calibrated at the end and only affects the uncertainty estimates). Matern and IOUP, however, depend on a length scale and a drift speed, for example. If the length scale is increased from \(1\) to \(10\), the Matern solution takes roughly as many steps as the IBM prior.

This also makes the Matern solution do much better in the noisy setting.

The problem is that we cannot rely on the usual GP tuning methods to calibrate these hyperparameters in advance, because the data is not available. Setting lengthscale = 1/tol seems to do a decent job; however, that choice is not backed by theory.
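
As a usage sketch of this heuristic (same interface as in the cells above; whether 1/tol is a sensible default is exactly the open question):

[ ]:
# Tie the Matern length scale to the tolerance (heuristic, not backed by theory).
tol = 0.125
sol = probsolve_ivp(
    ivp,
    tol=tol,
    firststep=None,
    evlvar=1e-04,
    lengthscale=1 / tol,
    which_prior="matern52",
    method="ekf0",
)
print(len(sol.t), "steps")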

Future Work

Based on this, a promising line of work seems to be hyperparameter tuning for the IOUP and Matern priors. If this works well, it opens a lot of doors for reliable usage of these filters in interesting settings.

It might also be interesting to think more about the relation between relative and absolute error. Of course, this goes hand in hand with the previous point (better hyperparameters give better error estimates, which give more reasonable steps), but it is an interesting and important question in its own right.