Randomized Maximum Likelihood Solver

This notebook demonstrates the use of the Randomized Maximum Likelihood Solver (RML) implemented in the Pastas-Plugins package. It compares three groundwater model calibration approaches:

  • Least Squares (LS)

  • RML with finite-difference (2-point) Jacobian

  • RML with empirical Jacobian

The workflow includes generating synthetic observations, defining the Pastas model structure, running calibrations, generating parameter ensembles, and comparing confidence intervals and simulations.

Setup

Packages

The notebook imports numerical and hydrological modeling packages:

import warnings
from typing import Literal

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pastas as ps

from pastas_plugins.pest.solver import RandomizedMaximumLikelihoodSolver

ps.set_log_level("DEBUG")
warnings.filterwarnings("ignore", category=DeprecationWarning)
/home/docs/checkouts/readthedocs.org/user_builds/pastas-plugins/envs/latest/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html

Data

The notebook loads the Wagna dataset and extracts evaporation and precipitation. These serve as input stresses for the RechargeModel.

df = ps.load_dataset("collenteur_2021")["wagna"]
evap = df["evaporation [mm/d]"].rename("evaporation").dropna()
prec = df["precipitation [mm/d]"].rename("precipitation").loc[evap.index]

Helper functions

Some helper functions to quickly setup models and synthetic series

def get_model(obs: pd.Series) -> ps.Model:
    """Creates a standard Pastas model template.

    Adds a RechargeModel with a Gamma response function.

    Parameters
    ----------
    obs : pd.Series
        Observations series for the model.

    Returns
    -------
    ps.Model
        The model with RechargeModel and noise model configured.
    """
    ml = ps.Model(obs, name="synthetic_model")
    sms = []
    rm = ps.RechargeModel(
        prec=prec,
        evap=evap,
        rfunc=ps.Gamma(),
        name="rch",
    )
    sms.append(rm)
    ml.add_stressmodel(sms)
    ml.initialize()
    ml.set_parameter(
        "constant_d",
        pmin=obs.mean() - 4 * obs.std(),
        pmax=obs.mean() + 4 * obs.std(),
    )
    ml.set_parameter(name="rch_a", initial=100.0)

    return ml


def create_synthetic_series(
    noise_std: float = 0.0, freq: Literal["D", "14D"] = "D"
) -> pd.Series:
    """Generates a synthetic groundwater head time series.

    Builds a model with known parameters, simulates the head time series,
    adds Gaussian noise, and resamples to the chosen frequency.

    Parameters
    ----------
    noise_std : float, optional
        Standard deviation of Gaussian noise to add to observations, by default 0.0
    freq : Literal["D", "14D"], optional
        Resampling frequency ("D" for daily, "14D" for 14-day), by default "D"

    Returns
    -------
    pd.Series
        Synthetic groundwater head observations with realistic but controlled values.
    """
    date_range = pd.date_range(tmin, tmax, freq="D")
    obs = pd.Series(0.0, index=date_range, name="head_obs")
    ml = get_model(obs)
    ml.set_parameter("rch_A", initial=rch_A, move_bounds=True)
    ml.set_parameter("rch_n", initial=rch_n, move_bounds=True)
    ml.set_parameter("rch_a", initial=rch_a, move_bounds=True)
    ml.set_parameter("rch_f", initial=rch_f, move_bounds=True)
    ml.set_parameter("constant_d", initial=constant_d, move_bounds=True)
    sim = ml.simulate(tmin=tmin, tmax=tmax)
    if noise_std > 0.0:
        noise = np.random.default_rng(seed=42).normal(0, noise_std, len(sim))
        sim += noise
    sim = ps.timeseries_utils.pandas_equidistant_sample(sim, freq=freq)

    return sim

Constants

A number of synthetic constants (e.g., rch_A, rch_n, etc.) define the “true” parameters used later in the synthetic model generation.

constant_d = 5.0
rch_A = 0.5
rch_n = 1.1
rch_a = 250.0
rch_f = -0.8
tmin = pd.Timestamp("1980-01-01")
tmax = pd.Timestamp("2019-12-31")

noise_std = 0.025

Create a synthetic head series

head = create_synthetic_series(noise_std=noise_std, freq="14D")
head = head.loc[pd.Timestamp("1990-01-01") : tmax]
head.plot(marker=".", color="k")
Model is not optimized yet, initial parameters are used.
<Axes: >
../_images/604b1b33310460c7f218763f4f69061441427a96bcad391063dbdff4b6f8d8c4.png

Models

Least Squares

A Pastas model (ml_ls) is calibrated with standard least-squares optimization to show as a baseline model to compare the RML uncertainty estimates.

ml_ls = get_model(head)
ml_ls.solve()
axd = ml_ls.plots.results_mosaic(stderr=True)
DEBUG:pastas.model:[Errno 6] No such device or address
DEBUG:pastas.model:Adding LeastSquares as default solver.
Fit report synthetic_model          Fit Statistics
==================================================
nfev     13                     EVP          97.94
nobs     783                    R2            0.98
noise    False                  RMSE          0.03
tmin     1990-01-02 00:00:00    AICc      -5762.22
tmax     2019-12-24 00:00:00    BIC       -5738.98
freq     D                      Obj           0.25
freq_obs None                   ___               
warmup   3650 days 00:00:00     Interp.         No
solver   LeastSquares           weights        Yes

Parameters (5 optimized)
==================================================
               optimal     initial  vary
rch_A         0.499871    0.146585  True
rch_n         1.098129    1.000000  True
rch_a       250.819346  100.000000  True
rch_f        -0.807632   -1.000000  True
constant_d    5.008214    5.505771  True
../_images/01325a05e3a1c32e46821847ece197d2430114119fa0ab2f4ebad325c4850d8c.png

RML with finite difference jacobian

In this section we build a model using the Randomized Maximum Likelihood (RML) approach. We apply parameter bounds to constant_d and instantiate the RandomizedMaximumLikelihoodSolver with:

  • 250 realizations, forming the RML parameter ensemble

  • A three-point finite-difference Jacobian

    • This means each realization performs its own independent optimization starting from its individually drawn initial parameter set.

    • Each realization computes its own finite-difference Jacobian, just like the standard Pastas least-squares solver.

During solver.initialize(), several important steps occur:

  • The standard deviation of the observation noise is set via standard_deviation=noise_std. This creates a unique synthetic observation series for each ensemble member, representing uncertainty in the measurement data.

  • The arguments method="norm" and par_sigma_range=4.0 define how the initial parameter values for each realization are sampled:

    • method="norm" → initial parameters are drawn from a normal distribution

    • par_sigma_range=4.0 → the parameter bounds (pmin, pmax) are interpreted as ±4 standard deviations of this distribution

Together, these settings define the prior parameter ensemble, from which each realization begins its optimization before contributing to the final RML posterior ensemble.

# test finite difference jacobian
ml_rml = get_model(head)
solver = RandomizedMaximumLikelihoodSolver(
    num_reals=250,
    jacobian_method="2-point",  # other option is 3-point which is slower but more accurate for the jacobian
    # num_workers= # if you want to do stuff in parallel, by default yes. if not set num_workers to 1.
    beta=0.0,  # weighting factor between prior and observation residuals
    minimize_method="SLSQP",  # other options are "trust-constr", "L-BFGS-B" and "TNC"
    noptmax=1000,
)
ml_rml.add_solver(solver)
ml_rml.solver.initialize(
    standard_deviation=noise_std,
    method="norm",
    par_sigma_range=4.0,
)
DEBUG:pastas.model:[Errno 6] No such device or address

Let’s have a look at the prior parameter ensemble.

axes = ml_rml.solver.parameter_ensemble_prior.hist(bins=10)
for ax in axes.ravel():
    title = ax.get_title()
    if title != "":
        ax.axvline(ml_rml.parameters.at[title, "initial"], color="k", label="initial")
plt.tight_layout()
../_images/3abedc4f7936a896db04115b400a70562aa19594747b952a5f0e3492d51824fb.png

The observation_noise is stored under the solver.observation_noise:

solver.observation_noise.head()
0 1 2 3 4 5 6 7 8 9 ... 240 241 242 243 244 245 246 247 248 base
1990-01-02 0.006134 0.013955 0.013881 0.004524 0.048632 0.029283 0.071084 0.031424 0.035165 -0.034352 ... 0.013375 0.054584 -0.004218 -0.045778 0.059480 0.034920 0.000397 -0.004454 -0.028481 0.0
1990-01-16 -0.014197 -0.008229 0.019937 -0.012047 -0.006529 0.040086 0.015138 0.013051 0.008746 0.020920 ... -0.000361 -0.031685 0.022145 0.014614 0.010912 -0.025800 -0.030544 0.032909 -0.039762 0.0
1990-01-30 0.019058 -0.010101 -0.010682 0.012991 -0.017553 0.002911 -0.000692 0.010395 -0.029478 0.031204 ... -0.001924 -0.030718 -0.003169 0.048684 -0.035050 -0.037988 0.002104 -0.024062 -0.019232 0.0
1990-02-13 0.004715 -0.020735 0.034227 0.020423 -0.029738 0.001357 0.017137 -0.021774 -0.006873 0.013100 ... -0.022094 0.024990 -0.014745 0.008126 0.019464 0.025633 -0.016064 0.014447 0.024804 0.0
1990-02-27 0.036680 0.001687 -0.004078 -0.023111 0.011612 -0.007555 0.001709 -0.031365 0.025758 0.006708 ... 0.038635 -0.029654 0.017224 -0.017976 0.029786 0.024733 -0.015048 0.017223 0.009537 0.0

5 rows × 250 columns

Together with the ml.observations() the observation_noise forms the solver.observation_ensemble on which each ensemble is fitted.

f, ax = plt.subplots(figsize=(10, 5))
ax.plot(
    ml_rml.observations().index,
    ml_rml.observations().values,
    "k.",
    label="Observations",
)
ax.fill_between(
    ml_rml.solver.observation_ensemble.index,
    ml_rml.solver.observation_ensemble.min(axis=1),
    ml_rml.solver.observation_ensemble.max(axis=1),
    color="lightgrey",
    label="Observation Ensemble Range",
)

ax.set_xlim(ml_rml.observations().index.min(), ml_rml.observations().index.max())
ax.grid(True)
ax.legend(loc="lower right", bbox_to_anchor=(1.0, 1.0), ncol=2, frameon=False)
<matplotlib.legend.Legend at 0x73dc0e5cc810>
../_images/0f3ab54a60128b37ca964af36a99a0df7945fc3069be841679bf7cff5ddc804f.png

Let’s solve this. It might take up to a minute if it is done in parallel.

ml_rml.solve(report=True, initalize=False)
ERROR:pastas_plugins.pest.solver:Unexpected keyword arguments: ['initalize'], to solve method. These kwargs will be ignored for now. See issue https://github.com/pastas/pastas/pull/1031.
RML looping over realizations:   0%|          | 0/250 [00:00<?, ?it/s]
RML looping over realizations:   0%|          | 1/250 [00:00<02:10,  1.91it/s]
RML looping over realizations:   1%|          | 2/250 [00:00<01:40,  2.46it/s]
RML looping over realizations:   1%|          | 3/250 [00:01<01:47,  2.30it/s]
RML looping over realizations:   2%|▏         | 4/250 [00:01<01:24,  2.92it/s]
RML looping over realizations:   2%|▏         | 5/250 [00:01<01:20,  3.06it/s]
RML looping over realizations:   2%|▏         | 6/250 [00:02<01:16,  3.21it/s]
RML looping over realizations:   3%|▎         | 7/250 [00:02<01:05,  3.71it/s]
RML looping over realizations:   3%|▎         | 8/250 [00:02<01:17,  3.12it/s]
RML looping over realizations:   4%|▎         | 9/250 [00:03<01:20,  2.98it/s]
RML looping over realizations:   4%|▍         | 10/250 [00:03<01:04,  3.71it/s]
RML looping over realizations:   4%|▍         | 11/250 [00:03<01:21,  2.93it/s]
RML looping over realizations:   5%|▍         | 12/250 [00:03<01:07,  3.55it/s]
RML looping over realizations:   5%|▌         | 13/250 [00:04<01:25,  2.78it/s]
RML looping over realizations:   6%|▌         | 14/250 [00:04<01:13,  3.23it/s]
RML looping over realizations:   6%|▌         | 15/250 [00:05<01:22,  2.85it/s]
RML looping over realizations:   7%|▋         | 17/250 [00:05<01:17,  3.02it/s]
RML looping over realizations:   8%|▊         | 19/250 [00:06<01:08,  3.38it/s]
RML looping over realizations:   8%|▊         | 21/250 [00:06<00:51,  4.41it/s]
RML looping over realizations:   9%|▉         | 22/250 [00:06<00:55,  4.08it/s]
RML looping over realizations:   9%|▉         | 23/250 [00:07<01:00,  3.74it/s]
RML looping over realizations:  10%|▉         | 24/250 [00:07<00:55,  4.05it/s]
RML looping over realizations:  10%|█         | 25/250 [00:07<00:59,  3.79it/s]
RML looping over realizations:  10%|█         | 26/250 [00:07<00:55,  4.07it/s]
RML looping over realizations:  11%|█         | 27/250 [00:08<01:01,  3.60it/s]
RML looping over realizations:  11%|█         | 28/250 [00:08<00:52,  4.25it/s]
RML looping over realizations:  12%|█▏        | 29/250 [00:08<00:56,  3.89it/s]
RML looping over realizations:  12%|█▏        | 30/250 [00:08<00:51,  4.25it/s]
RML looping over realizations:  12%|█▏        | 31/250 [00:09<01:00,  3.64it/s]
RML looping over realizations:  13%|█▎        | 32/250 [00:09<00:51,  4.22it/s]
RML looping over realizations:  13%|█▎        | 33/250 [00:09<00:55,  3.88it/s]
RML looping over realizations:  14%|█▎        | 34/250 [00:09<00:55,  3.90it/s]
RML looping over realizations:  14%|█▍        | 35/250 [00:10<00:54,  3.94it/s]
RML looping over realizations:  14%|█▍        | 36/250 [00:10<00:58,  3.64it/s]
RML looping over realizations:  15%|█▍        | 37/250 [00:10<00:53,  3.98it/s]
RML looping over realizations:  15%|█▌        | 38/250 [00:10<01:00,  3.49it/s]
RML looping over realizations:  16%|█▌        | 39/250 [00:11<00:55,  3.78it/s]
RML looping over realizations:  16%|█▌        | 40/250 [00:11<01:10,  2.99it/s]
RML looping over realizations:  16%|█▋        | 41/250 [00:11<00:59,  3.51it/s]
RML looping over realizations:  17%|█▋        | 42/250 [00:12<01:15,  2.75it/s]
RML looping over realizations:  17%|█▋        | 43/250 [00:12<01:05,  3.14it/s]
RML looping over realizations:  18%|█▊        | 44/250 [00:12<01:00,  3.39it/s]
RML looping over realizations:  18%|█▊        | 45/250 [00:13<01:07,  3.02it/s]
RML looping over realizations:  19%|█▉        | 47/250 [00:13<00:58,  3.49it/s]
RML looping over realizations:  19%|█▉        | 48/250 [00:13<00:52,  3.86it/s]
RML looping over realizations:  20%|█▉        | 49/250 [00:14<01:02,  3.22it/s]
RML looping over realizations:  20%|██        | 51/250 [00:14<01:01,  3.22it/s]
RML looping over realizations:  21%|██        | 53/250 [00:15<00:52,  3.72it/s]
RML looping over realizations:  22%|██▏       | 54/250 [00:15<00:50,  3.90it/s]
RML looping over realizations:  22%|██▏       | 55/250 [00:15<00:57,  3.39it/s]
RML looping over realizations:  23%|██▎       | 57/250 [00:16<00:58,  3.33it/s]
RML looping over realizations:  24%|██▎       | 59/250 [00:17<01:01,  3.12it/s]
RML looping over realizations:  24%|██▍       | 61/250 [00:17<00:43,  4.35it/s]
RML looping over realizations:  25%|██▍       | 62/250 [00:17<00:48,  3.88it/s]
RML looping over realizations:  25%|██▌       | 63/250 [00:17<00:44,  4.18it/s]
RML looping over realizations:  26%|██▌       | 64/250 [00:18<00:54,  3.40it/s]
RML looping over realizations:  26%|██▌       | 65/250 [00:18<00:53,  3.48it/s]
RML looping over realizations:  26%|██▋       | 66/250 [00:18<00:52,  3.53it/s]
RML looping over realizations:  27%|██▋       | 67/250 [00:19<00:57,  3.16it/s]
RML looping over realizations:  27%|██▋       | 68/250 [00:19<00:54,  3.33it/s]
RML looping over realizations:  28%|██▊       | 69/250 [00:19<00:56,  3.20it/s]
RML looping over realizations:  28%|██▊       | 70/250 [00:20<00:56,  3.16it/s]
RML looping over realizations:  28%|██▊       | 71/250 [00:20<00:45,  3.91it/s]
RML looping over realizations:  29%|██▉       | 72/250 [00:20<01:00,  2.95it/s]
RML looping over realizations:  30%|██▉       | 74/250 [00:21<00:52,  3.38it/s]
RML looping over realizations:  30%|███       | 75/250 [00:21<00:43,  4.01it/s]
RML looping over realizations:  30%|███       | 76/250 [00:22<00:57,  3.01it/s]
RML looping over realizations:  31%|███       | 78/250 [00:22<00:52,  3.30it/s]
RML looping over realizations:  32%|███▏      | 79/250 [00:22<00:49,  3.45it/s]
RML looping over realizations:  32%|███▏      | 80/250 [00:23<00:50,  3.38it/s]
RML looping over realizations:  32%|███▏      | 81/250 [00:23<00:42,  3.93it/s]
RML looping over realizations:  33%|███▎      | 82/250 [00:23<00:50,  3.36it/s]
RML looping over realizations:  33%|███▎      | 83/250 [00:24<00:53,  3.13it/s]
RML looping over realizations:  34%|███▎      | 84/250 [00:24<00:43,  3.81it/s]
RML looping over realizations:  34%|███▍      | 85/250 [00:24<00:51,  3.23it/s]
RML looping over realizations:  35%|███▍      | 87/250 [00:24<00:40,  3.99it/s]
RML looping over realizations:  35%|███▌      | 88/250 [00:25<00:35,  4.56it/s]
RML looping over realizations:  36%|███▌      | 89/250 [00:25<00:43,  3.67it/s]
RML looping over realizations:  36%|███▌      | 90/250 [00:25<00:37,  4.31it/s]
RML looping over realizations:  36%|███▋      | 91/250 [00:26<00:50,  3.17it/s]
RML looping over realizations:  37%|███▋      | 93/250 [00:26<00:41,  3.79it/s]
RML looping over realizations:  38%|███▊      | 95/250 [00:27<00:39,  3.90it/s]
RML looping over realizations:  39%|███▉      | 97/250 [00:27<00:40,  3.81it/s]
RML looping over realizations:  40%|███▉      | 99/250 [00:28<00:38,  3.89it/s]
RML looping over realizations:  40%|████      | 101/250 [00:28<00:41,  3.60it/s]
RML looping over realizations:  41%|████      | 103/250 [00:29<00:37,  3.91it/s]
RML looping over realizations:  42%|████▏     | 104/250 [00:29<00:38,  3.84it/s]
RML looping over realizations:  42%|████▏     | 105/250 [00:29<00:38,  3.80it/s]
RML looping over realizations:  42%|████▏     | 105/250 [00:29<00:41,  3.53it/s]

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
Cell In[11], line 1
----> 1 ml_rml.solve(report=True, initalize=False)

File ~/checkouts/readthedocs.org/user_builds/pastas-plugins/envs/latest/lib/python3.11/site-packages/pastas/model.py:1017, in Model.solve(self, tmin, tmax, freq, warmup, noise, solver, report, initial, weights, fit_constant, freq_obs, initialize, **kwargs)
   1014     self.add_solver(solver=solver)
   1016 # Solve model
-> 1017 success, optimal, stderr = self.solver.solve(
   1018     noise=self._settings["noise"], weights=weights, **kwargs
   1019 )
   1020 if not success:
   1021     logger.warning("Model parameters could not be estimated well.")

File ~/checkouts/readthedocs.org/user_builds/pastas-plugins/envs/latest/lib/python3.11/site-packages/pastas_plugins/pest/solver.py:1769, in RandomizedMaximumLikelihoodSolver.solve(self, **kwargs)
   1755 if self.jacobian_method in ("2-point", "3-point"):
   1756     func = partial(
   1757         RandomizedMaximumLikelihoodSolver._least_squares_fd,
   1758         parameter_ensemble=self.parameter_ensemble,
   (...)   1766         noptmax=self.noptmax,
   1767     )
-> 1769     results = process_map(
   1770         func,
   1771         range(self.num_reals),
   1772         max_workers=self.num_workers,
   1773         desc="RML looping over realizations",
   1774         chunksize=1,
   1775     )
   1776     # Concatenate all parameter iterations with proper MultiIndex structure
   1777     parameter_iterations = pd.concat(
   1778         [r.parameter_iterations for r in results],
   1779         keys=[r.real for r in results],
   1780         names=["real", "iteration"],
   1781     )

File ~/checkouts/readthedocs.org/user_builds/pastas-plugins/envs/latest/lib/python3.11/site-packages/tqdm/contrib/concurrent.py:105, in process_map(fn, *iterables, **tqdm_kwargs)
    103     tqdm_kwargs = tqdm_kwargs.copy()
    104     tqdm_kwargs["lock_name"] = "mp_lock"
--> 105 return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)

File ~/checkouts/readthedocs.org/user_builds/pastas-plugins/envs/latest/lib/python3.11/site-packages/tqdm/contrib/concurrent.py:51, in _executor_map(PoolExecutor, fn, *iterables, **tqdm_kwargs)
     47 with ensure_lock(tqdm_class, lock_name=lock_name) as lk:
     48     # share lock in case workers are already using `tqdm`
     49     with PoolExecutor(max_workers=max_workers, initializer=tqdm_class.set_lock,
     50                       initargs=(lk,)) as ex:
---> 51         return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))

File ~/checkouts/readthedocs.org/user_builds/pastas-plugins/envs/latest/lib/python3.11/site-packages/tqdm/std.py:1181, in tqdm.__iter__(self)
   1178 time = self._time
   1180 try:
-> 1181     for obj in iterable:
   1182         yield obj
   1183         # Update and possibly print the progressbar.
   1184         # Note: does not call self.update(1) for speed optimisation.

File ~/.asdf/installs/python/3.11.12/lib/python3.11/concurrent/futures/process.py:620, in _chain_from_iterable_of_lists(iterable)
    614 def _chain_from_iterable_of_lists(iterable):
    615     """
    616     Specialized implementation of itertools.chain.from_iterable.
    617     Each item in *iterable* should be a list.  This function is
    618     careful not to keep references to yielded objects.
    619     """
--> 620     for element in iterable:
    621         element.reverse()
    622         while element:

File ~/.asdf/installs/python/3.11.12/lib/python3.11/concurrent/futures/_base.py:619, in Executor.map.<locals>.result_iterator()
    616 while fs:
    617     # Careful not to keep a reference to the popped future
    618     if timeout is None:
--> 619         yield _result_or_cancel(fs.pop())
    620     else:
    621         yield _result_or_cancel(fs.pop(), end_time - time.monotonic())

File ~/.asdf/installs/python/3.11.12/lib/python3.11/concurrent/futures/_base.py:317, in _result_or_cancel(***failed resolving arguments***)
    315 try:
    316     try:
--> 317         return fut.result(timeout)
    318     finally:
    319         fut.cancel()

File ~/.asdf/installs/python/3.11.12/lib/python3.11/concurrent/futures/_base.py:451, in Future.result(self, timeout)
    448 elif self._state == FINISHED:
    449     return self.__get_result()
--> 451 self._condition.wait(timeout)
    453 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:
    454     raise CancelledError()

File ~/.asdf/installs/python/3.11.12/lib/python3.11/threading.py:327, in Condition.wait(self, timeout)
    325 try:    # restore state no matter what (e.g., KeyboardInterrupt)
    326     if timeout is None:
--> 327         waiter.acquire()
    328         gotit = True
    329     else:

KeyboardInterrupt: 
_ = ml_rml.plots.results_mosaic(stderr=True)
../_images/934a7b13a98bfc0895319a74bf22135b975d43434ee4923ed3e67857f6514b1b.png

Lets look at the posterior parameter distribution. As you can see the prior is sometimes pretty far away from the posterior. That has to do with how we sample the parameters. Pastas guesses a starting value for pmin, pmax and initial for each parameter (excluding constant_d). If dist= norm then the bounds are sampled as a normal distribution where min(initial-pmin, pmax-initial) is 2 times the standard deviation (par_sigma_range=4.0). This means that if initial is not halfway between pmin and pmax the full extent of the original parameter bounds of Pastas are not sampled. I’m also not sure if I’m fully allowed to call it the posterior.

axes = ml_rml.solver.parameter_ensemble_posterior.hist(bins=20, color="C1")

for i, ax in enumerate(axes.ravel()[:-1]):
    ml_rml.solver.parameter_ensemble_prior.iloc[:, [i]].hist(
        bins=20, ax=ax, zorder=0, color="C0"
    )
axes[0, 0].legend(["Posterior", "Prior"])
plt.tight_layout()
../_images/5d255da1f577de3b3fcad83e57473511fb78e452c380bcd23a55088fc8761090.png

We can also see that some of the posterior values are at bounds or other places. We can have a look at the convergence of realizations and the objective function value.

convergence_rml = ml_rml.solver.convergence_ensemble
print(convergence_rml.value_counts())
converged
True    250
Name: count, dtype: int64

According to SciPy, all realizations converged. If they did not converge (try minimize_method="TNC" for instance), we can get rid of those.

objfuncs_rml = ml_rml.solver.obj_func_ensemble.unstack().loc[convergence_rml].T
min_per_real = (
    ml_rml.solver.obj_func_ensemble.groupby("real").min().loc[convergence_rml]
)
bad_fit_reals = min_per_real[min_per_real > 10.0].index
good_fit_reals = min_per_real[min_per_real <= 10.0].index
f, axd = plt.subplot_mosaic(
    [["iter", "hist"]], figsize=(7.0, 3.0), layout="tight", width_ratios=[1.6, 1]
)
axd["iter"].plot(objfuncs_rml.index, objfuncs_rml.values, "-", color="k", alpha=0.3)
axd["iter"].plot(
    objfuncs_rml.loc[:, bad_fit_reals].index,
    objfuncs_rml.loc[:, bad_fit_reals].values,
    "-",
    color="C1",
    alpha=0.7,
)
axd["iter"].semilogy()
axd["iter"].set_xlabel("Iteration")
axd["iter"].set_ylabel("Objective Function Value")
_, bins, _ = axd["hist"].hist(min_per_real, bins=50, color="k", label="All reals")
axd["hist"].hist(
    min_per_real.loc[bad_fit_reals],
    bins=bins,
    color="C1",
    label="Reals with bad fits",
    alpha=0.5,
)
axd["hist"].set_xlabel("Minimum Objective\nFunction Value")
axd["hist"].set_ylabel("Number of realizations")
axd["hist"].legend()
<matplotlib.legend.Legend at 0x7d7f58838a50>
../_images/1ac20e44cd6259bd4d28b060e28c8d80f0148da1e7c0251d6cff0a9604c0bb77.png

Also looking at the fit of the realizations that did converge; there is a fraction that is in a local minimum. We can get rid of those.

ax = ml_rml.plot()
simulations_rml = ml_rml.solver.simulation_ensemble.loc[:, convergence_rml].loc[
    :, good_fit_reals
]
ax.fill_between(
    simulations_rml.index,
    simulations_rml.quantile(0.025, axis=1),
    simulations_rml.quantile(0.975, axis=1),
    alpha=0.2,
    color="C0",
)
<matplotlib.collections.FillBetweenPolyCollection at 0x7d7f58618050>
../_images/41dcb636c953db7b360656ac96995c560a3c246ed4edfc6d1e16e127dd43786a.png

Confidence interval is fairly small as is to be expected.

RML with empirical jacobian

There is also an option to try the randomized maximum likelihood solver with the emperical jacobian. Here the jacobian is computed from the parameter ensemble and the simulation ensemble. This method is still a bit experimental because the parameter update is done without lambda searching and lambda is just guessed via lambda = 1e-7 * max(diag(J.T@J)). Also there is no stopping criteria so the number of optimization steps has to be chosen by hand (noptmax).

# test empirical jacobian
ml_rml_em = get_model(head)
solver = RandomizedMaximumLikelihoodSolver(
    num_reals=250,
    jacobian_method="empirical",
    noptmax=100,
    tol=1e-8,
)
ml_rml_em.add_solver(solver)
ml_rml_em.solver.initialize(standard_deviation=noise_std, method="norm")
ml_rml_em.solve(report=True, initialize=False)
DEBUG:pastas.model:[Errno -25] Unknown error -25
RML looping over noptmax:  37%|███▋      | 37/100 [01:21<02:18,  2.19s/it]
Fit report synthetic_model                        Fit Statistics
================================================================
nfev     37                     EVP                        97.94
nobs     783                    R2                          0.98
noise    False                  RMSE                        0.03
tmin     1990-01-02 00:00:00    AICc                    -5762.15
tmax     2019-12-24 00:00:00    BIC                     -5738.91
freq     D                      Obj                         0.03
freq_obs None                   ___                             
warmup   3650 days 00:00:00     Interp.                       No
solver   RandomizedMaximumLikelihoodSolverweights                      Yes

Parameters (5 optimized)
================================================================
               optimal     initial  vary
rch_A         0.500026    0.146585  True
rch_n         1.097864    1.000000  True
rch_a       251.028050  100.000000  True
rch_f        -0.807683   -1.000000  True
constant_d    5.008110    5.505771  True

ml_rml.parameters
initial pmin pmax vary name dist stderr optimal
rch_A 0.146585 0.000010 14.658474 True rch uniform 0.076473 0.499872
rch_n 1.000000 0.100000 5.000000 True rch uniform 0.079312 1.098125
rch_a 100.000000 0.010000 10000.000000 True rch uniform 38.796115 250.822144
rch_f -1.000000 -2.000000 0.000000 True rch uniform 0.183148 -0.807632
constant_d 5.505771 4.805792 6.205749 True constant uniform 0.077206 5.008213
axes = ml_rml_em.solver.parameter_ensemble_posterior.hist(bins=20, color="C1")

for i, ax in enumerate(axes.ravel()[:-1]):
    ml_rml_em.solver.parameter_ensemble_prior.iloc[:, [i]].hist(
        bins=20, ax=ax, zorder=0, color="C0"
    )
axes[0, 0].legend(["Posterior", "Prior"])
plt.tight_layout()
../_images/7e66f18e228fa5153c592065ccebe42bcb805b1b0543b2327e37d218381ac7b6.png
objfuncs_rml_em = ml_rml_em.solver.obj_func_ensemble.unstack().loc[convergence_rml].T
min_per_real_em = (
    ml_rml_em.solver.obj_func_ensemble.groupby("real").min().loc[convergence_rml]
)
f, axd = plt.subplot_mosaic(
    [["iter", "hist"]], figsize=(7.0, 3.0), layout="tight", width_ratios=[1.6, 1]
)
axd["iter"].plot(
    objfuncs_rml_em.index, objfuncs_rml_em.values, "-", color="k", alpha=0.3
)

axd["iter"].semilogy()
axd["iter"].set_xlabel("Iteration")
axd["iter"].set_ylabel("Objective Function Value")
_ = axd["hist"].hist(min_per_real_em, bins=50, color="k", label="All reals")
axd["hist"].set_xlabel("Minimum Objective\nFunction Value")
axd["hist"].set_ylabel("Number of realizations")
axd["hist"].legend()
<matplotlib.legend.Legend at 0x7d7f5b744410>
../_images/4330462742260eb33074b1a5f6678f85c8d4f02039b4eb5bfe6e4f3740ecfc11.png
simulations_rml_em = ml_rml_em.solver.simulation_ensemble
ax = ml_rml_em.plot()
ax.fill_between(
    simulations_rml_em.index,
    simulations_rml_em.quantile(0.025, axis=1),
    simulations_rml_em.quantile(0.975, axis=1),
    alpha=0.2,
    color="C0",
)
<matplotlib.collections.FillBetweenPolyCollection at 0x7d7f584f2fd0>
../_images/51ddd9db09ef87bfbc108ecd697beebde344423a92f9379e141e9675d3e4af92.png

Compare CI

Lets compare the confidence for the different solvers. It can be clearly seen that the confidnece intervalls are fairly similar for the three different solvers. For the emperical jacobian there are some bigger differences that are observed but pretty much negligable.

ls_ci = pd.DataFrame(
    [
        ml_ls.parameters["optimal"] - 1.96 * ml_ls.parameters["stderr"],
        ml_ls.parameters["optimal"],
        ml_ls.parameters["optimal"] + 1.96 * ml_ls.parameters["stderr"],
    ],
    index=[0.025, 0.5, 0.975],
)
rml_ci = (
    ml_rml.solver.parameter_ensemble_posterior.loc[(convergence_rml, slice(None)), :]
    .loc[good_fit_reals]
    .quantile([0.025, 0.5, 0.975])
)
rmlem_ci = ml_rml_em.solver.parameter_ensemble_posterior.quantile([0.025, 0.5, 0.975])
df = pd.concat([ls_ci, rml_ci, rmlem_ci], keys=["LS", "RML-2P", "RML-EM"], axis=1).T
df.swaplevel(0, 1).sort_index().T.round(3)
constant_d rch_A rch_a rch_f rch_n
LS RML-2P RML-EM LS RML-2P RML-EM LS RML-2P RML-EM LS RML-2P RML-EM LS RML-2P RML-EM
0.025 4.990 4.989 4.990 0.492 0.492 0.493 242.486 242.457 243.739 -0.823 -0.820 -0.821 1.083 1.081 1.081
0.500 5.008 5.008 5.008 0.500 0.500 0.500 250.819 250.704 250.980 -0.808 -0.807 -0.807 1.098 1.098 1.098
0.975 5.026 5.023 5.023 0.508 0.508 0.508 259.152 259.591 259.420 -0.792 -0.791 -0.791 1.113 1.113 1.112

Simulations and confidence intervals in figures also all overlap.

f, ax = plt.subplots(figsize=(10, 6))
ax.plot(
    ml_ls.observations(), label="Observations", color="k", linestyle="None", marker="."
)
ax.plot(ml_ls.simulate(), color="C0", linestyle="-", label="LS")
ax.plot(ml_rml.simulate(), color="C1", linestyle="--", label="RML-2P")
ax.plot(ml_rml_em.simulate(), color="C2", linestyle=":", label="RML-EM")
ax.legend()
<matplotlib.legend.Legend at 0x7d7f584e2990>
../_images/9b6a717c8ec05a2de106a18f68298c62251abcb02ea756f1ccea18303d599a08.png
f, ax = plt.subplots(figsize=(10, 6))
ax.plot(
    ml_ls.observations(), label="Observations", color="k", linestyle="None", marker="."
)
ax.fill_between(
    simulations_rml.index,
    simulations_rml.quantile(0.025, axis=1),
    simulations_rml.quantile(0.975, axis=1),
    alpha=1.0,
    color="C0",
    label="RML-2P 95% CI",
)
ax.fill_between(
    simulations_rml_em.index,
    simulations_rml_em.quantile(0.025, axis=1),
    simulations_rml_em.quantile(0.975, axis=1),
    alpha=0.8,
    color="C2",
    label="RML-EM 95% CI",
)
ci = ml_ls.solver.ci_simulation(n=100)
ax.fill_between(
    ci.index, ci.iloc[:, 0], ci.iloc[:, 1], color="C1", label="LS 95% CI", alpha=0.5
)
ax.legend()
<matplotlib.legend.Legend at 0x7d7f0d203750>
../_images/df431a0eca9032c3612ad382adebffbb7a1fea906f0703a9df9fda384d9d3a00.png