Dealing with time series (SDE model)

Hi,

I am trying to set up a BayesFlow framework for estimating the Ornstein-Uhlenbeck parameters (and eventually a hierarchical extension). The parameter recovery for two of the three parameters is poor, and I’m wondering if you have suggestions on how to improve it. I’m sharing a Colab link of what I have so far. I’ve also run inference with JAGS and am treating that as the baseline performance. Here’s a screenshot of the JAGS result:

The BayesFlow recovery is nowhere near the JAGS result:

2 Likes

Welcome to the BayesFlow Forums, thanks for posting your question.

I am currently looking into the notebook and will test some tweaks regarding the NN settings and training hyperparameters.

In the meantime, could you please verify that you’re correctly standardizing/destandardizing the parameters in your configure/deconfigure methods, and plot the variables on the correct scales?

Best,
Marvin

One remark: For standardizing the parameters, you’re calling prior.estimate_means_and_stds(n_draws=1000) inside the configurator. That means you draw 1000 prior samples in every single batch, which has three issues:

  • It slows down the forward simulation considerably.
  • Every batch is standardized with different values.
  • Relatedly, the test set is standardized differently than the training set.

I suggest computing prior_means, prior_stds once at the start of the script and then using these values to standardize all training and test sets. I’ll add that to my test clone of your Colab.
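For illustration, here’s a minimal sketch of that pattern (assuming a BayesFlow 1.x-style forward dict with prior_draws and sim_data keys; adapt the key names to your setup):

```python
import numpy as np

# Estimate the standardization constants ONCE, at the start of the script
prior_means, prior_stds = prior.estimate_means_and_stds(n_draws=1000)

def configure(forward_dict):
    """Standardize every batch (train and test) with the same fixed constants."""
    params = forward_dict["prior_draws"].astype(np.float32)
    data = forward_dict["sim_data"].astype(np.float32)
    return {
        "parameters": (params - prior_means) / prior_stds,
        "summary_conditions": data,
    }
```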

1 Like

Oh, thanks very much for catching that. The code in the Colab I shared was modified a little (since I was experimenting with different things). In the original code, after running once I stored the prior means and stds as attributes of the generative model class, which the configure method (within that class) should pick up.

1 Like

Ok, great! The Colab code also stored prior_means, prior_stds as attributes but kept overriding them in every configurator call. So the test set used the mean/std values of the last training batch.

Here’s a sneak peek at an initial adjustment that drastically improves recovery of \sigma. I’ll continue running some tweaks and will send you an update ASAP.

Great, thanks so much!

1 Like

I changed some things and the recovery started looking noticeably better. You could use a Jupyter notebook diff tool (nbdime, for example) to visualize the code changes I made.

Some noteworthy adjustments:

  • Standardize the time axis to [0, 1].
  • Move the computation of the parameter standardization constants to __init__, so that the same constants are used across the entire workflow.
  • Use an affine coupling flow (maybe try 'spline' as well); see the sketch after this list.
  • Deactivate kernel regularization and dropout; with online training, every batch is freshly simulated, so there is nothing to overfit to.
  • Train for longer (probably the most important bit).
  • I used 10 summary dimensions for testing, but better summary network settings may well make a further difference in recovery. That might be the next thing to try, IMHO.
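Here’s a rough sketch of how the network-related points fit together (a BayesFlow 1.x-style API is assumed; input_dim=2 for (standardized time, observed value) pairs and the coupling_settings keys are illustrative, so double-check them against your installed version):

```python
import bayesflow as bf

# Summary network with 10 learned summary statistics
summary_net = bf.networks.TimeSeriesTransformer(
    input_dim=2,      # assumed: (standardized time, observed value) pairs
    summary_dim=10,
)

# Affine coupling flow; kernel regularization and dropout switched off,
# since online training never revisits a simulated batch
inference_net = bf.networks.InvertibleNetwork(
    num_params=3,
    coupling_design="affine",
    coupling_settings={"dense_args": {"kernel_regularizer": None}, "dropout": False},
)

amortizer = bf.amortizers.AmortizedPosterior(inference_net, summary_net)
```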

I hope that helped a bit. Please feel free to ask more questions and post updates!

Cheers,
Marvin

@marvinschmitt Thanks for the good inputs! Where can I access the final state of your notebook?

1 Like

@marvinschmitt Thanks for the good inputs! Where can I access the final state of your notebook?

It’s at the bottom of my last post; here’s the link again:

1 Like

Hi Marvin,

That looks good. I noticed this result in my experimentation too: high values of \beta are not recovered well. Do you suggest longer training?

Also, do you have an intuition why “affine” worked better than “interleaved”?

1 Like

high values of \beta are not recovered well.

The recovery of \beta starts to really struggle beyond 2 standard deviations of the truncated normal prior. Based on a rough back-of-the-envelope calculation for that truncated normal distribution, such values make up less than 10% of the training examples. The really bad performance around 4 standard deviations can be explained by these values being extremely rare. I also suspect that the dynamics of the OU process make extrapolation on \beta very hard; after all, \beta occurs inside an exponential in the equations.
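For reference, assuming the standard mean-reverting parameterization (the notebook’s exact form may differ slightly), the OU process and its exact transition density are

$$\mathrm{d}X_t = \beta(\mu - X_t)\,\mathrm{d}t + \sigma\,\mathrm{d}W_t,$$

$$X_{t+\Delta} \mid X_t \sim \mathcal{N}\!\left(\mu + (X_t - \mu)\,e^{-\beta\Delta},\; \frac{\sigma^2}{2\beta}\left(1 - e^{-2\beta\Delta}\right)\right),$$

so the data depend on \beta only through e^{-\beta\Delta}, which saturates for large \beta: once the process mixes essentially within one observation step, further increases in \beta barely change the likelihood.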

Do you suggest longer training?

Generally, yes. The bulk of the parameter space should be rather quick to learn, but the tails of \beta and \sigma need quite long training. You could also consider adjusting the priors, if that is compatible with whatever theory you’re guided by.
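Purely as a hypothetical illustration of such a prior adjustment (the loc/scale numbers below are made up, not taken from the notebook):

```python
import numpy as np
from scipy import stats

# Widen the truncated normal on beta so that large values are less rare
loc, scale, lower = 1.0, 1.5, 0.0             # made-up values for illustration
a = (lower - loc) / scale                     # truncation point in standard units
beta_prior = stats.truncnorm(a=a, b=np.inf, loc=loc, scale=scale)

# Back-of-the-envelope: prior mass beyond 2 standard deviations
print(1.0 - beta_prior.cdf(loc + 2 * scale))  # only a few percent
```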

Also, do you have an intuition why “affine” worked better than “interleaved”?

I didn’t experiment with this setting in isolation. I generally prefer starting with the “pure” layouts (spline or affine) and increasing complexity from there, but that’s purely based on ~vibes~. Maybe @KLDivergence has some intuition on tweaking settings further?

Cheers,
–Marvin

2 Likes

@KLDivergence suggested offline training (for a speed improvement) and then simply training for more epochs. The recovery of \beta improves further! I have updated the Colab (link above).

Note: training for even more epochs might improve recovery further. Just remember to plot the validation losses and watch out for overfitting; if you observe any, turning regularization/dropout back on should help.
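A sketch of that offline-training pattern (again assuming a BayesFlow 1.x-style Trainer; the simulation budget and epoch count are illustrative, not from the thread):

```python
# Simulate a fixed training set once, then reuse it across many epochs
train_sims = generative_model(batch_size=20_000)
val_sims = generative_model(batch_size=1_000)

history = trainer.train_offline(
    train_sims, epochs=200, batch_size=64, validation_sims=val_sims
)

# Plot training vs. validation loss to catch overfitting early
fig = bf.diagnostics.plot_losses(history["train_losses"], history["val_losses"])
```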

1 Like

Yes, transformers especially profit from longer training times, which shouldn’t be a problem in our amortized context. Here are some further suggestions for hyperparameters:

Inference network:

```python
coupling_design = "spline"
```

Summary network:

```python
bidirectional=True,
template_dim=256
```
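In context, those settings would plug in roughly like this (BayesFlow 1.x-style API assumed; input_dim is illustrative):

```python
inference_net = bf.networks.InvertibleNetwork(num_params=3, coupling_design="spline")

summary_net = bf.networks.TimeSeriesTransformer(
    input_dim=2,          # assumed input format: (time, value) pairs
    summary_dim=10,
    bidirectional=True,   # bidirectional recurrent template
    template_dim=256,
)
```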
3 Likes