Preferred way to deal with time series with non-equidistant time steps

I am currently testing BayesFlow with different ODE-based models. Recently I turned my attention to model outputs with a varying number of observations and varying time points. For example, one model output could be at times t=1,3,4,5 and another could use the times t=2,3. Is there a preferred way to deal with such data? Padding the data to uniform time steps (t=1,2,3,4,5 in this example) might lead to undesirable behavior. Has anyone encountered a similar problem and can offer some advice?

Hi, welcome to the BayesFlow Forums!

I would suggest adding the original time information (t=1,3,4,...) as a dimension on the last axis of your data, and then using a TimeSeriesTransformer as a summary network. If you need support setting this up, you can provide a minimal reproducible example and I’m happy to help.
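To make the idea concrete, here is a minimal NumPy sketch (variable names are made up for illustration, not from any particular notebook): stack the time stamps as an extra feature on the last axis, so each observation carries its own time information before it reaches the summary network.

```python
import numpy as np

# Hypothetical example: 4 observations of a 1-D quantity at irregular times
t = np.array([1.0, 3.0, 4.0, 5.0])  # time stamps, shape (4,)
x = np.random.randn(4, 1)           # observed values, shape (4, 1)

# Append the time stamps as an additional feature on the last axis
x_with_time = np.concatenate([x, t[:, None]], axis=-1)

print(x_with_time.shape)  # (4, 2): one data dimension + one time dimension
```

The same works batched: for data of shape (batch_size, n_obs, d), broadcast the time stamps to (batch_size, n_obs, 1) and concatenate along the last axis.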


PS: You can also experiment with custom time encodings (e.g., sinusoidal or some learned time embedding), but I wouldn’t recommend starting with that if the simple time+transformer approach works.

Thanks for the welcome and the quick response! This is really helpful. I think I didn't get this across in my initial post, but my output is two time series with potentially different time data: one time series with t=1,3,4,5 and one time series with t=2,3. Is it possible to adapt the TimeSeriesTransformer for this purpose? A minimal reproducible example (if possible) would indeed be nice.

Thanks for the clarification. Just to make sure: Given one parameter vector \theta, your model outputs:

  • x with t_x=1,3,4,5 and also
  • y with t_y=2,3,

making your simulator have a signature like \theta\mapsto[x,y]?

Or is it \theta\mapsto x but sometimes t_x=1,3,4,5 and sometimes t_x=2,3?

If you indeed have 2 time series outputs [x, y] from a single call to the simulator, then you could make use of our recent work on multimodal fusion for neural posterior estimation (Link to preprint). Once we have completely clarified your exact problem setting, I’m happy to help more!


For now I want to restrict my application to the former case. That is, I don't have one output but indeed two outputs x and y with t_x = 1,3,4,5 and t_y = 2,3. I have taken a look at the preprint; it seems exactly like what I need. A small example would also be very helpful!

Thank you for taking the time to help me!

Hi, apologies for the delay. I set up a small example for you. The setup is very simple:

  • 2-dimensional target parameter \theta\sim\mathrm{Normal}(0, 1)
  • source x: random walk with drift \theta, noise \sigma=0.3, 10 discretization steps, but only t_x=5 randomly selected observations
  • source y: random walk with drift \theta, noise \sigma=0.8, 10 discretization steps but only t_y=3 randomly selected observations


The shapes for a batch of forward samples are:

  • \theta: (batch_size, 2)
  • x: (batch_size, 5, 3)
  • y: (batch_size, 3, 3)

Both x and y are 3-dimensional in the last axis because we have two data dimensions (2D random walk) and one time dimension which contains the actual time stamps of the observations.
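A minimal sketch of such a simulator in plain NumPy (illustrative only, not the code from the notebook; names and the shared-per-batch time selection are simplifying assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(batch_size, n_steps=10, n_x=5, n_y=3):
    """Simulate theta and two irregularly observed 2-D random walks."""
    theta = rng.normal(0.0, 1.0, size=(batch_size, 2))  # 2-D drift parameters

    def random_walk(sigma, n_obs):
        # Full random walk with drift theta over n_steps discretization steps
        steps = theta[:, None, :] + sigma * rng.normal(size=(batch_size, n_steps, 2))
        walk = np.cumsum(steps, axis=1)
        # Keep only n_obs randomly selected, sorted time points
        idx = np.sort(rng.choice(n_steps, size=n_obs, replace=False))
        times = (idx + 1).astype(float)
        obs = walk[:, idx, :]
        # Append the time stamps as the last feature dimension
        times_b = np.broadcast_to(times[None, :, None], (batch_size, n_obs, 1))
        return np.concatenate([obs, times_b], axis=-1)

    x = random_walk(sigma=0.3, n_obs=n_x)
    y = random_walk(sigma=0.8, n_obs=n_y)
    return theta, x, y

theta, x, y = simulate(batch_size=8)
print(theta.shape, x.shape, y.shape)  # (8, 2) (8, 5, 3) (8, 3, 3)
```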

Summary networks

Both sources are processed with a TimeSeriesTransformer, which learns 10 summary dimensions each. 10+10=20 summaries are likely overkill, but that's not the point here 🙂
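The fusion itself is just late concatenation of the per-source summaries. Here is a NumPy-only sketch of that idea, with the two transformers replaced by placeholder summarizers (mean-pool plus a random linear projection); everything here is illustrative, not the actual networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the two TimeSeriesTransformers: each maps a variable-length
# series with 3 features to a fixed 10-dimensional summary.
W_x = rng.normal(size=(3, 10))
W_y = rng.normal(size=(3, 10))

def summarize(series, W):
    pooled = series.mean(axis=1)  # (batch, 3): pool over the time axis
    return pooled @ W             # (batch, 10): fixed-size summary

x = rng.normal(size=(8, 5, 3))    # source x: 5 observations, 3 features
y = rng.normal(size=(8, 3, 3))    # source y: 3 observations, 3 features

# Late fusion: concatenate the per-source summaries into one vector
summaries = np.concatenate([summarize(x, W_x), summarize(y, W_y)], axis=-1)
print(summaries.shape)  # (8, 20)
```

The fused 20-dimensional summary vector is then what the inference network conditions on.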

General thoughts

I don’t know about the exact structure of your data, but it might make sense to share information between the summary networks. Also, the specifics of the networks (number of layers, units per layer, …) and regularization (in case of offline learning on a fixed data set) are the first things I’d change for an actual application. If you want additional eyes on your specific modeling example, let me know (here or via email).

Link to notebook

Here’s the Google Colab link for you to clone and play with it: Google Colab


Thank you very much! I was on a short trip so also apologies for the delay. I will play around with it and if I have questions I might have to impose on you again.