Setting up TimeSeriesTransformer

I am trying to compare the performance of SequenceNetwork and TimeSeriesTransformer as summary networks for time series data (e.g. the simulation model generates 3 time series of 300 time points each), but I am having a hard time setting up the TimeSeriesTransformer correctly and can’t find any examples online. Any tips or examples on how to set up this network structure effectively? Thanks!

What’s the exact output shape of your simulator? If I understand you correctly, I would assume (batch_size, 300, 3).

Do the 300 time steps of all three time series correspond to the same points in time? In other words: Does the 50th time step of all time series happen simultaneously in the real world?

Yes, exactly as you guessed. Think of data at times 1…300 for three different variables in the model, e.g. cases, deaths, and hospitalizations in an epidemic. Thanks Marvin!

Thanks for the info! That means you just have to add a (linear) time embedding to the last axis of the data, so that you get (batch_size, 300, 4).

You can do that in the configurator so that your generative model remains a pure implementation of your scientific model.

I’ll send you a code snippet, probably by tomorrow.

import numpy as np

def configurator(input_dict):
    out_dict = {}

    # prior draws are the parameters we want to estimate
    theta = input_dict['prior_draws']
    out_dict['parameters'] = theta

    # simulated data of shape (batch_size, num_timesteps, data_dim)
    x = input_dict['sim_data']

    # add a linear time encoding in [0, 1] as an extra channel on the last axis
    batch_size, num_timesteps, data_dim = x.shape
    time_encoding = np.linspace(0, 1, num_timesteps)
    time_encoding_batched = np.tile(time_encoding[np.newaxis, :, np.newaxis], (batch_size, 1, 1))

    # resulting shape: (batch_size, num_timesteps, data_dim + 1)
    out_dict['summary_conditions'] = np.concatenate((x, time_encoding_batched), axis=-1)

    return out_dict
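
As a quick sanity check, the time-encoding step can be run on its own with dummy data. This is only a minimal sketch: the helper name add_time_encoding and the random array standing in for the simulator output are made up for illustration and are not part of BayesFlow.

```python
import numpy as np

def add_time_encoding(x):
    """Append a linear time channel in [0, 1] to the last axis of x.

    x is assumed to have shape (batch_size, num_timesteps, data_dim).
    """
    batch_size, num_timesteps, _ = x.shape
    t = np.linspace(0, 1, num_timesteps, dtype=x.dtype)
    # Repeat the same time grid for every simulation in the batch.
    t = np.broadcast_to(t[np.newaxis, :, np.newaxis], (batch_size, num_timesteps, 1))
    return np.concatenate((x, t), axis=-1)

# Dummy stand-in for simulator output: 8 simulations, 300 time steps, 3 series.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 300, 3)).astype(np.float32)

x_enc = add_time_encoding(x)
print(x_enc.shape)  # (8, 300, 4)
```

The original three series are left untouched in the first three channels, and the fourth channel runs from 0 at the first time step to 1 at the last.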

Thanks so much Marvin; I will try this. Also the TimeSeriesTransformer seems to have a couple of optional settings (attention_settings and dense_settings); do you typically find value in customizing/fine-tuning those?

You can try tuning those if the default performance is not satisfactory. In my view, the most useful hyperparameters are the dimensionality of the template (template_dim), the actual number of summary statistics (summary_dim), and the number of heads for multi-head attention.
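
For reference, here is a rough sketch of how those settings might be passed. The numeric values are placeholders rather than recommendations, and the import path and keyword names reflect my understanding of the BayesFlow 1.x API, so please check them against the version you have installed. Note that input_dim is 4 here because the time encoding adds one channel to the three observed series.

```python
# Placeholder values for illustration only; tune them for your own problem.
summary_net_kwargs = dict(
    input_dim=4,       # 3 observed series + 1 time-encoding channel
    template_dim=64,   # dimensionality of the template
    summary_dim=16,    # number of learned summary statistics
    # Assumed to be forwarded to the multi-head attention layers.
    attention_settings=dict(num_heads=4, key_dim=32, dropout=0.1),
    # Assumed to be forwarded to the internal dense layers.
    dense_settings=dict(units=64, activation="relu"),
)

def build_summary_net(kwargs=summary_net_kwargs):
    # Imported lazily so the settings can be inspected without BayesFlow installed.
    from bayesflow.summary_networks import TimeSeriesTransformer
    return TimeSeriesTransformer(**kwargs)
```

Calling build_summary_net() should then give you a summary network that accepts the (batch_size, 300, 4) arrays produced by the configurator.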

Thanks; I will experiment with those. In the meantime, there seems to be an inconsistency in the TimeSeriesTransformer when the ‘bidirectional’ flag is set to True: it leads to an error about mismatched vector sizes somewhere in the network setup.

Nice spot! This has been fixed in the development branch, but hasn’t made it into an official release yet. You can run the following to get the development version without breaking the already installed dependencies:

pip uninstall bayesflow
pip install git+ --no-deps