Setting up TimeSeriesTransformer

hazhir · February 12, 2024, 4:57pm

I am trying to compare the performance of SequenceNetwork and TimeSeriesTransformer as summary networks for time series data (e.g. the simulation model generates 3 time series of 300 time points each), but I am having a hard time setting up the TimeSeriesTransformer correctly and can’t find any examples online. Any tips or examples on how to set up this network structure effectively? Thanks!

marvinschmitt · February 12, 2024, 7:48pm

Hi,

What’s the exact output shape of your simulator? If I understand you correctly, I would assume (batch_size, 300, 3).

Do the 300 time steps of all three time series correspond to the same points in time? In other words: Does the 50th time step of all time series happen simultaneously in the real world?

hazhir · February 12, 2024, 9:19pm

Yes, exactly as you guessed, think about data on times 1…300 for three different variables in the model, e.g. cases, deaths, and hospitalizations in an epidemic. Thanks Marvin!

marvinschmitt · February 12, 2024, 9:53pm

Thanks for the info! That means you just have to add a (linear) time embedding to the last axis of the data, so that you get (batch_size, 300, 4).

You can do that in the configurator so that your generative model remains a pure implementation of your scientific model.

I’ll send you a code snippet, probably by tomorrow.

marvinschmitt · February 13, 2024, 10:31am

import numpy as np

def configurator(input_dict):
    out_dict = {}

    # prior draws are the parameters we want to estimate
    theta = input_dict['prior_draws']
    out_dict['parameters'] = theta

    # simulated data, shape (batch_size, num_timesteps, data_dim)
    x = input_dict['sim_data']

    # add a linear time encoding in [0, 1] as an extra channel of x
    batch_size, num_timesteps, data_dim = x.shape
    time_encoding = np.linspace(0, 1, num_timesteps)
    time_encoding_batched = np.tile(time_encoding[np.newaxis, :, np.newaxis], (batch_size, 1, 1))

    # resulting shape: (batch_size, num_timesteps, data_dim + 1)
    out_dict['summary_conditions'] = np.concatenate((x, time_encoding_batched), axis=-1)

    return out_dict
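
A quick way to sanity-check the configurator before training is to run it on dummy arrays and look at the resulting shape. The sketch below repeats a compact version of the function so it runs on its own; the batch size of 8 and the 5 parameters are made-up values, and the random arrays are only a stand-in for real simulator output.

```python
import numpy as np

def configurator(input_dict):
    """Append a linear time channel to the simulated data."""
    x = input_dict["sim_data"]
    batch_size, num_timesteps, _ = x.shape
    time_encoding = np.linspace(0, 1, num_timesteps)
    time_encoding_batched = np.tile(
        time_encoding[np.newaxis, :, np.newaxis], (batch_size, 1, 1)
    )
    return {
        "parameters": input_dict["prior_draws"],
        "summary_conditions": np.concatenate((x, time_encoding_batched), axis=-1),
    }

# Hypothetical stand-in for simulator output:
# 8 simulations, 300 time steps, 3 series, 5 parameters
rng = np.random.default_rng(0)
dummy = {
    "prior_draws": rng.normal(size=(8, 5)),
    "sim_data": rng.normal(size=(8, 300, 3)),
}

out = configurator(dummy)
print(out["summary_conditions"].shape)  # (8, 300, 4)
# The last channel is the time encoding, running from 0 to 1
print(out["summary_conditions"][0, [0, -1], -1])
```

If the last axis does not come out as 4 here, the summary network will see a different input dimension than expected.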

hazhir · February 13, 2024, 12:04pm

Thanks so much Marvin; I will try this. Also the TimeSeriesTransformer seems to have a couple of optional settings (attention_settings and dense_settings); do you typically find value in customizing/fine-tuning those?

KLDivergence · February 13, 2024, 1:34pm

You can try tuning those if the default performance is not satisfying. The most useful hyperparameters in my mind would be the dimensionality of the template (template_dim), the actual number of summary statistics (summary_dim), and the number of heads for multi-head attention.
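
As a rough sketch of where these knobs might go: the keyword names below (input_dim, summary_dim, template_dim, and an attention_settings dict with num_heads and key_dim) are assumptions based on the BayesFlow 1.x TimeSeriesTransformer, and all values are purely illustrative, so please check the signature of your installed version. The constructor call is left commented out so that the snippet runs without bayesflow installed.

```python
# Assumed keyword arguments for bayesflow.summary_networks.TimeSeriesTransformer
# (BayesFlow 1.x); verify against the installed version before relying on them.
transformer_kwargs = dict(
    input_dim=4,      # 3 observed series + 1 time channel from the configurator
    summary_dim=16,   # number of learned summary statistics (illustrative value)
    template_dim=64,  # size of the template representation (illustrative value)
    # Assumed to be forwarded to the multi-head attention layers (illustrative values)
    attention_settings=dict(num_heads=4, key_dim=32, dropout=0.1),
)

# Requires bayesflow, so it is not executed here:
# from bayesflow.summary_networks import TimeSeriesTransformer
# summary_net = TimeSeriesTransformer(**transformer_kwargs)

for name, value in transformer_kwargs.items():
    print(f"{name}: {value}")
```

Keeping the settings in one dict makes it easy to vary one hyperparameter at a time when comparing against SequenceNetwork.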

hazhir · February 14, 2024, 2:26pm

Thanks; I will experiment with those. In the meantime, there seems to be an inconsistency in the TimeSeriesTransformer when setting the ‘bidirectional’ flag to True: it leads to an error about mismatched vector sizes somewhere in the network setup.

KLDivergence · February 14, 2024, 10:46pm

Nice spot! This has been fixed in the Development branch, but hasn’t made it into an official release yet. You can run the following to get the development version without breaking the already installed dependencies:

pip uninstall bayesflow
pip install git+https://github.com/stefanradev93/bayesflow@Development --no-deps
