Cannot do offline training with summary network

Hello!

I am using BayesFlow for the first time, and I am having some issues doing offline training with a summary network (everything works fine if I exclude it and just supply the data directly to the inference network). Basically, I am simulating from an agent-based model in R and then bringing the data (13 parameters and 1921 output features) into Python so that it has the format required by BayesFlow. Here is what my simulations dictionary looks like:

{'prior_non_batchable_context': None, 'prior_batchable_context': None, 'prior_draws': array([[ 0.76718193, -0.87377334,  0.22366658, ..., -0.39844744,
        -0.35801412,  0.04897532],
       [ 1.30669447,  0.66729612,  1.7323655 , ..., -1.21459231,
         0.74958571, -0.25844436],
       [ 0.90680171,  1.29136708,  0.70601619, ..., -1.28499242,
         0.98797805,  0.6543645 ],
       ...,
       [-1.57379467, -0.29633931, -0.97821418, ..., -2.09261737,
        -0.91399776,  1.30333936],
       [ 0.45297908, -0.91439699,  0.79108664, ..., -0.16306107,
        -0.50990459, -0.60457136],
       [-0.80346153,  0.05948135, -1.28584127, ...,  0.532937  ,
         0.54166668, -0.12752289]]), 'sim_non_batchable_context': None, 'sim_batchable_context': None, 'sim_data': array([[ 0.44540442,  0.40116571, -0.54365648, ..., -0.01023475,
        -0.01002833, -0.01001542],
       [-0.84753774, -1.34450615, -1.15397759, ..., -0.01023475,
        -0.01002833, -0.01001542],
       [ 0.44540442,  0.40116571,  1.01192892, ..., -0.01023475,
        -0.01002833, -0.01001542],
       ...,
       [-0.19984821, -0.77635165, -1.15397759, ..., -0.01023475,
        -0.01002833, -0.01001542],
       [-0.60475894, -0.00569413, -1.00236414, ..., -0.01023475,
        -0.01002833, -0.01001542],
       [ 0.44540442,  0.40116571,  1.01192892, ..., -0.01023475,
        -0.01002833, -0.01001542]])}
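For reference, here is roughly how I assemble this dictionary (a sketch only; prior_draws and sim_data stand for the arrays I export from R):

data = {
    "prior_non_batchable_context": None,
    "prior_batchable_context": None,
    "prior_draws": prior_draws,  # shape (num_simulations, 13)
    "sim_non_batchable_context": None,
    "sim_batchable_context": None,
    "sim_data": sim_data,  # shape (num_simulations, 1921)
}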

Here is my input configuration script:

import numpy as np

#define function that configures the input for the amortizer
def configure_input(forward_dict):
    #prepare placeholder dict
    out_dict = {}

    #add the parameter draws and simulated data under the keys expected by the amortizer
    out_dict["parameters"] = forward_dict["prior_draws"].astype(np.float32)
    out_dict["summary_conditions"] = forward_dict["sim_data"].astype(np.float32)
    out_dict["direct_conditions"] = None

    #return the output dictionary
    return out_dict

Then, if I run the following code:

summary_net = bf.networks.DeepSet()
inference_net = bf.networks.InvertibleNetwork(num_params=13)
amortized_posterior = bf.amortizers.AmortizedPosterior(inference_net, summary_net)
trainer = bf.trainers.Trainer(amortizer=amortized_posterior, configurator=configure_input, memory=True)
offline_training = trainer.train_offline(simulations_dict=data, epochs=2, batch_size=32)

I get the following error:

ValueError: Exception encountered when calling layer 'sequential_549' (type Sequential).

Input 0 of layer "dense_1530" is incompatible with the layer: expected min_ndim=2, found ndim=1. Full shape received: (64,)

Call arguments received by layer 'sequential_549' (type Sequential):
  • inputs=tf.Tensor(shape=(64,), dtype=float32)
  • training=True
  • mask=None

Do you know what the issue might be?

Hey @masonyoungblood,

The DeepSet summary network assumes that your data has the shape (num_simulations, num_observations, num_data_dimensions), and also that each simulation consists of num_observations IID realizations. My intuition is that this is not the case for your model, but please correct me if I am wrong.
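For concreteness, here is a toy example of the kind of array DeepSet expects (all sizes are made up for illustration):

import numpy as np

# 100 simulations, each a set of 50 exchangeable observations with 4 features
deep_set_data = np.random.normal(size=(100, 50, 4)).astype(np.float32)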

I believe your data currently has the shape (num_simulations, 1921), so you may want to use a simple fully connected network as a summary network, unless there is special structure to the 1921 features. Note also that it is a good idea to have more summary outputs than model parameters; otherwise, the inference network may not have enough information to infer the parameters. For the customizable summary nets in BayesFlow, this can be set via the summary_dim argument.
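For instance, a minimal sketch (32 is an arbitrary choice, and DeepSet itself may not be the right architecture for your data, as noted above):

# request 32 learned summary statistics instead of the default
summary_net = bf.networks.DeepSet(summary_dim=32)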

Edit: Make sure you train considerably longer than 2 epochs for the actual application. :)

Thank you for the quick reply!

Okay, this makes sense. My data does currently have the shape (num_simulations, 1921). It has more structure than that (it is a set of observations from different years, with varying sizes), but I'm hoping to keep it simple for now while I learn BayesFlow.

How would one implement a simpler network? I can't find anything about this in summary_networks.py, including for custom networks.

Definitely planning to use more epochs; two is just for troubleshooting :slight_smile:

Here is a simple example of a custom fully connected network that will learn 32 summary statistics from your data. I have also added some dropout between the layers to help guard against overfitting.

import tensorflow as tf

summary_net = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dropout(0.05),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.05),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.05),
    tf.keras.layers.Dense(32)  # outputs the 32 learned summary statistics
])

Edit: BayesFlow interacts with TensorFlow, so you can use any feature from TensorFlow in these custom networks.
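For completeness, the custom net then drops into the same setup as in your earlier snippet:

inference_net = bf.networks.InvertibleNetwork(num_params=13)
amortized_posterior = bf.amortizers.AmortizedPosterior(inference_net, summary_net)
trainer = bf.trainers.Trainer(amortizer=amortized_posterior, configurator=configure_input, memory=True)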

Thank you very much for the help; this is working now!

I’ve also figured out how to format my data for offline training with the TimeSeriesTransformer architecture, but I have one question about the documentation. summary.py reads:

Important: Assumes that positional encodings have been appended to the input time series.

Can you provide an example of what this would look like in practice?

Awesome!

This is not a strict requirement for the most recent transformer, as it will try to learn the positional encodings adaptively. However, it may be something to try if you feel you can squeeze out more performance: add an additional dimension to the data denoting time indices from 0 (min_time) to 1 (max_time). Something along the lines of:

import numpy as np

# one column of normalized time indices per simulation, shape (num_time_points, 1)
time_idx = [np.linspace(0, 1, num_time_points, dtype=np.float32)[:, None]] * num_sims
# append the time index as an extra feature along the last axis
sim_data = np.c_[sim_data, time_idx]
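As a quick sanity check with made-up sizes: if sim_data starts out with shape (100, 64, 3), the snippet above should leave it with shape (100, 64, 4), with the normalized time index as the last feature.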
