Simple negative binomial inference

Dear BayesFlow team,

I am trying to get a feel for how to work with the codebase by using a very simple example related to the real inference problem I want to work on. The exercise consists of training a neural network (NN) to perform inference over the parameters of a negative binomial distribution. There are many moving parts and options that I do not understand, and none of the examples in the documentation helped me get past this hurdle. Here is the basic setup:

After the imports, I define my prior function as

def prior_fn():
    # Prior on p for negative binomial parameter
    p = RNG.beta(1, 10)
    # Prior on log of r parameter
    logr = RNG.normal(0, 1)

    # Return the prior
    return np.float32(np.array([p_hat, logr]))

with the corresponding simulation.Prior object

prior = bf.simulation.Prior(
    prior_fun=prior_fn,
    param_names=param_names,
)

Next, I define the likelihood function, in which I convert the integer outputs to float32. For now, I am using a fixed number of observations (100), but I would later like to understand how to train the NN with a varying number of observations (my untested guess for how that might look is sketched after the generative model below).

def likelihood_fn(params, n_obs=100):
    # Unpack parameters
    p = params[0]
    r = np.exp(params[1])

    # Sample from the negative binomial distribution
    u = RNG.negative_binomial(r, p, size=n_obs)

    # Add a trailing dimension so the batched data has shape (batch_size, n_obs, 1)
    u = np.expand_dims(u, axis=1)

    return np.float32(u)

Again, I define the corresponding Simulator and GenerativeModel objects

# Define the likelihood simulator for BayesFlow
simulator = bf.simulation.Simulator(simulator_fun=likelihood_fn)

# Build generative model
model = bf.simulation.GenerativeModel(prior, simulator)
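As an aside, regarding the varying number of observations mentioned above: my current, completely untested guess is that this would go through a context generator that draws a shared n_obs per batch and passes it on to the likelihood, roughly like this (I have not gotten that far yet):

# Untested sketch: a context generator draws n_obs per batch and, as far as I
# understand the API, forwards it to likelihood_fn(params, n_obs)
def random_n_obs(min_obs=50, max_obs=200):
    return int(RNG.integers(low=min_obs, high=max_obs + 1))

context_gen = bf.simulation.ContextGenerator(non_batchable_context_fun=random_n_obs)

simulator_var = bf.simulation.Simulator(
    simulator_fun=likelihood_fn,
    context_generator=context_gen,
)
model_var = bf.simulation.GenerativeModel(prior, simulator_var)

For now, though, I stick with the fixed n_obs = 100 version.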

I define the summary network as a DeepSet (but I have also tried a SetTransformer without much success)

# Define summary network as a DeepSet
summary_net = bf.networks.DeepSet(summary_dim=4)

For the inference NN, I do not have a good intuition for which arguments I should or should not modify. So far, one of my failed attempts looks something like this:

# Define the conditional invertible network with affine coupling layers
inference_net = bf.inference_networks.InvertibleNetwork(
    num_params=prior(1)["prior_draws"].shape[-1],
    num_coupling_layers=6,
    coupling_design="affine",
    permutation="learnable",
    use_act_norm=True,
    coupling_settings={
        "mc_dropout": True,
        "dense_args": dict(units=32, activation="relu"),
    }
)

I put everything together

# Assemble the amortizer that combines the summary network and the
# inference network
amortizer = bf.amortizers.AmortizedPosterior(
    inference_net, summary_net,
)

# Assemble the trainer with the amortizer and generative model
trainer = bf.trainers.Trainer(amortizer=amortizer, generative_model=model)

and train the model using

# Train the model
history = trainer.train_online(
    epochs=20,
    iterations_per_epoch=128,
    batch_size=64,
    validation_sims=200,
)

However, as shown in the figure below, no matter what I try (admittedly with very little understanding of what each argument does), the loss explodes to nonsensical values and then quickly comes back down.

I am sure that I am doing something naively wrong, but I have struggled enough with it to ask for help. So, I hope you can guide me to what I’m doing wrong here.

Thanks in advance for your invaluable help!

Dear Manuel,

Welcome to the BayesFlow forum! Thanks for providing code along with your question. I reproduced your setup (the only change: using the default DeepSet and InvertibleNetwork architectures for simplicity), and while I observed initial loss spikes, training nevertheless quickly converged to a stable solution with nice diagnostics (see the attached plots).
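For reference, by "default architectures" I simply mean dropping the custom settings and keeping everything else as in your script, roughly:

# Default summary and inference networks (all other settings left at their defaults)
summary_net = bf.networks.DeepSet()

inference_net = bf.inference_networks.InvertibleNetwork(
    num_params=prior(1)["prior_draws"].shape[-1],
)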



I think the issue causing the exploding loss at the beginning of training is the extreme outliers produced by your generative model (e.g., standard deviations around 13,000 while means are around 200; note also the worse recovery for higher values of r). During training, some of the minibatches then contain extreme values. You could stabilize training by standardizing or normalizing the input data via a custom configurator (see here). However, if you estimate the required statistics (for example, the data mean and std) from simulations in a way that generalizes to more complex settings, this gets a bit tricky and requires very large simulation sizes due to the tail behavior of your simulator. So, in case the model itself is modifiable, constraining the prior to produce less extreme tail behavior would be the easiest solution here.
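To make the configurator idea a bit more concrete, here is a rough sketch; the standardization constants are placeholders that you would estimate once from a large pilot simulation, and I am assuming the standard forward-dict keys (prior_draws, sim_data):

# Rough sketch of a standardizing configurator; constants are placeholders
DATA_MEAN, DATA_STD = 200.0, 13000.0  # estimate these from a large pilot simulation

def configurator(forward_dict):
    out = dict()
    # Parameters the amortizer should learn to recover
    out["parameters"] = forward_dict["prior_draws"].astype(np.float32)
    # Standardized observations passed to the summary network
    data = forward_dict["sim_data"].astype(np.float32)
    out["summary_conditions"] = (data - DATA_MEAN) / DATA_STD
    return out

trainer = bf.trainers.Trainer(
    amortizer=amortizer,
    generative_model=model,
    configurator=configurator,
)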

One small note: prior_fn currently references p_hat in the return statement, but only p is defined above; the return statement should read return np.float32(np.array([p, logr])) instead.

Hope that helps!
Lasse


Hey Lasse,

Thank you so much for such a quick response! This is a total game-changer. I tried changing my prior to something more reasonable, which now seems to work for this simple example.
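In case it is useful for anyone who finds this thread later: by "more reasonable" I just mean tightening the priors so that the simulator no longer produces such heavy tails, something along these lines (the exact numbers are illustrative, not a recommendation):

def prior_fn():
    # Tighter prior on the success probability p keeps it away from 0
    p = RNG.beta(2, 5)
    # Narrower prior on log(r) avoids extremely large r
    logr = RNG.normal(0, 0.5)
    return np.float32(np.array([p, logr]))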

Now, I’m ready to try the real inference I’m after. I might come back to the forum if I get stuck again. But this is already incredibly promising!

Thanks again for your help, and congratulations to the whole team on the library. The idea is really neat, and I am glad to see such a principled application of neural networks to this kind of inference problem.
