Warning during online training leading to NaN loss

Hi!

I’m currently working on the dev branch and training a FlowMatching inference network using an OnlineDataset. However, I occasionally encounter the following warnings:

/data/homes/reiser/.conda/envs/sabi_env/lib/python3.11/site-packages/bayesflow/utils/numpy_utils.py:7: RuntimeWarning: divide by zero encountered in log
  return np.log(x / (1 - x))
/data/homes/reiser/.conda/envs/sabi_env/lib/python3.11/site-packages/numpy/core/_methods.py:173: RuntimeWarning: invalid value encountered in subtract
  x = asanyarray(arr - arrmean)
/data/homes/reiser/.conda/envs/sabi_env/lib/python3.11/site-packages/bayesflow/adapters/transforms/standardize.py:86: RuntimeWarning: invalid value encountered in subtract
  return (data - mean) / std

This sometimes occurs after 71 epochs, and other times after 300 epochs, eventually leading to NaN values in the loss.

Do you have any suggestions on how to resolve this issue?
Thanks in advance!

Here is an example where this issue occurred for me in epoch 71:

import os

if "KERAS_BACKEND" not in os.environ:
    # set this to "torch", "tensorflow", or "jax"
    os.environ["KERAS_BACKEND"] = "torch"

import bayesflow as bf
import torch as to
import keras
import numpy as np

# set random seed
to.manual_seed(0)
np.random.seed(0)

def prior():
    # w: coefficient on the log(x) term of the regression mean
    w = np.float32(np.random.uniform(0.6, 1.4))
    return dict(w=w)

def likelihood(w):
    N = 10
    # x: predictor variable
    x = np.float32(np.random.uniform(1, 200, N))
    # y: response variable
    y = np.random.normal(w * np.log(x) + 0.01 * x + 1 + np.sin(0.05 * x), 0.1, size=N)
    return dict(y=y, x=x)

simulator = bf.simulators.make_simulator([prior, likelihood])

adapter = (
    bf.Adapter()
    .constrain("w", lower=0.6, upper=1.4)
    .as_set(["x", "y"])
    .standardize()
    .concatenate(["w"], into="inference_variables")
    .concatenate(["x", "y"], into="summary_variables")
)
inference_network = bf.networks.FlowMatching()
summary_network = bf.networks.DeepSet(depth=2)
approximator = bf.ContinuousApproximator(
    inference_network=inference_network,
    summary_network=summary_network,
    adapter=adapter,
)
epochs = 500
num_batches = 100
batch_size = 64
optimizer = keras.optimizers.Adam(learning_rate=5e-4, clipnorm=1.0)
approximator.compile(optimizer=optimizer)
history = approximator.fit(
    epochs=epochs,
    num_batches=num_batches,
    batch_size=batch_size,
    simulator=simulator,
)

Hey Philipp, this seems to be coming from the Adapter. I suspect that the constrain method is to blame. Could you try removing it and see if the warnings disappear?
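
For the test, that would just be your adapter chain with the constrain step dropped:

adapter = (
    bf.Adapter()
    .as_set(["x", "y"])
    .standardize()
    .concatenate(["w"], into="inference_variables")
    .concatenate(["x", "y"], into="summary_variables")
)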

Looks like the inverse sigmoid transform from adapter.constrain() is at fault here. When the sampled w is sufficiently close to 0.6 or 1.4, the transform divides by zero and returns infinite values; those infinities then propagate into the standardization step, which explains the remaining warnings and the eventual NaN loss.
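
For illustration, here is a minimal sketch of the failure mode. The rescaling of w onto the unit interval is my assumption about constrain()’s internals, but the logged expression is the one from your traceback:

import numpy as np

# same expression as in bayesflow/utils/numpy_utils.py from the traceback
def inverse_sigmoid(x):
    return np.log(x / (1 - x))

# Assumption for illustration: constrain() rescales w from (0.6, 1.4) onto
# (0, 1) before applying the inverse sigmoid. A draw at (or rounded onto)
# the lower bound then maps to exactly 0:
u = np.float32(0.0)
print(inverse_sigmoid(u))  # -inf, with "divide by zero encountered in log"

# The -inf then poisons the mean/std inside standardize(), which triggers
# the "invalid value encountered in subtract" warnings and the NaN loss.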

Since we don’t support making the boundaries inclusive yet, could you please add a small value to either end of the constraint and check whether this helps:

epsilon = 1e-9
adapter.constrain("w", lower=0.6 - epsilon, upper=1.4 + epsilon)
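
Applied to your script, that means changing the constrain call inside the adapter definition, e.g.:

epsilon = 1e-9
adapter = (
    bf.Adapter()
    .constrain("w", lower=0.6 - epsilon, upper=1.4 + epsilon)
    .as_set(["x", "y"])
    .standardize()
    .concatenate(["w"], into="inference_variables")
    .concatenate(["x", "y"], into="summary_variables")
)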

Adding the small epsilon solved the problem. Thank you!