Warning during online training leading to NaN loss

Hi!

I’m currently working on the dev branch and training a FlowMatching inference network using an OnlineDataset. However, I occasionally encounter the following warnings:

/data/homes/reiser/.conda/envs/sabi_env/lib/python3.11/site-packages/bayesflow/utils/numpy_utils.py:7: RuntimeWarning: divide by zero encountered in log
  return np.log(x / (1 - x))
/data/homes/reiser/.conda/envs/sabi_env/lib/python3.11/site-packages/numpy/core/_methods.py:173: RuntimeWarning: invalid value encountered in subtract
  x = asanyarray(arr - arrmean)
/data/homes/reiser/.conda/envs/sabi_env/lib/python3.11/site-packages/bayesflow/adapters/transforms/standardize.py:86: RuntimeWarning: invalid value encountered in subtract
  return (data - mean) / std

This sometimes occurs after 71 epochs, and other times after 300 epochs, eventually leading to NaN values in the loss.

Do you have any suggestions on how to resolve this issue?
Thanks in advance!

Here is an example where this issue occurred for me in epoch 71:

import os

if "KERAS_BACKEND" not in os.environ:
    # set this to "torch", "tensorflow", or "jax"
    os.environ["KERAS_BACKEND"] = "torch"

import bayesflow as bf
import torch as to
import keras
import numpy as np

# set random seed
to.manual_seed(0)
np.random.seed(0)

def prior():
    # w: coefficient on the log(x) term of the regression mean
    w = np.float32(np.random.uniform(0.6, 1.4))
    return dict(w=w)

def likelihood(w):
    N = 10
    # x: predictor variable
    x = np.float32(np.random.uniform(1, 200, N))
    # y: response variable
    y = np.random.normal(w * np.log(x) + 0.01 * x + 1 + np.sin(0.05 * x), 0.1, size=N)
    return dict(y=y, x=x)

simulator = bf.simulators.make_simulator([prior, likelihood])

adapter = (
    bf.Adapter()
    .constrain("w", lower=0.6, upper=1.4)
    .as_set(["x", "y"])
    .standardize()
    .concatenate(["w"], into="inference_variables")
    .concatenate(["x", "y"], into="summary_variables")
)
inference_network = bf.networks.FlowMatching()
summary_network = bf.networks.DeepSet(depth=2)
approximator = bf.ContinuousApproximator(
    inference_network=inference_network,
    summary_network=summary_network,
    adapter=adapter,
)
epochs = 500
num_batches = 100
batch_size = 64
optimizer = keras.optimizers.Adam(learning_rate=5e-4, clipnorm=1.0)
approximator.compile(optimizer=optimizer)
history = approximator.fit(
    epochs=epochs,
    num_batches=num_batches,
    batch_size=batch_size,
    simulator=simulator,
)

Hey Philipp, this seems to be coming from the Adapter. I suspect that the constrain method is to blame. Could you try removing it and see if the warnings disappear?
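
For the test, that would just be your adapter chain with the constrain step dropped:

adapter = (
    bf.Adapter()
    .as_set(["x", "y"])
    .standardize()
    .concatenate(["w"], into="inference_variables")
    .concatenate(["x", "y"], into="summary_variables")
)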

Looks like the inverse sigmoid transform from adapter.constrain() is at fault here. When the sampled w is sufficiently close to 0.6 or 1.4, the transform divides by zero and returns infinite values; those infinities then propagate into the standardization step, which explains the remaining warnings and the eventual NaN loss.
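
For illustration, here is a minimal sketch of the failure mode. The rescaling of w onto the unit interval is my assumption about constrain()’s internals, but the logged expression is the one from your traceback:

import numpy as np

# same expression as in bayesflow/utils/numpy_utils.py from the traceback
def inverse_sigmoid(x):
    return np.log(x / (1 - x))

# Assumption for illustration: constrain() rescales w from (0.6, 1.4) onto
# (0, 1) before applying the inverse sigmoid. A draw at (or rounded onto)
# the lower bound then maps to exactly 0:
u = np.float32(0.0)
print(inverse_sigmoid(u))  # -inf, with "divide by zero encountered in log"

# The -inf then poisons the mean/std inside standardize(), which triggers
# the "invalid value encountered in subtract" warnings and the NaN loss.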

Since we don’t support making the boundaries inclusive yet, could you please add a small value to either end of the constraint and check whether this helps:

epsilon = 1e-9
adapter.constrain("w", lower=0.6 - epsilon, upper=1.4 + epsilon)
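
Applied to your script, that means changing the constrain call inside the adapter definition, e.g.:

epsilon = 1e-9
adapter = (
    bf.Adapter()
    .constrain("w", lower=0.6 - epsilon, upper=1.4 + epsilon)
    .as_set(["x", "y"])
    .standardize()
    .concatenate(["w"], into="inference_variables")
    .concatenate(["x", "y"], into="summary_variables")
)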

Adding the small epsilon solved the problem. Thank you!