I’m working with a very small dataset (only 16 model evaluations: four prior evaluations, each with four likelihood evaluations). With this setup, the networks overfit quickly.
I noticed that dropout is already activated by default in the networks. Are there other strategies that could help prevent overfitting in this scenario?
Thank you!
You could try adding L2 weight regularization to your setup. This penalizes large weights and can help mitigate overfitting. Also, you can add an early stopping callback that aborts training when the validation loss starts rising consistently.
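For example, a minimal Keras sketch of both ideas (the values `1e-4` and `patience=5` are placeholders you would tune for your setting):

```python
import keras

# L2 penalty on the kernel weights of a Dense layer
layer = keras.layers.Dense(
    64,
    activation="relu",
    kernel_regularizer=keras.regularizers.l2(1e-4),  # penalty strength is a tuning knob
)

# Early stopping: abort training once the validation loss has not
# improved for `patience` consecutive epochs
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,
    restore_best_weights=True,  # roll back to the weights of the best epoch
)

# pass to model.fit(..., callbacks=[early_stop])
```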
Hi Philipp,
what are you trying to achieve in your setup? With only 4 prior evaluations in the training set, most machine learning methods struggle, because so few data points simply do not offer much information. How best to proceed depends on the data you have, but my intuition would be not to use BayesFlow here (if others disagree, please chime in). If possible, I would try to specify a way to measure the distance between data points, which tells you which training sample is closest to your observed data.
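As a purely illustrative sketch of that nearest-neighbor idea with a Euclidean distance (`train_data` and `observed` are hypothetical arrays, not anything from your setup):

```python
import numpy as np

# Hypothetical data: each row is one (flattened) training simulation
train_data = np.random.rand(4, 10)   # 4 prior evaluations, 10 summary values each
observed = np.random.rand(10)        # the observed dataset in the same representation

# Euclidean distance from the observation to every training sample
distances = np.linalg.norm(train_data - observed, axis=1)
closest = np.argmin(distances)
print(f"Training sample {closest} is closest (distance {distances[closest]:.3f})")
```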
The most sensible approach depends on your goals, but even without overfitting, BayesFlow will probably not give a meaningful result here.
Hi Valentin,
my motivation for this setup is to explore the best possible performance that BayesFlow can achieve under extreme data limitations. I’m aware that training neural networks with such a small dataset is typically not considered viable and that overfitting is expected. The goal is to see how well we can avoid overfitting using different techniques, and what performance amortized Bayesian inference (ABI) can achieve as a baseline.
Ok, that sounds good. As far as I can tell, we currently do not use weight regularizers, though Keras offers them for many types of layers (see here). I do not know if there are plans to include those. The fastest (but not super easy) way forward would probably be to customize the MLP (with ConfigurableHiddenBlock) and the DeepSet (with InvariantModule and EquivariantModule) and to include the weight regularizers in their Dense layers. How familiar are you with Keras and the BayesFlow code base? Is this something you could adapt yourself or would you need support for that?
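Roughly, a customized hidden block could wrap its Dense layer like this. This is only a sketch that mimics the role of ConfigurableHiddenBlock, not the actual BayesFlow implementation; the class and argument names are assumptions:

```python
import keras

class RegularizedHiddenBlock(keras.layers.Layer):
    """Sketch of a hidden block whose Dense layer carries an L2 penalty.

    Mimics the role of ConfigurableHiddenBlock, but is NOT the actual
    BayesFlow implementation.
    """

    def __init__(self, units=64, l2_strength=1e-4, dropout=0.1, **kwargs):
        super().__init__(**kwargs)
        self.dense = keras.layers.Dense(
            units,
            activation="relu",
            kernel_regularizer=keras.regularizers.l2(l2_strength),
        )
        self.dropout = keras.layers.Dropout(dropout)

    def call(self, inputs, training=False):
        # Dropout is only active during training
        return self.dropout(self.dense(inputs), training=training)
```

The same pattern would apply to the Dense layers inside InvariantModule and EquivariantModule for the DeepSet.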
Maybe you already do this, but since I did not see it in your example code: in addition to the weight regularization suggestions, I would also test much higher dropout values than the BayesFlow defaults for your extremely small data setting (e.g., 0.3 to 0.5).
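With the sketch above, that would just mean raising the dropout rate, e.g.:

```python
# Much more aggressive dropout than the small library default
block = RegularizedHiddenBlock(units=64, l2_strength=1e-4, dropout=0.5)
```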
Also, as an interesting side note: your attached validation loss is the first example of a pattern roughly resembling double descent that I have seen in amortized inference.