I’m working with a very small dataset (only 16 model evaluations: four prior evaluations, each with four likelihood evaluations). With this setup, the networks overfit quickly.
I noticed that dropout is already activated by default in the networks. Are there other strategies that could help prevent overfitting in this scenario?
Thank you!
You could try adding L2 weight regularization to your setup. This penalizes large weights and can help mitigate overfitting. Also, you can add an early stopping callback that aborts training when the validation loss starts rising consistently.
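For example, a minimal Keras sketch of both ideas (the values `1e-4` and `patience=5` are placeholders you would tune for your setting):

```python
import keras

# L2 penalty on the kernel weights of a Dense layer
layer = keras.layers.Dense(
    64,
    activation="relu",
    kernel_regularizer=keras.regularizers.l2(1e-4),  # penalty strength is a tuning knob
)

# Early stopping: abort training once the validation loss has not
# improved for `patience` consecutive epochs
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,
    restore_best_weights=True,  # roll back to the weights of the best epoch
)

# pass to model.fit(..., callbacks=[early_stop])
```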
Hi Philipp,
what are you trying to achieve in your setup? With only 4 prior evaluations in the training set, most machine learning methods struggle, because so few data points simply do not offer much information. How best to proceed depends on the data you have, but my intuition would be not to use BayesFlow here (if others disagree, please chime in). If possible, I would try to specify a way to measure the distance between data points, which tells you which training sample is closest to your observed data.
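As a purely illustrative sketch of that nearest-neighbor idea with a Euclidean distance (`train_data` and `observed` are hypothetical arrays, not anything from your setup):

```python
import numpy as np

# Hypothetical data: each row is one (flattened) training simulation
train_data = np.random.rand(4, 10)   # 4 prior evaluations, 10 summary values each
observed = np.random.rand(10)        # the observed dataset in the same representation

# Euclidean distance from the observation to every training sample
distances = np.linalg.norm(train_data - observed, axis=1)
closest = np.argmin(distances)
print(f"Training sample {closest} is closest (distance {distances[closest]:.3f})")
```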
The most sensible approach depends on your goals, but even without overfitting, BayesFlow will probably not give a meaningful result here.
Hi Valentin,
my motivation for this setup is to explore the best possible performance that BayesFlow can achieve under extreme data limitations. I’m aware that training neural networks with such a small dataset is typically not considered viable and that overfitting is expected. The goal is to see how well we can avoid overfitting using different techniques, and what performance amortized Bayesian inference (ABI) can achieve as a baseline.
Ok, that sounds good. As far as I can tell, we currently do not use weight regularizers, though Keras offers them for many types of layers (see here). I do not know if there are plans to include those. The fastest (but not super easy) way forward would probably be to customize the MLP (with ConfigurableHiddenBlock) and the DeepSet (with InvariantModule and EquivariantModule) and to include the weight regularizers in their Dense layers. How familiar are you with Keras and the BayesFlow code base? Is this something you could adapt yourself or would you need support for that?
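Roughly, a customized hidden block could wrap its Dense layer like this. This is only a sketch that mimics the role of ConfigurableHiddenBlock, not the actual BayesFlow implementation; the class and argument names are assumptions:

```python
import keras

class RegularizedHiddenBlock(keras.layers.Layer):
    """Sketch of a hidden block whose Dense layer carries an L2 penalty.

    Mimics the role of ConfigurableHiddenBlock, but is NOT the actual
    BayesFlow implementation.
    """

    def __init__(self, units=64, l2_strength=1e-4, dropout=0.1, **kwargs):
        super().__init__(**kwargs)
        self.dense = keras.layers.Dense(
            units,
            activation="relu",
            kernel_regularizer=keras.regularizers.l2(l2_strength),
        )
        self.dropout = keras.layers.Dropout(dropout)

    def call(self, inputs, training=False):
        # Dropout is only active during training
        return self.dropout(self.dense(inputs), training=training)
```

The same pattern would apply to the Dense layers inside InvariantModule and EquivariantModule for the DeepSet.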
Maybe you already do this, but since I did not see it in your example code: in addition to the weight regularization suggestions, I would also test much higher dropout values than the BayesFlow defaults for your extremely small data setting (e.g., 0.3 to 0.5).
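With the sketch above, that would just mean raising the dropout rate, e.g.:

```python
# Much more aggressive dropout than the small library default
block = RegularizedHiddenBlock(units=64, l2_strength=1e-4, dropout=0.5)
```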
Also, as an interesting side note: your attached validation loss is the first example of a pattern roughly resembling double descent that I have seen in amortized inference.