Neural likelihood estimation

Hi everyone,
I’m currently building a likelihood estimator using BayesFlow v1. The setup is relatively simple: we aim to estimate the likelihood p(y∣θ), where θ is a two-dimensional parameter vector and y is a scalar (one-dimensional) observation. The structure of the model is outlined below:

```python
import numpy as np

rng = np.random.default_rng()

def simulator(theta):
    # Two noisy latent inputs centered on the parameters
    x1 = rng.normal(loc=theta[0], scale=0.3)
    x2 = rng.normal(loc=theta[1], scale=0.3)
    # Scalar observable derived from both latents
    g1x = (x1**2 * x2) / 20 - 1
    return g1x

def prior_m1(D=2, n_samples=2000):
    # Independent uniform priors on [0, 10]
    return rng.uniform(low=0, high=10, size=(n_samples, D))

theta_samples = prior_m1(D=2, n_samples=20000)  # shape (20000, 2)
data_samples = np.array([simulator(theta) for theta in theta_samples]).reshape(-1, 1)  # shape (20000, 1)
```

I generated 20,000 training pairs for this problem and trained the likelihood model with the following setup:

```python
import tensorflow as tf
from bayesflow.networks import InvertibleNetwork
from bayesflow.amortizers import AmortizedLikelihood

SETTINGS_LIK = {
    "dense_args": dict(units=128, activation="relu", kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    "num_dense": 2,
    "spec_norm": False,
    "mc_dropout": False,
    "dropout": True,
    "residual": False,
    "dropout_prob": 0.2,
}

inference_net = InvertibleNetwork(
    num_params=1,
    num_coupling_layers=10,
    coupling_design="affine",
    coupling_settings=SETTINGS_LIK,
    permutation="learnable",
)
amortizer = AmortizedLikelihood(inference_net, name="simple")
```
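
The training call itself is omitted above; as a minimal sketch, this is roughly what the standard v1 offline-training pattern looks like (the dict keys and the epoch/batch-size values below are illustrative assumptions, not my exact setup):

```python
from bayesflow.trainers import Trainer

# Offline training on the arrays generated earlier; the exact dict keys
# expected by the default likelihood configurator may differ in your version.
train_dict = {
    "prior_draws": theta_samples.astype(np.float32),  # conditions, shape (20000, 2)
    "sim_data": data_samples.astype(np.float32),      # observables, shape (20000, 1)
}
trainer = Trainer(amortizer=amortizer)
history = trainer.train_offline(train_dict, epochs=100, batch_size=64)
```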

The model was successfully trained. The goal is to compare the estimated distribution of y from the likelihood model to the true distribution obtained via Monte Carlo simulation, given the same θ. For example:

```python
sims = {}
theta1, theta2 = 4, 7
sims["conditions"] = np.array([[theta1, theta2]], dtype=np.float32)
sims["observables"] = np.zeros((1, 180, 1), dtype=np.float32)
sims["observables"][0, :, 0] = np.arange(-14, 4, 0.1)
x_sim_s = amortizer.sample(sims, n_samples=2000)
```
```python
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Ground truth via Monte Carlo simulation
N = int(1e4)
x1 = np.random.normal(theta1, 0.3, size=(N,))
x2 = np.random.normal(theta2, 0.3, size=(N,))
gx = (x1**2 * x2) / 20 - 1

kde_true = gaussian_kde(gx)
x_vals = np.linspace(gx.min(), gx.max(), 500)
pdf_vals = kde_true(x_vals)

# KDE of the samples from the amortized likelihood
x_sim_s_flat = x_sim_s.ravel()
kde_predict = gaussian_kde(x_sim_s_flat)
x_kde = np.linspace(x_sim_s_flat.min(), x_sim_s_flat.max(), 500)
kde_vals = kde_predict(x_kde)

plt.figure(figsize=(8, 5))
plt.plot(x_vals, pdf_vals, color="k", label="True", linewidth=2)
plt.plot(x_kde, kde_vals, color="red", linestyle="--", label="KDE (Predicted)", linewidth=2)
plt.xlabel("gx", fontsize=14)
plt.ylabel("Density", fontsize=14)
plt.title("PDF Estimate of Composite Variable gx", fontsize=16)
plt.grid(True)
plt.legend()
plt.tight_layout()
plt.show()
```

The model’s performance on test data is acceptable, as you can see:

However, when I tried new theta values, the estimated pdf of y deviated far from the true pdf:

I’m currently investigating potential reasons for this issue. Since the observation y is one-dimensional, I suspect that applying an affine coupling flow in the neural likelihood model may be problematic: a standard affine coupling layer splits its input into two parts, which is not meaningful for a 1D input. This could be one of the underlying causes.

I also experimented with neural spline flows and interleaved flow architectures, but neither trained the model effectively.

Does anyone have suggestions or insights on how to handle likelihood modeling for scalar outputs using BayesFlow or similar flow-based approaches?

Hi Jice,
I think your idea regarding the 1D input is correct; I have had similar experiences when using coupling flows.
One option is to train on an augmented data distribution, where you add an artificial second input variable that is independent of the one of interest. You would then specify the simulator as:

```python
rng = np.random.default_rng()

def simulator(theta):
    x1 = rng.normal(loc=theta[0], scale=0.3)
    x2 = rng.normal(loc=theta[1], scale=0.3)
    g1x = (x1**2 * x2) / 20 - 1
    # Append an independent standard-normal auxiliary variable
    return np.stack([g1x, rng.normal()], axis=-1)
```

During evaluation, you set this additional variable to zero. If you only care about samples from the likelihood, you can simply ignore the additional dimension. If you care about the density, this will introduce a shift in the log-prob by log N(x=0 | mu=0, sigma=1), which you have to correct for (note that this is an approximation that is only exact if the density of the additional variable was learned correctly).
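
In code, the correction could look roughly like this (a sketch: I’m assuming `log_likelihood` accepts the same dict format as `sample`, and `y_grid` is a placeholder for the y values you want to evaluate):

```python
import numpy as np
from scipy.stats import norm

y_grid = np.linspace(-14, 4, 180)  # placeholder evaluation grid

eval_dict = {
    "conditions": np.array([[theta1, theta2]], dtype=np.float32),
    # Augmented observable: y paired with the artificial variable fixed at 0
    "observables": np.stack([y_grid, np.zeros_like(y_grid)], axis=-1)[np.newaxis].astype(np.float32),
}
log_p_joint = amortizer.log_likelihood(eval_dict)  # approx. log p(y, 0 | theta)
log_p_y = log_p_joint - norm.logpdf(0.0)           # subtract log N(0 | 0, 1)
```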

Does this help you? If not, please provide a complete reproducible example (including the training code), so that we can help you investigate this in detail. If you have any further questions, feel free to reach out.

P.S.: Code formatting here uses code fences (triple backticks); it would be great if you could edit your post accordingly to improve readability.


In addition to Valentin’s helpful points, I would suggest switching to v2.0, as the legacy release is only minimally supported.

Hi Valentin,
Thank you very much for the suggestion! I will try to run the model as you suggested.
I also edited the posts using triple backticks.

Hi Valentin,
Thanks for your suggestion! The results improved significantly after I added an artificial second input variable. With this additional “nuisance” variable, I applied the interleaved flow architecture, which combines affine and spline couplings. This design yielded the best performance so far.

In practice, I simply ignore the extra dimension when sampling from the likelihood. Overall, the estimated density aligns with the true one. However, for some parameter values there are still noticeable deviations, as you can see:
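
For reference, the interleaved design can be specified like this in v1 (a sketch; I’m showing default coupling settings here, with num_params=2 to cover y plus the artificial nuisance dimension):

```python
inference_net = InvertibleNetwork(
    num_params=2,                   # y plus the artificial nuisance variable
    num_coupling_layers=10,
    coupling_design="interleaved",  # alternates affine and spline couplings
    permutation="learnable",
)
amortizer = AmortizedLikelihood(inference_net)
```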


You mentioned a correction that I have to apply. Do you have any suggestions for this correction, or for further improvement?
Thanks!

Nice that you see improved results now. The correction I mentioned only affects the normalization of the density, so it will not fix discrepancies like the one in your last plot.
To further improve results, there are a few things you can try:

  • use more training data: likelihood estimation usually requires many training samples to yield good results over the whole prior range, so increasing the number of simulations (if they are not too costly) will probably improve the results
  • train longer: if the validation loss does not indicate overfitting, you can increase the number of epochs
  • adjust hyperparameters (try different learning rates and batch sizes)
  • use a more expressive architecture (requires moving to v2): depending on how important inference speed is to you, you can try out FlowMatching, which is a more expressive network, at the cost of much higher sampling time. If you want to try this and need help porting your code to v2, let me know.
  • standardize your parameters and data before plugging them into the neural network; this might help as well, and is easy to do using the adapters in v2 (for a manual v1 version, see the sketch after this list)
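
Regarding the last point, in v1 you can standardize manually; a minimal sketch using the arrays from your first post:

```python
# Compute per-dimension statistics on the training set
theta_mean, theta_std = theta_samples.mean(axis=0), theta_samples.std(axis=0)
y_mean, y_std = data_samples.mean(axis=0), data_samples.std(axis=0)

# Standardized training arrays
theta_z = (theta_samples - theta_mean) / theta_std
y_z = (data_samples - y_mean) / y_std

# At evaluation time, apply the same transform to the conditions,
# and invert it (y = y_z * y_std + y_mean) on samples from the likelihood.
```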

Thanks for the advice! I believe a more expressive architecture would be most useful, as I have already tried affine flows only, spline flows only, and a combination of both; the combined type is the most expressive and gives the best results so far.
I have read papers arguing that flow matching is much more expressive than discrete normalizing flows, so it seems worth a try. Can I ask whether flow matching or diffusion models in BayesFlow v2 support neural likelihood estimation? I ask because many studies use flow matching or diffusion models for posterior estimation.

Likelihood estimation and posterior estimation are very similar for training, only the roles of parameters and observations are switched. For v2, this means that you would:

  • not use a summary network (just as in v1)
  • put your parameters in "inference_conditions"
  • put your observations in "inference_variables"

You can find an example of how to specify this in an adapter here (the notebook is new and not on the website yet, and also not proof-read yet, so please reach out if you encounter issues there).
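
As a rough sketch, such an adapter could look like this (assuming your simulator outputs dicts with keys "theta" and "y"; adjust the names to your setup):

```python
import bayesflow as bf

adapter = (
    bf.Adapter()
    .convert_dtype("float64", "float32")
    .standardize()
    .rename("theta", "inference_conditions")  # parameters act as conditions
    .rename("y", "inference_variables")       # observations are the inference target
)
```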

Apart from that, you would proceed with training just as with posterior estimation. Unfortunately, we do not have a complete example for likelihood estimation available yet for v2. If one of us finds the time, we might add one soon; I’ll open an issue for it.
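
In the meantime, the network swap itself is small; a minimal sketch using the adapter from above (assuming the v2 ContinuousApproximator API):

```python
approximator = bf.ContinuousApproximator(
    adapter=adapter,
    inference_network=bf.networks.FlowMatching(),  # no summary network needed
)
```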

Thanks for the prompt response! I haven’t tried v2 yet, even for posterior estimation, but I would definitely like to play with it later.
Again, thank you for the help!

I have now written a very basic example for likelihood estimation in v2 in this notebook, feel free to take a look and reach out if you encounter problems.

Thanks for the work on the example of likelihood estimation.
P.S.: Does BayesFlow v2 also support diffusion models, besides FlowMatching?

We recently added support for diffusion models. They are already present in the development version (see the dev branch of the repository) under bayesflow.experimental.DiffusionModel, if you want to try them out. We will probably stabilize them soon, and they will be included in the next release.
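
For example (a sketch; constructor arguments may still change before the release, so I’m leaving them at their defaults):

```python
import bayesflow as bf

inference_net = bf.experimental.DiffusionModel()
```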

I suggest moving to v2 as soon as possible, as the networks are more powerful and better customizable, and you also get a variety of continuous models (flow matching, consistency models, and diffusion models; experimental until the next release).