Dear all,

we are currently trying to apply BayesFlow to a cognitive model based on reaction times and error rates (the Diffusion Model for Conflict tasks, DMC; Ulrich et al., 2015). Despite very good parameter recovery (r > .85), we are experiencing a few difficulties with the fitting routine and with model misspecification.

**Model misspecification**

In line with the model misspecification workflow on the bayesflow.org site, we plotted the summary statistics derived from the summary network and calculated the maximum mean discrepancy (MMD). We used several prior distributions (beta, uniform, and normal) within approximately the same range. The observed data comprise three flanker data sets from different studies (A1, B1, C2). We split the observed data sets randomly into batches of 200 observations and applied the summary network to each of these batches. The number of simulated batches, as well as the number of observations per batch, approximately matches the resulting number of observed batches. The results (see attached figure) are somewhat unsatisfactory: the model seems to produce far more extreme summary statistics than the observed data, although the data space looks quite good. At the moment we are concerned with the following questions:
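The batching step described above can be sketched as follows (plain NumPy; the array and function names are hypothetical, and in our actual pipeline each resulting batch is then passed through the trained summary network):

```python
import numpy as np

def batch_observed(data, batch_size=200, rng=None):
    """Shuffle observed trials and split them into equally sized batches.

    data : array of shape (n_trials, n_features), e.g. RT, accuracy, and
           congruency coding per trial. Trailing trials that do not fill
           a complete batch are dropped.
    """
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(data))
    n_batches = len(data) // batch_size
    kept = idx[: n_batches * batch_size]
    return data[kept].reshape(n_batches, batch_size, data.shape[-1])

# Example: 1050 observed trials with 4 features -> 5 batches of 200 trials
obs = np.zeros((1050, 4))
batches = batch_observed(obs, batch_size=200, rng=0)
```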

Is this procedure of batching the observed data legitimate? How should the observed data structure correspond to the simulated data in terms of the number of batches and the number of observations per batch?

How exactly should we interpret these plots? Can we derive any direction for adapting the priors, prior ranges, or model specification that is more promising than just trying different things at random?

In the "Detecting Model Misspecification" workflow, subsection "Hypothesis test", it says: "It is important that the number of simulated data sets to estimate the sampling distribution of the summary under the null hypothesis matches the number of observed data sets." The code is as follows:

```
observed_data = trainer.configurator(trainer.generative_model(10))

MMD_sampling_distribution, MMD_observed = trainer.mmd_hypothesis_test(
    observed_data,
    num_reference_simulations=1000,
    num_null_samples=500,
    bootstrap=False,
)

_ = bf.diagnostics.plot_mmd_hypothesis_test(MMD_sampling_distribution, MMD_observed)
```

So does "match" mean that the ratio between the number of observed batches and the number of reference simulations has to be 1:100, or does this refer to another number of simulations?
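For context, our understanding is that the statistic being compared is the kernel MMD between the two sets of summary vectors. A minimal (biased) estimate with a Gaussian kernel looks roughly like this; this is our own sketch, not BayesFlow's exact implementation:

```python
import numpy as np

def mmd_squared(x, y, scale=1.0):
    """Biased squared MMD estimate with a Gaussian (RBF) kernel.

    x, y : arrays of shape (n, d) and (m, d) holding summary vectors.
    """
    def kernel(a, b):
        # Pairwise squared distances, then RBF kernel values
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * scale ** 2))

    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()

rng = np.random.default_rng(1)
same = rng.normal(size=(100, 2))
shifted = rng.normal(loc=3.0, size=(100, 2))
# The MMD between a sample and itself is 0; a shifted sample yields a
# clearly positive value, which is what the hypothesis test picks up.
```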

**Training phase**

We are currently running online training for 100 epochs with 1,000 iterations each. Each iteration takes about 5-10 seconds (with a batch size of 16 and 200 to 1,000 observations per batch), resulting in a total training time of up to about 280 hours. Here is our code:

```
summary_net = bf.networks.SetTransformer(
    input_dim=4,
    summary_dim=32,
    name="dmc_summary",
)
inference_net = bf.networks.InvertibleNetwork(
    num_params=len(prior.param_names),
    num_coupling_layers=12,
    coupling_settings={"dropout_prob": 0.1, "bins": 64},
    name="dmc_inference",
)
amortizer = bf.amortizers.AmortizedPosterior(
    inference_net,
    summary_net,
    name="dmc_amortizer",
    summary_loss_fun="MMD",
)
trainer = bf.trainers.Trainer(
    generative_model=model,
    amortizer=amortizer,
    configurator=configurator,
    checkpoint_path=model_dir,
    memory=True,
)
h = trainer.train_online(
    epochs=100,
    iterations_per_epoch=1000,
    batch_size=16,
    save_checkpoint=True,
)
```

Do you have any suggestions on how to speed up the training phase? We are planning to try offline training, but we aren't sure whether that is significantly more efficient or might deteriorate the results.
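The simulation budget implied by our current settings is easy to tally, which is part of why offline training looks attractive to us (the offline budget of 100,000 below is purely an illustrative figure):

```python
# Online training draws fresh simulations every iteration
epochs = 100
iterations_per_epoch = 1000
batch_size = 16

online_sims = epochs * iterations_per_epoch * batch_size  # 1,600,000 simulations
seconds_per_iteration = 10  # upper end of the observed 5-10 s
total_hours = epochs * iterations_per_epoch * seconds_per_iteration / 3600  # ~278 h

# Offline training would presimulate a fixed budget once (say, 100,000
# data sets) and reuse it every epoch, trading simulation time for the
# risk of overfitting to the cached set.
offline_sims = 100_000
```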

Any thoughts or suggestions are highly appreciated, thank you in advance!

Best,

Simon Schaefer