Any feedback is greatly appreciated. We are also currently working on a convenient interface for multilevel models for the soon-to-be-released BayesFlow version.
I have read the paper and it is very important work. By the way, it seems that the relevant code is not available with the paper or in the BayesFlow library. Is there a plan to release the code, and when will it be available?
Great work!
I also tried applying a normalizing flow, like the current version of BayesFlow (cINN), to a two-level model as in the paper, but only some of the global parameters are well recovered. You used two summary networks and two inference networks; can you give me some insight into why you did this, and why it did not work well with the traditional BayesFlow setup (one summary network and one inference network)?
Thanks very much!
@Wanke Yes, definitely! The code for multilevel models is already in BayesFlow, we just haven't advertised it yet, and the new release is right around the corner. If you want to fit multilevel models in BayesFlow today, have a look at bayesflow.amortizers.TwoLevelAmortizedPosterior and bayesflow.summary_networks.HierarchicalNetwork.
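In case it helps, here is a minimal sketch of how the pieces might be wired together. The class names are the ones above; the constructor arguments (a list of per-level summary networks, a local and a global amortizer) are my best guess at the API, so please check the docstrings for the exact signatures:

```python
import bayesflow as bf

# Hierarchical summary network: one pooling network per level
# (observations -> group summaries -> dataset summary)
summary_net = bf.summary_networks.HierarchicalNetwork(
    [bf.summary_networks.DeepSet(), bf.summary_networks.DeepSet()]
)

# One inference network per level, e.g. 2 local and 4 global parameters
# (matching the mice example below)
local_net = bf.networks.InvertibleNetwork(num_params=2)
global_net = bf.networks.InvertibleNetwork(num_params=4)

# Argument names are assumptions -- see the class docstring
amortizer = bf.amortizers.TwoLevelAmortizedPosterior(
    local_amortizer=bf.amortizers.AmortizedPosterior(local_net),
    global_amortizer=bf.amortizers.AmortizedPosterior(global_net),
    summary_net=summary_net,
)
```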
@Jice Thanks! Two inference networks are necessary to amortize over the number of groups. Imagine fitting a mean and standard deviation for each group (e.g., body weights of mice in different groups). For 10 groups, that would be 10 × 2 + 4 = 24 parameters: a local mean and standard deviation for each group, plus a hierarchical mean and standard deviation for each of the two local parameters.
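To make the setup concrete, here is a toy simulator for that example. The priors are made up purely for illustration (not taken from the paper):

```python
import numpy as np

def simulate(num_groups=10, num_obs=20, rng=None):
    """Toy two-level model: body weights of mice in several groups."""
    if rng is None:
        rng = np.random.default_rng()
    # 4 global parameters: a hierarchical mean and sd for each local parameter
    mu_mean, mu_sd = rng.normal(25.0, 5.0), rng.gamma(2.0, 1.0)
    sigma_mean, sigma_sd = rng.gamma(2.0, 1.0), rng.gamma(2.0, 0.25)
    # 2 local parameters per group, drawn from the global ones
    mu = rng.normal(mu_mean, mu_sd, size=num_groups)
    sigma = np.abs(rng.normal(sigma_mean, sigma_sd, size=num_groups))
    # Observations: num_obs weights per group
    data = rng.normal(mu[:, None], sigma[:, None], size=(num_groups, num_obs))
    locals_ = np.stack([mu, sigma], axis=-1)   # shape (num_groups, 2)
    globals_ = np.array([mu_mean, mu_sd, sigma_mean, sigma_sd])
    return data, locals_, globals_  # 10 * 2 + 4 = 24 parameters for 10 groups
```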
In principle, we could fit such a model with a single network only: we define our data-generating process, sample training data, and train one network to estimate all 24 parameters jointly.
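In code, that naive setup might look roughly like this (a sketch; note the summary network would need to preserve group identity, e.g. by concatenating per-group summaries in a fixed order, which I omit here):

```python
import bayesflow as bf

num_groups = 10
num_params = num_groups * 2 + 4  # 24 parameters, valid only for exactly 10 groups

# A single inference network estimating all 24 parameters jointly
inference_net = bf.networks.InvertibleNetwork(num_params=num_params)
amortizer = bf.amortizers.AmortizedPosterior(inference_net)  # plus a suitable summary net
```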
This will work, but has the disadvantage that we can only use this network for datasets with the same number of groups. It wouldn’t work with 9 groups or 11 or 100. This is particularly important if we want to perform leave-group-out cross-validation.
There is also an efficiency aspect. Estimating 24 parameters jointly might not be too much, but for 100 groups we would have 100 × 2 + 4 = 204 parameters, which is super wide for such a simple model.
If we use two networks, we can split the problem into two parts: one network estimates the global parameters (4 parameters), while another estimates the local parameters for each individual group (2 parameters); this is exactly the structure the sketch in my reply above sets up. The parameter dimension no longer scales linearly with the number of groups, and we can use networks with smaller input dimensions, which reduces training time and probably also the required simulation budget.
As for the recovery issues you describe, feel free to post your network config so we can have a look. In my experience, this is most often a matter of insufficient training time or simulation budget.