Hi,
I’ve been having problems when trying to train a simple inference network with a 3D dataset. The problem arises here, when the target and conditions are being concatenated in the dense network. I think what is happening is that the tiling of the conditions here isn’t behaving as I’m expecting. I’ll give a bit more detail on my problem below.
I’m using an offline trainer and passing in a dictionary with many simulations of 10,000 data events each, with just a single data feature for now (I will add more features in the future). So my “sim_data” has dimensions (N_simulations, 1E3, 1), and since I’m only using a couple of parameters my “prior_draws” has shape (N_simulations, 2). Crucially, in my setup I’m not using a summary network and I want to train directly on all the 10k events in each simulation.
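For concreteness, the dictionary I pass to the offline trainer is built roughly like this (a minimal sketch; the sizes and random values here are just placeholders for my real simulations):

```python
import numpy as np

# Illustrative shapes only -- n_sim, n_events, etc. are placeholders.
n_sim, n_events, n_features, n_params = 100, 10_000, 1, 2

offline_data = {
    "sim_data": np.random.normal(size=(n_sim, n_events, n_features)),  # 3D conditions
    "prior_draws": np.random.normal(size=(n_sim, n_params)),           # 2D parameters
}
```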
I’ve also been able to reproduce this error in one of the tutorials. In the Linear ODE system example the sim data has a similar structure to mine, something like (N_simulations, 32, 2). If I remove the summary_net from the amortizer (and change “summary_conditions” to “direct_conditions” in the configurator) I see the same error that I’m encountering. The other relevant example is the Two Moons example, where no summary network is used but the sim data has dimensions (N_simulations, 2), so it’s a 2D dataset and works without a problem. This all makes me think that the 3D case in the dense network isn’t being handled correctly? This is the actual error message I’m getting:
ConfigurationError: Could not carry out computations of generative_model -> configurator -> amortizer -> loss! Error trace:
Exception encountered when calling layer 'dense_coupling_net' (type DenseCouplingNet).
{{function_node __wrapped__ConcatV2_N_2_device_/job:localhost/replica:0/task:0/device:CPU:0}} ConcatOp : Ranks of all input tensors should match: shape[0] = [2,3] vs. shape[1] = [2,32,2] [Op:ConcatV2] name: concat
Call arguments received by layer 'dense_coupling_net' (type DenseCouplingNet):
• target=tf.Tensor(shape=(2, 3), dtype=float32)
• condition=tf.Tensor(shape=(2, 32, 2), dtype=float32)
• kwargs={'training': 'None'}
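If it helps, I think this is essentially the operation that’s failing (a minimal sketch outside of BayesFlow, using the shapes from the trace above):

```python
import tensorflow as tf

# Shapes copied from the trace: a rank-2 target cannot be
# concatenated with a rank-3 condition along the last axis.
target = tf.zeros((2, 3))
condition = tf.zeros((2, 32, 2))

tf.concat([target, condition], axis=-1)  # raises InvalidArgumentError
```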
Any suggestions on what to do would be really appreciated!
Also, I realise I’m not using v2 of the code (which looks really great btw!) and I do intend to update to it in the future. So if part of the answer here is that this is all fine in v2, then that probably pushes me to switch to v2 sooner (although then I might reach out about something else).
All the best,
Ed
Hi Ed,
I think the way to go here would be to manually convert your 3D dataset to a 2D dataset, for example using a simple reshape call to flatten the last two dimensions, so that you end up with the shape (N_simulations, n_data_events * n_features). If you do not use a summary network, the order of the conditions that you give to the inference network should not matter (because at the beginning of training, the network does not know which input is which anyway), so it is not relevant how exactly you reshape, as long as you always do it in the same way. Does this make sense, or should I try to give a more detailed explanation?
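As a minimal sketch of what I mean (the shapes and the dictionary here are just stand-ins for your setup):

```python
import numpy as np

# Stand-in for your offline dictionary; shapes are placeholders.
offline_data = {"sim_data": np.random.normal(size=(64, 1000, 1))}

sim_data = offline_data["sim_data"]      # (N_simulations, n_data_events, n_features)
n_sim, n_events, n_features = sim_data.shape

# Flatten the last two dimensions so the condition is 2D:
# (N_simulations, n_data_events * n_features).
offline_data["sim_data"] = sim_data.reshape(n_sim, n_events * n_features)
```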
One thing to keep in mind is that the inference network might have difficulties with a condition that large. Could you explain your reasoning on why you do not want to use one here?
I think the question is mainly a conceptual one and the version of BayesFlow is not relevant for this.
Hi Valentin,
Thanks for the speedy response!! Yeah, that makes sense, and I should be able to do that in a pre-processing step with my simulations. I didn’t really think about it, but now that you say it, it’s quite an obvious work-around!
I guess some context for the area I’m working in and what I would ideally like to get out of this in the end. I’m a researcher in particle physics, so my “sim_data” is lots (often of the order of millions) of Monte Carlo simulated events. Each of these Monte Carlo events gets an effective weight based on the parameter values, which is really an effective probability for that event having occurred. In our “traditional” analyses we basically fill histograms with these weighted events, calculate a likelihood against some binned data, and use MCMC to sample posterior distributions. To be clear, in these analyses each set of parameter draws changes the weights of every one of these Monte Carlo events, and hence the number and shape of the histograms we use for the likelihood calculation (we call this “re-weighting”; it’s a crude approximation of re-running the full Monte Carlo simulation with different input parameter values, which isn’t computationally feasible). So my plan was to replace the MCMC part with an inference network and train it by conditioning on all these weighted Monte Carlo events and my parameter values. This also moves away from histograms and a likelihood-based analysis, which should have greater statistical power.
I think the crucial part of why I don’t want to use a summary network is that when I come to run this on real data, it will be in the form of individual events. So if I have trained on my individual Monte Carlo events (or small batches of them), I should have a network which can work on single data events and output posteriors. Then I can send many real data events through my network to build posteriors for a full dataset (by adding together the logs of the posteriors for individual events).
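In rough pseudo-code, with log_posterior_single_event standing in for however I would evaluate the trained network on one event, what I have in mind is something like:

```python
import numpy as np

def log_posterior_single_event(theta_grid, event):
    # Placeholder: evaluate the trained amortized posterior, log q(theta | event),
    # on a grid of candidate parameter points (dummy values for illustration).
    return -0.5 * np.sum((theta_grid - event.mean()) ** 2, axis=-1)

theta_grid = np.random.normal(size=(500, 2))    # candidate parameter points
data_events = np.random.normal(size=(1000, 1))  # individual real data events

# Combine per-event posteriors by summing their logs over all events.
combined_logpost = np.sum(
    [log_posterior_single_event(theta_grid, x) for x in data_events], axis=0
)
```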
Sorry if this is a lot of jargon or is unclear, btw. I haven’t seen too many examples of Amortized Bayesian Inference being used in particle physics before, so maybe I’m going about this or thinking about it the wrong way. The only example I’ve seen (which does use BayesFlow and a summary network) was here, but it’s quite a simple example to be honest.
I hope this explains why I was planning on not using a summary network? But I might have just misunderstood something. I think the main thing is that in the end I would like to be able to send individual data events through the inference network.
Also, for the record, I am using a fork of BayesFlow which a colleague is using and which has some minor modifications to add loss weights for each event here. We were actually planning on reaching out to see if this might be something that could be implemented in the main repository in the future.
This turned into a bit of a wall of text, but I hope it explains what I’m trying to do in some more detail. I’m happy to have a chat about this any time, as I think there’s huge potential to use BayesFlow for particle physics analyses.
Thanks again for the help, I should be able to keep going with the solution you proposed.
All the best,
Ed
Hi Ed,
thanks for the detailed response. I don’t have the domain knowledge, nor the time to dig deeper, so please let me know if I misunderstand something. I’ll try to summarize what I understand; maybe this makes it easier to spot misunderstandings on my part.
- the idea is to replace the part of the pipeline where the histograms were used with BayesFlow
- posteriors are calculated based on Monte Carlo events (each with a corresponding weight, and the weights differ between parameter draws). Would it be correct to say that the Monte Carlo events themselves are independent of the parameters we try to infer, and the only difference lies in the weights? Or is there another link between parameters and simulations? Phrased differently, are the different weights the only difference in the data between parameter draws?
- simulated data: n_events ~ 1e6 Monte Carlo simulated events, each with a feature dimension n_features (identical for each event?). So a “single” simulator output in the BayesFlow sense has shape (n_events, n_features), and each training batch has shape (batch_size, n_events, n_features). The Monte Carlo simulations are exchangeable (i.e., their order does not matter). Generally, one would try to incorporate this knowledge into the network, e.g. by using a DeepSet/SetTransformer architecture (maybe not applicable here; see the sketch below for the general idea).
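(Purely to illustrate the permutation-invariance idea, not BayesFlow’s own classes: a DeepSet-style summary amounts to an element-wise embedding followed by pooling over the event axis, roughly like this.)

```python
import tensorflow as tf

# Minimal DeepSet-style summary: embed each event independently, then
# pool over the exchangeable event axis. All sizes here are arbitrary.
n_features, summary_dim = 2, 16

inputs = tf.keras.Input(shape=(None, n_features))          # (batch, n_events, n_features)
h = tf.keras.layers.Dense(64, activation="relu")(inputs)   # applied per event
h = tf.keras.layers.Dense(64, activation="relu")(h)
pooled = tf.keras.layers.GlobalAveragePooling1D()(h)       # invariant to event order
summary = tf.keras.layers.Dense(summary_dim)(pooled)       # fixed-size summary vector

deepset_summary = tf.keras.Model(inputs, summary)
```

The pooling step is what makes the output invariant to the event ordering and gives a fixed-size summary regardless of the number of events.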
The next part is not clear to me yet, I think partly because it is conceptually a bit in “conflict” with the previous point, see below. You want to do inference on individual Monte Carlo events, right? So an example of the shape of the condition you supply to the network would be (in the case of one event) (batch_size, 1, n_features). One limitation that we have (especially when you do not have a summary network) is that the shapes for training and for inference have to be identical (except for the batch size). So to obtain a posterior for a single event, you also need to train with one event (i.e., your shape for each training batch would also be (batch_size, 1, n_features); this is the “conflict” with the previous paragraph I mean). Is this something that would still make sense in your setup?
Another difficulty here is that doing a Bayesian update to combine multiple observations is not always straightforward, though as far as I know it is possible with score-based models when your observations are exchangeable. The relevant paper for this is Compositional Score Modeling for Simulation-based Inference. There is an ongoing project to explore and scale this in the context of BayesFlow; depending on the kind of problem, this might be interesting here.
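To make the pitfall concrete: if the individual events are conditionally independent given the parameters, then p(theta | x_1, ..., x_n) is proportional to p(theta)^(1 - n) * prod_i p(theta | x_i), so simply summing the per-event log-posteriors misses a prior correction. A rough grid-based sketch of the aggregation (the function and argument names are just illustrative):

```python
import numpy as np

def combine_event_posteriors(logpost_per_event, log_prior):
    """Combine per-event log-posteriors evaluated on a shared parameter grid,
    assuming the events are conditionally independent given the parameters.

    logpost_per_event: shape (n_events, n_grid)
    log_prior:         shape (n_grid,)
    Returns the unnormalized log p(theta | x_1, ..., x_n) on the grid.
    """
    n_events = logpost_per_event.shape[0]
    # log p(theta | x_1..n) = sum_i log p(theta | x_i) - (n - 1) * log p(theta) + const
    return logpost_per_event.sum(axis=0) - (n_events - 1) * log_prior
```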
My current impression is that you might have some conceptual misunderstandings regarding what SBI methods can do for you and how you would have to plug the different parts together. As I lack domain knowledge on your setup, I cannot really pinpoint it or propose solutions. Maybe you can comment on what I have written above (feel free to ask clarifying questions), and maybe that leads to a more concrete idea of what your setup would have to look like to try this out.
Best,
Valentin