Hi,
I am considering applications of BayesFlow to dynamic models (typically differential equations, but also agent-based), where amortization can add value when estimating the model for a large number of units (e.g., the same model for different organizations, or for subjects in an experiment). However, each unit (e.g., subject) is exposed to different 'driving' data: for example, the inventory time series feeding into decisions in a Beer Game experiment where we estimate the effect of inventory on subjects' ordering, or the customer demand data going into a retail store model where we estimate a production function for the store. In these settings the core model is the same, but different input data (which could also be seen as different constants in the model) drive the simulations for different units. What is the best way of tackling these setups in BayesFlow? I can see two different ways, and suspect there may be others I am missing, so I very much welcome suggestions on the most viable choices (and why :)):

1. We can define each subject to have a separate 'context', and train by offering samples from all of those contexts to the algorithm.

2. We can define auxiliary (unknown) parameters, to be estimated, that map one-to-one onto the 'driving' data variables. If we then feed the driving data in as summary statistics to be matched, learning those parameters becomes trivial (just pick the data point corresponding to each summary statistic), and those parameters then do the work of influencing the rest of the model dynamics. This method does not require any context variable, but comes with a much larger dimensionality for the estimation problem.
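To make option 1 concrete, here is a minimal numpy sketch of the idea (all names and the toy ordering model are illustrative assumptions, not the BayesFlow API): each training draw pairs a fresh parameter sample with one subject's driving series, and that series itself is passed along as the context variable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical driving data: one inventory time series per subject (unit).
N_SUBJECTS, T = 5, 20
driving_data = rng.normal(size=(N_SUBJECTS, T))  # stand-in for real inventories

def simulate_orders(theta, inventory):
    """Toy ordering model: orders respond to inventory with gain theta[0]
    and baseline theta[1], plus observation noise."""
    return theta[0] * inventory + theta[1] + rng.normal(scale=0.1, size=inventory.shape)

def sample_training_batch(batch_size):
    """Each draw pairs a prior sample with a randomly chosen subject's
    driving series; the series itself is returned as the context."""
    thetas, contexts, outputs = [], [], []
    for _ in range(batch_size):
        theta = rng.normal(size=2)                    # draw from the prior
        ctx = driving_data[rng.integers(N_SUBJECTS)]  # subject-specific context
        thetas.append(theta)
        contexts.append(ctx)
        outputs.append(simulate_orders(theta, ctx))
    return np.stack(thetas), np.stack(contexts), np.stack(outputs)

theta, ctx, y = sample_training_batch(8)
print(theta.shape, ctx.shape, y.shape)  # (8, 2) (8, 20) (8, 20)
```

The inference network would then condition on both the simulated outputs and the context, so one amortized network covers all subjects.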
Thanks a lot!

Without further details, I would go for option 1, as this is how we typically ensure that the networks generalize over a wide variety of configurations. You may want to check out our recent preprint, which takes the idea of "context" one step further than the standard amortization over different data set sizes:

Thanks, very helpful Stefan. The sensitivity-aware paper discusses data sensitivity more in the context of minor differences in input data than of distinct driving data going into the model. I guess I should first do some experimentation to build better intuitions, but I am wondering where the efficiency gains of using contexts come from, e.g., versus separately training networks for each unit, when the driving data differ substantially across units. The reason is that even though the underlying parameters to be estimated may not differ much across units, the outputs of the model may, if they are heavily driven by the exogenous data. As such, the shape of the network transforming data into a posterior would look rather different across subjects, hence my question.

As an extreme case, consider a simple linear regression (the independent variables X are the 'driving data', whereas the output of the model is the response variable Y) with multiple data points per subject and multiple subjects in the dataset. Let's say we only care about d fixed effects for N subjects. We could 1) estimate the whole model for all subjects in one network (with N*d parameters), 2) think of each subject as a context (estimating d parameters over N contexts), or 3) estimate d parameters for each subject in N separate estimation tasks. Would we expect notable advantages for any of these alternative setups?
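The regression example can be written down directly; here is a minimal numpy sketch (purely illustrative, with made-up dimensions) of the shared simulator and the parameter counts the three setups would have to learn:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, n_obs = 10, 3, 50            # subjects, fixed effects, data points each

# Driving data: each subject gets their own design matrix X.
X = rng.normal(size=(N, n_obs, d))

def simulate_subject(beta, X_i, sigma=0.5):
    """Y = X beta + noise for one subject; X_i is that subject's driving data."""
    return X_i @ beta + sigma * rng.normal(size=X_i.shape[0])

# One joint draw of all subject-level effects, then one dataset per subject.
beta_all = rng.normal(size=(N, d))
Y = np.stack([simulate_subject(beta_all[i], X[i]) for i in range(N)])

print("setup 1, joint parameter count:", N * d)   # one network, N*d = 30
print("setups 2 and 3, per-task count:", d)       # d = 3, over N contexts/tasks
print("data shape:", Y.shape)                     # (10, 50)
```

The simulator is identical in all three setups; what changes is whether the network sees all subjects jointly, one subject at a time with X as context, or one subject per separately trained network.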

I believe what you are describing boils down to the difference between a hierarchical and a non-hierarchical model in a Bayesian context, and the answer will be determined by how your simulator generates the data and whether there are shared parameters across subjects. Concretely:

1) This makes sense only if you care about the joint posterior of all N*d parameters and the subjects' estimates are mutually informative. Also, your summary network(s) need to account for the hierarchical structure of the data accordingly (e.g., exchangeable subjects, time series data for each subject, etc.).

2) This makes sense if there are no shared parameters between subjects and estimating the posterior for subject A delivers no information gain regarding the posterior for subject B.

3) This is already subsumed under 2), unless the context is a more special variable, such as a design matrix, that varies across subjects and you also need the networks to generalize to unseen contexts (as in the sensitivity paper).
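The hierarchical/non-hierarchical distinction above can be seen directly in the generative process. A minimal sketch (illustrative priors and names, not taken from any paper) of the two cases:

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 10, 3  # subjects, effects per subject

def sample_hierarchical():
    """Setup 1: subject-level parameters are drawn around shared
    population-level hyperparameters, so one subject's posterior is
    informative about another's and the joint posterior is of interest."""
    mu = rng.normal(size=d)                    # shared population means
    tau = np.abs(rng.normal(size=d)) + 0.1     # shared population scales
    return mu + tau * rng.normal(size=(N, d))  # subject-level effects

def sample_independent():
    """Setups 2/3: each subject's parameters are drawn independently,
    so posteriors carry no information across subjects."""
    return rng.normal(size=(N, d))

print(sample_hierarchical().shape, sample_independent().shape)  # (10, 3) (10, 3)
```

If your simulator looks like the first function, a joint (hierarchical) treatment pays off; if it looks like the second, amortizing over subjects as contexts loses nothing.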