I’m about to do some catch-up reading on the BayesFlow papers (2020, 2023a, 2023b). Would someone be willing to briefly describe what was added from paper to paper?

For example, I think 2020 is mostly about the theory and implementation of normalizing flows for Bayesian and amortized inference, while 2023a adds the work you’ve done in BayesFlow to create an API for such inference; 2023b seems to present a new architecture, flow matching, that augments the normalizing flows in some manner?

Edit: Oh, no, flow matching is a predecessor method; 2023b describes its title method, CMPE. Flow matching generalizes the discrete sequence of transforms used by normalizing flows to a continuous transformation (presumably eliminating the sensitivity of standard normalizing flows to the number of transforms, the types of transforms, etc.). Continuing to read to discern what CMPE does instead…

And in case other learners could benefit from it, here’s a summary of the 2020 paper for folks like myself coming from an MCMC background (it therefore leaves out details like the summary network, how the transform network implies a series of Jacobian terms added to the summed log likelihood, the myriad options for structuring the transform network, etc.):

In a probabilistic model, the model structure, from priors to likelihood, determines a mapping: for a given set of candidate parameter values and data, we can compute the posterior density (up to a normalizing constant) of that candidate set. To characterize the full posterior distribution, we could evaluate this density at all possible sets of candidate parameter values, but this is both intractable (at least for models with non-discrete parameters) and would yield the posterior in the form of a lookup table from which it would be difficult to derive meaningful inferences (e.g. expectations, marginal quantiles, etc.).
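To make the lookup-table idea concrete, here is a minimal sketch for a toy model of my own choosing (not from the paper): `y_i ~ Normal(mu, 1)` with prior `mu ~ Normal(0, 1)`. The function name `log_posterior`, the data values, and the grid bounds are all assumptions for illustration. With one parameter a grid is feasible; the number of grid points grows exponentially with the number of parameters, which is where this approach breaks down.

```python
import numpy as np

def log_posterior(mu, y):
    """Unnormalized log posterior for y_i ~ Normal(mu, 1), mu ~ Normal(0, 1)."""
    return -0.5 * mu**2 - 0.5 * np.sum((y - mu) ** 2)

y = np.array([0.8, 1.3, 0.4, 1.9, 1.1])  # made-up data

# Evaluate the density on a grid of candidate values (the "lookup table").
grid = np.linspace(-3.0, 5.0, 2001)
step = grid[1] - grid[0]
log_dens = np.array([log_posterior(mu, y) for mu in grid])

# Normalize numerically so the table integrates to 1 over the grid.
dens = np.exp(log_dens - log_dens.max())
dens /= dens.sum() * step

# Even a simple expectation needs an explicit numerical integral.
grid_mean = np.sum(grid * dens) * step
print(grid_mean, y.sum() / (len(y) + 1))  # second value is the exact conjugate mean
```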

Traditional MCMC provides a more tractable and useful way to characterize the posterior by seeking to (eventually) yield draws from it (from which expectations, marginal quantiles, etc. are easily computed). This is achieved by an iterative process: generate candidate parameter values, assess the posterior density at those values, and use that density to guide the subsequent choice of candidate parameter values in a manner that should eventually yield a recent history of candidate values that reflects samples from the posterior. Thus, MCMC can be thought of as a method wherein computational effort is directed at learning how to generate candidates that are consistent with the posterior, given a specific set of data.
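As a rough sketch of that loop, here is a random-walk Metropolis sampler for a toy model I picked for illustration (`y_i ~ Normal(mu, 1)`, prior `mu ~ Normal(0, 1)`). The names `log_posterior` and `metropolis`, the step size, and the data are my own assumptions; real samplers such as HMC/NUTS are far more sophisticated.

```python
import numpy as np

def log_posterior(mu, y):
    """Unnormalized log posterior for y_i ~ Normal(mu, 1), mu ~ Normal(0, 1)."""
    return -0.5 * mu**2 - 0.5 * np.sum((y - mu) ** 2)

def metropolis(y, n_draws=20000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    mu = 0.0
    lp = log_posterior(mu, y)
    draws = np.empty(n_draws)
    for i in range(n_draws):
        # Generate a candidate near the current value.
        proposal = mu + step * rng.normal()
        lp_prop = log_posterior(proposal, y)
        # Accept with probability min(1, density ratio); otherwise stay put.
        if np.log(rng.uniform()) < lp_prop - lp:
            mu, lp = proposal, lp_prop
        draws[i] = mu
    return draws

y = np.array([0.8, 1.3, 0.4, 1.9, 1.1])  # made-up data
draws = metropolis(y)[2000:]             # drop early draws as warm-up

# For this conjugate model the exact posterior is known, so we can check.
exact_mean = y.sum() / (len(y) + 1)
exact_sd = (len(y) + 1) ** -0.5
print(draws.mean(), exact_mean)
print(draws.std(), exact_sd)
```

Note that everything learned here is tied to `y`: a new data set means running the chain again from scratch.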

Normalizing flows can be used in a similar manner, to learn how to generate samples from the posterior given a specific data set. Flows achieve this not by a guided/Markovian exploration of the parameter space, but by sampling from an easy-to-sample distribution, like the multivariate normal, and then passing these values through a transform network to yield candidate parameter values. During training, these candidate parameter values and the data are used to compute the posterior density, and this value is used to define an error signal that is back-propagated through the transform network. Repeating this across many samples from the easy-to-sample distribution teaches the transform network how to transform samples from the easy-to-sample distribution into parameter values that are consistent with the posterior distribution. Thus, after training, posterior samples are obtained by generating many samples from the easy-to-sample distribution and pushing each through the transform network, yielding a distribution of parameter values that should reflect samples from the posterior.
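Here is a deliberately tiny sketch of that training loop, under heavy assumptions: the "transform network" is a single affine map `theta = m + exp(s) * z` (real flows stack many richer invertible layers), the model is a toy of my choosing (`y_i ~ Normal(mu, 1)`, prior `mu ~ Normal(0, 1)`), and the gradients are derived by hand rather than back-propagated by an autodiff library. The loss is the average of `log q(theta) - log p(theta | y)` over base draws, where the log-Jacobian of the affine map contributes the `-s` term. The name `fit_affine_flow` and all settings are hypothetical.

```python
import numpy as np

def fit_affine_flow(y, n_steps=2000, batch=256, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    m, s = 0.0, 0.0  # shift and log-scale of the transform
    for _ in range(n_steps):
        # Sample the easy-to-sample distribution and push it through the transform.
        z = rng.normal(size=batch)
        theta = m + np.exp(s) * z
        # d/dtheta of the unnormalized log posterior for this toy model.
        dlogp = y.sum() - (n + 1) * theta
        # Gradients of the loss E[-s - log p(theta | y)] w.r.t. m and s.
        grad_m = -dlogp.mean()
        grad_s = -1.0 - (dlogp * np.exp(s) * z).mean()
        m -= lr * grad_m
        s -= lr * grad_s
    return m, s

y = np.array([0.8, 1.3, 0.4, 1.9, 1.1])  # made-up data
m, s = fit_affine_flow(y)

# After training, sampling is just "draw z, apply the transform".
draws = m + np.exp(s) * np.random.default_rng(1).normal(size=10000)
print(draws.mean(), y.sum() / (len(y) + 1))   # exact posterior mean
print(draws.std(), (len(y) + 1) ** -0.5)      # exact posterior sd
```

Because the true posterior here is itself normal, an affine transform of a normal can match it exactly; for non-normal posteriors the transform would need to be more expressive.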

So both MCMC and flows can be used to expend compute to learn how to sample the posterior for a given data set, with new compute necessary to sample the posterior given a new data set. Since MCMC is inherently memory-free, with learning occurring only through the evolution of the sampling process, there is no possibility of generalizing learning from data set to data set. However, with a simple modification, flows can be trained such that, after training, they can be used on any data set without re-training. This is achieved by training on many simulated data sets generated by the model, where a given simulated data set is used as before in computing the posterior density for a given set of parameter values output from the transform network, but additionally the simulated data are used as inputs to the transform network. The transform network thus learns how to generate posterior samples conditional on arbitrary data.
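A minimal sketch of the amortized version, following my description above rather than the paper's actual code. Assumptions: the same toy model (`y_i ~ Normal(mu, 1)`, prior `mu ~ Normal(0, 1)`) with a fixed number of observations, a conditional affine transform `theta = a + b * mean(y) + exp(s) * z` standing in for the transform network, the sample mean standing in for a summary network, and hand-derived gradients. As far as I understand, the paper's own objective is simulation-based and does not require evaluating the density, so treat this purely as an illustration of "data as an input to the transform". All names are hypothetical.

```python
import numpy as np

N_OBS = 5  # fixed data set size (a simplification of this toy)

def train_amortized(n_steps=3000, batch=256, lr=0.02, seed=0):
    rng = np.random.default_rng(seed)
    a, b, s = 0.0, 0.0, 0.0  # shift = a + b * ybar, log-scale = s
    for _ in range(n_steps):
        # Simulate a fresh batch of data sets from the model itself.
        mu_true = rng.normal(size=batch)
        y = mu_true[:, None] + rng.normal(size=(batch, N_OBS))
        ybar = y.mean(axis=1)
        # The data (via its summary) is an input to the transform.
        z = rng.normal(size=batch)
        theta = a + b * ybar + np.exp(s) * z
        # d/dtheta of the log posterior of each simulated data set.
        dlogp = N_OBS * ybar - (N_OBS + 1) * theta
        # Gradients of E[-s - log p(theta | y)] over data sets and base draws.
        a -= lr * -dlogp.mean()
        b -= lr * -(dlogp * ybar).mean()
        s -= lr * (-1.0 - (dlogp * np.exp(s) * z).mean())
    return a, b, s

def sample_posterior(params, y, n_draws=10000, seed=1):
    """Posterior draws for any data set y, with no further training."""
    a, b, s = params
    z = np.random.default_rng(seed).normal(size=n_draws)
    return a + b * np.mean(y) + np.exp(s) * z

params = train_amortized()  # pay the training cost once

# Reuse the trained transform on two different (made-up) data sets.
datasets = [np.array([0.8, 1.3, 0.4, 1.9, 1.1]),
            np.array([-2.0, -1.2, -0.7, -1.5, -2.3])]
for y_new in datasets:
    draws = sample_posterior(params, y_new)
    print(draws.mean(), y_new.sum() / (N_OBS + 1))  # vs exact posterior mean
```

The training loop never sees the two data sets at the bottom; they are only plugged in afterwards, which is the whole point of amortization.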