Prior function and random seeds

Hi,

I’m new to BayesFlow and am trying to get it set up alongside some other Bayesian analysis software that I use, so sorry if this is a basic question. At the moment I have come up against a pretty simple error: when I call the prior, I get identical parameter values for every draw in the batch, e.g. below:

{'prior_draws': array([[0.93742176, 0.89759925, 0.77244278],
        [0.93742176, 0.89759925, 0.77244278],
        [0.93742176, 0.89759925, 0.77244278],
        [0.93742176, 0.89759925, 0.77244278],
        [0.93742176, 0.89759925, 0.77244278]]),
 'batchable_context': None,
 'non_batchable_context': None}

If I call the prior function on its own in a loop, I get the behaviour I expect:

[0.87786037 0.80013516 0.5558559 ]
[0.87927099 0.80244343 0.5609854 ]
[1.18831819 1.30815704 1.68479343]
[0.98828857 0.98083585 0.95741299]
[1.04127246 1.06753675 1.15008166]

Has anyone else seen an issue like this? I assume that under the hood the prior_fun is being called for all the batches at once, so all of them end up using the same random seed, but I’ve been struggling to find a way around this. I got the impression that the context_generator might be what I need to use to avoid this problem, but I couldn’t quite figure out how.

Thanks for any help :slight_smile:

Hi,
I think your guess about a random seed being set somewhere is a good lead, but I have not encountered this problem myself yet. Could you please provide a stand-alone code example so that we can reproduce the behavior? Alternatively, as you are new to BayesFlow, you could also try out the upcoming refactored 2.0 version from the dev branch here. It is still under development, but we will hopefully be able to release a first stable version soon. As the new version has a somewhat different API, it might be worthwhile to start working with it directly, since that is where future development will happen.

Hi, I think this is a problem in the definition of the prior generator itself. Could you please post the code? Also, we suggest switching to the newer version.

Hi, thanks for the replies! The use case here is a little unusual: I am actually calling a function from a larger C++ analysis framework (called MaCh3) that I’ve set up Python bindings to, so it’s a little hard to describe how I’m doing things in great detail without sending links to another C++ repo. Below is a slightly simplified version of how I am setting things up on the Python side, though.

import numpy as np
import _pyMaCh3Tutorial as m3

class CustomObject:
    """A simple class to hold all the necessary objects from MaCh3
    and a place to define the prior and simulation functions.
    """
    def __init__(self, covariance_config):
        self.cov = m3.covariance.Covariance(covariance_config)

    def prior_throw(self):
        # Random throw to propose new parameter values
        # (throw_par_prop is a Python binding to a C++ function)
        self.cov.throw_par_prop()

        # then actually get the proposed steps as a np array
        # (again, get_proposal_array is a Python binding to a C++ function)
        return np.asarray(self.cov.get_proposal_array())

Then I am setting up my prior object like this:

import bayesflow as bf

custom_obj = CustomObject(["Inputs/ParametersForMyModel.yaml"])
prior = bf.simulation.Prior(prior_fun=custom_obj.prior_throw)
prior(batch_size=10)

This will give me 10 identical draws from the prior, whereas the following code will give me 10 different draws from the prior.

for _ in range(10):
    print(custom_obj.prior_throw())

I am a little unsure why the two cases give different results. Looking at the Prior class in the simulation module, it should be doing something extremely similar to my simple for loop. I will continue debugging on the C++ side of the functions to understand where things are going wrong, but I am a little confused. For now I have been able to work around this by making simulations offline and putting them into a dictionary, but ideally I would like to use online training.
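
For illustration, the offline workaround looks roughly like the sketch below. The simulate() wrapper around the MaCh3 forward model is hypothetical, and the sim_data key name is an assumption on my part; prior_draws follows the output shown above:

import numpy as np

# Pre-generate simulations with an explicit Python loop, so that
# prior_throw() is called exactly once per draw
n_sims = 10_000
prior_draws = np.array([custom_obj.prior_throw() for _ in range(n_sims)])
sim_data = np.array([simulate(theta) for theta in prior_draws])  # hypothetical forward model

# Dictionary of offline simulations for training
offline_data = {"prior_draws": prior_draws, "sim_data": sim_data}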

Let me know if you can see anything immediately wrong in what I am doing, though. As I say, I am new to BayesFlow (and not a fantastic Python programmer), so I do not rule out that I’ve done something silly!

If it helps as more information: each simulation should have a unique throw from the prior, as each of my parameters is non_batchable.

In terms of trying the dev branch, I can do that to see if it fixes my problems with the prior function. However, I am actually using a fork at the minute (here) which has some functionality that I need for my use case. Basically, this fork can apply individual weights to your training data and take them into account in the loss function of the inference network (see here). We should have a separate discussion about whether it would be possible to include this feature in develop, as it is a fairly common use case in particle physics. The owner of that fork and I are both based at Imperial College London and work in particle physics, hence me piggy-backing off his fork. As I say, though, this is probably best kept as a separate discussion from my current problem.

Thanks again for the replies and I’m happy to talk more about my use case if it’s helpful :slight_smile:


Hi,
this is really strange, so I can only offer vague guesses. As you say, the Prior.__call__ function here should essentially call

np.array([custom_obj.prior_throw() for _ in range(batch_size)])

which is just a condensed form of the for loop that works for you, so unfortunately I cannot see a logical reason why it does not work. Could you use a debugger to verify that this is the code that is actually executed?
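
For instance, a minimal sketch using Python’s built-in breakpoint():

import bayesflow as bf

# Drop into pdb just before the call, then step ("s") into
# Prior.__call__ to confirm the list comprehension above is what runs
prior = bf.simulation.Prior(prior_fun=custom_obj.prior_throw)
breakpoint()
prior(batch_size=10)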

From the behavior I would guess that somehow in the first case, the random number generator is not correctly updated between calls, which could point to some sort of concurrency going on (though none is visible in the code…). If you repeatedly call prior(batch_size=10), do you get the same values or do the values differ between calls?
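
That is, something like the following (the "prior_draws" key is taken from the output you posted above):

import numpy as np

# If the generator state advances between calls, two batches should differ;
# identical batches would point at a frozen seed
draws_a = prior(batch_size=10)["prior_draws"]
draws_b = prior(batch_size=10)["prior_draws"]
print(np.array_equal(draws_a, draws_b))  # True -> same values on every call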


If the problem persists, you may try bypassing the simulation wrappers altogether and simply ensure that your custom generators return dictionaries with the appropriate keys.
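
As a minimal sketch of such a bypass (key names taken from the dictionary at the top of this thread, custom_obj as defined above):

import numpy as np

def custom_prior_generator(batch_size):
    # Build the batch with an explicit Python loop and return the same
    # dictionary layout that bf.simulation.Prior produces
    draws = np.array([custom_obj.prior_throw() for _ in range(batch_size)])
    return {
        "prior_draws": draws,
        "batchable_context": None,
        "non_batchable_context": None,
    }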