Context arrays being flattened when passed from meta() to simulator function

Hi everyone,

I’m encountering what appears to be a bug where multi-dimensional numpy arrays returned from the meta() function are being flattened when passed to the simulator function. Here are the details:

Problem Description

I have a meta() function that generates a 2D matrix and a simulator function that expects to receive this matrix with its original shape. However, the 2D array is being flattened to 1D somewhere in the BayesFlow pipeline.

Code Setup

Meta function:

def meta(batch_size):
    num_obs = random_num_obs()  # Returns e.g., 60
    outcomes = generate_context_matrix(num_obs)  # Returns (60, 2) array
    print(f"meta: outcomes shape = {outcomes.shape}")  # Prints (60, 2)
    return dict(num_obs=num_obs, outcomes=outcomes)

def generate_context_matrix(num_obs, probs=[0.8, 0.2]):
    outcome_mat = np.zeros([num_obs, 2])
    for n in range(num_obs):
        if n > 1 and n % 12 == 0:
            probs = [probs[1], probs[0]]
        outcome_mat[n, 0] = np.random.binomial(n=1, p=probs[0])
        outcome_mat[n, 1] = np.random.binomial(n=1, p=probs[1])
    return outcome_mat

Simulator function:

def simulate_trials(params, outcomes, num_obs):
    print(f"simulate_trials: outcomes shape = {outcomes.shape}")  # Prints (2,)
    print(f"simulate_trials: outcomes = {outcomes}")  # Prints [0. 0.]
    data = np.zeros((num_obs, 3))
    for n in range(num_obs):
        data[n, :2] = gen_trial(params)
        choice = int(data[n, 1])
        # This fails because outcomes is 1D instead of 2D
        data[n, 2] = outcomes[n, choice]  # IndexError: too many indices for array

Debug Output

meta: outcomes shape = (60, 2)
simulate_trials: outcomes shape = (2,)
simulate_trials: outcomes = [0. 0.]

The 2D array (60, 2) is somehow being flattened to 1D (2,) between the meta() function and the simulator function.

Error

IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

This occurs when trying to access outcomes[n, choice] where the simulator expects a 2D array but receives a 1D array.
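For reference, the error itself can be reproduced with NumPy alone. The sketch below is only an illustration and does not use any BayesFlow code. It assumes the simulator ends up with a single row of the matrix, which is one way to arrive at shape (2,); a truly flattened (60, 2) array would have shape (120,) instead.

```python
import numpy as np

outcomes = np.zeros((60, 2))   # same shape as the matrix built in meta()

flattened = outcomes.ravel()   # what real flattening would produce
single_row = outcomes[0]       # assumption: one row taken along the first axis

print(flattened.shape)         # (120,)
print(single_row.shape)        # (2,)

try:
    single_row[5, 1]           # two indices on a 1D array
except IndexError as err:
    error_message = str(err)
    print(error_message)
```

The shape seen in the debug output matches the single-row case, so the array looks sliced along its first axis rather than flattened.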

Environment

  • BayesFlow version: 2.0.3
  • Python version: 3.11.13

Questions

  1. Is this a known issue with how BayesFlow handles multi-dimensional context arrays?
  2. Is there a specific way that context arrays should be structured or returned to preserve their dimensionality?
  3. Are there any workarounds for this issue?

Any help would be greatly appreciated!


Hi,
welcome to the forum and thanks for the report. The issue is caused by the way we check whether a variable is batched or not in batched_call (mostly relevant to the devs reading this).

For now, I think there are two options to work around this.
a) Edit: this option does not seem to work.
b) You can add an extra leading dimension of length batch_size, which simply holds batch_size copies of your multi-dimensional array:

def meta(batch_size):
    num_obs = 60  # fixed here for simplicity; the original calls random_num_obs()
    outcomes = generate_context_matrix(num_obs)  # Returns (60, 2) array
    # copy batch_size times
    outcomes = np.repeat(outcomes[None], batch_size, axis=0)
    print(f"meta: outcomes shape = {outcomes.shape}")  # Prints (batch_size, 60, 2)
    return dict(num_obs=num_obs, outcomes=outcomes)

I hope this helps. Maybe also open an issue on GitHub so that we can think about a proper way to handle this case…
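To see what the extra leading axis does to the shapes, here is a NumPy-only sketch. It assumes, based on the explanation above rather than on BayesFlow's source, that an array whose first axis has length batch_size is split along that axis, one slice per simulated data set. The list comprehension is a stand-in for that splitting, and the random matrix is a stand-in for generate_context_matrix.

```python
import numpy as np

batch_size = 4
num_obs = 60

rng = np.random.default_rng(0)
# Stand-in for generate_context_matrix(num_obs): a (60, 2) matrix of 0/1 values
outcomes = rng.integers(0, 2, size=(num_obs, 2)).astype(float)

# Add a leading axis and copy the matrix batch_size times along it
batched = np.repeat(outcomes[None], batch_size, axis=0)
print(batched.shape)  # (4, 60, 2)

# Each slice along the first axis keeps the original 2D shape
per_call_shapes = [batched[i].shape for i in range(batch_size)]
print(per_call_shapes)
```

Note that with this workaround every data set in the batch shares the same context matrix.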


Wouldn’t you want the context to differ between data sets? In that case, I would simply treat it as another “subsimulator” in the make_simulator utility. meta should be used only for quantities like num_obs that would otherwise make the simulator’s outputs non-rectangular.
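A rough sketch of that layout, using plain Python and NumPy so it runs without BayesFlow. The prior and the choice rule are hypothetical placeholders (the original gen_trial is not shown in the thread), and the hand-rolled chain at the end only imitates what a sequential simulator would do for a single data set. As far as I understand the API, the three functions would be passed as something like bf.make_simulator([prior, context, simulate_trials], meta_fn=meta), but please check the documentation for your version.

```python
import numpy as np

rng = np.random.default_rng(1)

def meta(batch_size):
    # Only the quantity that fixes the output shape for the whole batch
    return dict(num_obs=60)

def prior():
    # Hypothetical one-parameter prior, for illustration only
    return dict(params=rng.uniform(0.0, 1.0))

def context(num_obs, probs=(0.8, 0.2)):
    # Same logic as generate_context_matrix, returned as a named output
    probs = list(probs)
    outcomes = np.zeros((num_obs, 2))
    for n in range(num_obs):
        if n > 1 and n % 12 == 0:
            probs = [probs[1], probs[0]]
        outcomes[n, 0] = rng.binomial(n=1, p=probs[0])
        outcomes[n, 1] = rng.binomial(n=1, p=probs[1])
    return dict(outcomes=outcomes)

def simulate_trials(params, outcomes, num_obs):
    # Hypothetical choice rule standing in for gen_trial
    data = np.zeros((num_obs, 2))
    for n in range(num_obs):
        choice = int(rng.random() < params)
        data[n] = choice, outcomes[n, choice]
    return dict(data=data)

# Hand-rolled chain for one data set: each step sees the earlier outputs by name
sample = meta(batch_size=1)
sample.update(prior())
sample.update(context(sample["num_obs"]))
sample.update(simulate_trials(sample["params"], sample["outcomes"], sample["num_obs"]))
print(sample["outcomes"].shape, sample["data"].shape)
```

Because context is called once per data set, each data set gets its own outcome matrix, and the matrix reaches simulate_trials with its 2D shape intact.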