Hi,
In some experiments with stochastic ODE models, we are finding that it takes a lot of data and training for the summary networks (either TimeSeriesTransformer or SequenceNetwork) to learn the process and measurement noise parameters, even though these parameters have rather clear and intuitive signatures in the outputs, which we can inspect visually. That motivates the idea of augmenting the automated summary statistics with generic (for time series) manually crafted ones. What is the best way to do that? SplitNetwork? Is there an example out there of how to do that?
We are currently working on a generic and sophisticated solution to this problem. For now, you can achieve what you want with a simple configurator that returns the manually crafted summary statistics under the direct_conditions dictionary key. The raw data stays under the summary_conditions key. The two will be combined automatically. Here is an example with the toy model from GitHub:
import numpy as np
import bayesflow as bf

def simulator(theta, n_obs=50, scale=1.0):
    # Gaussian observations centered on the parameter vector theta
    return np.random.default_rng().normal(loc=theta, scale=scale, size=(n_obs, theta.shape[0]))

def prior(D=2, mu=0.0, sigma=1.0):
    # Gaussian prior over the D model parameters
    return np.random.default_rng().normal(loc=mu, scale=sigma, size=D)

def configurator(input_dict):
    # Example hand-crafted statistics: sample average of shape (batch_size, D)
    stats = np.mean(input_dict['sim_data'], axis=1).astype(np.float32)
    # Raw data will still be processed by the summary network
    raw_data = input_dict['sim_data'].astype(np.float32)
    output_dict = {
        'summary_conditions': raw_data,
        'direct_conditions': stats,
        'parameters': input_dict['prior_draws'].astype(np.float32)
    }
    return output_dict
generative_model = bf.simulation.GenerativeModel(prior, simulator)
# Inspect output
configurator(generative_model(batch_size=3))
# Workflow as usual...
Don’t forget to pass your custom configurator to the Trainer.
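For instance, along these lines (a minimal sketch; the specific networks and sizes are illustrative, not prescribed):

summary_net = bf.networks.SequenceNetwork()
inference_net = bf.networks.InvertibleNetwork(num_params=2)
amortizer = bf.amortizers.AmortizedPosterior(inference_net, summary_net)

# The custom configurator is hooked in here
trainer = bf.trainers.Trainer(
    amortizer=amortizer,
    generative_model=generative_model,
    configurator=configurator
)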
Hi Hazhir,
I also encountered a situation like yours. In my case, the simulation budget is limited because the forward simulation is very expensive. What I did was manually transform the time-series data into frequency-domain data, for example extracting the natural frequency from acceleration time series. Using the natural frequency as a summary statistic to train the model is very efficient and requires less training data.
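In case it is useful to others, the extraction step can be as simple as a peak pick on the FFT magnitude spectrum (a rough sketch; the sampling interval and the single-dominant-peak assumption are illustrative):

import numpy as np

def dominant_frequency(acc, dt=0.01):
    # Estimate the dominant (natural) frequency of an acceleration
    # time series from its FFT magnitude spectrum.
    # dt is the sampling interval in seconds (illustrative default).
    spectrum = np.abs(np.fft.rfft(acc - np.mean(acc)))
    freqs = np.fft.rfftfreq(len(acc), d=dt)
    # Skip the zero-frequency bin and return the peak location
    return freqs[1:][np.argmax(spectrum[1:])]

A statistic like this would then go under direct_conditions in the configurator above.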
I have a question on this. Does this mean that the hand-crafted summary statistics (direct_conditions) are passed to the summary network and learned by the neural net too?
Direct conditions are not passed to the summary network. Only the “summary conditions” go into the summary network. Then, the output of the summary network is concatenated with the direct conditions, and that’s the conditioning input to the normalizing flow.
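Schematically (shapes only, not BayesFlow's actual internals):

import numpy as np

batch_size, summary_dim, stats_dim = 32, 10, 2
learned_summaries = np.zeros((batch_size, summary_dim))  # output of the summary network
direct_conditions = np.zeros((batch_size, stats_dim))    # hand-crafted statistics

# Concatenated along the last axis -> conditioning input to the normalizing flow
conditions = np.concatenate([learned_summaries, direct_conditions], axis=-1)
assert conditions.shape == (batch_size, summary_dim + stats_dim)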