Handling missing data in summary and inference networks

Hi,

Thanks for posting the question on the BayesFlow Forums; I appreciate it.

There has been scholarly work on dealing with missing data in Neural Posterior Estimation (NPE): https://www.biorxiv.org/content/10.1101/2023.01.09.523219v1

In a nutshell, they argue for encoding missing data with a specific (impossible) value and adding a missingness mask. For instance, if your data are known to be positive real-valued, y \in \mathbb{R}, y > 0, you would encode missing entries as y = -5 and add a mask (an additional data dimension) that contains m = 0 for observed values and m = 1 for missing ones. Importantly, you also need to include missing data in the neural network training phase. You can achieve this with a configurator, so your current simulator can remain as-is.
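To make this concrete, here is a minimal sketch of such a configurator using plain NumPy. The dictionary keys (`sim_data`, `prior_draws`) and the shapes are assumptions for illustration; adapt them to whatever your simulator actually returns.

```python
import numpy as np

FILL_VALUE = -5.0  # "impossible" value for strictly positive data


def configurator(forward_dict, missing_prob=0.2, rng=None):
    """Sketch: inject missing values and a missingness mask during training.

    Assumes forward_dict["sim_data"] has shape (batch, n_obs, n_dims)
    and forward_dict["prior_draws"] holds the parameter draws.
    """
    rng = np.random.default_rng() if rng is None else rng
    data = forward_dict["sim_data"].astype(np.float32).copy()

    # Randomly mark observations as missing for this training batch
    missing = rng.random(data.shape[:-1]) < missing_prob  # (batch, n_obs)

    # Encode missing entries with the impossible value
    data[missing] = FILL_VALUE

    # Mask channel: m = 0 for observed, m = 1 for missing (as described above)
    mask = missing.astype(np.float32)[..., None]

    return {
        "summary_conditions": np.concatenate([data, mask], axis=-1),
        "parameters": forward_dict["prior_draws"].astype(np.float32),
    }
```

Because the missingness is applied in the configurator, the simulator itself stays untouched, and the network sees a different missingness pattern in every batch.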

I have personally used this technique in the context of multimodal NPE, where we additionally want to integrate data from heterogeneous sources. See Experiment 2 in the paper Fuse It or Lose It: Deep Fusion for Multimodal Simulation-Based Inference (https://arxiv.org/abs/2311.10671).
As described above, I use a normal simulator and handle all missing data in the configurator. The code is currently closed-source, but we will release it in the future. In the meantime, feel free to reach out and I’m happy to share it with you.

Cheers,
Marvin
