Attention mask for the TimeSeriesTransformer summary network

I am trying to use time series of different lengths with the TimeSeriesTransformer as the summary network. To avoid having to build batches of equal-length series, I was thinking of padding them to a fixed length and then using an attention mask. Is there an easy way to pass an attention mask to the TimeSeriesTransformer summary network, or would I have to rewrite the class?
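For concreteness, the padding-plus-mask idea could look roughly like the sketch below. It uses only NumPy, and `pad_and_mask` is a hypothetical helper written for illustration, not part of BayesFlow. The final pairwise mask follows the (batch, query, key) layout that Keras `MultiHeadAttention` accepts through its `attention_mask` argument; whether and how the summary network would consume it is exactly the open question.

```python
import numpy as np


def pad_and_mask(series_list, max_len=None, pad_value=0.0):
    """Pad series of shape (T_i, D) to (N, max_len, D) and return a boolean mask.

    Hypothetical helper: mask[i, t] is True where padded[i, t] holds real data.
    Series longer than max_len are truncated.
    """
    if max_len is None:
        max_len = max(len(s) for s in series_list)
    n = len(series_list)
    dim = series_list[0].shape[1]
    padded = np.full((n, max_len, dim), pad_value, dtype=np.float32)
    mask = np.zeros((n, max_len), dtype=bool)
    for i, s in enumerate(series_list):
        t = min(len(s), max_len)
        padded[i, :t] = s[:t]
        mask[i, :t] = True
    return padded, mask


# Two toy series with 3 and 5 time steps, 2 features each
series = [np.ones((3, 2)), np.ones((5, 2))]
padded, mask = pad_and_mask(series)

# Pairwise mask of shape (N, T, T): a query step may attend to a key step
# only if both are real (non-padded) observations
attn_mask = mask[:, :, None] & mask[:, None, :]
```

Padded positions still occupy memory and compute, so this trades efficiency for the convenience of mixed-length batches.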


Hi Leonardo,

This may be tricky to do currently, since it requires modifications that allow masks in the configured inputs. Let’s discuss the possibility of adding this functionality.

@marvinschmitt @elseml @paul.buerkner @valentin What do you think about adding two optional configurator keys, which are propagated to the appropriate networks when present?

inference_mask: ...,
summary_mask: ...,

@leo To get you started quickly, you can make a custom modification to the existing time series transformer.

Can you write out some example code to showcase the syntax you have in mind?

@paul.buerkner The syntax will be the one above. Currently, our configuration dictionaries have some combination of the keys:

conf = {
    "parameters": ...,

I am proposing to allow for additional optional keys:

conf = {
    "parameters": ...,

which can hold masks for each of the networks’ outputs and are propagated to the associated networks by the Amortizer, if they are present.
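As a rough illustration of the propagation step, the routing could be sketched as below. Everything here is an assumption made for the example: `split_mask_kwargs` is a hypothetical function, and the `mask` keyword the networks would receive is a placeholder name, not the actual Amortizer API.

```python
def split_mask_kwargs(conf):
    """Hypothetical routing of optional mask keys to their networks.

    Returns (summary_kwargs, inference_kwargs). A mask is forwarded only if
    its key is present and not None, so existing configurations without
    masks behave exactly as before.
    """
    summary_kwargs = {}
    inference_kwargs = {}
    if conf.get("summary_mask") is not None:
        summary_kwargs["mask"] = conf["summary_mask"]
    if conf.get("inference_mask") is not None:
        inference_kwargs["mask"] = conf["inference_mask"]
    return summary_kwargs, inference_kwargs


# A configuration that only carries a summary mask (placeholder values)
conf = {"parameters": [[0.1, 0.2]], "summary_mask": [[True, True, False]]}
summary_kwargs, inference_kwargs = split_mask_kwargs(conf)
```

Keeping the keys optional means nothing changes for users who never supply a mask.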

Thanks for the details. That looks reasonable to me!

Agree, optional keys for specific use cases are an elegant solution.