Low training speed when training on the GPU

I am currently trying to shift training from the CPU to the GPU. In tests with some simple TensorFlow benchmark models, TensorFlow behaves as expected, with a decent improvement in training speed. However, when I try to use my GPU for training in BayesFlow, training slows down roughly by a factor of 2. I checked this with one of the example notebooks (6. Posterior Estimation for SIR-like Models — BayesFlow: Amortized Bayesian Inference), with the same result.

Does anyone have insight into why this is the case? Are my BayesFlow networks possibly just not big enough to fully take advantage of the GPU? (A quick matmul timing check along those lines is sketched below.)

My system configuration is as follows:

CPU: i9-10900
GPU: RTX 3080-Ti
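
One way to test the "not big enough" hypothesis independently of BayesFlow is to time the same matrix multiplication on CPU and GPU for a few sizes. This is only a rough sketch; the sizes and repeat counts are arbitrary, and the crossover point where the GPU starts to win depends on the hardware:

```python
import time
import tensorflow as tf

def time_matmul(device, n, repeats=50):
    """Average time for an (n x n) matmul on the given device."""
    with tf.device(device):
        a = tf.random.normal((n, n))
        b = tf.random.normal((n, n))
        tf.matmul(a, b)  # warm-up, so one-time setup cost is not measured
        start = time.perf_counter()
        for _ in range(repeats):
            c = tf.matmul(a, b)
        _ = c.numpy()  # block until the queued device work has finished
        return (time.perf_counter() - start) / repeats

for n in (64, 256, 1024, 4096):
    cpu_t = time_matmul("/CPU:0", n)
    gpu_t = time_matmul("/GPU:0", n)
    print(f"n={n:5d}  CPU {cpu_t * 1e3:8.2f} ms   GPU {gpu_t * 1e3:8.2f} ms")
```

For small matrices the CPU is typically on par or faster, which would be consistent with the slowdown described above.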

Hi Niels,

If the GPU is configured correctly, then it’s probably the case that your networks are too small to benefit from efficient large matrix multiplications, so the gains are offset by copy operations between host and device. I would also suggest wrapping the configurator in a @tf.function decorator and switching to tf operations wherever possible, since fetching and configuring the data at each step are limiting factors.
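
A minimal sketch of what such a wrapped configurator could look like. The dictionary keys ("prior_draws", "sim_data", "parameters", "summary_conditions") and the normalization constants are assumptions for illustration, not necessarily the exact BayesFlow interface:

```python
import tensorflow as tf

# Assumed normalization constants, purely for illustration.
PRIOR_MEAN = tf.constant([0.0, 0.0], dtype=tf.float32)
PRIOR_STD = tf.constant([1.0, 1.0], dtype=tf.float32)

@tf.function  # traces the configurator into a graph, avoiding per-call Python overhead
def configurator(forward_dict):
    # Use tf ops (not numpy) so the computation can stay on the device.
    params = (tf.cast(forward_dict["prior_draws"], tf.float32) - PRIOR_MEAN) / PRIOR_STD
    return {
        "parameters": params,
        "summary_conditions": tf.cast(forward_dict["sim_data"], tf.float32),
    }
```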

Hi Stefan,

thank you very much for the reply. The network only has 762,392 parameters, so it being too small to take advantage of the GPU makes sense.

Wrapping the configurator did increase performance a bit, but in the end the CPU still seems to be faster on this problem, even if I artificially increase the network to close to 8 million parameters (I did this quite naively by just adding more coupling layers and such, which probably does not have the needed effect on matrix sizes).

The configurator basically only normalizes prior draws and extracts the simulated data from the input dictionary. I will take a deeper look into TensorFlow operations, but from my understanding the configurator is not called that often during offline training, so I am not sure this will result in the desired speed increase.
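
One way to check that reasoning is to time the configurator in isolation and compare it to the per-iteration cost reported by the trainer. A rough sketch with made-up shapes, key names, and normalization constants:

```python
import time
import numpy as np

def configurator_np(forward_dict):
    # numpy version: normalizes prior draws, passes simulated data through
    params = (forward_dict["prior_draws"] - 0.5) / 0.25
    return {
        "parameters": params.astype(np.float32),
        "summary_conditions": forward_dict["sim_data"].astype(np.float32),
    }

batch = {
    "prior_draws": np.random.rand(128, 2),
    "sim_data": np.random.rand(128, 100, 3),
}

n_calls = 1000
start = time.perf_counter()
for _ in range(n_calls):
    configurator_np(batch)
per_call = (time.perf_counter() - start) / n_calls
print(f"~{per_call * 1e6:.1f} microseconds per configurator call")
# At roughly 100 iterations/s one training step takes about 10 ms, so a
# configurator in the microsecond range cannot explain a 2x slowdown.
```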

Just a quick note: I can reproduce this. For that notebook I see a drop from 103 iterations/s to 75 iterations/s when activating the GPU. To test the speed, I also ran the trainer without the configurator, and it did not improve, so I agree that the configurator is not to blame here.
So it is either TensorFlow not being able to take advantage of the GPU in this case, or BayesFlow doing some non-optimal moving of data internally (a device-placement check is sketched below).
Thanks for reporting it, we will keep an eye on it and at some point test it with the upcoming version of BayesFlow.
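
For anyone who wants to check whether ops or data end up on an unexpected device, plain TensorFlow can log where each operation is placed. This is a TensorFlow feature, not something BayesFlow-specific:

```python
import tensorflow as tf

# Must be enabled before the ops in question are executed / the trainer is built.
tf.debugging.set_log_device_placement(True)

# Every op executed afterwards logs the device it was placed on, e.g.:
a = tf.random.normal((256, 256))
b = tf.random.normal((256, 256))
c = tf.matmul(a, b)  # the log should show .../device:GPU:0 if the GPU is used
```

The output is verbose, so it is best enabled only for a few iterations when hunting for unexpected host-to-device copies.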
