In the tutorials, there are several examples of different training losses. In some cases, the loss reaches -2, but in others (e.g., the ODE example), it reaches -15. What are the criteria for an acceptable loss value? If the training loss is -4, but the model cannot properly recover the ground truth for some parameters, what does that imply?
Hi Ali, the loss lacks the constant entropy term that would shift its minimum to 0, so the optimal value depends on the problem.
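To see why the optimum is problem-dependent, consider the simplest case: the average negative log-likelihood of a perfectly fitted flow equals the differential entropy of the target, which for a Gaussian is d/2 · log(2πe·σ²) and can be arbitrarily negative. The toy sketch below (plain NumPy, not tied to any particular library) shows the best achievable loss changing with dimension and scale:

```python
import numpy as np

rng = np.random.default_rng(0)

def best_achievable_nll(d, sigma, n=100_000):
    """Average NLL of N(0, sigma^2 I_d) samples under the *true* density,
    i.e., the loss a perfectly trained flow would converge to."""
    x = rng.normal(0.0, sigma, size=(n, d))
    log_p = (-0.5 * np.sum((x / sigma) ** 2, axis=1)
             - d * np.log(sigma)
             - 0.5 * d * np.log(2 * np.pi))
    return -log_p.mean()

# The optimum matches the differential entropy d/2 * log(2*pi*e*sigma^2):
for d, sigma in [(2, 1.0), (2, 0.1), (10, 0.1)]:
    analytic = 0.5 * d * np.log(2 * np.pi * np.e * sigma ** 2)
    print(f"d={d:2d}, sigma={sigma}: empirical {best_achievable_nll(d, sigma):+.3f}, "
          f"analytic {analytic:+.3f}")
```

For d = 2 and σ = 1 the optimum is about +2.8, while for d = 10 and σ = 0.1 it is about -8.8, so a loss of -4 in one problem can be excellent and in another far from converged.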
In general, the more parameters (i.e., latent dimensions) you have, the smaller the loss will be, and the less diagnostic it is for the recovery of individual parameters. Within an application, however, you can compare and rank different architectures (for example, a spline vs. an affine coupling), but you need additional diagnostics to determine actual performance in terms of posterior fidelity or precision / contraction.
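One such diagnostic is simulation-based calibration: the rank of each ground-truth parameter among draws from the fitted posterior should be uniform if the posterior is faithful, regardless of the loss value. A minimal sketch on a hypothetical conjugate toy model (where the exact posterior is known, standing in for a trained approximator):

```python
import numpy as np

rng = np.random.default_rng(1)

def sbc_ranks(n_sims=1000, n_post=99):
    """Simulation-based calibration for a toy model:
    theta ~ N(0, 1), x | theta ~ N(theta, 1), exact posterior N(x/2, 1/2).
    A faithful posterior yields ranks uniform on {0, ..., n_post}."""
    ranks = np.empty(n_sims, dtype=int)
    for i in range(n_sims):
        theta = rng.normal()                                  # draw from prior
        x = rng.normal(theta, 1.0)                            # simulate data
        post = rng.normal(x / 2, np.sqrt(0.5), size=n_post)   # posterior draws
        ranks[i] = np.sum(post < theta)                       # rank statistic
    return ranks

ranks = sbc_ranks()
print("mean rank:", ranks.mean())  # ~ 49.5 for a calibrated posterior
```

In practice you would replace the exact posterior draws with samples from your trained network; a skewed or U-shaped rank histogram then flags miscalibration that the loss alone cannot reveal.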