Question about ordered model output and context variables

You can continue training by setting the optimizer's learning rate to (approximately) match the learning rate at the step where you stopped, e.g. by evaluating the cosine-decay schedule at step t (the Keras `CosineDecay` scheduler can do this for you).
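As a sketch, here is one way to compute that value. The function below mirrors the formula used by `tf.keras.optimizers.schedules.CosineDecay` in pure Python; the `initial_lr`, `decay_steps`, and `step` values are hypothetical placeholders, so substitute the ones from your original run.

```python
import math

def cosine_decayed_lr(initial_lr, decay_steps, step, alpha=0.0):
    """Learning rate at `step` under cosine decay.

    Mirrors the formula used by tf.keras.optimizers.schedules.CosineDecay:
    lr = initial_lr * ((1 - alpha) * 0.5 * (1 + cos(pi * step / decay_steps)) + alpha)
    """
    step = min(step, decay_steps)  # the schedule is flat after decay_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / decay_steps))
    return initial_lr * ((1.0 - alpha) * cosine + alpha)

# Hypothetical example: training stopped at step 4000 of a 10000-step schedule.
resume_lr = cosine_decayed_lr(initial_lr=1e-3, decay_steps=10_000, step=4_000)
```

You can then build a fresh optimizer with `learning_rate=resume_lr`, or pass the original schedule object directly and advance the optimizer's iteration counter to the stopping step so the decay continues from there.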