Line 20:
* Train for 4x as long, until the training loss and validation loss both flatten.
* Increase or decrease the learning rate by one order of magnitude.
* Make sure the batch size is a multiple of 2. Try increasing it to get more stable gradient updates, or decreasing it to get faster iterations with more noise.
* Try disabling any tricks you are using, such as dropout.
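
The suggestions above can be tried one at a time against a baseline run. The following is a minimal sketch in plain Python, assuming a hypothetical <code>TrainConfig</code> with illustrative field names and default values (none of these come from a particular library). It only builds the variant configurations; the training loop itself is left to whatever framework is in use.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainConfig:
    # Hypothetical baseline settings; names and values are illustrative only.
    epochs: int = 10
    learning_rate: float = 1e-3
    batch_size: int = 32
    dropout: float = 0.5


def halve_to_even(n):
    """Roughly halve n, rounding down to a multiple of 2 (minimum 2)."""
    return max(2, (n // 4) * 2)


def debugging_variants(base):
    """Return (label, config) pairs, each changing exactly one setting."""
    return [
        ("train 4x longer", replace(base, epochs=base.epochs * 4)),
        ("lr x10", replace(base, learning_rate=base.learning_rate * 10)),
        ("lr /10", replace(base, learning_rate=base.learning_rate / 10)),
        ("batch x2", replace(base, batch_size=base.batch_size * 2)),
        ("batch /2", replace(base, batch_size=halve_to_even(base.batch_size))),
        ("no dropout", replace(base, dropout=0.0)),
    ]


if __name__ == "__main__":
    for label, cfg in debugging_variants(TrainConfig()):
        print(f"{label}: {cfg}")
```

Changing a single setting per run makes it easier to attribute any change in the loss curves to that setting.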