- Exploding gradients occur when the gradients of the weights become so large that they overflow to NaN
- Why does it occur?
	- If the per-layer gradients are greater than 1 and the network is deep, the gradient accumulates multiplicatively during backpropagation into a very large number
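The compounding effect described above can be sketched numerically (the layer count and per-layer gradient value below are illustrative assumptions, not from the notes):

```python
# Toy backprop through a deep chain: each layer multiplies the incoming
# gradient by its local derivative. Values above 1 compound exponentially.
local_grad = 1.5   # hypothetical per-layer gradient magnitude (> 1)
depth = 100        # hypothetical network depth

grad = 1.0
for _ in range(depth):
    grad *= local_grad  # chain rule: gradients multiply layer by layer

print(grad)  # on the order of 1.5**100 ≈ 4e17 — already enormous
```

With a few more layers, or larger local gradients, this exceeds the float32 range and overflows to inf/NaN.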
- How to identify it?
- The model weights quickly become very large during training
	- Model weights go to NaN during training
	- The error gradient is consistently above 1.0 for each node and layer during training
- What to do?
- Revise what to do
- Create flash card
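One standard remedy worth adding to the "What to do?" list is gradient clipping. A minimal sketch of clipping by global norm (the `max_norm` threshold is a tunable hyperparameter, not a value from these notes):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    # Compute the combined L2 norm across all gradient arrays.
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # If it exceeds the threshold, scale every gradient down uniformly
    # so the direction is preserved but the magnitude is capped.
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# Usage: an exploded gradient gets rescaled to the threshold norm.
big_grads = [np.array([3e8, 4e8]), np.array([0.0])]
clipped = clip_by_global_norm(big_grads, max_norm=1.0)
print(np.sqrt(sum(np.sum(g ** 2) for g in clipped)))
```

Deep-learning frameworks ship this as a built-in (e.g. `torch.nn.utils.clip_grad_norm_` in PyTorch, `tf.clip_by_global_norm` in TensorFlow).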