Exploding Gradient
- Exploding gradient occurs when the gradients of the weights become so large that they overflow to NaN
- Why does it occur?
- If each layer's local gradient is greater than 1 and the network is deep, backpropagation multiplies these values together, so the gradient grows exponentially with depth
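A toy numeric sketch of this accumulation (plain Python, not tied to any framework): multiplying a per-layer gradient factor greater than 1 over many layers overflows the float range.

```python
import math

# simulate backprop through many layers, each contributing a local gradient of 1.5
grad = 1.0
for layer in range(5000):
    grad *= 1.5  # repeated multiplication -> exponential growth

print(grad)              # overflows to inf
print(math.isinf(grad))  # True
```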
- How to identify?
- The model weights quickly become very large during training
- Model weights go to NaN
- The error gradient is consistently above 1.0 for each node and layer during training
- What to do?
- Decrease the network depth
- Use LSTM units (for recurrent networks)
- Gradient clipping
- L1 (Lasso) regularization
- L2 (Ridge) regularization
TODO
- Revise what to do
- Create flash card