Exploding Gradient

  • Exploding gradients occur when the gradients with respect to the weights grow so large that the values overflow and become NaN
  • Why it occurs?
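    • Backpropagation multiplies gradient terms layer by layer; if these terms are consistently greater than 1 in magnitude and the network is very deep, the product grows exponentially with depth into a very large number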
  • How to identify?
    • The model weights quickly become very large during training
    • Model weights go to NaN
    • The error gradient is consistently above 1.0 for each node and layer during training
  • What to do?
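
A rough sketch of the "why" and "how to identify" points above, in plain Python. It assumes every layer scales the gradient by the same constant factor, which real networks do not do exactly. The names backprop_gradient_magnitude and looks_exploded, and the 1e6 threshold, are made up for illustration and are not part of any library.

```python
import math


def backprop_gradient_magnitude(per_layer_factor: float, depth: int) -> float:
    """Gradient magnitude after passing back through `depth` layers.

    Simplifying assumption: each layer multiplies the gradient by the
    same constant `per_layer_factor`.
    """
    grad = 1.0
    for _ in range(depth):
        grad *= per_layer_factor  # float overflow yields inf, not an exception
    return grad


def looks_exploded(values, threshold: float = 1e6) -> bool:
    """Return True if any value is inf/NaN or exceeds `threshold` in magnitude.

    The threshold is an arbitrary illustrative choice, not a standard value.
    """
    return any(not math.isfinite(v) or abs(v) > threshold for v in values)


# A factor above 1 grows exponentially with depth; a factor below 1 shrinks.
for depth in (10, 100, 2000):
    print(depth, backprop_gradient_magnitude(1.5, depth))

# Once a value has overflowed to inf, further arithmetic can produce NaN.
overflowed = backprop_gradient_magnitude(1.5, 2000)
print(overflowed - overflowed)
```

In a real training loop the same kind of check would be run on the gradient norm or the weights after each step, so the problem is caught before everything turns into NaN.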

TODO

  1. Revise what to do
  2. Create flash card