Stochastic Gradient Descent (SGD)

  • SGD is a variation of Gradient Descent
  • Modern datasets can contain millions of examples, so the model often can't process all the data at once to compute the slope, as Gradient Descent does (though that would be optimal)
    • Memory issue
    • Compute issue
  • So SGD looks at one data point at a time and updates the parameters after each one
  • The problem: if the data is noisy, SGD can take a long time to converge, or it may not reach the global minimum at all and settle in a local minimum instead
  • The single-example gradient estimates suffer from high variance
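
The points above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the model (a one-variable linear fit y = w*x + b), the squared-error loss, the function name sgd_linear_regression, and the values of lr, epochs, and seed are all assumptions chosen for the example.

```python
import random


def sgd_linear_regression(xs, ys, lr=0.01, epochs=50, seed=0):
    """Fit y = w*x + b by updating on one example at a time (hypothetical helper)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        # Visit the examples in a fresh random order each epoch.
        rng.shuffle(indices)
        for i in indices:
            # Prediction error on this single example only.
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= lr * error * xs[i]
            b -= lr * error
    return w, b


# Assumed toy data: noise-free points on the line y = 2x + 1.
xs = [i / 10 for i in range(-20, 21)]
ys = [2 * x + 1 for x in xs]

w, b = sgd_linear_regression(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

Only one example is needed to compute each update, which is what avoids the memory and compute cost of a full pass per step. If random noise is added to ys, each single-example gradient becomes a noisier estimate of the true slope, and with a constant learning rate the parameters keep jittering around the minimum instead of settling on it.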