Stochastic Gradient Descent (SGD)
- SGD is a variant of Gradient Descent
- In the modern world a dataset can contain millions of examples, so unlike what plain Gradient Descent assumes, a model can't look at all the data at once to compute the gradient (though that would give the optimal estimate), because of:
  - Memory issue: the whole dataset may not fit in memory
  - Compute issue: every single update would require a pass over every example
- So SGD looks at one example at a time and updates the parameters after each one
- The problem: a single-example gradient is a noisy estimate of the true gradient, so if the data is noisy, training can take a long time to converge, or it may keep bouncing around a minimum (local or global) instead of settling at it
- In other words, the updates suffer from high variance
- For both problems, the usual fix is Mini-Batch SGD, which computes each update from a small batch of examples, a middle ground between one example and the full dataset (see the sketch after this list)
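
A minimal NumPy sketch of all three variants, assuming a simple linear-regression loss; the names here (`train`, `batch_size`, etc.) are illustrative, not from any library. The only thing that changes between full-batch Gradient Descent, SGD, and Mini-Batch SGD is how many examples feed each update:

```python
import numpy as np

def train(X, y, lr=0.1, epochs=100, batch_size=1, seed=0):
    """Minimize mean-squared error with gradient updates.

    batch_size == len(X)  -> full-batch Gradient Descent
    batch_size == 1       -> classic SGD (one example per update)
    anything in between   -> Mini-Batch SGD
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)                 # reshuffle every epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            # Gradient of 0.5 * mean((Xb @ w - yb)**2) w.r.t. w.
            # With batch_size=1 this is a noisy (high-variance)
            # estimate of the full-dataset gradient.
            grad = Xb.T @ (Xb @ w - yb) / len(batch)
            w -= lr * grad
    return w

# Toy usage: recover w_true = [2.0, -3.0] from 1000 noisy examples.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
y = X @ np.array([2.0, -3.0]) + 0.1 * rng.normal(size=1000)

w_gd  = train(X, y, batch_size=len(X))   # smooth steps, each touches all data
w_sgd = train(X, y, batch_size=1)        # cheap steps, noisy path to the minimum
w_mb  = train(X, y, batch_size=32)       # the usual compromise
print(w_gd, w_sgd, w_mb)
```

A batch size in the tens to hundreds is the common compromise: averaging over a batch cancels out much of the single-example noise (lower variance), while the batch still fits comfortably in memory and vectorizes well on modern hardware.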