Gradient Descent

  • Gradient descent computes the derivative (slope) of the loss with respect to a weight, $\frac{\partial loss}{\partial w}$, at the current point
  • At a minimum this slope is 0, so the idea is to keep moving the weight in the direction that reduces the loss until the slope approaches 0
  • The weight is then updated with this formula (see the first sketch after this list)
    $$
    w_{new} = w_{old} - \alpha \cdot \frac{\partial loss}{\partial w}
    $$
  • When the point is far from the minimum, the loss and its slope are large, so the update steps are big
  • When the point is closer to the minimum, the slope is small, so the steps shrink
  • Gradient descent can get stuck in a local minimum
    ![[Pasted image 20231021131531.png]]
    • One solution is to randomly re-initialize the weights and run gradient descent again (random restarts); see the second sketch below
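
A minimal sketch of the update rule above, assuming a toy 1-D loss $(w - 3)^2$; the loss function, starting point, learning rate, and step count are illustrative choices, not from the notes.

```python
# Gradient descent on a toy 1-D loss: loss(w) = (w - 3)**2, minimum at w = 3.

def loss(w):
    return (w - 3) ** 2

def dloss_dw(w):
    # Derivative (slope) of the loss with respect to w.
    return 2 * (w - 3)

w = 10.0      # initial weight, far from the minimum
alpha = 0.1   # learning rate

for step in range(25):
    grad = dloss_dw(w)
    w = w - alpha * grad   # w_new = w_old - alpha * d(loss)/dw
    print(f"step {step:2d}  w = {w:7.4f}  slope = {grad:8.4f}  loss = {loss(w):8.4f}")

# Far from the minimum the slope is large, so the updates are big;
# as w approaches 3 the slope shrinks toward 0 and the steps get small.
```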
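A minimal sketch of the random-restart idea for local minima, assuming a toy 1-D loss $w^4 - 3w^2 + w$ that has one local and one global minimum; the number of restarts, the initialization range, and the hyperparameters are illustrative.

```python
import random

# Toy loss with a local minimum near w = 1.13 and a global minimum near w = -1.30.
def loss(w):
    return w ** 4 - 3 * w ** 2 + w

def dloss_dw(w):
    return 4 * w ** 3 - 6 * w + 1

def gradient_descent(w, alpha=0.01, steps=500):
    for _ in range(steps):
        w = w - alpha * dloss_dw(w)
    return w

best_w, best_loss = None, float("inf")
for restart in range(10):
    w0 = random.uniform(-3, 3)     # random initialization
    w = gradient_descent(w0)       # each run may land in a different minimum
    if loss(w) < best_loss:        # keep the best result across restarts
        best_w, best_loss = w, loss(w)

print(f"best w = {best_w:.4f}, loss = {best_loss:.4f}")
```

Each restart can converge to a different basin, so keeping the best run over several random initializations reduces the chance of ending up in the local minimum.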