Gradient Descent
- Gradient descent computes the derivative (slope) of the loss with respect to a weight, $\frac{\partial loss}{\partial w}$, at the current point
- The goal is to reach a point where this derivative is 0, which corresponds to a minimum of the loss
- The weight is then updated with this rule (see the sketch after this list):
$$
w_{new} = w_{old} - \alpha \, \frac{\partial loss}{\partial w}
$$
- When the point is far from the minimum, the gradient is large, so the update step is large
- When the point is close to the minimum, the gradient is small, so the update step is small
- Gradient descent can get stuck in a local minimum instead of the global one
- One workaround is to re-initialize the weights randomly, run again, and keep the best result
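
A minimal sketch of the update rule and the random-restart idea, assuming a toy 1-D loss $loss(w) = (w - 3)^2$ with its minimum at $w = 3$; the loss function, learning rate, and step count here are illustrative choices, not part of the notes above:

```python
import random

def d_loss(w):
    # Derivative of the toy loss (w - 3)^2 with respect to w.
    return 2 * (w - 3)

def gradient_descent(w_init, alpha=0.1, steps=100):
    w = w_init
    for _ in range(steps):
        # w_new = w_old - alpha * d(loss)/dw
        w = w - alpha * d_loss(w)
    return w

# Random restarts: run from several random starting points and keep the
# result with the lowest loss, one way to avoid a bad local minimum.
best_w = min(
    (gradient_descent(random.uniform(-10.0, 10.0)) for _ in range(5)),
    key=lambda w: (w - 3) ** 2,
)
print(best_w)  # converges close to 3
```

Because the step is proportional to the gradient, the updates shrink automatically as $w$ approaches the minimum, matching the far/close behavior described in the bullets.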