Sigmoid Function

[!def] Sigmoid Rule
$$
f(x) = \frac{1}{1+e^{-x}}
$$
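
For reference, differentiating the definition above with the chain rule gives a compact form for the slope. This is a standard result, written with the same $f$ and $x$ as the definition:

$$
f'(x) = \frac{e^{-x}}{(1+e^{-x})^2} = f(x)\,\bigl(1 - f(x)\bigr)
$$

The slope peaks at $f'(0) = \frac{1}{4}$ and tends to $0$ as $|x| \to \infty$.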

  • Range: $(0, 1)$ (the bounds $0$ and $1$ are approached asymptotically but never reached)
  • Typically paired with the Binary Cross Entropy loss
  • Used when a node's output should be interpreted as a probability
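  • Can lead to Vanishing Gradient: the function saturates for inputs of large magnitude (very positive or very negative), where its slope approaches zero
  • Not zero-centered: outputs are always positive, so the gradients on the incoming weights of the next layer's neuron all share the same sign, which can slow convergence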
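
The saturation behaviour described in the list above can be checked numerically. The following is a minimal sketch in plain Python; the helper names `sigmoid` and `sigmoid_grad` are illustrative choices rather than part of any library, and the branch on the sign of `x` is one common way to avoid overflow in `math.exp`, assumed here for robustness.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, computed so that math.exp never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) is small, so use the equivalent form e^x / (1 + e^x).
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid, using f'(x) = f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


# Print the value and slope at a few inputs to see the slope shrink
# as |x| grows (saturation).
for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x={x:6.1f}  f(x)={sigmoid(x):.6f}  f'(x)={sigmoid_grad(x):.6f}")
```

The slope is largest at `x = 0` and becomes very small at `x = ±10`. When several saturated sigmoid layers are stacked, these small factors multiply during backpropagation, which is the source of the vanishing gradient.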

Pasted image 20231103173221.png