Sigmoid Function
[!def] Sigmoid Rule
$$
f(x) = \frac{1}{1+e^{-x}}
$$
- Range: $(0, 1)$; the endpoints are approached asymptotically but never reached
- Typically paired with Binary Cross Entropy loss
- We use it when a node's output needs to be read as a probability
- Can lead to Vanishing Gradient, as the function saturates for inputs of large magnitude (very positive or very negative) and its derivative approaches zero there
- Not zero centered: the outputs are always positive, so the gradients of the weights in the following layer all share the same sign, which can slow convergence
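
The saturation behaviour in the list above can be checked numerically. The derivative of the sigmoid is $f'(x) = f(x)\,(1 - f(x))$, which peaks at $0.25$ for $x = 0$ and shrinks towards zero as $|x|$ grows. Below is a minimal sketch in plain Python; the function names `sigmoid` and `sigmoid_grad` are illustrative choices, not taken from any library, and the branch on the sign of `x` is just one common way to avoid overflow in `exp`.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written so that exp() never receives a large positive argument."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x).
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: f'(x) = f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


# Print the value and the gradient at a few points to see the saturation.
for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x={x:+6.1f}  f(x)={sigmoid(x):.6f}  f'(x)={sigmoid_grad(x):.6f}")
```

The gradient at $x = \pm 10$ is several orders of magnitude smaller than at $x = 0$, which is the vanishing-gradient effect described above. Every printed `f(x)` is positive, which illustrates the non-zero-centered output.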