Activation Function

  • AKA Transfer Function

A neural layer with a weight matrix $W$ and a bias $b$ can simply be defined as,
$$
y = Wx + b
$$

[!question] Why do we use an activation function?
Without an activation function, no matter how many layers we add, the whole network is still just a linear regression model and fails to learn complex patterns. In deep learning, non-linear activation functions are used because, without the non-linearity, all the layers collapse into a single linear combination of the parameters.
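For example, stacking two linear layers with no activation in between collapses back into a single linear map:
$$
y = W_2(W_1 x + b_1) + b_2 = (W_2 W_1)x + (W_2 b_1 + b_2) = W'x + b'
$$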

Non-Linear Activation Functions

  1. Sigmoid Function
  2. ReLU
  3. Tanh
  4. Softmax
  5. Softplus
  6. Softsign
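
A minimal NumPy sketch of the functions listed above (the function names and the test input are my own, purely for illustration):

```python
import numpy as np

def sigmoid(x):
    # Squashes inputs into (0, 1); outputs are not zero-centered
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Zero for negative inputs, identity for positive; very cheap gradient
    return np.maximum(0.0, x)

def tanh(x):
    # Zero-centered, squashes inputs into (-1, 1)
    return np.tanh(x)

def softmax(x):
    # Normalizes a vector into a probability distribution
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def softplus(x):
    # Smooth approximation of ReLU: log(1 + e^x)
    return np.log1p(np.exp(x))

def softsign(x):
    # x / (1 + |x|); saturates more gently than tanh
    return x / (1.0 + np.abs(x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (sigmoid, relu, tanh, softmax, softplus, softsign):
    print(f.__name__, f(x))
```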

How to choose one over the others?

  1. Zero-centricity - zero-centered outputs lead to faster convergence
  2. Computation cost - a simple, cheap-to-compute gradient
  3. Gradient anomalies - vanishing gradient, exploding gradient (a quick sketch follows this list)
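
As a rough illustration of the vanishing-gradient problem (a hypothetical 10-layer chain, not taken from any specific model): the sigmoid's derivative never exceeds 0.25, so the chain rule multiplies many small factors and the gradient reaching early layers shrinks toward zero.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # maximum value is 0.25, at x = 0

# Backpropagating through 10 sigmoid layers in the best case
# (every local gradient at its 0.25 peak) still crushes the signal.
grad = 1.0
for _ in range(10):
    grad *= sigmoid_grad(0.0)
print(grad)  # 0.25 ** 10 ≈ 9.5e-07
```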

![[Pasted image 20231103174830.png]]

![[Pasted image 20231108004119.png]]

References