Sigmoid Activation Function
Math Equation
$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$
Python Code
import numpy as np

def sigmoid(x):
    # Sigmoid activation: squashes any real input into the range (0, 1).
    return 1 / (1 + np.exp(-x))

# Example usage:
x = np.array([-2, -1, 0, 1, 2])
sigmoid_values = sigmoid(x)
print(sigmoid_values)  # ≈ [0.119 0.269 0.5 0.731 0.881]
Notes
- Commonly used in binary classification tasks, as the output values are constrained to the range (0, 1), making them interpretable as probabilities.
- Has a smooth gradient, which supports stable optimization and prevents sharp jumps in the weights during backpropagation.
- However, the Sigmoid function suffers from the vanishing gradient problem for large positive or negative values of $x$. In these regions the gradient approaches zero, making it difficult for the model to learn (a sketch of this appears after this list).
- Not zero-centered: because the outputs are always positive, the gradient updates for downstream weights tend to share the same sign, which can produce zig-zagging weight updates and slower convergence in training.
- Best used in the output layer for binary classification models, where the interpretation of the output as a probability is necessary.
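The vanishing gradient point follows directly from the derivative of the Sigmoid,

$$
\sigma'(x) = \sigma(x)\,(1 - \sigma(x))
$$

which has a maximum of 0.25 at $x = 0$ and shrinks toward zero as $|x|$ grows. Below is a minimal sketch of this behavior; the helper sigmoid_grad is not part of the original code and is introduced here only for illustration.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical helper (not in the original code): the derivative of the
# Sigmoid, sigma'(x) = sigma(x) * (1 - sigma(x)).
def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)

x = np.array([0.0, 2.0, 5.0, 10.0])
# Gradients shrink toward 0 as x grows: ≈ [0.25, 0.105, 0.0066, 4.5e-05]
print(sigmoid_grad(x))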
Hyperbolic Tangent (Tanh) Activation Function
Math Equation
$$
\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}
$$
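It can help to note that Tanh is a scaled and shifted version of the Sigmoid from the previous section, which explains why it has the same S-shape but is centered at zero:

$$
\tanh(x) = 2\,\sigma(2x) - 1
$$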
Python Code
import numpy as np

def tanh(x):
    # Hyperbolic tangent activation: squashes any real input into the range (-1, 1).
    return np.tanh(x)

# Example usage:
x = np.array([-2, -1, 0, 1, 2])
tanh_values = tanh(x)
print(tanh_values)  # ≈ [-0.964 -0.762 0. 0.762 0.964]
Notes
- The Tanh function outputs values in the range (-1, 1), making it zero-centered. This characteristic helps in faster convergence compared to Sigmoid, as weight updates tend to be more balanced (a quick comparison is sketched below).
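A minimal sketch of the zero-centered claim above: over a symmetric range of inputs (chosen purely for illustration), Sigmoid outputs average well above zero while Tanh outputs average to zero.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Symmetric sample of inputs, used only for illustration.
x = np.linspace(-3, 3, 7)

# Sigmoid outputs are always positive, so their mean sits well above 0;
# Tanh outputs are symmetric around 0, so their mean is approximately 0.
print(np.mean(sigmoid(x)))   # ≈ 0.5
print(np.mean(np.tanh(x)))   # ≈ 0.0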