Sigmoid Activation Function
Math Equation
$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$
Python Code
import numpy as np

def sigmoid(x):
    # Sigmoid activation: squashes any real input into the range (0, 1).
    return 1 / (1 + np.exp(-x))

# Example usage:
x = np.array([-2, -1, 0, 1, 2])
sigmoid_values = sigmoid(x)
print(sigmoid_values)  # ≈ [0.119 0.269 0.5 0.731 0.881]
Notes
- Commonly used in binary classification tasks, as the output values are constrained to the range (0, 1), making them interpretable as probabilities.
- Has a smooth gradient, which supports stable optimization and prevents sharp jumps in the weights during backpropagation.
- However, the Sigmoid function suffers from the vanishing gradient problem for large positive or negative values of $x$. In these regions the gradient approaches zero, making it difficult for the model to learn (a sketch of this appears after this list).
- Not zero-centered: because the outputs are always positive, the gradient updates for downstream weights tend to share the same sign, which can produce zig-zagging weight updates and slower convergence in training.
- Best used in the output layer for binary classification models, where the interpretation of the output as a probability is necessary.
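The vanishing gradient point follows directly from the derivative of the Sigmoid,

$$
\sigma'(x) = \sigma(x)\,(1 - \sigma(x))
$$

which has a maximum of 0.25 at $x = 0$ and shrinks toward zero as $|x|$ grows. Below is a minimal sketch of this behavior; the helper sigmoid_grad is not part of the original code and is introduced here only for illustration.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical helper (not in the original code): the derivative of the
# Sigmoid, sigma'(x) = sigma(x) * (1 - sigma(x)).
def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)

x = np.array([0.0, 2.0, 5.0, 10.0])
# Gradients shrink toward 0 as x grows: ≈ [0.25, 0.105, 0.0066, 4.5e-05]
print(sigmoid_grad(x))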
Hyperbolic Tangent (Tanh) Activation Function
Math Equation
$$
\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}
$$
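It can help to note that Tanh is a scaled and shifted version of the Sigmoid from the previous section, which explains why it has the same S-shape but is centered at zero:

$$
\tanh(x) = 2\,\sigma(2x) - 1
$$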
Python Code
import numpy as np

def tanh(x):
    # Hyperbolic tangent activation: squashes any real input into the range (-1, 1).
    return np.tanh(x)

# Example usage:
x = np.array([-2, -1, 0, 1, 2])
tanh_values = tanh(x)
print(tanh_values)  # ≈ [-0.964 -0.762 0. 0.762 0.964]
Notes
- The Tanh function outputs values in the range (-1, 1), making it zero-centered. This characteristic helps in faster convergence compared to Sigmoid, as weight updates tend to be more balanced (a quick comparison is sketched below).
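A minimal sketch of the zero-centered claim above: over a symmetric range of inputs (chosen purely for illustration), Sigmoid outputs average well above zero while Tanh outputs average to zero.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Symmetric sample of inputs, used only for illustration.
x = np.linspace(-3, 3, 7)

# Sigmoid outputs are always positive, so their mean sits well above 0;
# Tanh outputs are symmetric around 0, so their mean is approximately 0.
print(np.mean(sigmoid(x)))   # ≈ 0.5
print(np.mean(np.tanh(x)))   # ≈ 0.0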