Hello readers! In the last article, we looked briefly at the sigmoid activation function. In this article, we’ll look at the **Tanh Activation Function** in Python, in the context of neural networks.

Let’s get started!

## The Tanh Activation Function

An activation function is a mathematical function applied to a layer’s output. It decides how strongly each unit is “turned on” for a given input.

Tanh is one such function. It is very popular in Machine Learning literature because it is continuous and differentiable.

The tanh function is defined over the real numbers as:

`f(x) = tanh(x) = (e^(2x) - 1) / (e^(2x) + 1)`

This function takes values in the open interval (-1, 1), so the output is bounded and *normalized* no matter how large the input is. Because tanh is also smooth and differentiable everywhere, it works well with gradient-based training such as backpropagation.
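As a quick sanity check, the formula above can be translated directly into numpy and compared with `np.tanh`. This is only a minimal sketch: the helper name `tanh_from_formula` and the sample range are assumptions made for illustration.

```python
import numpy as np


def tanh_from_formula(x):
    # Direct translation of f(x) = (e^(2x) - 1) / (e^(2x) + 1)
    e2x = np.exp(2 * x)
    return (e2x - 1) / (e2x + 1)


# 101 evenly spaced sample points between -5 and 5
x = np.linspace(-5, 5, 101)
manual = tanh_from_formula(x)
builtin = np.tanh(x)

# The hand-written formula should agree with numpy's builtin
matches_builtin = np.allclose(manual, builtin)

# Every output should lie strictly inside (-1, 1) on this range
within_range = bool(np.all((manual > -1) & (manual < 1)))

print("matches np.tanh:", matches_builtin)
print("all outputs in (-1, 1):", within_range)
```

In practice `np.tanh` is the safer choice: the direct formula overflows for very large positive inputs, because `np.exp(2 * x)` becomes infinite.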

To get a visual understanding, here is the graph of Tanh(x):

The graph is very similar to the sigmoid activation function (S-shaped), which is another popular choice.
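The similarity is more than visual: tanh is a rescaled and shifted sigmoid, since `tanh(x) = 2 * sigmoid(2x) - 1`. The short check below illustrates this identity numerically. The `sigmoid` helper is defined here only for the comparison and is not taken from the earlier article.

```python
import numpy as np


def sigmoid(x):
    # Standard logistic sigmoid, with outputs in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))


x = np.linspace(-5, 5, 101)

# Stretch the sigmoid's (0, 1) range to (-1, 1) and compress its input by 2
rescaled_sigmoid = 2.0 * sigmoid(2.0 * x) - 1.0

identity_holds = np.allclose(np.tanh(x), rescaled_sigmoid)
print("tanh(x) == 2 * sigmoid(2x) - 1:", identity_holds)

# The main practical difference: tanh is centered at 0, sigmoid at 0.5
print("tanh(0) =", np.tanh(0.0), " sigmoid(0) =", sigmoid(0.0))
```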

As the graph shows, tanh maps inputs → outputs smoothly and monotonically. Strongly positive inputs are mapped close to 1, while strongly negative inputs are mapped close to -1.

This makes it a suitable choice for **binary classification**, where the two ends of the output range can stand for the two classes.

## A simple implementation of the Tanh Activation Function in Python

Let’s quickly go through a sample `tanh` function in Python, using numpy and matplotlib.

```
import numpy as np
import matplotlib.pyplot as plt


def tanh(x):
    # We can use numpy's builtin tanh
    return np.tanh(x)


def generate_sample_data(start, end, num):
    # np.linspace returns `num` evenly spaced points in [start, end];
    # the third argument is a point count, not a step size
    return np.linspace(start, end, num)


# 10 sample points between -5 and 5
x = generate_sample_data(-5, 5, 10)
y = tanh(x)

# Now plot
plt.xlabel("x")
plt.ylabel("tanh(x)")
plt.plot(x, y)
plt.show()
```

*Output*

As you can see, the curve resembles the original graph closely, even with only 10 sample points!

## Limitations of tanh Activation Function

While tanh has a lot of good properties for building classifier networks, it has to be used with care.

Tanh saturates: its curve flattens out for inputs of large magnitude, which makes it prone to the *vanishing gradient problem*.

The vanishing gradient problem is a situation where the derivatives become close to 0 (vanish), so that even a large change in the input barely changes the output. The derivative of tanh is `1 - tanh(x)^2`, which approaches 0 as the input moves away from 0 in either direction.

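This becomes a problem when you’re dealing with a large number of layers in your network, because backpropagation multiplies these small derivatives together layer by layer, and the gradient reaching the early layers can shrink toward zero. So one must always be careful about using these functions.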
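To make the effect concrete, here is a small sketch that evaluates the derivative of tanh, `1 - tanh(x)^2`, at a few inputs. The sample inputs and the 10-layer depth are arbitrary choices for illustration. Raising the local derivative to the power of the layer count is a simplification that ignores the weights; it only shows how quickly repeated multiplication by a value below 1 shrinks.

```python
import numpy as np


def tanh_grad(x):
    # Derivative of tanh: d/dx tanh(x) = 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2


# Inputs moving from the center of the curve into the saturated region
inputs = np.array([0.0, 1.0, 2.5, 5.0])
grads = tanh_grad(inputs)

# Simplified illustration (weights ignored): the same local derivative
# multiplied across a hypothetical stack of 10 layers
num_layers = 10
chained = grads ** num_layers

for x_val, g, c in zip(inputs, grads, chained):
    print(f"x = {x_val:4.1f}  gradient = {g:.6f}  after {num_layers} layers = {c:.3e}")
```

The gradient is largest (exactly 1) at `x = 0` and falls off quickly on both sides, so units that operate far from 0 pass very little gradient back to earlier layers.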

## Conclusion

In this article, we learned about the tanh activation function in Machine Learning: its definition, a simple implementation in Python, and its limitations.

## References

- Wolfram Alpha Page on Tanh function
- JournalDev article on Sigmoid Activation Function