Learning Path
Question & Answer
Choose the Best Answer
Which of the following is a common cause of the vanishing gradient problem in deep neural networks?
Weight initialization that leads to small gradients
Overfitting due to excessive training
Using too many hidden layers without activation functions
Insufficient data for training
Understanding the Answer
Let's break down why this is correct
When weights are initialized too small, each layer multiplies the gradient by a factor whose magnitude is less than one during backpropagation, so after many layers the gradient shrinks toward zero and the earliest layers barely learn. The other options are incorrect: overfitting means the model learns the training data too well, but it does not stop gradients from flowing; stacking hidden layers without activation functions simply makes the network linear and does not by itself cause vanishing gradients; and insufficient training data hurts generalization, not gradient flow.
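To make the multiplication argument concrete, here is a minimal NumPy sketch (an illustration added for this explanation, not part of the original question) that backpropagates through a stack of tanh layers whose weights are drawn with a deliberately small scale of 0.01; the gradient norm collapses toward zero as depth grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, width = 30, 128

# Forward pass through tanh layers whose weights use a deliberately
# small scale (0.01) -- an illustrative choice, not a recommendation.
weights = [rng.standard_normal((width, width)) * 0.01 for _ in range(n_layers)]
a = rng.standard_normal(width)
activations = []
for W in weights:
    a = np.tanh(W @ a)
    activations.append(a)

# Backward pass: each layer multiplies the upstream gradient by the
# tanh derivative (1 - a**2) and by W.T. With tiny weights both
# factors have magnitude below one, so the norm decays layer by layer.
grad = np.ones(width)
for W, a in zip(reversed(weights), reversed(activations)):
    grad = W.T @ (grad * (1 - a ** 2))

print(f"Gradient norm reaching the first layer: {np.linalg.norm(grad):.2e}")
```

Running the same loop with a much larger weight scale (for example 2.0) shows the opposite failure mode: the gradient norm grows without bound, which is the exploding-gradient side of the problem.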
Key Concepts
Vanishing/Exploding Gradients Problem
Difficulty: Medium | Cognitive level: Understand
Deep Dive: Vanishing/Exploding Gradients Problem
Master the fundamentals
Definition
The vanishing/exploding gradients problem poses a challenge in training deep neural networks, hindering convergence during optimization. Techniques such as normalized initialization and intermediate normalization layers have been developed to mitigate this issue and enable the training of deep networks with improved convergence rates.
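As a rough illustration of those mitigations (a sketch, not a prescribed recipe), the PyTorch code below combines normalized (Glorot/Xavier) weight initialization with intermediate batch-normalization layers; both are aimed at keeping activation and gradient magnitudes roughly constant across depth.

```python
import torch
from torch import nn

def block(width: int) -> nn.Sequential:
    """One hidden block: linear layer, intermediate normalization, tanh."""
    linear = nn.Linear(width, width)
    # Normalized (Glorot/Xavier) initialization scales weights by the
    # layer's fan-in and fan-out to preserve signal variance.
    nn.init.xavier_uniform_(linear.weight)
    nn.init.zeros_(linear.bias)
    return nn.Sequential(linear, nn.BatchNorm1d(width), nn.Tanh())

width, depth = 128, 20
model = nn.Sequential(*[block(width) for _ in range(depth)])

# Sanity check: even 20 layers deep, the gradient reaching the first
# layer's weights keeps a healthy, non-vanishing magnitude.
x = torch.randn(64, width)
model(x).sum().backward()
print(model[0][0].weight.grad.norm())
```

Batch normalization stands in here for the "intermediate normalization layers" mentioned in the definition; layer normalization would fill the same role.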