Why do deep neural networks suffer from the vanishing/exploding gradients problem during training?

Question

Seekh · Accepted Answer

During backpropagation, the gradient of the loss with respect to each weight is computed by the chain rule, which multiplies together the local derivatives of every layer between that weight and the output. If the magnitude of each factor is slightly less than one, repeated multiplication shrinks the gradient toward zero (vanishing gradients); if each is slightly greater than one, the product grows exponentially (exploding gradients). The effect is strongest in deep networks, where many such factors are multiplied, and with activation functions such as sigmoid or tanh, which saturate for large-magnitude inputs so that their derivatives become small and the shrinkage gets worse. For example, in a 10‑layer network where each layer contributes a derivative of 0.5, the gradient reaching the first layer is scaled by 0.5¹⁰ ≈ 0.001, so the earliest layers receive almost no learning signal.
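The arithmetic in the answer can be checked with a few lines of Python. This is only a minimal sketch: it assumes every layer contributes the same scalar derivative, which real networks do not, and the names gradient_scale and sigmoid_derivative are illustrative helpers, not part of any library. The factor 1.5 for the exploding case is likewise an assumed value chosen for illustration.

```python
import math


def gradient_scale(per_layer_derivative, num_layers):
    """Factor applied to the gradient after multiplying the same
    per-layer derivative across num_layers layers (simplifying assumption)."""
    return per_layer_derivative ** num_layers


def sigmoid_derivative(x):
    """Derivative of the logistic sigmoid: s(x) * (1 - s(x))."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


# Vanishing case from the answer: derivative 0.5 across 10 layers.
vanishing = gradient_scale(0.5, 10)

# Exploding case with an assumed derivative of 1.5 across 10 layers.
exploding = gradient_scale(1.5, 10)

print(vanishing)                # 0.0009765625
print(exploding)                # 57.6650390625
print(sigmoid_derivative(0.0))  # 0.25, the largest value the sigmoid derivative takes
print(sigmoid_derivative(5.0))  # much smaller once the unit saturates
```

Because the sigmoid derivative never exceeds 0.25, a stack of sigmoid layers shrinks the gradient even in the best case, unless the weights are large enough to compensate.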
