📚 Learning Guide
Vanishing/Exploding Gradients Problem
medium

What is a primary cause of the vanishing gradients problem in deep neural networks?

Master this concept with our detailed explanation and step-by-step learning approach

Learning Path

Question & Answer
1. Understand Question
2. Review Options
3. Learn Explanation
4. Explore Topic

Choose the Best Answer

A. Weight initialization that leads to small gradients

B. Overfitting due to excessive training

C. Using too many hidden layers without activation functions

D. Insufficient data for training

Understanding the Answer

Let's break down why this is correct

Answer

The main reason gradients disappear in deep networks is that the chain rule multiplies many small numbers when back-propagating through many layers, so each weight update becomes tiny. In practice, activation functions like sigmoid or tanh squash large inputs into a narrow range, producing derivatives less than one; multiplying these derivatives repeatedly makes the gradient shrink toward zero. Because each layer's gradient is a product of the derivatives of all the layers above it, a deep network can see gradients that are effectively zero at the earliest layers, preventing learning. For example, if every layer's derivative is 0.5, after ten layers the gradient is \(0.5^{10} \approx 0.001\), far too small to drive meaningful weight updates.
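As a rough numerical sketch of that chain-rule product (the depth and the zero pre-activations below are made-up illustrative values, not part of the question), the following NumPy snippet multiplies ten sigmoid derivatives together and prints how small the surviving gradient scale is:

```python
import numpy as np

# Toy illustration (assumed values): multiply per-layer derivatives along
# the backpropagation chain. The sigmoid derivative never exceeds 0.25,
# so every factor in the product is well below 1.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)

num_layers = 10                                # hypothetical depth
pre_activations = np.zeros(num_layers)         # best case: inputs centered at 0
factors = sigmoid_derivative(pre_activations)  # each factor equals 0.25

gradient_scale = np.prod(factors)
print(f"per-layer factor: {factors[0]:.2f}")                              # 0.25
print(f"gradient scale after {num_layers} layers: {gradient_scale:.2e}")  # ~9.5e-07
```

Even in this best case, where every unit sits at the steepest point of the sigmoid, the gradient reaching the first layer is already about a millionth of its original size.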

Detailed Explanation

Small initial weights are a primary cause: during backpropagation, each layer scales the gradient by its weights and by its activation derivative, so weights that start out too small multiply the gradient by a number less than one at every layer and the signal shrinks as it travels backward. The other options are incorrect: overfitting happens when a model learns the training data too well, but it does not stop gradients from flowing; stacking hidden layers without activation functions just collapses the network into a single linear map rather than causing vanishing gradients; and insufficient training data hurts generalization, not gradient propagation.
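As a minimal sketch of that effect (the depth, width, and initialization scale below are arbitrary assumptions chosen for illustration), the following NumPy code pushes an input through a sigmoid feed-forward stack with deliberately small weights and then traces the gradient norm backward through the layers:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

num_layers, width = 20, 64   # hypothetical network size
init_scale = 0.1             # deliberately small weight initialization

# Forward pass through a plain feed-forward stack, recording pre-activations.
h = rng.normal(size=width)
weights, pre_acts = [], []
for _ in range(num_layers):
    W = rng.normal(scale=init_scale, size=(width, width))
    z = W @ h
    weights.append(W)
    pre_acts.append(z)
    h = sigmoid(z)

# Backward pass: the gradient reaching each earlier layer is repeatedly
# multiplied by W^T and by the sigmoid derivative (at most 0.25).
grad = np.ones(width)
for layer, (W, z) in enumerate(zip(reversed(weights), reversed(pre_acts)), 1):
    s = sigmoid(z)
    grad = W.T @ (grad * s * (1.0 - s))
    if layer % 5 == 0:
        print(f"{layer} layers back: gradient norm = {np.linalg.norm(grad):.3e}")
# The printed norms shrink by orders of magnitude, effectively vanishing
# by the time the gradient reaches the earliest layers.
```

Re-running the sketch with a larger initialization scale or a non-saturating activation would show the gradient norm holding up much better, which is why careful initialization and ReLU-style activations are standard remedies.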

Key Concepts

Vanishing Gradients Problem
Deep Neural Networks
Gradient Descent Optimization
Topic

Vanishing/Exploding Gradients Problem

Difficulty

Medium-level question

Cognitive Level

understand

Ready to Master More Topics?

Join thousands of students using Seekh's interactive learning platform to excel in their studies with personalized practice and detailed explanations.