Question: What is the primary cause of the vanishing gradients problem in deep neural networks?

Choose the Best Answer
A. Weight initialization that leads to small gradients
B. Overfitting due to excessive training
C. Using too many hidden layers without activation functions
D. Insufficient data for training
Understanding the Answer
Let's break down why this is correct
Answer
The main reason gradients disappear in deep networks is that the chain rule multiplies many small numbers when back‑propagating through many layers, so each weight update becomes tiny. In practice, activation functions like sigmoid or tanh squash large inputs into a narrow range, producing derivatives less than one; multiplying these derivatives repeatedly makes the gradient shrink toward zero. Because the gradient reaching an early layer is the product of the derivatives of every layer above it, a deep network can see gradients that are effectively zero at the earliest layers, preventing learning. For example, if every layer's derivative is 0.5, after ten layers the gradient is \(0.5^{10} \approx 0.001\), far too small to drive a meaningful weight update.
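To make the chain-rule arithmetic concrete, here is a minimal NumPy sketch of the product of per-layer derivatives described above. The depth of ten layers and the random pre-activations are illustrative assumptions, not values taken from the question.

```python
import numpy as np

# Minimal sketch (assumed 10 layers, random pre-activations): the gradient
# reaching the first layer scales with the product of each layer's activation
# derivative. Sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) is at most 0.25,
# so the product shrinks rapidly with depth.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
pre_activations = rng.normal(size=10)        # one pre-activation per layer (assumed)
per_layer_derivs = sigmoid_derivative(pre_activations)

gradient_scale = np.prod(per_layer_derivs)   # chain-rule product across the 10 layers
print(per_layer_derivs.round(3))
print(f"effective gradient scale after 10 layers: {gradient_scale:.2e}")
```

Running this should print per-layer derivatives around 0.2 and an overall scale many orders of magnitude below one, which is the shrinkage the explanation describes.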
Detailed Explanation
When weights are set too small at the start, each layer multiplies the back-propagated gradient by a number less than one, so the signal shrinks exponentially on its way to the earlier layers and their weights barely update. The other options are incorrect: overfitting happens when a model learns the training data too well, but it does not stop gradients from flowing; stacking hidden layers without activation functions collapses the network into a single linear transformation rather than shrinking gradients; and insufficient training data limits generalization, not the size of the gradients.
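As a rough illustration of the initialization point, the sketch below pushes a gradient backward through a stack of tanh layers and compares deliberately tiny weights against a Glorot-style scale. The depth, width, and scales are assumptions chosen only to make the effect visible, not values from the question.

```python
import numpy as np

# Minimal sketch (assumed depth, width, and scales): push a gradient of ones
# backward through a stack of tanh layers and compare how its norm shrinks
# for tiny random weights versus Xavier/Glorot-scaled weights.

def backprop_gradient_norm(weight_scale, depth=20, width=64, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=width)
    weights, pre_acts = [], []
    for _ in range(depth):                              # forward pass
        W = rng.normal(scale=weight_scale, size=(width, width))
        z = W @ x
        weights.append(W)
        pre_acts.append(z)
        x = np.tanh(z)
    grad = np.ones(width)                               # backward pass
    for W, z in zip(reversed(weights), reversed(pre_acts)):
        grad = W.T @ (grad * (1.0 - np.tanh(z) ** 2))   # chain rule through tanh
    return np.linalg.norm(grad)

small_scale = 0.01                                      # deliberately tiny weights
xavier_scale = np.sqrt(1.0 / 64)                        # Glorot-style scale for width 64
print(f"gradient norm with tiny init:   {backprop_gradient_norm(small_scale):.2e}")
print(f"gradient norm with Xavier init: {backprop_gradient_norm(xavier_scale):.2e}")
```

With the tiny scale the printed gradient norm should be many orders of magnitude smaller than with the Glorot-style scale, which is why careful weight initialization matters for deep networks.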
Key Concepts
Vanishing Gradients Problem
Deep Neural Networks
Gradient Descent Optimization
Topic
Vanishing/Exploding Gradients Problem
Difficulty
Medium
Cognitive Level
Understand
Practice Similar Questions
Test your understanding with related questions
1. In the context of training deep neural networks, which of the following scenarios best illustrates the impact of the vanishing/exploding gradients problem on backpropagation, training stability, and the risk of overfitting? (Hard, Computer Science)
2. Which of the following scenarios best exemplifies the vanishing/exploding gradients problem in neural networks? (Easy, Computer Science)
3. Which of the following techniques can help mitigate the vanishing or exploding gradients problem in deep neural networks? Select all that apply. (Hard, Computer Science)
4. Why do deep neural networks suffer from the vanishing/exploding gradients problem during training? (Easy, Computer Science)
5. What is the primary cause of the degradation problem in deep networks as they increase in depth? (Easy, Computer Science)