Choose the Best Answer

Which factor primarily determines whether gradients vanish or explode as they propagate backward through a deep network?
A. Learning rate
B. Weight initialization
C. Activation function derivatives
D. Batch size
Understanding the Answer
Let's break down why this is correct.
Answer
The gradient of the loss with respect to each weight depends on the derivative of the activation function. If the activation's derivative is very small (e.g., sigmoid or tanh in their saturated regions), successive layers multiply these tiny numbers during backpropagation, causing the gradient to shrink toward zero: this is the vanishing gradient problem. Conversely, if the product of each layer's derivative and weight is consistently greater than one, repeated multiplication makes the gradient grow without bound: this is the exploding gradient problem.
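A minimal sketch of this multiplicative effect, assuming a hypothetical 30-layer chain with randomly drawn weights and pre-activations (the depth, seed, and values here are illustrative, not from the question):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # never exceeds 0.25

rng = np.random.default_rng(0)
depth = 30
grad = 1.0  # gradient arriving from the loss at the output layer

for _ in range(depth):
    pre_activation = rng.normal()  # hypothetical pre-activation value
    weight = rng.normal()          # hypothetical weight on this layer
    # Chain rule: each layer contributes (weight * activation derivative).
    grad *= weight * sigmoid_deriv(pre_activation)

print(f"Gradient magnitude after {depth} layers: {abs(grad):.3e}")

Because each sigmoid derivative is at most 0.25, the running product shrinks roughly geometrically, and the printed magnitude ends up many orders of magnitude below 1.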
Detailed Explanation
The derivative of an activation function tells us how much the output changes when the input changes, and backpropagation multiplies these derivatives layer by layer, so they directly control whether the gradient shrinks or grows. The other options are incorrect: the learning rate controls how big a step we take when updating weights, not how the gradient itself behaves through the layers; weight initialization sets the starting values of weights, which can delay or aggravate the problem but is not its mechanism; batch size affects the noise in the gradient estimate, not its multiplicative growth or decay.
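To see concretely why the derivative is the deciding factor, a short sketch comparing the maximum derivatives of common activations (these bounds follow from the standard closed-form derivatives):

import numpy as np

x = np.linspace(-6.0, 6.0, 1001)

sig = 1.0 / (1.0 + np.exp(-x))
sigmoid_deriv = sig * (1.0 - sig)   # peaks at 0.25 at x = 0
tanh_deriv = 1.0 - np.tanh(x) ** 2  # peaks at 1.0 at x = 0
relu_deriv = (x > 0).astype(float)  # exactly 1.0 for all active inputs

print("max sigmoid derivative:", sigmoid_deriv.max())  # ~0.25
print("max tanh derivative:   ", tanh_deriv.max())     # ~1.0
print("max ReLU derivative:   ", relu_deriv.max())     # 1.0

A product of L sigmoid derivatives is bounded above by 0.25**L, which collapses quickly as depth grows; ReLU's derivative of exactly 1 on active units is one standard remedy for this.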
Key Concepts
backpropagation
activation functions
Topic
Vanishing/Exploding Gradients Problem
Difficulty
Medium
Cognitive Level
Understand