📚 Learning Guide
Vanishing/Exploding Gradients Problem
medium

Which factor in backpropagation is significantly affected by the choice of activation functions, leading to issues like vanishing or exploding gradients?

Master this concept with our detailed explanation and step-by-step learning approach

Learning Path

1. Understand Question
2. Review Options
3. Learn Explanation
4. Explore Topic

Choose the Best Answer

A. Learning rate

B. Weight initialization

C. Activation function derivatives

D. Batch size

Understanding the Answer

Let's break down why this is correct

Answer

The gradient of the loss with respect to each weight depends on the derivative of the activation function at every layer between that weight and the loss. If the activation's derivative is small (e.g., sigmoid or tanh in their saturated, squashing regions), backpropagation multiplies these small numbers layer after layer, so the gradient shrinks toward zero; this is the vanishing gradients problem. Conversely, if the per-layer factors are larger than one (e.g., because of large weights combined with a derivative near one), the repeated multiplication makes the gradient grow exponentially, which is the exploding gradients problem.
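To make this concrete, here is a minimal NumPy sketch (an illustrative example with hypothetical numbers, not part of the original question) that multiplies per-layer factors across a 20-layer chain. The sigmoid derivative never exceeds 0.25, so the product collapses quickly; a per-layer factor above one blows up instead.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x = 0

# Hypothetical 20-layer chain: backprop picks up one activation-derivative
# factor per layer (weights omitted to isolate the derivative effect).
grad = 1.0                          # gradient arriving from the loss
for z in np.zeros(20):              # pre-activations at sigmoid's best case
    grad *= sigmoid_derivative(z)   # multiply by at most 0.25 per layer

print(grad)        # 0.25**20 ~ 9.1e-13: the gradient has effectively vanished

# Exploding case (hypothetical): if each layer instead contributes a factor
# larger than one (large weights times a derivative near 1), the product
# grows exponentially with depth.
print(1.5 ** 20)   # ~3325: the gradient explodes
```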

Detailed Explanation

The derivative of an activation function tells us how much the output changes when the input changes, and backpropagation chains these derivatives across layers, so their size directly controls how the gradient scales as it flows backward. The other options are incorrect: the learning rate controls how big a step we take when updating weights, not how the gradient itself behaves; weight initialization sets the starting values of weights and can soften or worsen the problem, but it is not the factor determined by the choice of activation function; batch size affects the noise in the gradient estimate, not its layer-by-layer scaling.
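A short sketch (hypothetical numbers, for illustration only) of why the learning rate is a different knob: it rescales the update after the gradient has been computed, so it cannot rescue a gradient that has already collapsed through the chain of activation derivatives.

```python
# Gradient after 20 saturated sigmoid layers, as in the earlier sketch.
vanished_grad = 0.25 ** 20

# The SGD update is lr * grad: scaling lr shifts the magnitude linearly,
# but the update stays vanishingly small for any reasonable learning rate.
for lr in (0.01, 0.1, 1.0, 10.0):
    update = lr * vanished_grad
    print(f"lr={lr:<5} update magnitude = {update:.2e}")
```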

Key Concepts

backpropagation
activation functions
Topic

Vanishing/Exploding Gradients Problem

Difficulty

medium level question

Cognitive Level

understand
