Choose the Best Answer

Which factor primarily determines whether gradients vanish or explode as they propagate backward through a deep network?
A. Learning rate
B. Weight initialization
C. Activation function derivatives
D. Batch size
Understanding the Answer
Let's break down why this is correct.
Answer
The gradient of the loss with respect to each weight depends on the derivative of the activation function. If the activation's derivative is very small (e.g., sigmoid or tanh in their saturated regions), successive layers multiply these tiny numbers during backpropagation, causing the gradient to shrink toward zero: this is the vanishing gradient problem. Conversely, if the product of each layer's derivative and weight is consistently greater than one, repeated multiplication makes the gradient grow without bound: this is the exploding gradient problem.
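A minimal sketch of this multiplicative effect, assuming a hypothetical 30-layer chain with randomly drawn weights and pre-activations (the depth, seed, and values here are illustrative, not from the question):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # never exceeds 0.25

rng = np.random.default_rng(0)
depth = 30
grad = 1.0  # gradient arriving from the loss at the output layer

for _ in range(depth):
    pre_activation = rng.normal()  # hypothetical pre-activation value
    weight = rng.normal()          # hypothetical weight on this layer
    # Chain rule: each layer contributes (weight * activation derivative).
    grad *= weight * sigmoid_deriv(pre_activation)

print(f"Gradient magnitude after {depth} layers: {abs(grad):.3e}")

Because each sigmoid derivative is at most 0.25, the running product shrinks roughly geometrically, and the printed magnitude ends up many orders of magnitude below 1.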
Detailed Explanation
The derivative of an activation function tells us how much the output changes when the input changes, and backpropagation multiplies these derivatives layer by layer, so they directly control whether the gradient shrinks or grows. The other options are incorrect: the learning rate controls how big a step we take when updating weights, not how the gradient itself behaves through the layers; weight initialization sets the starting values of weights, which can delay or aggravate the problem but is not its mechanism; batch size affects the noise in the gradient estimate, not its multiplicative growth or decay.
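To see concretely why the derivative is the deciding factor, a short sketch comparing the maximum derivatives of common activations (these bounds follow from the standard closed-form derivatives):

import numpy as np

x = np.linspace(-6.0, 6.0, 1001)

sig = 1.0 / (1.0 + np.exp(-x))
sigmoid_deriv = sig * (1.0 - sig)   # peaks at 0.25 at x = 0
tanh_deriv = 1.0 - np.tanh(x) ** 2  # peaks at 1.0 at x = 0
relu_deriv = (x > 0).astype(float)  # exactly 1.0 for all active inputs

print("max sigmoid derivative:", sigmoid_deriv.max())  # ~0.25
print("max tanh derivative:   ", tanh_deriv.max())     # ~1.0
print("max ReLU derivative:   ", relu_deriv.max())     # 1.0

A product of L sigmoid derivatives is bounded above by 0.25**L, which collapses quickly as depth grows; ReLU's derivative of exactly 1 on active units is one standard remedy for this.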
Key Concepts
backpropagation
activation functions
Topic
Vanishing/Exploding Gradients Problem
Difficulty
Medium
Cognitive Level
Understand