Which factor in backpropagation is significantly affected by the choice of activation functions, leading to issues like vanishing or exploding gradients?

Question

Seekh · Accepted Answer

The gradient of the loss with respect to each weight depends on the derivative of the activation function. If the activation's derivative is very small (e.g., in the saturated regions of sigmoid or tanh, where the function squashes its input), backpropagation multiplies these small numbers together layer after layer, causing the gradient to shrink toward zero. This is the vanishing gradient problem. Conversely, if the derivative is large (e.
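To make the multiplication argument concrete, here is a minimal sketch in plain Python. It is not taken from the answer: the depth of 10 and the per-layer factor of 1.5 are hypothetical values chosen for illustration, and the sketch ignores the weights themselves, which in a real network also enter each per-layer factor.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z: float) -> float:
    """Derivative of the sigmoid: s(z) * (1 - s(z)), which never exceeds 0.25."""
    s = sigmoid(z)
    return s * (1.0 - s)


def chained_factor(per_layer_factor: float, depth: int) -> float:
    """Product of `depth` identical per-layer factors, as the chain rule would multiply them."""
    return per_layer_factor ** depth


depth = 10  # hypothetical network depth (assumption)

# Best case for sigmoid: the derivative peaks at z = 0, where it equals 0.25.
best_case = sigmoid_derivative(0.0)
vanishing = chained_factor(best_case, depth)

# Hypothetical per-layer factor above 1 (assumption), standing in for a large derivative.
exploding = chained_factor(1.5, depth)

print(f"per-layer sigmoid derivative at z=0: {best_case}")
print(f"product over {depth} layers (vanishing): {vanishing:.3e}")
print(f"product over {depth} layers (exploding): {exploding:.3e}")
```

Even at the sigmoid's most favorable point, ten layers shrink the factor to roughly one millionth, while a per-layer factor of 1.5 grows to roughly 58 over the same depth.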
