Choose the Best Answer
A. Attention
B. Context
C. Output
D. Input
Understanding the Answer
Let's break down why this is correct
Answer
In a transformer, the encoder uses self-attention to let each token attend to every other token in the input. The decoder also uses self-attention, but it is masked so that a token can attend only to itself and earlier positions, which prevents it from peeking at tokens it has not generated yet. In addition, the decoder has a second attention layer in which each decoder token attends to the encoder's output; this is encoder-decoder (cross) attention. The decoder's attention mechanisms are therefore masked self-attention plus cross-attention: the same operation the encoder uses, adapted for generation.
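The three attention patterns described above can be sketched in plain Python. This is a minimal illustration, not a full transformer layer: it assumes a single head and leaves out the learned query, key and value projections, and the function names `softmax` and `attention` as well as the toy vectors are made up for this example.

```python
import math

def softmax(scores):
    # Numerically stable softmax; a score of -inf receives weight 0.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values, causal=False):
    """Scaled dot-product attention over lists of vectors (single head, no projections)."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if causal and j > i:
                # Causal mask: position i may not look at later positions.
                scores.append(float("-inf"))
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(w * v[c] for w, v in zip(weights, values))
               for c in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy vectors standing in for token representations (illustrative values only).
encoder_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_in = [[0.5, 0.5], [1.0, 0.0]]

# Encoder self-attention: every input token sees every input token.
_, enc_w = attention(encoder_out, encoder_out, encoder_out)
# Decoder masked self-attention: each token sees itself and earlier tokens only.
_, self_w = attention(decoder_in, decoder_in, decoder_in, causal=True)
# Cross-attention: decoder tokens are the queries, encoder output supplies keys and values.
_, cross_w = attention(decoder_in, encoder_out, encoder_out)
```

In `self_w`, the first decoder position puts all of its weight on itself because the later position is masked out, while each row of `cross_w` spreads its weight over all three encoder positions.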
Detailed Explanation
The decoder's main job is to produce the next word in the output sequence, so Output is the correct choice. The other options are incorrect: Attention is a tool inside the decoder, not the thing the decoder produces; Context is what the decoder receives from the encoder, not what it creates; and Input is what the decoder consumes, not what it generates.
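The idea that the decoder produces the output one token at a time can be sketched as a greedy decoding loop. This is an illustration under stated assumptions: `toy_decoder` is a hypothetical stand-in for a real decoder stack (it simply copies the context and then emits an end marker), and the `<s>` and `</s>` marker strings are arbitrary choices for this example.

```python
def greedy_decode(next_token_scores, context, bos="<s>", eos="</s>", max_len=10):
    """Generate tokens one at a time, always picking the highest-scoring next token."""
    output = [bos]
    for _ in range(max_len):
        # The decoder sees the encoder's context plus everything generated so far.
        scores = next_token_scores(context, output)
        token = max(scores, key=scores.get)
        output.append(token)
        if token == eos:
            break
    return output[1:]  # drop the start marker

def toy_decoder(context, generated):
    # Hypothetical scoring function: favors copying the context token at the
    # current position, then the end marker once the context is exhausted.
    pos = len(generated) - 1  # tokens produced so far, excluding the start marker
    target = context[pos] if pos < len(context) else "</s>"
    vocab = set(context) | {"</s>"}
    return {tok: (1.0 if tok == target else 0.0) for tok in vocab}

result = greedy_decode(toy_decoder, ["the", "cat", "sat"])
print(result)  # ['the', 'cat', 'sat', '</s>']
```

Attention and context appear only as inputs to the scoring step; what the loop returns is the output sequence.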
Key Concepts
Transformer Architecture
Attention Mechanism
Sequence Generation
Topic
Transformer Architecture
Difficulty
Hard
Cognitive Level
Understand