Choose the Best Answer
A. Attention
B. Context
C. Output
D. Input
Understanding the Answer
Let's break down why this is correct
Answer
In a transformer encoder, the only attention used is self-attention, in which each token attends to every token in the input, including itself. The decoder also uses self-attention, but it is masked so that each position can attend only to itself and earlier positions. In addition, the decoder has a second attention layer, encoder-decoder attention (also called cross-attention), in which each decoder position attends to all encoder outputs. The decoder's attention mechanism is therefore a combination of masked self-attention and encoder-decoder attention, which lets it draw on both the tokens it has already generated and the encoded source sequence. For example, when generating the word “cat”, the decoder's masked self-attention looks at the previously generated word “the”, and its cross-attention then consults the encoder's representation of the input sentence to decide that “cat” is the correct next word.
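The paragraph above describes two attention steps inside each decoder layer. The minimal Python sketch below illustrates them with plain lists: scaled dot-product attention with an optional causal mask, followed by cross-attention whose queries come from the decoder and whose keys and values come from the encoder. It is a simplification under stated assumptions: the raw vectors are used directly as queries, keys, and values (no learned projection matrices), and multi-head attention, residual connections, layer normalization, and the feed-forward sublayer are omitted. The helper names (`attention`, `decoder_attention`) and the toy vectors are illustrative, not taken from any library.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax; -inf scores receive weight 0."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix,
              causal: bool = False) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product attention over plain Python lists.

    With causal=True, query position i may attend only to key
    positions j <= i (the decoder's masked self-attention).
    Returns (outputs, attention_weights).
    """
    d = len(queries[0])
    outputs: Matrix = []
    weights: Matrix = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if causal and j > i:
                scores.append(float("-inf"))  # mask out future positions
            else:
                dot = sum(a * b for a, b in zip(q, k))
                scores.append(dot / math.sqrt(d))
        w = softmax(scores)
        weights.append(w)
        # Output i is the attention-weighted sum of the value vectors.
        outputs.append([sum(w[j] * values[j][c] for j in range(len(values)))
                        for c in range(len(values[0]))])
    return outputs, weights


def decoder_attention(decoder_states: Matrix, encoder_outputs: Matrix):
    """The two attention steps of one (simplified) decoder layer."""
    # 1) Masked self-attention: each position sees itself and earlier ones.
    self_out, self_w = attention(decoder_states, decoder_states,
                                 decoder_states, causal=True)
    # 2) Cross-attention: queries from the decoder, keys/values from the encoder.
    cross_out, cross_w = attention(self_out, encoder_outputs, encoder_outputs)
    return cross_out, self_w, cross_w


# Toy example: 3 encoded source tokens, 2 decoder positions (e.g. "<s>", "the").
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_states = [[0.5, 0.5], [1.0, 0.0]]
out, self_w, cross_w = decoder_attention(decoder_states, encoder_outputs)
print(self_w[0])  # [1.0, 0.0]: position 0 cannot see position 1
```

Row 0 of `self_w` comes out as `[1.0, 0.0]` because the causal mask lets the first decoder position attend only to itself, whereas every row of `cross_w` spreads its weight across all three encoder positions.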
Detailed Explanation
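The decoder receives the context produced by the encoder and uses it, together with the tokens it has already generated, to produce the output sequence. The other options are incorrect: Attention is a mechanism the decoder uses, not its purpose; Context is something the decoder consumes, not something it creates; and Input is the source sequence that the encoder reads, not what the decoder produces.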
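The decoder's role of turning encoder context into output can be illustrated with a toy greedy decoding loop. In this sketch the `context` string stands in for the encoder's outputs, and `toy_next_token` is a hypothetical, hard-coded lookup standing in for a trained decoder; a real decoder would compute these scores with its masked self-attention and cross-attention layers. The names `TOY_SCORES`, `toy_next_token`, `greedy_decode`, and the end token `</s>` are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

END = "</s>"  # hypothetical end-of-sequence token

# Hypothetical stand-in for a trained decoder: maps (source context,
# tokens generated so far) to scores over a tiny vocabulary.
TOY_SCORES: Dict[Tuple[str, Tuple[str, ...]], Dict[str, float]] = {
    ("le chat", ()): {"the": 0.9, "cat": 0.05, END: 0.05},
    ("le chat", ("the",)): {"the": 0.1, "cat": 0.8, END: 0.1},
    ("le chat", ("the", "cat")): {"the": 0.05, "cat": 0.05, END: 0.9},
}


def toy_next_token(context: str, prefix: List[str]) -> Dict[str, float]:
    """Hypothetical decoder step: scores for the next token, given the
    encoder context and the previously generated tokens."""
    return TOY_SCORES[(context, tuple(prefix))]


def greedy_decode(context: str, max_len: int = 10) -> List[str]:
    """Generate output one token at a time, always taking the top score."""
    output: List[str] = []
    for _ in range(max_len):
        scores = toy_next_token(context, output)
        token = max(scores, key=scores.get)
        if token == END:
            break  # the decoder signals that the output is complete
        output.append(token)
    return output


print(greedy_decode("le chat"))  # ['the', 'cat']
```

Each step conditions on both the context and everything generated so far, mirroring how the decoder combines encoded source information with its own past outputs. Greedy selection is only one decoding strategy; beam search and sampling are common alternatives.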
Key Concepts
Transformer Architecture
Attention Mechanism
Sequence Generation
Topic
Transformer Architecture
Difficulty
Hard
Cognitive Level
Understand
Join thousands of students using Seekh's interactive learning platform to excel in their studies with personalized practice and detailed explanations.