📚 Learning Guide
Transformer Architecture
hard

Attention : Encoder :: Decoder : ?

Master this concept with our detailed explanation and step-by-step learning approach

Choose the Best Answer

A

Attention

B

Context

C

Output

D

Input

Understanding the Answer

Let's break down why this is correct

Answer

In a transformer encoder, the only attention used is self-attention: each token attends to every token in the input sequence, including itself. The decoder also uses self-attention, but it is masked so that each position can attend only to itself and earlier positions, never to later ones. In addition, each decoder layer has a second attention sub-layer in which every decoder token attends to all encoder outputs; this is encoder-decoder attention, also called cross-attention. The decoder therefore combines masked self-attention with encoder-decoder attention, which lets it draw on both the tokens generated so far and the encoded source sequence.

For example, when generating the word “cat”, the decoder first attends to the previously generated word “the” and then attends to the encoder’s representation of the input sentence to decide that “cat” is the best next word.
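
The three attention patterns described above can be illustrated with a minimal sketch in plain Python. It is a simplification under stated assumptions, not a full transformer layer: the learned query, key, and value projections and the multiple heads are omitted, and the names `attention_weights`, `encoder_states`, and `decoder_states`, along with the toy vectors, are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over one row of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys, causal=False):
    """Scaled dot-product attention weights, one row per query vector."""
    scale = math.sqrt(len(keys[0]))
    rows = []
    for i, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        if causal:
            # Hide later positions (j > i) so position i sees only itself and the past.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        rows.append(softmax(scores))
    return rows

# Toy 2-dimensional vectors (made-up numbers, not trained embeddings).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 source tokens
decoder_states = [[0.5, 0.5], [1.0, 0.0]]              # 2 tokens generated so far

# Encoder self-attention: every source token attends to every source token.
encoder_self = attention_weights(encoder_states, encoder_states)
# Decoder masked self-attention: queries and keys both come from the decoder.
decoder_self = attention_weights(decoder_states, decoder_states, causal=True)
# Cross-attention: decoder queries against encoder keys, with no mask.
cross = attention_weights(decoder_states, encoder_states)

for name, rows in [("encoder self", encoder_self),
                   ("decoder masked self", decoder_self),
                   ("cross", cross)]:
    print(name)
    for row in rows:
        print("  ", [round(w, 3) for w in row])
```

In the masked case, the first decoder position places all of its weight on itself because the later position is hidden, while the cross-attention rows spread weight over all three encoder positions.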

Detailed Explanation

The decoder receives the context produced by the encoder. The other options are incorrect: Attention is a mechanism, not the decoder’s purpose, and Context is something the decoder uses rather than something it creates.

Key Concepts

Transformer Architecture
Attention Mechanism
Sequence Generation

Topic

Transformer Architecture

Difficulty

hard level question

Cognitive Level

understand
