Attention:Encoder :: Decoder:?

Question

Seekh · Accepted Answer

In a transformer encoder the only attention used is self-attention: each token attends to every token in the input, including itself, with no causal mask. The decoder also uses self-attention, but it is masked (causal) so that each position can attend only to itself and earlier positions, never to tokens that have not been generated yet. In addition, each decoder layer has a second attention sublayer. In it, the decoder tokens supply the queries and the encoder outputs supply the keys and values, so every decoder token can attend to all encoder outputs. This is the encoder-decoder (or cross-attention) layer. The decoder's attention mechanism is therefore a combination of masked self-attention and encoder-decoder attention, which lets it use both the tokens generated so far and the encoded source sequence.

For example, when generating the word "cat", the decoder's masked self-attention draws on the previously generated word "the". Its cross-attention then consults the encoder's representation of the input sentence to decide that "cat" is the correct next word.
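
The three attention patterns in the answer can be sketched in a few lines of plain Python. This is a minimal illustrative sketch, not the original Transformer implementation. It assumes a single head and omits the learned query/key/value projections, residual connections, layer normalization and feed-forward sublayers. The names `attention`, `encoder_layer` and `decoder_layer` are hypothetical. Encoder self-attention and decoder masked self-attention differ only in the `causal` flag, and cross-attention differs only in where the keys and values come from.

```python
import math


def softmax(scores):
    """Numerically stable softmax; a score of -inf gets weight exactly 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, causal=False):
    """Single-head scaled dot-product attention on lists of vectors.

    Returns (outputs, weights). With causal=True, query position i may only
    attend to key positions j <= i (the decoder's look-ahead mask).
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if causal and j > i:
                scores.append(float("-inf"))  # future token: masked out
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k))
        w = softmax(scores)
        weights.append(w)
        # Output = attention-weighted sum of the value vectors.
        outputs.append([sum(w[j] * values[j][c] for j in range(len(values)))
                        for c in range(len(values[0]))])
    return outputs, weights


def encoder_layer(src):
    # Encoder: unmasked self-attention; every source token sees every source token.
    out, _ = attention(src, src, src)
    return out


def decoder_layer(tgt, memory):
    # 1) Masked self-attention over the target tokens generated so far.
    h, _ = attention(tgt, tgt, tgt, causal=True)
    # 2) Cross-attention: queries from the decoder, keys/values from the encoder.
    out, _ = attention(h, memory, memory)
    return out


if __name__ == "__main__":
    # Hypothetical toy embeddings (d_model = 2): 3 source tokens, 2 target tokens.
    src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    tgt = [[0.5, 0.5], [1.0, -1.0]]
    memory = encoder_layer(src)
    _, causal_w = attention(tgt, tgt, tgt, causal=True)
    print(causal_w[0])  # first target token sees only itself: [1.0, 0.0]
    out = decoder_layer(tgt, memory)
    print(len(out), len(out[0]))  # 2 2
```

Comparing `attention(src, src, src)` with `attention(tgt, tgt, tgt, causal=True)` shows the contrast described in the answer: the encoder's weight matrix is dense, while the decoder's self-attention weights are zero above the diagonal. In a real model each sublayer would also have its own learned projections and usually several heads.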
