Attention:Encoder :: Decoder:?

Question

Seekh · Accepted Answer

In a transformer, the encoder uses self-attention so that each token can attend to every token in the input, in both directions. The decoder also uses self-attention, but it is masked (causal): a token at position i can attend only to itself and to earlier positions, which prevents it from peeking at tokens that have not been generated yet. In addition, each decoder layer has a second attention sublayer in which the queries come from the decoder and the keys and values come from the encoder's output; this is encoder-decoder (cross) attention. So the decoder's attention mechanisms are masked self-attention plus cross-attention: the same underlying operation as the encoder's self-attention, adapted for generation.
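The three attention patterns described above can be illustrated with a small sketch. This is a simplified illustration, not a full transformer layer: it assumes a single head, omits the learned query/key/value projections, and uses made-up toy vectors (`enc_states`, `dec_states`) and hypothetical helper names (`attention_weights`, `softmax`).

```python
import math

def softmax(scores):
    # Numerically stable softmax; a score of -inf receives weight 0.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys, causal=False):
    """Scaled dot-product attention weights, one row per query.

    With causal=True, query i may only see keys at positions <= i.
    """
    d = len(keys[0])
    rows = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if causal and j > i:
                scores.append(float("-inf"))  # mask out future positions
            else:
                dot = sum(a * b for a, b in zip(q, k))
                scores.append(dot / math.sqrt(d))
        rows.append(softmax(scores))
    return rows

# Toy hidden states (assumed values, for illustration only).
enc_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 source tokens
dec_states = [[0.5, 0.5], [1.0, 0.0]]              # 2 target tokens

# Encoder self-attention: every source token sees every source token.
encoder_self = attention_weights(enc_states, enc_states)

# Decoder masked self-attention: each target token sees itself and the past.
decoder_self = attention_weights(dec_states, dec_states, causal=True)

# Cross-attention: queries from the decoder, keys from the encoder output.
cross = attention_weights(dec_states, enc_states)

for name, w in [("encoder self", encoder_self),
                ("decoder masked self", decoder_self),
                ("cross", cross)]:
    print(name)
    for row in w:
        print("  ", [round(x, 3) for x in row])
```

The encoder matrix is fully populated, the decoder self-attention matrix is lower triangular (zeros above the diagonal), and the cross-attention matrix has one row per target token and one column per source token. In a real model the weights would then be applied to value vectors, and queries, keys, and values would each pass through learned linear projections first.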
