
Beyond the Thought Vector: The Evolution of Attention in Deep Learning
Sequence-to-sequence (seq2seq) models without attention compress an entire source sentence into a single fixed-length vector, then feed that into a decoder to produce the target sentence. While this works reasonably well for short inputs, the fixed-length vector becomes an information bottleneck: the encoder must cram everything the decoder will ever need into one vector, and translation quality degrades as source sentences grow longer. Attention mechanisms were introduced to relieve this bottleneck by letting the decoder consult all encoder states at every step rather than a single compressed summary.
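
To make the bottleneck concrete, here is a minimal sketch of an attention-free seq2seq model, assuming PyTorch with GRU cells; the vocabulary size, embedding width, and hidden width below are illustrative, not from any particular paper. The encoder's final hidden state is the single "thought vector", and the decoder conditions on nothing else:

```python
# Minimal seq2seq WITHOUT attention (sketch, hypothetical hyperparameters).
# The encoder's final hidden state is the one fixed-length vector the
# decoder ever sees, regardless of how long the source sentence is.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src):                    # src: (batch, src_len)
        _, hidden = self.rnn(self.embed(src))  # hidden: (1, batch, hidden_dim)
        return hidden                          # whole sentence, one vector

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tgt, hidden):            # tgt: (batch, tgt_len)
        output, _ = self.rnn(self.embed(tgt), hidden)
        return self.out(output)                # (batch, tgt_len, vocab_size)

# Toy usage: every target token is generated from the same compressed context.
enc, dec = Encoder(1000, 64, 128), Decoder(1000, 64, 128)
src = torch.randint(0, 1000, (2, 7))           # toy source batch
tgt = torch.randint(0, 1000, (2, 5))           # toy target batch
logits = dec(tgt, enc(src))                    # (2, 5, 1000)
```

Note that `hidden` has the same shape whether the source is 7 tokens or 70: the model's capacity to remember the input does not grow with input length, which is precisely the limitation attention relaxes.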
