type: glossary
title: "Context Window"
tags: [inference, sequence-length]
created: 2023-01-27
Context Window
Definition: The maximum number of tokens a Transformer model can process in a single forward pass, determined at training time and acting as a hard upper bound on sequence length during standard inference.
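A minimal sketch of how this hard cap plays out at inference time. The names here (MAX_CONTEXT, tokenize, prepare_input) are illustrative stand-ins, not any library's API; 1024 matches GPT-2's training-time context window.

```python
# Sketch: enforcing a context window at inference time.
# MAX_CONTEXT, tokenize, and prepare_input are hypothetical, for illustration.

MAX_CONTEXT = 1024  # fixed when the model was trained (GPT-2's value)

def tokenize(text: str) -> list[int]:
    # Placeholder: a real tokenizer maps text to subword token IDs.
    return [ord(c) for c in text]

def prepare_input(text: str) -> list[int]:
    """Truncate token IDs so the sequence fits in one forward pass."""
    token_ids = tokenize(text)
    if len(token_ids) > MAX_CONTEXT:
        # Standard inference cannot see past the window: keep the most
        # recent MAX_CONTEXT tokens and drop everything earlier.
        token_ids = token_ids[-MAX_CONTEXT:]
    return token_ids
```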
Used in: GPT-2, Transformer-XL, Sparse Attention, ALiBi, Rotary Position Embedding, Adaptive Attention Span
Do not confuse with:
- Attention span — the number of tokens a particular attention head actually attends to, which may be less than the full context window, as in Sparse Attention or Adaptive Attention Span (see the sketch after this list).
- Effective context — the span of past tokens that can influence a prediction, which in architectures like Transformer-XL can exceed the context window through recurrence.
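To make the attention-span distinction concrete, here is a sketch of a causal, local attention mask in which each position attends to at most `span` previous tokens, so a head's span is smaller than the full context window. The sizes are arbitrary illustrative values, not drawn from any specific model.

```python
import numpy as np

context_window = 8  # full sequence length the model accepts
span = 3            # tokens a head actually attends to (its attention span)

# mask[i, j] is True where query position i may attend to key position j:
# causal (j <= i) and local (within the last `span` positions).
i = np.arange(context_window)[:, None]
j = np.arange(context_window)[None, :]
mask = (j <= i) & (j > i - span)

print(mask.astype(int))
# Row 5, for example, attends only to positions 3, 4, and 5 — a span of 3
# inside a context window of 8.
```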