Primacy and recency · NorthGradient

A prompt is not processed with uniform attention. The model gives more weight to certain positions than others. Put a critical instruction in the middle of a long prompt, surrounded by context, and the model is more likely to underweight it than if it appeared at the beginning or end. This is not a bug to work around but a predictable property you can use deliberately.

Information at the start and end of a prompt receives stronger attention than information in the middle.

The primacy effect

Primacy is the elevated weight the model gives to content at the beginning of the prompt. The role slot and the core instruction both benefit from it. As the model reads the first few lines, it builds an initial framing of the task, and that framing influences how it interprets everything that follows.

The consequence is direct: if you want the model to behave as a domain expert, the role must appear before any content it will process. A role buried after three paragraphs of context arrives too late. The model has already begun interpreting that context through a neutral lens, and the late role has to fight that framing rather than set it.

The recency effect

Recency is the elevated weight the model gives to content at the end, immediately before it starts generating. Output format instructions benefit most. A format constraint at the very end is harder to ignore than the same constraint in the middle.

[Role]
[Instruction]
[Context, several paragraphs]
[Examples]
Return your answer as a JSON object with keys "bug", "impact", and "fix".

That final line carries disproportionate weight precisely because it is last. If strict format compliance matters, put the format instruction there.

The lost-in-the-middle problem

Research on long-context models documents a consistent U-shaped pattern: when relevant information appears in the middle of a long context, models retrieve it less reliably than when it appears at the start or end. The effect has a name: lost in the middle.

Figure: Retrieval accuracy across prompt positions is high at the start, high at the end, and lowest in the middle.

The implication is straightforward. Do not bury critical instructions between two large blocks of context. If you have a long document to analyse and a short but important instruction, do not place the instruction in the middle. Put it before the document, after it, or both.
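One way to picture the "before, after, or both" advice is a small helper that wraps a long document with its instruction. This is a minimal sketch, not a prescribed method: the function name `sandwich_instruction`, the `restate` flag, and the "Reminder:" prefix are all hypothetical choices made for illustration.

```python
def sandwich_instruction(instruction: str, document: str, restate: bool = True) -> str:
    """Put a short instruction before a long document and optionally restate it after.

    The "Reminder: " prefix is an illustrative convention, not a required format.
    """
    instruction = instruction.strip()
    parts = [instruction, document.strip()]
    if restate:
        # Repeat the instruction so it also sits next to the generation point.
        parts.append("Reminder: " + instruction)
    return "\n\n".join(parts)


prompt = sandwich_instruction(
    "List every security issue in these code review notes.",
    "<long code review notes go here>",
)
print(prompt)
```

With `restate=False` the instruction appears only at the start, which may be enough for short documents.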

Three rules that follow from this

Primacy, recency, and the lost-in-the-middle effect together yield three rules:

1. State the core instruction early, before the context it applies to.
2. Place output format constraints at the end of the prompt, as close to the generation point as possible.
3. If a critical instruction must appear in the middle of a long prompt, restate it at the end.

The third rule feels redundant until you need it. A short restatement costs a few tokens; skipping it and getting the wrong output costs a full re-run. In long-context situations, restating is almost always worth it.
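The restatement rule can be applied mechanically. The sketch below checks roughly where an instruction sits in a prompt and appends a restatement only when it falls in the middle. It rests on several assumptions: character offsets stand in for token positions, the 20% and 80% cut-offs are arbitrary illustrative thresholds rather than measured values, and the function names are hypothetical.

```python
def relative_position(prompt: str, instruction: str) -> float:
    """Return where the instruction starts, as a fraction of the prompt length (0.0 to 1.0)."""
    index = prompt.find(instruction)
    if index == -1:
        raise ValueError("instruction not found in prompt")
    return index / max(len(prompt), 1)


def restate_if_buried(prompt: str, instruction: str,
                      low: float = 0.2, high: float = 0.8) -> str:
    """Append a restatement when the instruction starts in the middle of the prompt.

    `low` and `high` are assumed thresholds; characters are a rough proxy for tokens.
    """
    position = relative_position(prompt, instruction)
    if low < position < high:
        return prompt.rstrip() + "\n\nReminder: " + instruction
    return prompt


buried = "A" * 500 + "\nCite every claim.\n" + "B" * 500
fixed = restate_if_buried(buried, "Cite every claim.")
```

A prompt whose instruction already sits near the start or end is returned unchanged, so the helper adds tokens only where the restatement is likely to pay for itself.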

Why the slot order makes sense now

The slot order from the previous lesson (role, instruction, context, examples, output format) is not arbitrary. Role and instruction go first to benefit from primacy. Output format goes last to benefit from recency. Context and examples sit in the middle because they are large and their exact position matters less than that of the instructions around them.
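The slot order can be made explicit in code, so that a prompt is always assembled role first and output format last. The following is a minimal sketch: the `PromptSlots` class and `assemble_prompt` function are hypothetical names, and the sample slot contents are placeholders apart from the JSON format line used earlier in this lesson.

```python
from dataclasses import dataclass


@dataclass
class PromptSlots:
    role: str
    instruction: str
    context: str
    examples: str
    output_format: str


def assemble_prompt(slots: PromptSlots) -> str:
    """Join the slots in a fixed order: role, instruction, context, examples, output format."""
    ordered = [
        slots.role,           # first: benefits from primacy
        slots.instruction,    # early: stated before the context it applies to
        slots.context,        # middle: large, position matters less
        slots.examples,       # middle
        slots.output_format,  # last: benefits from recency
    ]
    # Skip empty slots so an unused slot does not leave blank gaps.
    return "\n\n".join(part.strip() for part in ordered if part.strip())


prompt = assemble_prompt(PromptSlots(
    role="You are a senior code reviewer.",
    instruction="Find the most serious bug in the code below.",
    context="<code under review>",
    examples="<one or two sample reviews>",
    output_format='Return your answer as a JSON object with keys "bug", "impact", and "fix".',
))
print(prompt)
```

Fixing the order in one function means callers cannot accidentally bury the role or the format constraint in the middle of the prompt.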

In the next lesson, we’ll move from how a prompt is structured to how the instructions inside it should be written.