Formatting your examples · NorthGradient

Once you decide to use few-shot prompting, the next question is how to write the examples, and this is where most of the craft lives. The number of examples matters less than people expect; their quality and structure matter more.

A few well-chosen, consistently formatted examples outperform many poorly chosen ones. The model infers the pattern from what you give it, including the parts you did not intend to signal.

Consistency in format

The model treats every aspect of your examples as potential signal. If the formatting varies, it infers that variation is acceptable: inconsistent punctuation in the examples may reappear in the output, and if some examples include a label prefix (“Sentiment:”) and others do not, the model may not apply the label reliably.

The fix: every example follows exactly the same template.

Review: "Fantastic sound quality, worth every penny."
Sentiment: positive

Review: "Stopped working after two weeks. Very disappointed."
Sentiment: negative

Review: "Does the job, nothing special."
Sentiment: neutral

The structure is identical across all three: Review: label, quoted text, Sentiment: label, single-word answer. The model extracts the pattern cleanly because nothing varies to confuse it.
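One way to guarantee that every example follows the same template is to generate the examples from structured data rather than typing each one by hand. The sketch below is a minimal illustration of that idea; the names Example, TEMPLATE, and render_examples are hypothetical and chosen for this lesson only.

```python
from typing import NamedTuple


class Example(NamedTuple):
    """One few-shot example (hypothetical structure for this sketch)."""
    review: str
    sentiment: str


# A single template, so label names, quoting, and punctuation cannot drift.
TEMPLATE = 'Review: "{review}"\nSentiment: {sentiment}'


def render_examples(examples):
    """Render every example with the same template, separated by blank lines."""
    return "\n\n".join(
        TEMPLATE.format(review=e.review, sentiment=e.sentiment)
        for e in examples
    )


examples = [
    Example("Fantastic sound quality, worth every penny.", "positive"),
    Example("Stopped working after two weeks. Very disappointed.", "negative"),
    Example("Does the job, nothing special.", "neutral"),
]
print(render_examples(examples))
```

Because the format lives in one place, changing a label name or the quoting style changes it for every example at once.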

Choosing representative examples

Examples should represent the typical case, not the unusual one. If you classify customer reviews and ninety percent are clearly positive or negative, your examples should reflect that. Leading with an edge case teaches the model that edge cases are the norm.

A useful check: if someone who had never seen your task read only your examples, would they know what a typical correct output looks like? If not, the examples are not representative enough.

Covering the output space

When the task has multiple valid output types, your examples should demonstrate each one. For sentiment classification, showing only positive and negative leaves the model no signal for a neutral input. It will either force one of the two demonstrated categories or produce output that does not match your format.

Review: "Fantastic sound quality, worth every penny."
Sentiment: positive

Review: "Stopped working after two weeks. Very disappointed."
Sentiment: negative

Review: "Does the job, nothing special."
Sentiment: neutral

Three examples, three output types. One of each is enough for a well-defined task. The goal is not exhaustiveness but coverage of the output space.
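Coverage of the output space is easy to check mechanically when the set of valid outputs is known in advance. The following is a small sketch under that assumption; missing_labels is a hypothetical helper, and the label names are taken from the sentiment example above.

```python
def missing_labels(example_labels, allowed_labels):
    """Return the allowed labels that no example demonstrates, sorted."""
    return sorted(set(allowed_labels) - set(example_labels))


allowed = ["positive", "negative", "neutral"]

# Only two of the three output types are demonstrated here.
print(missing_labels(["positive", "negative"], allowed))

# All three output types are demonstrated, so nothing is missing.
print(missing_labels(["positive", "negative", "neutral"], allowed))
```

An empty result means each output type appears at least once, which is the coverage this section asks for.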

Ordering examples

Example order has a small but real effect, for the same reason slot position affects instruction following: recency matters. The model weights the last example slightly more than the earlier ones, so placing your most representative example last is a minor advantage.

More importantly, avoid placing your most unusual or edge-case example last. The final position carries the most weight, so the model may over-apply that example to inputs where it does not belong.
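If you assemble examples in code, the ordering advice can be applied with a small helper that moves a chosen example to the final position. This is only a sketch: place_last is a hypothetical name, and deciding which example is the most representative remains a human judgment that the code does not make for you.

```python
def place_last(examples, index):
    """Return a new list with the example at `index` moved to the end.

    The relative order of the other examples is preserved.
    """
    items = list(examples)      # copy, so the caller's list is untouched
    chosen = items.pop(index)   # raises IndexError for an invalid index
    items.append(chosen)
    return items


examples = ["typical positive", "edge case", "typical negative"]

# Suppose you judge the first example to be the most representative.
print(place_last(examples, 0))
```

The same helper also covers the second piece of advice: if an edge case currently sits last, moving a typical example to the end displaces it.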

Separating examples clearly

When examples run together, the model can misread where one ends and the next begins. A blank line is usually enough. For tasks where inputs or outputs span multiple lines, a consistent delimiter makes the boundary unambiguous.

---
Input: "The product arrived damaged and customer service was unhelpful."
Output: negative
---
Input: "Good value for money, arrived on time."
Output: positive
---
Input: "Setup was quick and it has worked flawlessly since."
Output:

The delimiter makes the structure explicit, so the model need not infer the boundary.
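A delimited prompt like the one above can be assembled programmatically. The sketch below assumes the same `---` delimiter and Input/Output labels; build_prompt is a hypothetical helper, and the check that rejects text containing the delimiter on its own line is an added safeguard for multi-line inputs, not something the lesson prescribes.

```python
DELIMITER = "---"


def build_prompt(examples, query):
    """Build a delimited few-shot prompt ending with an open Output: slot.

    `examples` is a sequence of (input_text, output_label) pairs.
    """
    for text in [t for t, _ in examples] + [query]:
        # A delimiter line inside an input would blur the example boundary.
        if DELIMITER in text.splitlines():
            raise ValueError("input contains the delimiter on its own line")

    blocks = [f'Input: "{text}"\nOutput: {label}' for text, label in examples]
    blocks.append(f'Input: "{query}"\nOutput:')
    return "\n".join(f"{DELIMITER}\n{block}" for block in blocks)


print(build_prompt(
    [
        ("The product arrived damaged and customer service was unhelpful.", "negative"),
        ("Good value for money, arrived on time.", "positive"),
    ],
    "Setup was quick and it has worked flawlessly since.",
))
```

Every block, including the final query, starts with the same delimiter line, so the boundary looks identical wherever it occurs.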

[Figure: two sets of examples side by side. Inconsistent formatting produces variable output; consistent formatting produces uniform output.]

A checklist before you finalize your examples

Before including examples, check four things:

Are all examples formatted identically, including punctuation and label names?
Do the examples cover the full range of output types the model will encounter?
Are the examples representative of the typical case, not edge cases?
Are the examples clearly separated so the model can read their boundaries cleanly?

If all four are yes, the examples will do their job. If any is no, the model fills the gap with its own judgment, and the output will be less consistent.

In the next lesson, we’ll move to Chapter 5: what to do when a single prompt, however well structured, is not enough for a complex task.