Code hygiene · NorthGradient

Agent codebases rot fast. Prompts hidden in functions go unreviewed. Unstated type assumptions break deep in a run. One flat file with four agents in it becomes unsafe to change. None of that is inevitable. These five habits are the difference between a codebase you can still work in six months from now and one you have to rewrite.

A prompt buried in code is logic no one reviews. An unannotated node is a contract no tool can check. Without smoke tests, any change might break anything.

Type mismatches show up at runtime

Without type hints, the first sign a function got the wrong type is a failure deep in a run, far from the real mistake.

Type hints let mypy and pyright catch these before the code runs. Annotate state, nodes, and tool inputs:

from typing import TypedDict, Annotated, Optional
import operator

class AgentState(TypedDict):
    task:    str
    results: Annotated[list[str], operator.add]   # list field with a reducer
    done:    bool
    error:   Optional[str]                         # None when there is no error

Add mypy to your pre-commit hook and run it on every file you touch. A type error caught now takes minutes. The same error caught in production takes hours.

Prompts in code cannot be reviewed or rolled back

A prompt is logic: it decides what the agent does. A prompt buried in a function cannot be diffed, reviewed on its own, or rolled back without a code deploy.

Keep each prompt in its own versioned file and load it by name:

from pathlib import Path
import json

# prompts/prompts.json maps a name to a file:
#   {"search": "v1/search.txt", "summarize": "v1/summarize.txt"}

def load_prompt(name: str, **values) -> str:
    index    = json.loads(Path("prompts/prompts.json").read_text())
    template = Path("prompts", index[name]).read_text()
    return template.format(**values)        # fills {task}, {context}, etc.

To change a prompt, add v2/search.txt and point the index at it. Rolling back is a one-line edit, and git diff shows exactly what changed.

The same tool call repeats across agents

In a multi-agent system, several agents may run the same search. Without caching, each one pays for its own API call. With caching, the first call stores the result and the rest return it instantly:

import time
from functools import wraps

_cache = {}   # use Redis in production

def cached_tool(ttl: int = 300):
    """Cache a read-only tool's result for ttl seconds. Never use on writes."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args):
            key = (func.__name__, args)
            if key in _cache:
                saved_at, value = _cache[key]
                if time.time() - saved_at < ttl:
                    return value            # cache hit
            value = func(*args)             # cache miss
            _cache[key] = (time.time(), value)
            return value
        return wrapper
    return decorator

@cached_tool(ttl=600)
def web_search(query: str) -> list:
    return real_search_api(query)

Only cache tools that read. Never cache one that sends email or writes data. Track your hit rate: above 30% is real savings, below 5% means the cache is not helping.

Several agents in one file get unsafe to change

When agents share a file, a change to one can break another, ownership blurs, and testing one means loading all of them.

Give each agent its own file with its graph, state, nodes, and tools. Shared helpers go in a shared/ folder:

# researcher.py : everything for the researcher agent
from typing import TypedDict
from langgraph.graph import StateGraph, END
from agents.shared.tools import web_search

class ResearcherState(TypedDict):
    query:   str
    summary: str

def fetch(state): ...
def summarize(state): ...

graph = StateGraph(ResearcherState)
graph.add_node("fetch", fetch)
graph.add_node("summarize", summarize)
graph.set_entry_point("fetch")
graph.add_edge("fetch", "summarize")
graph.add_edge("summarize", END)

researcher_agent = graph.compile()   # the only thing other files import

Other files import researcher_agent and nothing else. Each file is a small unit with one public interface.

Untested behavior breaks silently

Agents break in surprising ways when prompts, dependencies, or the graph change. A smoke test catches that in seconds instead of hours into a run.

A smoke test does not check exact wording. It checks the agent runs and returns the right shape:

from agents.researcher import researcher_agent

def test_researcher_basic():
    result = researcher_agent.invoke({"query": "What is Python?", "summary": ""})
    assert isinstance(result["summary"], str)
    assert len(result["summary"]) > 20      # not empty

def test_researcher_empty_query():
    # regression: this used to crash on empty input
    result = researcher_agent.invoke({"query": "", "summary": ""})
    assert result is not None               # returns, does not raise

One smoke test per agent is the floor. Aim for three: the happy path, one odd input, and one for the last bug you fixed. Write that last one the moment you fix the bug, before you close the issue.

These habits add no features. They keep the features you built understandable, checkable, and recoverable when they fail.