Building Production Agents with LangGraph · Building Production Agents with LangGraph · 6 min read
Code hygiene
Agent codebases rot fast. Prompts hidden in functions go unreviewed. Unstated type assumptions break deep in a run. One flat file with four agents in it becomes unsafe to change. None of that is inevitable. These five habits are the difference between a codebase you can still work in six months from now and one you have to rewrite.
A prompt buried in code is logic no one reviews. An unannotated node is a contract no tool can check. Without smoke tests, any change might break anything.
Type mismatches show up at runtime
Without type hints, the first sign a function got the wrong type is a failure deep in a run, far from the real mistake.
Type hints let mypy and pyright catch these before the code runs. Annotate state, nodes, and tool inputs:
from typing import TypedDict, Annotated, Optional
import operator
class AgentState(TypedDict):
task: str
results: Annotated[list[str], operator.add] # list field with a reducer
done: bool
error: Optional[str] # None when there is no error
Add mypy to your pre-commit hook and run it on every file you touch. A type error caught now takes minutes. The same error caught in production takes hours.
Prompts in code cannot be reviewed or rolled back
A prompt is logic: it decides what the agent does. A prompt buried in a function cannot be diffed, reviewed on its own, or rolled back without a code deploy.
Keep each prompt in its own versioned file and load it by name:
from pathlib import Path
import json
# prompts/prompts.json maps a name to a file:
# {"search": "v1/search.txt", "summarize": "v1/summarize.txt"}
def load_prompt(name: str, **values) -> str:
index = json.loads(Path("prompts/prompts.json").read_text())
template = Path("prompts", index[name]).read_text()
return template.format(**values) # fills {task}, {context}, etc.
To change a prompt, add v2/search.txt and point the index at it. Rolling back is a one-line edit, and git diff shows exactly what changed.
The same tool call repeats across agents
In a multi-agent system, several agents may run the same search. Without caching, each one pays for its own API call. With caching, the first call stores the result and the rest return it instantly:
import time
from functools import wraps
_cache = {} # use Redis in production
def cached_tool(ttl: int = 300):
"""Cache a read-only tool's result for ttl seconds. Never use on writes."""
def decorator(func):
@wraps(func)
def wrapper(*args):
key = (func.__name__, args)
if key in _cache:
saved_at, value = _cache[key]
if time.time() - saved_at < ttl:
return value # cache hit
value = func(*args) # cache miss
_cache[key] = (time.time(), value)
return value
return wrapper
return decorator
@cached_tool(ttl=600)
def web_search(query: str) -> list:
return real_search_api(query)
Only cache tools that read. Never cache one that sends email or writes data. Track your hit rate: above 30% is real savings, below 5% means the cache is not helping.
Several agents in one file get unsafe to change
When agents share a file, a change to one can break another, ownership blurs, and testing one means loading all of them.
Give each agent its own file with its graph, state, nodes, and tools. Shared helpers go in a shared/ folder:
# researcher.py : everything for the researcher agent
from typing import TypedDict
from langgraph.graph import StateGraph, END
from agents.shared.tools import web_search
class ResearcherState(TypedDict):
query: str
summary: str
def fetch(state): ...
def summarize(state): ...
graph = StateGraph(ResearcherState)
graph.add_node("fetch", fetch)
graph.add_node("summarize", summarize)
graph.set_entry_point("fetch")
graph.add_edge("fetch", "summarize")
graph.add_edge("summarize", END)
researcher_agent = graph.compile() # the only thing other files import
Other files import researcher_agent and nothing else. Each file is a small unit with one public interface.
Untested behavior breaks silently
Agents break in surprising ways when prompts, dependencies, or the graph change. A smoke test catches that in seconds instead of hours into a run.
A smoke test does not check exact wording. It checks the agent runs and returns the right shape:
from agents.researcher import researcher_agent
def test_researcher_basic():
result = researcher_agent.invoke({"query": "What is Python?", "summary": ""})
assert isinstance(result["summary"], str)
assert len(result["summary"]) > 20 # not empty
def test_researcher_empty_query():
# regression: this used to crash on empty input
result = researcher_agent.invoke({"query": "", "summary": ""})
assert result is not None # returns, does not raise
One smoke test per agent is the floor. Aim for three: the happy path, one odd input, and one for the last bug you fixed. Write that last one the moment you fix the bug, before you close the issue.
These habits add no features. They keep the features you built understandable, checkable, and recoverable when they fail.