When things go wrong · NorthGradient

The agent from lesson 4 works when everything goes right. Production conditions are less cooperative. This lesson covers the three failures you will hit first, what they look like, and how to fix each one.

A crashing tool kills the run. A loop without a stop condition can run forever. A vague task or tool description makes the LLM skip the tools.

Failure 1: a tool crash kills the entire run

The get_weather function calls an external API. External APIs time out, return unexpected data, and occasionally go down. Without error handling, any exception inside the tool propagates up and crashes the agent with no useful output:

# This will raise an exception if the API is down or the city name is invalid
def get_weather(city: str) -> str:
    url  = f"https://wttr.in/{city}?format=j1"
    data = requests.get(url, timeout=5).json()   # raises if timeout or bad JSON
    ...

Wrap every external call in a try/except and return a string that describes the failure. The LLM can read that string and tell the user what went wrong, instead of the run ending in an unhandled exception:

import requests

def get_weather(city: str) -> str:
    """Get the current temperature and weather conditions for a city."""
    try:
        url      = f"https://wttr.in/{city}?format=j1"
        response = requests.get(url, timeout=5)
        response.raise_for_status()          # raises on 4xx/5xx HTTP errors
        data     = response.json()           # raises ValueError on invalid JSON
        temp_c   = data["current_condition"][0]["temp_C"]
        desc     = data["current_condition"][0]["weatherDesc"][0]["value"]
        return f"{city}: {desc}, {temp_c}°C"
    except requests.Timeout:
        return f"Error: weather service timed out for '{city}'"
    except requests.HTTPError as e:
        return f"Error: weather service returned {e.response.status_code} for '{city}'"
    except (KeyError, IndexError, ValueError):
        return f"Error: unexpected data format from weather service for '{city}'"
    except requests.RequestException:
        # anything else from requests, e.g. the service is down or unreachable
        return f"Error: could not reach weather service for '{city}'"

Now when the API fails, the LLM receives a readable error string and can respond to the user accordingly instead of the whole agent crashing.
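The same idea can be applied to every tool at once rather than one function at a time. The sketch below is one possible way to do that, not part of the lesson 4 code: `safe_tool` and `flaky_tool` are hypothetical names introduced only for illustration. The decorator catches anything the tool raises and returns an error string in the same "Error: ..." style used above.

```python
import functools
from typing import Callable


def safe_tool(func: Callable[..., str]) -> Callable[..., str]:
    """Turn any exception raised by a tool into an error string for the LLM."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs) -> str:
        try:
            return func(*args, **kwargs)
        except Exception as exc:  # deliberately broad: a last line of defence
            return f"Error: {func.__name__} failed with {type(exc).__name__}: {exc}"
    return wrapper


@safe_tool
def flaky_tool(city: str) -> str:
    """Hypothetical tool that always fails, used only to show the wrapper."""
    raise ConnectionError(f"cannot reach service for {city}")


# The exception never leaves the tool; the caller gets a string instead.
print(flaky_tool("Tokyo"))
```

A broad `except Exception` is a safety net, not a replacement for the specific handlers inside each tool: the specific handlers produce clearer messages, and the wrapper only covers the cases you did not anticipate.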

Failure 2: the loop runs forever

The agent in lesson 4 runs exactly one tool call and stops. In a more capable agent, the LLM might decide it needs to call multiple tools before it has enough information to answer. A naive loop handles this:

# Naive multi-step loop without stop condition
while True:
    response = client.chat.completions.create(...)
    if response.choices[0].message.tool_calls:
        # call the tool and continue
        ...
    else:
        return response.choices[0].message.content  # done

This works only if the LLM eventually decides to stop. It may never stop if the task is ambiguous, the tool keeps returning unhelpful results, or the LLM falls into a pattern of calling the same tool repeatedly. Every iteration costs tokens and money.

Always set a hard limit on the number of iterations:

MAX_STEPS = 5  # hard ceiling on loop iterations (one LLM call each, possibly several tool calls)

def run_agent(task: str) -> str:
    messages = [{"role": "user", "content": task}]

    for step in range(MAX_STEPS):
        response = client.chat.completions.create(
            model    = "gpt-4o-mini",
            messages = messages,
            tools    = TOOLS,
        )
        message = response.choices[0].message

        if not message.tool_calls:
            return message.content  # LLM is done

        # Keep the assistant turn, then run every tool it asked for. run_tool_call
        # (from lesson 4) parses the arguments safely, so a bad call cannot crash
        # the loop, and iterating handles the LLM requesting several tools at once.
        messages.append(message)  # assistant turn, with all its tool_calls
        for tool_call in message.tool_calls:
            messages.append({
                "role":         "tool",
                "tool_call_id": tool_call.id,
                "content":      run_tool_call(tool_call),
            })

    # reached the step limit without a final answer
    return f"Agent reached the {MAX_STEPS}-step limit without completing the task."

Start with a low limit (5 is reasonable). Real data from your logs will show you whether tasks need more, and you can raise it with evidence.
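To gather that evidence, it helps to record how many steps each run used and how often runs hit the ceiling. The sketch below assumes you have logged one integer per run (the number of tool-calling rounds it executed); the function name `summarize_steps` and the sample data are hypothetical, and how you collect the numbers depends on your own logging setup.

```python
from collections import Counter


def summarize_steps(steps_per_run: list[int], max_steps: int) -> dict:
    """Summarize how many steps runs used and how often they hit the limit."""
    if not steps_per_run:
        return {"runs": 0, "hit_limit": 0, "hit_limit_rate": 0.0, "histogram": {}}
    # A run that used max_steps rounds ended because of the ceiling.
    hit_limit = sum(1 for steps in steps_per_run if steps >= max_steps)
    return {
        "runs": len(steps_per_run),
        "hit_limit": hit_limit,
        "hit_limit_rate": hit_limit / len(steps_per_run),
        "histogram": dict(sorted(Counter(steps_per_run).items())),
    }


# Made-up sample: steps used by eight runs with MAX_STEPS = 5
logged = [1, 2, 2, 5, 1, 3, 5, 2]
summary = summarize_steps(logged, max_steps=5)
print(summary)
```

If a noticeable share of runs hit the limit, read a few of those transcripts before raising it: a run that loops on the same tool call needs a better tool or prompt, not more steps.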

Failure 3: the LLM ignores the tools

You give the LLM two tools and a task. It answers from its own knowledge instead of using the tools. This is not a bug in your code; it is a mismatch between the task phrasing and the tool descriptions.

Two things cause this. First, the task does not signal that current information is needed:

# LLM may answer from training data: "Tokyo is typically warm in summer..."
run_agent("What is the weather in Tokyo?")

# LLM is more likely to call get_weather: the phrase "right now" signals recency
run_agent("What is the weather in Tokyo right now?")

Second, the tool description does not match how the task is phrased. If your tool is described as “Get the current temperature for a city” but the user asks “Is it raining in Paris?”, the LLM may not connect them. Make descriptions broad enough to match natural phrasing:

# Too narrow: only clearly covers temperature questions
"description": "Get the current temperature for a city."

# Better: covers questions about temperature, conditions, and rain
"description": (
    "Get the current weather for a city, including temperature, "
    "conditions, and whether it is raining or sunny."
)
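For context, this is roughly where the description sits in a full tool definition. The sketch assumes the function-tool format of the OpenAI Chat Completions API; the `TOOLS` list from lesson 4 may be laid out differently, and the parameter description here is an illustrative assumption.

```python
# One tool entry in the Chat Completions "function" tool format.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        # The LLM reads this text when deciding whether the tool fits the task.
        "description": (
            "Get the current weather for a city, including temperature, "
            "conditions, and whether it is raining or sunny."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
            },
            "required": ["city"],
        },
    },
}

TOOLS = [GET_WEATHER_TOOL]
```

The parameter descriptions are read by the LLM too, so the same advice applies to them: describe arguments the way a user would naturally refer to them.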

If the LLM still skips the tools, add a system message that explicitly instructs it to use them for current information:

messages = [
    {
        "role":    "system",
        "content": (
            "You have tools available to get current weather and news. "
            "Always use them when the user asks about current conditions or recent events."
        ),
    },
    {"role": "user", "content": task},
]
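If the system message is still not enough, the Chat Completions API also accepts a `tool_choice` parameter: `"auto"` lets the model decide, and `"required"` forces it to call at least one tool. The sketch below is one possible way to use it, not the lesson's method. `build_request` is a hypothetical helper, and forcing a tool only on the first step is an assumption chosen so that the loop can still end with a plain-text answer.

```python
def build_request(messages: list, tools: list, step: int) -> dict:
    """Build keyword arguments for client.chat.completions.create."""
    return {
        "model": "gpt-4o-mini",
        "messages": messages,
        "tools": tools,
        # Force a tool call on the first step only; afterwards the model
        # may answer in plain text, which is what ends the agent loop.
        "tool_choice": "required" if step == 0 else "auto",
    }


# Placeholder tool list for the demo; in the agent this would be TOOLS.
demo_tools = [{"type": "function", "function": {"name": "get_weather"}}]
demo_messages = [{"role": "user", "content": "Is it raining in Paris?"}]

first = build_request(demo_messages, demo_tools, step=0)
later = build_request(demo_messages, demo_tools, step=1)
print(first["tool_choice"], later["tool_choice"])

# Inside the loop this would be used as:
#   response = client.chat.completions.create(**build_request(messages, TOOLS, step))
```

Forcing a tool call is a blunt instrument: the model will call a tool even for questions that do not need one, so it fits agents whose every task depends on current data.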

In the next lesson, we look at what this plain Python approach cannot handle well, and why that points toward a graph-based framework for more complex agents.