Remembering context · NorthGradient

The agent built in previous lessons executes a task and immediately stops. If you ask it for the weather in Tokyo, it answers correctly. If your next prompt is “What about tomorrow?”, it fails. It has no memory of the word “Tokyo”.

The reason is not that the LLM is forgetful. The API is stateless: it only sees what we send on each call, and our code sends nothing but the latest message.

The OpenAI API has no built-in memory. Context is preserved by sending the full conversation history as a list of messages on every call.

How the API actually works

Every call to client.chat.completions.create is stateless. The API does not remember previous calls. It receives a list of messages, generates the next message, and stops. Nothing persists on the server between calls.

That list of messages is the entire context the LLM has access to. Each message carries a role, and the agent in this lesson uses three of them:

"user": something the human said
"assistant": something the LLM said
"tool": the result returned by a tool call

When you include all previous turns in that list, the LLM can refer back to anything that was said earlier. When you include only the current message, each call is isolated and the LLM cannot refer to anything.

Here is what the messages list looks like across two turns of a conversation:

# After the first exchange ("What is the weather in Tokyo right now?")
[
    {"role": "user",      "content": "What is the weather in Tokyo right now?"},
    {"role": "assistant", "content": "It is currently partly cloudy in Tokyo, 22°C."},
]

# After the second exchange ("What about Berlin?"). The first turn is still there
[
    {"role": "user",      "content": "What is the weather in Tokyo right now?"},
    {"role": "assistant", "content": "It is currently partly cloudy in Tokyo, 22°C."},
    {"role": "user",      "content": "What about Berlin?"},
    {"role": "assistant", "content": "Berlin is currently overcast, 14°C."},
]

The LLM sees the entire list on every call. That is how it knows which cities you mean when you follow up with “Which of those two is warmer?”

The problem with local state

The agent in lesson 5 created the messages list inside the function. Every call built a new list and discarded the previous one:

def run_agent(task: str) -> str:
    # Created fresh on every call: previous turns are gone
    messages = [{"role": "user", "content": task}]
    ...

The LLM received only the current message. It had no access to previous turns, so it could not answer questions that referred to them.
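The effect can be reproduced without calling the API at all. The sketch below is a minimal illustration: fake_llm is a hypothetical stand-in for client.chat.completions.create that simply reports how many messages it was handed, and run_agent_local is an illustrative name for an agent with the same shape as the one above.

```python
def fake_llm(messages: list[dict]) -> str:
    """Hypothetical stand-in for the API call: it sees only the list it is given."""
    return f"I can see {len(messages)} message(s)."


def run_agent_local(task: str) -> str:
    # Same shape as the agent above: the list is rebuilt on every call
    messages = [{"role": "user", "content": task}]
    return fake_llm(messages)


first = run_agent_local("What is the weather in Tokyo right now?")
second = run_agent_local("What about tomorrow?")

print(first)   # I can see 1 message(s).
print(second)  # I can see 1 message(s).
```

However many times the function is called, the stand-in never sees more than one message, which is exactly the situation the real LLM is in.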

Keeping the history alive between calls

The fix is to move the messages list outside the function so it accumulates across calls, and to append every turn to it before the next call is made:

from openai import OpenAI

client = OpenAI()

# Lives outside the function: persists across multiple calls
conversation_history = []

def run_agent(task: str) -> str:
    """Run one agent turn, keeping the full conversation history."""

    # Step 1: add the new user message to the running history
    conversation_history.append({"role": "user", "content": task})

    # Step 2: send the full history to the LLM
    response = client.chat.completions.create(
        model    = "gpt-4o-mini",
        messages = conversation_history,   # full history, not just this message
        tools    = TOOLS,
    )
    message = response.choices[0].message

    if not message.tool_calls:
        # No tool needed: store the reply and return it
        conversation_history.append({"role": "assistant", "content": message.content})
        return message.content

    # Tool call path: keep the assistant turn, then run every tool it asked
    # for. run_tool_call (from lesson 4) parses the arguments safely and
    # returns a string, so one bad tool call cannot crash the agent.
    conversation_history.append(message)  # assistant turn, with all its tool_calls
    for tool_call in message.tool_calls:
        conversation_history.append({
            "role":         "tool",
            "tool_call_id": tool_call.id,
            "content":      run_tool_call(tool_call),
        })

    # Step 3: send the updated history (now including the tool results) for the final reply
    final = client.chat.completions.create(
        model    = "gpt-4o-mini",
        messages = conversation_history,
    )
    reply = final.choices[0].message.content

    # Append the final reply so future turns can reference it
    conversation_history.append({"role": "assistant", "content": reply})
    return reply

The key change is that conversation_history is declared outside the function. Every message (user, assistant, and tool) is appended to it and survives between calls instead of being thrown away each time.

Now the conversation accumulates correctly:

print(run_agent("What is the weather in Tokyo right now?"))
# It is currently partly cloudy in Tokyo with a temperature of 22°C.

print(run_agent("What about Berlin?"))
# Berlin is currently overcast with a temperature of 14°C.

print(run_agent("Which of those two cities is warmer?"))
# Tokyo is warmer at 22°C compared to Berlin at 14°C.

The third question contains no city names. The LLM answers it correctly because the full history is in the context window and “those two cities” is unambiguous.
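To check what has actually accumulated, it can help to print the role of each entry. Note that run_agent stores plain dictionaries in most places but appends the SDK's message object on the tool call path, so a helper has to handle both. The sketch below assumes that mix. role_of and summarize_roles are hypothetical helper names, the SimpleNamespace object is only a stand-in for the SDK message, and "call_1" is a made-up tool call id.

```python
from types import SimpleNamespace


def role_of(entry) -> str:
    """Return the role of a history entry, whether it is a dict or an object."""
    return entry["role"] if isinstance(entry, dict) else entry.role


def summarize_roles(history: list) -> list[str]:
    """List the roles in order, as a quick view of the conversation so far."""
    return [role_of(entry) for entry in history]


# Stand-in for the SDK message object appended on the tool call path
assistant_with_tool_call = SimpleNamespace(role="assistant", content=None)

example_history = [
    {"role": "user", "content": "What is the weather in Tokyo right now?"},
    assistant_with_tool_call,
    {"role": "tool", "tool_call_id": "call_1", "content": "partly cloudy, 22°C"},
    {"role": "assistant", "content": "It is currently partly cloudy in Tokyo, 22°C."},
]

print(summarize_roles(example_history))
# ['user', 'assistant', 'tool', 'assistant']
```

A single question that needs a tool therefore adds four entries to the history, not two.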

Resetting the conversation

A persistent history is useful within a session. At the end of a session it becomes noise: it inflates token usage and can cause the LLM to confuse context from a previous topic with the current one. Provide an explicit reset:

def reset_conversation():
    """Clear the conversation history to start a fresh session."""
    conversation_history.clear()

Call it at the start of each new user session. In a web application, each user would have their own separate conversation_history list rather than sharing one global variable.
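A minimal sketch of the per-user idea, assuming every session can be identified by a string. The names histories, get_history, and reset_session are hypothetical and not part of the lesson code. An agent function would call get_history(session_id) and append to the returned list instead of touching a global.

```python
from collections import defaultdict

# One history list per session, keyed by a session identifier
histories: dict[str, list] = defaultdict(list)


def get_history(session_id: str) -> list:
    """Return the history for this session, creating an empty one if needed."""
    return histories[session_id]


def reset_session(session_id: str) -> None:
    """Forget everything said in one session without touching the others."""
    histories.pop(session_id, None)


get_history("alice").append({"role": "user", "content": "What is the weather in Tokyo right now?"})
get_history("bob").append({"role": "user", "content": "What about Berlin?"})

print(len(get_history("alice")))  # 1
reset_session("alice")
print(len(get_history("alice")))  # 0
print(len(get_history("bob")))    # 1
```

This still lives in the memory of one process, so it isolates users from each other but does not survive a restart.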

What this still does not solve

A global list works for one user in one session. It disappears when the process restarts, cannot be shared across multiple servers, and grows without limit in a long-running process. These are the problems that structured agent frameworks address with persistent storage and state management, which you will encounter in the next course.
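Of these limits, unbounded growth is the only one that plain Python can soften, and only crudely. The sketch below is one possible approach rather than a recommended design: it keeps the most recent user turns and drops everything older. It cuts only at a user message so that a tool result is never separated from the assistant turn that requested it. trim_history, role_of, and max_user_turns are hypothetical names.

```python
def role_of(entry) -> str:
    """Return the role of a history entry, whether it is a dict or an object."""
    return entry["role"] if isinstance(entry, dict) else entry.role


def trim_history(history: list, max_user_turns: int) -> None:
    """Drop the oldest turns in place, keeping the last max_user_turns user turns."""
    if max_user_turns < 1:
        raise ValueError("max_user_turns must be at least 1")
    user_indexes = [i for i, entry in enumerate(history) if role_of(entry) == "user"]
    if len(user_indexes) > max_user_turns:
        # Cut at a user message so assistant and tool entries stay paired
        del history[: user_indexes[-max_user_turns]]


demo_history = [
    {"role": "user", "content": "What is the weather in Tokyo right now?"},
    {"role": "assistant", "content": "It is currently partly cloudy in Tokyo, 22°C."},
    {"role": "user", "content": "What about Berlin?"},
    {"role": "assistant", "content": "Berlin is currently overcast, 14°C."},
    {"role": "user", "content": "Which of those two cities is warmer?"},
    {"role": "assistant", "content": "Tokyo is warmer at 22°C compared to Berlin at 14°C."},
]

trim_history(demo_history, max_user_turns=2)
print(len(demo_history))           # 4
print(demo_history[0]["content"])  # What about Berlin?
```

The trade-off is visible in the output: the Tokyo turn is gone, so the LLM can no longer refer back to it. Trimming bounds memory and token usage at the cost of forgetting.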

In the next lesson, we look at what this plain Python approach cannot handle at scale, and why that points toward a graph-based framework for more complex agents.