Gray Sky AI

Large Language Models (LLMs) are inherently stateless, meaning they don't have a built-in mechanism to remember or store information from one interaction to the next. Each request to the model is processed independently, without any knowledge of previous requests or responses. This stateless nature is a fundamental characteristic of LLMs, which can pose challenges when developing applications that require context or memory.

Example Conversation with Context

Let's consider an LLM that is an expert on the Roman Empire. We'll ask a series of questions about the fifth emperor, Nero. Below is an example of how the conversation might look in JSON format:

[
    {
        "role": "system",
        "content": "You are an expert on the Roman Empire."
    },
    {
        "role": "user",
        "content": "Who was the fifth emperor of the Roman Empire?"
    },
    {
        "role": "assistant",
        "content": "The fifth emperor of the Roman Empire was Nero."
    },
    {
        "role": "user",
        "content": "What was he known for?"
    },
    {
        "role": "assistant",
        "content": "Nero is known for his tyrannical rule and the Great Fire of Rome."
    },
    {
        "role": "user",
        "content": "Tell me more about the Great Fire?"
    },
    {
        "role": "assistant",
        "content": "The Great Fire of Rome in 64 CE destroyed much of the city over six days. While Nero provided relief efforts afterward, he was accused of starting the fire to clear land for his Domus Aurea palace, though he was actually in Antium when it began. He blamed Christians for the fire and persecuted them, making this one of the most controversial events of his reign."
    }
]

Stateless Response

If we send just the last message to the LLM, it will not have any context of the previous responses and will be unable to respond in any meaningful way.

[
    {
        "role": "user",
        "content": "Tell me more about the Great Fire?"
    },
    {
        "role": "assistant",
        "content": "I'm not sure which fire you are referring to. Could you provide more context?"
    }
]

Importance of Context

When interacting with an LLM, it's important to provide the entire conversation history to maintain context.

Diagram

Langchain on Managing Context Windows

To maintain context in a conversation with an LLM, developers typically need to send the relevant conversation history back to the model with each new request. This approach is confirmed by LangChain, a popular framework for working with LLMs:

In conversational applications, this is especially important because the context window determines how much information the model can 'remember' throughout a conversation. Developers often need to manage the input within the context window to maintain a coherent dialogue without exceeding the limit. [1]

However, simply sending the entire conversation history can quickly become inefficient due to token limits, increased costs, and higher latency. To address this, developers can employ various strategies:

Summarization: Recursively summarize the conversation when it exceeds a certain threshold of messages.
Buffering: Only pass the last N messages to the model.
Hybrid approach: Summarize the conversation up to the Nth message and pass in the last N messages.
Specific context retrieval: Store long conversations in a vector database and add the most relevant pieces based on recent messages. [2]

These strategies help balance the need for context with the practical limitations of LLM APIs, allowing developers to create more effective and efficient conversational AI applications.

Sources:

https://python.langchain.com/docs/concepts/chat_models
https://www.vellum.ai/blog/how-should-i-manage-memory-for-my-llm-chatbot

This article is licensed under CC BY-SA 4.0.