Memory Module

File: src/selectools/memory.py
Classes: ConversationMemory

Table of Contents

  1. Overview
  2. Memory Management
  3. Integration with Agent
  4. Implementation
  5. Summarize-on-Trim
  6. Best Practices
  7. Testing
  8. Common Pitfalls
  9. Related Memory Modules

Overview

The ConversationMemory class maintains dialogue history across multiple agent interactions, implementing a sliding window that keeps the most recent messages when limits are exceeded.

Purpose

  • Multi-Turn Conversations: Enable context retention across calls
  • Memory Management: Prevent token limit explosions
  • History Access: Retrieve conversation state for debugging/logging

Memory Management

Configuration

memory = ConversationMemory(
    max_messages=20,    # Keep last 20 messages
    max_tokens=4000     # Optional token-based limit
)

Sliding Window

Initial: []

Add: USER("Hello")
└─→ [USER("Hello")]

Add: ASSISTANT("Hi!")
└─→ [USER("Hello"), ASSISTANT("Hi!")]

Add: USER("What's 2+2?")
└─→ [USER("Hello"), ASSISTANT("Hi!"), USER("What's 2+2?")]

... continues until limit ...

At limit (max_messages=3):
[USER("Hello"), ASSISTANT("Hi!"), USER("What's 2+2?")]

Add: ASSISTANT("4")
└─→ Remove oldest: USER("Hello")
└─→ [ASSISTANT("Hi!"), USER("What's 2+2?"), ASSISTANT("4")]
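The walkthrough above can be sketched in a few lines of plain Python. This is an illustrative re-implementation with strings standing in for Message objects, not the library's code; the real ConversationMemory additionally enforces token limits and tool-pair safety:

```python
from collections import deque

# deque(maxlen=N) gives sliding-window behavior for free: appending
# beyond the limit silently evicts the oldest entry.
window = deque(maxlen=3)

for msg in ['USER("Hello")', 'ASSISTANT("Hi!")', 'USER("What\'s 2+2?")', 'ASSISTANT("4")']:
    window.append(msg)

print(list(window))  # USER("Hello") has been evicted
```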

Tool-Pair-Aware Trimming

After the sliding window trim, the memory scans forward to find the first safe boundary. This prevents orphaning a TOOL result without its preceding ASSISTANT tool-use message, which would violate provider API contracts.

def _fix_tool_pair_boundary(self) -> None:
    """Drop leading messages until history starts at a safe boundary."""
    while len(self._messages) > 1:
        first = self._messages[0]
        if first.role == Role.TOOL:
            # Orphaned tool result: its assistant tool-use message was trimmed.
            self._messages.pop(0)
            continue
        if first.role == Role.ASSISTANT and first.tool_calls:
            # Advance past the tool-use message (and its results) as well.
            self._messages.pop(0)
            continue
        break

Before fix: Trim might leave [TOOL("result..."), USER("next question")] — invalid.

After fix: Advances past orphaned messages to [USER("next question")] — valid.
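The fix can be demonstrated standalone with (role, has_tool_calls) tuples standing in for Message objects; an illustrative sketch, not the library's code:

```python
# Pop leading messages until history starts at a safe boundary:
# neither an orphaned TOOL result nor an ASSISTANT tool-use message.
def fix_tool_pair_boundary(messages):
    messages = list(messages)
    while len(messages) > 1:
        role, has_tool_calls = messages[0]
        if role == "TOOL":
            messages.pop(0)  # tool result with no preceding tool-use message
            continue
        if role == "ASSISTANT" and has_tool_calls:
            messages.pop(0)  # advance past the tool-use message and its results
            continue
        break
    return messages

# The invalid state from the example above:
print(fix_tool_pair_boundary([("TOOL", False), ("USER", False)]))
# [('USER', False)]
```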

Observer Notifications

When an AgentObserver is registered, the agent fires on_memory_trim whenever trimming occurs — both for messages added during the run (via _memory_add) and for the initial user messages added at the start of run()/arun()/astream() (via _memory_add_many):

from selectools import AgentObserver

class MemoryWatcher(AgentObserver):
    def on_memory_trim(self, run_id, messages_removed, messages_remaining, reason):
        print(f"[{run_id}] Trimmed {messages_removed} messages, {messages_remaining} remaining")

The reason parameter is "enforce_limits" for sliding window / max-tokens trimming.

Trim Implementation

def _enforce_limits(self) -> None:
    # 1. Enforce message count limit
    if len(self._messages) > self.max_messages:
        excess = len(self._messages) - self.max_messages
        self._messages = self._messages[excess:]

    # 2. Enforce token count limit (if configured)
    if self.max_tokens is not None:
        while len(self._messages) > 1:  # Keep at least 1
            total_tokens = sum(
                estimate_tokens(msg.content)
                for msg in self._messages
            )

            if total_tokens <= self.max_tokens:
                break

            # Remove oldest message
            self._messages.pop(0)

    # 3. Fix tool-pair boundary
    self._fix_tool_pair_boundary()
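The token-limit pass (step 2) can be exercised standalone. Here estimate_tokens is a crude four-characters-per-token stand-in for the library's helper, and plain strings stand in for Message objects:

```python
# Crude stand-in for the library's estimate_tokens() helper.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

# Mirrors step 2 of _enforce_limits: drop oldest messages until the
# running total fits, but always keep at least one message.
def enforce_token_limit(messages, max_tokens):
    messages = list(messages)
    while len(messages) > 1:
        if sum(estimate_tokens(m) for m in messages) <= max_tokens:
            break
        messages.pop(0)
    return messages

msgs = ["a" * 40, "b" * 40, "c" * 40]   # ~10 tokens each
print(enforce_token_limit(msgs, 20))    # keeps only the newest two
```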

Integration with Agent

With Memory

from selectools import Agent, ConversationMemory, Message, Role

memory = ConversationMemory(max_messages=20)
agent = Agent(tools=[...], provider=provider, memory=memory)

# Turn 1
response1 = agent.run([
    Message(role=Role.USER, content="My name is Alice")
])

# Turn 2 - Context preserved
response2 = agent.run([
    Message(role=Role.USER, content="What's my name?")
])
# Agent knows: "Alice"

Flow

run() called
    ├─→ memory.get_history()
    │   └─→ Returns previous messages
    ├─→ Append new user messages
    ├─→ memory.add_many(new_messages)
    ├─→ Execute agent loop
    │   ├─→ LLM sees full history
    │   ├─→ Tool calls append to history
    │   └─→ memory.add() for each message
    ├─→ memory.add(final_response)
    └─→ Return response

Without Memory

agent = Agent(tools=[...], provider=provider)  # No memory

# Each call is independent
response1 = agent.run([Message(role=Role.USER, content="My name is Alice")])
response2 = agent.run([Message(role=Role.USER, content="What's my name?")])
# Agent doesn't know - no memory

Implementation

Class Structure

class ConversationMemory:
    def __init__(self, max_messages: int = 20, max_tokens: Optional[int] = None):
        if max_messages < 1:
            raise ValueError("max_messages must be at least 1")
        if max_tokens is not None and max_tokens < 1:
            raise ValueError("max_tokens must be at least 1")

        self.max_messages = max_messages
        self.max_tokens = max_tokens
        self._messages: List[Message] = []

Core Methods

def add(self, message: Message) -> None:
    """Add a single message to history."""
    self._messages.append(message)
    self._enforce_limits()

def add_many(self, messages: List[Message]) -> None:
    """Add multiple messages at once."""
    self._messages.extend(messages)
    self._enforce_limits()

def get_history(self) -> List[Message]:
    """Get full conversation history."""
    return list(self._messages)

def get_recent(self, n: int) -> List[Message]:
    """Get last N messages."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return self._messages[-n:] if len(self._messages) >= n else list(self._messages)

def clear(self) -> None:
    """Clear all messages."""
    self._messages.clear()
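Taken together, the methods above behave like this minimal stand-in (plain strings instead of Message objects; illustrative only, omitting token limits and tool-pair handling):

```python
class MiniMemory:
    """Illustrative stand-in for ConversationMemory's core methods."""

    def __init__(self, max_messages: int = 20):
        self.max_messages = max_messages
        self._messages = []

    def add(self, message) -> None:
        self._messages.append(message)
        # Sliding window: drop the oldest entries past the limit.
        if len(self._messages) > self.max_messages:
            del self._messages[: len(self._messages) - self.max_messages]

    def get_recent(self, n: int):
        return self._messages[-n:]

    def clear(self) -> None:
        self._messages.clear()

mem = MiniMemory(max_messages=3)
for text in ["a", "b", "c", "d"]:
    mem.add(text)

print(mem.get_recent(2))  # ['c', 'd'] -- "a" was evicted by the window
```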

Serialization

def to_dict(self) -> Dict[str, Any]:
    """Serialize memory for logging/persistence."""
    return {
        "max_messages": self.max_messages,
        "max_tokens": self.max_tokens,
        "message_count": len(self._messages),
        "messages": [msg.to_dict() for msg in self._messages],
        "summary": self._summary,
    }

Deserialization with from_dict()

Reconstruct a ConversationMemory from a dictionary produced by to_dict(). The restored instance preserves the exact persisted state — _enforce_limits() is not re-run, so no messages are silently dropped during reconstruction. The tool-pair boundary is fixed to ensure a valid starting message.

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "ConversationMemory":
    """Reconstruct a ConversationMemory from a to_dict() output."""
    ...

Usage:

import json

# Save
with open("conversation.json", "w") as f:
    json.dump(memory.to_dict(), f)

# Restore
with open("conversation.json", "r") as f:
    data = json.load(f)
    memory = ConversationMemory.from_dict(data)

# Summary is preserved
print(memory.summary)  # Restored if present

Key behaviors:

  • Config fields (max_messages, max_tokens) are restored from the dict
  • Messages are reconstructed via Message.from_dict()
  • The summary field (from summarize-on-trim) is preserved
  • _fix_tool_pair_boundary() runs to ensure valid conversation start
  • _last_trimmed is reset to empty (trim history is not persisted)

Summarize-on-Trim

When messages are trimmed by the sliding window, important early context is normally lost. Summarize-on-trim generates a summary of the dropped messages and preserves it as system context.

Configuration

Summarize-on-trim is configured via AgentConfig, not on ConversationMemory directly:

from selectools import Agent, AgentConfig, ConversationMemory

memory = ConversationMemory(max_messages=30)
agent = Agent(
    tools=[...], provider=provider, memory=memory,
    config=AgentConfig(
        summarize_on_trim=True,
        summarize_provider=provider,       # Provider for summarization
        summarize_model="gpt-4o-mini",     # Use a cheap/fast model
        summarize_max_tokens=150,          # Max tokens for the summary
    ),
)

How It Works

Messages exceed max_messages
    ├─→ _enforce_limits() trims oldest messages
    ├─→ Trimmed messages stored in _last_trimmed
    ├─→ Agent detects _last_trimmed is non-empty
    ├─→ Sends trimmed messages to summarize_provider
    ├─→ Provider returns 2-3 sentence summary
    ├─→ Summary stored in memory.summary
    ├─→ on_memory_summarize observer event fired
    └─→ On next turn, summary injected as [Conversation Summary] in system prompt
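The handoff above can be sketched end to end. fake_summarize below stands in for the LLM call the agent makes through summarize_provider; in the real flow the summary text comes back from the model:

```python
# Stand-in for the LLM summarization call made via summarize_provider.
def fake_summarize(trimmed):
    return "Earlier turns covered: " + "; ".join(trimmed)

# Trim to the window, then summarize whatever fell off (mirrors the
# _enforce_limits -> _last_trimmed -> summary handoff described above).
def trim_and_summarize(messages, max_messages):
    trimmed = messages[: max(0, len(messages) - max_messages)]
    kept = messages[-max_messages:]
    summary = fake_summarize(trimmed) if trimmed else None
    return kept, summary

kept, summary = trim_and_summarize(["intro", "name=Alice", "q1", "a1"], 2)
print(kept)     # ['q1', 'a1']
print(summary)  # mentions the two dropped messages
```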

Key Properties

  • memory.summary: Read the current summary (or None if no trimming has occurred)
  • memory._last_trimmed: Messages removed during the most recent _enforce_limits() call

Summary Persistence

When using to_dict() / from_dict(), the summary is included:

data = memory.to_dict()
# data["summary"] contains the current summary string (or None)

restored = ConversationMemory.from_dict(data)
print(restored.summary)  # Summary is preserved

Best Practices

1. Choose Appropriate Limits

# Short interactions (Q&A bot)
memory = ConversationMemory(max_messages=10)

# Standard conversations
memory = ConversationMemory(max_messages=20)

# Long-form dialogues
memory = ConversationMemory(max_messages=50)

2. Use Token Limits for Cost Control

# Limit by tokens to prevent large prompts
memory = ConversationMemory(
    max_messages=100,     # High message count
    max_tokens=4000       # But limit tokens
)

3. Clear Memory Between Sessions

# Start fresh conversation
memory.clear()

4. Access Recent Context

# Get last 5 messages for display
recent = memory.get_recent(5)
for msg in recent:
    print(f"{msg.role}: {msg.content}")

5. Serialize and Restore

import json

# Save conversation
with open("conversation.json", "w") as f:
    json.dump(memory.to_dict(), f)

# Restore conversation (preserves summary and all messages)
with open("conversation.json", "r") as f:
    data = json.load(f)
    memory = ConversationMemory.from_dict(data)

Testing

def test_memory_sliding_window():
    memory = ConversationMemory(max_messages=3)

    # Add 5 messages
    for i in range(5):
        memory.add(Message(role=Role.USER, content=f"Message {i}"))

    # Should only keep last 3
    history = memory.get_history()
    assert len(history) == 3
    assert history[0].content == "Message 2"
    assert history[2].content == "Message 4"

def test_memory_with_agent():
    memory = ConversationMemory(max_messages=10)
    agent = Agent(tools=[...], provider=LocalProvider(), memory=memory)

    # First turn
    agent.run([Message(role=Role.USER, content="Hello")])
    assert len(memory.get_history()) > 0

    # Second turn
    agent.run([Message(role=Role.USER, content="Goodbye")])
    assert len(memory.get_history()) > 1

Common Pitfalls

1. Forgetting to Share Memory

# ❌ Bad - Each agent has separate memory
agent1 = Agent(..., memory=ConversationMemory())
agent2 = Agent(..., memory=ConversationMemory())

# ✅ Good - Shared memory
memory = ConversationMemory()
agent1 = Agent(..., memory=memory)
agent2 = Agent(..., memory=memory)

2. Not Clearing Between Users

# ❌ Bad - User A sees User B's history
def handle_user_a():
    agent.run([...])

def handle_user_b():
    agent.run([...])  # Sees User A's messages!

# ✅ Good - Clear when the user changes
previous_user = None

def handle_user(user_id):
    global previous_user
    if user_id != previous_user:
        memory.clear()
        previous_user = user_id
    agent.run([...])
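A pattern that avoids clearing entirely is one memory per user. Sketched here with plain lists as the per-user store; with the real library, each user would get their own ConversationMemory instance:

```python
# One isolated history per user id: no user ever sees another's context,
# and nothing needs to be cleared between requests.
memories = {}

def get_memory(user_id: str):
    return memories.setdefault(user_id, [])

get_memory("alice").append("My name is Alice")
get_memory("bob").append("My name is Bob")

print(get_memory("alice"))  # ['My name is Alice'] -- Bob's history is separate
```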

3. Setting Limits Too Low

# ❌ Bad - Forgets context quickly
memory = ConversationMemory(max_messages=2)

# ✅ Good - Reasonable context
memory = ConversationMemory(max_messages=20)

Related Memory Modules

The following memory features shipped in v0.16.0 and integrate with ConversationMemory via AgentConfig:

  • Sessions — Persistent session storage with JSON file, SQLite, and Redis backends
  • Entity Memory — LLM-based named entity extraction and context injection
  • Knowledge Graph — Relationship triple extraction with in-memory and SQLite storage
  • Knowledge Memory — Cross-session durable memory with daily logs and remember tool

Future Enhancements

Potential improvements (see Roadmap):

  1. Semantic Pruning: Remove similar/redundant messages to maximize useful context

Next Steps: Learn about usage tracking in the Usage Module.