Memory Module¶
File: src/selectools/memory.py
Classes: ConversationMemory
Table of Contents¶
- Overview
- Memory Management
- Integration with Agent
- Implementation
- Summarize-on-Trim
- Best Practices
- Related Memory Modules
Overview¶
The ConversationMemory class maintains dialogue history across multiple agent interactions, implementing a sliding window that keeps the most recent messages when limits are exceeded.
Purpose¶
- Multi-Turn Conversations: Enable context retention across calls
- Memory Management: Prevent token limit explosions
- History Access: Retrieve conversation state for debugging/logging
Memory Management¶
Configuration¶
memory = ConversationMemory(
    max_messages=20,  # Keep last 20 messages
    max_tokens=4000,  # Optional token-based limit
)
Sliding Window¶
Initial: []
Add: USER("Hello")
└─→ [USER("Hello")]
Add: ASSISTANT("Hi!")
└─→ [USER("Hello"), ASSISTANT("Hi!")]
Add: USER("What's 2+2?")
└─→ [USER("Hello"), ASSISTANT("Hi!"), USER("What's 2+2?")]
... continues until limit ...
At limit (max_messages=3):
[USER("Hello"), ASSISTANT("Hi!"), USER("What's 2+2?")]
Add: ASSISTANT("4")
└─→ Remove oldest: USER("Hello")
└─→ [ASSISTANT("Hi!"), USER("What's 2+2?"), ASSISTANT("4")]
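The window above behaves like a fixed-length queue. A minimal standalone sketch (illustrative, not the library implementation) using `collections.deque`:

```python
from collections import deque

# A deque with maxlen silently drops the oldest item when full,
# mirroring the sliding-window behaviour shown above.
window = deque(maxlen=3)

for text in ["Hello", "Hi!", "What's 2+2?", "4"]:
    window.append(text)

print(list(window))  # oldest message "Hello" has been dropped
```

Note that ConversationMemory keeps a plain list rather than a deque, since (as the next section shows) trimming must also be able to advance past tool-pair boundaries after the cut.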
Tool-Pair-Aware Trimming¶
After the sliding window trim, the memory scans forward to find the first safe boundary. This prevents orphaning a TOOL result without its preceding ASSISTANT tool-use message, which would violate provider API contracts.
def _fix_tool_pair_boundary(self) -> None:
    while len(self._messages) > 1:
        first = self._messages[0]
        if first.role == Role.TOOL:
            # Orphaned tool result: its ASSISTANT tool-use message was trimmed
            self._messages.pop(0)
            continue
        if first.role == Role.ASSISTANT and first.tool_calls:
            # Drop a leading tool-use message so history doesn't start mid-exchange
            self._messages.pop(0)
            continue
        break  # Safe boundary reached
Before fix: Trim might leave [TOOL("result..."), USER("next question")] — invalid.
After fix: Advances past orphaned messages to [USER("next question")] — valid.
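The same boundary rule can be sketched standalone using `(role, has_tool_calls)` tuples in place of Message objects (illustrative only, not the library code):

```python
# Standalone sketch of the tool-pair boundary rule.
def fix_boundary(messages):
    messages = list(messages)
    while len(messages) > 1:
        role, has_tool_calls = messages[0]
        if role == "tool":
            messages.pop(0)        # orphaned tool result: drop it
            continue
        if role == "assistant" and has_tool_calls:
            messages.pop(0)        # leading tool-use message: drop it too
            continue
        break                      # safe boundary reached
    return messages

print(fix_boundary([("tool", False), ("user", False)]))
# the leading TOOL message is dropped, leaving a valid USER start
```

Note the loop also handles a leading ASSISTANT tool-use message followed by its TOOL results: the first pass drops the assistant message, and subsequent passes drop the now-orphaned results.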
Observer Notifications¶
When an AgentObserver is registered, the agent fires on_memory_trim whenever trimming occurs — both for messages added during the run (via _memory_add) and for the initial user messages added at the start of run()/arun()/astream() (via _memory_add_many):
from selectools import AgentObserver
class MemoryWatcher(AgentObserver):
    def on_memory_trim(self, run_id, messages_removed, messages_remaining, reason):
        print(f"[{run_id}] Trimmed {messages_removed} messages, {messages_remaining} remaining")
The reason parameter is "enforce_limits" for sliding window / max-tokens trimming.
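The notification flow can be sketched with stand-in classes (in the real library the event is dispatched by the Agent, not by the memory itself; the names below are illustrative):

```python
# Minimal stand-in showing when an on_memory_trim callback would fire.
class PrintObserver:
    def on_memory_trim(self, run_id, messages_removed, messages_remaining, reason):
        print(f"[{run_id}] Trimmed {messages_removed}, {messages_remaining} remain ({reason})")

def trim(messages, max_messages, observer, run_id="run-1"):
    excess = len(messages) - max_messages
    if excess > 0:
        messages = messages[excess:]
        # Fire the event only when something was actually removed
        observer.on_memory_trim(run_id, excess, len(messages), "enforce_limits")
    return messages

history = trim(["a", "b", "c", "d"], max_messages=2, observer=PrintObserver())
```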
Implementation¶
def _enforce_limits(self) -> None:
    # 1. Enforce message count limit
    if len(self._messages) > self.max_messages:
        excess = len(self._messages) - self.max_messages
        self._messages = self._messages[excess:]

    # 2. Enforce token count limit (if configured)
    if self.max_tokens is not None:
        while len(self._messages) > 1:  # Keep at least 1
            total_tokens = sum(
                estimate_tokens(msg.content)
                for msg in self._messages
            )
            if total_tokens <= self.max_tokens:
                break
            # Remove oldest message
            self._messages.pop(0)

    # 3. Fix tool-pair boundary
    self._fix_tool_pair_boundary()
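Step 2 above can be sketched standalone. The ~4-characters-per-token heuristic here is an assumption for illustration, not the library's actual estimator:

```python
# Standalone sketch of token-budget trimming (step 2 above).
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (assumption, not the real estimator)
    return max(1, len(text) // 4)

def trim_to_token_budget(messages, max_tokens):
    messages = list(messages)
    while len(messages) > 1:  # always keep at least one message
        if sum(estimate_tokens(m) for m in messages) <= max_tokens:
            break
        messages.pop(0)       # drop the oldest message first
    return messages

history = ["x" * 400, "y" * 400, "z" * 40]  # ~100, ~100, ~10 tokens
print(trim_to_token_budget(history, max_tokens=120))
```

Dropping from the front means the most recent exchange always survives, even when a single remaining message exceeds the budget.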
Integration with Agent¶
With Memory¶
from selectools import Agent, ConversationMemory, Message, Role
memory = ConversationMemory(max_messages=20)
agent = Agent(tools=[...], provider=provider, memory=memory)
# Turn 1
response1 = agent.run([
    Message(role=Role.USER, content="My name is Alice")
])
# Turn 2 - Context preserved
response2 = agent.run([
    Message(role=Role.USER, content="What's my name?")
])
# Agent knows: "Alice"
Flow¶
run() called
│
├─→ memory.get_history()
│ └─→ Returns previous messages
│
├─→ Append new user messages
│
├─→ memory.add_many(new_messages)
│
├─→ Execute agent loop
│ ├─→ LLM sees full history
│ ├─→ Tool calls append to history
│ └─→ memory.add() for each message
│
├─→ memory.add(final_response)
│
└─→ Return response
Without Memory¶
agent = Agent(tools=[...], provider=provider) # No memory
# Each call is independent
response1 = agent.run([Message(role=Role.USER, content="My name is Alice")])
response2 = agent.run([Message(role=Role.USER, content="What's my name?")])
# Agent doesn't know - no memory
Implementation¶
Class Structure¶
class ConversationMemory:
    def __init__(self, max_messages: int = 20, max_tokens: Optional[int] = None):
        if max_messages < 1:
            raise ValueError("max_messages must be at least 1")
        if max_tokens is not None and max_tokens < 1:
            raise ValueError("max_tokens must be at least 1")
        self.max_messages = max_messages
        self.max_tokens = max_tokens
        self._messages: List[Message] = []
        self._summary: Optional[str] = None     # Set by summarize-on-trim
        self._last_trimmed: List[Message] = []  # Messages removed by the last trim
Core Methods¶
def add(self, message: Message) -> None:
    """Add a single message to history."""
    self._messages.append(message)
    self._enforce_limits()

def add_many(self, messages: List[Message]) -> None:
    """Add multiple messages at once."""
    self._messages.extend(messages)
    self._enforce_limits()

def get_history(self) -> List[Message]:
    """Get full conversation history."""
    return list(self._messages)

def get_recent(self, n: int) -> List[Message]:
    """Get last N messages."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return self._messages[-n:] if len(self._messages) >= n else list(self._messages)

def clear(self) -> None:
    """Clear all messages."""
    self._messages.clear()
Serialization¶
def to_dict(self) -> Dict[str, Any]:
    """Serialize memory for logging/persistence."""
    return {
        "max_messages": self.max_messages,
        "max_tokens": self.max_tokens,
        "message_count": len(self._messages),
        "messages": [msg.to_dict() for msg in self._messages],
        "summary": self._summary,
    }
Deserialization with from_dict()¶
Reconstruct a ConversationMemory from a dictionary produced by to_dict(). The restored instance preserves the exact persisted state — _enforce_limits() is not re-run, so no messages are silently dropped during reconstruction. The tool-pair boundary is fixed to ensure a valid starting message.
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "ConversationMemory":
    """Reconstruct a ConversationMemory from a to_dict() output."""
    ...
Usage:
import json

# Save
with open("conversation.json", "w") as f:
    json.dump(memory.to_dict(), f)

# Restore
with open("conversation.json", "r") as f:
    data = json.load(f)
memory = ConversationMemory.from_dict(data)

# Summary is preserved
print(memory.summary)  # Restored if present
Key behaviors:
- Config fields (max_messages, max_tokens) are restored from the dict
- Messages are reconstructed via Message.from_dict()
- The summary field (from summarize-on-trim) is preserved
- _fix_tool_pair_boundary() runs to ensure a valid conversation start
- _last_trimmed is reset to empty (trim history is not persisted)
Summarize-on-Trim¶
When messages are trimmed by the sliding window, important early context is normally lost. Summarize-on-trim generates a summary of the dropped messages and preserves it as system context.
Configuration¶
Summarize-on-trim is configured via AgentConfig, not on ConversationMemory directly:
from selectools import Agent, AgentConfig, ConversationMemory
memory = ConversationMemory(max_messages=30)
agent = Agent(
    tools=[...], provider=provider, memory=memory,
    config=AgentConfig(
        summarize_on_trim=True,
        summarize_provider=provider,    # Provider for summarization
        summarize_model="gpt-4o-mini",  # Use a cheap/fast model
        summarize_max_tokens=150,       # Max tokens for the summary
    ),
)
How It Works¶
Messages exceed max_messages
│
├─→ _enforce_limits() trims oldest messages
├─→ Trimmed messages stored in _last_trimmed
│
├─→ Agent detects _last_trimmed is non-empty
├─→ Sends trimmed messages to summarize_provider
├─→ Provider returns 2-3 sentence summary
│
├─→ Summary stored in memory.summary
├─→ on_memory_summarize observer event fired
│
└─→ On next turn, summary injected as [Conversation Summary] in system prompt
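The flow above can be sketched end-to-end with a stub summarizer in place of the real provider call (everything here is illustrative; the actual agent wires this through AgentConfig):

```python
# Illustrative sketch of summarize-on-trim with a stub summarizer.
def stub_summarize(messages):
    # A real implementation would call summarize_provider here
    return f"Earlier, {len(messages)} message(s) were exchanged."

class TrimMemory:
    def __init__(self, max_messages):
        self.max_messages = max_messages
        self.messages = []
        self.summary = None

    def add(self, message):
        self.messages.append(message)
        excess = len(self.messages) - self.max_messages
        if excess > 0:
            trimmed = self.messages[:excess]        # plays the role of _last_trimmed
            self.messages = self.messages[excess:]
            self.summary = stub_summarize(trimmed)  # summary of dropped context

mem = TrimMemory(max_messages=2)
for text in ["My name is Alice", "Hi Alice!", "What's my name?"]:
    mem.add(text)
print(mem.summary)  # "Earlier, 1 message(s) were exchanged."
```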
Key Properties¶
- memory.summary: Read the current summary (or None if no trimming has occurred)
- memory._last_trimmed: Messages removed during the most recent _enforce_limits() call
Summary Persistence¶
When using to_dict() / from_dict(), the summary is included:
data = memory.to_dict()
# data["summary"] contains the current summary string (or None)
restored = ConversationMemory.from_dict(data)
print(restored.summary) # Summary is preserved
Best Practices¶
1. Choose Appropriate Limits¶
# Short interactions (Q&A bot)
memory = ConversationMemory(max_messages=10)
# Standard conversations
memory = ConversationMemory(max_messages=20)
# Long-form dialogues
memory = ConversationMemory(max_messages=50)
2. Use Token Limits for Cost Control¶
# Limit by tokens to prevent large prompts
memory = ConversationMemory(
    max_messages=100,  # High message count
    max_tokens=4000,   # But limit tokens
)
3. Clear Memory Between Sessions¶
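A sketch of session-scoped memory, using a plain dict of lists so it runs standalone (swap the list for a ConversationMemory per session in real code; names here are illustrative):

```python
# One history per session id; ending a session is equivalent to memory.clear().
sessions = {}

def get_memory(session_id):
    return sessions.setdefault(session_id, [])

def end_session(session_id):
    sessions.pop(session_id, None)  # discard all history for this session

get_memory("alice").append("My name is Alice")
end_session("alice")
print(get_memory("alice"))  # [] -- fresh history for the next session
```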
4. Access Recent Context¶
# Get last 5 messages for display
recent = memory.get_recent(5)
for msg in recent:
print(f"{msg.role}: {msg.content}")
5. Serialize and Restore¶
import json
# Save conversation
with open("conversation.json", "w") as f:
    json.dump(memory.to_dict(), f)

# Restore conversation (preserves summary and all messages)
with open("conversation.json", "r") as f:
    data = json.load(f)
memory = ConversationMemory.from_dict(data)
Testing¶
def test_memory_sliding_window():
    memory = ConversationMemory(max_messages=3)
    # Add 5 messages
    for i in range(5):
        memory.add(Message(role=Role.USER, content=f"Message {i}"))
    # Should only keep last 3
    history = memory.get_history()
    assert len(history) == 3
    assert history[0].content == "Message 2"
    assert history[2].content == "Message 4"

def test_memory_with_agent():
    memory = ConversationMemory(max_messages=10)
    agent = Agent(tools=[...], provider=LocalProvider(), memory=memory)
    # First turn
    agent.run([Message(role=Role.USER, content="Hello")])
    assert len(memory.get_history()) > 0
    # Second turn
    agent.run([Message(role=Role.USER, content="Goodbye")])
    assert len(memory.get_history()) > 1
Common Pitfalls¶
1. Forgetting to Share Memory¶
# ❌ Bad - Each agent has separate memory
agent1 = Agent(..., memory=ConversationMemory())
agent2 = Agent(..., memory=ConversationMemory())
# ✅ Good - Shared memory
memory = ConversationMemory()
agent1 = Agent(..., memory=memory)
agent2 = Agent(..., memory=memory)
2. Not Clearing Between Users¶
# ❌ Bad - User A sees User B's history
def handle_user_a():
    agent.run([...])

def handle_user_b():
    agent.run([...])  # Sees User A's messages!

# ✅ Good - Clear between users
def handle_user(user_id):
    if user_id != previous_user:
        memory.clear()
    agent.run([...])
3. Setting Limits Too Low¶
# ❌ Bad - Forgets context quickly
memory = ConversationMemory(max_messages=2)
# ✅ Good - Reasonable context
memory = ConversationMemory(max_messages=20)
Related Memory Modules (v0.16.0)¶
The following memory features were shipped in v0.16.0 and integrate with ConversationMemory via AgentConfig:
- Sessions — Persistent session storage with JSON file, SQLite, and Redis backends
- Entity Memory — LLM-based named entity extraction and context injection
- Knowledge Graph — Relationship triple extraction with in-memory and SQLite storage
- Knowledge Memory — Cross-session durable memory with daily logs and a remember tool
Future Enhancements¶
Potential improvements (see Roadmap):
- Semantic Pruning: Remove similar/redundant messages to maximize useful context
Further Reading¶
- Agent Module - How agents use memory (including session, entity, KG, and knowledge integration)
- Sessions Module - Persistent session storage backends
- Entity Memory Module - Named entity extraction and tracking
- Knowledge Graph Module - Relationship triple extraction
- Knowledge Memory Module - Cross-session durable memory
- Types Module - Message data structure
Next Steps: Learn about usage tracking in the Usage Module.