Entity Memory Module¶

Added in: v0.16.0
File: src/selectools/entity_memory.py
Classes: Entity, EntityMemory

Table of Contents¶

Overview
Quick Start
Entity Dataclass
EntityMemory Class
LLM-Powered Extraction
Deduplication and Merging
LRU Pruning
Agent Integration
Observer Events
Best Practices

Overview¶

The Entity Memory module automatically extracts, tracks, and recalls named entities (people, organizations, locations, concepts) across conversation turns. It gives agents persistent awareness of who and what has been discussed, enabling more coherent multi-turn interactions.

Purpose¶

Entity Extraction: LLM-powered identification of entities from conversation text
Attribute Tracking: Accumulate facts about entities across turns (e.g., "Alice works at Acme Corp")
Mention Counting: Track how frequently each entity appears
Context Injection: Automatically provide the agent with known entity context
LRU Pruning: Evict least-recently-used entities when capacity is exceeded

Quick Start¶

from selectools import Agent, AgentConfig, OpenAIProvider, ConversationMemory, Message, Role
from selectools.entity_memory import EntityMemory

entity_memory = EntityMemory(
    max_entities=100,
    provider=OpenAIProvider(),  # used for LLM-based extraction
)

agent = Agent(
    tools=[],
    provider=OpenAIProvider(),
    memory=ConversationMemory(max_messages=50),
    config=AgentConfig(entity_memory=entity_memory),
)

# Turn 1 -- entities extracted automatically
result = agent.run([
    Message(role=Role.USER, content="Alice is a software engineer at Acme Corp in Seattle.")
])

# Turn 2 -- agent has entity context
result = agent.run([
    Message(role=Role.USER, content="What do you know about Alice?")
])
# Agent knows: Alice is a software engineer at Acme Corp, located in Seattle

Entity Dataclass¶

Each tracked entity is represented as an Entity instance:

from dataclasses import dataclass, field
from typing import Dict, List, Optional
from datetime import datetime

@dataclass
class Entity:
    name: str                                      # canonical name
    entity_type: str                               # "person", "organization", "location", etc.
    attributes: Dict[str, str] = field(default_factory=dict)
    mentions: int = 0                              # total mention count
    first_seen: Optional[datetime] = None
    last_seen: Optional[datetime] = None
    aliases: List[str] = field(default_factory=list)  # alternative names

Example Entity¶

Entity(
    name="Alice",
    entity_type="person",
    attributes={
        "role": "software engineer",
        "company": "Acme Corp",
        "location": "Seattle",
    },
    mentions=3,
    first_seen=datetime(2026, 3, 13, 10, 0),
    last_seen=datetime(2026, 3, 13, 10, 15),
    aliases=["alice", "Alice Smith"],
)

EntityMemory Class¶

Constructor¶

class EntityMemory:
    def __init__(
        self,
        max_entities: int = 100,
        provider: Optional[Provider] = None,
        extraction_model: Optional[str] = None,
    ):
        """
        Args:
            max_entities: Maximum entities to track. LRU eviction when exceeded.
            provider: LLM provider used for entity extraction. If None,
                      extraction is skipped and entities must be added manually.
            extraction_model: Override model for extraction calls.
                              Defaults to the provider's configured model.
        """

Core Methods¶

def extract_entities(self, text: str) -> List[Entity]:
    """Extract entities from text using the LLM provider.

    Sends a structured extraction prompt to the LLM and parses
    the response into Entity objects. Returns newly extracted entities.
    """

def update(self, entities: List[Entity]) -> None:
    """Merge extracted entities into the tracked set.

    - New entities are added.
    - Existing entities have their attributes merged and mention counts incremented.
    - LRU eviction is triggered if max_entities is exceeded.
    """

def build_context(self) -> str:
    """Build a context string for injection into the system prompt.

    Returns a formatted block listing all tracked entities with
    their types and attributes, suitable for prepending to messages.
    """

def get_entity(self, name: str) -> Optional[Entity]:
    """Look up a tracked entity by name (case-insensitive)."""

def get_all_entities(self) -> List[Entity]:
    """Return all tracked entities, ordered by last_seen (most recent first)."""

def clear(self) -> None:
    """Remove all tracked entities."""

def to_dict(self) -> Dict[str, Any]:
    """Serialize entity memory for persistence."""

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "EntityMemory":
    """Restore entity memory from serialized data."""

LLM-Powered Extraction¶

When a provider is configured, extract_entities() sends the conversation text to the LLM with a structured extraction prompt:

Extract all named entities from the following text.
For each entity, provide:
- name: the canonical name
- entity_type: one of "person", "organization", "location", "product", "concept", "event", "other"
- attributes: key-value pairs of facts mentioned about the entity

Respond as a JSON array.

Text:
"""
Alice is a software engineer at Acme Corp in Seattle. She is working on Project Atlas.
"""

The LLM responds with structured JSON:

[
    {"name": "Alice", "entity_type": "person", "attributes": {"role": "software engineer", "company": "Acme Corp"}},
    {"name": "Acme Corp", "entity_type": "organization", "attributes": {"location": "Seattle"}},
    {"name": "Seattle", "entity_type": "location", "attributes": {}},
    {"name": "Project Atlas", "entity_type": "product", "attributes": {"team_member": "Alice"}}
]
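Model output is not always clean JSON, so the parsing step benefits from some tolerance. The following is a minimal sketch of how such a response could be turned into entity records; it is not the library's parser. `ParsedEntity` and `parse_extraction` are hypothetical names, and falling back to the type "other" is an assumption.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParsedEntity:  # hypothetical stand-in for the documented Entity
    name: str
    entity_type: str
    attributes: Dict[str, str] = field(default_factory=dict)


def parse_extraction(raw: str) -> List[ParsedEntity]:
    """Parse a JSON array of entity records, skipping malformed items."""
    # Models sometimes wrap the array in prose, so keep only the
    # outermost [...] span before decoding.
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        items = json.loads(raw[start : end + 1])
    except json.JSONDecodeError:
        return []

    entities: List[ParsedEntity] = []
    for item in items:
        # A record without a name cannot be tracked, so drop it.
        if not isinstance(item, dict) or not item.get("name"):
            continue
        attrs = item.get("attributes")
        if not isinstance(attrs, dict):
            attrs = {}
        entities.append(
            ParsedEntity(
                name=str(item["name"]).strip(),
                entity_type=str(item.get("entity_type") or "other"),
                attributes={str(k): str(v) for k, v in attrs.items()},
            )
        )
    return entities


raw = (
    "Here are the entities:\n"
    '[{"name": "Alice", "entity_type": "person", '
    '"attributes": {"role": "software engineer"}}, '
    '{"entity_type": "location"}, {"name": "Seattle"}]'
)
parsed = parse_extraction(raw)
```

Here the record without a name is dropped, and the record without a type falls back to "other".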

Without a Provider¶

If no provider is given, automatic extraction is disabled. You can still manage entities manually:

from selectools.entity_memory import EntityMemory, Entity

em = EntityMemory(max_entities=50)  # no provider

# Manual entity management
em.update([
    Entity(name="Alice", entity_type="person", attributes={"role": "engineer"}),
])

context = em.build_context()

Deduplication and Merging¶

When update() encounters an entity whose name matches an existing tracked entity (case-insensitive), it merges rather than duplicates:

# Turn 1: "Alice is an engineer"
em.update([Entity(name="Alice", entity_type="person", attributes={"role": "engineer"})])

# Turn 2: "Alice lives in Seattle and goes by Ali"
em.update([Entity(
    name="Alice",
    entity_type="person",
    attributes={"location": "Seattle"},
    aliases=["Ali"],
)])

# Result: single entity with merged attributes
alice = em.get_entity("Alice")
# alice.attributes == {"role": "engineer", "location": "Seattle"}
# alice.mentions == 2
# alice.aliases == ["Ali"]
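Case-insensitive matching of this kind is commonly implemented with a normalized lookup key. The sketch below illustrates the idea only. `NameIndex` is a hypothetical helper, and the use of `str.casefold()` is an assumption rather than something the module documents.

```python
from typing import Dict, Optional


class NameIndex:
    """Maps case-insensitive names to the canonical spelling seen first."""

    def __init__(self) -> None:
        self._canonical: Dict[str, str] = {}

    def add(self, name: str) -> str:
        """Register a name and return its canonical (first-seen) form."""
        return self._canonical.setdefault(name.casefold(), name)

    def lookup(self, name: str) -> Optional[str]:
        """Return the canonical form, or None if the name is unknown."""
        return self._canonical.get(name.casefold())


index = NameIndex()
first = index.add("Alice")
second = index.add("ALICE")   # same key, so the first spelling is kept
```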

Merge Rules¶

Field	Merge Strategy
`name`	Keep existing canonical name
`entity_type`	Keep existing (first wins)
`attributes`	Merge dicts; new values overwrite old for same key
`mentions`	Increment by 1
`aliases`	Union of both alias lists
`last_seen`	Update to current time
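The merge rules in the table can be sketched as a single function. This is an illustration, not the code inside update(). The local `Entity` mirrors the documented dataclass, `merge_entity` is a hypothetical name, and passing `now` explicitly is a choice made here to keep the example deterministic.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Entity:  # local mirror of the documented dataclass
    name: str
    entity_type: str
    attributes: Dict[str, str] = field(default_factory=dict)
    mentions: int = 0
    first_seen: Optional[datetime] = None
    last_seen: Optional[datetime] = None
    aliases: List[str] = field(default_factory=list)


def merge_entity(existing: Entity, incoming: Entity, now: datetime) -> Entity:
    """Fold `incoming` into `existing` following the merge table."""
    # name and entity_type: left untouched, so the first value wins.
    existing.attributes.update(incoming.attributes)  # new values overwrite
    existing.mentions += 1
    for alias in incoming.aliases:  # order-preserving union
        if alias not in existing.aliases:
            existing.aliases.append(alias)
    existing.last_seen = now
    return existing


t0 = datetime(2026, 3, 13, 10, 0)
now = datetime(2026, 3, 13, 10, 15)
existing = Entity("Alice", "person", {"role": "engineer"}, 1, t0, t0)
incoming = Entity(
    "alice",
    "organization",  # conflicting type: ignored, first wins
    {"role": "senior engineer", "location": "Seattle"},
    aliases=["Ali"],
)
merged = merge_entity(existing, incoming, now)
```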

LRU Pruning¶

When the number of tracked entities exceeds max_entities, the least-recently-used entities are evicted:

em = EntityMemory(max_entities=3)

em.update([Entity(name="A", entity_type="person")])  # [A]
em.update([Entity(name="B", entity_type="person")])  # [A, B]
em.update([Entity(name="C", entity_type="person")])  # [A, B, C]

# Capacity full -- next update evicts LRU
em.update([Entity(name="D", entity_type="person")])  # [B, C, D]  -- A evicted

An entity's last_seen timestamp is updated on every mention, so entities that keep coming up in the conversation stay in memory, while those not mentioned recently are the first to be evicted.
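The eviction order can be illustrated with a small standalone sketch built on `collections.OrderedDict`. `LRUNames` is a hypothetical helper that tracks names only. How EntityMemory stores its entities internally is not documented here, so treat this as one possible approach.

```python
from collections import OrderedDict
from typing import List


class LRUNames:
    """Tracks entity names in least-recently-used order (oldest first)."""

    def __init__(self, max_entities: int) -> None:
        self.max_entities = max_entities
        self._order: "OrderedDict[str, None]" = OrderedDict()

    def touch(self, name: str) -> List[str]:
        """Record a mention and return any names evicted as a result."""
        self._order[name] = None
        self._order.move_to_end(name)  # most recently used goes last
        evicted: List[str] = []
        while len(self._order) > self.max_entities:
            oldest, _ = self._order.popitem(last=False)
            evicted.append(oldest)
        return evicted

    def names(self) -> List[str]:
        """Return tracked names, least recently used first."""
        return list(self._order)


lru = LRUNames(max_entities=3)
for name in ("A", "B", "C"):
    lru.touch(name)
lru.touch("A")              # A is mentioned again and becomes most recent
evicted = lru.touch("D")    # capacity exceeded: B is now the oldest
```

Because A was mentioned again before D arrived, B is evicted instead of A.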

Agent Integration¶

Configuration¶

from selectools import Agent, AgentConfig, OpenAIProvider, ConversationMemory
from selectools.entity_memory import EntityMemory

entity_memory = EntityMemory(
    max_entities=200,
    provider=OpenAIProvider(),
)

agent = Agent(
    tools=[...],
    provider=OpenAIProvider(),
    memory=ConversationMemory(max_messages=50),
    config=AgentConfig(entity_memory=entity_memory),
)

Context Injection Flow¶

When entity memory is configured, the agent automatically injects entity context into the system prompt:

run() / arun() called
    |
    +-- entity_memory.extract_entities(user_message)
    |   +-- LLM extracts entities from new messages
    |
    +-- entity_memory.update(extracted_entities)
    |   +-- Merge with existing entities, LRU prune
    |
    +-- entity_memory.build_context()
    |   +-- "[Known Entities]
    |   |    - Alice (person): role=software engineer, company=Acme Corp
    |   |    - Acme Corp (organization): location=Seattle
    |   |    - Seattle (location)"
    |
    +-- Prepend context to system message
    |
    +-- Execute agent loop (LLM sees entity context)
    |
    +-- Return AgentResult
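The ordering of these steps can be sketched with a test double in place of a real EntityMemory. `prepare_system_prompt`, `EntityMemoryLike`, and `FakeMemory` are hypothetical names. The fake's capitalized-word "extraction" stands in for the LLM call only so that the example runs offline, and the blank-line separator is an assumption.

```python
from typing import List, Protocol


class EntityMemoryLike(Protocol):
    """The three documented methods the flow depends on."""

    def extract_entities(self, text: str) -> list: ...
    def update(self, entities: list) -> None: ...
    def build_context(self) -> str: ...


def prepare_system_prompt(
    memory: EntityMemoryLike, system_prompt: str, user_message: str
) -> str:
    """Run extract -> update -> build_context, then prepend the context."""
    extracted = memory.extract_entities(user_message)
    memory.update(extracted)
    context = memory.build_context()
    if not context:
        return system_prompt
    return f"{context}\n\n{system_prompt}"


class FakeMemory:
    """Test double that records call order instead of calling an LLM."""

    def __init__(self) -> None:
        self.calls: List[str] = []
        self.names: List[str] = []

    def extract_entities(self, text: str) -> list:
        self.calls.append("extract_entities")
        # Crude placeholder: treat capitalized words as entity names.
        return [w.strip(".,") for w in text.split() if w[:1].isupper()]

    def update(self, entities: list) -> None:
        self.calls.append("update")
        for name in entities:
            if name not in self.names:
                self.names.append(name)

    def build_context(self) -> str:
        self.calls.append("build_context")
        if not self.names:
            return ""
        return "[Known Entities]\n" + "\n".join(f"- {n}" for n in self.names)


memory = FakeMemory()
prompt = prepare_system_prompt(
    memory, "You are a helpful assistant.", "Alice works at Acme."
)
```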

Context Format¶

The build_context() method produces a block like:

[Known Entities]
- Alice (person): role=software engineer, company=Acme Corp, location=Seattle
- Acme Corp (organization): location=Seattle, employee=Alice
- Project Atlas (product): team_member=Alice

This block is injected as part of the system message so the LLM can reference known entities without re-extraction.
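A formatter producing this layout could look like the sketch below. `format_known_entities` and the `EntityRow` tuple are hypothetical. Returning an empty string when nothing is tracked is also an assumption, since the behavior of build_context() with no entities is not documented here.

```python
from typing import Dict, List, Tuple

# (name, entity_type, attributes)
EntityRow = Tuple[str, str, Dict[str, str]]


def format_known_entities(rows: List[EntityRow]) -> str:
    """Render entity rows in the [Known Entities] layout."""
    if not rows:
        return ""
    lines = ["[Known Entities]"]
    for name, entity_type, attributes in rows:
        line = f"- {name} ({entity_type})"
        if attributes:
            # Attributes are rendered as comma-separated key=value pairs.
            facts = ", ".join(f"{k}={v}" for k, v in attributes.items())
            line += f": {facts}"
        lines.append(line)
    return "\n".join(lines)


context = format_known_entities(
    [
        ("Alice", "person", {"role": "software engineer", "company": "Acme Corp"}),
        ("Seattle", "location", {}),
    ]
)
```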

Observer Events¶

Entity extraction fires an observer event:

from selectools import AgentObserver

class EntityWatcher(AgentObserver):
    def on_entity_extraction(
        self,
        run_id: str,
        entities_extracted: int,
        entities_total: int,
        entities: list,
    ) -> None:
        print(f"[{run_id}] Extracted {entities_extracted} entities, {entities_total} total tracked")
        for e in entities:
            print(f"  - {e.name} ({e.entity_type})")

Event	When	Parameters
`on_entity_extraction`	After extracting and merging entities	`run_id`, `entities_extracted`, `entities_total`, `entities`
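Beyond printing, the same event can feed simple metrics. The sketch below is a standalone collector whose method signature mirrors the event parameters. To stay runnable without the library it does not subclass AgentObserver, which a real observer would. `EntityStats` is a hypothetical name, and `SimpleNamespace` objects stand in for Entity instances.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Dict


class EntityStats:
    """Accumulates per-run and per-type extraction counts."""

    def __init__(self) -> None:
        self.extracted_per_run: Dict[str, int] = {}
        self.type_counts: Counter = Counter()
        self.total_tracked = 0

    def on_entity_extraction(
        self,
        run_id: str,
        entities_extracted: int,
        entities_total: int,
        entities: list,
    ) -> None:
        # Sum extractions per run; a run may fire the event more than once.
        previous = self.extracted_per_run.get(run_id, 0)
        self.extracted_per_run[run_id] = previous + entities_extracted
        self.total_tracked = entities_total  # latest size of the tracked set
        self.type_counts.update(e.entity_type for e in entities)


stats = EntityStats()
stats.on_entity_extraction(
    run_id="run-1",
    entities_extracted=2,
    entities_total=2,
    entities=[
        SimpleNamespace(name="Alice", entity_type="person"),
        SimpleNamespace(name="Acme Corp", entity_type="organization"),
    ],
)
stats.on_entity_extraction(
    run_id="run-2",
    entities_extracted=1,
    entities_total=3,
    entities=[SimpleNamespace(name="Bob", entity_type="person")],
)
```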

Best Practices¶

1. Set Appropriate Capacity¶

# Short conversations -- fewer entities needed
em = EntityMemory(max_entities=50)

# Long-running assistants -- track more context
em = EntityMemory(max_entities=500)

2. Use a Cost-Effective Extraction Model¶

# Use a smaller model for extraction to reduce cost
em = EntityMemory(
    max_entities=100,
    provider=OpenAIProvider(model="gpt-4o-mini"),
)
# Alternatively, pass extraction_model to override the model for extraction calls only

3. Persist Entity Memory with Sessions¶

Entity memory is serialized when used with session storage:

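from selectools import Agent, AgentConfig, OpenAIProvider, ConversationMemory
from selectools.entity_memory import EntityMemory
from selectools.sessions import SQLiteSessionStore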

store = SQLiteSessionStore(db_path="sessions.db")

agent = Agent(
    tools=[...],
    provider=OpenAIProvider(),
    memory=ConversationMemory(),
    config=AgentConfig(
        entity_memory=EntityMemory(max_entities=100, provider=OpenAIProvider()),
        session_store=store,
        session_id="user-42",
    ),
)
# Entity memory is saved/restored alongside conversation memory
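For a sense of what a persisted entity might look like, here is a minimal JSON round-trip sketch. The dict layout is hypothetical and is not the format produced by to_dict() or written by the session store. The local `Entity` mirrors the documented dataclass, and datetimes are stored as ISO 8601 strings so the payload stays JSON-safe.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class Entity:  # local mirror of the documented dataclass
    name: str
    entity_type: str
    attributes: Dict[str, str] = field(default_factory=dict)
    mentions: int = 0
    first_seen: Optional[datetime] = None
    last_seen: Optional[datetime] = None
    aliases: List[str] = field(default_factory=list)


def entity_to_dict(entity: Entity) -> Dict[str, Any]:
    """Convert an Entity to JSON-safe primitives (hypothetical layout)."""
    data = asdict(entity)
    for key in ("first_seen", "last_seen"):
        value = data[key]
        data[key] = value.isoformat() if value is not None else None
    return data


def entity_from_dict(data: Dict[str, Any]) -> Entity:
    """Rebuild an Entity from the dict produced by entity_to_dict."""
    restored = dict(data)
    for key in ("first_seen", "last_seen"):
        value = restored.get(key)
        restored[key] = datetime.fromisoformat(value) if value else None
    return Entity(**restored)


alice = Entity(
    name="Alice",
    entity_type="person",
    attributes={"role": "engineer"},
    mentions=1,
    first_seen=datetime(2026, 3, 13, 10, 0),
    last_seen=datetime(2026, 3, 13, 10, 15),
)
payload = json.dumps(entity_to_dict(alice))
restored = entity_from_dict(json.loads(payload))
```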

4. Inspect Tracked Entities¶

for entity in entity_memory.get_all_entities():
    print(f"{entity.name} ({entity.entity_type}): {entity.attributes}")
    print(f"  Mentions: {entity.mentions}, Last seen: {entity.last_seen}")

5. Manual Entity Seeding¶

Pre-populate entities for domain-specific contexts:

from selectools.entity_memory import Entity, EntityMemory

em = EntityMemory(max_entities=100)

em.update([
    Entity(name="Selectools", entity_type="product", attributes={
        "type": "Python library",
        "purpose": "AI agent framework",
    }),
    Entity(name="OpenAI", entity_type="organization", attributes={
        "type": "AI company",
    }),
])

Testing¶

from selectools.entity_memory import Entity, EntityMemory


def test_entity_extraction_and_merge():
    em = EntityMemory(max_entities=50)

    em.update([
        Entity(name="Alice", entity_type="person", attributes={"role": "engineer"}),
    ])
    assert em.get_entity("Alice") is not None
    assert em.get_entity("Alice").mentions == 1

    # Merge new attributes
    em.update([
        Entity(name="Alice", entity_type="person", attributes={"location": "Seattle"}),
    ])
    alice = em.get_entity("Alice")
    assert alice.mentions == 2
    assert alice.attributes["role"] == "engineer"
    assert alice.attributes["location"] == "Seattle"


def test_lru_eviction():
    em = EntityMemory(max_entities=2)

    em.update([Entity(name="A", entity_type="person")])
    em.update([Entity(name="B", entity_type="person")])
    em.update([Entity(name="C", entity_type="person")])

    assert em.get_entity("A") is None  # evicted
    assert em.get_entity("B") is not None
    assert em.get_entity("C") is not None


def test_build_context():
    em = EntityMemory(max_entities=50)
    em.update([
        Entity(name="Alice", entity_type="person", attributes={"role": "engineer"}),
    ])

    context = em.build_context()
    assert "[Known Entities]" in context
    assert "Alice (person)" in context
    assert "role=engineer" in context


def test_serialization_roundtrip():
    em = EntityMemory(max_entities=50)
    em.update([
        Entity(name="Alice", entity_type="person", attributes={"role": "engineer"}),
    ])

    data = em.to_dict()
    em2 = EntityMemory.from_dict(data)

    assert em2.get_entity("Alice") is not None
    assert em2.get_entity("Alice").attributes["role"] == "engineer"

API Reference¶

Class	Description
`Entity(name, entity_type, attributes, mentions, first_seen, last_seen, aliases)`	Dataclass representing a tracked entity
`EntityMemory(max_entities, provider, extraction_model)`	LLM-powered entity tracker with LRU eviction

Method	Returns	Description
`extract_entities(text)`	`List[Entity]`	Extract entities from text via LLM
`update(entities)`	`None`	Merge entities into tracked set
`build_context()`	`str`	Build `[Known Entities]` context string
`get_entity(name)`	`Optional[Entity]`	Look up entity by name
`get_all_entities()`	`List[Entity]`	All tracked entities (most recent first)
`clear()`	`None`	Remove all entities
`to_dict()`	`Dict`	Serialize for persistence
`from_dict(data)`	`EntityMemory`	Restore from serialized data

AgentConfig Field	Type	Description
`entity_memory`	`Optional[EntityMemory]`	Entity memory instance for automatic extraction