Knowledge Graph Module¶
Added in: v0.16.0
File: src/selectools/knowledge_graph.py
Classes: Triple, TripleStore, InMemoryTripleStore, SQLiteTripleStore, KnowledgeGraphMemory
Table of Contents¶
- Overview
- Quick Start
- Triple Dataclass
- TripleStore Protocol
- Store Implementations
- KnowledgeGraphMemory
- LLM-Powered Extraction
- Agent Integration
- Observer Events
- Querying the Graph
- Best Practices
Overview¶
The Knowledge Graph module builds a graph of relationships between entities extracted from conversations. While Entity Memory tracks individual entities and their attributes, the Knowledge Graph tracks how entities relate to each other -- forming a structured, queryable web of knowledge.
Purpose¶
- Relationship Tracking: Capture subject-relation-object triples from conversation
- LLM Extraction: Automatically extract relationships using an LLM provider
- Keyword Query: Retrieve relevant triples by keyword or entity name
- Context Injection: Feed relationship context into the system prompt
- Persistence: Store triples in memory or SQLite for durability
How It Differs from Entity Memory¶
| Feature | Entity Memory | Knowledge Graph |
|---|---|---|
| Tracks | Individual entities + attributes | Relationships between entities |
| Structure | Key-value (entity -> attributes) | Graph (subject -> relation -> object) |
| Example | Alice: | Alice --works_at--> Acme Corp |
| Query | By entity name | By keyword, subject, or object |
| Best for | "What do I know about X?" | "How are X and Y related?" |
Quick Start¶
from selectools import Agent, AgentConfig, OpenAIProvider, ConversationMemory, Message, Role
from selectools.knowledge_graph import KnowledgeGraphMemory, InMemoryTripleStore
kg = KnowledgeGraphMemory(
store=InMemoryTripleStore(),
provider=OpenAIProvider(), # used for LLM-based extraction
)
agent = Agent(
tools=[],
provider=OpenAIProvider(),
memory=ConversationMemory(max_messages=50),
config=AgentConfig(knowledge_graph=kg),
)
# Turn 1 -- relationships extracted automatically
result = agent.run([
Message(role=Role.USER, content="Alice works at Acme Corp. Acme Corp is based in Seattle.")
])
# Turn 2 -- agent has relationship context
result = agent.run([
Message(role=Role.USER, content="Where does Alice's company operate?")
])
# Agent knows: Alice works_at Acme Corp, Acme Corp located_in Seattle
Triple Dataclass¶
Each relationship is represented as a Triple:
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
@dataclass
class Triple:
subject: str # source entity
relation: str # relationship type (e.g., "works_at")
object: str # target entity
confidence: float = 1.0 # extraction confidence (0.0 - 1.0)
source_turn: Optional[int] = None # conversation turn where extracted
created_at: Optional[datetime] = None
Example Triples¶
Triple(subject="Alice", relation="works_at", object="Acme Corp", confidence=0.95)
Triple(subject="Acme Corp", relation="located_in", object="Seattle", confidence=0.90)
Triple(subject="Alice", relation="manages", object="Project Atlas", confidence=0.85)
Triple(subject="Bob", relation="reports_to", object="Alice", confidence=0.80)
TripleStore Protocol¶
All backends implement the TripleStore protocol:
from typing import Protocol, List, Optional
class TripleStore(Protocol):
def add(self, triples: List[Triple]) -> None:
"""Add triples to the store. Duplicates are ignored."""
...
def query(
self,
subject: Optional[str] = None,
relation: Optional[str] = None,
object: Optional[str] = None,
) -> List[Triple]:
"""Query triples by any combination of subject, relation, object.
None fields act as wildcards.
"""
...
def search(self, keywords: List[str], top_k: int = 20) -> List[Triple]:
"""Search triples matching any of the given keywords.
Matches against subject, relation, and object fields.
"""
...
def delete(
self,
subject: Optional[str] = None,
relation: Optional[str] = None,
object: Optional[str] = None,
) -> int:
"""Delete matching triples. Returns the number of triples deleted."""
...
def all(self) -> List[Triple]:
"""Return all triples in the store."""
...
def clear(self) -> None:
"""Remove all triples."""
...
def count(self) -> int:
"""Return the total number of triples."""
...
Store Implementations¶
1. InMemoryTripleStore¶
Best for: Prototyping, testing, short-lived sessions
from selectools.knowledge_graph import InMemoryTripleStore
store = InMemoryTripleStore()
store.add([
Triple(subject="Alice", relation="works_at", object="Acme Corp"),
Triple(subject="Acme Corp", relation="located_in", object="Seattle"),
])
# Query by subject
results = store.query(subject="Alice")
# [Triple(subject="Alice", relation="works_at", object="Acme Corp")]
# Keyword search
results = store.search(keywords=["Alice", "Seattle"], top_k=10)
# Returns triples mentioning Alice or Seattle
Features:
- No dependencies
- Fast in-memory lookup
- No persistence (lost on restart)
- Suitable for up to ~10k triples
2. SQLiteTripleStore¶
Best for: Production single-instance, persistent knowledge graphs
from selectools.knowledge_graph import SQLiteTripleStore
store = SQLiteTripleStore(db_path="knowledge_graph.db")
Schema:
CREATE TABLE triples (
id INTEGER PRIMARY KEY AUTOINCREMENT,
subject TEXT NOT NULL,
relation TEXT NOT NULL,
object TEXT NOT NULL,
confidence REAL DEFAULT 1.0,
source_turn INTEGER,
created_at TEXT,
UNIQUE(subject, relation, object)
);
CREATE INDEX idx_subject ON triples(subject);
CREATE INDEX idx_object ON triples(object);
CREATE INDEX idx_relation ON triples(relation);
Features:
- Persistent storage
- Indexed queries
- Duplicate-safe (UNIQUE constraint)
- ACID transactions
- Suitable for up to ~100k triples
Choosing a Store¶
| Feature | InMemory | SQLite |
|---|---|---|
| Persistence | No | Yes |
| Dependencies | None | None |
| Max Triples | ~10k | ~100k |
| Query Speed | Fast | Fast (indexed) |
| Setup | None | DB path |
KnowledgeGraphMemory¶
KnowledgeGraphMemory wraps a TripleStore with LLM-powered extraction and context building:
Constructor¶
class KnowledgeGraphMemory:
def __init__(
self,
store: TripleStore,
provider: Optional[Provider] = None,
extraction_model: Optional[str] = None,
max_context_triples: int = 30,
min_confidence: float = 0.5,
):
"""
Args:
store: Backend triple store.
provider: LLM provider for relationship extraction.
If None, extraction is disabled (manual-only).
extraction_model: Override model for extraction calls.
max_context_triples: Max triples to include in context injection.
min_confidence: Minimum confidence threshold for context inclusion.
"""
Core Methods¶
def extract_triples(self, text: str) -> List[Triple]:
"""Extract relationship triples from text using the LLM provider.
Returns a list of Triple objects parsed from the LLM response.
"""
def update(self, triples: List[Triple]) -> None:
"""Add triples to the underlying store."""
def query(self, keywords: List[str], top_k: int = 20) -> List[Triple]:
"""Search the triple store by keywords.
Filters results by min_confidence threshold.
"""
def build_context(self, keywords: Optional[List[str]] = None) -> str:
"""Build a context string for system prompt injection.
If keywords are provided, only relevant triples are included.
Otherwise, the most recent triples (up to max_context_triples) are used.
"""
def clear(self) -> None:
"""Clear all triples from the store."""
def to_dict(self) -> Dict[str, Any]:
"""Serialize for persistence (used by session storage)."""
@classmethod
def from_dict(cls, data: Dict[str, Any], store: TripleStore) -> "KnowledgeGraphMemory":
"""Restore from serialized data."""
LLM-Powered Extraction¶
When a provider is configured, extract_triples() sends text to the LLM with a structured prompt:
Extract all relationships from the following text as subject-relation-object triples.
For each triple, provide:
- subject: the source entity
- relation: the relationship (use snake_case, e.g., "works_at", "located_in", "manages")
- object: the target entity
- confidence: how confident you are (0.0 to 1.0)
Respond as a JSON array.
Text:
"""
Alice is a senior engineer at Acme Corp. She manages the Atlas project
and reports to Bob, the VP of Engineering. Acme Corp is headquartered in Seattle.
"""
The LLM responds:
[
{"subject": "Alice", "relation": "works_at", "object": "Acme Corp", "confidence": 0.95},
{"subject": "Alice", "relation": "has_role", "object": "senior engineer", "confidence": 0.95},
{"subject": "Alice", "relation": "manages", "object": "Atlas project", "confidence": 0.90},
{"subject": "Alice", "relation": "reports_to", "object": "Bob", "confidence": 0.90},
{"subject": "Bob", "relation": "has_role", "object": "VP of Engineering", "confidence": 0.90},
{"subject": "Acme Corp", "relation": "headquartered_in", "object": "Seattle", "confidence": 0.95}
]
Agent Integration¶
Configuration¶
from selectools import Agent, AgentConfig, OpenAIProvider, ConversationMemory
from selectools.knowledge_graph import KnowledgeGraphMemory, SQLiteTripleStore
kg = KnowledgeGraphMemory(
store=SQLiteTripleStore(db_path="kg.db"),
provider=OpenAIProvider(model="gpt-4o-mini"),
max_context_triples=30,
min_confidence=0.6,
)
agent = Agent(
tools=[...],
provider=OpenAIProvider(),
memory=ConversationMemory(max_messages=50),
config=AgentConfig(knowledge_graph=kg),
)
Context Injection Flow¶
run() / arun() called
|
+-- knowledge_graph.extract_triples(user_message)
| +-- LLM extracts relationship triples
|
+-- knowledge_graph.update(extracted_triples)
| +-- Store triples in backend
|
+-- Extract keywords from user message
|
+-- knowledge_graph.build_context(keywords)
| +-- "[Known Relationships]
| | - Alice works_at Acme Corp (confidence: 0.95)
| | - Acme Corp headquartered_in Seattle (confidence: 0.95)
| | - Alice manages Atlas project (confidence: 0.90)"
|
+-- Prepend context to system message
|
+-- Execute agent loop
|
+-- Return AgentResult
Context Format¶
The build_context() method produces:
[Known Relationships]
- Alice works_at Acme Corp (0.95)
- Alice manages Atlas project (0.90)
- Alice reports_to Bob (0.90)
- Acme Corp headquartered_in Seattle (0.95)
- Bob has_role VP of Engineering (0.90)
Observer Events¶
Knowledge graph extraction fires an observer event:
from selectools import AgentObserver
class KGWatcher(AgentObserver):
def on_kg_extraction(
self,
run_id: str,
triples_extracted: int,
triples_total: int,
triples: list,
) -> None:
print(f"[{run_id}] Extracted {triples_extracted} triples, {triples_total} total in store")
for t in triples:
print(f" {t.subject} --{t.relation}--> {t.object} ({t.confidence:.2f})")
| Event | When | Parameters |
|---|---|---|
on_kg_extraction |
After extracting and storing triples | run_id, triples_extracted, triples_total, triples |
Querying the Graph¶
By Subject¶
# All relationships where Alice is the subject
triples = kg.store.query(subject="Alice")
# Alice works_at Acme Corp
# Alice manages Atlas project
# Alice reports_to Bob
By Object¶
# All relationships pointing to Acme Corp
triples = kg.store.query(object="Acme Corp")
# Alice works_at Acme Corp
By Relation Type¶
# All "manages" relationships
triples = kg.store.query(relation="manages")
# Alice manages Atlas project
By Keywords¶
# Free-text keyword search
triples = kg.query(keywords=["Alice", "engineering"], top_k=10)
# Returns triples mentioning Alice or engineering
Combined Queries¶
# Alice's role at Acme Corp specifically
triples = kg.store.query(subject="Alice", object="Acme Corp")
# Alice works_at Acme Corp
Best Practices¶
1. Use SQLite for Persistent Graphs¶
# Prototyping
kg = KnowledgeGraphMemory(store=InMemoryTripleStore(), provider=provider)
# Production
kg = KnowledgeGraphMemory(
store=SQLiteTripleStore(db_path="knowledge.db"),
provider=provider,
)
2. Filter by Confidence¶
# Only high-confidence triples in context
kg = KnowledgeGraphMemory(
store=store,
provider=provider,
min_confidence=0.8, # ignore uncertain extractions
)
3. Use a Cost-Effective Extraction Model¶
# Use a smaller model for extraction
kg = KnowledgeGraphMemory(
store=store,
provider=OpenAIProvider(model="gpt-4o-mini"),
)
4. Limit Context Size¶
# Prevent context from growing too large
kg = KnowledgeGraphMemory(
store=store,
provider=provider,
max_context_triples=20, # cap at 20 triples in prompt
)
5. Combine with Entity Memory¶
from selectools.entity_memory import EntityMemory
from selectools.knowledge_graph import KnowledgeGraphMemory, SQLiteTripleStore
agent = Agent(
tools=[...],
provider=OpenAIProvider(),
memory=ConversationMemory(),
config=AgentConfig(
entity_memory=EntityMemory(max_entities=100, provider=OpenAIProvider()),
knowledge_graph=KnowledgeGraphMemory(
store=SQLiteTripleStore(db_path="kg.db"),
provider=OpenAIProvider(),
),
),
)
# Agent gets both [Known Entities] and [Known Relationships] context
6. Seed Domain Knowledge¶
from selectools.knowledge_graph import Triple
kg.update([
Triple(subject="Python", relation="is_a", object="programming language", confidence=1.0),
Triple(subject="selectools", relation="written_in", object="Python", confidence=1.0),
Triple(subject="selectools", relation="supports", object="OpenAI", confidence=1.0),
Triple(subject="selectools", relation="supports", object="Anthropic", confidence=1.0),
])
Testing¶
def test_triple_store_add_and_query():
store = InMemoryTripleStore()
store.add([
Triple(subject="Alice", relation="works_at", object="Acme"),
Triple(subject="Bob", relation="works_at", object="Acme"),
])
results = store.query(subject="Alice")
assert len(results) == 1
assert results[0].object == "Acme"
results = store.query(object="Acme")
assert len(results) == 2
def test_triple_store_keyword_search():
store = InMemoryTripleStore()
store.add([
Triple(subject="Alice", relation="works_at", object="Acme Corp"),
Triple(subject="Bob", relation="lives_in", object="Seattle"),
])
results = store.search(keywords=["Alice"], top_k=10)
assert len(results) == 1
assert results[0].subject == "Alice"
def test_duplicate_triples_ignored():
store = InMemoryTripleStore()
store.add([
Triple(subject="A", relation="r", object="B"),
Triple(subject="A", relation="r", object="B"), # duplicate
])
assert store.count() == 1
def test_build_context():
store = InMemoryTripleStore()
store.add([
Triple(subject="Alice", relation="works_at", object="Acme", confidence=0.9),
])
kg = KnowledgeGraphMemory(store=store, max_context_triples=10)
context = kg.build_context()
assert "[Known Relationships]" in context
assert "Alice" in context
assert "works_at" in context
assert "Acme" in context
def test_confidence_filtering():
store = InMemoryTripleStore()
store.add([
Triple(subject="A", relation="r1", object="B", confidence=0.9),
Triple(subject="C", relation="r2", object="D", confidence=0.3),
])
kg = KnowledgeGraphMemory(store=store, min_confidence=0.5)
results = kg.query(keywords=["A", "C"], top_k=10)
assert len(results) == 1
assert results[0].subject == "A"
API Reference¶
| Class | Description |
|---|---|
Triple(subject, relation, object, confidence) |
Dataclass for a subject-relation-object relationship |
TripleStore |
Protocol defining add/query/search/delete/clear interface |
InMemoryTripleStore() |
In-memory triple store for prototyping |
SQLiteTripleStore(db_path) |
SQLite-backed persistent triple store |
KnowledgeGraphMemory(store, provider, max_context_triples, min_confidence) |
LLM-powered knowledge graph with context injection |
| Method | Returns | Description |
|---|---|---|
extract_triples(text) |
List[Triple] |
Extract triples from text via LLM |
update(triples) |
None |
Add triples to the store |
query(keywords, top_k) |
List[Triple] |
Search triples by keywords |
build_context(keywords) |
str |
Build [Known Relationships] context string |
clear() |
None |
Remove all triples |
to_dict() |
Dict |
Serialize for persistence |
from_dict(data, store) |
KnowledgeGraphMemory |
Restore from serialized data |
| AgentConfig Field | Type | Description |
|---|---|---|
knowledge_graph |
Optional[KnowledgeGraphMemory] |
Knowledge graph instance for relationship tracking |
Further Reading¶
- Entity Memory Module - Entity attribute tracking (complements the knowledge graph)
- Memory Module - Conversation memory
- Sessions Module - Persist graph state across restarts
- Knowledge Module - Cross-session long-term knowledge
Next Steps: Learn about cross-session knowledge in the Knowledge Module.