Usage Tracking Module¶
Files: src/selectools/usage.py, src/selectools/analytics.py, src/selectools/pricing.py
Classes: UsageStats, AgentUsage, AgentAnalytics, ToolMetrics
Overview¶
Selectools provides automatic cost and usage tracking for:
- Token consumption (prompt, completion, embeddings)
- API costs (per model)
- Per-tool attribution
- Tool usage patterns and success rates
- Iteration-by-iteration breakdown
Why Track Usage?¶
- Cost Control: Monitor spending in real-time
- Optimization: Identify expensive operations
- Debugging: Understand token consumption patterns
- Analytics: Track tool effectiveness
Usage Statistics¶
UsageStats Dataclass¶
Tracks a single API call:
@dataclass
class UsageStats:
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0
    cost_usd: float = 0.0
    model: str = ""
    provider: str = ""

    # RAG support
    embedding_tokens: int = 0
    embedding_cost_usd: float = 0.0
Example¶
stats = UsageStats(
    prompt_tokens=1500,
    completion_tokens=300,
    total_tokens=1800,
    cost_usd=0.0045,
    model="gpt-4o-mini",
    provider="openai",
)
Agent Usage Tracking¶
AgentUsage Class¶
Aggregates statistics across multiple iterations:
@dataclass
class AgentUsage:
    # Cumulative totals
    total_prompt_tokens: int = 0
    total_completion_tokens: int = 0
    total_tokens: int = 0
    total_cost_usd: float = 0.0
    total_embedding_tokens: int = 0
    total_embedding_cost_usd: float = 0.0

    # Per-tool breakdown
    tool_usage: Dict[str, int] = field(default_factory=dict)
    tool_tokens: Dict[str, int] = field(default_factory=dict)

    # Per-iteration history
    iterations: List[UsageStats] = field(default_factory=list)
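The aggregation itself is straightforward accumulation. Below is a minimal, self-contained sketch of how an add_usage method might fold one UsageStats into the running totals; the real implementation lives in src/selectools/usage.py, and the method body here is an assumption, trimmed to the core fields:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class UsageStats:
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0
    cost_usd: float = 0.0

@dataclass
class AgentUsage:
    total_prompt_tokens: int = 0
    total_completion_tokens: int = 0
    total_tokens: int = 0
    total_cost_usd: float = 0.0
    tool_usage: Dict[str, int] = field(default_factory=dict)
    tool_tokens: Dict[str, int] = field(default_factory=dict)
    iterations: List[UsageStats] = field(default_factory=list)

    def add_usage(self, stats: UsageStats, tool_name: Optional[str] = None) -> None:
        # Fold the per-call stats into the cumulative totals
        self.total_prompt_tokens += stats.prompt_tokens
        self.total_completion_tokens += stats.completion_tokens
        self.total_tokens += stats.total_tokens
        self.total_cost_usd += stats.cost_usd
        self.iterations.append(stats)
        # Attribute the call to a tool when one is given
        if tool_name:
            self.tool_usage[tool_name] = self.tool_usage.get(tool_name, 0) + 1
            self.tool_tokens[tool_name] = self.tool_tokens.get(tool_name, 0) + stats.total_tokens

usage = AgentUsage()
usage.add_usage(
    UsageStats(prompt_tokens=1500, completion_tokens=300,
               total_tokens=1800, cost_usd=0.0045),
    tool_name="search",
)
print(usage.total_tokens, usage.tool_usage)  # 1800 {'search': 1}
```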
Automatic Tracking¶
agent = Agent(tools=[...], provider=provider)
response = agent.run([Message(role=Role.USER, content="Search for Python")])
# Usage automatically tracked
print(f"Total tokens: {agent.total_tokens:,}")
print(f"Total cost: ${agent.total_cost:.6f}")
print(agent.get_usage_summary())
# Also available on the result itself
print(response.usage) # AgentUsage snapshot
Real-Time Usage via AgentObserver¶
For real-time per-call token tracking (without waiting for the run to complete), use the on_usage observer event:
from selectools import AgentObserver
class UsageTracker(AgentObserver):
    def on_usage(self, run_id, usage):
        print(f"[{run_id}] {usage.total_tokens} tokens, ${usage.cost_usd:.6f}")
This fires for every LLM call, including cache hits and structured output retries.
Output¶
============================================================
📊 Usage Summary
============================================================
Total Tokens: 2,543
  - Prompt: 1,890
  - Completion: 653
Total Cost: $0.012345
Iterations: 3

Tool Usage:
  - search: 1 calls, 847 tokens
  - calculate: 2 calls, 1,696 tokens
============================================================
Per-Tool Attribution¶
print(agent.usage.tool_usage)
# {'search': 1, 'calculate': 2}
print(agent.usage.tool_tokens)
# {'search': 847, 'calculate': 1696}
Iteration Breakdown¶
for i, iteration in enumerate(agent.usage.iterations):
    print(f"Iteration {i+1}:")
    print(f"  Tokens: {iteration.total_tokens}")
    print(f"  Cost: ${iteration.cost_usd:.6f}")
Reset Usage¶
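Accumulated totals can be cleared between runs with agent.reset_usage() (see Best Practices below for the per-session pattern). The effect is to start over with a fresh AgentUsage accumulator; a minimal sketch of that pattern, using stand-in classes rather than the real ones:

```python
from dataclasses import dataclass, field
from typing import List

# Stand-in for the real AgentUsage (see src/selectools/usage.py)
@dataclass
class AgentUsage:
    total_tokens: int = 0
    total_cost_usd: float = 0.0
    iterations: List[object] = field(default_factory=list)

class Agent:
    def __init__(self):
        self.usage = AgentUsage()

    def reset_usage(self):
        # Drop the old accumulator and start fresh
        self.usage = AgentUsage()

agent = Agent()
agent.usage.total_tokens = 2543
agent.usage.total_cost_usd = 0.012345
agent.reset_usage()
print(agent.usage.total_tokens, agent.usage.total_cost_usd)  # 0 0.0
```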
Tool Analytics¶
Enable Analytics¶
from selectools import Agent, AgentConfig
config = AgentConfig(enable_analytics=True)
agent = Agent(tools=[...], provider=provider, config=config)
AgentAnalytics Class¶
Tracks detailed tool metrics:
@dataclass
class ToolMetrics:
    name: str
    total_calls: int = 0
    successful_calls: int = 0
    failed_calls: int = 0
    total_duration: float = 0.0
    total_cost: float = 0.0

    # Streaming metrics
    total_chunks: int = 0
    streaming_calls: int = 0

    # Parameter patterns
    parameter_usage: Dict[str, Dict[Any, int]] = field(default_factory=dict)

    @property
    def success_rate(self) -> float:
        return (self.successful_calls / self.total_calls * 100.0) if self.total_calls > 0 else 0.0

    @property
    def avg_duration(self) -> float:
        return self.total_duration / self.total_calls if self.total_calls > 0 else 0.0
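The two derived properties are easy to exercise in isolation. A self-contained check that mirrors the dataclass above, trimmed to the fields the properties use (the example numbers match the "search" tool in the analytics output shown later in this section):

```python
from dataclasses import dataclass

@dataclass
class ToolMetrics:
    name: str
    total_calls: int = 0
    successful_calls: int = 0
    failed_calls: int = 0
    total_duration: float = 0.0

    @property
    def success_rate(self) -> float:
        # Percentage of calls that succeeded; 0.0 when never called
        return (self.successful_calls / self.total_calls * 100.0) if self.total_calls > 0 else 0.0

    @property
    def avg_duration(self) -> float:
        return self.total_duration / self.total_calls if self.total_calls > 0 else 0.0

# 14 of 15 calls succeeded, 3.510s total
m = ToolMetrics(name="search", total_calls=15, successful_calls=14,
                failed_calls=1, total_duration=3.510)
print(f"{m.success_rate:.1f}%")  # 93.3%
print(f"{m.avg_duration:.3f}s")  # 0.234s
```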
Get Analytics¶
analytics = agent.get_analytics()
if analytics:
    # Print summary
    print(analytics.summary())

    # Get specific tool metrics
    search_metrics = analytics.get_metrics("search")
    print(f"Search success rate: {search_metrics.success_rate:.1f}%")
    print(f"Avg duration: {search_metrics.avg_duration:.3f}s")

    # Export to file
    analytics.to_json("analytics.json")
    analytics.to_csv("analytics.csv")
Analytics Output¶
============================================================
Tool Usage Analytics
============================================================
search:
  Calls: 15
  Success rate: 93.3% (14/15)
  Failures: 1
  Avg duration: 0.234s
  Total duration: 3.510s
  Total cost: $0.001250
  Common parameters:
    query: "Python tutorials" (8x, 5 unique)

calculate:
  Calls: 23
  Success rate: 100.0% (23/23)
  Avg duration: 0.012s
  Total duration: 0.276s
  Common parameters:
    operation: "add" (12x, 4 unique)
============================================================
Export Analytics¶
# JSON format
analytics.to_json("analytics.json")
# Output: {"tools": {...}, "summary": {...}}
# CSV format
analytics.to_csv("analytics.csv")
# Columns: tool_name, total_calls, successful_calls, failed_calls,
# success_rate, avg_duration, total_duration, total_cost
Pricing System¶
Model Registry Integration¶
Pricing is derived from the model registry:
from selectools.models import OpenAI, Anthropic, Gemini
# Model info includes pricing
model = OpenAI.GPT_4O
print(f"Prompt: ${model.prompt_cost}/1M tokens")
print(f"Completion: ${model.completion_cost}/1M tokens")
Cost Calculation¶
from selectools.pricing import calculate_cost
cost = calculate_cost(
    model="gpt-4o",
    prompt_tokens=1000,
    completion_tokens=500,
)
# Formula:
# cost = (prompt_tokens / 1M * prompt_cost) + (completion_tokens / 1M * completion_cost)
# cost = (1000 / 1M * 2.50) + (500 / 1M * 10.00)
# cost = 0.0025 + 0.005 = $0.0075
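The same arithmetic can be written as a standalone function for quick sanity checks (prices are per 1M tokens; the $2.50 / $10.00 figures are the gpt-4o example values used above, and this helper is illustrative rather than the library's calculate_cost):

```python
def cost_usd(prompt_tokens: int, completion_tokens: int,
             prompt_cost: float, completion_cost: float) -> float:
    """Per-1M-token pricing, mirroring the formula above."""
    return (prompt_tokens / 1_000_000 * prompt_cost
            + completion_tokens / 1_000_000 * completion_cost)

print(f"${cost_usd(1000, 500, 2.50, 10.00):.4f}")  # $0.0075
```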
Embedding Cost¶
from selectools.pricing import calculate_embedding_cost
cost = calculate_embedding_cost(
    model="text-embedding-3-small",
    tokens=1000,
)
# Formula:
# cost = tokens / 1M * prompt_cost
# cost = 1000 / 1M * 0.02 = $0.00002
Cost Warnings¶
config = AgentConfig(
    cost_warning_threshold=0.10  # Warn at $0.10
)
agent = Agent(tools=[...], provider=provider, config=config)
# When threshold exceeded:
# ⚠️ Cost Warning: Total cost $0.125000 exceeds threshold $0.100000
Implementation¶
Adding Usage Stats¶
class Agent:
    def __init__(self, ...):
        self.usage = AgentUsage()

    def _call_provider(self, ...):
        # ... call provider ...

        # Track usage
        self.usage.add_usage(usage_stats, tool_name=None)

        # Check warning threshold
        if (
            self.config.cost_warning_threshold
            and self.usage.total_cost_usd > self.config.cost_warning_threshold
        ):
            print(
                f"⚠️ Cost Warning: Total cost ${self.usage.total_cost_usd:.6f} "
                f"exceeds threshold ${self.config.cost_warning_threshold:.6f}"
            )
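For unit testing, the threshold check lends itself to a pure helper; a sketch under the same message format as above (check_cost_warning is a hypothetical name, not part of the library):

```python
from typing import Optional

def check_cost_warning(total_cost_usd: float, threshold: Optional[float]) -> Optional[str]:
    """Return the warning message when the threshold is crossed, else None."""
    if threshold is not None and total_cost_usd > threshold:
        return (f"⚠️ Cost Warning: Total cost ${total_cost_usd:.6f} "
                f"exceeds threshold ${threshold:.6f}")
    return None

print(check_cost_warning(0.125, 0.10))
# ⚠️ Cost Warning: Total cost $0.125000 exceeds threshold $0.100000
```

Keeping the check pure makes the "Cost Warning" behavior testable without patching print.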
Recording Tool Calls¶
def run(self, messages):
    # ... agent loop ...

    # Before tool execution
    start_time = time.time()
    try:
        result = self._execute_tool_with_timeout(tool, parameters)
        duration = time.time() - start_time

        # Track analytics
        if self.analytics:
            self.analytics.record_tool_call(
                tool_name=tool.name,
                success=True,
                duration=duration,
                params=parameters,
                cost=0.0,
                # chunk_counter is maintained by the streaming execution path (elided here)
                chunk_count=chunk_counter["count"],
            )
    except Exception as exc:
        duration = time.time() - start_time

        # Track failure
        if self.analytics:
            self.analytics.record_tool_call(
                tool_name=tool.name,
                success=False,
                duration=duration,
                params=parameters,
                cost=0.0,
            )
Best Practices¶
1. Monitor Costs in Production¶
config = AgentConfig(
    cost_warning_threshold=1.00,  # $1 threshold
    verbose=True,
)
agent = Agent(..., config=config)

# Check costs periodically
if agent.total_cost > 10.00:
    alert_ops_team(f"High agent costs: ${agent.total_cost}")
2. Log Usage Metrics¶
response = agent.run([...])
# Log to monitoring system
metrics.gauge("agent.tokens", agent.total_tokens)
metrics.gauge("agent.cost_usd", agent.total_cost)
metrics.histogram("agent.iterations", len(agent.usage.iterations))
3. Analyze Tool Performance¶
config = AgentConfig(enable_analytics=True)
agent = Agent(..., config=config)
# After running
analytics = agent.get_analytics()
for tool_name, metrics in analytics.get_all_metrics().items():
    failure_rate = 100.0 - metrics.success_rate  # derived from the success_rate property
    if failure_rate > 10.0:
        logger.warning(f"Tool {tool_name} has high failure rate: {failure_rate:.1f}%")
    if metrics.avg_duration > 5.0:
        logger.warning(f"Tool {tool_name} is slow: {metrics.avg_duration:.2f}s")
4. Export for Analysis¶
# Daily analytics export
from datetime import datetime

date = datetime.now().strftime("%Y-%m-%d")
analytics.to_json(f"analytics/{date}.json")
analytics.to_csv(f"analytics/{date}.csv")
5. Reset Between Sessions¶
# For chatbots, reset per user session
def new_session(user_id):
    agent.reset_usage()
    if agent.analytics:
        agent.analytics.reset()
Testing¶
from unittest.mock import patch

def test_usage_tracking():
    agent = Agent(tools=[...], provider=LocalProvider())
    agent.run([Message(role=Role.USER, content="Test")])
    assert agent.total_tokens > 0
    assert len(agent.usage.iterations) > 0

def test_cost_warning():
    captured = []

    def capture_print(*args):
        captured.append(" ".join(str(a) for a in args))

    with patch("builtins.print", capture_print):
        config = AgentConfig(cost_warning_threshold=0.001)
        agent = Agent(..., config=config)
        agent.run([...])
    assert any("Cost Warning" in msg for msg in captured)

def test_analytics():
    config = AgentConfig(enable_analytics=True)
    agent = Agent(..., config=config)
    agent.run([...])
    analytics = agent.get_analytics()
    assert analytics is not None
    assert len(analytics.get_all_metrics()) > 0
Cache-Aware Usage Tracking¶
When response caching is enabled via AgentConfig(cache=...), usage tracking remains accurate even for cached responses.
How It Works¶
- Cache miss: the provider is called normally and UsageStats is tracked as usual
- Cache hit: the stored UsageStats from the original call is replayed via agent.usage.add_usage()
This means agent.total_cost and agent.total_tokens reflect the logical usage (what it would have cost), not just the actual API calls.
Cache Stats¶
The cache itself tracks hit/miss/eviction metrics:
from selectools import InMemoryCache
cache = InMemoryCache(max_size=500, default_ttl=600)
config = AgentConfig(cache=cache)
agent = Agent(tools=[...], provider=provider, config=config)
# Run some queries...
agent.run([Message(role=Role.USER, content="Hello")])
agent.reset()
agent.run([Message(role=Role.USER, content="Hello")]) # cache hit
# Cache performance
stats = cache.stats
print(f"Hit rate: {stats.hit_rate:.1%}") # 50.0%
print(f"Hits: {stats.hits}") # 1
print(f"Misses: {stats.misses}") # 1
print(f"Evictions: {stats.evictions}") # 0
CacheStats Dataclass¶
@dataclass
class CacheStats:
    hits: int = 0
    misses: int = 0
    evictions: int = 0

    @property
    def total_requests(self) -> int:
        return self.hits + self.misses

    @property
    def hit_rate(self) -> float:
        total = self.total_requests
        return self.hits / total if total > 0 else 0.0
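Since the dataclass is pure arithmetic, its properties can be verified directly; a self-contained copy of the definition above, exercised with the one-hit/one-miss scenario from the earlier example:

```python
from dataclasses import dataclass

@dataclass
class CacheStats:
    hits: int = 0
    misses: int = 0
    evictions: int = 0

    @property
    def total_requests(self) -> int:
        return self.hits + self.misses

    @property
    def hit_rate(self) -> float:
        # Fraction of requests served from cache; 0.0 before any request
        total = self.total_requests
        return self.hits / total if total > 0 else 0.0

s = CacheStats(hits=1, misses=1)
print(s.total_requests, f"{s.hit_rate:.1%}")  # 2 50.0%
```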
Monitoring Cache + Usage Together¶
# Agent usage (logical)
print(f"Tokens used: {agent.total_tokens:,}")
print(f"Cost: ${agent.total_cost:.6f}")
# Cache efficiency
print(f"Cache hit rate: {cache.stats.hit_rate:.1%}")
print(f"API calls saved: {cache.stats.hits}")
# Cost savings estimate
avg_cost_per_call = agent.total_cost / cache.stats.total_requests
savings = cache.stats.hits * avg_cost_per_call
print(f"Estimated savings: ${savings:.6f}")
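The savings estimate above can be wrapped in a small helper that also guards the empty-cache case (estimated_savings is a hypothetical helper, not a library function; it assumes logical usage accounting as described earlier, so the average is taken over all requests including cache hits):

```python
def estimated_savings(total_cost_usd: float, hits: int, total_requests: int) -> float:
    """hits * average cost per logical request; 0.0 when nothing was requested."""
    if total_requests == 0:
        return 0.0
    return hits * (total_cost_usd / total_requests)

print(f"${estimated_savings(0.02, 1, 2):.6f}")  # $0.010000
```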
RAG Usage Tracking¶
When using RAG, both LLM and embedding costs are tracked:
from selectools.rag import RAGAgent, VectorStore
from selectools.embeddings import OpenAIEmbeddingProvider
embedder = OpenAIEmbeddingProvider()
store = VectorStore.create("memory", embedder=embedder)
agent = RAGAgent.from_directory("./docs", provider, store)
response = agent.run("What are the features?")
# Usage includes both LLM and embeddings
print(agent.usage)
# Output:
# ============================================================
# 📊 Usage Summary
# ============================================================
# Total Tokens: 5,432
# - Prompt: 3,210
# - Completion: 1,200
# - Embeddings: 1,022
# Total Cost: $0.002150
# - LLM: $0.002000
# - Embeddings: $0.000150
# ============================================================
Further Reading¶
- Agent Module - How usage is tracked
- Agent Module - Caching - Response caching details
- Providers Module - Usage stat extraction
- Models Module - Pricing information
- RAG Module - RAG usage tracking
Next Steps: Learn about the RAG system in the RAG Module.