
Serve Module

Added in: v0.19.0
Package: src/selectools/serve/
Classes: AgentRouter, AgentServer
Functions: create_app()

Table of Contents

  1. Overview
  2. Quick Start
  3. CLI Commands
  4. Endpoints
  5. Streaming (SSE)
  6. Playground UI
  7. Python API
  8. FastAPI Integration
  9. Flask Integration
  10. Configuration Options
  11. Request / Response Models
  12. API Reference
  13. Examples

Overview

The serve module turns any selectools Agent into an HTTP API with one command. No framework boilerplate, no server code, no Docker -- just selectools serve agent.yaml and you have a live endpoint with streaming, a health check, tool schema introspection, and an interactive playground UI.

Why Serve?

|               | selectools serve                    | Manual FastAPI setup       |
| ------------- | ----------------------------------- | -------------------------- |
| Lines of code | 1 CLI command or 3 lines of Python  | 40+ lines minimum          |
| Dependencies  | Zero (stdlib http.server)           | fastapi, uvicorn, pydantic |
| Streaming     | SSE built-in                        | Manual SSE wiring          |
| Playground    | Built-in chat UI at /playground     | Build your own             |
| Schema        | Auto-generated from tools           | Manual OpenAPI spec        |

Design Philosophy

  • Zero dependencies. The built-in server uses Python's stdlib http.server. No FastAPI, no Flask, no uvicorn required.
  • Production-ready integrations. When you outgrow the built-in server, AgentRouter drops into FastAPI or Flask with 3 lines of code.
  • Config-driven. Load agents from YAML files or built-in templates. No Python code required for common configurations.

Quick Start

One Command

# Serve from a YAML config
selectools serve agent.yaml

# Serve a built-in template
selectools serve customer_support

# Customize host and port
selectools serve agent.yaml --port 3000 --host 127.0.0.1

# Disable the playground UI
selectools serve agent.yaml --no-playground

Three Lines of Python

from selectools.serve import create_app

app = create_app(agent, playground=True)
app.serve(port=8000)

The server prints its endpoints on startup:

Selectools agent serving at http://0.0.0.0:8000
  POST /invoke   -- single prompt
  POST /stream   -- SSE streaming
  GET  /health   -- health check
  GET  /schema   -- tool schemas
  GET  /playground -- chat UI

Press Ctrl+C to stop.

CLI Commands

selectools serve

Start an agent HTTP server from a YAML config file or template name.

selectools serve <config> [--port PORT] [--host HOST] [--no-playground]
| Argument        | Default    | Description                                                                            |
| --------------- | ---------- | -------------------------------------------------------------------------------------- |
| config          | (required) | Path to a YAML config file, or a template name (customer_support, data_analyst, etc.). |
| --port          | 8000       | Port number.                                                                           |
| --host          | 0.0.0.0    | Bind address. Use 127.0.0.1 for local-only access.                                     |
| --no-playground | off        | Disable the playground chat UI.                                                        |

When config is a template name (e.g. customer_support), the CLI auto-detects an API key from environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY, or GOOGLE_API_KEY) and creates the provider automatically.

selectools doctor

Diagnose API keys, optional dependencies, and provider connectivity.

selectools doctor

Output:

Selectools Doctor
========================================
Version: 0.19.0
Python: 3.12.0

API Keys:
  OPENAI_API_KEY: OK
  ANTHROPIC_API_KEY: MISSING
  GOOGLE_API_KEY: MISSING
  GEMINI_API_KEY: MISSING

Optional Dependencies:
  fastapi: OK (FastAPI serving)
  flask: not installed (Flask serving)
  redis: OK (Redis cache/sessions)
  chromadb: not installed (Chroma vector store)
  ...

Provider Connectivity:
  OpenAI: OK (connected)
  Anthropic: skipped (no key)
  Gemini: skipped (no key)

Diagnosis complete.

Endpoints

POST /invoke

Send a single prompt and receive a JSON response.

Request:

{
  "prompt": "What is the capital of France?"
}

Response:

{
  "content": "The capital of France is Paris.",
  "tool_calls": [],
  "reasoning": null,
  "iterations": 1,
  "tokens": 42,
  "cost_usd": 0.00012,
  "run_id": "run-abc123"
}

cURL example:

curl -X POST http://localhost:8000/invoke \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is the capital of France?"}'

POST /stream

Send a prompt and receive an SSE (Server-Sent Events) stream. Each event is a JSON object with a type field.

Request: Same as /invoke.

Response stream:

data: {"type": "chunk", "content": "The capital"}
data: {"type": "chunk", "content": " of France"}
data: {"type": "chunk", "content": " is Paris."}
data: {"type": "result", "content": "The capital of France is Paris.", "iterations": 1}
data: [DONE]

GET /health

Health check endpoint. Returns agent status, version, model, provider, and available tools.

Response:

{
  "status": "ok",
  "version": "0.19.0",
  "model": "gpt-4o",
  "provider": "openai",
  "tools": ["read_file", "write_file", "web_search"]
}

GET /schema

Returns JSON schemas for all tools registered with the agent.

Response:

{
  "model": "gpt-4o",
  "tools": [
    {
      "name": "read_file",
      "description": "Read a file from disk",
      "parameters": {
        "type": "object",
        "properties": {
          "path": {"type": "string", "description": "File path to read"}
        },
        "required": ["path"]
      }
    }
  ]
}
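A client can use the /schema payload to sanity-check tool arguments before calling /invoke. A minimal sketch under that assumption -- missing_required is a hypothetical helper for illustration, not part of selectools:

```python
def missing_required(tool_schema: dict, args: dict) -> list:
    """Return the names of required parameters absent from args."""
    required = tool_schema.get("parameters", {}).get("required", [])
    return [name for name in required if name not in args]

# Using the read_file schema from the /schema example above:
schema = {
    "name": "read_file",
    "description": "Read a file from disk",
    "parameters": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

print(missing_required(schema, {}))                  # ['path']
print(missing_required(schema, {"path": "a.txt"}))   # []
```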

GET /playground

Interactive chat UI served as a single HTML page. See Playground UI below.

GET /

Redirects to /playground when the playground is enabled.


Streaming (SSE)

The /stream endpoint uses Server-Sent Events for real-time token streaming. The agent's astream() method powers this -- each token chunk is forwarded as an SSE event.

Event Types

| Type   | Description                                                                 |
| ------ | --------------------------------------------------------------------------- |
| chunk  | A text fragment from the LLM. Concatenate all chunks for the full response. |
| result | Final result with content and iteration count. Sent once at the end.        |
| [DONE] | Stream termination signal.                                                  |

JavaScript Client

const response = await fetch("http://localhost:8000/stream", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ prompt: "Explain quantum computing" }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const text = decoder.decode(value);
  for (const line of text.split("\n")) {
    if (line.startsWith("data: ") && line !== "data: [DONE]") {
      const event = JSON.parse(line.slice(6));
      if (event.type === "chunk") {
        process.stdout.write(event.content);
      }
    }
  }
}
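The same parsing logic in Python, sketched against the event format documented above. parse_sse_events is a hypothetical helper written for this example, not part of selectools; it assumes each SSE line carries one JSON payload as shown:

```python
import json

def parse_sse_events(lines):
    """Yield parsed event dicts from SSE lines; stop at the [DONE] sentinel."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and comments between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return
        yield json.loads(payload)

# Example against the stream shown earlier:
sample = [
    'data: {"type": "chunk", "content": "The capital"}',
    'data: {"type": "chunk", "content": " of France is Paris."}',
    'data: {"type": "result", "content": "The capital of France is Paris.", "iterations": 1}',
    "data: [DONE]",
]
text = "".join(e["content"] for e in parse_sse_events(sample) if e["type"] == "chunk")
print(text)  # The capital of France is Paris.
```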

Playground UI

When enabled (default), the server serves an interactive chat interface at /playground. The playground is a single self-contained HTML page with no external dependencies.

Features

  • Real-time streaming responses via SSE
  • Conversation history within the session
  • Tool call visibility (shows which tools the agent invoked)
  • Model and provider info displayed in the header
  • Works in any modern browser

The playground is intended for development and testing. For production UIs, build a custom frontend against the /invoke and /stream endpoints.

Disabling

# CLI
selectools serve agent.yaml --no-playground

# Python
app = create_app(agent, playground=False)

Python API

AgentRouter

The AgentRouter class handles request routing and is the core building block for all integrations. It works standalone or embedded in any WSGI/ASGI framework.

from selectools.serve import AgentRouter

router = AgentRouter(agent, prefix="/api/v1", enable_playground=True)

# Use handler methods directly
result = router.handle_invoke({"prompt": "Hello"})
health = router.handle_health()
schema = router.handle_schema()

create_app()

Create a standalone HTTP server with zero dependencies:

from selectools.serve import create_app

app = create_app(
    agent,
    prefix="",           # URL prefix for all endpoints
    playground=True,      # Enable /playground UI
    host="0.0.0.0",      # Bind address
    port=8000,            # Port number
)

app.serve()  # Blocking -- starts the server

FastAPI Integration

Drop AgentRouter into a FastAPI application for production deployments:

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse, StreamingResponse
from selectools.serve import AgentRouter

app = FastAPI()
router = AgentRouter(agent)

@app.post("/invoke")
async def invoke(request: Request):
    body = await request.json()
    return JSONResponse(router.handle_invoke(body))

@app.post("/stream")
async def stream(request: Request):
    body = await request.json()
    return StreamingResponse(
        router.handle_stream(body),
        media_type="text/event-stream",
    )

@app.get("/health")
async def health():
    return JSONResponse(router.handle_health())

Run with uvicorn for production-grade performance:

uvicorn app:app --host 0.0.0.0 --port 8000 --workers 4

Flask Integration

from flask import Flask, request, jsonify, Response
from selectools.serve import AgentRouter

app = Flask(__name__)
router = AgentRouter(agent)

@app.route("/invoke", methods=["POST"])
def invoke():
    return jsonify(router.handle_invoke(request.json))

@app.route("/stream", methods=["POST"])
def stream():
    return Response(
        router.handle_stream(request.json),
        content_type="text/event-stream",
    )

@app.route("/health")
def health():
    return jsonify(router.handle_health())

Configuration Options

YAML Config File

The recommended way to configure a served agent. See the Templates Module for full YAML reference.

provider: openai
model: gpt-4o
system_prompt: "You are a helpful coding assistant."
tools:
  - selectools.toolbox.file_tools.read_file
  - selectools.toolbox.file_tools.write_file
  - ./my_custom_tool.py
budget:
  max_cost_usd: 1.00
retry:
  max_retries: 3

Environment Variables

The CLI auto-detects providers from environment variables:

| Variable                        | Provider               |
| ------------------------------- | ---------------------- |
| OPENAI_API_KEY                  | OpenAI (checked first) |
| ANTHROPIC_API_KEY               | Anthropic              |
| GOOGLE_API_KEY / GEMINI_API_KEY | Gemini                 |
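The precedence above can be sketched as follows. detect_provider is a hypothetical illustration of the documented order, not the library's actual implementation:

```python
import os

def detect_provider(env=None):
    """Pick a provider by documented precedence: OpenAI, then Anthropic, then Gemini."""
    env = env if env is not None else os.environ
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if env.get("GOOGLE_API_KEY") or env.get("GEMINI_API_KEY"):
        return "gemini"
    return None

print(detect_provider({"ANTHROPIC_API_KEY": "sk-ant-..."}))            # anthropic
print(detect_provider({"GEMINI_API_KEY": "x", "OPENAI_API_KEY": "y"}))  # openai
```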

Request / Response Models

File: src/selectools/serve/models.py

InvokeRequest

| Field            | Type                     | Description                             |
| ---------------- | ------------------------ | --------------------------------------- |
| prompt           | str                      | The user prompt.                        |
| config_overrides | Optional[Dict[str, Any]] | Override agent config for this request. |

InvokeResponse

| Field      | Type          | Description                                        |
| ---------- | ------------- | -------------------------------------------------- |
| content    | str           | Agent response text.                               |
| tool_calls | List[Dict]    | Tools invoked during execution.                    |
| reasoning  | Optional[str] | Reasoning trace (when using CoT/ReAct strategies). |
| iterations | int           | Number of agent loop iterations.                   |
| tokens     | int           | Total tokens consumed.                             |
| cost_usd   | float         | Estimated cost in USD.                             |
| run_id     | str           | Unique run identifier for trace lookup.            |

HealthResponse

| Field    | Type      | Description                |
| -------- | --------- | -------------------------- |
| status   | str       | Always "ok" when healthy.  |
| version  | str       | Selectools version.        |
| model    | str       | Active model name.         |
| provider | str       | Active provider name.      |
| tools    | List[str] | Names of registered tools. |

API Reference

AgentRouter.__init__()

| Parameter         | Type  | Default    | Description                                    |
| ----------------- | ----- | ---------- | ---------------------------------------------- |
| agent             | Agent | (required) | The agent to serve.                            |
| prefix            | str   | ""         | URL prefix for all endpoints (e.g. "/api/v1"). |
| enable_playground | bool  | True       | Enable the /playground chat UI.                |

AgentRouter Methods

| Method              | Description                                                   |
| ------------------- | ------------------------------------------------------------- |
| handle_invoke(body) | Process a POST /invoke request. Returns a response dict.      |
| handle_stream(body) | Process a POST /stream request. Yields SSE-formatted strings. |
| handle_health()     | Process a GET /health request. Returns a health dict.         |
| handle_schema()     | Process a GET /schema request. Returns a tool-schemas dict.   |

create_app()

| Parameter  | Type  | Default    | Description                     |
| ---------- | ----- | ---------- | ------------------------------- |
| agent      | Agent | (required) | The agent to serve.             |
| prefix     | str   | ""         | URL prefix for all endpoints.   |
| playground | bool  | True       | Enable the /playground chat UI. |
| host       | str   | "0.0.0.0"  | Bind address.                   |
| port       | int   | 8000       | Port number.                    |

Returns an AgentServer instance. Call .serve() to start (blocking).

AgentServer Methods

| Method           | Description                                               |
| ---------------- | --------------------------------------------------------- |
| serve(port=None) | Start the HTTP server. Blocking. Uses stdlib http.server. |

Examples

| Example | File                | Description                             |
| ------- | ------------------- | --------------------------------------- |
| 62      | 62_serve_agent.py   | Serve an agent with the built-in server |
| 63      | 63_serve_fastapi.py | Embed AgentRouter in FastAPI            |

Further Reading


Next Steps: Learn about YAML configuration and pre-built templates in the Templates Module.