# Manifest: Automatic LLM Routing to Reduce AI Agent Costs
Manifest is a self-hosted proxy that sits between an AI agent and LLM providers. When a request arrives, it runs a deterministic scoring algorithm against 23 dimensions of the prompt and routes the request to the cheapest model capable of handling that complexity level. The routing decision runs locally in under 2 milliseconds and does not call another LLM, so it adds no meaningful latency.
The integration change is minimal: point your existing OpenAI client at http://localhost:2099/v1 and use manifest/auto as the model name. No agent logic changes are required.
## Why model routing matters
AI agents make far more low-complexity calls than high-complexity ones. Classification, intent routing, summarization, and entity extraction do not require a top-tier model, but in a single-model setup every call pays the premium price regardless of difficulty.
A single user interaction can trigger dozens of internal LLM calls. At scale, routing all of them to an expensive model produces bills that are significantly higher than necessary for the actual complexity being processed.
## How the routing algorithm works
Manifest scores each incoming request across 23 dimensions including prompt length, presence of code blocks or mathematical reasoning, context window requirements, tool use intent, output format (such as JSON schema or streaming), vision inputs, and the historical difficulty of similar prompts.
Based on this score, the request is assigned to a complexity tier (simple, standard, complex, reasoning, or coding). Each tier has a configured primary model and fallbacks. Manifest forwards the request to the primary model for that tier. If it fails, it retries transparently against the fallback list.
The algorithm is deterministic and runs entirely locally.
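The exact scoring formula is not published, so as an illustration only, a deterministic tier assignment over a few of the named dimensions might look like the sketch below. The tier names come from the article; every threshold, weight, and keyword here is invented for the example.

```python
# Illustrative sketch only: Manifest's real 23-dimension scorer is internal.
# Tier names match the article; all thresholds and weights are invented.

def score_request(prompt: str, wants_json: bool = False, has_image: bool = False) -> dict:
    """Score a prompt on a few example dimensions and map it to a tier."""
    score = 0
    score += min(len(prompt) // 500, 5)        # prompt length
    has_code = "```" in prompt                 # presence of code blocks
    if has_code:
        score += 3
    if any(k in prompt.lower() for k in ("prove", "derive", "step by step")):
        score += 3                             # mathematical reasoning signals
    if wants_json:
        score += 1                             # structured-output constraint
    if has_image:
        score += 2                             # vision input

    if has_code:
        tier = "coding"
    elif score >= 6:
        tier = "reasoning"
    elif score >= 3:
        tier = "complex"
    elif score >= 1:
        tier = "standard"
    else:
        tier = "simple"
    return {"score": score, "tier": tier}
```

Because the function is pure and local, the same prompt always maps to the same tier, which is what makes the routing decision fast and auditable.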
## Installation
Manifest runs as a set of Docker containers. A one-line installer downloads the docker-compose.yml and environment files:
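Assuming the compose file and environment files have already been downloaded into the current directory (the installer URL itself is omitted here), startup reduces to standard Docker Compose commands:

```shell
# Run from the directory containing the downloaded docker-compose.yml
docker compose pull      # fetch the Manifest and Postgres images
docker compose up -d     # start the containers in the background

# Check that the proxy is listening before opening the dashboard
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:2099
```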
Docker pulls the Manifest and Postgres images and starts the containers. After startup, the dashboard is available at http://localhost:2099.
## Initial configuration
Create an admin account on first login at http://localhost:2099, then connect LLM providers under Connect providers. Select the API Keys tab for cloud providers like OpenAI and Anthropic, or the Local tab for Ollama instances. Toggle each provider and enter the relevant API key. Ollama requires no key.
## Connecting an agent
From the dashboard, click + Connect Agent, name the agent, and select the SDK type (OpenAI in this example). Manifest generates a base URL and an API key specific to that agent.
The base URL is http://localhost:2099/v1. The API key is generated by Manifest and is separate from any provider API key.
## Python integration
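A minimal sketch using the official openai Python SDK, assuming the agent key generated in the previous step is stored in a MANIFEST_API_KEY environment variable (the variable name is a convention chosen for this example):

```python
import os

from openai import OpenAI

# Point the standard OpenAI client at the local Manifest proxy.
# The API key is the agent key generated by the Manifest dashboard,
# not a provider key.
client = OpenAI(
    base_url="http://localhost:2099/v1",
    api_key=os.environ["MANIFEST_API_KEY"],
)

response = client.chat.completions.create(
    model="manifest/auto",  # let Manifest pick the tier and model
    messages=[
        {"role": "user", "content": "Classify this ticket: 'My invoice is wrong.'"}
    ],
)
print(response.choices[0].message.content)
```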
The response arrives exactly as it would from a direct OpenAI call: Manifest intercepts the request, scores it, and routes it to the most cost-effective model for the task.
## Dashboard and observability
The dashboard overview shows message count, total cost, total tokens, and a savings metric comparing actual spend to what the same requests would have cost against a default expensive model.
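The savings metric is presumably computed by repricing the same traffic against the fixed baseline model; as a back-of-envelope illustration with made-up per-token prices:

```python
# Back-of-envelope savings estimate; both prices are illustrative, not real rates.
BASELINE_PRICE = 10.00 / 1_000_000   # $/token for the default expensive model
ROUTED_PRICE = 0.50 / 1_000_000      # $/token for the cheaper model actually chosen

tokens = 2_000_000                   # tokens pushed through the proxy
baseline_cost = tokens * BASELINE_PRICE   # what a single-model setup would have paid
actual_cost = tokens * ROUTED_PRICE       # what the routed traffic actually cost
savings = baseline_cost - actual_cost
print(f"saved ${savings:.2f} ({savings / baseline_cost:.0%})")
```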
The Messages tab logs every request with the model selected, cost per call, input and output token counts, latency, and status. This makes it straightforward to audit routing decisions and understand where spend is going.
## Comparison with similar tools
| Feature | Manifest | OpenRouter | LiteLLM |
|---|---|---|---|
| Primary goal | Cost optimization | Unified access | Unified interface |
| Routing | Automatic and intelligent | Manual (you specify the model) | Manual (you write the logic) |
| Hosting | Self-hosted and cloud | Cloud only | Self-hosted |
| Privacy | High (prompts stay local) | Lower (prompts go to their cloud) | High (prompts stay local) |
| Key feature | 23-dimension scoring | Broadest model access | Standardized API format |
OpenRouter solves the problem of accessing many models through one endpoint. LiteLLM standardizes the API format across providers. Manifest solves automatic model selection, which neither of the others addresses. The tools serve different problems and can be complementary.
## Notable features and current limitations
Subscription routing lets Manifest route calls to existing ChatGPT Plus or Claude Pro subscriptions when appropriate, avoiding per-token API costs for those calls.
Automatic fallbacks retry failed requests against the next model in the tier's fallback list, improving reliability without changes to the agent code.
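Manifest's actual retry logic is internal; conceptually, though, the fallback behavior amounts to walking the tier's model list until one call succeeds. In this sketch the tier-to-model table and the failure condition are invented:

```python
# Conceptual sketch of tier-based fallback, not Manifest's actual code.
# The tier -> model table below is an invented example configuration.

TIER_MODELS = {
    "simple": ["gpt-4o-mini", "claude-3-haiku", "llama3:8b"],
}

class AllModelsFailed(Exception):
    """Raised when the primary model and every fallback have failed."""

def complete_with_fallback(tier: str, prompt: str, call_model) -> str:
    """Try the tier's primary model, then each fallback in order."""
    last_error = None
    for model in TIER_MODELS[tier]:
        try:
            return call_model(model, prompt)   # forward to the provider
        except Exception as err:               # provider error or timeout
            last_error = err                   # remember it, try the next model
    raise AllModelsFailed(f"every model in tier '{tier}' failed") from last_error
```

The agent only ever sees the final response; the retries happen inside the proxy, which is why no agent-side changes are needed.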
Self-hosting means prompts and data never leave the local infrastructure for routing purposes, which matters for applications with data privacy requirements.
The scoring algorithm is not fully transparent by default, so some initial trust in its decisions is required until enough traffic has accumulated to validate them through the dashboard. Tier overrides are available for cases where the default routing is wrong.
As a newer project, the ecosystem is still growing. First-party SDKs for languages beyond Python are limited, and integrations with logging and storage systems are fewer than in more mature tools.
## Final thoughts
Manifest is most valuable when an agent runs at a scale where the cost difference between routing simple calls to cheap models versus expensive ones is significant. For small-scale or infrequent use, the setup overhead is not justified.
For agents making hundreds or thousands of calls daily, the combination of automatic routing, fallback reliability, and per-call observability makes it worth the infrastructure cost of running the Docker containers.
The source code and further documentation are available at github.com/mnfst/manifest.