# Manifest: Automatic LLM Routing to Reduce AI Agent Costs
Manifest is a self-hosted proxy that sits between an AI agent and LLM providers. When a request arrives, it runs a deterministic scoring algorithm against 23 dimensions of the prompt and routes the request to the cheapest model capable of handling that complexity level. The routing decision runs locally in under 2 milliseconds and does not call another LLM, so it adds no meaningful latency.
The integration change is minimal: point your existing OpenAI client at http://localhost:2099/v1 and use manifest/auto as the model name. No agent logic changes are required.
## Why model routing matters
AI agents make far more low-complexity calls than high-complexity ones. Classification, intent routing, summarization, and entity extraction do not require a top-tier model, but in a single-model setup every call pays the premium price regardless of difficulty.
A single user interaction can trigger dozens of internal LLM calls. At scale, routing all of them to an expensive model produces bills that are significantly higher than necessary for the actual complexity being processed.
## How the routing algorithm works
Manifest scores each incoming request across 23 dimensions including prompt length, presence of code blocks or mathematical reasoning, context window requirements, tool use intent, output format (such as JSON schema or streaming), vision inputs, and the historical difficulty of similar prompts.
Based on this score, the request is assigned to a complexity tier (simple, standard, complex, reasoning, or coding). Each tier has a configured primary model and fallbacks. Manifest forwards the request to the primary model for that tier. If it fails, it retries transparently against the fallback list.
The algorithm is deterministic and runs entirely locally.
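The exact scoring formula is not published, so as an illustration only, a deterministic tier assignment over a few of the named dimensions might look like the sketch below. The tier names come from the article; every threshold, weight, and keyword here is invented for the example.

```python
# Illustrative sketch only: Manifest's real 23-dimension scorer is internal.
# Tier names match the article; all thresholds and weights are invented.

def score_request(prompt: str, wants_json: bool = False, has_image: bool = False) -> dict:
    """Score a prompt on a few example dimensions and map it to a tier."""
    score = 0
    score += min(len(prompt) // 500, 5)        # prompt length
    has_code = "```" in prompt                 # presence of code blocks
    if has_code:
        score += 3
    if any(k in prompt.lower() for k in ("prove", "derive", "step by step")):
        score += 3                             # mathematical reasoning signals
    if wants_json:
        score += 1                             # structured-output constraint
    if has_image:
        score += 2                             # vision input

    if has_code:
        tier = "coding"
    elif score >= 6:
        tier = "reasoning"
    elif score >= 3:
        tier = "complex"
    elif score >= 1:
        tier = "standard"
    else:
        tier = "simple"
    return {"score": score, "tier": tier}
```

Because the function is pure and local, the same prompt always maps to the same tier, which is what makes the routing decision fast and auditable.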
## Installation
Manifest runs as a set of Docker containers. A one-line installer downloads the docker-compose.yml and environment files:
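Assuming the compose file and environment files have already been downloaded into the current directory (the installer URL itself is omitted here), startup reduces to standard Docker Compose commands:

```shell
# Run from the directory containing the downloaded docker-compose.yml
docker compose pull      # fetch the Manifest and Postgres images
docker compose up -d     # start the containers in the background

# Check that the proxy is listening before opening the dashboard
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:2099
```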
Docker pulls the Manifest and Postgres images and starts the containers. After startup, the dashboard is available at http://localhost:2099.
## Initial configuration
Create an admin account on first login at http://localhost:2099, then connect LLM providers under Connect providers. Select the API Keys tab for cloud providers like OpenAI and Anthropic, or the Local tab for Ollama instances. Toggle each provider and enter the relevant API key. Ollama requires no key.
## Connecting an agent
From the dashboard, click + Connect Agent, name the agent, and select the SDK type (OpenAI in this example). Manifest generates a base URL and an API key specific to that agent.
The base URL is http://localhost:2099/v1. The API key is generated by Manifest and is separate from any provider API key.
## Python integration
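A minimal sketch using the official openai Python SDK, assuming the agent key generated in the previous step is stored in a MANIFEST_API_KEY environment variable (the variable name is a convention chosen for this example):

```python
import os

from openai import OpenAI

# Point the standard OpenAI client at the local Manifest proxy.
# The API key is the agent key generated by the Manifest dashboard,
# not a provider key.
client = OpenAI(
    base_url="http://localhost:2099/v1",
    api_key=os.environ["MANIFEST_API_KEY"],
)

response = client.chat.completions.create(
    model="manifest/auto",  # let Manifest pick the tier and model
    messages=[
        {"role": "user", "content": "Classify this ticket: 'My invoice is wrong.'"}
    ],
)
print(response.choices[0].message.content)
```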
The response arrives exactly as it would from a direct OpenAI call: Manifest intercepts the request, scores it, and routes it to the most cost-effective model for the task.
## Dashboard and observability
The dashboard overview shows message count, total cost, total tokens, and a savings metric comparing actual spend to what the same requests would have cost against a default expensive model.
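The savings metric is presumably computed by repricing the same traffic against the fixed baseline model; as a back-of-envelope illustration with made-up per-token prices:

```python
# Back-of-envelope savings estimate; both prices are illustrative, not real rates.
BASELINE_PRICE = 10.00 / 1_000_000   # $/token for the default expensive model
ROUTED_PRICE = 0.50 / 1_000_000      # $/token for the cheaper model actually chosen

tokens = 2_000_000                   # tokens pushed through the proxy
baseline_cost = tokens * BASELINE_PRICE   # what a single-model setup would have paid
actual_cost = tokens * ROUTED_PRICE       # what the routed traffic actually cost
savings = baseline_cost - actual_cost
print(f"saved ${savings:.2f} ({savings / baseline_cost:.0%})")
```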
The Messages tab logs every request with the model selected, cost per call, input and output token counts, latency, and status. This makes it straightforward to audit routing decisions and understand where spend is going.
## Comparison with similar tools
| Feature | Manifest | OpenRouter | LiteLLM |
|---|---|---|---|
| Primary goal | Cost optimization | Unified access | Unified interface |
| Routing | Automatic and intelligent | Manual (you specify the model) | Manual (you write the logic) |
| Hosting | Self-hosted and cloud | Cloud only | Self-hosted |
| Privacy | High (prompts stay local) | Lower (prompts go to their cloud) | High (prompts stay local) |
| Key feature | 23-dimension scoring | Broadest model access | Standardized API format |
OpenRouter solves the problem of accessing many models through one endpoint. LiteLLM standardizes the API format across providers. Manifest solves automatic model selection, which neither of the others addresses. The tools serve different problems and can be complementary.
## Notable features and current limitations
Subscription routing lets Manifest route calls to existing ChatGPT Plus or Claude Pro subscriptions when appropriate, avoiding per-token API costs for those calls.
Automatic fallbacks retry failed requests against the next model in the tier's fallback list, improving reliability without changes to the agent code.
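Manifest's actual retry logic is internal; conceptually, though, the fallback behavior amounts to walking the tier's model list until one call succeeds. In this sketch the tier-to-model table and the failure condition are invented:

```python
# Conceptual sketch of tier-based fallback, not Manifest's actual code.
# The tier -> model table below is an invented example configuration.

TIER_MODELS = {
    "simple": ["gpt-4o-mini", "claude-3-haiku", "llama3:8b"],
}

class AllModelsFailed(Exception):
    """Raised when the primary model and every fallback have failed."""

def complete_with_fallback(tier: str, prompt: str, call_model) -> str:
    """Try the tier's primary model, then each fallback in order."""
    last_error = None
    for model in TIER_MODELS[tier]:
        try:
            return call_model(model, prompt)   # forward to the provider
        except Exception as err:               # provider error or timeout
            last_error = err                   # remember it, try the next model
    raise AllModelsFailed(f"every model in tier '{tier}' failed") from last_error
```

The agent only ever sees the final response; the retries happen inside the proxy, which is why no agent-side changes are needed.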
Self-hosting means prompts and data never leave the local infrastructure for routing purposes, which matters for applications with data privacy requirements.
The scoring algorithm is not fully transparent by default, so some initial trust in its decisions is required until enough traffic has accumulated to validate them through the dashboard. Tier overrides are available for cases where the default routing is wrong.
As a newer project, the ecosystem is still growing. First-party SDKs for languages beyond Python are limited, and integrations with logging and storage systems are fewer than in more mature tools.
## Final thoughts
Manifest is most valuable when an agent runs at a scale where the cost difference between routing simple calls to cheap models versus expensive ones is significant. For small-scale or infrequent use, the setup overhead is not justified.
For agents making hundreds or thousands of calls daily, the combination of automatic routing, fallback reliability, and per-call observability makes it worth the infrastructure cost of running the Docker containers.
The source code and further documentation are available at github.com/mnfst/manifest.