# Manifest: Automatic LLM Routing to Reduce AI Agent Costs

[Manifest](https://github.com/mnfst/manifest) is a self-hosted proxy that sits between an AI agent and LLM providers. When a request arrives, it runs a deterministic scoring algorithm against 23 dimensions of the prompt and routes the request to the cheapest model capable of handling that complexity level. The routing decision runs locally in under 2 milliseconds and does not call another LLM, so it adds no meaningful latency.

The integration change is minimal: point your existing OpenAI client at `http://localhost:2099/v1` and use `manifest/auto` as the model name. No agent logic changes are required.

## Why model routing matters

AI agents make far more low-complexity calls than high-complexity ones. Classification, intent routing, summarization, and entity extraction do not require a top-tier model, but in a single-model setup every call pays the premium price regardless of difficulty.

![Graph showing exponential monthly spend growth when using GPT-4o/Opus versus a cheaper model like Haiku/GPT-mini for the same workload](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/9bd5e306-e32d-4d63-607b-56e0faed7e00/public =1280x720)

A single user interaction can trigger dozens of internal LLM calls. At scale, routing all of them to an expensive model produces bills that are significantly higher than necessary for the actual complexity being processed.
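To make the gap concrete, here is a back-of-the-envelope calculation. The prices and traffic numbers below are illustrative figures chosen for the example, not current rates for any specific model:

```python
# Illustrative prices in USD per 1K input tokens -- example figures only.
PRICE_PER_1K = {"premium": 0.0025, "budget": 0.00015}

calls_per_day = 10_000
tokens_per_call = 800  # average input tokens per call

def monthly_cost(price_per_1k: float) -> float:
    """Cost of a month of traffic at a flat per-1K-token price."""
    return calls_per_day * 30 * tokens_per_call / 1000 * price_per_1k

# Everything goes to the premium model:
all_premium = monthly_cost(PRICE_PER_1K["premium"])

# Suppose 80% of calls are simple enough for the budget model:
routed = 0.8 * monthly_cost(PRICE_PER_1K["budget"]) + \
         0.2 * monthly_cost(PRICE_PER_1K["premium"])

print(f"All premium: ${all_premium:,.0f}/mo, routed: ${routed:,.0f}/mo")
```

Under these assumptions the routed workload costs roughly a quarter of the single-model setup; the exact ratio depends entirely on the price spread and the share of simple calls.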

## How the routing algorithm works

Manifest scores each incoming request across 23 dimensions including prompt length, presence of code blocks or mathematical reasoning, context window requirements, tool use intent, output format (such as JSON schema or streaming), vision inputs, and the historical difficulty of similar prompts.

Based on this score, the request is assigned to a complexity tier (simple, standard, complex, reasoning, or coding). Each tier has a configured primary model and fallbacks. Manifest forwards the request to the primary model for that tier. If the primary model fails, Manifest retries transparently against the fallback list.

The algorithm is deterministic and runs entirely locally.
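Manifest's actual scoring logic is internal, but the shape of a deterministic, tier-based router can be sketched in a few lines. Everything below is illustrative: the dimensions, weights, thresholds, and model names are made up for the example and are not Manifest's real configuration:

```python
import re

def score(prompt: str) -> int:
    """Toy scorer: a few of the many possible dimensions, with made-up weights."""
    s = 0
    s += min(len(prompt) // 200, 5)                                # prompt length
    if "```" in prompt or re.search(r"\bdef |\bclass ", prompt):
        s += 4                                                     # code content
    if re.search(r"\b(prove|derive|step by step)\b", prompt, re.I):
        s += 3                                                     # reasoning intent
    if "json" in prompt.lower():
        s += 2                                                     # structured output
    return s

# (min_score, tier, primary model, fallbacks) -- illustrative names only.
TIERS = [
    (8, "complex", "big-model", ["mid-model"]),
    (4, "standard", "mid-model", ["small-model"]),
    (0, "simple", "small-model", []),
]

def route(prompt: str):
    """Map a prompt to the first tier whose threshold its score meets."""
    s = score(prompt)
    for min_score, tier, primary, fallbacks in TIERS:
        if s >= min_score:
            return tier, primary, fallbacks

print(route("Classify this email as spam or not."))
```

Because the scoring is a fixed function of the prompt, the same request always routes the same way, and no LLM call is needed to make the decision.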

![Manifest GitHub page stating "Manifest is a smart model router for agents and AI applications that redirects each query to the right model, saving up to 70% in AI costs"](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/2a2e88db-0897-4b90-1305-ee88a353e500/lg2x =1280x720)

## Installation

Manifest runs as a set of Docker containers. A one-line installer downloads the `docker-compose.yml` and environment files:

```command
bash <(curl -sSL https://raw.githubusercontent.com/mnfst/manifest/main/docker/install.sh) --dir ~/manifest-test
```

![Terminal showing the execution of the curl command initiating the Manifest self-host installer](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/55f8defb-7c3e-46dd-74ea-4ea307acb800/md2x =1280x720)

```command
cd ~/manifest-test
```

```command
docker compose up -d
```

Docker pulls the Manifest and Postgres images and starts the containers. After startup, the dashboard is available at `http://localhost:2099`.

## Initial configuration

Create an admin account on first login at `http://localhost:2099`, then connect LLM providers under **Connect providers**. Select the API Keys tab for cloud providers like OpenAI and Anthropic, or the Local tab for Ollama instances. Toggle each provider and enter the relevant API key. Ollama requires no key.

## Connecting an agent

From the dashboard, click **+ Connect Agent**, name the agent, and select the SDK type (OpenAI in this example). Manifest generates a base URL and an API key specific to that agent.

![The "Set up agent: openai" modal showing the Base URL and API Key needed to connect an application to Manifest](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/ed07ddf9-9801-42d8-0337-bd2fbe7d4100/lg1x =1280x720)

The base URL is `http://localhost:2099/v1`. The API key is generated by Manifest and is separate from any provider API key.
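Because the endpoint is OpenAI-compatible, a quick smoke test with `curl` (substituting the agent's generated key) can confirm the connection before changing any code:

```command
curl http://localhost:2099/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_MANIFEST_API_KEY" \
  -d '{
    "model": "manifest/auto",
    "messages": [{"role": "user", "content": "Say hello."}]
  }'
```

A JSON chat completion in the response confirms that the proxy is up and the agent key is valid.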

## Python integration

```python
[label main.py]
from openai import OpenAI

BASE_URL = "http://localhost:2099/v1"

# Use the API key generated by Manifest, not your OpenAI key
API_KEY = "YOUR_MANIFEST_API_KEY"

client = OpenAI(
    base_url=BASE_URL,
    api_key=API_KEY,
)

response = client.chat.completions.create(
    model="manifest/auto",  # triggers smart routing
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "Explain in a simple sentence why using different AI models for different tasks can save money."
        }
    ],
    max_completion_tokens=100
)

print(response.choices[0].message.content)
print(response.usage)
```

![Close-up of the Python code highlighting the BASE_URL and model="manifest/auto" lines which are the key changes for integration](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/20d82a32-3906-4dc7-7d45-e4ebacebb800/public =1280x720)

```command
python main.py
```

The response has the same shape as one from a direct OpenAI call. Manifest intercepted the request, scored it, and routed it to the most cost-effective model for the task.

## Dashboard and observability

The dashboard overview shows message count, total cost, total tokens, and a savings metric comparing actual spend to what the same requests would have cost against a default expensive model.

![Manifest dashboard showing the OpenAI Overview with graphs for messages, cost, token usage, and a Savings tile showing a 34% reduction](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/5e0d3fed-61c9-4cec-acfc-ef9701ab7800/lg1x =1280x720)

The **Messages** tab logs every request with the model selected, cost per call, input and output token counts, latency, and status. This makes it straightforward to audit routing decisions and understand where spend is going.

## Comparison with similar tools

| Feature | Manifest | OpenRouter | LiteLLM |
|---|---|---|---|
| Primary goal | Cost optimization | Unified access | Unified interface |
| Routing | Automatic and intelligent | Manual (you specify the model) | Manual (you write the logic) |
| Hosting | Self-hosted and cloud | Cloud only | Self-hosted |
| Privacy | High (prompts stay local) | Lower (prompts go to their cloud) | High (prompts stay local) |
| Key feature | 23-dimension scoring | Broadest model access | Standardized API format |

OpenRouter solves the problem of accessing many models through one endpoint. LiteLLM standardizes the API format across providers. Manifest solves automatic model selection, which neither of the others addresses. The tools serve different problems and can be complementary.

## Notable features and current limitations

**Subscription routing** lets Manifest use existing ChatGPT Plus or Claude Pro subscriptions instead of incurring per-token API costs, routing calls to them when appropriate.

**Automatic fallbacks** retry failed requests against the next model in the tier's fallback list, improving reliability without changes to the agent code.
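The fallback behavior amounts to walking the tier's model list until a call succeeds. A minimal sketch of that loop, with hypothetical model names and a stubbed provider call standing in for the real request:

```python
def call_model(model: str, prompt: str) -> str:
    """Stub for a provider request; 'flaky-model' simulates an outage."""
    if model == "flaky-model":
        raise RuntimeError(f"{model} unavailable")
    return f"answer from {model}"

def complete_with_fallbacks(models: list[str], prompt: str) -> str:
    """Try the tier's primary model, then each fallback in order."""
    last_error = None
    for model in models:
        try:
            return call_model(model, prompt)
        except RuntimeError as err:
            last_error = err  # record the failure and try the next model
    raise RuntimeError("all models in tier failed") from last_error

print(complete_with_fallbacks(["flaky-model", "backup-model"], "hi"))
```

From the agent's point of view this is invisible: the request either succeeds against some model in the tier or fails only after every option is exhausted.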

**Self-hosting** means prompts and data never leave the local infrastructure for routing purposes, which matters for applications with data privacy requirements.

The scoring algorithm is not fully transparent by default, so its decisions initially have to be taken on trust, until enough traffic has accumulated to validate them through the dashboard. Tier overrides are available for cases where the default routing is incorrect.

As a newer project, the ecosystem is still growing. First-party SDKs for languages beyond Python are limited, and integrations with logging and storage systems are fewer than in more mature tools.

## Final thoughts

Manifest is most valuable when an agent runs at a scale where the cost difference between routing simple calls to cheap models versus expensive ones is significant. For small-scale or infrequent use, the setup overhead is not justified.

For agents making hundreds or thousands of calls daily, the combination of automatic routing, fallback reliability, and per-call observability makes it worth the infrastructure cost of running the Docker containers.

The source code and further documentation are available at [github.com/mnfst/manifest](https://github.com/mnfst/manifest).