Context Mode: Reducing AI Context Bloat with an MCP Server
Context Mode is an open-source MCP (Model Context Protocol) server that sits between an AI coding agent and its tools. Instead of letting raw tool output flood the context window, it intercepts each call, indexes large outputs locally in a SQLite FTS5 database, and returns only a concise summary or the specific matching content the AI needs. It also persists session events so that when the context compacts, the AI can be given a prioritized snapshot of what happened rather than starting cold.
Context bloat and why it happens
AI coding agents operate within a finite context window. Each tool call an agent makes, such as reading a file, listing a directory, or taking a browser snapshot, returns its full raw output directly into that window. The output stays there for the duration of the session, consuming tokens on every subsequent request even when the content is no longer relevant.
The cumulative effect is significant. A single Playwright page snapshot can be around 56 KB. Reading 20 GitHub issues can produce 59 KB. A moderately sized server access log can reach 45 KB. Repeating operations like these through the early part of a session can consume over 143,000 tokens, more than 70% of a 200k-token context window, before any substantial coding has begun.
Once the context approaches its limit, the agent is forced to compact: older content is summarized and discarded to make room. This causes the AI to lose track of requirements given earlier, forget the structure of files it just edited, and sometimes repeat failed approaches because it no longer has a record of what it already tried. A productive session can degrade to ineffectiveness within 30 minutes under these conditions.
How Context Mode works
The sandbox architecture
Context Mode intercepts tool calls before they execute on the operating system and runs them in an isolated subprocess. The raw output is captured by Context Mode rather than returned directly to the agent.
For small outputs under 5 KB, Context Mode generates and returns a brief summary. For larger outputs, the data is chunked and indexed into a per-project FTS5 (Full-Text Search version 5) database, an extension built into SQLite. The agent receives a confirmation that the content has been indexed and can then issue targeted queries against it. Only the matching sections come back into the context window.
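The indexing scheme can be sketched directly against SQLite's FTS5 extension. The schema below is an assumption for illustration, not Context Mode's actual table layout: chunks of tool output go into a virtual FTS5 table, and a MATCH query pulls back only the relevant chunk.

```shell
# Illustrative sketch of FTS5 indexing (schema and column names are
# assumptions): store chunked tool output, then retrieve by keyword.
result=$(sqlite3 :memory: <<'SQL'
CREATE VIRTUAL TABLE chunks USING fts5(tool, content);
INSERT INTO chunks VALUES ('playwright', 'button#submit is disabled');
INSERT INTO chunks VALUES ('playwright', 'nav link Home points to /');
SELECT content FROM chunks WHERE chunks MATCH 'submit';
SQL
)
echo "$result"
```

Only the matching chunk is returned, regardless of how much content was indexed, which is what keeps query-scoped answers small.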
This means a 56 KB Playwright snapshot or a 59 KB GitHub issues list never enters the context at all. The agent works with concise, query-scoped answers instead.
Token savings in practice
The reduction in context usage is substantial across common development operations.
| Operation | Raw output | With Context Mode | Saved |
|---|---|---|---|
| Playwright snapshot | 56.2 KB | 299 B | 99% |
| GitHub issues (20) | 58.9 KB | 1.1 KB | 98% |
| Access log (500 req) | 45.1 KB | 155 B | 100% |
| Analytics CSV (500 rows) | 85.5 KB | 222 B | 100% |
| Repo research (subagent) | 986 KB | 62 KB | 94% |
| Total | ~1,232 KB | ~64 KB | 95% |
Session continuity
Context Mode monitors the session using hooks that capture events as they occur: file edits, reads, and creations; tasks created and completed; git commits, pushes, and diffs; errors encountered; and key decisions or corrections made. These events are written to the persistent SQLite database.
When a context compaction occurs, Context Mode builds a priority-tiered snapshot of the session state, typically under 2 KB, and injects it back into the newly cleared context. This acts as a structured "previously on" summary, preserving the most critical file states, recent actions, and decisions. The practical effect is that sessions that would normally degrade after 30 minutes can remain coherent for several hours.
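The article does not show the snapshot format, but based on the description above, a priority-tiered "previously on" summary might look something like this (illustrative only, not the tool's literal output):

```
[critical] src/auth/login.ts: edited, token refresh added; tests passing
[recent]   git commit (3 files): "fix: refresh expired tokens"
[recent]   task completed: wire refresh endpoint into client
[decision] store tokens in httpOnly cookies, not localStorage
```

The point of the tiering is that when only ~2 KB survives compaction, the most load-bearing facts (file states, last actions, decisions) survive first.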
Installation
Claude Code
Add the Context Mode marketplace entry to Claude Code:
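The exact command is not reproduced here; assuming the marketplace manifest lives in the project repository, it would take Claude Code's usual form:

```
/plugin marketplace add mksglu/context-mode
```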
Then install the plugin:
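Claude Code installs plugins as `plugin-name@marketplace-name`; the names below are assumptions based on the repository name:

```
/plugin install context-mode@context-mode
```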
When prompted to choose an installation scope, select the appropriate option. Context Mode is now active and will automatically manage the MCP server and intercept tool calls.
Gemini CLI and VS Code Copilot
Install the package globally:
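Assuming the npm package shares the project's name (an assumption, since the original command is not shown):

```
npm install -g context-mode
```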
Create a .vscode/mcp.json file in the project root to register the server:
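The original file contents are not shown; a typical `.vscode/mcp.json` entry for a stdio MCP server would look like the following, where the `context-mode` server name and command are assumptions:

```json
{
  "servers": {
    "context-mode": {
      "type": "stdio",
      "command": "context-mode",
      "args": []
    }
  }
}
```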
Add the hooks configuration at .github/hooks/context-mode.json:
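The hooks schema is not documented here; a plausible shape consistent with the surrounding description (all field names and values are illustrative) might be:

```json
{
  "client": "vscode-copilot",
  "hooks": {
    "sessionstart": "context-mode hook sessionstart"
  }
}
```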
Replace vscode-copilot with gemini-cli when configuring for Gemini CLI. After saving both files, restart the editor. When the first session begins, the sessionstart hook delivers the routing instructions.
Analyzing a log file with Context Mode
The following example demonstrates Context Mode against a 5,000-line access log where every 100th entry is an HTTP 500 error. The file can be generated with:
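The article's exact generator script is not shown; a minimal sketch that produces a log with the same shape (5,000 combined-format entries, every 100th a 500) is below. Field values are synthetic and the resulting file size will differ from the figure quoted in the article.

```shell
# Hypothetical access-log generator: 5,000 entries, every 100th an HTTP 500.
for i in $(seq 1 5000); do
  status=200
  if [ $((i % 100)) -eq 0 ]; then status=500; fi
  printf '192.168.%d.%d - - [10/Oct/2024:13:55:%02d +0000] "GET /api/items/%d HTTP/1.1" %d 512\n' \
    "$((i % 250))" "$(( (i * 7) % 250 ))" "$((i % 60))" "$i" "$status" >> access.log
done
```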
This produces a 20.1 KB file. Prompting the agent to index it and find 500 error patterns:
> Use context-mode to index access.log. Find all the 500 error patterns and summarize the IP addresses associated with them.
Instead of reading the entire file into the context, the agent uses ctx_batch_execute, which chunks the file, runs shell commands like grep, sort, and uniq against it, and indexes the results into the FTS5 database. The agent receives a concise summary of the indexed content and can query it for the specific error patterns and IP addresses.
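The kind of pipeline the agent runs inside the sandbox can be sketched in plain shell (the sample log and exact commands are illustrative, not Context Mode's recorded output):

```shell
# Extract the client IPs behind 500 responses and rank them by frequency,
# as the sandboxed grep/sort/uniq step might do.
printf '%s\n' \
  '10.0.0.1 - - [10/Oct/2024:13:55:01 +0000] "GET /a HTTP/1.1" 200 512' \
  '10.0.0.2 - - [10/Oct/2024:13:55:02 +0000] "GET /b HTTP/1.1" 500 512' \
  '10.0.0.2 - - [10/Oct/2024:13:55:03 +0000] "GET /c HTTP/1.1" 500 512' \
  > sample.log
top_ip=$(grep '" 500 ' sample.log | awk '{print $1}' | sort | uniq -c | sort -rn \
  | head -1 | awk '{print $2}')
echo "$top_ip"
```

Only the ranked summary is indexed and surfaced to the agent; the raw log never enters the context.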
Running ctx-stats after the session shows the savings: for this single file, the report lists 20.1 KB processed, 4.9 KB kept in the sandbox, and a 25% reduction in context usage for this operation. Across a full session with multiple large files, API responses, and repeated tool calls, those savings compound substantially.
Final thoughts
Context Mode addresses a practical bottleneck in AI-assisted development. The FTS5 indexing approach is well-suited to the actual access patterns of coding agents, which typically need targeted answers rather than full raw output. Session continuity solves a real and frustrating problem: the AI losing track of its own work mid-session.
The token savings are most significant in workflows that involve large files, web snapshots, or extensive API responses. For agents operating within tight context budgets or long-running tasks, the difference between working with and without Context Mode is measurable both in cost and in session coherence.
The project is maintained at github.com/mksglu/context-mode.