Context Mode: Reducing AI Context Bloat with an MCP Server
Context Mode is an open-source MCP (Model Context Protocol) server that sits between an AI coding agent and its tools. Instead of letting raw tool output flood the context window, it intercepts each call, indexes large outputs locally in a SQLite FTS5 database, and returns only a concise summary or the specific matching content the AI needs. It also persists session events so that when the context compacts, the AI can be given a prioritized snapshot of what happened rather than starting cold.
Context bloat and why it happens
AI coding agents operate within a finite context window. Each tool call an agent makes, such as reading a file, listing a directory, or taking a browser snapshot, returns its full raw output directly into that window. The output stays there for the duration of the session, consuming tokens on every subsequent request even when the content is no longer relevant.
The cumulative effect is significant. A single Playwright page snapshot can be around 56 KB. Reading 20 GitHub issues can produce 59 KB. A moderately sized server access log can reach 45 KB. Repeating operations like these through the early part of a session can consume over 143,000 tokens, more than 70% of a 200k-token context window, before any substantial coding has begun.
Once the context approaches its limit, the agent is forced to compact: older content is summarized and discarded to make room. This causes the AI to lose track of requirements given earlier, forget the structure of files it just edited, and sometimes repeat failed approaches because it no longer has a record of what it already tried. A productive session can degrade to ineffectiveness within 30 minutes under these conditions.
How Context Mode works
The sandbox architecture
Context Mode intercepts tool calls before they execute on the operating system and runs them in an isolated subprocess. The raw output is captured by Context Mode rather than returned directly to the agent.
For small outputs under 5 KB, Context Mode generates and returns a brief summary. For larger outputs, the data is chunked and indexed into a per-project FTS5 (Full-Text Search version 5) database, an extension built into SQLite. The agent receives a confirmation that the content has been indexed and can then issue targeted queries against it. Only the matching sections come back into the context window.
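The indexing scheme can be sketched directly against SQLite's FTS5 extension. The schema below is an assumption for illustration, not Context Mode's actual table layout: chunks of tool output go into a virtual FTS5 table, and a MATCH query pulls back only the relevant chunk.

```shell
# Illustrative sketch of FTS5 indexing (schema and column names are
# assumptions): store chunked tool output, then retrieve by keyword.
result=$(sqlite3 :memory: <<'SQL'
CREATE VIRTUAL TABLE chunks USING fts5(tool, content);
INSERT INTO chunks VALUES ('playwright', 'button#submit is disabled');
INSERT INTO chunks VALUES ('playwright', 'nav link Home points to /');
SELECT content FROM chunks WHERE chunks MATCH 'submit';
SQL
)
echo "$result"
```

Only the matching chunk is returned, regardless of how much content was indexed, which is what keeps query-scoped answers small.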
This means a 56 KB Playwright snapshot or a 59 KB GitHub issues list never enters the context at all. The agent works with concise, query-scoped answers instead.
Token savings in practice
The reduction in context usage is substantial across common development operations.
| Operation | Raw output | With Context Mode | Saved |
|---|---|---|---|
| Playwright snapshot | 56.2 KB | 299 B | 99% |
| GitHub issues (20) | 58.9 KB | 1.1 KB | 98% |
| Access log (500 req) | 45.1 KB | 155 B | 100% |
| Analytics CSV (500 rows) | 85.5 KB | 222 B | 100% |
| Repo research (subagent) | 986 KB | 62 KB | 94% |
| Total | ~1,232 KB | ~64 KB | 95% |
Session continuity
Context Mode monitors the session using hooks that capture events as they occur: file edits, reads, and creations; tasks created and completed; git commits, pushes, and diffs; errors encountered; and key decisions or corrections made. These events are written to the persistent SQLite database.
When a context compaction occurs, Context Mode builds a priority-tiered snapshot of the session state, typically under 2 KB, and injects it back into the newly cleared context. This acts as a structured "previously on" summary, preserving the most critical file states, recent actions, and decisions. The practical effect is that sessions that would normally degrade after 30 minutes can remain coherent for several hours.
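The article does not show the snapshot format, but based on the description above, a priority-tiered "previously on" summary might look something like this (illustrative only, not the tool's literal output):

```
[critical] src/auth/login.ts: edited, token refresh added; tests passing
[recent]   git commit (3 files): "fix: refresh expired tokens"
[recent]   task completed: wire refresh endpoint into client
[decision] store tokens in httpOnly cookies, not localStorage
```

The point of the tiering is that when only ~2 KB survives compaction, the most load-bearing facts (file states, last actions, decisions) survive first.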
Installation
Claude Code
Add the Context Mode marketplace entry to Claude Code:
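The exact command is not reproduced here; assuming the marketplace manifest lives in the project repository, it would take Claude Code's usual form:

```
/plugin marketplace add mksglu/context-mode
```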
Then install the plugin:
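Claude Code installs plugins as `plugin-name@marketplace-name`; the names below are assumptions based on the repository name:

```
/plugin install context-mode@context-mode
```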
When prompted to choose an installation scope, select the appropriate option. Context Mode is now active and will automatically manage the MCP server and intercept tool calls.
Gemini CLI and VS Code Copilot
Install the package globally:
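Assuming the npm package shares the project's name (an assumption, since the original command is not shown):

```
npm install -g context-mode
```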
Create a .vscode/mcp.json file in the project root to register the server:
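The original file contents are not shown; a typical `.vscode/mcp.json` entry for a stdio MCP server would look like the following, where the `context-mode` server name and command are assumptions:

```json
{
  "servers": {
    "context-mode": {
      "type": "stdio",
      "command": "context-mode",
      "args": []
    }
  }
}
```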
Add the hooks configuration at .github/hooks/context-mode.json:
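The hooks schema is not documented here; a plausible shape consistent with the surrounding description (all field names and values are illustrative) might be:

```json
{
  "client": "vscode-copilot",
  "hooks": {
    "sessionstart": "context-mode hook sessionstart"
  }
}
```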
Replace vscode-copilot with gemini-cli when configuring for Gemini CLI. After saving both files, restart the editor. When the first session begins, the sessionstart hook delivers the routing instructions.
Analyzing a log file with Context Mode
The following example demonstrates Context Mode against a 5,000-line access log where every 100th entry is an HTTP 500 error. The file can be generated with:
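The article's exact generator script is not shown; a minimal sketch that produces a log with the same shape (5,000 combined-format entries, every 100th a 500) is below. Field values are synthetic and the resulting file size will differ from the figure quoted in the article.

```shell
# Hypothetical access-log generator: 5,000 entries, every 100th an HTTP 500.
for i in $(seq 1 5000); do
  status=200
  if [ $((i % 100)) -eq 0 ]; then status=500; fi
  printf '192.168.%d.%d - - [10/Oct/2024:13:55:%02d +0000] "GET /api/items/%d HTTP/1.1" %d 512\n' \
    "$((i % 250))" "$(( (i * 7) % 250 ))" "$((i % 60))" "$i" "$status" >> access.log
done
```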
This produces a 20.1 KB file. Prompting the agent to index it and find 500 error patterns:
> Use context-mode to index access.log. Find all the 500 error patterns and summarize the IP addresses associated with them.
Instead of reading the entire file into the context, the agent uses ctx_batch_execute, which chunks the file, runs shell commands like grep, sort, and uniq against it, and indexes the results into the FTS5 database. The agent receives a concise summary of the indexed content and can query it for the specific error patterns and IP addresses.
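The kind of pipeline the agent runs inside the sandbox can be sketched in plain shell (the sample log and exact commands are illustrative, not Context Mode's recorded output):

```shell
# Extract the client IPs behind 500 responses and rank them by frequency,
# as the sandboxed grep/sort/uniq step might do.
printf '%s\n' \
  '10.0.0.1 - - [10/Oct/2024:13:55:01 +0000] "GET /a HTTP/1.1" 200 512' \
  '10.0.0.2 - - [10/Oct/2024:13:55:02 +0000] "GET /b HTTP/1.1" 500 512' \
  '10.0.0.2 - - [10/Oct/2024:13:55:03 +0000] "GET /c HTTP/1.1" 500 512' \
  > sample.log
top_ip=$(grep '" 500 ' sample.log | awk '{print $1}' | sort | uniq -c | sort -rn \
  | head -1 | awk '{print $2}')
echo "$top_ip"
```

Only the ranked summary is indexed and surfaced to the agent; the raw log never enters the context.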
Running ctx-stats after the session shows the savings: for this single file, the report lists 20.1 KB processed, 4.9 KB kept in the sandbox, and a 25% reduction in context usage for this operation. Across a full session with multiple large files, API responses, and repeated tool calls, those savings compound substantially.
Final thoughts
Context Mode addresses a practical bottleneck in AI-assisted development. The FTS5 indexing approach is well-suited to the actual access patterns of coding agents, which typically need targeted answers rather than full raw output. Session continuity solves a real and frustrating problem: the AI losing track of its own work mid-session.
The token savings are most significant in workflows that involve large files, web snapshots, or extensive API responses. For agents operating within tight context budgets or long-running tasks, the difference between working with and without Context Mode is measurable both in cost and in session coherence.
The project is maintained at github.com/mksglu/context-mode.