# Caveman: Reducing LLM Output Tokens by up to 75% with a Prompt Skill

[Caveman](https://github.com/juliusbrussee/caveman) **is a prompt skill that rewrites how an LLM structures its output**. Instead of conversational paragraphs, the model produces dense, fact-based statements with articles dropped, hedging removed, and arrows used for causality instead of full sentences. The claimed output token reduction is up to 75%, though real-world results vary by query type.

## The verbosity problem

LLMs are trained to be conversational. A response to "how does authentication work in this app?" might open with "Here's how auth works in this app" and spend several sentences framing the context before delivering the actual facts. Every word of that framing costs tokens, and APIs bill both input and output by the token.

For developers running many queries or building chat applications, this conversational padding adds up. The goal with Caveman is to have the model function as a pure information retrieval engine rather than a conversational partner.

**Before Caveman**, a standard Claude response to a question about an auth system might read:

![Typical verbose output from Claude Code before applying the Caveman skill](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/ef76f42d-0953-4e57-6eb7-a598e0495700/lg2x =1280x720)

> Here's how auth works in this app. This is a simulated authentication system — no backend, no passwords, no real security. It exists to demonstrate Better Stack RUM user tracking...

**After Caveman**, the same question produces:

![Hyper-concise and direct output generated after activating the Caveman skill](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/7f87bd8d-83c3-4eb6-72b6-2ca01e779800/md2x =1280x720)

```text
Auth in this app
Demo-only; client-side auth. No real security. Built for Better Stack RUM tracking demos.

How it works
- auth-provider.tsx -> React context. Store user in localStorage key ...

Flow
- App load -> check localStorage for saved user
- User pick demo persona -> save to localStorage + set Better Stack RUM identity
```

The technical content is identical. The conversational scaffolding is gone.

## Why brevity may also improve accuracy

A research paper titled "Brevity Constraints Reverse Performance Hierarchies in Language Models" found that constraining large models to brief responses improved accuracy by up to 26 percentage points on certain benchmarks. The researchers identified the cause as "spontaneous scale-dependent verbosity": larger models tend to over-elaborate their reasoning, which introduces errors and steers them away from correct answers. Forcing brevity can remove the mechanism that causes these deviations.

![Caveman repository referencing the research paper on brevity improving LLM accuracy](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/f16d18ec-788d-4c24-c993-d5748f12a100/md2x =1280x720)

This suggests Caveman is not just a cost optimization. It is also a mitigation for a documented failure mode in large models.

## Token economics

### Output savings

Benchmark tests comparing a standard baseline, a simple "be concise" prompt, and Caveman produced the following results across 10 prompts:

![Dashboard comparing output token usage across Baseline, Terse, and Caveman showing Caveman's clear advantage](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/db21996a-2dc7-490d-264a-cbf91c51f600/public =1280x720)

Caveman achieved a 45% reduction in output tokens compared to the baseline and a 39% reduction compared to asking the model to simply be terse.

### Input cost and single-shot penalty

The Caveman skill is a detailed system prompt. Sending it adds a significant number of tokens to each initial request. In the benchmark, the input cost for a Caveman session was around 4 cents versus a fraction of a cent for a bare baseline prompt. For a single isolated query, Caveman was approximately 10% more expensive than baseline once both input and output are accounted for.

### How caching changes the math

In conversational sessions, modern LLM APIs cache the initial system prompt. On subsequent turns in the same session, the Caveman skill is not re-billed at full price. The high initial input cost is amortized over the conversation, and the consistent output token savings dominate. Factoring in prompt caching, the total cost savings across a multi-turn session were approximately 39%.
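The break-even point can be sketched with some rough per-turn figures. The numbers below are illustrative assumptions loosely based on the benchmark above, not exact prices; real costs depend on the model, the provider, and the caching discount:

```python
# Hypothetical per-turn costs in dollars; adjust to your model's pricing.
SKILL_INPUT_COST = 0.04       # Caveman system prompt, billed in full on turn one
CACHED_SKILL_COST = 0.004     # assumed ~10x discount on cached prompt reads
BASELINE_OUTPUT_COST = 0.02   # hypothetical output cost per turn without the skill
CAVEMAN_OUTPUT_COST = 0.011   # ~45% fewer output tokens per turn

def session_cost(turns: int, caveman: bool) -> float:
    """Total session cost, ignoring the baseline's negligible input cost."""
    if not caveman:
        return turns * BASELINE_OUTPUT_COST
    # First turn pays for the full skill prompt; later turns hit the cache.
    prompt_cost = SKILL_INPUT_COST + (turns - 1) * CACHED_SKILL_COST
    return prompt_cost + turns * CAVEMAN_OUTPUT_COST
```

With these assumptions, a single query is more expensive with Caveman (the skill prompt dominates), while a ten-turn session comes out cheaper, which matches the shape of the benchmark result even if the exact percentages differ.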

Caveman is most cost-effective in interactive chat sessions and agents with multi-step reasoning, not in single one-off API calls.

## Installation and setup

The skill is installed via the Vercel Skills package:

```command
npx skills add https://github.com/juliusbrussee/caveman --skill caveman
```

![Terminal command needed to install the Caveman skill into a project](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/6f914140-43be-4494-7ec7-c2ed721c5b00/public =1280x720)

Once installed, it can be activated with `/caveman` in a compatible chat interface, or loaded as a system prompt in code.
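Loading it as a system prompt in code might look like the sketch below. The file path and the payload field names are assumptions (chat APIs differ); adapt them to wherever `npx skills add` placed the file and to your client library:

```python
from pathlib import Path

# Hypothetical location of the installed skill file; adjust to your project.
SKILL_PATH = Path("skills/caveman/skill.md")

def build_request(user_prompt: str, skill_text: str) -> dict:
    """Assemble a chat-style payload with the Caveman skill as the system prompt.

    The payload shape mirrors common chat APIs; rename fields for your client.
    """
    return {
        "system": skill_text,  # billed in full on turn one, then served from cache
        "messages": [{"role": "user", "content": user_prompt}],
    }

# Usage (assumes the skill file exists at SKILL_PATH):
# request = build_request("how does auth work?", SKILL_PATH.read_text())
```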

## The `skill.md` structure

The `skill.md` file is the actual system prompt the model receives. It contains:

**Rules:** Explicit negative constraints. Drop articles (`a`, `an`, `the`), pleasantries (`Sure, I'd be happy to help`), and hedging (`likely`, `probably`), and replace verbose phrasing with short equivalents (`big` instead of `extensive`, `fix` instead of `implement a solution for`).
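The article-dropping rule is mechanical enough to illustrate directly. This is a toy sketch of the transformation the prompt asks the model to perform, not part of the skill itself:

```python
import re

# Strip standalone articles followed by whitespace; word boundaries
# keep letters inside words (e.g. "localStorage") untouched.
ARTICLES = re.compile(r"\b(a|an|the)\s+", flags=re.IGNORECASE)

def drop_articles(text: str) -> str:
    return ARTICLES.sub("", text)

print(drop_articles("The auth provider stores the user in a localStorage key."))
# → auth provider stores user in localStorage key.
```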

**Pattern:** A defined output structure: `[thing] [action] [reason]. [next step]`. This enforces a direct logical flow.

**Intensity levels:**

- `lite`: Removes filler and hedging, keeps articles and full sentences. Tight but readable.
- `full`: The default. Drops articles and uses sentence fragments. Classic Caveman style.
- `ultra`: Abbreviates common words (`db` for database), strips conjunctions, uses `->` for causality throughout.

**Wenyan mode:** Uses Classical Chinese (文言文) for maximum information density per character. Not practical for most use cases but demonstrates the theoretical ceiling of token compression.

## Related skills in the ecosystem

**`caveman-commit`** generates Git commit messages in Conventional Commits format. Messages are terse and exact, keeping version history scannable.

**`caveman-review`** produces one-line code review comments per finding: location, problem, and suggested fix without preamble.

**`caveman-compress`** rewrites a natural-language document (such as a `README.md` or preferences file) in Caveman style. The compressed version can be fed back to an LLM as context in future sessions, reducing input tokens for prompts that rely on that document.

## Final thoughts

Caveman is **most useful in two scenarios: interactive chat sessions where prompt caching makes the economics favorable, and agent pipelines** where the model reasons across multiple steps and verbose intermediate output accumulates cost. For single-shot queries it adds more input cost than it saves on output, so it is worth knowing the usage pattern before reaching for it.

The **accuracy angle from the brevity research paper is the more surprising argument for it**. If enforcing conciseness reduces over-elaboration errors in large models, it has value beyond cost optimization. Whether that effect holds for a given task and model is worth testing empirically rather than assuming.

The repository and full skill documentation are available at [github.com/juliusbrussee/caveman](https://github.com/juliusbrussee/caveman).