# The CRAP Metric: Quantifying Code Risk with Cyclomatic Complexity and Test Coverage

The Change Risk Anti-Patterns (CRAP) index **is a code quality metric that combines cyclomatic complexity and test coverage into a single score**. It was introduced in 2007 by Alberto Savoia and Bob Evans, and later presented in a 2011 [Google Testing Blog post](https://testing.googleblog.com/2011/02/this-code-is-crap.html), as a way to quantify the subjective experience of risky, hard-to-maintain code.

![Screenshot of the original Google Testing Blog post titled "This Code is CRAP" introducing the metric](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/0fe80b77-7154-49d0-7ac7-a0eeeb0ad800/orig =1280x720)

## The two components

<iframe width="100%" height="315" src="https://www.youtube.com/embed/XuMR1pgc6pc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>


### Cyclomatic complexity

Cyclomatic complexity counts the number of linearly independent execution paths through a function. Every decision point adds to this count:

- Conditional statements (`if`, `else if`; a plain `else` is the fall-through path and adds no new decision)
- Loops (`for`, `while`, `loop`)
- Match arms or switch cases
- Ternary operators
- Error handling paths (`catch`, `?` in Rust)

A score of 1–5 is simple and easy to reason about. A score of 6–10 is moderately complex. Above 10, a function becomes significantly harder to understand; above 15, it is likely to cause problems when modified.
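To make the counting concrete, here is a minimal sketch of a hypothetical function, `classify_readings`, with each decision point marked in a comment. It assumes the common convention of starting at 1 and adding 1 per decision point, with a three-arm `match` contributing 2. Tools differ in the details (for example in how they count match arms or boolean operators), so the total of 6 is illustrative and not necessarily what `cargo-crap` would report.

```rust
/// Hypothetical example function, annotated with where each decision point falls.
fn classify_readings(values: &[i32], limit: i32) -> &'static str {
    // Base path through the function: 1
    if values.is_empty() {
        // +1 (if)
        return "empty";
    }

    let mut over_limit = 0;
    for v in values {
        // +1 (loop)
        if *v > limit {
            // +1 (if)
            over_limit += 1;
        }
    }

    // A three-arm match adds two branches beyond the first arm: +2
    match over_limit {
        0 => "ok",
        1 => "warning",
        _ => "critical",
    }
    // Total under this convention: 1 + 1 + 1 + 1 + 2 = 6
}
```

Under the ranges above, a function like this would sit at the low end of "moderately complex".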

![Diagram illustrating cyclomatic complexity with nodes and edges showing how decision points create multiple paths](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/58074e59-02f6-4b83-864e-ae0a529e3e00/md2x =1280x720)

High complexity alone is not fatal. A complex but thoroughly tested function is manageable: tests document expected behavior and catch regressions when the code changes.

### Test coverage

Coverage measures the percentage of code executed by automated tests. Low coverage on simple code is low risk. Low coverage on complex code is the problem the CRAP metric targets.

Even 100% line coverage does not guarantee correctness. It means the code was executed, not that every path was asserted against the correct output. The CRAP metric does not remove this limitation: it takes the reported coverage at face value and scales risk with the uncovered fraction, weighted by the function's complexity.
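The gap between "executed" and "verified" is easy to reproduce. The sketch below uses a hypothetical function, `clamp_percent`, with a deliberate bug, and a test that touches every branch without asserting anything. A line-coverage tool would be expected to report full coverage for it, yet the bug survives.

```rust
/// Hypothetical function with a deliberate bug in the negative branch.
fn clamp_percent(value: i32) -> i32 {
    if value > 100 {
        100
    } else if value < 0 {
        100 // BUG (deliberate): should be 0
    } else {
        value
    }
}

/// Executes every branch, so every line counts as covered,
/// but nothing is asserted, so the bug goes unnoticed.
#[test]
fn exercises_every_branch_without_checking() {
    clamp_percent(150);
    clamp_percent(-5);
    clamp_percent(50);
}
```

Adding `assert_eq!(clamp_percent(-5), 0)` to the test would expose the bug without changing the coverage figure at all.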

## The formula

![CRAP formula displayed on screen: CRAP(m) = C² × (1 - cov)³ + C](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/eabc87ab-0e52-44be-17a3-2a80052e0800/lg2x =1280x720)

```
CRAP(m) = CC² × (1 - cov)³ + CC
```

Where `CC` is the cyclomatic complexity and `cov` is the coverage fraction (0 to 1).

The key design choices are the exponents. Complexity is squared, and the uncovered fraction is cubed. This makes the formula non-linear: increasing coverage on a complex function reduces the score dramatically, while adequate coverage on a simple function produces a score close to the bare complexity value.
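As a minimal sketch, the formula translates into a few lines of Rust. The function name `crap_score` and the choice to clamp the coverage input are assumptions of this example, not part of the published formula or of any particular tool.

```rust
/// CRAP(m) = CC^2 * (1 - cov)^3 + CC, with `coverage` as a fraction in [0, 1].
fn crap_score(cc: u32, coverage: f64) -> f64 {
    let cc = f64::from(cc);
    // Clamp so that out-of-range input cannot produce a negative penalty term.
    let uncovered = 1.0 - coverage.clamp(0.0, 1.0);
    cc * cc * uncovered.powi(3) + cc
}
```

Because the uncovered fraction is cubed, the penalty term shrinks quickly as coverage rises, which is the non-linear behavior described above.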

**100% coverage:** `(1 - 1) = 0`, so the first term vanishes. CRAP equals CC. The risk is the inherent complexity, acknowledged but considered managed by the tests.

**0% coverage:** `(1 - 0)³ = 1`. CRAP = CC² + CC. For CC = 15: `225 + 15 = 240`.

**50% coverage, CC = 15:**
```
CRAP = 15² × (0.5)³ + 15
     = 225 × 0.125 + 15
     = 28.125 + 15
     = 43.125
```

Even 50% coverage leaves a score of 43 for a function with CC = 15, which most thresholds would still flag. The formula communicates that complex code requires high coverage to be considered low-risk, not just adequate coverage.
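The formula can also be turned around to ask how much coverage a function needs to stay at or below a given threshold. The sketch below solves `CC² × (1 - cov)³ + CC ≤ threshold` for `cov`. The function name `min_coverage_for_threshold` is hypothetical.

```rust
/// Smallest coverage fraction at which CRAP <= threshold for a given CC,
/// or None when CC alone already exceeds the threshold.
fn min_coverage_for_threshold(cc: u32, threshold: f64) -> Option<f64> {
    let cc = f64::from(cc);
    if cc > threshold {
        // Even 100% coverage leaves CRAP = CC, which is above the threshold.
        return None;
    }
    if cc == 0.0 {
        // Nothing to cover; avoids dividing by zero below.
        return Some(0.0);
    }
    // Rearranged: (1 - cov)^3 <= (threshold - CC) / CC^2
    let uncovered = ((threshold - cc) / (cc * cc)).cbrt();
    Some((1.0 - uncovered).clamp(0.0, 1.0))
}
```

With a threshold of 30, a function with CC = 15 needs roughly 59% coverage, a function with CC = 30 needs full coverage, and anything above CC = 30 cannot pass through testing alone and has to be simplified.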

## Practical application with `cargo-crap`

`cargo-crap` is a Rust command-line tool that calculates CRAP scores from coverage data.

### Setup

```command
cargo install cargo-llvm-cov
```

```command
cargo install cargo-crap
```

### Generating coverage and running analysis

From the project root, generate an `lcov` coverage file:

```command
cargo llvm-cov --lcov --output-path lcov.info
```

Then run `cargo-crap` against it:

```command
cargo crap --lcov lcov.info
```

The tool calculates cyclomatic complexity for each function, combines it with coverage from `lcov.info`, and produces a ranked table.
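To give a sense of what the coverage input looks like, the following sketch reads the `SF:` (source file) and `DA:<line>,<hits>` records of an LCOV file and computes line coverage per file. It is an illustration of the file format only, not `cargo-crap`'s implementation: a real tool would additionally need to map lines to function spans to get per-function coverage. The function name `line_coverage_by_file` is hypothetical.

```rust
use std::collections::BTreeMap;

/// Returns line coverage as a fraction per source file, from LCOV text.
/// Only `SF:`, `DA:` and `end_of_record` are read; other records are ignored.
fn line_coverage_by_file(lcov: &str) -> BTreeMap<String, f64> {
    // file -> (lines hit, lines instrumented)
    let mut counts: BTreeMap<String, (u32, u32)> = BTreeMap::new();
    let mut current: Option<String> = None;

    for line in lcov.lines().map(str::trim) {
        if let Some(path) = line.strip_prefix("SF:") {
            current = Some(path.to_string());
        } else if let Some(rest) = line.strip_prefix("DA:") {
            // DA:<line number>,<execution count>[,<checksum>]
            let hits = rest.split(',').nth(1).and_then(|h| h.parse::<u64>().ok());
            if let (Some(file), Some(hits)) = (current.as_ref(), hits) {
                let entry = counts.entry(file.clone()).or_insert((0, 0));
                entry.1 += 1;
                if hits > 0 {
                    entry.0 += 1;
                }
            }
        } else if line == "end_of_record" {
            current = None;
        }
    }

    counts
        .into_iter()
        .map(|(file, (hit, total))| (file, f64::from(hit) / f64::from(total)))
        .collect()
}
```

A file with four instrumented lines, two of which have a non-zero execution count, comes out at a coverage fraction of 0.5.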

### Reading the output

A well-tested complex function:

![cargo-crap output table showing a low CRAP score for a complex but well-tested function](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/a8600886-73c4-4f05-b0f9-cd050b453100/md1x =1280x720)

| CRAP | CC | Coverage | Function | Location |
|---|---|---|---|---|
| 13.0 | 13 | 96.0% | `process_device_telemetry` | ./main.rs:77 |

A function with CC = 13 and 96% coverage scores 13.0: effectively its bare complexity, and well below the default threshold of 30.

The same function with tests removed:

![cargo-crap output showing the CRAP score skyrocketing after tests are removed](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/03c6568f-d54b-45c1-949a-67e9d5d72c00/orig =1280x720)

| CRAP | CC | Coverage | Function | Location |
|---|---|---|---|---|
| 182.0 | 13 | 0.0% | `process_device_telemetry` | ./main.rs:77 |

CC is unchanged at 13. Coverage drops to 0%. CRAP jumps to 182.0. The score change is the formula's non-linear penalty for untested complexity, not a change in the code itself.
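Both rows can be reproduced from the formula. The sketch below uses a hypothetical `FunctionReport` struct and `ranked_rows` helper to compute the scores and sort them highest first, roughly the way a ranked report might. The names and the row layout are assumptions of this example and do not reflect `cargo-crap`'s internals.

```rust
/// Hypothetical per-function record; the field names are illustrative.
struct FunctionReport {
    name: &'static str,
    cc: u32,
    coverage: f64, // fraction in [0, 1]
}

impl FunctionReport {
    /// CRAP = CC^2 * (1 - cov)^3 + CC
    fn crap(&self) -> f64 {
        let cc = f64::from(self.cc);
        cc * cc * (1.0 - self.coverage).powi(3) + cc
    }
}

/// Sort by CRAP score, highest first, and render one row per function.
fn ranked_rows(mut reports: Vec<FunctionReport>) -> Vec<String> {
    reports.sort_by(|a, b| b.crap().total_cmp(&a.crap()));
    reports
        .iter()
        .map(|r| {
            format!(
                "{:.1} | {} | {:.1}% | {}",
                r.crap(),
                r.cc,
                r.coverage * 100.0,
                r.name
            )
        })
        .collect()
}
```

For CC = 13, the penalty term at 96% coverage is 169 × 0.04³ ≈ 0.01, which rounds away to 13.0, while at 0% it is the full 169, giving 182.0.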

## CI integration

With `--fail-above`, `cargo-crap` exits with a non-zero status code if any function exceeds the threshold, which causes CI to fail:

```command
cargo crap --lcov lcov.info --fail-above --threshold 30
```

`--threshold 30` is the default. Set it lower for stricter enforcement or higher for legacy codebases with accumulated debt.

For existing projects, `cargo-crap` supports a baseline mode: generate an initial report as the baseline, then configure the CI check to fail only when a new change increases the CRAP score of an existing function or introduces a new function above the threshold. This allows incremental improvement without halting feature development to address all existing debt at once.
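The baseline rule described above can be sketched as a comparison between two maps of function name to CRAP score. This is a hypothetical illustration of the logic only; the function name `regressions` and the data layout are assumptions, and `cargo-crap`'s actual baseline format and behavior may differ.

```rust
use std::collections::HashMap;

/// Returns the functions that should fail the check:
/// - known functions whose score rose above their baseline value, and
/// - new functions whose score exceeds `threshold`.
fn regressions(
    baseline: &HashMap<String, f64>,
    current: &HashMap<String, f64>,
    threshold: f64,
) -> Vec<String> {
    let mut failing: Vec<String> = current
        .iter()
        .filter(|(name, score)| match baseline.get(*name) {
            // Existing function: tolerate old debt, reject any increase.
            Some(old) => **score > *old,
            // New function: must respect the threshold from day one.
            None => **score > threshold,
        })
        .map(|(name, _)| name.clone())
        .collect();
    failing.sort(); // HashMap iteration order is unspecified
    failing
}
```

A CI wrapper around such a check would exit with a non-zero status whenever the returned list is non-empty. Note that a legacy function sitting at 182.0 in both maps passes, which is what lets existing debt be paid down gradually.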

## Relevance to AI-generated code

AI coding assistants generate syntactically correct and often complex code quickly. They are less consistent at generating the unit tests needed to cover all execution paths in that code. A function with CC = 20 may be produced in seconds; the roughly 20 test cases needed to exercise each of its independent paths may not follow.

![Blog post excerpt explaining that AI agents are generating code and moving quickly through codebases](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/67255a5e-49f5-4345-6d5f-d254991e0600/orig =1280x720)

A CI check on compilation and linting will pass for this code. A human reviewer may approve it based on plausibility. Without an objective score, the untested complexity enters the codebase silently. A CRAP threshold in CI catches this regardless of whether the code was written by a human or generated by a tool.

## Final thoughts

The CRAP metric is most useful as a CI gate rather than a code review discussion point. **Its value is in automation**: a threshold violation fails the build, which creates a clear prompt to either reduce complexity or add tests before the code merges. The formula's non-linear scaling means that teams do not need to enforce 100% coverage on all code, only on code complex enough to warrant it.

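`cargo-crap` is Rust-specific, but the metric itself is language-agnostic. Similar tools exist for Java (`crap4j`, the original implementation; JaCoCo on its own reports coverage, not CRAP), PHP (PHPUnit's coverage reports include a CRAP index), JavaScript, and other ecosystems. The formula is the same regardless of the implementation.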