# Bumblebee: Read-Only Endpoint Scanner for Developer Machine Supply-Chain Exposure

[Bumblebee](https://github.com/perplexity-ai/bumblebee) is **an open-source Go binary from Perplexity AI that inventories a developer machine's packages, editor extensions, browser extensions, and AI tool configurations** by parsing metadata files directly. It never executes package managers (`npm`, `pip`, and so on) or project code, which means it cannot trigger malicious lifecycle scripts in compromised packages. Output is NDJSON, structured for ingestion into security tooling or logging platforms.

<iframe width="100%" height="315" src="https://www.youtube.com/embed/L6iAw5yitfc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>


## The problem: developer machines as attack surface

Traditional security scanning focuses on repositories, container images, and production environments. A developer's local machine has its own attack surface that this pipeline does not cover.

![Diagram showing the Universe of Tools on a developer laptop with connections to package managers, browser extensions, AI tools, and more](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/00761436-f2ae-4b1d-ec38-e993244c9300/md1x =1280x720)

A typical developer machine runs multiple package managers (npm, pip, Go modules, Bun), browser extensions, editor plugins, AI coding assistants, and local MCP servers. Each is a potential supply-chain attack vector. When a malicious package is discovered, the question is not only whether it reached production but whether any developer already has it installed locally.

## What Bumblebee scans

Bumblebee reads on-disk metadata (lock files, extension manifests, configuration files) to inventory:

- Packages across Go, npm, pip, and other ecosystems
- Editor extensions (VS Code, JetBrains)
- Browser extensions
- AI tool and MCP server configurations
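
As a rough illustration of what reading on-disk metadata can look like, the sketch below lists the packages recorded in an npm `package-lock.json` (lockfile version 2 or 3) without invoking `npm`. It is a minimal example under stated assumptions, not Bumblebee's implementation; the function name `packages_from_lockfile` is hypothetical.

```python
import json
from pathlib import Path


def packages_from_lockfile(lock_path):
    """Yield (name, version) pairs from an npm lockfile (lockfileVersion 2 or 3).

    Hypothetical helper for illustration; it only reads and parses JSON.
    """
    data = json.loads(Path(lock_path).read_text(encoding="utf-8"))
    for key, entry in data.get("packages", {}).items():
        if not key:
            continue  # the "" key describes the root project itself
        # Keys look like "node_modules/a/node_modules/@scope/b"; the package
        # name is whatever follows the last "node_modules/" segment.
        name = key.rpartition("node_modules/")[2]
        yield name, entry.get("version")
```

Because the function only parses JSON, nothing inside the listed packages is ever loaded or executed.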

![Bumblebee scanning pipeline showing how threat intel feeds into an exposure catalog, which Bumblebee uses to scan devices and produce logs and inventories](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/d619cef4-da05-4cd9-4228-4883b21f4000/md1x =1280x720)

The pipeline allows a security team to take a new threat signal (a public advisory about a malicious package), create an exposure catalog, and immediately query which developer machines are affected.

## Why read-only matters

npm and other package managers support lifecycle scripts: shell commands, such as `preinstall` and `postinstall`, that run automatically at install time. Malicious packages can use these hooks to execute attacker-controlled code.

![Animation showing how a scanner executing package manager commands can trigger a malicious preinstall hook](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/e6b74057-049b-4611-0a20-08acde9d9e00/orig =1280x720)

Running `npm ls` or similar commands in project directories to inventory packages risks triggering these scripts. Bumblebee parses the metadata files instead of invoking the package manager, so it cannot trigger lifecycle scripts regardless of what they contain.
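
To make the distinction concrete, here is a small sketch that treats lifecycle scripts as data: it reads a `package.json` and reports any install-time hooks it declares, without running them. This is an illustration of the metadata-only idea, not Bumblebee's code; `install_hooks` and `INSTALL_HOOKS` are names invented for the example.

```python
import json
from pathlib import Path

# Script names npm runs automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_hooks(package_json_path):
    """Return the install-time scripts a package declares, without running them."""
    manifest = json.loads(Path(package_json_path).read_text(encoding="utf-8"))
    scripts = manifest.get("scripts") or {}
    return {name: scripts[name] for name in INSTALL_HOOKS if name in scripts}
```

A hostile `postinstall` command shows up here as a plain string in the returned dictionary, which is the safe way to learn that it exists.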

## Installation and usage

Bumblebee is a single self-contained binary: it runs no daemon and has no dependencies outside the standard library. Installing it with `go install` requires a Go toolchain:

```command
go install github.com/perplexity-ai/bumblebee/cmd/bumblebee@latest
```

![go install command being executed in a terminal window](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/c7b38be3-afed-40c7-5127-398023dd1600/md2x =1280x720)

### Self-test

```command
bumblebee selftest
```

Expected output resembles `selftest OK (3 findings in 4ms)`; the exact timing varies by machine. A passing self-test confirms the binary works before you run a live scan.

### Baseline scan

```command
bumblebee scan --profile baseline > inventory.ndjson
```

![bumblebee scan command with baseline profile redirecting output to inventory.ndjson](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/e1ede177-1aa1-4b04-e6ad-f594ee1f3100/lg1x =1280x720)

The baseline profile scans common global and user-level locations for package roots, editor extensions, browser extensions, and AI tool configs. It completes in seconds.

### Reading the output

Each line in the NDJSON file is a self-contained JSON record:

```command
head -n 1 inventory.ndjson | jq
```

![Formatted JSON object showing a package record with keys including record_type, ecosystem, package_name, and version](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/c7b425ac-7b99-45df-577c-9f6169dbb000/lg2x =1280x720)

A sample record:

```json
{
  "record_type": "package",
  "scanner_name": "bumblebee",
  "scan_time": "2026-05-25T00:39:17.808781Z",
  "endpoint": {
    "hostname": "Joshs-MacBook-Pro.local",
    "os": "darwin",
    "arch": "arm64",
    "username": "josh"
  },
  "profile": "baseline",
  "ecosystem": "go",
  "package_name": "github.com/davecgh/go-spew",
  "version": "v1.1.1",
  "project_path": "/Users/josh/go/pkg/mod/github.com/!l!b!i!m/sarama@v1.43.3/",
  "source_file": "/Users/josh/go/pkg/mod/github.com/!l!b!i!m/sarama@v1.43.3/go.mod",
  "od": {
    "direct_dependency": true,
    "has_lifecycle_scripts": false,
    "confidence": "medium"
  }
}
```

Each record includes the endpoint details, the package ecosystem and version, the exact source file where the dependency was found, and whether the package has lifecycle scripts.
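
Because every line is an independent JSON object, the file can be processed as a stream. The following sketch counts package records per ecosystem, using only the `record_type` and `ecosystem` fields shown in the sample above; the function name `summarize` is hypothetical.

```python
import json
from collections import Counter


def summarize(ndjson_lines):
    """Count package records per ecosystem from an iterable of NDJSON lines."""
    counts = Counter()
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("record_type") == "package":
            counts[record.get("ecosystem", "unknown")] += 1
    return counts
```

Passing an open file handle works, since iterating a file yields one line at a time: `summarize(open("inventory.ndjson", encoding="utf-8"))`.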

## Scan profiles

**`baseline`** scans common global and user-level package roots. It is suitable for regular, lightweight inventory and runs in seconds.

**`project-root`** scans workspace directories where developers keep active code (such as `~/code` or `~/src`). Useful for checking dependencies in project lock files.

**`deep`** is the incident response profile. It recursively searches one or more explicit root directories for any evidence of packages. It is slower than the other profiles but more thorough.

```command
bumblebee scan --profile deep \
  --root /Users/josh \
  --exposure-catalog ./catalog.json \
  --findings-only \
  --max-duration 5m > findings.ndjson
```

![CLI showing flags and options for the deep scan profile in Bumblebee](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/9ba84765-a943-46d0-4247-aa048a7dbb00/md2x =1280x720)

`--exposure-catalog` accepts a JSON file of known malicious packages. `--findings-only` suppresses records that do not match the catalog, producing a focused incident report. `--max-duration` ensures the scan completes in a bounded time window.
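
The idea of a recursive search with a time budget can be sketched as follows. This is a simplified illustration and not how Bumblebee implements `deep` or `--max-duration`; the file names in `MANIFEST_NAMES` and the function `find_manifests` are assumptions chosen for the example.

```python
import os
import time

# Illustrative subset of metadata file names; not Bumblebee's actual list.
MANIFEST_NAMES = {"package-lock.json", "go.mod", "requirements.txt"}


def find_manifests(root, max_seconds):
    """Walk root and return (paths, timed_out).

    The walk stops visiting new directories once the time budget is spent,
    so the caller gets partial results instead of an unbounded scan.
    """
    deadline = time.monotonic() + max_seconds
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if time.monotonic() > deadline:
            return found, True
        found.extend(
            os.path.join(dirpath, name)
            for name in filenames
            if name in MANIFEST_NAMES
        )
    return found, False
```

Returning a `timed_out` flag alongside the results lets a caller distinguish a complete scan from one that was cut short.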

## How Bumblebee relates to other security tools

**SCA (Software Composition Analysis)** analyzes a project's declared dependencies for known vulnerabilities. It covers what a team is building.

**SBOM (Software Bill of Materials)** creates a formal manifest of everything in a released artifact. It covers what a team ships.

**EDR (Endpoint Detection and Response)** monitors runtime process behavior. It covers what executed on a machine.

**Bumblebee** covers the local developer state: everything present on a developer's machine, including packages from old clones, globally installed tools, and extensions, regardless of whether any of it is part of an active project or ever shipped.

These categories are complementary. Bumblebee fills a gap that the others do not address.

## Final thoughts

The practical workflow is: run `baseline` scans regularly and store the NDJSON output centrally. When a new advisory appears, query the stored inventory for the affected package names and versions to identify exposed machines without waiting for developers to self-report.
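
One way to run that query over centrally stored NDJSON is sketched below. It relies only on fields that appear in the sample record (`record_type`, `ecosystem`, `package_name`, `version`, `endpoint.hostname`); the function name `exposed_hosts` and the way affected versions are passed in are assumptions for illustration, not part of Bumblebee.

```python
import json


def exposed_hosts(ndjson_lines, ecosystem, package_name, bad_versions):
    """Return sorted hostnames whose inventory contains an affected package version."""
    hosts = set()
    for line in ndjson_lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if (
            record.get("record_type") == "package"
            and record.get("ecosystem") == ecosystem
            and record.get("package_name") == package_name
            and record.get("version") in bad_versions
        ):
            hosts.add(record.get("endpoint", {}).get("hostname", "unknown"))
    return sorted(hosts)
```

Given the concatenated inventories of many machines, a call such as `exposed_hosts(lines, "npm", "example-pkg", {"1.2.3"})` (with a placeholder package name and version) returns the machines that need follow-up.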

The read-only design is what makes this safe to run during an active incident. **A compromised machine should not be prodded with commands that execute code from its package directories.** Bumblebee's metadata-only approach is appropriate for both routine hygiene and incident response.

Source code and documentation are at [github.com/perplexity-ai/bumblebee](https://github.com/perplexity-ai/bumblebee).