Bumblebee: Read-Only Endpoint Scanner for Developer Machine Supply-Chain Exposure
Bumblebee is an open-source Go binary from Perplexity AI that inventories a developer machine's packages, editor extensions, browser extensions, and AI tool configurations by parsing metadata files directly. It never executes package managers (npm, pip, and so on) or project code, which means it cannot trigger malicious lifecycle scripts in compromised packages. Output is NDJSON, structured for ingestion into security tooling or logging platforms.
The problem: developer machines as attack surface
Traditional security scanning focuses on repositories, container images, and production environments. A developer's local machine has its own attack surface, which that scanning does not cover.
A typical developer machine runs multiple package managers (npm, pip, Go modules, Bun), browser extensions, editor plugins, AI coding assistants, and local MCP servers. Each is a potential supply-chain attack vector. When a malicious package is discovered, the question is not only whether it reached production but whether any developer already has it installed locally.
What Bumblebee scans
Bumblebee reads on-disk metadata (lock files, extension manifests, configuration files) to inventory:
- Packages across Go, npm, pip, and other ecosystems
- Editor extensions (VS Code, JetBrains)
- Browser extensions
- AI tool and MCP server configurations
This inventory lets a security team take a new threat signal (for example, a public advisory about a malicious package), build an exposure catalog from it, and query which developer machines are affected.
Why read-only matters
npm and other package managers support lifecycle scripts: shell commands that run automatically at install time (preinstall, postinstall). Malicious packages can use these hooks to execute attacker-controlled code.
Inventorying packages by invoking the package manager inside a project directory means running tooling against content that may be compromised, and any command that installs or rebuilds dependencies will execute these scripts. Bumblebee parses the metadata files instead of invoking the package manager, so it cannot trigger lifecycle scripts regardless of what they contain.
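To make the metadata-only idea concrete, here is a minimal Python sketch that inventories an npm lockfile by parsing JSON alone. It is not Bumblebee's implementation (Bumblebee is written in Go and its internals are not shown here). The function name inventory_lockfile and the output keys are hypothetical. The input layout (a top-level "packages" map keyed by install path, with "version" and "hasInstallScript" fields) is the npm lockfile v2/v3 format.

```python
import json
from pathlib import Path


def inventory_lockfile(lock_path):
    """Return one dict per package listed in an npm lockfile (v2/v3).

    Hypothetical helper: it only reads and parses JSON, and never
    invokes npm or executes any package code.
    """
    data = json.loads(Path(lock_path).read_text(encoding="utf-8"))
    records = []
    for path, meta in data.get("packages", {}).items():
        if path == "":
            continue  # the empty key describes the root project itself
        # Keys look like "node_modules/a/node_modules/@scope/b"; the package
        # name is whatever follows the last "node_modules/" segment.
        name = meta.get("name") or path.rpartition("node_modules/")[2]
        records.append({
            "name": name,
            "version": meta.get("version"),
            # npm records this flag when a package declares install scripts.
            "has_install_script": bool(meta.get("hasInstallScript", False)),
            "source_file": str(lock_path),
        })
    return records
```

Calling inventory_lockfile("package-lock.json") returns a list of plain dictionaries. The lifecycle scripts themselves are never loaded, only the flag that says they exist.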
Installation and usage
Bumblebee is a single self-contained binary. It runs no daemon and has no dependencies outside the Go standard library.
Self-test
The self-test verifies that the binary works correctly before a live scan. Expected output: selftest OK (3 findings in 4ms).
Baseline scan
The baseline profile scans common global and user-level locations for package roots, editor extensions, browser extensions, and AI tool configs. It completes in seconds.
Reading the output
Each line in the NDJSON file is a self-contained JSON record.
Each record includes the endpoint details, the package ecosystem and version, the exact source file where the dependency was found, and whether the package has lifecycle scripts.
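Because every line is a complete JSON document, the output can be processed one line at a time without loading the whole file. The Python sketch below shows that pattern. The helper names read_ndjson and count_by are hypothetical, and the field name used in the usage note ("ecosystem") is an assumption for illustration, not a confirmed key in Bumblebee's schema.

```python
import json
from collections import Counter


def read_ndjson(lines):
    """Yield one dict per non-empty line; each line is a complete JSON document."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: invalid JSON: {exc}") from exc


def count_by(records, field):
    """Count records by the value of one top-level field.

    Records that lack the field are counted under "unknown".
    """
    return Counter(str(rec.get(field, "unknown")) for rec in records)
```

With a scan saved to a file, passing the open file object to read_ndjson and the result to count_by with the assumed field name "ecosystem" gives a per-ecosystem record count. A malformed line raises an error that names the line number instead of being skipped silently.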
Scan profiles
baseline scans common global and user-level package roots. It is suited to regular, lightweight inventory and runs in seconds.
project-root scans workspace directories where developers keep active code (such as ~/code or ~/src). It is useful for checking dependencies recorded in project lock files.
deep is the incident response profile. It recursively searches one or more explicit root directories for any evidence of packages, which makes it slower but more thorough.
--exposure-catalog accepts a JSON file of known malicious packages. --findings-only suppresses records that do not match the catalog, producing a focused incident report. --max-duration ensures the scan completes in a bounded time window.
How Bumblebee relates to other security tools
SCA (Software Composition Analysis) analyzes a project's declared dependencies for known vulnerabilities. It covers what a team is building.
SBOM (Software Bill of Materials) creates a formal manifest of everything in a released artifact. It covers what a team ships.
EDR (Endpoint Detection and Response) monitors runtime process behavior. It covers what executed on a machine.
Bumblebee covers the local developer state: everything present on a developer's machine, including packages from old clones, globally installed tools, and extensions, regardless of whether any of it is part of an active project or ever shipped.
These categories are complementary. Bumblebee fills a gap that the others do not address.
Final thoughts
The practical workflow is: run baseline scans regularly and store the NDJSON output centrally. When a new advisory appears, query the stored inventory for the affected package names and versions to identify exposed machines without waiting for developers to self-report.
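The query step of this workflow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Bumblebee's own matching logic: the record fields endpoint, ecosystem, name, and version are hypothetical key names, and the advisory structure below is an ad hoc dictionary, not the format that --exposure-catalog expects.

```python
import json


def load_records(path):
    """Load stored NDJSON scan output into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def exposed_endpoints(records, advisory):
    """Map each endpoint to the sorted (name, version) pairs that match.

    `advisory` maps (ecosystem, package name) to a set of affected versions.
    All field names here are assumptions made for illustration.
    """
    hits = {}
    for rec in records:
        key = (rec.get("ecosystem"), rec.get("name"))
        if rec.get("version") in advisory.get(key, ()):
            pair = (rec["name"], rec["version"])
            hits.setdefault(rec.get("endpoint"), set()).add(pair)
    return {endpoint: sorted(pairs) for endpoint, pairs in hits.items()}
```

Given records loaded with load_records and an advisory such as {("npm", "bad-pkg"): {"1.0.1"}}, the result lists only the machines holding an affected version. In practice the same match would more likely be written as a query in whatever logging platform stores the scans.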
The read-only design is what makes this safe to run during an active incident. A compromised machine should not be prodded with commands that execute code from its package directories. Bumblebee's metadata-only approach is appropriate for both routine hygiene and incident response.
Source code and documentation are at github.com/perplexity-ai/bumblebee.