AI Penetration Testing with Shannon
If you need security testing that keeps up with your development pace, consider Shannon: an open-source, autonomous AI penetration testing framework that uses Large Language Models and a multi-agent architecture to deliver continuous, in-depth security audits.
Instead of relying on slow, manual pentests, you can automate both static code analysis and real-world exploitation with browser automation, uncovering vulnerabilities like SQL Injection, XSS, and complex authorization bypasses before they reach production.
In this guide, you’ll learn how Shannon works, how to set it up, and how to interpret its zero-false-positive security reports, as you walk through a complete end-to-end test of a sample application and explore its multi-phase workflow in action.
The challenge with traditional penetration testing
Before diving into the technical details of Shannon, understanding the problem it aims to solve is essential. For many development teams, security testing is a critical but challenging part of the software development lifecycle (SDLC).
The high cost and time investment
Traditionally, companies hire external penetration testers or maintain a dedicated internal security team to audit applications before a major release. This process is highly manual and requires specialized expertise. As a result, it comes with a significant price tag.
Professional penetration testing services can cost thousands of dollars per day. A single engagement for a moderately complex application can easily run into tens of thousands of dollars. The cost often repeats, because a typical cycle has four steps: you pay for an initial test, receive a report listing vulnerabilities, have the development team fix the identified issues, and then pay for a re-test to verify the fixes.
This cycle consumes not only a significant budget but also valuable time, potentially pushing back release dates. For startups and smaller companies, this level of expenditure can be prohibitive, leaving them to choose between security and speed—a dangerous compromise.
The need for automation and integration
The cyclical nature of manual testing doesn't fit well with modern agile and DevOps practices, which emphasize continuous integration and continuous delivery (CI/CD). A manual security audit that takes days or weeks creates a major roadblock in a pipeline designed for rapid, automated deployments.
This is precisely where Shannon comes in. It offers an automated, repeatable, and cost-effective solution that can be run as often as needed. You can integrate it directly into your CI/CD pipeline, making security testing a seamless, automated part of every build. Because it's open-source, the primary cost is not the software itself, but the API usage of the underlying AI model, which is still a fraction of the cost of manual testing.
Getting started with Shannon: prerequisites and setup
Shannon is built on Anthropic's Claude Agent SDK, which means it uses Anthropic's Claude models as its reasoning core. To get started, you will need to configure your environment with the necessary credentials and clone the Shannon repository.
Essential prerequisites
Before running a test, ensure you have the following:
An Anthropic API Key: This is the most crucial requirement, because Shannon uses the Claude API to power its AI agents. A standard Claude Pro or Max subscription will not work; you need API access and a generated API key. You also need credits in your Anthropic account, as each test consumes tokens.
Docker and Docker Compose: Shannon orchestrates its various components and tools using Docker containers. Make sure you have both Docker and Docker Compose installed and running on your system.
Git: You'll need Git to clone the Shannon repository from GitHub.
A Target Application: You need a web application to test. This tutorial uses OWASP Juice Shop, a deliberately insecure web application designed for security training.
Installation and configuration
Setting up Shannon on your machine involves several steps.
First, open your terminal and clone the official Shannon repository from GitHub:
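A minimal sketch of the clone step. The repository URL below is an assumption (it is not stated in this guide), so confirm it against the project's GitHub page before running it.

```shell
# Assumed repository URL; verify it on the project's GitHub page first.
git clone https://github.com/KeygraphHQ/shannon.git

# The remaining commands in this guide are run from the repository root.
cd shannon
```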
You need to provide your Anthropic API key to Shannon, and there are two ways to do this. The recommended one is to export the key as an environment variable in your current terminal session:
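A sketch of the exports, assuming the key is read from ANTHROPIC_API_KEY (the variable name conventionally used by Anthropic's tooling) and using 64000 as an illustrative token limit. Neither detail is spelled out in this guide, so check Shannon's README for the recommended settings.

```shell
# Placeholder key: substitute the real one from your Anthropic account.
export ANTHROPIC_API_KEY="your-api-key"

# Assumed value; raises the cap on how much output the model may generate per response.
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000
```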
Replace "your-api-key" with the actual key you obtained from your Anthropic account. The CLAUDE_CODE_MAX_OUTPUT_TOKENS variable is also recommended to ensure the model can generate comprehensive outputs.
Alternatively, you can create a .env file in the root of the shannon directory. This is useful if you don't want to export variables every time you open a new terminal:
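One way to create that file from the shell, as a sketch. It assumes Shannon reads the same two variable names from .env, and the 64000 value is illustrative rather than taken from this guide.

```shell
# Write the .env file in the current directory (run this from the shannon root).
cat > .env <<'EOF'
ANTHROPIC_API_KEY=your-api-key
CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000
EOF

# Restrict the file to your user, since it holds a secret.
chmod 600 .env
```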
Again, replace "your-api-key" with your actual key.
Shannon's code analysis agents need direct access to the source code of the application you want to test. To facilitate this, you must place your target application's repository inside the shannon/repos/ directory. Assuming you've cloned the OWASP Juice Shop project, you would move or clone it into this specific directory:
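For example, cloning Juice Shop straight into place might look like this. The sketch assumes you are in the shannon root and that juice-shop is the folder name you will later pass as REPO.

```shell
# Create the directory that holds target source code, if it is missing.
mkdir -p repos

# Clone the OWASP Juice Shop source into repos/juice-shop.
git clone https://github.com/juice-shop/juice-shop.git repos/juice-shop
```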
You are now fully set up and ready to launch your first AI-powered penetration test.
Launching your first penetration test
With the setup complete, running a test is as simple as executing a single command. For this demonstration, you'll first need to get the OWASP Juice Shop application running locally. You can follow the instructions on its official repository, but it typically involves running npm install and npm start. By default, it runs on http://127.0.0.1:3000.
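Because Shannon needs the target to be reachable before it starts, a small readiness check can save a wasted run. The helper below is a hypothetical convenience, not part of Shannon or Juice Shop; it simply polls the URL with curl until it responds or the attempts run out.

```shell
# Poll a URL until it answers, so the pentest is not started against a dead target.
# Usage: wait_for_app URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_app() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -f: fail on HTTP errors, -o /dev/null: discard the response body
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      echo "Application is up at $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Application did not respond at $url" >&2
  return 1
}
```

Running wait_for_app http://127.0.0.1:3000 in a second terminal after npm start returns success once Juice Shop is serving requests.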
The shannon start command
To initiate the pentest, use the shannon start command, providing the URL of the running application and the name of the repository folder:
./shannon start URL=http://127.0.0.1:3000 REPO=juice-shop
Breaking down this command: ./shannon start is the main executable script to begin the workflow. URL=http://127.0.0.1:3000 tells Shannon's dynamic analysis agent (the one that uses a browser) where to find the live, running application. REPO=juice-shop tells Shannon's static analysis agents which directory inside shannon/repos/ contains the source code to analyze.
What happens when you start a test
The first time you run this command, Shannon will use Docker Compose to build and pull all the necessary container images. This can take several minutes as it sets up the entire testing environment, which includes the Temporal workflow orchestrator, Playwright for browser automation, and various other security tools.
Once the initial setup is complete, the penetration testing workflow begins. The terminal will display progress information, including links to monitor the process.
Understanding the Shannon workflow and its five phases
Shannon's strength lies in its structured, multi-phase approach to penetration testing. Each phase is handled by specialized AI agents that build upon the findings of the previous ones.
Phase 1: pre-flight
This is the initial validation stage. Before any testing begins, a pre-flight agent ensures that the environment is correctly configured. It checks for valid API credentials for the Claude model, the availability of the Docker containers, and the existence of the specified target repository. This prevents the workflow from starting a costly, long-running process only to fail due to a simple configuration error.
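To make the idea concrete, here is a rough shell sketch of comparable checks. It is not Shannon's pre-flight agent: the function name is hypothetical, it assumes the key lives in ANTHROPIC_API_KEY, and it only tests that things are present rather than validating credentials against the API.

```shell
# Hypothetical pre-flight check: returns non-zero if any prerequisite is missing.
# Usage: preflight REPO_NAME   (run from the shannon root)
preflight() {
  repo="$1"
  status=0

  # The API key must be set and non-empty.
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "ANTHROPIC_API_KEY is not set" >&2
    status=1
  fi

  # The docker CLI must be on PATH.
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker was not found on PATH" >&2
    status=1
  fi

  # The target source code must exist under repos/.
  if [ ! -d "repos/$repo" ]; then
    echo "repos/$repo does not exist" >&2
    status=1
  fi

  return "$status"
}
```

Reporting every failed check in one pass, instead of stopping at the first, lets you fix all configuration problems before retrying.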
Phase 2: pre-reconnaissance
In this phase, Shannon performs static code analysis. AI agents read and analyze the entire source code of the target application. The goal is to build a foundational understanding of the application's inner workings without running it. Key activities include architecture analysis (identifying the frameworks, libraries, and overall structure), entry point mapping (locating all APIs, routes, and user input points), and security pattern identification (searching for common security patterns like authentication mechanisms).
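As a rough intuition for entry point mapping, a plain-text search can already surface many route declarations in an Express-style codebase such as Juice Shop. Shannon's agents reason about the code with an LLM rather than using a pattern like this; the function name and regular expression below are illustrative assumptions.

```shell
# List lines that look like Express route registrations, e.g. app.get('/path', ...).
# Usage: list_routes DIRECTORY
list_routes() {
  grep -rnE --include='*.js' --include='*.ts' --exclude-dir=node_modules \
    '(app|router)\.(get|post|put|patch|delete)\(' "$1"
}
```

For instance, list_routes repos/juice-shop prints each matching file, line number, and line. A pattern like this misses routes registered indirectly, which is one reason a reasoning-based analysis is more thorough.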
Phase 3: reconnaissance
This phase transitions from static to dynamic analysis. Using the intelligence gathered in Phase 2, a new agent interacts with the live application. This is where browser automation comes into play, powered by Playwright. The agent behaves like a methodical human user, attempting to navigate through all pages and user flows, click on buttons and interact with UI elements, submit forms with various inputs, and log in and test authenticated functionalities.
While doing this, it meticulously observes and records everything, including network requests, API calls, cookies, and session tokens. This process maps out the application's real-world attack surface.
Phase 4: vulnerability and exploitation
This is the core of the penetration test. Based on everything gathered from the code and the live application, Shannon runs a suite of specialized agents in parallel, each an "expert" in a specific type of vulnerability. The five key pipelines are:
Injection Agent: tests for SQL Injection and Command Injection.
XSS Agent: looks for Cross-Site Scripting vulnerabilities.
Auth Agent: probes for weaknesses in authentication mechanisms.
SSRF Agent: attempts Server-Side Request Forgery attacks.
AuthZ Agent: focuses on authorization bypasses such as IDOR.
Crucially, for each potential vulnerability found, another agent is spawned to try to actively exploit it. This confirmation step is the basis for Shannon's zero-false-positive claim: a finding counts as a confirmed vulnerability only if an exploit against it actually succeeds.
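The fan-out and confirm pattern can be sketched in a few lines of shell. Everything here is a toy: run_agent is a hypothetical stand-in that prints canned results, and the candidate names are invented for illustration, not taken from a real Shannon run.

```shell
# Hypothetical stand-in for an agent: prints "<agent> <candidate> <exploit_result>".
run_agent() {
  case "$1" in
    injection) echo "injection login-form confirmed" ;;
    xss)       echo "xss search-box unconfirmed" ;;
    *)         : ;;   # the other agents find nothing in this toy run
  esac
}

workdir="$(mktemp -d)"

# Start all five pipelines concurrently, one output file per agent.
for agent in injection xss auth ssrf authz; do
  run_agent "$agent" > "$workdir/$agent.out" &
done
wait   # block until every pipeline has finished

# Keep only candidates whose exploitation attempt succeeded.
confirmed="$(cat "$workdir"/*.out | awk '$3 == "confirmed" { print $1 ": " $2 }')"
echo "$confirmed"
```

In this toy run only the injection candidate survives the filter, which mirrors how an unconfirmed candidate never reaches the final report.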
Phase 5: reporting
In the final phase, a dedicated report agent gathers all the confirmed findings from the vulnerability and exploitation agents. It consolidates, structures, and formats this information into a series of comprehensive, human-readable Markdown reports.
The critical role of Temporal
A key technology underpinning Shannon's reliability is Temporal. A full pentest can take many hours to complete. During this time, many things can go wrong—your computer might crash, you could lose internet connectivity, or you might run out of Claude API credits.
Temporal is an open-source, durable execution system. It acts as a workflow orchestrator for Shannon, ensuring that the long-running process is resilient to failures. Temporal remembers the exact state of the workflow at all times (state persistence) and can automatically resume it from the exact point where it left off once issues are resolved (resumption and retries).
This means you don't lose hours of progress and expensive API calls due to an unexpected interruption. You can monitor the workflow's progress in real-time through the Temporal Web UI, which provides a detailed timeline of every task and activity.
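The resume behaviour can be illustrated with a much simpler mechanism than Temporal's. The sketch below records each completed phase in a plain file and skips recorded phases on the next run. It is an analogy only (Temporal persists a workflow event history rather than a text file), and the phase names and run_phase function are hypothetical.

```shell
# Hypothetical placeholder for the real work of a phase.
run_phase() {
  echo "running $1"
}

# Run every phase in order, skipping those already recorded in the state file.
# Usage: run_workflow STATE_FILE
run_workflow() {
  state_file="$1"
  for phase in pre-flight pre-recon recon vuln-exploit report; do
    if grep -qxF -- "$phase" "$state_file" 2>/dev/null; then
      echo "skipping $phase (already completed)"
      continue
    fi
    # Record the phase only after it succeeds, so a crash mid-phase reruns it.
    if run_phase "$phase"; then
      echo "$phase" >> "$state_file"
    else
      return 1
    fi
  done
}
```

Calling run_workflow again with the same state file after an interruption picks up at the first phase that was not recorded.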
Analyzing the comprehensive security reports
Once the pentest is complete, Shannon generates a wealth of detailed reports in the shannon/repos/<your-repo>/deliverables/ directory. These are not just vague warnings; they are actionable intelligence documents.
The comprehensive_security_assessment_report.md provides a high-level summary. It details the scope of the test, lists the total number of critical vulnerabilities found, and categorizes them.
The real power, however, lies in the specific vulnerability reports. Each report includes:
Summary: a high-level description of the vulnerability.
Vulnerable location: the specific API endpoint and HTTP method.
Overview: an explanation of how the vulnerability works.
Impact: the potential damage an attacker could cause.
Severity: a rating of the risk.
Prerequisites: what an attacker would need to exploit this.
Exploitation steps: how to reproduce the exploit.
The exploitation steps section is the most valuable. It provides a step-by-step, playbook-style guide on how to reproduce the exploit. It includes the exact curl commands needed to register users, log in, obtain tokens, and execute the final attack. This level of detail makes it incredibly easy for developers to verify the issue and develop a fix.
Shannon Lite vs. Shannon Pro: features and costs
The version of Shannon demonstrated and available on GitHub is Shannon Lite, which is open-source under the AGPL-3.0 license. There is also a commercial version called Shannon Pro.
While Shannon Lite is incredibly powerful, Shannon Pro offers additional features geared towards enterprise and professional use cases:
Advanced core scanning: LLM-powered data flow analysis for higher precision.
CVSS scoring: standardized scoring for vulnerability severity.
Integration: native CI/CD pipeline support.
Deployment: option for cloud or self-hosted deployment.
Enterprise features: multi-user support, Role-Based Access Control, and SSO/SAML integration.
Compliance reporting: generation of reports for standards like OWASP, PCI-DSS, and SOC 2.
Cost considerations
Running Shannon is not free. The OWASP Juice Shop test consumed almost $60 in Claude API credits. While this is significantly cheaper than hiring a manual tester for even a single day, it's a cost that can add up, especially for individuals or indie developers.
Subsequent runs on the same project are typically faster (and thus cheaper) as Shannon builds upon its previous knowledge. Nevertheless, because the Claude Agent SDK cannot currently be used with a flat-rate Claude subscription, per-run costs remain a primary consideration.
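A quick back-of-the-envelope comparison, using the roughly $60 Juice Shop run mentioned above. The manual day rate and the number of runs per month are hypothetical figures chosen only to show the arithmetic.

```shell
RUN_COST_USD=60            # approximate cost of the Juice Shop run described above
MANUAL_DAY_RATE_USD=2000   # hypothetical day rate for a manual tester
RUNS_PER_MONTH=20          # hypothetical cadence: one run per working day

# Integer division: how many automated runs one manual testing day would pay for.
runs_per_manual_day=$(( MANUAL_DAY_RATE_USD / RUN_COST_USD ))

# Monthly API spend at the assumed cadence.
monthly_cost_usd=$(( RUN_COST_USD * RUNS_PER_MONTH ))

echo "Runs per manual day: $runs_per_manual_day"
echo "Monthly API cost: \$$monthly_cost_usd"
```

The estimate treats every run as costing the same, so it overstates the monthly figure if later runs on the same project turn out cheaper.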
Final thoughts
Shannon marks a major step forward in automated security testing. By combining the reasoning power of LLMs with a robust multi-agent framework, it gives you a way to dramatically reduce the time and cost required to secure your web applications.
You’ve now explored the full Shannon ecosystem, including how to set it up, run a comprehensive penetration test, and interpret its detailed, actionable reports. You’ve also seen its five-phase workflow in action, from initial code analysis to live exploitation, and understood how Temporal ensures the entire process remains reliable and resilient.
While API credit costs may still be a consideration, the long-term value is clear. Shannon enables you to shift security left by embedding deep testing directly into your development cycle. Instead of treating security as a costly, one-time audit, you turn it into a continuous, automated safeguard. As AI and cybersecurity increasingly intersect, tools like Shannon are not just innovative. They are shaping the future of how you build secure software.