# Datadog Bits AI vs Resolve AI

In less than 18 months, AI SRE tools have moved from experimental to something teams rely on in production. The idea is straightforward: **an always-on agent that investigates alerts, finds root causes, and suggests fixes before your on-call engineer even starts digging in**.

The two main approaches to this problem, however, are very different.

**Datadog Bits AI SRE** represents the incumbent approach. It is built directly into Datadog’s observability platform and uses the data Datadog already collects, including metrics, logs, traces, RUM data, and service relationships. Because of this, it has deep, built-in context across your systems.

**Resolve AI**, on the other hand, takes a different path. It is a newer company founded by former Splunk leaders who helped create OpenTelemetry. It raised $125 million at a $1 billion valuation in February 2026. Instead of being tied to one platform, it works as **a vendor-neutral layer that sits on top of your existing observability, infrastructure, and code tools**.

So which one actually makes on-call easier? It comes down to three factors: **where your telemetry data lives**, **how much you are willing to spend per investigation**, and **whether you want your AI SRE tied to a single observability vendor or running independently across your stack**.

## Quick comparison at a glance

| Category | Datadog Bits AI SRE | Resolve AI |
|----------|-------------------|------------|
| **Type** | Platform-native AI SRE (Datadog add-on) | Standalone AI SRE (vendor-neutral) |
| **Data access model** | Native Datadog telemetry | Integrations across multiple observability tools |
| **Telemetry sources** | Metrics, logs, traces, RUM, profiler, network path, source code, dashboards | Logs, traces, metrics via Datadog, Grafana, Splunk, Prometheus, and others |
| **Investigation approach** | Hypothesis-driven with parallel analysis | Multi-agent system with parallel hypotheses |
| **Remediation** | Suggested fixes, triage actions (Slack, Jira, incidents) | PR generation, kubectl commands, code fixes, scripts |
| **Learning** | Improves accuracy from each investigation | Learns from past incidents and runbooks |
| **Pricing model** | Per-investigation ($25-36/investigation) | Enterprise pricing (contact sales) |
| **Compliance** | SOC 2, GDPR, HIPAA, FedRAMP | SOC 2 Type II, GDPR, HIPAA |
| **Vendor lock-in** | Tied to Datadog ecosystem | Works across multiple observability stacks |
| **Founded/launched** | GA December 2025 (Datadog founded 2010) | Seed October 2024, Series A February 2026 |

## How each tool investigates incidents

This is where the real divergence happens. Both tools claim to investigate alerts autonomously, but the mechanics underneath tell very different stories about what "autonomous investigation" actually means in practice.

### Datadog Bits AI SRE: native telemetry advantage

Bits AI SRE activates the moment a Datadog monitor alert fires. It does not wait for a human to trigger an investigation or type a question into a chat box. The agent reads the same telemetry your team would: metrics, logs, traces, dashboards, recent changes, and runbooks. It forms hypotheses about the root cause, tests each one against live data, and produces a structured report with its conclusions, evidence trail, and confidence level.
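The hypothesize-test-report loop described above can be illustrated with a small sketch. This is not Datadog's implementation: the `Hypothesis` and `Finding` types, the signal names, and the thresholds are all hypothetical, chosen only to show how several candidate causes can be checked in parallel and ranked by confidence.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List

Telemetry = Dict[str, float]  # hypothetical flat snapshot of signals


@dataclass
class Hypothesis:
    name: str
    # Returns a score in [0, 1]: how strongly the telemetry supports this cause.
    check: Callable[[Telemetry], float]


@dataclass
class Finding:
    hypothesis: str
    confidence: float


def investigate(telemetry: Telemetry, hypotheses: List[Hypothesis]) -> List[Finding]:
    """Test every hypothesis in parallel and rank the results by confidence."""
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda h: h.check(telemetry), hypotheses))
    findings = [Finding(h.name, s) for h, s in zip(hypotheses, scores)]
    return sorted(findings, key=lambda f: f.confidence, reverse=True)


# Illustrative signals and checks; the thresholds are made up for the example.
snapshot = {"error_rate": 0.12, "minutes_since_deploy": 4.0, "db_p99_ms": 180.0}
candidates = [
    Hypothesis(
        "slow database queries",
        lambda t: 0.8 if t["db_p99_ms"] > 500 else 0.2,
    ),
    Hypothesis(
        "recent deploy regression",
        lambda t: 0.9 if t["minutes_since_deploy"] < 10 and t["error_rate"] > 0.05 else 0.1,
    ),
]
report = investigate(snapshot, candidates)
```

In this toy snapshot the deploy hypothesis ranks first because errors spiked minutes after a release, while database latency stays under its threshold. A real agent would replace each `check` with queries against live metrics, logs, and traces.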

What makes this work is Datadog's data advantage. The agent has direct, native access to everything Datadog collects across your environment. There is no API latency from integrating with external tools, no sampling limitations from third-party connections, and no need to configure which data sources the agent can reach. If Datadog sees it, Bits AI sees it.

The March 2026 update expanded that visibility significantly. Bits AI SRE now analyzes source code, RUM sessions, database monitoring query plans, network path data, and continuous profiler output in addition to the original metrics, logs, traces, and dashboards. That means a single investigation can trace a latency spike from user sessions through backend dependencies, database queries, network paths, and down to specific code paths, all without switching tools.

Bits AI also handles triage actions directly. It can send Slack or Teams messages, create Datadog incidents, page engineers, open cases in Case Management, and generate Jira tickets, all pre-filled with investigation context. For teams that already live inside Datadog, this is genuinely useful.

The catch? Every bit of that value depends on your telemetry living inside Datadog. If half your stack is instrumented with Grafana, Prometheus, or another backend, Bits AI SRE has a blind spot exactly where you need visibility most. Does your entire observability stack run through Datadog, or are you stitching together multiple tools?

![SCREENSHOT: Datadog Bits AI SRE investigation interface](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/e692c066-7432-4fd4-fecc-3cfaf14e6900/orig =710x399)

### Resolve AI: vendor-neutral multi-agent system

Resolve AI takes the opposite approach. Instead of native telemetry access, it connects to your existing tools through integrations: Datadog, Grafana, Splunk, Prometheus, Chronosphere, Kloudfuse, and others on the observability side, plus Kubernetes, AWS, GitHub, and Slack on the infrastructure and collaboration side.

When an alert fires, Resolve AI's multi-agent system kicks in. Specialized agents handle different aspects of the investigation: one correlates alerts across services and filters noise, another plans the investigation with parallel hypotheses, and others gather evidence from code, infrastructure, and telemetry data. The system continuously learns from past incidents and runbooks, building institutional knowledge over time.

The key differentiator is the code-aware investigation. Resolve AI does not just look at telemetry; it also reads your source code and maps it to infrastructure behavior. When it finds a root cause, it can pinpoint the specific PR that introduced a regression, surface the exact method causing failures, and generate remediation pull requests with full context. One customer reported that Resolve AI identified the exact PR that introduced a bug and specified the affected event IDs and categories.

Resolve AI also operates multiple specialized agents beyond incident response. A cost optimization agent analyzes resource allocation across Kubernetes and AWS to find waste, and a production debugging agent helps engineers ask questions about code, services, and infrastructure in natural language.

The tradeoff is integration complexity. Resolve AI is only as good as the data it can access through its integrations. If an integration is missing or poorly configured, the agent has less context to work with. Setup takes longer than Bits AI's zero-configuration approach, and the quality of investigation depends heavily on integration coverage and the quality of the underlying observability data.

![SCREENSHOT: Resolve AI investigation timeline and root cause output](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/4e3821cb-1d27-4ca2-aa46-cd1090058000/lg1x =1500x788)

| Investigation capability | Datadog Bits AI SRE | Resolve AI |
|--------------------------|-------------------|------------|
| **Activation** | Automatic on alert fire | Automatic on alert fire |
| **Data access** | Native (zero-config) | Via integrations (requires setup) |
| **Parallel hypothesis testing** | Yes | Yes |
| **Code-level analysis** | Source code access (since March 2026) | Deep code analysis, PR identification |
| **Remediation output** | Triage actions (Slack, Jira, incidents) | PR generation, kubectl commands, code fixes |
| **Learning** | Improves per investigation | Learns from incidents and runbooks |
| **Time to first investigation** | Immediate (if on Datadog) | After integration setup |
| **Cross-platform visibility** | Datadog ecosystem only | Multi-vendor observability stacks |

## Integration and data access

How much of your production environment can the AI SRE actually see? This question matters more than any feature comparison, because an AI SRE that cannot access the right signals will produce confident-sounding but wrong conclusions.

### Datadog Bits AI SRE

Bits AI SRE has seamless access to the entire Datadog product suite. That includes over 750 integrations that feed data into Datadog, plus all the derived context: service maps, dependency graphs, deployment tracking, anomaly baselines, and ML-powered Watchdog alerts.

The depth of data access is hard to overstate. After the March 2026 update, Bits AI can correlate signals across metrics, logs, traces, source code, RUM sessions, database monitoring, network paths, and continuous profiler data. For organizations that have fully instrumented their stack in Datadog, this represents the most complete data picture any AI SRE can work with.

The limitation is equally hard to overstate: if telemetry does not flow through Datadog, it does not exist for Bits AI. In practice, many organizations run hybrid observability stacks. Maybe infrastructure metrics live in Datadog but application traces run through Grafana, or logs go to Splunk for compliance reasons. In those environments, Bits AI has partial visibility, and partial visibility during a multi-service incident is arguably worse than no AI assistance at all. A half-informed agent that delivers a confident but incomplete analysis can send your team chasing the wrong root cause.

### Resolve AI

Resolve AI takes read-only access to the minimum data it needs across your existing tools. The integration library includes Grafana, Datadog, Splunk, Prometheus, Chronosphere, Kloudfuse on the telemetry side, Kubernetes and AWS on infrastructure, GitHub for code, Slack and Notion for collaboration and knowledge, plus support for custom tooling via MCP, APIs, and webhooks.

The vendor-neutral approach means Resolve AI can investigate across whatever combination of tools your team actually uses. If your metrics are in Prometheus, logs in Datadog, and traces in Grafana, Resolve AI can correlate across all three. It also has access to code repositories, which gives it an investigation dimension that platform-locked tools cannot easily replicate.

Data security is handled through a satellite architecture. The Resolve AI satellite acts as a gateway, giving you control over what data gets accessed, how frequently metadata is scraped, and which roles, environments, or namespaces are visible.

The downside is integration maintenance. Each connection requires configuration, authentication, and ongoing upkeep. If an API token expires or a tool changes its API, the integration breaks. And the investigation quality is fundamentally constrained by what the integrations can access. Native platform access (like Bits AI has within Datadog) will generally be faster and deeper than API-mediated access.

Are you running a single-vendor observability stack, or do you have telemetry scattered across multiple platforms? That single question probably determines which tool fits better.
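One rough way to answer that question is to inventory where each signal lives and compute how much of it a given tool could reach. The sketch below is illustrative only: the service names, the backend assignments, and the `coverage` helper are hypothetical, and a real audit would also weigh signal importance rather than counting every signal equally.

```python
from typing import Dict, Set

# Hypothetical inventory: which backend holds each signal type for each service.
SIGNAL_BACKENDS: Dict[str, Dict[str, str]] = {
    "checkout": {"metrics": "datadog", "logs": "splunk", "traces": "datadog"},
    "search": {"metrics": "prometheus", "logs": "datadog", "traces": "grafana"},
}


def coverage(inventory: Dict[str, Dict[str, str]], reachable: Set[str]) -> float:
    """Fraction of (service, signal) pairs stored in a backend the tool can read."""
    backends = [b for signals in inventory.values() for b in signals.values()]
    if not backends:
        return 0.0
    visible = sum(1 for b in backends if b in reachable)
    return visible / len(backends)


# A platform-native agent sees only its own backend; a vendor-neutral one
# sees whatever integrations have been configured.
native_view = coverage(SIGNAL_BACKENDS, {"datadog"})
neutral_view = coverage(SIGNAL_BACKENDS, {"datadog", "splunk", "prometheus", "grafana"})
```

With this made-up inventory, the Datadog-only view covers half of the signals and the fully integrated view covers all of them. If your own number for the native view is close to 1.0, the integration overhead of a vendor-neutral tool buys you little.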

| Integration aspect | Datadog Bits AI SRE | Resolve AI |
|-------------------|-------------------|------------|
| **Observability tools** | Datadog only (750+ integrations feeding into Datadog) | Datadog, Grafana, Splunk, Prometheus, Chronosphere, Kloudfuse |
| **Infrastructure** | AWS, Azure, GCP (via Datadog) | Kubernetes, AWS, custom tools |
| **Code repositories** | Source code access (March 2026+) | GitHub (deep code analysis) |
| **Collaboration** | Slack, Teams, Jira, ServiceNow, GitHub | Slack, Notion, Jira |
| **Custom tooling** | Datadog MCP server, API | MCP, APIs, webhooks |
| **Setup complexity** | Zero (if already on Datadog) | Moderate (integration-by-integration) |
| **Data access speed** | Native (fastest possible) | API-mediated (integration-dependent) |

[summary]
## 🚀 An AI SRE that works with your entire stack, not against your budget

[Better Stack](https://betterstack.com/) provides an AI-powered SRE that activates autonomously during incidents, analyzing your service map, querying logs, reviewing recent deployments, and suggesting root causes, all within a unified platform that includes logs, metrics, traces, incident management, and on-call scheduling. The MCP server connects Claude, Cursor, and other AI assistants directly to your observability data for natural language querying.

<iframe width="100%" height="315" src="https://www.youtube.com/embed/n6TtDk8ITgc" title="AI SRE Demo" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

[Start monitoring for free](https://betterstack.com/users/sign-up).
[/summary]

## Remediation and automated fixes

Finding the root cause is only half the battle. What the AI SRE does with that information determines whether it is genuinely useful or just a faster version of reading dashboards.

### Datadog Bits AI SRE

Bits AI SRE focuses on triage and coordination rather than direct code-level remediation. After completing an investigation, it can execute a set of triage actions: sending messages to Slack and Teams, creating incidents in Datadog Incident Response, paging engineers, creating cases in Case Management, and generating Jira tickets. All of these actions come pre-filled with investigation context including affected services, suspected root causes, relevant dashboards, and supporting telemetry.

The newer Workflow Automation integration lets Bits AI SRE trigger automated workflows from its investigation context. Three actions are available in the Datadog Action Catalog: Trigger Investigation, Get Investigation, and List Investigation. This opens the door to chaining Bits AI investigations with automated remediation scripts, though the remediation scripts themselves must be pre-built by your team.

Bits AI also includes the Bits AI Dev Agent as a separate product, which can open PRs for code fixes. However, this is a distinct agent from the SRE product and carries its own pricing.

The approach is deliberate. High-stakes remediation actions like database rollbacks or infrastructure changes still require human approval. Bits AI provides the analysis and context to help humans make faster decisions rather than making those decisions autonomously.
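To make the pattern concrete, here is a minimal sketch of pre-built remediation gated by human approval. It is plain Python and does not use any Datadog API: the `Remediation` type, the `plan_action` function, and the playbook entries are hypothetical stand-ins for scripts your team would write and wire into its own workflow tooling.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Remediation:
    run: Callable[[], str]  # pre-built script, written by your team
    high_stakes: bool       # True means a human must approve first


def plan_action(
    root_cause: str,
    playbook: Dict[str, Remediation],
    approved: bool = False,
) -> Optional[str]:
    """Run the matching pre-built remediation, gating risky ones on approval."""
    action = playbook.get(root_cause)
    if action is None:
        return None  # no script for this cause: humans take over
    if action.high_stakes and not approved:
        return "awaiting human approval"
    return action.run()


# Hypothetical playbook keyed by a root-cause label from an investigation report.
PLAYBOOK = {
    "stale cache": Remediation(run=lambda: "cache flushed", high_stakes=False),
    "bad migration": Remediation(run=lambda: "database rolled back", high_stakes=True),
}
```

Calling `plan_action("bad migration", PLAYBOOK)` returns the approval prompt instead of running the rollback, and the script only executes once a caller passes `approved=True`. Low-risk entries such as the cache flush run immediately.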

### Resolve AI

Resolve AI leans further into autonomous remediation. The system generates Git PRs, kubectl commands, code fixes, and scripts that are tailored to your specific setup. When it identifies a root cause, it does not just tell you what went wrong; it suggests how to fix it and can draft the actual fix.

The PR generation is particularly interesting for teams that want their AI SRE to go beyond investigation. Resolve AI creates pull requests with full context, including the root cause analysis, affected code paths, and the specific change that should resolve the issue. The human engineer reviews and merges, but the investigative and drafting work is done.

Automatic post-mortem generation is another differentiator. After an incident resolves, the system documents what happened, what was investigated, what the root cause was, and what was done to fix it, reducing the manual toil of writing post-incident reviews.

For cost optimization, a separate Resolve AI agent analyzes resource allocation across Kubernetes and AWS. It compares actual CPU and memory usage against configured requests and limits, identifies over-provisioned pods, finds idle infrastructure, and surfaces storage hotspots, all as actionable recommendations rather than just charts.
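The usage-versus-request comparison is simple to sketch. The example below is not Resolve AI's logic: the `PodUsage` fields, the sample numbers, and the 50% threshold are assumptions made for illustration, and a real analysis would look at usage over time rather than a single reading.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PodUsage:
    name: str
    cpu_request_millicores: int
    cpu_used_millicores: int
    mem_request_mib: int
    mem_used_mib: int


def over_provisioned(pods: List[PodUsage], threshold: float = 0.5) -> List[str]:
    """Names of pods using less than `threshold` of both CPU and memory requests."""
    flagged = []
    for pod in pods:
        if pod.cpu_request_millicores <= 0 or pod.mem_request_mib <= 0:
            continue  # no requests configured, so there is nothing to compare against
        cpu_ratio = pod.cpu_used_millicores / pod.cpu_request_millicores
        mem_ratio = pod.mem_used_mib / pod.mem_request_mib
        if cpu_ratio < threshold and mem_ratio < threshold:
            flagged.append(pod.name)
    return flagged


# Hypothetical readings: "api" uses about a fifth of what it requests.
pods = [
    PodUsage("api", 1000, 200, 2048, 512),
    PodUsage("worker", 500, 450, 1024, 900),
]
candidates = over_provisioned(pods)
```

Here only `api` is flagged, since `worker` runs close to its requests. Requiring both ratios to fall below the threshold is a conservative choice that avoids flagging pods that are CPU-light but memory-bound.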

| Remediation capability | Datadog Bits AI SRE | Resolve AI |
|----------------------|-------------------|------------|
| **Triage actions** | Slack, Teams, Jira, incidents, paging | Slack notifications, ticket updates |
| **Code fix generation** | Via separate Bits AI Dev Agent | Built-in PR generation with context |
| **kubectl/infrastructure** | Via Workflow Automation (pre-built scripts) | Native kubectl commands and scripts |
| **Post-mortems** | Manual (with investigation context) | Automatic generation |
| **Cost optimization** | Separate Cloud Cost Management product | Built-in cost optimization agent |
| **Human-in-the-loop** | Required for high-stakes actions | Required for PR merges and major changes |

## Pricing comparison

Pricing is where these two products diverge sharply, and where the decision gets complicated for teams trying to budget for AI SRE capabilities.

### Datadog Bits AI SRE

Datadog publishes transparent pricing for Bits AI SRE investigations. The structure is:

- **Annual plan:** $500 per 20 investigations per month (effectively $25/investigation)
- **Monthly plan:** $600 per 20 investigations per month ($30/investigation)
- **On-demand:** $36 per investigation

Only conclusive investigations are billable. If an investigation ends inconclusively or does not complete, you are not charged. Volume discounts are available for larger commitments.

The per-investigation model is straightforward but creates an important constraint: the cost scales directly with your alert volume. Teams with noisy alerting configurations could burn through investigation budgets quickly. Assuming every alert triggers a billable investigation, a team processing 100 alerts per month at the annual rate pays $2,500/month, while a team with 500 alerts (not uncommon in large microservice environments) pays $12,500/month.

There is also the base Datadog platform cost to consider. Bits AI SRE is an add-on. You need Datadog infrastructure monitoring, APM, log management, or other products running first, and those carry their own per-host, per-feature pricing. The total cost of Bits AI SRE is the investigation fee plus whatever you already pay for Datadog.

How many alerts does your team process monthly? Multiply that by $25-36 and you have your Bits AI SRE budget, before the base Datadog platform cost.
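That multiplication can be written as a small estimator. The prices come from the list above; everything else is an assumption for illustration: that committed plans are billed in whole blocks of 20, that every alert starts an investigation, and that `conclusive_rate` (a hypothetical parameter) is the share of investigations that end conclusively and are therefore billable.

```python
import math

BLOCK_SIZE = 20                              # investigations per committed block
BLOCK_PRICE = {"annual": 500, "monthly": 600}  # USD per block per month
ON_DEMAND_RATE = 36                          # USD per investigation


def monthly_cost(alerts: int, plan: str, conclusive_rate: float = 1.0) -> int:
    """Estimated monthly Bits AI SRE investigation fee in USD."""
    billable = round(alerts * conclusive_rate)
    if plan == "on_demand":
        return billable * ON_DEMAND_RATE
    # Assumption: committed plans are sold in whole blocks, so round up.
    blocks = math.ceil(billable / BLOCK_SIZE)
    return blocks * BLOCK_PRICE[plan]


estimates = {
    "100 alerts, annual": monthly_cost(100, "annual"),
    "500 alerts, annual": monthly_cost(500, "annual"),
    "100 alerts, on-demand": monthly_cost(100, "on_demand"),
    "100 alerts, annual, half conclusive": monthly_cost(100, "annual", 0.5),
}
```

Under these assumptions the annual plan works out to $2,500 for 100 alerts and $12,500 for 500, while on-demand billing for the same 100 alerts comes to $3,600. None of these figures include the base Datadog platform cost.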

### Resolve AI

Resolve AI does not publish pricing. The pricing page is a contact form that routes to enterprise sales. Given the company's $1B valuation and roughly $4M ARR (the ARR figure as reported by TechCrunch in December 2025), the customer base is relatively small and likely concentrated in enterprise accounts.

Third-party AI SRE comparisons suggest that Resolve AI sits in the enterprise pricing tier. The product is available through the AWS Marketplace, which suggests some customers use existing AWS spend commitments to fund it.

The lack of public pricing makes direct cost comparison impossible. What we can say is that the pricing model appears to be enterprise contract-based rather than per-investigation or usage-based. For teams evaluating both tools, the only way to get Resolve AI pricing is through a sales conversation.

| Pricing aspect | Datadog Bits AI SRE | Resolve AI |
|---------------|-------------------|------------|
| **Public pricing** | Yes | No (contact sales) |
| **Model** | Per-investigation | Enterprise contracts |
| **Annual cost (20 investigations/month, annual plan)** | $6,000/year | Not disclosed |
| **Annual cost (100 investigations/month, annual plan)** | $30,000/year | Not disclosed |
| **Base platform requirement** | Datadog subscription (additional cost) | None (works with existing tools) |
| **Free trial** | 14-day Datadog trial | Demo-based evaluation |
| **Volume discounts** | Available | Likely (enterprise negotiation) |
| **Available on marketplace** | Datadog billing | AWS Marketplace |

## Customer evidence and traction

What are actual engineering teams saying about these tools in production?

### Datadog Bits AI SRE

Datadog reports that over 2,000 customer environments tested Bits AI SRE before its December 2025 GA launch. The customer testimonials on the product page are specific and named: iFood's SRE team reported a 70% reduction in MTTR, engineers at Uber Freight highlighted noise reduction and correlation capabilities, and Energisa's systems engineer noted root causes delivered in under four minutes.

The internal benchmark data is also notable. Datadog published a detailed engineering blog about how they evaluate Bits AI SRE against real-world incidents, and the March 2026 update claims the newer version is approximately twice as fast and more accurate on internal benchmarks. The product claims up to 95% reduction in time to resolution.

Datadog's scale as a publicly traded company (NASDAQ: DDOG) gives the product a level of support infrastructure and product continuity that startups cannot yet match. Thousands of organizations use it in production, spanning global enterprises and fast-growing startups across diverse production environments.

### Resolve AI

Resolve AI's customer base is smaller but includes recognizable names. DoorDash is the flagship case study: the company's Senior Director of Engineering reported 87% faster incident investigations and noted that fewer engineers get pulled into war rooms. A Coinbase engineer reported that Resolve AI surfaced accurate root causes 73% faster than their teams. A senior director at Uni reported 2x productivity improvement.

The founder credentials are strong. CEO Spiros Xanthos co-created OpenTelemetry and previously led Splunk's Observability business. The $125M Series A led by Lightspeed Venture Partners, with existing investors Greylock and Unusual Ventures, indicates serious institutional backing.

However, the company's reported ARR of approximately $4M (as of December 2025) means the production footprint is still relatively limited compared to Datadog's thousands of deployed environments. That is not necessarily a problem for the right buyer, but it does mean less battle-testing across diverse architectures and failure modes.

| Traction metric | Datadog Bits AI SRE | Resolve AI |
|----------------|-------------------|------------|
| **Customer environments** | 2,000+ (pre-GA testing) | Not disclosed (early enterprise accounts) |
| **Named customers** | iFood, Uber Freight, Energisa, Kyndryl, Nulab, Cordada | DoorDash, Coinbase, Uni, Blueground, DataStax |
| **Reported improvement** | Up to 95% reduction in time to resolution (vendor claim); 70% MTTR reduction (iFood) | 87% faster incident investigations (DoorDash) |
| **Company maturity** | Public company (NASDAQ: DDOG) | Series A startup ($1B valuation) |
| **Funding** | N/A (publicly traded) | $160M total ($35M seed + $125M Series A) |

[summary]
## 🚀 AI incident investigation without the per-investigation bill

[Better Stack](https://betterstack.com/) includes AI-powered incident management with autonomous investigation, on-call scheduling, unlimited phone/SMS alerts, and Slack-native collaboration at $29/month per responder. No per-investigation charges, no separate add-ons for on-call, and no need to bolt on PagerDuty.

<iframe width="100%" height="315" src="https://www.youtube.com/embed/l2eLPEdvRDw" title="Incident Management Overview" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

[Start monitoring for free](https://betterstack.com/users/sign-up).
[/summary]

## Vendor lock-in and portability

This is the section that should get the most weight in your evaluation, because AI SRE tools are the kind of investment where switching costs compound quickly. The more your team relies on an AI SRE for incident response, the harder it becomes to change direction later.

### Datadog Bits AI SRE

Bits AI SRE deepens your dependency on the Datadog ecosystem. The agent works exclusively with Datadog telemetry, produces insights within Datadog's interface, and its triage actions route through Datadog's incident management and workflow automation. The more value you extract from Bits AI, the more reasons you have to keep your observability stack consolidated in Datadog.

For teams already committed to Datadog, this is not a problem. It is a feature. Tighter integration means better performance, less configuration, and more seamless workflows. But for teams evaluating observability strategy over a multi-year horizon, the deepening lock-in should be weighed against Datadog's well-documented pricing complexity and the potential for costs to compound as you add more products.

Would you feel comfortable migrating away from Datadog in two years if your AI SRE workflows, investigation history, and automated triage actions are all built on Bits AI?

### Resolve AI

Resolve AI's vendor-neutral design explicitly avoids lock-in to any single observability platform. If you switch from Datadog to Grafana, or add New Relic alongside Splunk, Resolve AI adds or swaps integrations rather than requiring a platform migration.

The OpenTelemetry heritage of the founding team (they co-created the project) shows in the product philosophy. Your instrumentation, telemetry, and investigation history are not trapped inside a single vendor's ecosystem.

The counterpoint is that Resolve AI introduces its own form of lock-in. As the system learns from your incidents, builds institutional knowledge, and integrates with your workflows, the switching cost shifts from "observability platform dependency" to "AI SRE dependency." Moving from Resolve AI to another AI SRE tool means losing that accumulated investigation context.

| Lock-in dimension | Datadog Bits AI SRE | Resolve AI |
|-------------------|-------------------|------------|
| **Platform dependency** | Full (Datadog ecosystem) | Minimal (multi-vendor) |
| **Instrumentation portability** | Proprietary agents preferred | OpenTelemetry philosophy |
| **Investigation history** | Stored in Datadog | Stored in Resolve AI |
| **Workflow portability** | Datadog-specific automations | Integration-based (more portable) |
| **Switching cost trajectory** | Increases with Datadog adoption | Increases with accumulated learning |


## Final thoughts

The AI SRE market in 2026 is splitting along a clear fault line. On one side, **observability platforms like Datadog are embedding AI capabilities natively, trading vendor neutrality for depth of integration**. On the other, standalone AI SRE startups like Resolve AI are trading native data access for cross-platform flexibility. Neither approach is wrong. They solve different versions of the same problem.

Datadog Bits AI SRE is the strongest choice for teams that have already consolidated their observability stack in Datadog and **want the fastest, deepest AI investigation capability without additional integration work**. The native data access, zero-configuration setup, and tight platform integration make it genuinely powerful within its ecosystem. The per-investigation pricing is transparent and predictable for teams with controlled alert volumes.

Resolve AI is the stronger choice for enterprise teams **running heterogeneous observability stacks who need cross-platform investigation, code-level remediation, and vendor independence**. The multi-agent architecture, PR generation, and cost optimization capabilities go further into the remediation workflow than Bits AI currently does. But the enterprise-only pricing, integration complexity, and early-stage customer base are real considerations.

The question neither tool fully answers yet is this: should your AI SRE be the same company that sells you observability, or should it be an independent layer? Datadog says integration wins. Resolve AI says independence wins. The right answer depends on your stack, your budget, and how much you trust any single vendor with both your data and your incident response.

If both options feel like they come with compromises you would rather avoid, **[Better Stack](https://betterstack.com/) offers a unified approach: AI-powered incident investigation built into a full observability platform** (logs, metrics, traces, error tracking, incident management, on-call) with volume-based pricing, an MCP server for AI assistant integration, and no per-investigation billing. [Start monitoring for free](https://betterstack.com/users/sign-up).