AI SRE tools have gone from experimental to something teams rely on in production in less than 18 months. The idea is straightforward: an always-on agent investigates alerts, finds root causes, and suggests fixes before your on-call engineer even starts digging in.
That said, the two main approaches to solving this problem are very different.
Datadog Bits AI SRE represents the incumbent approach. It is built directly into Datadog’s observability platform and uses the data Datadog already collects, including metrics, logs, traces, RUM data, and service relationships. Because of this, it has deep, built-in context across your systems.
Resolve AI, on the other hand, takes a different path. It is a newer company founded by former Splunk leaders who helped create OpenTelemetry. It raised $125 million at a $1 billion valuation in February 2026. Instead of being tied to one platform, it works as a vendor-neutral layer that sits on top of your existing observability, infrastructure, and code tools.
So which one actually makes on-call easier? It comes down to three factors: where your telemetry data lives, how much you are willing to spend per investigation, and whether you want your AI SRE tied to a single observability vendor or running independently across your stack.
How each tool investigates an alert is where the real divergence happens. Both tools claim to investigate alerts autonomously, but the mechanics underneath tell very different stories about what "autonomous investigation" actually means in practice.
Datadog Bits AI SRE: native telemetry advantage
Bits AI SRE activates the moment a Datadog monitor alert fires. It does not wait for a human to trigger an investigation or type a question into a chat box. The agent reads the same telemetry your team would: metrics, logs, traces, dashboards, recent changes, and runbooks. It forms hypotheses about the root cause, tests each one against live data, and produces a structured report with its conclusions, evidence trail, and confidence level.
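To make that hypothesize, test, report loop concrete, here is a rough sketch in Python. It is not Datadog's implementation: the `Hypothesis` and `Finding` types, the check functions, the thresholds, the confidence scores, and the sample telemetry values are all hypothetical and exist only to illustrate the shape of the workflow.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

# Hypothetical telemetry snapshot at alert time (illustrative values only).
TELEMETRY = {
    "error_rate": 0.12,           # fraction of failed requests
    "minutes_since_deploy": 7,
    "db_cpu_percent": 35,
}

@dataclass
class Finding:
    hypothesis: str
    supported: bool
    evidence: str
    confidence: float  # 0.0 to 1.0, made-up scoring for illustration

@dataclass
class Hypothesis:
    name: str
    check: Callable[[dict], Finding]

def recent_deploy(t: dict) -> Finding:
    # Supported if errors are elevated shortly after a deployment.
    hit = t["minutes_since_deploy"] < 15 and t["error_rate"] > 0.05
    evidence = (f"deploy {t['minutes_since_deploy']} min ago, "
                f"error rate {t['error_rate']:.0%}")
    return Finding("recent deploy regression", hit, evidence, 0.8 if hit else 0.1)

def db_saturation(t: dict) -> Finding:
    # Supported if the database CPU is close to saturation.
    hit = t["db_cpu_percent"] > 85
    evidence = f"db CPU at {t['db_cpu_percent']}%"
    return Finding("database saturation", hit, evidence, 0.7 if hit else 0.1)

def investigate(hypotheses: list[Hypothesis], telemetry: dict) -> list[Finding]:
    # Test every hypothesis in parallel, then rank findings by confidence.
    with ThreadPoolExecutor() as pool:
        findings = list(pool.map(lambda h: h.check(telemetry), hypotheses))
    return sorted(findings, key=lambda f: f.confidence, reverse=True)

report = investigate(
    [Hypothesis("deploy", recent_deploy), Hypothesis("db", db_saturation)],
    TELEMETRY,
)
for finding in report:
    status = "supported" if finding.supported else "ruled out"
    print(f"{finding.hypothesis}: {status} ({finding.evidence})")
```

The point of the sketch is the structure: every hypothesis carries its own evidence, so the final report shows what was ruled out as well as what was confirmed.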
What makes this work is Datadog's data advantage. The agent has direct, native access to everything Datadog collects across your environment. There is no API latency from integrating with external tools, no sampling limitations from third-party connections, and no need to configure which data sources the agent can reach. If Datadog sees it, Bits AI sees it.
The March 2026 update expanded that visibility significantly. Bits AI SRE now analyzes source code, RUM sessions, database monitoring query plans, network path data, and continuous profiler output in addition to the original metrics, logs, traces, and dashboards. That means a single investigation can trace a latency spike from user sessions through backend dependencies, database queries, network paths, and down to specific code paths, all without switching tools.
Bits AI also handles triage actions directly. It can send Slack or Teams messages, create Datadog incidents, page engineers, open cases in Case Management, and generate Jira tickets, all pre-filled with investigation context. For teams that already live inside Datadog, this is genuinely useful.
The catch? Every bit of that value depends on your telemetry living inside Datadog. If half your stack is instrumented with Grafana, Prometheus, or another backend, Bits AI SRE has a blind spot exactly where you need visibility most. Does your entire observability stack run through Datadog, or are you stitching together multiple tools?
Resolve AI: vendor-neutral multi-agent system
Resolve AI takes the opposite approach. Instead of native telemetry access, it connects to your existing tools through integrations: Datadog, Grafana, Splunk, Prometheus, Chronosphere, Kloudfuse, and others on the observability side, plus Kubernetes, AWS, GitHub, and Slack on the infrastructure and collaboration side.
When an alert fires, Resolve AI's multi-agent system kicks in. Specialized agents handle different aspects of the investigation: one correlates alerts across services and filters noise, another plans the investigation with parallel hypotheses, and others gather evidence from code, infrastructure, and telemetry data. The system continuously learns from past incidents and runbooks, building institutional knowledge over time.
The key differentiator is the code-aware investigation. Resolve AI does not just look at telemetry; it also reads your source code and maps it to infrastructure behavior. When it finds a root cause, it can pinpoint the specific PR that introduced a regression, surface the exact method causing failures, and generate remediation pull requests with full context. One customer reported that Resolve AI identified the exact PR that introduced a bug and specified the affected event IDs and categories.
Resolve AI also operates multiple specialized agents beyond incident response. A cost optimization agent analyzes resource allocation across Kubernetes and AWS to find waste, and a production debugging agent helps engineers ask questions about code, services, and infrastructure in natural language.
The tradeoff is integration complexity. Resolve AI is only as good as the data it can access through its integrations. If an integration is missing or poorly configured, the agent has less context to work with. Setup takes longer than Bits AI's zero-configuration approach, and the quality of investigation depends heavily on integration coverage and the quality of the underlying observability data.
| Investigation capability | Datadog Bits AI SRE | Resolve AI |
| --- | --- | --- |
| Activation | Automatic on alert fire | Automatic on alert fire |
| Data access | Native (zero-config) | Via integrations (requires setup) |
| Parallel hypothesis testing | Yes | Yes |
| Code-level analysis | Source code access (since March 2026) | Deep code analysis, PR identification |
| Remediation output | Triage actions (Slack, Jira, incidents) | PR generation, kubectl commands, code fixes |
| Learning | Improves per investigation | Learns from incidents and runbooks |
| Time to first investigation | Immediate (if on Datadog) | After integration setup |
| Cross-platform visibility | Datadog ecosystem only | Multi-vendor observability stacks |
Integration and data access
How much of your production environment can the AI SRE actually see? This question matters more than any feature comparison, because an AI SRE that cannot access the right signals will produce confident-sounding but wrong conclusions.
Datadog Bits AI SRE
Bits AI SRE has seamless access to the entire Datadog product suite. That includes over 750 integrations that feed data into Datadog, plus all the derived context: service maps, dependency graphs, deployment tracking, anomaly baselines, and ML-powered Watchdog alerts.
The depth of data access is hard to overstate. After the March 2026 update, Bits AI can correlate signals across metrics, logs, traces, source code, RUM sessions, database monitoring, network paths, and continuous profiler data. For organizations that have fully instrumented their stack in Datadog, this represents the most complete data picture any AI SRE can work with.
The limitation is equally hard to overstate: if telemetry does not flow through Datadog, it does not exist for Bits AI. In practice, many organizations run hybrid observability stacks. Maybe infrastructure metrics live in Datadog but application traces run through Grafana, or logs go to Splunk for compliance reasons. In those environments, Bits AI has partial visibility, and partial visibility during a multi-service incident is arguably worse than no AI assistance at all. A half-informed agent that delivers a confident but incomplete analysis can send your team chasing the wrong root cause.
Resolve AI
Resolve AI takes read-only access to the minimum data it needs across your existing tools. The integration library includes Grafana, Datadog, Splunk, Prometheus, Chronosphere, Kloudfuse on the telemetry side, Kubernetes and AWS on infrastructure, GitHub for code, Slack and Notion for collaboration and knowledge, plus support for custom tooling via MCP, APIs, and webhooks.
The vendor-neutral approach means Resolve AI can investigate across whatever combination of tools your team actually uses. If your metrics are in Prometheus, logs in Datadog, and traces in Grafana, Resolve AI can correlate across all three. It also has access to code repositories, which gives it an investigation dimension that platform-locked tools cannot easily replicate.
Data security is handled through a satellite architecture. The Resolve AI satellite acts as a gateway, giving you control over what data gets accessed, how frequently metadata is scraped, and which roles, environments, or namespaces are visible.
The downside is integration maintenance. Each connection requires configuration, authentication, and ongoing upkeep. If an API token expires or a tool changes its API, the integration breaks. And the investigation quality is fundamentally constrained by what the integrations can access. Native platform access (like Bits AI has within Datadog) is generally faster and deeper than API-mediated access.
Are you running a single-vendor observability stack, or do you have telemetry scattered across multiple platforms? That single question probably determines which tool fits better.
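A quick way to answer that question is to write down where each signal type lives and check what each tool could reach. The sketch below is illustrative only: the `SIGNAL_BACKENDS` inventory is a hypothetical example (it mirrors the Prometheus, Datadog, and Grafana split mentioned earlier), and the integration set simply restates the observability backends this article lists for Resolve AI, not an official or complete catalog.

```python
# Hypothetical inventory: which backend holds each signal type in your stack.
SIGNAL_BACKENDS = {
    "metrics": "prometheus",
    "logs": "datadog",
    "traces": "grafana",
}

# Observability backends named in this article as Resolve AI integrations.
RESOLVE_AI_INTEGRATIONS = {
    "datadog", "grafana", "splunk", "prometheus", "chronosphere", "kloudfuse",
}

def visible_signals(signal_backends: dict, reachable_backends: set) -> list:
    """Return the signal types whose backend the AI SRE can reach."""
    return sorted(
        signal for signal, backend in signal_backends.items()
        if backend in reachable_backends
    )

# Bits AI SRE only sees telemetry that lives in Datadog.
bits_ai_view = visible_signals(SIGNAL_BACKENDS, {"datadog"})
# Resolve AI sees whatever its configured integrations can reach.
resolve_ai_view = visible_signals(SIGNAL_BACKENDS, RESOLVE_AI_INTEGRATIONS)

print("Bits AI SRE sees:", bits_ai_view)
print("Resolve AI sees:", resolve_ai_view)
```

For this hybrid stack, Bits AI SRE would see only the logs, while Resolve AI could reach all three signal types once each integration is configured.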
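| Integration aspect | Datadog Bits AI SRE | Resolve AI |
| --- | --- | --- |
| Observability tools | Datadog only (750+ integrations feeding into Datadog) | Datadog, Grafana, Splunk, Prometheus, Chronosphere, Kloudfuse, and others |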
Finding the root cause is only half the battle. What the AI SRE does with that information determines whether it is genuinely useful or just a faster version of reading dashboards.
Datadog Bits AI SRE
Bits AI SRE focuses on triage and coordination rather than direct code-level remediation. After completing an investigation, it can execute several triage actions: sending messages to Slack and Teams, creating incidents in Datadog Incident Response, paging engineers, creating cases in Case Management, and generating Jira tickets. All of these actions come pre-filled with investigation context, including affected services, suspected root causes, relevant dashboards, and supporting telemetry.
The newer Workflow Automation integration lets Bits AI SRE trigger automated workflows from its investigation context. Three actions are available in the Datadog Action Catalog: Trigger Investigation, Get Investigation, and List Investigation. This opens the door to chaining Bits AI investigations with automated remediation scripts, though the remediation scripts themselves must be pre-built by your team.
Bits AI also includes the Bits AI Dev Agent as a separate product, which can open PRs for code fixes. However, this is a distinct agent from the SRE product and carries its own pricing.
The approach is deliberate. High-stakes remediation actions like database rollbacks or infrastructure changes still require human approval. Bits AI provides the analysis and context to help humans make faster decisions rather than making those decisions autonomously.
Resolve AI
Resolve AI leans further into autonomous remediation. The system generates Git PRs, kubectl commands, code fixes, and scripts that are tailored to your specific setup. When it identifies a root cause, it does not just tell you what went wrong; it suggests how to fix it and can draft the actual fix.
The PR generation is particularly interesting for teams that want their AI SRE to go beyond investigation. Resolve AI creates pull requests with full context, including the root cause analysis, affected code paths, and the specific change that should resolve the issue. The human engineer reviews and merges, but the investigative and drafting work is done.
Automatic post-mortem generation is another differentiator. After an incident resolves, the system documents what happened, what was investigated, what the root cause was, and what was done to fix it, reducing the manual toil of writing post-incident reviews.
For cost optimization, a separate Resolve AI agent analyzes resource allocation across Kubernetes and AWS. It compares actual CPU and memory usage against configured requests and limits, identifies over-provisioned pods, finds idle infrastructure, and surfaces storage hotspots, all as actionable recommendations rather than just charts.
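The core comparison behind that kind of recommendation, actual usage against configured requests, is simple to sketch. The example below is not Resolve AI's method: the `PodUsage` type, the 30% utilization threshold, and the sample numbers are assumptions chosen for illustration, and it looks at CPU only.

```python
from dataclasses import dataclass

@dataclass
class PodUsage:
    name: str
    cpu_request_millicores: int  # configured CPU request
    cpu_used_millicores: int     # observed usage, e.g. a p95 over some window

def over_provisioned(pods: list, max_utilization: float = 0.3) -> list:
    """Flag pods using less than max_utilization of their CPU request."""
    flagged = []
    for pod in pods:
        if pod.cpu_request_millicores <= 0:
            continue  # no request configured, nothing to compare against
        utilization = pod.cpu_used_millicores / pod.cpu_request_millicores
        if utilization < max_utilization:
            flagged.append((pod.name, round(utilization, 2)))
    return flagged

# Hypothetical sample data.
PODS = [
    PodUsage("checkout", cpu_request_millicores=1000, cpu_used_millicores=150),
    PodUsage("search", cpu_request_millicores=500, cpu_used_millicores=400),
    PodUsage("reports", cpu_request_millicores=2000, cpu_used_millicores=200),
]

flagged = over_provisioned(PODS)
for name, utilization in flagged:
    print(f"{name}: using {utilization:.0%} of requested CPU")
```

A real agent would also weigh memory, limits, traffic seasonality, and headroom for spikes before recommending a change, which is where the investigation context matters more than the arithmetic.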
| Remediation capability | Datadog Bits AI SRE | Resolve AI |
| --- | --- | --- |
| Triage actions | Slack, Teams, Jira, incidents, paging | Slack notifications, ticket updates |
| Code fix generation | Via separate Bits AI Dev Agent | Built-in PR generation with context |
| kubectl/infrastructure | Via Workflow Automation (pre-built scripts) | Native kubectl commands and scripts |
| Post-mortems | Manual (with investigation context) | Automatic generation |
| Cost optimization | Separate Cloud Cost Management product | Built-in cost optimization agent |
| Human-in-the-loop | Required for high-stakes actions | Required for PR merges and major changes |
Pricing comparison
Pricing is where these two products diverge sharply, and where the decision gets complicated for teams trying to budget for AI SRE capabilities.
Datadog Bits AI SRE
Datadog publishes transparent pricing for Bits AI SRE investigations. The structure is:
Annual plan: $500 per 20 investigations per month (effectively $25/investigation)
Monthly plan: $600 per 20 investigations per month ($30/investigation)
On-demand: $36 per investigation
Only conclusive investigations are billable. If an investigation ends inconclusively or does not complete, you are not charged. Volume discounts are available for larger commitments.
The per-investigation model is straightforward but creates an important constraint: the cost scales directly with your alert volume. Teams with noisy alerting configurations could burn through investigation budgets quickly. Assuming every alert ends in a billable (conclusive) investigation, a team processing 100 alerts per month at the annual rate pays $2,500/month, while a team with 500 alerts (not uncommon in large microservice environments) pays $12,500/month.
There is also the base Datadog platform cost to consider. Bits AI SRE is an add-on. You need Datadog infrastructure monitoring, APM, log management, or other products running first, and those carry their own per-host, per-feature pricing. The total cost of Bits AI SRE is the investigation fee plus whatever you already pay for Datadog.
How many alerts does your team process monthly? Multiply the number you expect to end in a conclusive investigation by $25-36 and you have your Bits AI SRE budget, before the base Datadog platform cost.
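If you want to run that arithmetic for your own volume, a small calculator helps. This sketch uses the prices quoted in this article, which may change, so check Datadog's pricing page for current figures. It also assumes plan capacity is bought in whole blocks of 20 investigations (rounded up) and ignores volume discounts; both are simplifications of mine, not documented billing rules.

```python
import math

# Prices as quoted in this article (USD); treat them as assumptions.
ANNUAL_BLOCK_USD = 500    # per block of 20 investigations, per month
MONTHLY_BLOCK_USD = 600   # per block of 20 investigations, per month
ON_DEMAND_USD = 36        # per single investigation
BLOCK_SIZE = 20

def monthly_cost(conclusive_investigations: int) -> dict:
    """Estimate the monthly Bits AI SRE fee for each plan.

    Only conclusive investigations are billable, so pass the number you
    expect to conclude, not your raw alert count.
    """
    # Assumption: committed plans are purchased in whole blocks of 20.
    blocks = math.ceil(conclusive_investigations / BLOCK_SIZE)
    return {
        "annual_plan": blocks * ANNUAL_BLOCK_USD,
        "monthly_plan": blocks * MONTHLY_BLOCK_USD,
        "on_demand": conclusive_investigations * ON_DEMAND_USD,
    }

for volume in (20, 100, 500):
    print(volume, monthly_cost(volume))
```

At 100 conclusive investigations per month this gives $2,500 on the annual plan, $3,000 on the monthly plan, and $3,600 on demand, all before the base Datadog platform cost.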
Resolve AI
Resolve AI does not publish pricing. The pricing page is a contact form that routes to enterprise sales. Given the company's $1B valuation and $4M ARR (as reported by TechCrunch in December 2025), the customer base is relatively small and likely concentrated in enterprise accounts.
Based on available market data from AI SRE comparisons, Resolve AI likely sits in the enterprise tier. The product is available through the AWS Marketplace, which suggests some customers use existing AWS spend commitments to fund it.
The lack of public pricing makes direct cost comparison impossible. What we can say is that the pricing model appears to be enterprise contract-based rather than per-investigation or usage-based. For teams evaluating both tools, the only way to get Resolve AI pricing is through a sales conversation.
| Pricing aspect | Datadog Bits AI SRE | Resolve AI |
| --- | --- | --- |
| Public pricing | Yes | No (contact sales) |
| Model | Per-investigation | Enterprise contracts |
| Annual cost (20 investigations/month) | $6,000/year | Not disclosed |
| Annual cost (100 investigations/month) | $30,000/year | Not disclosed |
| Base platform requirement | Datadog subscription (additional cost) | None (works with existing tools) |
| Free trial | 14-day Datadog trial | Demo-based evaluation |
| Volume discounts | Available | Likely (enterprise negotiation) |
| Available on marketplace | Datadog billing | AWS Marketplace |
Customer evidence and traction
What are actual engineering teams saying about these tools in production?
Datadog Bits AI SRE
Datadog reports that over 2,000 customer environments tested Bits AI SRE before its December 2025 GA launch. The customer testimonials on the product page are specific and named: iFood's SRE team reported a 70% reduction in MTTR, engineers at Uber Freight highlighted noise reduction and correlation capabilities, and Energisa's systems engineer noted root causes delivered in under four minutes.
The internal benchmark data is also notable. Datadog published a detailed engineering blog about how they evaluate Bits AI SRE against real-world incidents, and the March 2026 update claims the newer version is approximately twice as fast and more accurate on internal benchmarks. The product claims up to 95% reduction in time to resolution.
Datadog's scale as a publicly traded company (NASDAQ: DDOG) gives the product a level of support infrastructure and product continuity that startups cannot yet match. Thousands of organizations use it in production, spanning global enterprises and fast-growing startups across diverse production environments.
Resolve AI
Resolve AI's customer base is smaller but includes recognizable names. DoorDash is the flagship case study: the company's Senior Director of Engineering reported 87% faster incident investigations and noted that fewer engineers get pulled into war rooms. A Coinbase engineer reported that Resolve AI surfaced accurate root causes 73% faster than their teams. A senior director at Uni reported a 2x productivity improvement.
The founder credentials are strong. CEO Spiros Xanthos co-created OpenTelemetry and previously led Splunk's Observability business. The $125M Series A led by Lightspeed Venture Partners, with existing investors Greylock and Unusual Ventures, indicates serious institutional backing.
However, the company's reported ARR of approximately $4M (as of December 2025) means the production footprint is still relatively limited compared to Datadog's thousands of deployed environments. That is not necessarily a problem for the right buyer, but it does mean less battle-testing across diverse architectures and failure modes.
Vendor lock-in is the factor that should get the most weight in your evaluation, because AI SRE tools are the kind of investment where switching costs compound quickly. The more your team relies on an AI SRE for incident response, the harder it becomes to change direction later.
Datadog Bits AI SRE
Bits AI SRE deepens your dependency on the Datadog ecosystem. The agent works exclusively with Datadog telemetry, produces insights within Datadog's interface, and its triage actions route through Datadog's incident management and workflow automation. The more value you extract from Bits AI, the more reasons you have to keep your observability stack consolidated in Datadog.
For teams already committed to Datadog, this is not a problem. It is a feature. Tighter integration means better performance, less configuration, and more seamless workflows. But for teams evaluating observability strategy over a multi-year horizon, the deepening lock-in should be weighed against Datadog's well-documented pricing complexity and the potential for costs to compound as you add more products.
Would you feel comfortable migrating away from Datadog in two years if your AI SRE workflows, investigation history, and automated triage actions are all built on Bits AI?
Resolve AI
Resolve AI's vendor-neutral design explicitly avoids lock-in to any single observability platform. If you switch from Datadog to Grafana, or add New Relic alongside Splunk, Resolve AI adds or swaps integrations rather than requiring a platform migration.
The OpenTelemetry heritage of the founding team (they literally co-created the standard) shows in the product philosophy. Your instrumentation, telemetry, and investigation history are not trapped inside a single vendor's ecosystem.
The counterpoint is that Resolve AI introduces its own form of lock-in. As the system learns from your incidents, builds institutional knowledge, and integrates with your workflows, the switching cost shifts from "observability platform dependency" to "AI SRE dependency." Moving from Resolve AI to another AI SRE tool means losing that accumulated investigation context.
| Lock-in dimension | Datadog Bits AI SRE | Resolve AI |
| --- | --- | --- |
| Platform dependency | Full (Datadog ecosystem) | Minimal (multi-vendor) |
| Instrumentation portability | Proprietary agents preferred | OpenTelemetry philosophy |
| Investigation history | Stored in Datadog | Stored in Resolve AI |
| Workflow portability | Datadog-specific automations | Integration-based (more portable) |
| Switching cost trajectory | Increases with Datadog adoption | Increases with accumulated learning |
Final thoughts
The AI SRE market in 2026 is splitting along a clear fault line. On one side, observability platforms like Datadog are embedding AI capabilities natively, trading vendor neutrality for depth of integration. On the other, standalone AI SRE startups like Resolve AI are trading native data access for cross-platform flexibility. Neither approach is wrong. They solve different versions of the same problem.
Datadog Bits AI SRE is the strongest choice for teams that have already consolidated their observability stack in Datadog and want the fastest, deepest AI investigation capability without additional integration work. The native data access, zero-configuration setup, and tight platform integration make it genuinely powerful within its ecosystem. The per-investigation pricing is transparent and predictable for teams with controlled alert volumes.
Resolve AI is the stronger choice for enterprise teams running heterogeneous observability stacks who need cross-platform investigation, code-level remediation, and vendor independence. The multi-agent architecture, PR generation, and cost optimization capabilities go further into the remediation workflow than Bits AI currently does. But the enterprise-only pricing, integration complexity, and early-stage customer base are real considerations.
The question neither tool fully answers yet is this: should your AI SRE be the same company that sells you observability, or should it be an independent layer? Datadog says integration wins. Resolve AI says independence wins. The right answer depends on your stack, your budget, and how much you trust any single vendor with both your data and your incident response.
If both options feel like they come with compromises you would rather avoid, Better Stack offers a unified approach: AI-powered incident investigation built into a full observability platform (logs, metrics, traces, error tracking, incident management, on-call) with volume-based pricing, an MCP server for AI assistant integration, and no per-investigation billing.