Checking for outliers

AI SRE helps you identify outliers and anomalies in your telemetry data through several built-in capabilities, rather than a single "find outliers" button.

Query summaries

When you ask AI SRE to run a direct query on your logs or metrics, use the "Summary" output mode. The AI-generated summary is specifically designed to highlight anomalies, including:

  • Spikes and drops in values.
  • Changes in error rates.
  • Unusual periodic patterns.
  • Capacity saturation.

The summary always states when the anomaly occurred and by how much it deviated from the baseline.
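To make "when" and "by how much" concrete, here is a minimal sketch of how a deviation from a baseline could be computed for a series of data points. It is only an illustration, not how AI SRE builds its summaries: the function name `summarize_anomalies`, the use of the median as the baseline, and the cutoff of 3.5 are all assumptions made for this example.

```python
from statistics import median


def summarize_anomalies(points, cutoff=3.5):
    """Return (timestamp, value, deviation) for points far from the baseline.

    points: list of (timestamp, value) pairs.
    The baseline is the median of all values and the scale is the median
    absolute deviation (MAD). Both are illustrative choices for this sketch.
    """
    values = [value for _, value in points]
    baseline = median(values)
    mad = median(abs(value - baseline) for value in values)
    if mad == 0:
        # More than half of the values are identical, so there is no scale
        # to judge deviations against; this sketch reports nothing.
        return []

    findings = []
    for timestamp, value in points:
        # 0.6745 rescales the MAD so the score is comparable to a z-score.
        score = 0.6745 * (value - baseline) / mad
        if abs(score) > cutoff:
            findings.append((timestamp, value, value - baseline))
    return findings


# Hypothetical requests-per-minute samples with one spike.
samples = [
    ("10:00", 100), ("10:01", 102), ("10:02", 98),
    ("10:03", 101), ("10:04", 250), ("10:05", 99),
]
for timestamp, value, deviation in summarize_anomalies(samples):
    print(f"{timestamp}: value {value} is {deviation:+} from baseline")
```

The median and MAD are used here instead of the mean and standard deviation because a single large spike shifts them far less, so the spike does not hide itself by inflating the baseline.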

Chart alerts

AI SRE can create alerts on your dashboard charts. It supports three types:

  • Anomaly alerts: Use a Robust Random Cut Forest (RRCF) model to learn the normal shape of your data and fire when current values look unusual. This is ideal for metrics with complex seasonal patterns.
  • Threshold alerts: Trigger when a metric crosses a static value (e.g., "CPU usage > 90%").
  • Relative alerts: Trigger based on a percentage change over a time window (e.g., "error rate increased by 50% in the last hour").

Alerts are a proactive way to detect outliers: once an alert is set up, future anomalies automatically page the on-call team.
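The threshold and relative alert types boil down to simple comparisons, which the sketch below illustrates using the two examples from the list. The function names and parameters are hypothetical and are not part of any Better Stack API; anomaly alerts are left out because an RRCF model does not fit in a short example.

```python
def threshold_breached(value, limit):
    """Threshold alert: fire when the metric exceeds a static limit."""
    return value > limit


def relative_change_breached(previous, current, min_increase_pct):
    """Relative alert: fire when the metric grew by at least min_increase_pct.

    previous: value at the start of the time window.
    current: value at the end of the time window.
    """
    if previous == 0:
        # A percentage change from zero is undefined; this sketch treats
        # any rise above zero as a breach.
        return current > 0
    change_pct = (current - previous) / previous * 100
    return change_pct >= min_increase_pct


# "CPU usage > 90%"
print(threshold_breached(93.0, 90.0))

# "error rate increased by 50% in the last hour"
# (error rates given in percent: 2.0% an hour ago, 3.5% now)
print(relative_change_breached(2.0, 3.5, 50))
```

A threshold alert needs no history, only the current value, while a relative alert has to compare two points in the chosen time window. That is why a relative alert can catch a jump that is still far below any static limit.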

Service maps

For services monitored by a Better Stack collector with eBPF enabled, you can ask AI SRE to "generate a service map."

The service map visualizes the connections between your services and shows key metrics, such as error rates and request rates, on every edge. By inspecting the colored edges and metrics, you can spot services with unusually high error rates or low request volumes.
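The visual inspection described above can also be expressed as a small check over per-edge metrics. The sketch below assumes you already have the edge data as a plain dictionary; that data shape, the function name `flag_edges`, and both factors are assumptions for illustration, not an export format or API that AI SRE provides.

```python
from statistics import median


def flag_edges(edges, error_factor=3.0, volume_factor=0.2):
    """Flag edges whose metrics stand out from the rest of the map.

    edges: dict mapping (source, target) to {"requests": int, "errors": int}.
    An edge is flagged when its error rate exceeds error_factor times the
    median error rate, or when its request count falls below volume_factor
    times the median request count.
    """
    error_rates = {
        edge: (m["errors"] / m["requests"] if m["requests"] else 0.0)
        for edge, m in edges.items()
    }
    # If the median error rate is zero, any edge with errors is flagged.
    max_error_rate = error_factor * median(error_rates.values())
    min_requests = volume_factor * median(m["requests"] for m in edges.values())

    flagged = {}
    for edge, m in edges.items():
        reasons = []
        if error_rates[edge] > max_error_rate:
            reasons.append("high error rate")
        if m["requests"] < min_requests:
            reasons.append("low request volume")
        if reasons:
            flagged[edge] = reasons
    return flagged


# Hypothetical edges between four services.
service_edges = {
    ("web", "api"): {"requests": 1000, "errors": 10},
    ("api", "db"): {"requests": 900, "errors": 9},
    ("api", "cache"): {"requests": 950, "errors": 190},
    ("api", "billing"): {"requests": 50, "errors": 0},
}
for edge, reasons in flag_edges(service_edges).items():
    print(edge, reasons)
```

Comparing each edge against the median of all edges, rather than a fixed number, mirrors what you do by eye on the map: an edge is interesting when it looks different from its neighbors.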