# Querying histograms

Histograms are a powerful tool for understanding the distribution of your metric data, such as request latencies or response sizes. Instead of tracking only an average, histograms group observations into configurable buckets, giving you a much richer view of how your values are distributed.

In our system, histogram data is stored in a highly optimized format. This guide walks you through how this data is structured, how to write SQL queries to visualize it as tables and heatmaps, and how to calculate custom quantiles such as p95.

For details on how to send us histograms, see [Ingesting metrics API](https://betterstack.com/docs/logs/ingesting-data/http/metrics/#histograms). Histograms and summaries coming from [Prometheus](https://betterstack.com/docs/logs/ingesting-data/metrics/prometheus-push/) and histograms from [OpenTelemetry](https://betterstack.com/docs/logs/open-telemetry/) are supported out of the box.

## How histograms are stored and queried

We store histograms in a unified and efficient way. Histogram data is split into `bucket_bounds` (the upper limits of the buckets) and `bucket_counts` (the number of observations in each bucket, stored as per-interval deltas rather than cumulative totals). Pre-computed quantiles are available via `bucket_quantiles`.

### Understanding the histogram data structure

| Column | Merge Function | Description |
|---|---|---|
| `bucket_bounds` | `anyLastMerge()` | Histogram bucket boundaries (e.g., `[0.1, 0.5, inf]`). |
| `bucket_counts` | `sumForEachMerge()` | Counts per bucket (element-wise sum of deltas). |
| `bucket_quantiles` | `histogramQuantile(p)` | Pre-computed quantile function for efficient percentile calculation. |
| `bucket_sum` | `sumMerge()` | Sum of all observed values. |
| `bucket_count` | `sumMerge()` | Total number of observations. |
| `bucket_min` | `minMerge()` | Minimum observed value. |
| `bucket_max` | `maxMerge()` | Maximum observed value. |

[info]
You can calculate the precise average value of a metric using `sumMerge(bucket_sum) / sumMerge(bucket_count)`. This is often more accurate than estimating an average from bucket midpoints.
[/info]
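
To see these columns in action before building any visualization, you can apply the merge functions from the table above directly. A minimal inspection sketch, using the same `{{source}}`, `{{metric_name}}`, `{{start_time}}`, and `{{end_time}}` query variables as the examples below:

```sql
[label SQL query to inspect stored histogram columns]
SELECT
  anyLastMerge(bucket_bounds) AS bounds,    -- e.g. [0.1, 0.5, inf]
  sumForEachMerge(bucket_counts) AS counts, -- element-wise sums of the deltas
  sumMerge(bucket_count) AS total_observations
FROM {{source}}
WHERE name = {{metric_name}}
  AND dt BETWEEN {{start_time}} AND {{end_time}}
```

Each element of `counts` corresponds to the bound at the same position in `bounds`.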

### Visualizing buckets as a table

To visualize the number of events in each bucket over a selected time range for a metric, pair each entry of `bucket_bounds` with the corresponding entry of `bucket_counts`. Remember that `bucket_counts` stores deltas, so `sumForEachMerge` is used to aggregate them.

Fill in the query variable `{{metric_name}}` with your metric, e.g. `http_request_duration_seconds`.

```sql
[label SQL query to show bucket counts]
SELECT
    time,
    status,
    upper_limit,
    sum(count) AS value
FROM (
    SELECT
        {{time}} AS time,
        label('status') AS status, -- Use any label as series here
        arrayJoin(arrayZip(anyLastMerge(bucket_bounds), sumForEachMerge(bucket_counts))) AS bucket_data,
        bucket_data.1 AS upper_limit,
        bucket_data.2 AS count
    FROM {{source}}
    WHERE name = {{metric_name}}
      AND dt BETWEEN {{start_time}} AND {{end_time}}
    GROUP BY time, tags
)
WHERE isFinite(upper_limit)
GROUP BY time, status, upper_limit
ORDER BY time, status, upper_limit
```

**Key points for this query:**

- `sumForEachMerge(bucket_counts)` correctly aggregates the delta counts across intervals.
- `arrayJoin(arrayZip(bounds, counts))` pairs each bound with its corresponding count and explodes the array into separate rows.
- `isFinite(upper_limit)` filters out the `+Inf` bucket, which is typically the last element in `bucket_bounds` and represents the total count.
- `label('status')` is an example of including other metric labels for further breakdown.
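
To see how the pairing works in isolation, `arrayZip` combines the two arrays into an array of tuples, and `arrayJoin` expands that array into one row per tuple. A standalone sketch with hard-coded arrays standing in for the real columns:

```sql
[label SQL query demonstrating arrayZip and arrayJoin]
SELECT
  arrayJoin(arrayZip([0.1, 0.5, inf], [3, 7, 2])) AS bucket_data,
  bucket_data.1 AS upper_limit,
  bucket_data.2 AS count
-- Produces three rows, one per (bound, count) pair
```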

### Creating a heatmap

A heatmap provides a visual representation of how your metric's distribution changes over time. We aggregate `bucket_counts` to get the total count within each bucket for each time interval.

```sql
[label SQL query for a histogram heatmap]
SELECT
  time,
  concat(toString(round(upper_limit * 1000, 2)), 'ms') AS series, -- Format bounds as milliseconds (assumes the metric is in seconds)
  toUInt64(sum(count)) AS value
FROM
(
  SELECT
    {{time}} AS time,
    arrayJoin(arrayZip(anyLastMerge(bucket_bounds), sumForEachMerge(bucket_counts))) AS bucket_data,
    bucket_data.1 AS upper_limit,
    bucket_data.2 AS count
  FROM {{source}}
  WHERE name = {{metric_name}}
    AND dt BETWEEN {{start_time}} AND {{end_time}}
    -- Example: Optional filtering by a label
    -- AND label('status') = '200'
  GROUP BY time, series_id
)
WHERE isFinite(upper_limit) -- Exclude +Inf bound
GROUP BY time, series
ORDER BY toFloat64OrNull(replaceRegexpAll(series, '[^0-9\\.]', '')) DESC
```

**Key points for this query:**

- We `GROUP BY time, series_id` first to ensure consistent bucket boundaries before `arrayJoin`.
- `sumForEachMerge(bucket_counts)` is crucial for aggregating the delta counts.
- `isFinite(upper_limit)` filters out the `+Inf` bucket.
- `series` is constructed from the `upper_limit` to provide meaningful labels for the heatmap.
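
The `ORDER BY` clause in the query above sorts the human-readable series labels numerically by stripping every character except digits and dots. In isolation, the trick looks like this:

```sql
[label SQL query demonstrating numeric sorting of formatted labels]
SELECT toFloat64OrNull(replaceRegexpAll('500.0ms', '[^0-9\\.]', '')) AS numeric_value
-- Strips the 'ms' suffix, leaving the numeric value 500.0 to sort by
```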

### Calculating custom quantiles (e.g., p95)

Bucket quantiles are stored in the `bucket_quantiles` aggregate column, which works with the `quantilePrometheusHistogramArrayMerge` function. A helper function `histogramQuantile(p)` is available for convenience.

```sql
-- Definition of the helper function (already available in the database)
CREATE OR REPLACE FUNCTION histogramQuantile AS (p) ->
  quantilePrometheusHistogramArrayMerge(p)(bucket_quantiles);
```

**Important: when to use `histogramQuantile` vs. the conditional form:**

- **`histogramQuantile(p)` (helper)**: Use when your `WHERE` clause filters for **exactly one metric name**. If you query multiple metrics or no specific metric, the histogram buckets from different metrics will be mixed, producing incorrect results.
- **`quantilePrometheusHistogramArrayMergeIf(p)(bucket_quantiles, condition)`**: Use when querying **multiple metric names** or when no specific metric is filtered. The `condition` allows you to apply the quantile calculation selectively.

#### Example: Single metric quantile calculation (recommended)

```sql
[label SQL query to calculate quantiles - single metric]
SELECT
  {{time}} AS time,
  histogramQuantile(0.50) AS p50,
  histogramQuantile(0.90) AS p90,
  histogramQuantile(0.95) AS p95,
  histogramQuantile(0.99) AS p99
FROM {{source}}
WHERE dt BETWEEN {{start_time}} AND {{end_time}}
  AND name = 'your_metric_name'
GROUP BY time
ORDER BY time
```

#### Example: Multiple metrics quantile calculation (conditional form)

```sql
[label SQL query to calculate quantiles - multiple metrics]
SELECT
  {{time}} AS time,
  quantilePrometheusHistogramArrayMergeIf(0.95)(bucket_quantiles, name = 'metric_a') AS metric_a_p95,
  quantilePrometheusHistogramArrayMergeIf(0.95)(bucket_quantiles, name = 'metric_b') AS metric_b_p95
FROM {{source}}
WHERE dt BETWEEN {{start_time}} AND {{end_time}}
  AND name IN ('metric_a', 'metric_b')
GROUP BY time
ORDER BY time
```
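
Quantiles can also be broken down by a label. A sketch combining the helper with the `label()` function shown earlier (assumes a single metric name, as the helper requires):

```sql
[label SQL query to calculate p95 per label value]
SELECT
  {{time}} AS time,
  label('status') AS status, -- Break down by any label
  histogramQuantile(0.95) AS p95
FROM {{source}}
WHERE dt BETWEEN {{start_time}} AND {{end_time}}
  AND name = 'your_metric_name'
GROUP BY time, status
ORDER BY time, status
```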

### Histogram statistics

For basic statistics without manual percentile calculation, you can use the dedicated `bucket_sum`, `bucket_count`, `bucket_min`, and `bucket_max` aggregate columns.

```sql
[label SQL query for histogram statistics]
SELECT
  {{time}} AS time,
  sumMerge(bucket_sum) / sumMerge(bucket_count) AS avg_value,  -- Average
  minMerge(bucket_min) AS min_value,
  maxMerge(bucket_max) AS max_value,
  sumMerge(bucket_count) AS total_count
FROM {{source}}
WHERE dt BETWEEN {{start_time}} AND {{end_time}}
  AND name = 'http_request_duration_seconds'
GROUP BY time
ORDER BY time
```

## Need help?

Please let us know at hello@betterstack.com.
We're happy to help! 🙏
