Querying histograms

Histograms are a powerful tool for understanding the distribution of your metric data, such as request latencies or response sizes. Instead of just tracking an average, histograms group observations into configurable buckets, giving you a much richer view of your data's performance and characteristics.

In our system, histogram data is stored in a highly optimized format. This guide will walk you through how this data is structured and how to write SQL queries to visualize it as tables, heatmaps, and even calculate custom quantiles like p95.

For details on how you can send us histograms, see Ingesting metrics API. Histograms and summaries coming from Prometheus and histograms from OpenTelemetry are supported out-of-the-box.

Querying histograms

We store histograms in a unified and efficient way. Histogram data is split into bucket_bounds (the upper limits of the buckets) and bucket_counts (the number of observations in each bucket, stored as deltas). Pre-computed quantiles are available via bucket_quantiles.

Understanding the histogram data structure

Column Merge Function Description
bucket_bounds anyLastMerge() Histogram bucket boundaries (e.g., [0.1, 0.5, inf]).
bucket_counts sumForEachMerge() Counts per bucket (element-wise sum of deltas).
bucket_quantiles histogramQuantile(p) Pre-computed quantile function for efficient percentile calculation.
bucket_sum sumMerge() Sum of all observed values.
bucket_count sumMerge() Total number of observations.
bucket_min minMerge() Minimum observed value.
bucket_max maxMerge() Maximum observed value.

You can calculate the precise average value of a metric using sumMerge(bucket_sum) / sumMerge(bucket_count). This is often more accurate than estimating an average from bucket midpoints.

Visualizing buckets as a table

To visualize the number of events in each bucket over a selected time range for a metric, you'll join the bucket_bounds with the bucket_counts. Remember that bucket_counts stores deltas, so sumForEachMerge is used to aggregate them.

Fill in the query variable {{metric_name}} with your metric, e.g. http_request_duration_seconds.

SQL query to show bucket counts
SELECT
    time,
    status,
    upper_limit,
    sum(count) AS value
FROM (
    SELECT
        {{time}} AS time,
        label('status') AS status, -- Use any label as series here
        arrayJoin(arrayZip(anyLastMerge(bucket_bounds), sumForEachMerge(bucket_counts))) AS bucket_data,
        bucket_data.1 AS upper_limit,
        bucket_data.2 AS count
    FROM {{source}}
    WHERE name = {{metric_name}}
      AND dt BETWEEN {{start_time}} AND {{end_time}}
    GROUP BY time, tags
)
WHERE isFinite(upper_limit)
GROUP BY time, status, upper_limit
ORDER BY time, status, upper_limit

Key points for this query:

  • sumForEachMerge(bucket_counts) correctly aggregates the delta counts across intervals.
  • arrayJoin(arrayZip(bounds, counts)) pairs each bound with its corresponding count and explodes the array into separate rows.
  • isFinite(upper_limit) filters out the +Inf bucket, which is typically the last element in bucket_bounds and represents the total count.
  • label('status') is an example of including other metric labels for further breakdown.

Creating a heatmap

A heatmap provides a visual representation of how your metric's distribution changes over time. We aggregate bucket_counts to get the total count within each bucket for each time interval.

SQL query for a histogram heatmap
SELECT
  time,
  concat(toString(round(upper_limit * 1000, 2)), 'ms') AS series, -- Format bounds nicely
  toUInt64(sum(count)) AS value
FROM
(
  SELECT
    {{time}} AS time,
    arrayJoin(arrayZip(anyLastMerge(bucket_bounds), sumForEachMerge(bucket_counts))) AS bucket_data,
    bucket_data.1 AS upper_limit,
    bucket_data.2 AS count
  FROM {{source}}
  WHERE name = {{metric_name}}
    AND dt BETWEEN {{start_time}} AND {{end_time}}
    -- Example: Optional filtering by a label
    -- AND label('status') = '200'
  GROUP BY time, series_id
)
WHERE isFinite(upper_limit) -- Exclude +Inf bound
GROUP BY time, series
ORDER BY toFloat64OrNull(replaceRegexpAll(series, '[^0-9\\.]', '')) DESC

Key points for this query:

  • We GROUP BY time, series_id first to ensure consistent bucket boundaries before arrayJoin.
  • sumForEachMerge(bucket_counts) is crucial for aggregating the delta counts.
  • isFinite(upper_limit) filters out the +Inf bucket.
  • series is constructed from the upper_limit to provide meaningful labels for the heatmap.

Calculating custom quantiles (e.g., p95)

Bucket quantiles are stored in the bucket_quantiles aggregate column, which works with the quantilePrometheusHistogramArrayMerge function. A helper function histogramQuantile(p) is available for convenience.

 
-- Definition of the helper function (already available in the database)
CREATE OR REPLACE FUNCTION histogramQuantile AS (bound) ->
  quantilePrometheusHistogramArrayMerge(bound)(bucket_quantiles);

Important: When to use histogramQuantile vs. Conditional Form:

  • histogramQuantile(p) (helper): Use when your WHERE clause filters for exactly one metric name. If you query multiple metrics or no specific metric, the histogram buckets from different metrics will be mixed, producing incorrect results.
  • quantilePrometheusHistogramArrayMergeIf(p)(bucket_quantiles, condition): Use when querying multiple metric names or when no specific metric is filtered. The condition allows you to apply the quantile calculation selectively.
SQL query to calculate quantiles - single metric)
SELECT
  {{time}} AS time,
  histogramQuantile(0.50) AS p50,
  histogramQuantile(0.90) AS p90,
  histogramQuantile(0.95) AS p95,
  histogramQuantile(0.99) AS p99
FROM {{source}}
WHERE dt BETWEEN {{start_time}} AND {{end_time}}
  AND name = 'your_metric_name'
GROUP BY time
ORDER BY time

Example: Multiple metrics quantile calculation (conditional form)

SQL query to calculate quantiles - multiple metrics
SELECT
  {{time}} AS time,
  quantilePrometheusHistogramArrayMergeIf(0.95)(bucket_quantiles, name = 'metric_a') AS metric_a_p95,
  quantilePrometheusHistogramArrayMergeIf(0.95)(bucket_quantiles, name = 'metric_b') AS metric_b_p95
FROM {{source}}
WHERE dt BETWEEN {{start_time}} AND {{end_time}}
  AND name IN ('metric_a', 'metric_b')
GROUP BY time
ORDER BY time

Histogram Statistics

For basic statistics without manual percentile calculation, you can use the dedicated bucket_sum, bucket_count, bucket_min, and bucket_max aggregate columns.

SQL query for histogram statistics
SELECT
  {{time}} AS time,
  sumMerge(bucket_sum) / sumMerge(bucket_count) AS avg_value,  -- Average
  minMerge(bucket_min) AS min_value,
  maxMerge(bucket_max) AS max_value,
  sumMerge(bucket_count) AS total_count
FROM {{source}}
WHERE dt BETWEEN {{start_time}} AND {{end_time}}
  AND name = 'http_request_duration_seconds'
GROUP BY time
ORDER BY time

Need help?

Please let us know at hello@betterstack.com. We're happy to help! 🙏