Querying histograms

Histograms are a powerful tool for understanding the distribution of your metric data, such as request latencies or response sizes. Instead of just tracking an average, histograms group observations into configurable buckets, giving you a much richer view of your data's performance and characteristics.

In our system, histogram data is stored in a highly optimized format using special metric-like columns. This guide walks you through how this data is structured and how to write SQL queries to visualize it as tables and heatmaps, and to calculate custom quantiles like p95.

For details on how to send us histograms, see Ingesting metrics API. Histograms and summaries from Prometheus, as well as histograms from OpenTelemetry, are supported out of the box.

Understanding the histogram data structure

When you send histogram data, it's stored in several columns. The most important one is the buckets column. Let's look at its structure and related fields:

Column | Data Type | Description
buckets | Array(Tuple(String, Float64)) | An array of (upper_bound, cumulative_count) tuples. This is the core histogram data.
buckets_sum | Nullable(Float64) | The sum of all observed values in the histogram.
buckets_count | Nullable(UInt64) | The total count of all observed values.
buckets_rate | Array(Tuple(String, Nullable(Float64))) | A pre-calculated rate version of the buckets column.
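
To get a feel for what this looks like for one of your metrics, you can peek at the most recent raw rows. This is a minimal sketch using the same dt, tags, and name columns referenced in the queries below:

SQL query to inspect raw bucket data
SELECT
  -- raw timestamp and metric tags
  dt,
  tags,
  -- buckets is anyLast-aggregated, so it has to be merged before reading
  anyLastMerge(buckets) AS latest_buckets
FROM {{source}}
WHERE name = {{metric_name}}
GROUP BY dt, tags
ORDER BY dt DESC
LIMIT 10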

Depending on how you're sending us the data, the buckets column may contain data in a cumulative format. For example, a value like [('0.1', 10), ('0.5', 25), ('+inf', 30)] could mean:

  • 10 events had a value less than or equal to 0.1.
  • 25 events had a value less than or equal to 0.5.
  • 30 events were observed in total.

or

  • 10 events had a value less than or equal to 0.1.
  • 25 events had a value less than or equal to 0.5 and higher than 0.1.
  • 30 events had a value higher than 0.5.

To work with this data, you'll often need to convert these cumulative counts into individual counts for each bucket. Let's see how.
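
As a quick illustration with the example values from above, ClickHouse's array functions can do this conversion directly. This is just a self-contained sketch with hard-coded numbers; the queries further below handle the same conversion on real data:

SQL sketch: converting cumulative counts to per-bucket counts
SELECT
  -- hard-coded example buckets from above
  [('0.1', 10.0), ('0.5', 25.0), ('+Inf', 30.0)] AS example_buckets,
  -- cumulative counts: [10, 25, 30]
  arrayMap(t -> tupleElement(t, 2), example_buckets) AS cumulative,
  -- arrayDifference() yields [0, 15, 5]; the first bucket keeps its own
  -- cumulative value, so splice it back in to get [10, 15, 5]
  arrayConcat([cumulative[1]], arrayPopFront(arrayDifference(cumulative))) AS per_bucket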

You can calculate the precise average value of a metric using anyLastMerge(buckets_sum) / anyLastMerge(buckets_count). This is often more accurate than estimating an average from bucket midpoints.
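
For example, a minimal query charting that average per time interval could look like this:

SQL query to chart the average value
SELECT
  -- use default time intervals
  {{time}} AS time,
  -- exact average: sum of all observed values divided by their count
  anyLastMerge(buckets_sum) / anyLastMerge(buckets_count) AS avg_value
FROM {{source}}
WHERE name = {{metric_name}}
  AND time BETWEEN {{start_time}} AND {{end_time}}
GROUP BY time
ORDER BY time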

Visualizing buckets as a table

One of the simplest ways to analyze a histogram is to display the number of events in each bucket in a table.

The following query shows the counts for each bucket over the selected time range for a metric. Fill in the query variable {{metric_name}} with your metric, e.g. http_request_duration_seconds:

SQL query to show bucket counts
WITH buckets AS (
  SELECT
    -- use default time intervals
    {{time}} AS time,
    -- all metric tags for grouping/filtering later
    tags,
    -- buckets is an anyLast-aggregated metric holding array of (upper_limit, value)
    -- arrayJoin() explodes it so each row is ONE bucket tuple
    arrayJoin(anyLastMerge(buckets)) AS bucket
  FROM {{source}}
  -- apply basic filtering
  WHERE name = {{metric_name}}
    AND time BETWEEN {{start_time}} AND {{end_time}}
  GROUP BY time, tags
),
per_bucket AS (
  SELECT
    time,
    tags,
    -- Bucket upper bound (string like '0.1', '2.5', or '+infinity')
    bucket.1 AS upper_limit,
    -- Bucket value (usually cumulative count for that bound)         
    toInt64(bucket.2) AS value
  FROM buckets
)
-- Final select of the values
SELECT time, tags, upper_limit, value
FROM per_bucket
-- sort chronologically, then by tags, then numerically by upper_limit
-- toFloat64OrNull() sorts bounds numerically (e.g. '10' after '2.5') and puts '+infinity' last
ORDER BY time, tags, toFloat64OrNull(upper_limit)

Histogram data as a table

Creating a heatmap

A heatmap is an excellent way to see how the distribution of your metric changes over time. To create one, we need to calculate the number of events in each bucket for each time interval.

This query prepares the data in a (time, bucket_label, value) format, which is perfect for a heatmap visualization.

Since histograms usually contain counters as values, let's use buckets_rate to chart the change per second of those counters.

To make sure the heatmap is clean, we will add up all rates from differently tagged time series, add an optional metric tag filter, and remove all 0 values in the final SELECT.

SQL query for a histogram heatmap
WITH buckets AS (
  SELECT
    -- use max time resolution and all metric tags for grouping/filtering later
    dt, tags,
    -- buckets_rate is an anyLast-aggregated metric holding array of (upper_limit, rate)
    -- arrayJoin() explodes it so each row is ONE bucket tuple
    arrayJoin(anyLastMerge(buckets_rate)) AS bucket
  FROM {{source}}
  -- apply basic filtering
  WHERE name = {{metric_name}}
    AND dt BETWEEN {{start_time}} AND {{end_time}}
  GROUP BY dt, tags
),
per_bucket AS (
  SELECT
    -- group by default interval now, and calculate average rates based on all data
    {{time}} AS time,
    tags,
    -- Bucket upper bound (string like '0.1', '2.5', or '+infinity')
    bucket.1 AS upper_limit,
    -- Bucket rate (assuming counters as values)
    avg(toFloat64(bucket.2)) AS rate
  FROM buckets
  GROUP BY time, tags, upper_limit
)
-- Final select of the values
SELECT time, upper_limit AS series, sum(rate) AS value
FROM per_bucket
[[ WHERE metricTag('path') == {{path}} ]]
GROUP BY time, upper_limit
HAVING value != 0
-- sort numerically by upper_limit
ORDER BY toFloat64OrNull(upper_limit) DESC

Histogram data as a heatmap

Calculating custom quantiles (e.g., p95)

Histograms are perfect for estimating quantiles (or percentiles) like p50 (median) or p95. This tells you the value that a certain percentage of your observations fall under. For example, a p95 latency of 250ms means that 95% of your requests were faster than 250ms.

Because our histograms store cumulative counts, we first need to work out the individual count that falls into each bucket at each point in time. Then, we can use ClickHouse's quantileInterpolatedWeighted function to calculate any quantile we need. This function is ideal for pre-aggregated data like histograms.
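
To get a feel for how the function weighs bucket counts, here is a tiny self-contained sketch with made-up buckets, independent of any stored data:

SQL sketch: quantileInterpolatedWeighted() on hard-coded buckets
SELECT quantileInterpolatedWeighted(0.95)(upper_limit, bucket_count) AS p95
FROM
(
  SELECT
    -- made-up per-bucket counts: 10 observations up to 0.1, 15 in (0.1, 0.5], 5 in (0.5, 1.0]
    arrayJoin([(0.1, 10), (0.5, 15), (1.0, 5)]) AS bucket,
    bucket.1 AS upper_limit,
    toUInt64(bucket.2) AS bucket_count
)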

The following query calculates p99, p95, p90, p50, and other quantiles for a given metric:

SQL query to calculate quantiles
WITH buckets AS (
  SELECT
    -- use max time resolution and all metric tags for grouping/filtering later
    dt, tags,
    -- buckets is an anyLast-aggregated metric holding array of (upper_limit, value)
    -- arrayJoin() explodes it so each row is ONE bucket tuple
    arrayJoin(anyLastMerge(buckets)) AS bucket
  FROM {{source}}
  -- apply basic filtering
  WHERE name = {{metric_name}}
    AND dt BETWEEN {{start_time}} AND {{end_time}}
  GROUP BY dt, tags
),
per_bucket AS (
  SELECT
    -- group by default interval now, and get the latest counts per bucket
    {{time}} AS time,
    tags,
    -- Bucket upper bound (string like '0.1', '2.5', or '+infinity')
    bucket.1 AS upper_limit,
    -- Latest count as unsigned int
    max(toUInt64(bucket.2)) AS value
  FROM buckets
  GROUP BY time, tags, upper_limit
),
per_time AS (
  SELECT
    time,
    toFloat64OrNull(upper_limit) AS upper_limit, 
    sum(value) AS value
  FROM per_bucket
  -- drop +Inf buckets, since they have no finite upper bound to interpolate with
  WHERE upper_limit IS NOT NULL
    -- here you can add filters for metricTag()
  GROUP BY time, upper_limit
),
with_deltas AS (
  SELECT
    time,
    upper_limit,
    -- convert cumulative counts into per-bucket counts by subtracting the
    -- previous bucket's count within the same time interval (never below 0)
    greatest(
      value - lagInFrame(value, 1, 0) OVER (PARTITION BY time ORDER BY upper_limit),
      toUInt64(0)
    ) AS bucket_count
  FROM per_time
)
-- Final select of the values using quantileInterpolatedWeighted()
SELECT
  time,
  quantileInterpolatedWeighted(0.99)(upper_limit, bucket_count) AS p99,
  quantileInterpolatedWeighted(0.95)(upper_limit, bucket_count) AS p95,
  quantileInterpolatedWeighted(0.9)(upper_limit, bucket_count) AS p90,
  quantileInterpolatedWeighted(0.5)(upper_limit, bucket_count) AS p50,
  quantileInterpolatedWeighted(0.1)(upper_limit, bucket_count) AS p10,
  quantileInterpolatedWeighted(0.05)(upper_limit, bucket_count) AS p05,
  quantileInterpolatedWeighted(0.01)(upper_limit, bucket_count) AS p01
FROM with_deltas 
GROUP BY time

Histogram data as percentiles

Need help?

Please let us know at hello@betterstack.com. We're happy to help! 🙏