You can calculate the precise average value of a metric using anyLastMerge(buckets_sum) / anyLastMerge(buckets_count). This is often more accurate than estimating an average from bucket midpoints.
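For example, here is a minimal sketch of such a query, assuming the same {{source}}, {{metric_name}}, and time template variables used by the queries later in this guide:
SELECT
  -- use default time intervals
  {{time}} AS time,
  -- all metric tags for grouping/filtering later
  tags,
  -- precise average: total sum of observations divided by their total count
  anyLastMerge(buckets_sum) / anyLastMerge(buckets_count) AS average
FROM {{source}}
WHERE name = {{metric_name}}
  AND time BETWEEN {{start_time}} AND {{end_time}}
GROUP BY time, tags
ORDER BY time, tags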
Querying histograms
Histograms are a powerful tool for understanding the distribution of your metric data, such as request latencies or response sizes. Instead of just tracking an average, histograms group observations into configurable buckets, giving you a much richer view of your data's performance and characteristics.
In our system, histogram data is stored in a highly optimized format using special metric-like columns. This guide will walk you through how this data is structured and how to write SQL queries that visualize it as tables and heatmaps, and even calculate custom quantiles like p95.
For details on how you can send us histograms, see Ingesting metrics API. Histograms and summaries coming from Prometheus and histograms from OpenTelemetry are supported out-of-the-box.
Understanding the histogram data structure
When you send histogram data, it's stored in several columns. The most important one is the buckets column. Let's look at its structure and related fields:
Column | Data Type | Description |
---|---|---|
buckets | Array(Tuple(String, Float64)) | An array of (upper_bound, cumulative_count) tuples. This is the core histogram data. |
buckets_sum | Nullable(Float64) | The sum of all observed values in the histogram. |
buckets_count | Nullable(UInt64) | The total count of all observed values. |
buckets_rate | Array(Tuple(String, Nullable(Float64))) | A pre-calculated rate version of the buckets column. |
Depending on how you're sending us the data, the buckets column may store data in a cumulative format. For example, a value like [('0.1', 10), ('0.5', 25), ('+inf', 30)] could mean:
- 10 events had a value less than or equal to 0.1.
- 25 events had a value less than or equal to 0.5.
- 30 events were observed in total.
or, if the buckets are not cumulative:
- 10 events had a value less than or equal to 0.1.
- 25 events had a value greater than 0.1 and less than or equal to 0.5.
- 30 events had a value greater than 0.5.
To work with this data, you'll often need to convert these cumulative counts into individual counts for each bucket. Let's see how.
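Before building the full queries, here is a quick standalone illustration of that conversion using only hard-coded values (arrayConcat, arrayDifference, and arrayPopFront are standard ClickHouse array functions):
-- Illustrative only: cumulative counts [10, 25, 30] become per-bucket counts [10, 15, 5]
WITH [10, 25, 30] AS cumulative
SELECT arrayPopFront(arrayDifference(arrayConcat([0], cumulative))) AS per_bucket
The quantile query further down performs the same conversion with a window function, so it can be applied for each time interval.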
Visualizing buckets as a table
One of the simplest ways to analyze a histogram is to display the number of events in each bucket in a table.
The following query shows the counts for each bucket over the selected time range for a metric. Fill in the query variable {{metric_name}} with your metric, e.g. http_request_duration_seconds:
WITH buckets AS (
SELECT
-- use default time intervals
{{time}} AS time,
-- all metric tags for grouping/filtering later
tags,
-- buckets is an anyLast-aggregated metric holding array of (upper_limit, value)
-- arrayJoin() explodes it so each row is ONE bucket tuple
arrayJoin(anyLastMerge(buckets)) AS bucket
FROM {{source}}
-- apply basic filtering
WHERE name = {{metric_name}}
AND time BETWEEN {{start_time}} AND {{end_time}}
GROUP BY time, tags
),
per_bucket AS (
SELECT
time,
tags,
-- Bucket upper bound (string like '0.1', '2.5', or '+infinity')
bucket.1 AS upper_limit,
-- Bucket value (usually cumulative count for that bound)
toInt64(bucket.2) AS value
FROM buckets
)
-- Final select of the values
SELECT time, tags, upper_limit, value
FROM per_bucket
-- sort chronologically, then by tags, then numerically by upper_limit
-- toFloat64OrNull() makes sure '0.5' sorts before '2.5', and puts '+infinity' last
ORDER BY time, tags, toFloat64OrNull(upper_limit)
Creating a heatmap
A heatmap is an excellent way to see how the distribution of your metric changes over time. To create one, we need to calculate the number of events in each bucket for each time interval.
This query prepares the data in a (time, bucket_label, value) format, which is perfect for a heatmap visualization.
Since histograms usually contain counters as values, let's use buckets_rate to chart the change per second of those counters.
To make sure the heatmap is clean, we will add up all rates from differently tagged time series, add an optional metric tag filter, and remove all 0 values in the final SELECT.
WITH buckets AS (
SELECT
-- use max time resolution and all metric tags for grouping/filtering later
dt, tags,
-- buckets_rate is an anyLast-aggregated metric holding array of (upper_limit, rate)
-- arrayJoin() explodes it so each row is ONE bucket tuple
arrayJoin(anyLastMerge(buckets_rate)) AS bucket
FROM {{source}}
-- apply basic filtering
WHERE name = {{metric_name}}
AND dt BETWEEN {{start_time}} AND {{end_time}}
GROUP BY dt, tags
),
per_bucket AS (
SELECT
-- group by default interval now, and calculate average rates based on all data
{{time}} AS time,
tags,
-- Bucket upper bound (string like '0.1', '2.5', or '+infinity')
bucket.1 AS upper_limit,
-- Bucket rate (assuming counters as values)
avg(toFloat64(bucket.2)) AS rate
FROM buckets
GROUP BY time, tags, upper_limit
)
-- Final select of the values
SELECT time, upper_limit AS series, sum(rate) AS value
FROM per_bucket
[[ WHERE metricTag('path') == {{path}} ]]
GROUP BY time, upper_limit
HAVING value != 0
-- sort numerically by upper_limit
ORDER BY toFloat64OrNull(upper_limit) DESC
Calculating custom quantiles (e.g., p95)
Histograms are perfect for estimating quantiles (or percentiles) like p50 (median) or p95. This tells you the value that a certain percentage of your observations fall under. For example, a p95 latency of 250ms means that 95% of your requests were faster than 250ms.
Because our histograms store counters, we first need to calculate the individual count for each bucket at each point in time. Then, we can use ClickHouse's quantileInterpolatedWeighted function to calculate any quantile we need. This function is ideal for pre-aggregated data like histograms.
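As a quick illustration of the function call itself (using hard-coded per-bucket counts only; the real query below derives them from your data), quantileInterpolatedWeighted(level)(value, weight) treats each value as if it had occurred weight times:
-- Illustrative only: p95 over three buckets with per-bucket counts 10, 15, and 5
SELECT quantileInterpolatedWeighted(0.95)(upper_limit, bucket_count) AS p95
FROM values('upper_limit Float64, bucket_count UInt64', (0.1, 10), (0.5, 15), (1.0, 5))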
The following query calculates p99, p95, p90, p50, and other quantiles for a given metric:
WITH buckets AS (
SELECT
-- use max time resolution and all metric tags for grouping/filtering later
dt, tags,
-- buckets is an anyLast-aggregated metric holding array of (upper_limit, value)
-- arrayJoin() explodes it so each row is ONE bucket tuple
arrayJoin(anyLastMerge(buckets)) AS bucket
FROM {{source}}
-- apply basic filtering
WHERE name = {{metric_name}}
AND dt BETWEEN {{start_time}} AND {{end_time}}
GROUP BY dt, tags
),
per_bucket AS (
SELECT
-- group by default interval now, and get the latest counts per bucket
{{time}} AS time,
tags,
-- Bucket upper bound (string like '0.1', '2.5', or '+infinity')
bucket.1 AS upper_limit,
-- Latest count per bucket as unsigned int (max works because counters only increase)
max(toUInt64(bucket.2)) AS value
FROM buckets
GROUP BY time, tags, upper_limit
),
per_time AS (
SELECT
time,
toFloat64OrNull(upper_limit) AS upper_limit,
sum(value) AS value
FROM per_bucket
-- drop +Inf buckets, since they have no finite upper bound to interpolate with
WHERE upper_limit IS NOT NULL
-- here you can add filters for metricTag()
GROUP BY time, upper_limit
),
with_deltas AS (
SELECT
time,
upper_limit,
-- per-bucket count: this bucket's cumulative count minus the previous bucket's, within the same time interval (clamped at 0)
greatest(
value - lagInFrame(value, 1, 0) OVER (PARTITION BY time ORDER BY upper_limit),
toUInt64(0)
) AS bucket_count
FROM per_time
)
-- Final select of the values using quantileInterpolatedWeighted()
SELECT
time,
quantileInterpolatedWeighted(0.99)(upper_limit, bucket_count) AS p99,
quantileInterpolatedWeighted(0.95)(upper_limit, bucket_count) AS p95,
quantileInterpolatedWeighted(0.9)(upper_limit, bucket_count) AS p90,
quantileInterpolatedWeighted(0.5)(upper_limit, bucket_count) AS p50,
quantileInterpolatedWeighted(0.1)(upper_limit, bucket_count) AS p10,
quantileInterpolatedWeighted(0.05)(upper_limit, bucket_count) AS p05,
quantileInterpolatedWeighted(0.01)(upper_limit, bucket_count) AS p01
FROM with_deltas
GROUP BY time
Need help?
Please let us know at hello@betterstack.com. We're happy to help!