# Querying histograms
Histograms are a powerful tool for understanding the distribution of your metric data, such as request latencies or response sizes. Instead of just tracking an average, histograms group observations into configurable buckets, giving you a much richer view of your data's performance and characteristics.
In our system, histogram data is stored in a highly optimized format using special metric-like columns. This guide walks you through how this data is structured, how to write SQL queries to visualize it as tables and heatmaps, and how to calculate custom quantiles like p95.
For details on how to send us histograms, see Ingesting metrics API. Histograms and summaries from Prometheus, as well as histograms from OpenTelemetry, are supported out of the box.
## Understanding the histogram data structure
When you send histogram data, it's stored in several columns. The most important one is the `buckets` column. Let's look at its structure and related fields:
| Column | Data Type | Description |
|---|---|---|
| `buckets` | `Array(Tuple(String, Float64))` | An array of `(upper_bound, cumulative_count)` tuples. This is the core histogram data. |
| `buckets_sum` | `Nullable(Float64)` | The sum of all observed values in the histogram. |
| `buckets_count` | `Nullable(UInt64)` | The total count of all observed values. |
| `buckets_rate` | `Array(Tuple(String, Nullable(Float64)))` | A pre-calculated rate version of the `buckets` column. |

You can calculate the precise average value of a metric using `anyLastMerge(buckets_sum) / anyLastMerge(buckets_count)`. This is often more accurate than estimating an average from bucket midpoints.
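To get a feel for the raw data, you can peek at a few recent rows first. This is a minimal sketch: the `metrics` table and the `dt` and `name` columns are assumed names, so adjust them to your schema. If the histogram columns are stored as `anyLast` aggregate states (as the tip above suggests), wrap them in `finalizeAggregation(...)` to see their final values:

```sql
-- Assumed identifiers: metrics (table), dt (timestamp), name (metric name).
SELECT
    dt,
    buckets,        -- or finalizeAggregation(buckets) for aggregate-state columns
    buckets_sum,
    buckets_count
FROM metrics
WHERE name = {{metric_name}}
ORDER BY dt DESC
LIMIT 5
```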
Depending on how you're sending us the data, the `buckets` column may contain data in a cumulative format. For example, a value like `[('0.1', 10), ('0.5', 25), ('+inf', 30)]` could mean:
- 10 events had a value less than or equal to 0.1.
- 25 events had a value less than or equal to 0.5.
- 30 events were observed in total.
or, if the counts are per-bucket rather than cumulative:
- 10 events had a value less than or equal to 0.1.
- 25 events had a value less than or equal to 0.5 and higher than 0.1.
- 30 events had a value higher than 0.5.
To work with this data, you'll often need to convert these cumulative counts into individual counts for each bucket. Let's see how.
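Here is a minimal, self-contained sketch of that conversion; the literal array stands in for a cumulative count array extracted from `buckets`:

```sql
-- Cumulative -> per-bucket counts. ClickHouse arrays are 1-indexed and
-- out-of-range access returns the type's default value, so counts[0] is 0
-- and the first bucket keeps its own count.
WITH [10., 25., 30.] AS counts
SELECT arrayMap(i -> counts[i] - counts[i - 1], arrayEnumerate(counts)) AS per_bucket
-- Result: [10, 15, 5]
```

The same `arrayMap` idiom appears in the queries below.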
## Visualizing buckets as a table
One of the simplest ways to analyze a histogram is to display the number of events in each bucket in a table.
The following query shows the counts for each bucket over the selected time range for a metric. Fill in the query variable `{{metric_name}}` with your metric, e.g. `http_request_duration_seconds`:
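Here is a sketch of such a query. It assumes cumulative bucket counts and plain (non-aggregate-state) columns; the `metrics`, `dt`, and `name` identifiers are again assumptions, so adapt them, and add your time filter, to match your schema:

```sql
WITH
    arrayMap(t -> t.2, buckets) AS cumulative,
    arrayMap(i -> cumulative[i] - cumulative[i - 1], arrayEnumerate(cumulative)) AS per_bucket
SELECT
    bound,
    sum(events_in_bucket) AS events
FROM metrics
ARRAY JOIN
    arrayMap(t -> t.1, buckets) AS bound,    -- upper bounds, e.g. '0.1', '+inf'
    per_bucket AS events_in_bucket           -- per-bucket counts (see conversion above)
WHERE name = {{metric_name}}
    -- and restrict `dt` to the selected time range
GROUP BY bound
ORDER BY toFloat64OrNull(bound)              -- '+inf' parses as infinity and sorts last
```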
## Creating a heatmap
A heatmap is an excellent way to see how the distribution of your metric changes over time. To create one, we need to calculate the number of events in each bucket for each time interval.
This query prepares the data in a `(time, bucket_label, value)` format, which is perfect for a heatmap visualization.
Since histograms usually contain counters as values, let's use `buckets_rate` to chart the per-second change of those counters.
To keep the heatmap clean, we will add up the rates from differently tagged time series, add an optional metric tag filter, and remove all zero values in the final `SELECT`.
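A sketch of such a query follows. It assumes `buckets_rate` already holds per-bucket, per-second rates and reuses the assumed `metrics`, `dt`, and `name` identifiers; the `tags` map column in the commented filter is likewise an assumption:

```sql
SELECT
    toStartOfInterval(dt, INTERVAL 1 MINUTE) AS time,
    bucket_label,
    sum(rate) AS value                          -- add up rates across tagged series
FROM metrics
ARRAY JOIN
    arrayMap(t -> t.1, buckets_rate) AS bucket_label,
    arrayMap(t -> t.2, buckets_rate) AS rate
WHERE name = {{metric_name}}
    -- AND tags['status'] = '200'               -- optional metric tag filter
GROUP BY time, bucket_label
HAVING value > 0                                -- drop empty cells
ORDER BY time, toFloat64OrNull(bucket_label)
```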
## Calculating custom quantiles (e.g., p95)
Histograms are perfect for estimating quantiles (or percentiles) like p50 (median) or p95. This tells you the value that a certain percentage of your observations fall under. For example, a p95 latency of 250ms means that 95% of your requests were faster than 250ms.
Because our histograms store counters, we first need to calculate the individual count for each bucket over the selected time range. Then, we can use ClickHouse's `quantileInterpolatedWeighted` function to calculate any quantile we need. This function is ideal for pre-aggregated data like histograms.
The following query calculates p99, p95, p90, p50, and other quantiles for a given metric:
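A sketch under the same assumptions (`metrics`, `dt`, and `name` are placeholder identifiers; cumulative bucket counters; a single time series per metric, so also group by your tag columns if you have several). Extend the level list to taste:

```sql
SELECT
    quantilesInterpolatedWeighted(0.99, 0.95, 0.90, 0.50)(upper_bound, weight) AS quantiles
FROM
(
    SELECT
        upper_bound,
        -- cumulative -> per-bucket counts across bounds; weights must be integers
        toUInt64(greatest(cum_increase - lagInFrame(cum_increase, 1, 0.)
            OVER (ORDER BY upper_bound ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 0)) AS weight
    FROM
    (
        SELECT
            toFloat64OrZero(bound) AS upper_bound,   -- '+inf' parses as infinity
            max(cum) - min(cum) AS cum_increase      -- counter increase over the window
        FROM metrics
        ARRAY JOIN
            arrayMap(t -> t.1, buckets) AS bound,
            arrayMap(t -> t.2, buckets) AS cum
        WHERE name = {{metric_name}}
            -- and restrict `dt` to the selected time range
        GROUP BY upper_bound
    )
)
```

If a quantile lands in the `+inf` bucket, the result is infinite; defining a finite top bucket close to your real maximum keeps the estimate meaningful.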
## Need help?
Please let us know at hello@betterstack.com. We're happy to help!