# Schema migration Feb ’26

We’re unifying how we store user-defined metrics extracted from logs and spans alongside Prometheus metrics.

Going forward, there’s just one way to query your metrics.

## Schema overview

Our new row-based schema stores all data in a unified format:

- Metrics stored as named values with labels in a `tags` map
- Original log data accessible via `_row_type = 'log'`
- Aggregated metric functions for efficient querying

| Old schema | New schema |
|---|---|
| `response_status` | `label('response_status')` |
| `buckets` | `bucket_bounds` + `bucket_counts` |
| `p50`, `p90`, `p95`, `p99` | `histogramQuantile(0.5)`, `histogramQuantile(0.9)`, ... |
| `_row_type = 'logs'` | `_row_type = 'log'` |
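
To make the mapping concrete, here is a sketch of migrating a percentile query. The metric name `duration` and the old-schema shape are illustrative, not taken from any specific dashboard:

```sql
[label Example: migrating a p95 query]
-- Before (old schema, illustrative):
-- SELECT {{time}} AS time, p95
-- FROM {{source}}
-- WHERE _row_type = 'logs'
-- GROUP BY time

-- After (new schema):
SELECT {{time}} AS time, histogramQuantile(0.95) AS p95
FROM {{source}}
WHERE dt BETWEEN {{start_time}} AND {{end_time}}
  AND name = 'duration'
GROUP BY time
```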

## Accessing tags and labels

All non-aggregated columns that were previously accessed directly must now use the `label()` function. Labels are always strings, so `toString(label(...))` is redundant.

```sql
[label Accessing labels]
-- Before:
-- SELECT response_status, request_method, user_agent
-- FROM {{source}}
-- WHERE response_status >= 500

-- After:
SELECT label('response_status'), label('request_method'), label('user_agent')
FROM {{source}}
WHERE toUInt16OrNull(label('response_status')) >= 500
```

## Standard columns

The following columns are part of the schema itself and don't need the `label()` wrapper.

```
dt, name, _row_type, 
series_id, tags, events_count, 
value_avg, value_min, value_max, rate_avg, 
bucket_bounds, bucket_counts, bucket_sum, bucket_count, 
bucket_min, bucket_max, bucket_quantiles
```

## Row type filtering

Each ingested event creates multiple rows: one for the original event (`_row_type = 'log'` or `'span'`) and one for each extracted metric (`_row_type = 'extracted_metric'`).

| Value | Alias | Description |
|---|---|---|
| `1` | `'log'` | Original log event |
| `2` | `'extracted_metric'` | Metrics extracted from logs/spans |
| `3` | `'span'` | Original span event |

**Count log events**

```sql
[label Counting log events]
SELECT {{time}} AS time, countMerge(events_count) AS value
FROM {{source}}
WHERE dt BETWEEN {{start_time}} AND {{end_time}}
  AND _row_type = 'log'
GROUP BY time
```

**Don't use `_row_type = 'log'` or `'span'` when querying named metrics**

```sql
[label Querying named metrics]
-- WRONG: name = 'duration' is an extracted metric, not a span
-- SELECT {{time}} AS time, histogramQuantile(0.95) AS p95
-- FROM {{source}}
-- WHERE dt BETWEEN {{start_time}} AND {{end_time}}
--   AND _row_type = 3
--   AND name = 'duration'
-- GROUP BY time

-- CORRECT: Filter by metric name only
SELECT {{time}} AS time, histogramQuantile(0.95) AS p95
FROM {{source}}
WHERE dt BETWEEN {{start_time}} AND {{end_time}}
  AND name = 'duration'
GROUP BY time
```

## Type handling for labels

Since labels are always strings, you must handle type conversions explicitly when comparing to numbers.

**Numeric comparison**

```sql
[label Numeric comparison on labels]
-- Range check
WHERE toUInt16OrNull(label('response_status')) BETWEEN 200 AND 299

-- Inequality
WHERE toFloat64OrNull(label('response_status')) >= 500
AND toFloat64OrNull(label('response_status')) <= 599
```

## Histograms and buckets

The new schema replaces `buckets` with `bucket_bounds` and `bucket_counts`. The counts are stored as deltas, so use `sumForEachMerge()` to sum them.

```sql
[label Accessing bucket data]
-- Old pattern:
-- arrayJoin(anyLastMerge(buckets)) AS bucket

-- New pattern:
SELECT
  time,
  arrayJoin(arrayZip(bounds, counts)) AS bucket,
  bucket.1 AS bound,
  bucket.2 AS count
FROM
(
  SELECT
    {{time}} AS time,
    anyLastMerge(bucket_bounds) AS bounds,
    sumForEachMerge(bucket_counts) AS counts
  FROM {{source}}
  WHERE ...
  GROUP BY time
)
```

**Quantiles and percentiles**
Use the `histogramQuantile(p)` helper function for single metrics, or `quantilePrometheusHistogramArrayMergeIf(p)(bucket_quantiles, condition)` for multiple metrics.

```sql
[label Calculating quantiles (single metric)]
SELECT
  {{time}} AS time,
  histogramQuantile(0.95) AS p95
FROM {{source}}
WHERE dt BETWEEN {{start_time}} AND {{end_time}}
  AND name = 'http_request_duration_seconds'
GROUP BY time
```
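
When comparing several metrics in one query, the `quantilePrometheusHistogramArrayMergeIf(p)(bucket_quantiles, condition)` form mentioned above can compute a quantile per metric. This is a sketch; the two metric names are hypothetical examples:

```sql
[label Calculating quantiles (multiple metrics)]
SELECT
  {{time}} AS time,
  quantilePrometheusHistogramArrayMergeIf(0.95)(bucket_quantiles, name = 'http_request_duration_seconds') AS request_p95,
  quantilePrometheusHistogramArrayMergeIf(0.95)(bucket_quantiles, name = 'db_query_duration_seconds') AS db_p95
FROM {{source}}
WHERE dt BETWEEN {{start_time}} AND {{end_time}}
  AND name IN ('http_request_duration_seconds', 'db_query_duration_seconds')
GROUP BY time
```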

## Event counting

For aggregated metrics queries, use `countMerge(events_count)`. Because a single event can be represented as multiple metric rows, you need to add the `AND _row_type = 'log'` filter to get accurate event counts.

```sql
[label Counting events in aggregated queries]
SELECT
  {{time}} AS time,
  countMerge(events_count) AS value
FROM {{source}}
WHERE dt BETWEEN {{start_time}} AND {{end_time}}
  AND _row_type = 'log'
GROUP BY time
```

This change will go into effect for existing organizations during February ’26.

Let us know at hello@betterstack.com if you have any feedback.
We hope this change will make the product more straightforward and intuitive going forward.
