Wide events vs. time series

Better Stack is powered by a purpose-built data processing pipeline for time-based events that leverages recent advances in stream processing, cloud storage, and data warehousing.

We developed the pipeline for processing massive internet-scale datasets: think petabytes to exabytes.

The pipeline has a unique combination of properties:

  • Massively scalable
  • No cardinality limitations
  • Sub-second analytical queries
  • Cost efficient

How do we achieve these seemingly mutually exclusive properties?

We work with two types of data: wide events and time series.

Wide event

Any time-based JSON with arbitrary structure, smaller than 10 MB, stored in object storage. Think OpenTelemetry span or any structured log line.
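For illustration, a single structured log line like the one below would be a wide event. The field names here are hypothetical; any JSON shape works:

Example wide event (structured log line)
{
  "dt": "2024-05-14T09:21:07.123Z",
  "level": "info",
  "message": "GET /checkout completed",
  "duration": 0.142,
  "status": 200,
  "client_ip": "203.0.113.42"
}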

Time series

A set of highly compressed time-based metrics with a pre-defined schema stored on local NVMe SSD drives. Prometheus metrics, for example.

While you can ingest time series data directly, the secret sauce of our data pipeline is the integration of wide events with time series via “logs-to-metrics” expressions.

Logs-to-metrics expressions

Logs-to-metrics expressions are SQL expressions that extract specific JSON attributes from wide events in real time into highly compressed, locally stored time series columns.

Example: Say your structured logs (wide events) contain the attribute duration that you want to track in a dashboard over long time periods across massive datasets.

A logs-to-metrics expression GetJSON('duration') of type Float64, aggregated via avg, min, and max, generates 3 columns in your time series schema that you can chart in your dashboards at scale with high query speed:

Logs-to-metrics query example
SELECT
  {{time}} AS time,
  avgMerge(duration_avg),
  minMerge(duration_min),
  maxMerge(duration_max)
FROM {{source}}
GROUP BY time

The -Merge functions (avgMerge, minMerge, maxMerge) are a ClickHouse specialty. You don’t need to worry about them now; most of the time you will be charting trends with our drag & drop query builder anyway.
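For the curious, here is a rough sketch of how such aggregations can be modeled in ClickHouse. The table and column names are illustrative only, not our actual schema:

Time series schema sketch (illustrative ClickHouse DDL)
-- Hypothetical table: each logs-to-metrics aggregation becomes an AggregateFunction
-- column that stores partial aggregation states; avgMerge, minMerge, and maxMerge
-- finalize those states at query time.
CREATE TABLE duration_metrics
(
    time         DateTime,
    duration_avg AggregateFunction(avg, Float64),
    duration_min AggregateFunction(min, Float64),
    duration_max AggregateFunction(max, Float64)
)
ENGINE = AggregatingMergeTree
ORDER BY time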

Ad-hoc Live tail → Explore queries

Imagine you want to see the trend of a different attribute, client_ip, over time, but you don’t have a time series column for it. You can run an ad-hoc query on your wide events in Live tail → Explore:

Ad-hoc Explore query on wide events
SELECT
  {{time}} AS time,
  GetJSON('client_ip') AS client_ip,
  COUNT(*)
FROM {{source}}
GROUP BY time, client_ip

Don’t be scared by this query; most of the time, you’ll use the intuitive drag & drop query builder.

This Explore query will work great, but it requires a lot more resources to process the raw wide events from object storage and will thus be much slower at scale. If needed, you can leverage our built-in sampling to make any Explore query faster, even over long time intervals.
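As a sketch, a sampled variant of the query above could look like the following, assuming the underlying ClickHouse SAMPLE clause is exposed on your source; the actual built-in sampling controls in Explore may differ:

Hypothetical sampled Explore query
SELECT
  {{time}} AS time,
  GetJSON('client_ip') AS client_ip,
  COUNT(*) AS sampled_count  -- approximate: with a 10% sample, roughly one in ten events is counted
FROM {{source}} SAMPLE 0.1
GROUP BY time, client_ip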

Time series enable you to create very fast analytical dashboards with sub-second queries even for massive datasets.

For everything else, there's wide events.

And the best thing? You can always change your mind and add more time series later. We only bill you for the time series you use.

Overview: wide events vs. time series

|               | Wide events (logs & spans) | Time series (metrics) |
|---------------|----------------------------|-----------------------|
| Examples      | Any JSON such as structured logs, OpenTelemetry traces & spans, plain text logs | Prometheus metrics, OpenTelemetry metrics, metrics extracted from wide events |
| Best used for | Filtering massive amounts of data; leveraging sampling to chart ad-hoc insights without predefined metrics | Fast dashboards charting metrics over long time periods; tracking long-term trends with metrics over time |
| Storage       | Scalable object storage in the cloud | High-speed local NVMe drives |
| Cardinality   | High cardinality | Low cardinality |
| Compression   | Somewhat compressed | Heavily compressed |
| Data format   | Row store | Column store |
| Sampling      | Sampling available | Always unsampled |
| Cost          | Cost-effective | Optimized for performance |

Are you planning to ingest over 100 TB per month? Need to store data in a custom data region or your own S3 bucket? Need a fast query speed even for large datasets? Please get in touch at hello@betterstack.com.