Wide events vs. time series
Better Stack is powered by a purpose-built data processing pipeline for time-based events that leverages recent advances in stream processing, cloud storage, and data warehousing.
We developed the pipeline for processing massive internet-scale datasets: think petabytes to exabytes.
The pipeline has a unique combination of properties:
- Massively scalable
- No cardinality limitations
- Sub-second analytical queries
- Cost efficient
How do we achieve these seemingly mutually exclusive properties?
We work with two types of data: wide events and time series.
Wide event
Any time-based JSON with arbitrary structure, smaller than 10 MB, stored in object storage. Think an OpenTelemetry span or any structured log line; see the sample event below.
Time series
A set of highly compressed time-based metrics with a pre-defined schema stored on local NVMe SSD drives. Prometheus metrics, for example.
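For instance, a single wide event could look like the structured log line below. This is purely an illustrative sample; the duration and client_ip attributes are the ones referenced in the queries later in this section, the rest is arbitrary:

{
  "timestamp": "2024-05-01T12:00:00Z",
  "level": "info",
  "message": "GET /checkout completed",
  "duration": 0.142,
  "client_ip": "203.0.113.42"
}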
While you can ingest time series data directly, the secret sauce of our data pipeline is the integration of wide events with time series via "logs-to-metrics" expressions.
Logs-to-metrics expressions
Logs-to-metrics expressions are SQL expressions that extract specific JSON attributes from wide events in real-time into highly compressed and locally stored time series columns.
Example:
Say your structured logs (wide events) contain the attribute duration that you want to track in a dashboard over long periods across massive datasets.
A logs-to-metrics expression GetJSON('duration') of type Float64, aggregated via avg, min, and max, generates 3 columns in your time series schema that you can chart in your dashboards at scale with high query speed:
SELECT
  {{time}} AS time,
  avgMerge(duration_avg) AS duration_avg,
  minMerge(duration_min) AS duration_min,
  maxMerge(duration_max) AS duration_max
FROM {{source}}
GROUP BY time
Merge() functions are a ClickHouse specialty. You don't need to worry about them now; most of the time you will be charting trends with our drag&drop query builder anyway.
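Curious how this works under the hood? Here is a minimal ClickHouse-style sketch of how the -State and -Merge combinators pair up; the raw_events and duration_metrics tables are hypothetical stand-ins for illustration, not our actual schema:

-- Hypothetical table holding the duration values extracted from wide events.
CREATE TABLE raw_events
(
    dt       DateTime,
    duration Float64
)
ENGINE = MergeTree
ORDER BY dt;

-- Time series table storing partial aggregate states instead of raw values.
CREATE TABLE duration_metrics
(
    time         DateTime,
    duration_avg AggregateFunction(avg, Float64)
)
ENGINE = AggregatingMergeTree
ORDER BY time;

-- Ingestion rolls raw values up into per-minute aggregate states...
INSERT INTO duration_metrics
SELECT
    toStartOfMinute(dt) AS time,
    avgState(duration)  AS duration_avg
FROM raw_events
GROUP BY time;

-- ...and the dashboard query finalizes the states with the matching -Merge combinator.
SELECT
    time,
    avgMerge(duration_avg) AS duration_avg
FROM duration_metrics
GROUP BY time;

The same pattern extends to minState/minMerge and maxState/maxMerge for the other two columns.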
Ad-hoc Live tail → Explore queries
Imagine you want to see the trend of a different attribute, client_ip, over time, but you don't have such a time series column.
You can run an ad-hoc query on your wide events in Live tail → Explore with:
SELECT
  {{time}} AS time,
  GetJSON('client_ip') AS client_ip,
  COUNT(*)
FROM {{source}}
GROUP BY time, client_ip
Don't get scared by this query; most of the time, you'll use an intuitive drag&drop query builder.
This Explore query will work great, but it requires a lot more resources to process the raw wide events from object storage and will thus be much slower at scale. If needed, you can leverage our built-in sampling to make any Explore query faster, even for long time intervals.
Define metrics for trends you need to chart frequently or trends you want to track long-term over large data sets.
Time series enable you to create very fast analytical dashboards with sub-second queries even for massive datasets.
For everything else, there are wide events.
And the best thing? You can always change your mind and add more time series later. We only bill you for the time series you use.
Overview: wide events vs. time series
| | Wide events (logs & spans) | Time series (metrics) |
|---|---|---|
| Examples | Any JSON such as structured logs, OpenTelemetry traces & spans, plain text logs | Prometheus metrics, OpenTelemetry metrics, metrics extracted from wide events |
| Best used for | Filtering massive amounts of data; leveraging sampling to chart ad-hoc insights without predefined metrics | Fast dashboards charting metrics over long time periods; tracking long-term trends with metrics over time |
| Storage | Scalable object storage in the cloud | High-speed local NVMe drives |
| Cardinality | High cardinality | Low cardinality |
| Compression | Somewhat compressed | Heavily compressed |
| Data format | Row store | Column store |
| Sampling | Sampling available | Always unsampled |
| Cost | Cost-effective | Optimized for performance |