Better Stack Warehouse is powered by a purpose-built data processing pipeline for time-based events that leverages recent advances in stream processing, cloud storage, and analytical databases.
We developed the pipeline for processing massive internet-scale datasets: think petabytes to exabytes.
The pipeline has a unique combination of properties that usually trade off against each other: cost-effective storage for massive amounts of raw data and fast analytical queries over it.
How do we achieve these seemingly mutually exclusive properties?
We work with two types of data: JSON events and time series.
A JSON event is any time-based JSON with an arbitrary structure, smaller than 10 MB, stored in object storage. Think an OpenTelemetry span, a structured log line, or a JavaScript object containing the properties of a user action.
A time series is a set of highly compressed time-based metrics with a pre-defined schema, stored on local NVMe SSD drives. Prometheus metrics, for example.
While you can ingest time series data directly, the secret sauce of our data pipeline is the tight integration between JSON events and time series.
Go to Warehouse → Sources → Your source → Time series on NVMe SSD and add SQL expressions that extract specific JSON attributes from JSON events in real time into highly compressed, locally stored time series columns.
In Telemetry, we use time series for Metrics in a similar way.
You can learn more tips and tricks in Extracting metrics from logs.
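To get a feel for what such an extraction expression evaluates to, here is a standalone ClickHouse query that runs JSONExtract against an inline sample event. The event shape is made up, and the explicit JSON argument is only needed in this standalone form; in Warehouse you supply just the expression and the pipeline provides each event.

```sql
-- Illustration only: extract the duration attribute from an inline JSON event.
SELECT JSONExtract(
    '{"process": "order_worker", "duration": 42.5}',  -- hypothetical event
    'duration',
    'Nullable(Float64)'
) AS duration
```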
Say your JSON event contains the attribute duration that you want to use in a client-facing API generated with Queries.
A SQL expression JSONExtract('duration', 'Nullable(Float64)') of type Float64 aggregated via avg, min, and max generates three columns in your time series schema.
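Conceptually, the result is similar to the ClickHouse table sketched below. The table name and engine details are assumptions for illustration; the actual schema is created and managed by Warehouse.

```sql
-- Conceptual sketch, not the actual Warehouse schema: three aggregate-state
-- columns generated from the duration attribute.
CREATE TABLE duration_time_series_example
(
    time         DateTime,
    duration_avg AggregateFunction(avg, Nullable(Float64)),
    duration_min AggregateFunction(min, Nullable(Float64)),
    duration_max AggregateFunction(max, Nullable(Float64))
)
ENGINE = AggregatingMergeTree
ORDER BY time
```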
After creating the time series, you use the new columns in your API:
```sql
SELECT
  {{time}} AS time,
  avgMerge(duration_avg),
  minMerge(duration_min),
  maxMerge(duration_max)
FROM {{source}}
GROUP BY time
```
*Merge() functions are a ClickHouse specialty. You don’t need to worry about them now — most of the time you will be charting trends with our Drag & drop query builder anyway.
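If you are curious what the State/Merge pair does, here is a minimal self-contained sketch in plain ClickHouse. The table and column names are hypothetical and unrelated to your Warehouse schema.

```sql
-- Partial aggregation states are produced with -State functions and
-- combined at query time with -Merge functions.
SELECT avgMerge(duration_state) AS avg_duration
FROM
(
    SELECT
        toStartOfMinute(event_time) AS minute,
        avgState(duration) AS duration_state  -- one partial state per minute
    FROM request_events_example
    GROUP BY minute
)
```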
Define time series for queries you need to execute frequently or trends you want to track long-term over large datasets.
Time series enable you to create very fast analytical client-facing API endpoints even for massive datasets.
For everything else, there's wide events.
And the best thing? You can always change your mind and add more time series later. We only bill you for the time series you use.
When you select No aggregation for a time series, we will create a single new column without any *Merge() function and split your data into multiple records to keep every combination of non-aggregated values distinct.
Non-aggregated time series can be used in WHERE or GROUP BY clauses.
Say your JSON event contains the attribute process alongside the aggregated duration, and you want to use it in Queries for filtering and grouping the results. A SQL expression JSONExtract('process', 'Nullable(String)') of type String without any aggregation generates a single column in your schema. You can then filter and group by it:
```sql
SELECT
  {{time}} AS time,
  process,
  avgMerge(duration_avg),
  minMerge(duration_min),
  maxMerge(duration_max)
FROM {{source}}
WHERE process LIKE 'order_%'
GROUP BY time, process
```
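Conceptually, the non-aggregated process column is stored as a plain value next to the aggregate-state columns, roughly like the sketch below. Names and engine details are illustrative; the real schema is managed by Warehouse.

```sql
-- Sketch only: a plain (non-aggregated) column can be used directly
-- in WHERE and GROUP BY, unlike the aggregate-state columns.
CREATE TABLE duration_by_process_example
(
    time         DateTime,
    process      Nullable(String),  -- no aggregation
    duration_avg AggregateFunction(avg, Nullable(Float64)),
    duration_min AggregateFunction(min, Nullable(Float64)),
    duration_max AggregateFunction(max, Nullable(Float64))
)
ENGINE = AggregatingMergeTree
ORDER BY time
```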
Since a row exists for every unique combination of your non-aggregated values, we recommend keeping the cardinality of time series without aggregations as low as possible.
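A quick way to sanity-check that cardinality is to count the distinct values of the non-aggregated column. The query below is only a sketch and assumes the process column from the example above exists in your source.

```sql
-- Rough cardinality check for a non-aggregated column.
SELECT uniqExact(process) AS distinct_processes
FROM {{source}}
```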
| | JSON events | Time series |
|---|---|---|
| Examples | Any JSON such as structured logs, OpenTelemetry traces & spans, plain text logs | Prometheus metrics, OpenTelemetry metrics, time series extracted from JSON events |
| Best for | Keeping large amounts of raw unstructured data with high cardinality | Fast frequently executed queries powering analytical APIs |
| Storage | Scalable object storage in the cloud | High-speed local NVMe drives |
| Cardinality | High cardinality | Low cardinality |
| Compression | Somewhat compressed | Heavily compressed |
| Data format | Row store | Column store |
| Sampling | Sampling available | Always unsampled |
| Cost | Cost-effective | Optimized for performance |
Are you planning to ingest over 100 TB per month? Need to store data in a custom data region or your own S3 bucket? Need a fast query speed even for large datasets? Please get in touch at hello@betterstack.com.