Writing fast s3Cluster queries

Optimize your query performance to get faster results when exploring your raw events.

Running the same query repeatedly?

Extract values as unaggregated fields in Sources → Your source → the Time series on NVMe SSD tab, instead of querying raw JSON events each time. This converts your frequently accessed events into time series metrics for much faster retrieval.

Learn more about Extracting time series from events.

Understanding your data types

Our infrastructure handles JSON events and time series differently, and time series offer much faster query performance. Learn more about JSON events vs. time series.

If you're frequently running the same queries, consider whether your use case would benefit from converting events to time series.

When to use time series vs. JSON events

Use time series when:

  • Running the same query repeatedly.
  • You need sub-second query performance.
  • Working with numerical data and aggregations.

Use JSON events when:

  • Performing ad-hoc exploration.
  • You need full context and details.
  • Debugging specific issues.
  • Working with unstructured data.

Optimizing ad-hoc event queries

For faster queries on your JSON events, try these optimization techniques:

1. Narrow your time range

Shorter time frames significantly reduce the amount of data processed.

Narrowing the time range
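As a rough sketch, you can also narrow the time range directly in SQL by filtering on dt inside each inner query, so both the recent-logs branch and the s3Cluster branch read less data. This assumes dt is the event timestamp column and reuses the placeholder table names from the s3Cluster example in this guide; the one-hour window is an arbitrary illustration.

```sql
-- Sketch: restrict both branches to the last hour (window chosen for illustration).
-- Assumes dt is the event timestamp; table names are placeholders.
SELECT dt, raw
FROM (
  SELECT dt, raw
  FROM remote(t123456_your_source_logs)
  WHERE dt BETWEEN now() - INTERVAL 1 HOUR AND now()
  UNION ALL
  SELECT dt, raw
  FROM s3Cluster(primary, t123456_your_source_s3)
  WHERE _row_type = 1
    AND dt BETWEEN now() - INTERVAL 1 HOUR AND now()  -- time filter in the inner query
)
ORDER BY dt ASC
LIMIT 5000
FORMAT JSONEachRow
```

Putting the time filter in the inner queries rather than only in the outer one keeps it close to where the data is read.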

2. Make your s3Cluster function more specific

Using a more specific WHERE clause with the s3Cluster function

```sql
SELECT dt, raw
FROM (
  SELECT dt, raw
  FROM remote(t123456_your_source_logs)
  UNION ALL
  SELECT dt, raw
  FROM s3Cluster(primary, t123456_your_source_s3)
  WHERE _row_type = 1 -- add as many filters as you can here, in the inner query
)
WHERE raw LIKE '%My text%'
ORDER BY dt ASC
LIMIT 5000
FORMAT JSONEachRow
```
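Following that advice, here is a sketch of the same query with the raw LIKE text filter moved from the outer query into both inner queries, so each branch can discard non-matching rows before they are merged. The result set is the same. ClickHouse can sometimes push outer predicates down into subqueries on its own, so treat any speedup as something to measure on your data rather than assume.

```sql
-- Sketch: same result as the query above, with the text filter pushed into each branch.
SELECT dt, raw
FROM (
  SELECT dt, raw
  FROM remote(t123456_your_source_logs)
  WHERE raw LIKE '%My text%'
  UNION ALL
  SELECT dt, raw
  FROM s3Cluster(primary, t123456_your_source_s3)
  WHERE _row_type = 1
    AND raw LIKE '%My text%'  -- filter applied where the S3 data is read
)
ORDER BY dt ASC
LIMIT 5000
FORMAT JSONEachRow
```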

3. Query specific sources

Instead of searching across all sources, target the specific source containing your data.

  • Select individual sources in the source dropdown.
  • Avoid querying All sources when possible.

Querying a single source

4. Use sampling for exploration

Enable Sampling to work with a representative subset of your data while developing and testing queries.

Using sampling

5. Request additional compute

For consistently slow queries on large datasets, we can add more compute power to your cluster:

  • Share a slow query link with our support team.
  • We'll analyze your data volume and query performance.
  • Small adjustments are often available at no charge.
  • Larger performance improvements for very large datasets may require a custom cluster at an additional cost.

Custom clusters for high performance

For applications requiring consistently fast queries over large datasets and long time periods, we can provision dedicated compute resources:

  • Tailored setup: Custom cluster sized for your specific needs.
  • Dedicated compute: No resource sharing with other workloads.
  • Faster speeds: Optimized for your query patterns and data volume.
  • Additional cost: Comes with dedicated infrastructure pricing.

Contact our support team at hello@betterstack.com to discuss custom cluster options for your use case 📩

Getting help

In most cases, we can make queries as fast as you need, either by optimizing the query or by scaling the infrastructure. If you're experiencing slow query performance:

  1. Try the optimization techniques above.
  2. Share a slow Live tail link with our support team using the in-app chat or at hello@betterstack.com.
  3. Describe your performance requirements and use case.

We're happy to help find the right balance of performance and cost for your needs 🚀

Better Stack support team