Bringing your own embeddings

You can use any third-party API to generate embeddings for your events before sending them to the Better Stack data warehouse.

Better Stack's built-in model, embeddinggemma:300m, should cover most use cases. If you want to use an external provider, we recommend OpenAI's embedding models for the best price/performance ratio.

Send us the embedding as an array of floats in a dedicated attribute, alongside the original text.
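As a rough illustration, an event carrying both the text and its embedding could be assembled as in the sketch below. The attribute names `message` and `embedding` are assumptions picked for this example, and `fake_embed` is a hypothetical stand-in for a call to your embedding provider; only the overall shape (original text plus an array of floats in its own attribute) comes from the description above.

```python
import json
from typing import Callable, List


def build_event(text: str, embed: Callable[[str], List[float]]) -> str:
    """Return a JSON event holding the original text and its embedding."""
    # Coerce to plain floats so the attribute serializes as an array of floats.
    vector = [float(x) for x in embed(text)]
    event = {
        "message": text,      # original text (attribute name is an assumption)
        "embedding": vector,  # dedicated attribute (name is an assumption)
    }
    return json.dumps(event)


def fake_embed(text: str) -> List[float]:
    # Hypothetical placeholder: replace with a call to your embedding provider.
    return [0.25, -0.5, 0.75]


payload = build_event("disk usage above 90%", fake_embed)
print(payload)
```

The resulting JSON string can then be sent through whichever ingestion method you already use for the source.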

Storing embeddings as a time series

Go to Warehouse -> Sources -> Your source -> Time series on NVMe SSDs and click + Time series. Use JSON dot notation to enter the name of the target column from the previous step, then choose the BFloat16 type and your Vector index.

Creating a time series for your embeddings

Querying embeddings in Warehouse

Better Stack Warehouse stores embeddings as vector columns that can be indexed and queried efficiently using ClickHouse’s vector support and distance functions such as cosineDistance() or L2Distance().
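To make the ranking in the query in this section easier to interpret, here is a small Python sketch of what a cosine distance measures: one minus the cosine similarity of two vectors. It is an illustration of the math only, not the Warehouse implementation, and the helper name `cosine_distance` is made up for this example.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [1.0, 0.0]))   # same direction: 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # orthogonal: 1.0
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # opposite direction: 2.0
```

Smaller values mean closer matches, which is why the query sorts by distance in ascending order.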

To find the closest match, generate an embedding of your search text using the same model that produced the stored embeddings, and use a Text query variable to send us the embedding:

Querying in time series
SELECT
  text_id, -- look up the raw text in JSON events by this ID to minimize data on NVMe
  cosineDistance(
    meta_embeddings_text,
    JSONExtract({{embedding}}, 'Array(Float32)')
  ) AS distance
FROM {{source}}
ORDER BY distance ASC
LIMIT 5
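Because the query above parses {{embedding}} with JSONExtract as Array(Float32), the value of the Text variable presumably needs to be a JSON array serialized as a string. A minimal sketch of producing such a value, assuming the search embedding is already available as a list of floats (the helper name `to_query_variable` is hypothetical):

```python
import json
from typing import List


def to_query_variable(embedding: List[float]) -> str:
    """Serialize an embedding as a JSON array string for a Text variable."""
    return json.dumps([float(x) for x in embedding])


value = to_query_variable([0.25, -0.5, 0.75])
print(value)  # [0.25, -0.5, 0.75]
```

The search embedding should have the same dimensionality as the stored vectors, which follows from using the same model for both.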