Bringing your own embeddings
You can use any third-party API to generate embeddings for your events before sending them to Better Stack Warehouse.
Better Stack's built-in model embeddinggemma:300m should cover most use cases, but if you prefer an external provider, we recommend OpenAI's embedding models for the best price/performance ratio.
Send us the embedding alongside the original text in a dedicated attribute as an array of floats.
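As a rough illustration of the event shape described above, the sketch below attaches an embedding to an event as a plain array of floats. The `message` field and the nested `meta.embeddings.text` attribute are assumptions chosen for this example (the path mirrors the `meta_embeddings_text` column used in the query later on), and the three-element vector is a stand-in for whatever your embedding provider returns.

```python
import json


def build_event(text, embedding):
    """Return an event dict carrying the original text and its embedding."""
    if not embedding:
        raise ValueError("embedding must be non-empty")
    return {
        # Hypothetical field name for the original text.
        "message": text,
        # Hypothetical dedicated attribute; any attribute path works
        # as long as the value is an array of floats.
        "meta": {"embeddings": {"text": [float(x) for x in embedding]}},
    }


# Stand-in vector; in practice this comes from your embedding model.
event = build_event("disk usage above 90% on db-1", [0.12, -0.5, 0.33])

# JSON body you would send to your source's ingestion endpoint.
payload = json.dumps(event)
```

Keeping the embedding in its own attribute makes it easy to point a time series column at it without touching the rest of the event.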
Storing embeddings as a time series
Go to Warehouse -> Sources -> Your source -> Time series on NVMe SSDs and click + Time series. Using JSON dot notation, enter the name of the attribute that holds the embedding from the previous step, then choose BFloat16 and select your Vector index.
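BFloat16 keeps the sign, the 8 exponent bits, and the top 7 mantissa bits of a 32-bit float, so each stored value takes half the space at reduced precision. The sketch below approximates that effect by truncating the low 16 bits of the float32 representation. This is an assumption made for illustration: the Warehouse may round rather than truncate, and the sample values are made up.

```python
import struct


def to_bfloat16(x):
    """Approximate BFloat16 storage by keeping the top 16 bits of float32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


# Made-up embedding components.
original = [0.12345678, -0.98765432, 0.5]
stored = [to_bfloat16(v) for v in original]

# Largest absolute difference introduced by the reduced precision.
max_error = max(abs(a - b) for a, b in zip(original, stored))
```

For values with magnitude below 1, the error per component stays under 0.004, which is usually small compared with the distances between unrelated texts.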
Querying embeddings in Warehouse
Better Stack Warehouse stores embeddings as vector columns that can be indexed and queried efficiently using ClickHouse’s vector type and distance functions such as cosineDistance() or L2Distance().
To find the closest matches, generate an embedding for your search text using the same model, then pass it to the query through a Text query variable:
SELECT
  text_id, -- return only the ID; fetch raw texts from JSON events to minimize data on NVMe
  cosineDistance(
    meta_embeddings_text,
    JSONExtract({{embedding}}, 'Array(Float32)')
  ) AS distance
FROM {{source}}
ORDER BY distance ASC
LIMIT 5
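To make the ranking concrete, here is a small local sketch of what the query computes: cosine distance is 1 minus cosine similarity, so smaller values mean closer matches. The helper names, the toy two-dimensional vectors, and the row IDs are all hypothetical. `to_query_variable` shows one way to serialize the search embedding as JSON text so that `JSONExtract` can parse it.

```python
import json
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def to_query_variable(embedding):
    """Serialize an embedding as JSON text for a text query variable."""
    return json.dumps([float(x) for x in embedding])


# Toy stored rows as (text_id, embedding); real vectors are much longer.
rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.7, 0.7])]
query = [1.0, 0.1]

# Same idea as ORDER BY distance ASC LIMIT 2.
ranked = sorted(rows, key=lambda row: cosine_distance(row[1], query))
top_ids = [text_id for text_id, _ in ranked[:2]]

# Value to paste or send as the embedding variable.
variable_value = to_query_variable(query)
```

The query embedding must have the same number of dimensions as the stored ones, which is why both sides need to come from the same model.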