Vector embedding data structures and indices
The shorter your vectors are (that is, the fewer dimensions they have), the faster your queries will be. We recommend using Array(BFloat16) with our built-in model embeddinggemma:300m unless you know what you're doing.
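To get a feel for why dimension count and element type matter, here is a minimal Python sketch that estimates the raw, uncompressed size of an embedding column. The row count and the dimension values (768 and 256) are assumptions chosen for illustration, and the estimate ignores compression and per-array overhead, so treat the output as a rough guide rather than a measured Warehouse figure.

```python
# Rough, uncompressed storage estimate for an embedding column.
# All figures are illustrative assumptions, not measured Warehouse numbers.

# Width of one vector element in bytes for common float types.
BYTES_PER_ELEMENT = {"BFloat16": 2, "Float32": 4, "Float64": 8}


def vector_bytes(dimensions: int, element_type: str) -> int:
    """Return the raw size in bytes of one vector with the given element type."""
    return dimensions * BYTES_PER_ELEMENT[element_type]


def column_bytes(rows: int, dimensions: int, element_type: str) -> int:
    """Return the raw size in bytes of an embedding column holding `rows` vectors."""
    return rows * vector_bytes(dimensions, element_type)


if __name__ == "__main__":
    rows = 100_000_000  # hypothetical event count
    for dims in (768, 256):  # assumed dimensions; check your model's output size
        for element_type in ("Float32", "BFloat16"):
            gib = column_bytes(rows, dims, element_type) / 2**30
            print(f"{dims:>4} x {element_type:<8} ~ {gib:,.1f} GiB")
```

Halving the element width or the dimension count halves the number of bytes every distance computation has to read, which is where most of the speedup comes from.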
Indexing vector columns
If you're working with hundreds of millions of events, your queries might benefit from vector indices on your embedding columns.
Querying embeddings with exact vector distance
Better Stack Warehouse stores embeddings as vector columns that can be indexed and queried efficiently using ClickHouse’s array types and distance functions such as L2Distance() or cosineDistance().
Querying embeddings in Warehouse
SELECT
  description,
  cosineDistance(embedding, embedding({{description}})) AS distance
FROM {{source}}
ORDER BY distance ASC
LIMIT 5;
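This returns the five events most semantically similar to the given description. A smaller cosine distance means a closer match, which is why the results are sorted in ascending order.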
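If it helps to see what an exact query computes, the following Python sketch reproduces the same logic in memory: score every row by cosine distance (1 minus cosine similarity), sort ascending, and keep the first few. The helper names `cosine_distance` and `nearest` and the tiny 3-dimensional vectors are hypothetical, invented for illustration. This is not how Warehouse executes the query internally.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity, so 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(rows, query_vector, limit=5):
    """Exact search: score every row, sort ascending by distance, keep `limit`."""
    scored = [(cosine_distance(vec, query_vector), text) for text, vec in rows]
    scored.sort(key=lambda pair: pair[0])
    return scored[:limit]


if __name__ == "__main__":
    # Hypothetical 3-dimensional embeddings, invented for illustration.
    rows = [
        ("disk almost full", [0.9, 0.1, 0.0]),
        ("user logged in", [0.0, 0.2, 0.9]),
        ("storage volume at 95%", [0.8, 0.3, 0.1]),
    ]
    for distance, text in nearest(rows, [1.0, 0.0, 0.0], limit=2):
        print(f"{distance:.3f}  {text}")
```

Because every row is scored, the cost grows linearly with both the number of events and the vector length, which is the reason shorter vectors and indices pay off on large sources.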