Vector embedding data structures and indices

The shorter your vectors are, that is, the fewer dimensions they have, the faster your queries will be. We recommend storing embeddings as Array(BFloat16) and generating them with our built-in model embeddinggemma:300m, unless you have a specific reason to choose a different type or model.
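To see how long your stored vectors actually are, you can count events per vector length. This is a minimal sketch: it assumes the embeddings live in a column named embedding and reuses the {{source}} placeholder from the query example on this page. Both names may differ in your setup.

```sql
-- Assumption: embeddings are stored in an array column named `embedding`.
-- length() returns the number of elements in an array, i.e. the vector's dimensions.
SELECT
  length(embedding) AS dimensions,
  count() AS events
FROM {{source}}
GROUP BY dimensions
ORDER BY events DESC;
```

Each BFloat16 element takes 2 bytes, compared to 4 bytes for Float32, so the same vector stored as Array(BFloat16) needs roughly half the space.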

Indexing vector columns

If you're working with hundreds of millions of events, your queries might benefit from vector indices on your embedding columns.
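For orientation, this is roughly what a vector index looks like in open-source ClickHouse. Treat it as a sketch under assumptions, not as the way Warehouse creates indices: the table name events and the index name embedding_idx are hypothetical, the dimension count 768 is only an example, and the exact argument list of vector_similarity depends on the ClickHouse version.

```sql
-- Hypothetical table and index names.
-- The last argument must match the number of dimensions of your vectors.
ALTER TABLE events
  ADD INDEX embedding_idx embedding
  TYPE vector_similarity('hnsw', 'cosineDistance', 768);

-- Build the index for rows that already exist in the table.
ALTER TABLE events MATERIALIZE INDEX embedding_idx;
```

An HNSW index returns approximate nearest neighbors, so results can differ slightly from an exact distance scan in exchange for faster queries on large tables.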


Querying embeddings with exact vector distance

Better Stack Warehouse stores embeddings as vector columns, which are ClickHouse arrays of floating-point numbers. You can query them with ClickHouse's distance functions such as cosineDistance() or L2Distance(). Without a vector index, the query computes the exact distance to every matching row.

Querying embeddings in Warehouse
SELECT
  description,
  -- cosineDistance returns a distance: smaller values mean more similar vectors
  cosineDistance(embedding, embedding({{description}})) AS distance
FROM {{source}}
ORDER BY distance ASC
LIMIT 5;

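This returns the five events whose embeddings have the smallest cosine distance to the embedding of the given description, that is, the events most semantically similar to it.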
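If you prefer a score where higher means more similar, you can convert the distance into a cosine similarity, since cosineDistance is defined as 1 minus the cosine similarity. A minimal sketch, reusing the same column and placeholders as the query above:

```sql
SELECT
  description,
  -- cosine similarity = 1 - cosine distance; higher values mean more similar
  1 - cosineDistance(embedding, embedding({{description}})) AS similarity
FROM {{source}}
ORDER BY similarity DESC
LIMIT 5;
```

Both queries return the same rows in the same order. Only the reported score and the sort direction change.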