# Bringing your own embeddings

You can use any third-party API to generate embeddings for your events before sending them to Better Stack Warehouse.

Better Stack's [built-in model](https://betterstack.com/docs/warehouse/vector-embeddings/built-in-embeddings/) `embeddinggemma:300m` should cover most use cases, but if you want to use an external provider, we recommend [OpenAI's embedding models](https://platform.openai.com/docs/guides/embeddings) for the best price/performance ratio.

Send us the embedding alongside the original text in a dedicated attribute as an array of floats.
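As a rough illustration of the event shape described above, the following Python sketch builds an event that carries both the original text and its embedding as an array of floats. The attribute names `message` and `embedding` and the helper `build_event` are assumptions made for this example, not names required by Better Stack, and the vector is a short placeholder rather than real model output.

```python
import json
import math


def build_event(text: str, embedding: list[float]) -> dict:
    """Return an event holding the original text and its embedding."""
    if not embedding:
        raise ValueError("embedding must not be empty")
    if not all(isinstance(x, (int, float)) and math.isfinite(x) for x in embedding):
        raise ValueError("embedding must contain only finite numbers")
    return {
        "message": text,  # hypothetical attribute name for the original text
        "embedding": [float(x) for x in embedding],  # hypothetical attribute name
    }


# Placeholder vector; a real one comes from your embedding provider.
event = build_event("User could not log in", [0.12, -0.03, 0.5])
payload = json.dumps(event)  # JSON body to send with your usual ingestion request
print(payload)
```

Validating the vector before serializing catches `NaN` or infinite values early, since they do not serialize to valid JSON numbers.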

## Storing embeddings as a time series

Go to **Warehouse** -> [Sources](https://warehouse.betterstack.com/team/0/sources ";_blank") -> Your source -> **Time series on NVMe SSDs** and click **+ Time series**. Enter the name of the attribute that holds your embedding using JSON dot notation, choose the `BFloat16` type, and select your vector index.

![Creating a time series for your embeddings](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/02e26ada-d34f-405e-3a51-3758f6afec00/public =2818x2248)

## Querying embeddings in Warehouse

Better Stack Warehouse stores embeddings as vector columns that can be indexed and queried efficiently using ClickHouse’s vector type and distance functions such as `L2Distance()` or `cosineDistance()`.

To find the closest matches, generate an embedding for your search text using the same model and pass it to the query through a Text query variable:

[code-tabs]
```sql
[label Querying in time series]
SELECT
  text_id, -- select only the ID and fetch raw texts from JSON events to minimize data on NVMe
  cosineDistance(
    meta_embeddings_text,
    JSONExtract({{embedding}}, 'Array(Float32)')
  ) AS distance
FROM {{source}}
ORDER BY distance ASC
LIMIT 5
```
[/code-tabs]
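The query above parses `{{embedding}}` with `JSONExtract`, so the variable value needs to be a JSON array serialized as text. The Python sketch below shows one way to produce that string, together with a plain implementation of cosine distance (1 minus cosine similarity) to illustrate what the `distance` column represents. The helper names `to_query_variable` and `cosine_distance` are hypothetical and exist only for this example.

```python
import json
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def to_query_variable(embedding: list[float]) -> str:
    """Serialize an embedding as JSON array text for a Text query variable."""
    return json.dumps([float(x) for x in embedding])


# Placeholder query vector; use the same model that produced the stored embeddings.
query_embedding = [0.12, -0.03, 0.5]
print(to_query_variable(query_embedding))
```

Smaller distances mean closer matches, which is why the query sorts by `distance ASC`.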
