# Using built-in embeddings

Better Stack Warehouse allows you to generate text embeddings without calling any third-party API.

Go to **Warehouse** -> [Sources](https://warehouse.betterstack.com/team/0/sources ";_blank") -> Your source -> **Embeddings**, then choose which text field to generate embeddings from and which JSON attribute to store them in.

![Generating embeddings in Better Stack Warehouse](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/1998a799-dafa-4440-69aa-e5dd97ca4f00/md2x =3354x2080)

After setting this up, **we'll calculate the embedding for all new events** you send into the source 🚀

[note]
#### Can't see your events immediately?

Embeddings can take some time to process, especially for large sets of data.

For example, if you send 10 GB of events at roughly 1,000 tokens each, it can take a while before all embeddings are processed. Your data will appear as soon as its embeddings are calculated.
[/note]

## Storing embeddings as a time series

Go to **Warehouse** -> [Sources](https://warehouse.betterstack.com/team/0/sources ";_blank") -> Your source -> **Time series on NVMe SSDs** and click **+ Time series**.

Use **JSON dot notation** to write the name of the **target column from the previous step** and choose `BFloat16`.

You'll also need to **create a time series with the data you want to find** based on the embedding. The easiest way is to create a non-aggregate time series for the text field as well. To minimize the data volume stored on NVMe, create a time series containing only an ID instead, and use it to retrieve the full JSON event afterwards.

![Creating a time series for your embeddings](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/02e26ada-d34f-405e-3a51-3758f6afec00/public =2818x2248)
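
The ID-based lookup mentioned above could look roughly like the following sketch. It assumes the ID lives in a top-level JSON attribute called `text_id` and is stored as a string. The ID values are hypothetical placeholders for whatever your similarity search returned.

```sql
-- Sketch: fetch full JSON events for IDs found via the embedding time series.
-- Assumes a top-level string attribute `text_id`; the ID values are made up.
SELECT raw
FROM {{source}}
WHERE JSONExtractString(raw, 'text_id') IN ('id-1', 'id-2', 'id-3')
  -- narrow the time range to limit how many events are scanned
  AND dt BETWEEN {{start_time}} AND {{end_time}}
```

This keeps only the embedding and the ID on NVMe, while the full text stays in the JSON events.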

## Indexing vector columns

If you're working with hundreds of millions of events, your queries might benefit from creating vector indices for your embeddings. Make sure to select the **same Index dimension as in your embedding**.

![Creating a vector index for embeddings](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/aa0bb9ce-38fd-4950-ef74-a04c702f7800/lg2x =2818x2448)

## Querying embeddings in Warehouse

Better Stack Warehouse stores embeddings as vector columns that can be indexed and queried efficiently using ClickHouse's vector distance functions such as `cosineDistance()` or `L2Distance()`.

[code-tabs]
```sql
[label Querying in time series]
SELECT
  text_id, -- select only the ID; fetch raw texts from JSON events to minimize data on NVMe
  cosineDistance(
    meta_embeddings_text,
    embedding({{description}})
  ) AS distance
FROM {{source}}
ORDER BY distance ASC
LIMIT 5
```
```sql
[label Querying in events]
SELECT
  JSONExtractString(raw, 'text'),
  cosineDistance(
    JSONExtract(raw, 'meta', 'embeddings', 'text', 'Array(BFloat16)'),
    embedding({{description}})
  ) AS distance
FROM {{source}}
-- to improve performance, use WHERE filtering
WHERE dt BETWEEN {{start_time}} AND {{end_time}}
ORDER BY distance ASC
LIMIT 5
```
[/code-tabs]

This finds the events most semantically similar to a given description.
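
To build intuition for the `distance` column: cosine distance is 1 minus the cosine similarity, so smaller values mean closer vectors, which is why the queries sort with `ORDER BY distance ASC`. A minimal sketch with made-up two-dimensional vectors, written for plain ClickHouse (for example `clickhouse-local`) rather than a Warehouse source:

```sql
-- Toy vectors for illustration only; real embeddings have many more dimensions.
SELECT
  cosineDistance([1.0, 0.0], [1.0, 0.0])  AS same_direction, -- 0: identical direction
  cosineDistance([1.0, 0.0], [0.0, 1.0])  AS orthogonal,     -- 1: unrelated direction
  cosineDistance([1.0, 0.0], [-1.0, 0.0]) AS opposite        -- 2: opposite direction
```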

[note]
#### Curious about the `embedding()` function?

This is not a real function in the underlying ClickHouse database, and it can only be used with string values such as a hardcoded string or a query variable.

If you try to use it dynamically on your data, e.g. via `SELECT embedding(text)`, you'll get an error about a missing function:

`Code: 46. DB::Exception: Unknown function embedding: While...`
[/note]
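
As a sketch of the supported usage, here is the time series query with a hardcoded string instead of a query variable. The search phrase is a made-up example. The column names are the ones used in the query above and may differ in your source.

```sql
SELECT
  text_id,
  cosineDistance(
    meta_embeddings_text,
    -- hardcoded string literal; the phrase itself is hypothetical
    embedding('checkout failed with a payment timeout')
  ) AS distance
FROM {{source}}
ORDER BY distance ASC
LIMIT 5
```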
