# How to Integrate Elasticsearch With MySQL?

Integrating **Elasticsearch** with **MySQL** allows you to index and search data from a relational database in Elasticsearch, enabling powerful full-text search capabilities and analytical queries. There are several ways to integrate Elasticsearch with MySQL, depending on your use case, including syncing data between MySQL and Elasticsearch or querying both systems.

Here’s how you can do it:

### 1. **Using Logstash (Most Common Approach)**

**Logstash** is a data processing pipeline tool in the **Elastic Stack** that can be used to read data from MySQL and send it to Elasticsearch.

### Steps:

### Step 1: Install Logstash and Elasticsearch

- Ensure that you have **Elasticsearch** and **Logstash** installed on your system.

### Step 2: Install the JDBC Plugin for Logstash

Logstash connects to MySQL through the **JDBC input plugin**. Recent Logstash releases bundle this plugin by default; install it manually only if your installation is missing it:

```bash
bin/logstash-plugin install logstash-input-jdbc
```

### Step 3: Download MySQL JDBC Driver

- Download the **MySQL JDBC driver** (`.jar` file) from the official MySQL website:
    - [MySQL Connector/J Download](https://dev.mysql.com/downloads/connector/j/)
- Move the `.jar` file to a directory where Logstash can access it, such as `/path/to/logstash/vendor/jar/jdbc/`.

### Step 4: Create a Logstash Configuration File

Create a configuration file, e.g., `mysql-to-es.conf`, to define how to read from MySQL and send data to Elasticsearch.

### Example Configuration:

```conf
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/your_database"
    jdbc_user => "your_username"
    jdbc_password => "your_password"
    jdbc_driver_library => "/path/to/mysql-connector-java-x.x.x.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    statement => "SELECT * FROM your_table WHERE id > :sql_last_value"
    use_column_value => true
    tracking_column => "id"
    last_run_metadata_path => "/path/to/logstash_jdbc_last_run"
    schedule => "* * * * *" # Runs every minute
  }
}

filter {
  # Optionally transform data here before sending to Elasticsearch
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "mysql_data_index"
    document_id => "%{id}" # Ensure unique document IDs based on MySQL record IDs
  }
  stdout { codec => json }
}

```

### Key Configuration Parameters:

- `jdbc_connection_string`: Connection string to your MySQL database.
- `statement`: The SQL query to run against the MySQL table.
- `tracking_column`: The column whose last processed value is stored between runs (usually the primary key, `id`); each run fetches only new rows via `:sql_last_value`.
- `schedule`: Controls how frequently the query runs (using cron syntax).
- `elasticsearch {}`: Defines how and where to send data to Elasticsearch.
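For example, to poll every five minutes rather than every minute, the `schedule` value (standard cron syntax) would be:

```conf
schedule => "*/5 * * * *"
```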

### Step 5: Run Logstash

Run Logstash with the configuration file:

```bash
bin/logstash -f /path/to/mysql-to-es.conf
```

Logstash will fetch data from MySQL and send it to Elasticsearch.

### Step 6: Verify Data in Elasticsearch

- Use **Kibana** or **Dev Tools** to verify that the data is indexed in Elasticsearch:

```json
GET /mysql_data_index/_search
```

---

### 2. **Using Elasticsearch JDBC River (Deprecated)**

The **JDBC River Plugin** was once a popular way to stream MySQL data into Elasticsearch, but it’s no longer officially maintained. If you find older tutorials suggesting this method, it's better to use Logstash or custom code now.

---

### 3. **Custom Python Script (Using Python Elasticsearch and MySQL Libraries)**

You can write a custom Python script to sync data from MySQL to Elasticsearch. This method provides flexibility if you need custom processing logic between the two systems.

### Prerequisites:

- Install necessary Python libraries:
    
    ```bash
    pip install pymysql elasticsearch
    ```
    

### Example Python Script:

```python
import pymysql
from elasticsearch import Elasticsearch, helpers

# MySQL connection
mysql_conn = pymysql.connect(
    host="localhost",
    user="your_username",
    password="your_password",
    db="your_database",
    cursorclass=pymysql.cursors.DictCursor  # return rows as dicts so row['id'] works below
)

# Elasticsearch connection
es = Elasticsearch("http://localhost:9200")  # elasticsearch-py 8.x expects a URL; older 7.x clients also accept host/port dicts

# SQL query to retrieve data from MySQL
query = "SELECT * FROM your_table"

def fetch_data():
    with mysql_conn.cursor() as cursor:
        cursor.execute(query)
        result = cursor.fetchall()
        return result

def index_data_to_es(data):
    actions = [
        {
            "_index": "mysql_data_index",
            "_id": row['id'],  # Use the MySQL ID as the document ID in Elasticsearch
            "_source": row
        }
        for row in data
    ]
    helpers.bulk(es, actions)

# Fetch and index data
data = fetch_data()
index_data_to_es(data)

mysql_conn.close()

```

This script retrieves data from MySQL and indexes it into Elasticsearch using the Python **Elasticsearch** and **pymysql** libraries. You can run this script periodically to keep the data in sync.
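Note that the full-table `SELECT` above re-indexes every row on each run. A common refinement, sketched below under the assumption that sync state can live in a local file, is to persist the highest `id` seen and fetch only newer rows, mirroring Logstash's `:sql_last_value` behavior (the file name and helper names here are illustrative, not part of any library):

```python
import json
import os

STATE_FILE = "last_id.json"  # hypothetical location for the last-synced id


def load_last_id(path=STATE_FILE):
    """Return the highest MySQL id indexed so far, or 0 on the first run."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["last_id"]
    return 0


def save_last_id(last_id, path=STATE_FILE):
    """Persist the highest id indexed so far for the next run."""
    with open(path, "w") as f:
        json.dump({"last_id": last_id}, f)


def rows_to_actions(rows, index="mysql_data_index"):
    """Convert MySQL rows (dicts) into Elasticsearch bulk actions."""
    return [
        {"_index": index, "_id": row["id"], "_source": row}
        for row in rows
    ]


# Usage sketch, reusing the connections from the script above:
#   last_id = load_last_id()
#   with mysql_conn.cursor() as cursor:
#       cursor.execute("SELECT * FROM your_table WHERE id > %s", (last_id,))
#       rows = cursor.fetchall()
#   if rows:
#       helpers.bulk(es, rows_to_actions(rows))
#       save_last_id(max(r["id"] for r in rows))
```

With this in place, scheduling the script with cron gives incremental rather than full re-syncs, though deleted or updated rows still require extra handling (e.g., an `updated_at` tracking column).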

---

### 4. **Using Debezium for Real-Time Data Sync (Advanced)**

**Debezium** is a distributed platform that allows you to capture changes from MySQL and push them to Elasticsearch in real-time using Kafka.

### Steps:

1. **Set Up Kafka and Debezium**: Install Kafka and Debezium connectors.
2. **Configure Debezium to Capture MySQL Changes**: Set up a MySQL connector with Debezium to listen for changes (inserts, updates, deletes).
3. **Stream Changes to Elasticsearch**: Use Kafka Connect with the **Elasticsearch sink connector** to stream changes from MySQL to Elasticsearch.

This method is more suitable for use cases where you need real-time syncing.
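As a rough sketch, registering a Debezium MySQL source connector with the Kafka Connect REST API might look like the following (all hostnames, credentials, and topic names are placeholders, and exact property names vary between Debezium versions, so check the documentation for your release):

```json
{
  "name": "mysql-source",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "localhost",
    "database.port": "3306",
    "database.user": "debezium_user",
    "database.password": "your_password",
    "database.server.id": "184054",
    "topic.prefix": "mysql_server",
    "database.include.list": "your_database",
    "schema.history.internal.kafka.bootstrap.servers": "localhost:9092",
    "schema.history.internal.kafka.topic": "schema-changes.your_database"
  }
}
```

An Elasticsearch sink connector is then configured the same way, subscribing to the change topics that Debezium produces.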

---

### 5. **Using Elasticsearch SQL Plugin (Querying MySQL-Like Data in Elasticsearch)**

Elasticsearch ships with **Elasticsearch SQL** (built in since version 6.3), which lets you query Elasticsearch indices using SQL-like syntax, but it does not connect to MySQL directly. However, you can push data from MySQL into Elasticsearch (using Logstash or another method) and then run SQL queries against the resulting index.

### Example SQL Query in Kibana:

```sql
SELECT * FROM mysql_data_index WHERE age > 30 ORDER BY id LIMIT 10
```

This can be useful if you want to replicate SQL-like querying but in Elasticsearch.
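The same query can also be sent to the REST `_sql` endpoint directly from Dev Tools (assuming the index created by the Logstash example above):

```json
POST /_sql?format=txt
{
  "query": "SELECT * FROM mysql_data_index WHERE age > 30 ORDER BY id LIMIT 10"
}
```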

---

### Conclusion:

- **Logstash** is the most straightforward and commonly used tool for integrating MySQL with Elasticsearch. It allows you to easily pull data from MySQL and index it in Elasticsearch.
- **Custom scripts** (e.g., using Python) provide flexibility for more complex use cases or custom transformations.
- For **real-time syncing**, tools like **Debezium** offer a more advanced solution.
- Elasticsearch’s **SQL plugin** allows SQL-like querying on Elasticsearch data but does not directly interact with MySQL.

Choose the method based on your use case: periodic syncing (Logstash), custom data processing (Python), or real-time streaming (Debezium).