How to Integrate Elasticsearch With MySQL?

Better Stack Team
Updated on October 26, 2024

Integrating Elasticsearch with MySQL allows you to index and search data from a relational database in Elasticsearch, enabling powerful full-text search capabilities and analytical queries. There are several ways to integrate Elasticsearch with MySQL, depending on your use case, including syncing data between MySQL and Elasticsearch or querying both systems.

Here’s how you can do it:

1. Using Logstash (Most Common Approach)

Logstash is a data processing pipeline tool in the Elastic Stack that can be used to read data from MySQL and send it to Elasticsearch.

Steps:

Step 1: Install Logstash and Elasticsearch

  • Ensure that you have Elasticsearch and Logstash installed on your system.

Step 2: Install the JDBC Plugin for Logstash

Logstash connects to MySQL through the JDBC input plugin. Recent Logstash releases bundle it as part of the logstash-integration-jdbc plugin; on older installations, install it manually:

 
bin/logstash-plugin install logstash-integration-jdbc

Step 3: Download MySQL JDBC Driver

  • Download the MySQL JDBC driver (Connector/J .jar file) from the official MySQL website or from Maven Central.
  • Move the .jar file to a directory where Logstash can access it, such as /path/to/logstash/vendor/jar/jdbc/.
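For example, on Linux you might fetch Connector/J from Maven Central and move it into place. The version number here is only an example, and the destination is a placeholder path; match both to your environment:

```
curl -LO https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.0.33/mysql-connector-j-8.0.33.jar
mv mysql-connector-j-8.0.33.jar /path/to/logstash/vendor/jar/jdbc/
```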

Step 4: Create a Logstash Configuration File

Create a configuration file, e.g., mysql-to-es.conf, to define how to read from MySQL and send data to Elasticsearch.

Example Configuration:

 
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/your_database"
    jdbc_user => "your_username"
    jdbc_password => "your_password"
    jdbc_driver_library => "/path/to/mysql-connector-java-x.x.x.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    statement => "SELECT * FROM your_table WHERE id > :sql_last_value"
    use_column_value => true
    tracking_column => "id"
    last_run_metadata_path => "/path/to/logstash_jdbc_last_run"
    schedule => "* * * * *" # Runs every minute
  }
}

filter {
  # Optionally transform data here before sending to Elasticsearch
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "mysql_data_index"
    document_id => "%{id}" # Ensure unique document IDs based on MySQL record IDs
  }
  stdout { codec => json }
}

Key Configuration Parameters:

  • jdbc_connection_string: Connection string to your MySQL database.
  • statement: The SQL query to run against the MySQL table.
  • tracking_column: The column whose last-processed value is saved between runs (usually the primary key, id), so only new rows are fetched.
  • schedule: Controls how frequently the query runs (using cron syntax).
  • elasticsearch {}: Defines how and where to send data to Elasticsearch.

Step 5: Run Logstash

Run Logstash with the configuration file:

 
bin/logstash -f /path/to/mysql-to-es.conf

Logstash will fetch data from MySQL and send it to Elasticsearch.

Step 6: Verify Data in Elasticsearch

  • Use Kibana Dev Tools (or curl) to verify that the data is indexed in Elasticsearch:
 
GET /mysql_data_index/_search
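As a quick sanity check, you can also compare the document count against the MySQL table's row count (assuming the index name from the configuration above):

```
GET /mysql_data_index/_count
```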

2. Using Elasticsearch JDBC River (Deprecated)

The JDBC River plugin was once a popular way to stream MySQL data into Elasticsearch, but rivers were deprecated and later removed from Elasticsearch, and the plugin is no longer maintained. If you come across older tutorials recommending this method, use Logstash or custom code instead.


3. Custom Python Script (Using Python Elasticsearch and MySQL Libraries)

You can write a custom Python script to sync data from MySQL to Elasticsearch. This method provides flexibility if you need custom processing logic between the two systems.

Prerequisites:

  • Install necessary Python libraries:

     
    pip install pymysql elasticsearch
    

Example Python Script:

 
import pymysql
from elasticsearch import Elasticsearch, helpers

# MySQL connection (DictCursor returns rows as dicts, so columns are
# accessible by name when building Elasticsearch documents)
mysql_conn = pymysql.connect(
    host="localhost",
    user="your_username",
    password="your_password",
    db="your_database",
    cursorclass=pymysql.cursors.DictCursor
)

# Elasticsearch connection (recent client versions expect a full URL)
es = Elasticsearch("http://localhost:9200")

# SQL query to retrieve data from MySQL
query = "SELECT * FROM your_table"

def fetch_data():
    with mysql_conn.cursor() as cursor:
        cursor.execute(query)
        return cursor.fetchall()

def index_data_to_es(data):
    actions = [
        {
            "_index": "mysql_data_index",
            "_id": row["id"],  # Use the MySQL ID as the document ID in Elasticsearch
            "_source": row
        }
        for row in data
    ]
    helpers.bulk(es, actions)

# Fetch and index data
data = fetch_data()
index_data_to_es(data)

mysql_conn.close()

This script retrieves data from MySQL and indexes it into Elasticsearch using the Python Elasticsearch and pymysql libraries. You can run this script periodically to keep the data in sync.
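If you run the script on a schedule, you will usually want incremental updates rather than re-indexing the whole table each time, similar to Logstash's :sql_last_value tracking. One way to structure that logic (the function names and default column here are illustrative helpers, not part of any library):

```python
def rows_to_actions(rows, index_name, id_column="id"):
    """Convert MySQL rows (as dicts) into Elasticsearch bulk actions."""
    return [
        {"_index": index_name, "_id": row[id_column], "_source": row}
        for row in rows
    ]

def incremental_query(table, tracking_column, last_value):
    """Build a query that only fetches rows newer than the last synced value."""
    sql = (
        f"SELECT * FROM {table} "
        f"WHERE {tracking_column} > %s ORDER BY {tracking_column}"
    )
    return sql, (last_value,)
```

On each run, execute the query with the highest id seen so far, pass the resulting rows to rows_to_actions for helpers.bulk, and persist the new maximum for the next run.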


4. Using Debezium for Real-Time Data Sync (Advanced)

Debezium is an open-source change data capture (CDC) platform that reads MySQL's binlog to capture row-level changes and publishes them to Kafka, from where they can be streamed into Elasticsearch in near real time.

Steps:

  1. Set Up Kafka and Debezium: Install Kafka and Debezium connectors.
  2. Configure Debezium to Capture MySQL Changes: Set up a MySQL connector with Debezium to listen for changes (inserts, updates, deletes).
  3. Stream Changes to Elasticsearch: Use Kafka Connect with the Elasticsearch sink connector to stream changes from MySQL to Elasticsearch.

This method is more suitable for use cases where you need real-time syncing.
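As a sketch, the Debezium MySQL source connector is registered with Kafka Connect via a JSON configuration along these lines. Hostnames, credentials, and table names are placeholders, and exact property names vary between Debezium versions:

```json
{
  "name": "mysql-source-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "localhost",
    "database.port": "3306",
    "database.user": "your_username",
    "database.password": "your_password",
    "database.server.id": "184054",
    "topic.prefix": "mysql",
    "table.include.list": "your_database.your_table",
    "schema.history.internal.kafka.bootstrap.servers": "localhost:9092",
    "schema.history.internal.kafka.topic": "schema-changes.your_database"
  }
}
```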


5. Using Elasticsearch SQL (Querying MySQL-Like Data in Elasticsearch)

Elasticsearch includes SQL support that allows you to query Elasticsearch indices using SQL-like syntax, but it does not connect to MySQL directly. However, you can push data from MySQL into Elasticsearch (using Logstash or another method) and then run SQL queries against Elasticsearch to retrieve the data.

Example SQL Query in Kibana:

 
SELECT * FROM mysql_data_index WHERE age > 30 ORDER BY id LIMIT 10

This can be useful if you want to replicate SQL-like querying but in Elasticsearch.
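In recent Elasticsearch versions this SQL support is exposed through the _sql REST endpoint, so the same query can be sent from Kibana Dev Tools or curl, for example:

```
POST /_sql?format=txt
{
  "query": "SELECT * FROM mysql_data_index WHERE age > 30 ORDER BY id LIMIT 10"
}
```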


Conclusion:

  • Logstash is the most straightforward and commonly used tool for integrating MySQL with Elasticsearch. It allows you to easily pull data from MySQL and index it in Elasticsearch.
  • Custom scripts (e.g., using Python) provide flexibility for more complex use cases or custom transformations.
  • For real-time syncing, tools like Debezium offer a more advanced solution.
  • Elasticsearch’s SQL support allows SQL-like querying on Elasticsearch data but does not directly interact with MySQL.

Choose the method based on your use case: periodic syncing (Logstash), custom data processing (Python), or real-time streaming (Debezium).

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
