Import CSV Into Elasticsearch

Better Stack Team
Updated on November 18, 2024

Importing CSV files into Elasticsearch can be accomplished through several methods, including Logstash, the Elasticsearch Bulk API, or tools like Kibana. This guide covers two of them: Logstash, which is one of the most common and effective approaches, and the Bulk API.

Method 1: Using Logstash

Logstash can read CSV files and index the data directly into Elasticsearch. Here’s how to do it step by step.

Step 1: Install Logstash

If you haven't already installed Logstash, follow the installation instructions for your platform from the official Elastic documentation.

Step 2: Prepare Your CSV File

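Ensure your CSV file is well-structured: a header row that names each column, the same number of fields in every row, and a consistent delimiter. For example, consider a CSV file named data.csv with the following content: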

 
id,name,age
1,John Doe,30
2,Jane Smith,25
3,Bob Johnson,45
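
Before importing, it can help to confirm that every row has as many fields as the header. The following is a minimal sketch of such a check using Python's standard csv module. The function name find_malformed_rows and the sample text are illustrative assumptions, not part of Logstash or Elasticsearch.

```python
import csv
import io


def find_malformed_rows(csv_text):
    """Return (line_number, row) pairs whose field count differs from the header's."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return []  # Empty input: nothing to check
    problems = []
    for row in reader:
        if len(row) != len(header):
            problems.append((reader.line_num, row))
    return problems


# Hypothetical sample: the last row is missing its 'age' field
sample = "id,name,age\n1,John Doe,30\n2,Jane Smith,25\n3,Bob Johnson\n"
for line_number, row in find_malformed_rows(sample):
    print(f"line {line_number}: expected 3 fields, got {len(row)}: {row}")
```

To check a real file, read its contents and pass them in, for example find_malformed_rows(open("data.csv", newline="").read()).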

Step 3: Create a Logstash Configuration File

Create a configuration file (e.g., csv_to_es.conf) for Logstash with the following contents:

 
input {
  file {
    path => "/path/to/your/data.csv"    # Must be an absolute path
    start_position => "beginning"
    sincedb_path => "/dev/null"         # Avoid storing the position; useful for testing
  }
}

filter {
  csv {
    separator => ","
    skip_header => true                 # Drop the header row instead of indexing it
    columns => ["id", "name", "age"]    # Column names, matching the header of data.csv
    convert => { "age" => "integer" }   # Index age as a number rather than a string
  }
  # Add any further transformation or filtering logic here if needed
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]  # Update to your Elasticsearch host
    index => "your_index_name"          # Specify your index name
    document_id => "%{id}"              # Use the 'id' field for document IDs
  }
}

Step 4: Run Logstash

Run Logstash with the configuration file you created:

 
bin/logstash -f /path/to/your/csv_to_es.conf

Step 5: Verify the Data in Elasticsearch

After running Logstash, check if the data has been successfully indexed in Elasticsearch. You can do this using Kibana or by querying Elasticsearch directly:

 
curl -X GET "localhost:9200/your_index_name/_search?pretty"
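
If you save the response body of that query, a short script can confirm how many documents matched and which IDs came back. This is a minimal sketch that assumes the standard _search response shape. The function name summarize_search and the trimmed sample response are hypothetical, and a default search returns only the first 10 hits.

```python
import json


def summarize_search(response_text):
    """Return (total_hits, ids_of_returned_hits) from a _search response body."""
    response = json.loads(response_text)
    total = response["hits"]["total"]
    # Elasticsearch 7 and later report the total as an object with a "value" key;
    # older versions report a bare integer.
    if isinstance(total, dict):
        total = total["value"]
    ids = [hit["_id"] for hit in response["hits"]["hits"]]
    return total, ids


# Hypothetical, trimmed response for the three-row data.csv example
sample_response = json.dumps({
    "hits": {
        "total": {"value": 3, "relation": "eq"},
        "hits": [
            {"_id": "1", "_source": {"name": "John Doe", "age": 30}},
            {"_id": "2", "_source": {"name": "Jane Smith", "age": 25}},
            {"_id": "3", "_source": {"name": "Bob Johnson", "age": 45}},
        ],
    }
})

total, ids = summarize_search(sample_response)
print(f"{total} documents indexed, returned IDs: {ids}")
```

If the total is one higher than the number of data rows, the header row was probably indexed as a document.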

Method 2: Using Elasticsearch Bulk API

If you prefer a more programmatic approach, you can use the Elasticsearch Bulk API. Here’s how:

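  1. Convert CSV to JSON: Convert your CSV file into the newline-delimited JSON (NDJSON) format the Bulk API expects, in which each document line is preceded by an action line that names the target index and, optionally, the document ID.
  2. Use the Bulk API: Upload the JSON data to Elasticsearch using curl or any HTTP client.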

Example JSON structure for bulk insert:

 
{ "index" : { "_index" : "your_index_name", "_id" : "1" } }
{ "name" : "John Doe", "age" : 30 }
{ "index" : { "_index" : "your_index_name", "_id" : "2" } }
{ "name" : "Jane Smith", "age" : 25 }
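
The conversion step can be scripted. The following is a minimal sketch using only the Python standard library, assuming the CSV has a header row and an id column as in the data.csv example. The function name csv_to_bulk_lines and its int_fields parameter are illustrative assumptions, not part of any Elasticsearch client.

```python
import csv
import io
import json


def csv_to_bulk_lines(csv_text, index_name, id_field="id", int_fields=("age",)):
    """Convert CSV text with a header row into a Bulk API request body (NDJSON)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        doc_id = row.pop(id_field)  # The ID goes in the action line, not the document
        for field in int_fields:
            if field in row:
                row[field] = int(row[field])  # CSV values are strings; convert numeric fields
        # Action line: tells Elasticsearch where to index the document that follows
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Document line: the remaining columns become the document source
        lines.append(json.dumps(row))
    # Every line, including the last, must end with a newline
    return "".join(line + "\n" for line in lines)


sample = "id,name,age\n1,John Doe,30\n2,Jane Smith,25\n"
print(csv_to_bulk_lines(sample, "your_index_name"), end="")
```

To produce a file for upload, write the returned string to disk, for example with open("bulk.json", "w").write(...), where bulk.json is a placeholder name. For large CSV files, consider splitting the output into several smaller bulk requests rather than one very large one.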

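You can upload this JSON file using the following curl command. The file must end with a newline character, and the --data-binary flag is required because, unlike -d, it preserves newlines: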

 
curl -X POST "localhost:9200/_bulk" -H "Content-Type: application/json" --data-binary "@path_to_your_json_file.json"
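
A bulk request can return HTTP 200 even when individual documents fail, so it is worth inspecting the response body. This is a minimal sketch that assumes the standard Bulk API response shape, with a top-level errors flag and an items array. The function name failed_bulk_items and the sample response are hypothetical.

```python
import json


def failed_bulk_items(response_text):
    """Return (action, doc_id, status, error_type) for each failed item in a bulk response."""
    response = json.loads(response_text)
    if not response.get("errors"):
        return []  # The top-level flag is false when every item succeeded
    failures = []
    for item in response.get("items", []):
        # Each item has a single key naming the action, e.g. "index"
        for action, result in item.items():
            if "error" in result:
                failures.append(
                    (action, result.get("_id"), result.get("status"), result["error"].get("type"))
                )
    return failures


# Hypothetical, trimmed response: document 1 succeeded, document 2 was rejected
sample_response = json.dumps({
    "took": 5,
    "errors": True,
    "items": [
        {"index": {"_index": "your_index_name", "_id": "1", "status": 201}},
        {"index": {"_index": "your_index_name", "_id": "2", "status": 400,
                   "error": {"type": "mapper_parsing_exception", "reason": "failed to parse"}}},
    ],
})

for action, doc_id, status, error_type in failed_bulk_items(sample_response):
    print(f"{action} of document {doc_id} failed with status {status}: {error_type}")
```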

Conclusion

Importing CSV files into Elasticsearch can be done with either Logstash or the Bulk API. Logstash is particularly useful for transforming and enriching data during the import process, while the Bulk API provides a direct method for those comfortable with scripting and automation. Choose the method that best fits your needs, and make sure your CSV has consistent columns and correctly typed values before indexing.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.