How to Collect, Process, and Ship Log Data with Logstash
Logs are invaluable assets, originating from various sources such as applications, containers, databases, and operating systems. When analyzed, they offer crucial insights, especially in diagnosing issues. For their effectiveness, it's essential to centralize them, allowing for in-depth analysis and pattern recognition all in one place. This centralization process involves using a log shipper, a tool designed to gather logs from diverse sources, process them, and then forward them to different destinations.
One powerful log shipper is Logstash, a free and open-source tool created by Elastic and an integral part of the Elastic Stack, formerly known as the ELK stack. With a robust framework and over 200 plugins, Logstash offers unparalleled flexibility. These plugins enable Logstash to support various sources and perform complex manipulations, ensuring log records are well prepared before reaching their final destination.
In this comprehensive guide, you'll use Logstash to collect logs from various sources, process and forward them to multiple destinations. First, you'll use Logstash to collect logs from a file and send them to the console. Building upon that, you will use Logstash to gather logs from multiple Docker containers and centralize the logs. Finally, you'll monitor the health of a Logstash instance to ensure it performs optimally and reliably.
Prerequisites
Before you begin, ensure you have access to a system with a non-root user account with sudo
privileges. For certain parts of this guide that involve collecting logs from Docker containers, you'll need to have Docker and Docker Compose installed. If you're unfamiliar with log shippers, you can gain insights into their advantages by checking out this article.
With the prerequisites in order, create a root project directory named log-processing-stack
. This directory will serve as the core container for your application and its configurations:
mkdir log-processing-stack
Next, navigate into the newly created directory:
cd log-processing-stack
Within the log-processing-stack
directory, create a subdirectory named logify
for the demo application:
mkdir logify
Move into the logify
subdirectory:
cd logify
Now that the necessary directories are in place, you're ready to create the application.
Developing a demo logging application
In this section, you will build a sample logging application using Bash that generates logs at regular intervals and appends them to a file.
Within the logify
directory, create a logify.sh
file using your preferred text editor. For this tutorial, we will use nano
:
nano logify.sh
In your logify.sh
file, add the following code to start generating logs using Bash:
#!/bin/bash
filepath="/var/log/logify/app.log"

create_log_entry() {
  local info_messages=("Connected to database" "Task completed successfully" "Operation finished" "Initialized application")
  local random_message=${info_messages[$RANDOM % ${#info_messages[@]}]}
  local http_status_code=200
  local ip_address="127.0.0.1"
  local emailAddress="user@mail.com"
  local level=30
  local pid=$$
  local ssn="407-01-2433"
  local time=$(date +%s)
  local log='{"status": '$http_status_code', "ip": "'$ip_address'", "level": '$level', "emailAddress": "'$emailAddress'", "msg": "'$random_message'", "pid": '$pid', "ssn": "'$ssn'", "timestamp": '$time'}'
  echo "$log"
}

while true; do
  log_record=$(create_log_entry)
  echo "${log_record}" >> "${filepath}"
  sleep 3
done
The create_log_entry()
function generates log entries in JSON format, containing essential details such as HTTP status codes, severity levels, and random log messages. We deliberately include sensitive fields like the IP address, Social Security Number (SSN), and email address. The reason is to showcase Logstash's ability to remove or redact sensitive data. For comprehensive guidelines on best practices for handling sensitive data in logs, consult our guide.
In the continuous loop, the create_log_entry()
function is invoked every 3 seconds, generating a new log record. These records are then appended to a designated file in the /var/log/logify/
directory.
After you finish writing the code, save your file. To grant the script execution permission, use the following command:
chmod +x logify.sh
Next, create the /var/log/logify
directory, which will serve as the destination for your application logs:
sudo mkdir /var/log/logify
After creating the directory, change its ownership to the currently logged-in user specified in the $USER
environment variable:
sudo chown -R $USER:$USER /var/log/logify/
Next, run the Bash script in the background to start generating the logs:
./logify.sh &
The program will continuously append logs to app.log
. To view recent log entries, use the tail
command:
tail -n 4 /var/log/logify/app.log
This command displays the last four lines of app.log
, allowing you to monitor the real-time log records that your application is generating.
When you run the command, you will see logs looking like this:
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Connected to database", "pid": 17089, "ssn": "407-01-2433", "timestamp": 1696150204}
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Task completed successfully", "pid": 17089, "ssn": "407-01-2433", "timestamp": 1696150207}
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Connected to database", "pid": 17089, "ssn": "407-01-2433", "timestamp": 1696150210}
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Connected to database", "pid": 17089, "ssn": "407-01-2433", "timestamp": 1696150213}
In the output, the logs are structured in the JSON format, containing various fields.
With the program actively generating logs, your next step is installing Logstash on your system to process and analyze this data.
Installing Logstash
In this section, you'll install the latest version of Logstash, which is 8.10 at the time of writing, on an Ubuntu 22.04 system. For other systems, visit the official documentation page for instructions.
Logstash is not available in Ubuntu's default package repositories. You'll need to add the Logstash package source list to install it via apt
.
First, import the Logstash public GPG key to apt
:
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elastic-keyring.gpg
Next, install the apt-transport-https
package:
sudo apt-get install apt-transport-https
Then, add the Logstash source list to the sources.list.d
directory:
echo "deb [signed-by=/usr/share/keyrings/elastic-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-8.x.list
To ensure apt
can read the newly added source, update the package list:
sudo apt-get update
Now, install Logstash:
sudo apt-get install logstash
Logstash has been successfully installed on your system. You're now ready to process logs with Logstash.
How Logstash works
Before diving into Logstash, it's essential to grasp the fundamental concepts:
Logstash is easier to understand when you imagine it as a pipeline. At one end of this pipeline are the inputs, representing the data sources. As log records traverse through the Logstash pipeline, they can be enriched, filtered, or manipulated according to your requirements. Ultimately, when they reach the pipeline's end, Logstash can deliver these logs to configured destinations for storage or analysis.
To create this data processing pipeline, you can configure Logstash using a configuration file.
A typical Logstash configuration file is structured as follows:
input {
  plugin_name {...}
}
filter {
  plugin_name {...}
}
output {
  plugin_name {...}
}
Let's explore the roles of these components:
- input: represents the sources of logs, such as files or HTTP endpoints.
- filter (optional): unifies and transforms log records.
- output: the destination for forwarding the processed logs.
For these inputs, filters, and outputs to fulfill their roles, they rely on plugins. These plugins are the building blocks that empower Logstash, allowing it to achieve a wide array of tasks. Let's explore these plugins to provide you with a clearer understanding of Logstash's capabilities.
Logstash input plugins
For the inputs, Logstash provides input plugins that can collect logs from various sources, such as:
- HTTP: receives log records over HTTP endpoints.
- Beats: collects logs from the Beats framework.
- Redis: gathers log records from a Redis instance.
- Unix: reads log records via a Unix socket.
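For instance, a minimal http input that accepts log records posted over HTTP might look like the sketch below; the port number is an arbitrary choice and not part of this tutorial's setup:

input {
  http {
    # Listen for log events sent as HTTP POST requests
    port => 8080
  }
}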
Logstash filter plugins
When you want to manipulate, enrich, or modify logs, some of the filter plugins here can help you do that:
- JSON: parses JSON logs.
- Grok: parses log data and structures it.
- I18n: removes special characters from your log records.
- Geoip: adds geographical information.
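As an illustration, a grok filter can pull structured fields out of a plain-text line. The pattern and field names in this sketch are illustrative rather than tied to this tutorial's logs:

filter {
  grok {
    # Extract an IP address, HTTP method, and request path from a plain-text line
    match => { "message" => "%{IP:client_ip} %{WORD:method} %{URIPATHPARAM:request_path}" }
  }
}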
Logstash output plugins
After processing data, the following output plugins can be useful:
- WebSocket: forward the logs to a WebSocket endpoint.
- S3: send log records to Amazon Simple Storage Service (Amazon S3).
- Syslog: forward logs to a Syslog server.
- Elasticsearch: deliver log entries to Elasticsearch, which is part of the Elastic stack.
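For example, forwarding events to a local Elasticsearch instance could look like the sketch below; the host and index name are assumptions, not part of this tutorial's setup:

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    # Write events to a date-based index
    index => "logify-%{+YYYY.MM.dd}"
  }
}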
Getting started with Logstash
Now that you understand how Logstash operates, you will use it to read log records from a file and display them in the console.
To set up the Logstash pipeline, create a configuration file in the /etc/logstash/conf.d
directory:
sudo nano /etc/logstash/conf.d/logstash.conf
In your logstash.conf
file, add the following lines to instruct Logstash to read logs from a file and forward them to the console:
input {
  file {
    path => "/var/log/logify/app.log"
    start_position => "beginning"
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
The input
component uses the file
plugin to read logs from a file. The path
parameter specifies the location of the file to be read, and the start_position
instructs Logstash to begin reading files from the beginning.
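Note that the file input tracks how far it has read in a sincedb file, so start_position => "beginning" only applies to files Logstash hasn't seen before. While experimenting, a common trick is to point sincedb_path at /dev/null so the file is re-read on every restart; a minimal sketch:

input {
  file {
    path => "/var/log/logify/app.log"
    start_position => "beginning"
    # Don't persist read offsets; re-read the file on each restart (useful for testing)
    sincedb_path => "/dev/null"
  }
}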
The output
component uses the stdout
plugin to display logs in the console. The rubydebug
codec is used for pretty printing.
After adding the code, save the file. To ensure your configuration file has no errors, run the following command:
sudo -u logstash /usr/share/logstash/bin/logstash --path.settings /etc/logstash -t
...
Configuration OK
[2023-10-01T08:53:31,628][INFO ][logstash.runner ] Using config.test_and_exit mode. Config Validation Result: OK. Exiting Logstash
If the output includes "Configuration OK," your configuration file is error-free.
Next, change the ownership of the /usr/share/logstash/data
directory to the logstash
user:
sudo chown -R logstash:logstash /usr/share/logstash/data
Now, start Logstash by passing the path to the configuration file:
sudo -u logstash /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf
When Logstash starts running, you will see output similar to this:
Using bundled JDK: /usr/share/logstash/jdk
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[INFO ] 2023-10-01 08:58:47.142 [main] runner - Starting Logstash {"logstash.version"=>"8.10.2", "jruby.version"=>"jruby 9.4.2.0 (3.1.0) 2023-03-08 90d2913fda OpenJDK 64-Bit Server VM 17.0.8+7 on 17.0.8+7 +indy +jit [x86_64-linux]"}
...
[INFO ] 2023-10-01 08:58:49.053 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"}
[INFO ] 2023-10-01 08:58:49.059 [[main]<file] observingtail - START, creating Discoverer, Watch with file and sincedb collections
[INFO ] 2023-10-01 08:58:49.064 [Agent thread] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
Once Logstash starts running, the log events will be formatted and displayed neatly in the console:
{
"message" => "{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Initialized application\", \"pid\": 17089, \"ssn\": \"407-01-2433\", \"timestamp\": 1696150640}",
"event" => {
"original" => "{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Initialized application\", \"pid\": 17089, \"ssn\": \"407-01-2433\", \"timestamp\": 1696150640}"
},
"host" => {
"name" => "logstash-host"
},
"log" => {
"file" => {
"path" => "/var/log/logify/app.log"
}
},
"@timestamp" => 2023-10-01T08:58:49.104572151Z,
"@version" => "1"
}
...
In the output, Logstash has added additional fields, such as host
, file
, and version
, to add more context.
Now that you can observe the formatted logs in the console, you can exit Logstash by pressing CTRL + C
.
In the upcoming section, you will transform these logs before forwarding them to the desired output destination.
Transforming logs with Logstash
In this section, you will enrich, modify fields, and mask sensitive information in your logs to ensure privacy and enhance the usefulness of the log data.
Logstash uses various filter plugins to manipulate log records. Using these plugins, you can perform essential operations such as:
- Parsing JSON logs.
- Removing unwanted fields.
- Adding new fields.
- Masking sensitive data.
Parsing JSON logs with Logstash
Since your application produces logs in JSON format, it is crucial to parse them. Parsing JSON logs is essential because it allows you to retain the benefits of the structured JSON format.
To understand the importance of parsing data, consider the log event output from the previous section:
{
"message" => "{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Initialized application\", \"pid\": 17089, \"ssn\": \"407-01-2433\", \"timestamp\": 1696150640}",
"event" => {
"original" => "{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Initialized application\", \"pid\": 17089, \"ssn\": \"407-01-2433\", \"timestamp\": 1696150640}"
},
...
}
Upon inspecting the log event, you will notice that the log message is in the string format, and some special characters are escaped with backslashes. To ensure that Logstash can parse these logs as valid JSON, you need to configure a filter in the Logstash configuration file.
Open the Logstash configuration file for editing:
sudo nano /etc/logstash/conf.d/logstash.conf
Add the following code to the configuration file to parse JSON in the message
field:
input {
  file {
    path => "/var/log/logify/app.log"
    start_position => "beginning"
  }
}
filter {
  if [message] =~ /^{.*}$/ {
    json {
      source => "message"
    }
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
In the filter component, a conditional checks whether the message field contains a JSON object using a regex pattern. If the condition is met, the json plugin parses the message field as valid JSON and adds the parsed fields to the log event.
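If you prefer to keep the parsed fields grouped, or to tolerate malformed lines, the json filter also supports target and skip_on_invalid_json options. A small sketch, not part of this tutorial's pipeline; the app target name is arbitrary:

filter {
  json {
    source => "message"
    # Nest the parsed fields under "app" instead of the event root
    target => "app"
    # Silently skip events whose message isn't valid JSON
    skip_on_invalid_json => true
  }
}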
To verify if the message
field is being parsed as JSON, save your file and restart Logstash:
sudo -u logstash /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf
{
"msg" => "Operation finished",
"ssn" => "407-01-2433",
"message" => "{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Operation finished\", \"pid\": 17089, \"ssn\": \"407-01-2433\", \"timestamp\": 1696151020}",
"emailAddress" => "user@mail.com",
"@timestamp" => 2023-10-01T09:03:40.987814908Z,
"@version" => "1",
"log" => {
"file" => {
"path" => "/var/log/logify/app.log"
}
},
}
The logs have been successfully parsed as valid JSON, and the resulting fields have been added to the log event as key-value pairs. You can stop Logstash now.
While your application produces logs in JSON format, it's important to note that logs can come in various formats. Logstash provides several filter
plugins that enable parsing of different log events, such as:
- bytes: parses string representations of computer storage sizes.
- csv: parses log records in CSV format.
- kv: parses entries in the key=value syntax.
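For instance, a kv filter could turn a line such as user=alice action=login into separate fields. A minimal sketch; the field names and separators here are illustrative:

filter {
  kv {
    source => "message"
    # Pairs are separated by spaces, keys and values by "="
    field_split => " "
    value_split => "="
  }
}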
These filter plugins are invaluable for parsing logs of diverse formats. In the next section, you will learn how to modify the log entries further to suit your requirements.
Adding and removing fields with Logstash
In this section, you will remove the emailAddress
field, which is considered sensitive information, and eliminate some redundant fields. Additionally, you will add a new field to the log event.
To modify the log event, open the Logstash configuration file:
sudo nano /etc/logstash/conf.d/logstash.conf
Modify the filter
section as follows:
...
filter {
  if [message] =~ /^{.*}$/ {
    json {
      source => "message"
    }
  }
  mutate {
    remove_field => ["event", "message", "emailAddress"]
    add_field => { "env" => "development" }
  }
}
...
The mutate
plugin manipulates log entries. The remove_field
option accepts a list of fields to remove. Since the JSON parsing added all the fields to the log event, you no longer need the event
and message
fields. So you remove them.
To ensure data privacy, you also remove the emailAddress
field. The add_field
option adds a new field called env
with the value "development".
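Beyond removing and adding fields, mutate can also rename fields or convert their types. A small sketch using this tutorial's field names, though it is not part of the pipeline you are building here; the new field name is arbitrary:

filter {
  mutate {
    # Rename "msg" to a more descriptive name
    rename => { "msg" => "message_text" }
    # Ensure "status" is stored as an integer
    convert => { "status" => "integer" }
  }
}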
Save the file and restart Logstash:
sudo -u logstash /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf
The resulting log event will look like this:
{
"log" => {
"file" => {
"path" => "/var/log/logify/app.log"
}
},
"@version" => "1",
"timestamp" => 1696151284,
"@timestamp" => 2023-10-01T09:08:05.248324999Z,
"env" => "development",
"msg" => "Connected to database",
"ip" => "127.0.0.1",
"ssn" => "407-01-2433",
"level" => 30,
"host" => {
"name" => "logstash-host"
},
"status" => 200,
"pid" => 17089
}
...
Removing the event
, message
, and emailAddress
fields reduces the noise in the log event. The logs now contain only the essential information.
In the next section, you will add fields to the log event based on conditional statements.
Working with conditional statements in Logstash
In this section, you will write a conditional statement that checks if the status
field equals 200
. If true, Logstash will add an is_successful
field with the value true
; otherwise, it will be set to false
.
Open the Logstash configuration file:
sudo nano /etc/logstash/conf.d/logstash.conf
To create a conditional statement, add the following code:
...
filter {
  if [message] =~ /^{.*}$/ {
    json {
      source => "message"
    }
  }
  mutate {
    remove_field => ["event", "message", "emailAddress"]
    add_field => { "env" => "development" }
  }
  # Add the 'is_successful' field based on the 'status' field
  if [status] == 200 {
    mutate {
      add_field => { "is_successful" => "true" }
    }
  } else {
    mutate {
      add_field => { "is_successful" => "false" }
    }
  }
}
...
In the provided code, a conditional statement is implemented to check if the status
field equals the value 200
. If true, the is_successful
field is set to true
; otherwise, it is set to false
.
Save and exit the configuration file. Restart Logstash with the updated configuration:
sudo -u logstash /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf
The resulting log event will include the is_successful
field, indicating whether the operation was successful:
{
"@version" => "1",
"ip" => "127.0.0.1",
"ssn" => "407-01-2433",
"msg" => "Task completed successfully",
"pid" => 17089,
"level" => 30,
"@timestamp" => 2023-10-01T09:57:54.025875098Z,
"log" => {
"file" => {
"path" => "/var/log/logify/app.log"
}
},
"is_successful" => "true",
"host" => {
"name" => "logstash-host"
},
"status" => 200,
"timestamp" => 1696154270,
"env" => "development"
}
...
The is_successful
field indicates the operation's success in the log event. The field would be set to false
if the status code differed.
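Conditionals can also chain with else if and compare against ranges. As a hedged sketch, the following derives a hypothetical severity field from the status code; neither the field nor the thresholds are part of this tutorial's pipeline:

filter {
  if [status] >= 500 {
    mutate { add_field => { "severity" => "error" } }
  } else if [status] >= 400 {
    mutate { add_field => { "severity" => "warn" } }
  } else {
    mutate { add_field => { "severity" => "info" } }
  }
}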
Redacting sensitive data with Logstash
In the previous section, you removed the emailAddress field from the log event. However, sensitive fields such as the IP address and Social Security Number (SSN) remain. To protect personal information, especially when it is embedded in strings you can't simply drop, it's crucial to mask such data.
To redact the IP address and SSN, open the Logstash configuration file:
sudo nano /etc/logstash/conf.d/logstash.conf
In the configuration file, add the code below to mask sensitive portions:
...
input {
  file {
    path => "/var/log/logify/app.log"
    start_position => "beginning"
  }
}
filter {
  # Redact SSNs and IP addresses
  mutate {
    gsub => [ "message", "(\d{3}-\d{2}-\d{4})", "REDACTED" ]
    gsub => [ "message", "\b(?:\d{1,3}\.){3}\d{1,3}\b", "REDACTED" ]
  }
  # Parse JSON if the message field matches the JSON pattern
  if [message] =~ /^{.*}$/ {
    json {
      source => "message"
    }
  }
  mutate {
    remove_field => ["event", "message", "emailAddress"]
    add_field => { "env" => "development" }
  }
  # Add the 'is_successful' field based on the 'status' field
  if [status] == 200 {
    mutate {
      add_field => { "is_successful" => "true" }
    }
  } else {
    mutate {
      add_field => { "is_successful" => "false" }
    }
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
...
In this code snippet, the mutate plugin is used with the gsub option. It takes the message field, applies regular expressions to find sensitive portions, and replaces them with the text "REDACTED". The first gsub pair replaces SSNs, and the second replaces IP addresses.
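If your logs also embed email addresses inside message strings, you could add another gsub pair to the same mutate block. A rough sketch; the regex is a simple approximation of email addresses, not an exhaustive one:

mutate {
  # Redact anything that looks like an email address
  gsub => [ "message", "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}", "REDACTED" ]
}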
Save and exit the configuration file. Restart Logstash with the updated configuration:
sudo -u logstash /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf
{
"@timestamp" => 2023-10-01T09:59:45.295997458Z,
"msg" => "Initialized application",
"timestamp" => 1696154385,
"level" => 30,
"@version" => "1",
"pid" => 17089,
"log" => {
"file" => {
"path" => "/var/log/logify/app.log"
}
},
"is_successful" => "true",
"ssn" => "REDACTED",
"ip" => "REDACTED",
"host" => {
"name" => "logstash-host"
},
"status" => 200,
"env" => "development"
}
...
You will notice that sensitive information such as SSN and IP addresses have been successfully redacted from the log events.
Masking sensitive data is crucial, especially when dealing with fields like this:
{..., "privateInfo": "This is a sample message with SSN: 123-45-6789 and IP: 192.168.0.1"}
The masking ensures that only sensitive portions are redacted, preserving the integrity of the rest of the message:
{..., "privateInfo": "This is a sample message with SSN: REDACTED and IP: REDACTED"}
Now that you can mask data, you can stop Logstash, and the logify.sh
script. To stop the bash program, obtain the process ID:
jobs -l | grep "logify"
[1]+ 23750 Running ./logify.sh &
Pass the process ID to the kill command to terminate the process, substituting the PID from your own output:
kill -9 23750
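Alternatively, if you'd rather not look up the PID manually, pkill can match the script by name, assuming no other running process matches the pattern:

pkill -f logify.sh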
With the script stopped, you can move on to collecting logs from Docker containers.
Collecting logs from Docker containers and centralizing logs
In this section, you will containerize the Bash program and leverage the Nginx hello world Docker image, preconfigured to produce JSON Nginx logs every time it receives a request. Logstash will collect logs from the Bash program and the Nginx containers and forward them to Better Stack for centralization.
Dockerizing the Bash script
First, you will containerize the Bash program responsible for generating log data. Containerization offers several benefits, including encapsulating the script and its dependencies and ensuring portability across various environments.
Make sure you are in the log-processing-stack/logify
directory. Then, create a Dockerfile
:
nano Dockerfile
Inside your Dockerfile
, include the following instructions for creating a Docker image for your Bash script:
FROM ubuntu:latest
COPY . .
RUN chmod +x logify.sh
RUN mkdir -p /var/log/logify
RUN ln -sf /dev/stdout /var/log/logify/app.log
CMD ["./logify.sh"]
In this Dockerfile
, you begin with the latest Ubuntu image as the base. You then copy the program file, change permissions to make it executable, create a directory to store log files, and redirect all data written to /var/log/logify/app.log
to the standard output. This redirection lets you view the container logs using the docker logs
command. Finally, you specify the command to run when the Docker container starts.
Save and exit the file. Change back to the parent project directory:
cd ..
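Optionally, you can confirm the image builds before wiring it into Docker Compose by pointing docker build at the logify directory; the logify-test tag is an arbitrary name used only for this check:

docker build -t logify-test ./logify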
In your editor, create a docker-compose.yml
file:
nano docker-compose.yml
Add the following code to define the Bash program and Nginx services:
version: '3'
services:
  logify-script:
    build:
      context: ./logify
    container_name: logify
  nginx:
    image: betterstackcommunity/nginx-helloworld:latest
    container_name: nginx
    ports:
      - '80:80'
In this configuration file, you define two services: logify-script
and nginx
. The logify-script
service is built using the ./logify
directory context. The nginx
service uses a pre-built Nginx image. You then map port 80
on the host to port 80
within the container. Ensure no other services are running on port 80
on the host to avoid port conflicts.
After defining the services, build the Docker images and create the containers:
docker compose up -d
The -d
option starts the services in the background.
Check the container status to verify that they are running:
docker compose ps
You will see "running" status under the "STATUS" column for the two containers:
NAME COMMAND SERVICE STATUS PORTS
logify "./logify.sh" logify-script running
nginx "/runner.sh nginx" nginx running 0.0.0.0:80->80/tcp, :::80->80/tcp
Now that the containers are running, send five HTTP requests to the Nginx service using curl
:
curl http://localhost:80/?[1-5]
View all the logs generated by the running containers with:
docker compose logs
You will see logs similar to the following output, representing the data generated by both the Nginx service and the Bash program:
nginx | {"timestamp": "2023-10-01T10:07:13+00:00", "pid": "7", "remote_addr": "172.18.0.1", "remote_user":" ", "request": "GET /?1 HTTP/1.1", "status": "200", "body_bytes_sent": "11109", "request_time": "0.000", "http_referrer":" ", "http_user_agent": "curl/7.81.0", "time_taken_ms": "1696154833.915"}
...
logify | {"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Operation finished", "pid": 1, "ssn": "407-01-2433", "timestamp": 1696154843}
This output displays all the logs generated by both services.
With the Bash program and Nginx service containers running and generating data, you can now move on to collecting these logs with Logstash.
Defining the Logstash service with Docker Compose
In this section, you will define a Logstash service in the Docker Compose setup to gather logs from the existing containers and deliver them to Better Stack. The process involves creating a Logstash configuration file and deploying the Logstash service.
Open the docker-compose.yml
file again:
nano docker-compose.yml
Update the file with the following code to define the Logstash service:
version: '3'
services:
  logify-script:
    build:
      context: ./logify
    container_name: logify
    logging:
      driver: gelf
      options:
        gelf-address: "udp://127.0.0.1:5000"
        tag: docker.logify
  nginx:
    image: betterstackcommunity/nginx-helloworld:latest
    container_name: nginx
    ports:
      - '80:80'
    logging:
      driver: gelf
      options:
        gelf-address: "udp://127.0.0.1:5000"
        tag: docker.nginx
  logstash:
    image: docker.elastic.co/logstash/logstash:8.10.2
    container_name: logstash
    volumes:
      - ./logstash/config/logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    ports:
      - "5000:5000/udp"
The logify-script
and nginx
services are configured to use the gelf
logging driver to send logs over the network to 127.0.0.1:5000
using the UDP protocol. Logstash will run a gelf
input on port 5000
to receive log events. You then add the docker.logify
and docker.nginx
tags to distinguish the log events originating from different services.
Additionally, you define a logstash
service using an official Logstash image. It incorporates a volume mapping of Logstash's configuration file logstash.conf
, which you will create shortly. The service is configured to expose port 5000
on the UDP protocol to receive log events from the services.
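For reference, the same gelf logging configuration can be expressed with plain docker run flags. A quick sketch using a throwaway container; the alpine image and docker.test tag are arbitrary choices:

docker run --rm \
  --log-driver gelf \
  --log-opt gelf-address=udp://127.0.0.1:5000 \
  --log-opt tag=docker.test \
  alpine echo "hello from gelf"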
Next, create the logstash/config
directory to store the configuration file:
mkdir -p logstash/config
Change into the directory:
cd logstash/config
Afterward, create the logstash.conf
configuration file:
nano logstash.conf
Add the input component using gelf
:
input {
  gelf {
    port => 5000
  }
}
This specifies that Logstash should use gelf
to listen to events on port 5000
.
Now, you need to set up the destination to send these logs for centralization. You will use Better Stack for log management.
Before defining the output component, create a free Better Stack account. Once you've logged in, navigate to the Sources section:
Once you are on the Sources page, click the Connect source button:
Next, provide your source a name of your choosing and select "Logstash" as the platform:
After creating the source, copy the Source Token value to the clipboard:
After copying the source token, go back to the logstash.conf
file and add the filter
component to match and assign tags:
input {
  gelf {
    port => 5000
  }
}
filter {
  if [tag] == "docker.logify" {
    mutate { add_tag => "docker_logify" }
  }
  if [tag] == "docker.nginx" {
    mutate { add_tag => "docker_nginx" }
  }
}
In this configuration, if the tag
field equals docker.logify
, Logstash adds the docker_logify
tag. Similarly, if the tag field equals docker.nginx
, Logstash adds the docker_nginx
tag. The name of the tag you choose doesn't matter; just ensure it is consistent.
Next, add the output
component to forward the logs with the docker_logify
tag to Better Stack:
...
output {
  if "docker_logify" in [tags] {
    http {
      url => "https://in.logs.betterstack.com/"
      http_method => "post"
      headers => {
        "Authorization" => "Bearer <your_logify_source_token>"
      }
      format => "json"
    }
  }
}
Save and exit the configuration file.
Return to the project root directory:
cd ../..
Start the newly created Logstash service:
docker compose up -d
After a few seconds, visit Better Stack to confirm that Logstash is forwarding the logs:
The Bash program logs will be forwarded to Better Stack.
To forward Nginx logs, create a second source by following the same steps you used to create the first one.
After creating the sources, the Better Stack interface will look like this:
Now, add the following output
to deliver Nginx logs to Better Stack, ensuring you update the source token accordingly:
...
output {
  if "docker_logify" in [tags] {
    http {
      url => "https://in.logs.betterstack.com/"
      http_method => "post"
      headers => {
        "Authorization" => "Bearer <your_logify_source_token>"
      }
      format => "json"
    }
  }
  if "docker_nginx" in [tags] {
    http {
      url => "https://in.logs.betterstack.com/"
      http_method => "post"
      headers => {
        "Authorization" => "Bearer <your_nginx_source_token>"
      }
      format => "json"
    }
  }
}
If the tag equals docker_nginx
, Logstash sends the logs to Better Stack's Nginx source. When you save the file, run the following command:
docker compose up -d
Send more requests to the Nginx service:
curl http://localhost:80/?[1-5]
The Nginx logs will now be uploaded to Better Stack:
Monitoring Logstash health with Better Stack
Logstash offers a monitoring API that starts automatically every time you run it. To track whether Logstash is up or down, you can add its health endpoint to Better Stack, which will periodically check that it responds.
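Beyond the root endpoint, the monitoring API also exposes node statistics, including per-pipeline event counts. Once port 9600 is reachable (as configured below), you can query it like this:

curl -XGET 'localhost:9600/_node/stats/pipelines?pretty'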
First, update the docker-compose.yml
file to expose and map the port for the Logstash monitoring API:
    container_name: logstash
    volumes:
      - ./logstash/config/logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    ports:
      - "9600:9600"
      - "5000:5000/udp"
When you're finished, stop and remove all the services:
docker compose down
Then start all the services:
docker compose up -d
Verify that the Logstash endpoint works:
curl -XGET 'localhost:9600/?pretty'
{
"host" : "7102e1b6ba5b",
"version" : "8.10.2",
"http_address" : "0.0.0.0:9600",
"id" : "dc30a5ab-618f-4b23-99b4-f8d64a8fea6c",
"name" : "7102e1b6ba5b",
"ephemeral_id" : "fd94098a-7bf3-4ca4-80d1-bbad59d6f574",
"status" : "green",
"snapshot" : false,
"pipeline" : {
"workers" : 2,
"batch_size" : 125,
"batch_delay" : 50
},
"build_date" : "2023-09-18T15:58:34+00:00",
"build_sha" : "cc67511c41a1531b7d563a04fbcf9782ae6f9f98",
"build_snapshot" : false
}
Now, log in to Better Stack.
On the Monitors page, click the Create monitor button:
Next, enter and check the relevant information and click the Create monitor button:
Choose your preferred way for Better Stack to trigger an alert, provide your server's IP address or domain name on port 9600, and finally select how you prefer to be notified.
Upon completing the configuration, Better Stack will initiate monitoring the Logstash health endpoint and start providing performance statistics:
To see what happens when Logstash is down, stop all the services with:
docker compose stop
When you return to Better Stack, the status will be updated to "Down" after a few moments, since the endpoint is no longer reachable:
If you configured Better Stack to send email alerts, you will receive a notification email similar to this:
With this, you can proactively manage Logstash's health and address any issues promptly.
Final thoughts
In this article, you explored the comprehensive process of using Logstash to collect, process, and forward logs, integrating it seamlessly with Docker, Nginx, and Better Stack for efficient log management. You should feel comfortable incorporating Logstash into your projects.
As a next step, visit the Logstash documentation to explore more features. If you wish to enhance your knowledge of Docker and Docker Compose, consult their respective documentation pages: Docker and Docker Compose. Additionally, for a comprehensive understanding of Docker logging mechanisms, check out this guide.
If you are interested in exploring Logstash alternatives, consider looking into various log shippers.
Thanks for reading, and happy logging!