
How to Collect, Process, and Ship Log Data with Fluentd

Stanley Ulili
Updated on April 22, 2024

In today's complex computing environments, operating systems, applications, and databases generate logs crucial for understanding system behavior, diagnosing issues, and ensuring smooth operations. Centralizing these logs simplifies error analysis and troubleshooting. To achieve this centralization, you need a log shipper—a tool designed to collect logs from multiple sources, process and forward them to a centralized location for analysis.

Fluentd is a robust, open-source log shipper developed by Treasure Data. It excels at capturing logs from various sources, unifying them for processing, and forwarding them to multiple destinations for analysis and monitoring. Fluentd distinguishes itself with its lightweight memory footprint, consuming as little as 30-40 MB of memory. Its pluggable architecture lets the community extend its capabilities, and the plugin library currently numbers over 1000 plugins. Additionally, Fluentd implements buffering mechanisms to prevent data loss and can handle substantial data volumes. Fluentd is used by over 5000 companies, and the documentation claims that its largest user collects logs from more than 50,000 servers.

In this comprehensive guide, you'll use Fluentd to collect, process, and forward logs to various destinations. To begin, you'll create a sample application that generates logs to a file. Next, you'll use Fluentd to read the logs from the file and redirect them to the console. As you progress, you'll transform logs, collect them from containerized environments, and centralize log data. Lastly, you'll monitor Fluentd's health to ensure it operates without issues.

Prerequisites

Before you begin, ensure you have access to a system with a non-root user account with sudo privileges. If you plan to follow along with the later sections, which involve Fluentd collecting logs from Docker containers, you should also install Docker and Docker Compose on your system. If you're not familiar with log shippers, you can explore their benefits by reading this article.

With these prerequisites in place, create a root project directory using the following command:

 
mkdir log-processing-stack

Navigate to the newly created directory:

 
cd log-processing-stack

Inside this project directory, create a subdirectory for the demo application and move into the directory:

 
mkdir logify && cd logify

Now you're ready to proceed with creating the demo logging application.

Developing a demo logging application

In this section, you'll create a sample logging application with Bash that generates logs at regular intervals.

In the logify directory, create a Bash script file named logify.sh using your preferred text editor:

 
nano logify.sh

In your logify.sh file, add the following contents to create the application:

log-processing-stack/logify/logify.sh
#!/bin/bash
filepath="/var/log/logify/app.log"

create_log_entry() {
    local info_messages=("Connected to database" "Task completed successfully" "Operation finished" "Initialized application")
    local random_message=${info_messages[$RANDOM % ${#info_messages[@]}]}
    local http_status_code=200
    local ip_address="127.0.0.1"
    local emailAddress="user@mail.com"
    local level=30
    local pid=$$
    local ssn="407-01-2433"
    local time=$(date +%s)
    local log='{"status": '$http_status_code', "ip": "'$ip_address'", "level": '$level', "emailAddress": "'$emailAddress'", "msg": "'$random_message'", "pid": '$pid', "ssn": "'$ssn'", "timestamp": '$time'}'
    echo "$log"
}

while true; do
    log_record=$(create_log_entry)
    echo "${log_record}" >> "${filepath}"
    sleep 3
done

The create_log_entry() function creates log entries in JSON format, containing details such as the HTTP status code, IP address, severity level, a random log message, and a timestamp. Sensitive fields like the IP address, Social Security Number (SSN), and email address have been intentionally included to demonstrate Fluentd's capability to filter out sensitive information later. For logging best practices, consult this guide.

Following this, you establish an infinite loop that continuously invokes the create_log_entry() function to generate log entries and append them to an app.log file in the /var/log/logify/ directory.

Once you're finished, save your modifications and make the script executable:

 
chmod +x logify.sh

Afterward, create the /var/log/logify directory that will contain the application logs:

 
sudo mkdir /var/log/logify

Change the ownership of the /var/log/logify directory to the user specified in the $USER environment variable, which represents the currently logged-in user:

 
sudo chown -R $USER:$USER /var/log/logify/

Now, run the script in the background:

 
./logify.sh &

The & puts the running script in the background.

When the program starts, it will display output that looks like this:

Output
[1] 2903

2903 is the process ID, which can be used to terminate the script later.

To view the contents of the app.log file, use the tail command:

 
tail -n 4 /var/log/logify/app.log
Output
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Connected to database", "pid": 3727, "ssn": "407-01-2433", "timestamp": 1695368528}
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Operation finished", "pid": 3727, "ssn": "407-01-2433", "timestamp": 1695368531}
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Task completed successfully", "pid": 13682, "ssn": "407-01-2433", "timestamp": 1695380673}
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Initialized application", "pid": 13682, "ssn": "407-01-2433", "timestamp": 1695380676}

You have now successfully created a logging application that produces sample log entries.

Installing Fluentd

Now that you can generate logs with the demo app, let's install a recent version of Fluentd. This guide will focus on installing Fluentd on an Ubuntu 22.04 system. If you use a different operating system, consult the official documentation page for installation instructions.

Before installing Fluentd, you need to adjust the number of file descriptors, as recommended in the documentation.

To check the current limit, execute the following command:

 
ulimit -n
Output
1024

If the output displays 1024, you should increase this limit.

Open the /etc/security/limits.conf file:

 
sudo nano /etc/security/limits.conf

To increase the limit, add the following lines at the end of the file:

/etc/security/limits.conf
root soft nofile 65536
root hard nofile 65536
* soft nofile 65536
* hard nofile 65536

After making these changes, reboot your system and then verify that the new limit has taken effect:

 
ulimit -n
Output
65536

You should see an output of 65536, indicating that the limit has been successfully increased.
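If you'd rather not reboot immediately, you can also raise the limit for the current shell session only, provided your hard limit allows it. This is a temporary measure; the limits.conf change above remains the persistent fix:

 
ulimit -n 65536

This affects only the current shell and any processes started from it.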

You are now set to install Fluentd on your system.

Fluentd has two variants:

  • fluent-package: the stable, long-term-support (LTS) distribution maintained by the Fluentd project (formerly td-agent).
  • calyptia-fluentd: an alternative distribution maintained by Calyptia.

The primary difference between the two variants lies in the Ruby versions they are bundled with; the fluent-package 5 LTS release installed below bundles Ruby 3.2, as the version output later in this section confirms.

In this guide, you will install the fluent-package. Run the following command to install the Fluentd LTS version:

 
curl -fsSL https://toolbelt.treasuredata.com/sh/install-ubuntu-jammy-fluent-package5-lts.sh | sh

Once the installation is complete, check the Fluentd version to confirm it installed successfully:

 
fluentd --version

You should see output similar to the following:

Output
fluent-package 5.0.1 fluentd 1.16.2 (d5685ada81ac89a35a79965f1e94bbe5952a5d3a)

When you install Fluentd, it automatically starts as a systemd service. However, for this tutorial, you'll run Fluentd manually. Running Fluentd manually while another instance runs in the background can lead to conflicts.

To prevent conflicts, stop the background service with the following command:

 
sudo systemctl stop fluentd

Check the status to confirm that the service is now inactive:

 
sudo systemctl status fluentd

You should see the "Active: inactive (dead)" status in the output, indicating that the service has been stopped:

Output
fluentd.service - fluentd: All in one package of Fluentd
     Loaded: loaded (/lib/systemd/system/fluentd.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Fri 2023-09-22 06:16:56 UTC; 6s ago
       Docs: https://docs.fluentd.org/
    Process: 11992 ExecStop=/bin/kill -TERM ${MAINPID} (code=exited, status=0/SUCCESS)
   Main PID: 1763 (code=exited, status=0/SUCCESS)
        CPU: 7.671s
...

Fluentd is now installed and ready for configuration.

How Fluentd works

With Fluentd successfully installed, let's explore how it works.

Diagram illustrating the Fluentd observability pipeline

To understand Fluentd, you need to visualize it as a pipeline. Fluentd captures logs from multiple sources at one end of the pipeline and transforms them into a standardized log event format. As these log events traverse through the Fluentd pipeline, they can be processed, enriched, or filtered according to your requirements. Finally, at the end of the pipeline, Fluentd can efficiently forward these log events to various destinations for in-depth analysis.

To implement this concept, you can configure Fluentd by defining the log sources, transformations, and destinations in a configuration file. Depending on your installation, this configuration file can be found at either /etc/fluent/fluentd.conf or /etc/calyptia-fluentd/calyptia-fluentd.conf.

The configuration file is structured using the following directives:

 
<source>
  ...
</source>

<filter unique.id>
   ...
</filter>

<match unique.id>
   ...
</match>

Let's explore these directives in detail:

  • <source>...</source>: specifies the log source from which Fluentd should collect logs.
  • <filter>...</filter>: defines transformations or modifications to apply to log events.
  • <match>...</match>: specifies the destination to which Fluentd should forward the processed logs.

Each of these directives requires you to specify a plugin that carries out its respective task.

Fluentd input plugins

For the <source> directive, you can choose from a variety of input plugins that suit your needs:

  • in_tail: reads log events from the end of a file.
  • in_syslog: collects logs via the syslog protocol over UDP or TCP.
  • in_http: accepts log events through an HTTP endpoint.
  • in_exec: executes external programs and retrieves event logs from them.
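As a minimal sketch (the port and tag here are arbitrary illustrative choices), an in_http source that accepts log events over HTTP would look like this:

 
<source>
  @type http
  port 9880
  bind 0.0.0.0
  tag http.logs
</source>

Posting a JSON payload to http://localhost:9880/http.logs would then emit a log event tagged http.logs.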

Fluentd filter plugins

When it comes to processing or filtering the data, Fluentd offers a range of filter plugins to cater to your specific requirements:

  • filter_record_transformer: modifies log events.
  • grep: filters log events that match a specified pattern, similar to the grep command.
  • geoip: adds geographic information to log events.
  • parser: parses event logs.
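For example, a grep filter that keeps only log events whose msg field contains the word "Connected" might look like this (a sketch against the file.logs tag used later in this guide):

 
<filter file.logs>
  @type grep
  <regexp>
    key msg
    pattern /Connected/
  </regexp>
</filter>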

Fluentd output plugins

To forward logs to various destinations, Fluentd provides a variety of output plugins to choose from:

  • out_file: writes log events to files.
  • out_opensearch: delivers log events to OpenSearch.
  • out_http: uses HTTP/HTTPS to write log records.
  • roundrobin: distributes log entries to multiple outputs in a round-robin fashion.
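For instance, a minimal out_file match (the output path is an arbitrary choice) that writes log events to disk could look like this:

 
<match file.logs>
  @type file
  path /var/log/fluent/output
</match>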

In the next section, we will demonstrate how to use the in_tail plugin to read log events from a file and send the log entries to the console using the stdout plugin.

Getting started with Fluentd

Now that you understand how Fluentd works, let's create a configuration file instructing Fluentd to read log entries from a file and display them in the console.

Open the Fluentd configuration file located at /etc/fluent/fluentd.conf:

 
sudo nano /etc/fluent/fluentd.conf

Next, clear any existing contents in the file to start with a clean slate and add the following lines of code:

/etc/fluent/fluentd.conf
<source>
  @type tail
  path /var/log/logify/app.log
  pos_file /var/log/fluent/file.log.pos
  tag file.logs
  format none
</source>

<match file.logs>
  @type stdout
</match>

The <source> directive reads log events from the end of a file. The @type option specifies the plugin to use, which here is the tail plugin. The path option specifies the path of the file to read. The pos_file option points to a file that Fluentd uses to keep track of its position when reading the file. Lastly, the tag option assigns a tag to the events from this source, which the <filter> and <match> directives can reference.

The <match file.logs> directive defines a matching rule that tells Fluentd how to handle data with a specific tag, which in this case is file.logs. To send these logs to the console, you set the type to the stdout plugin.
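Match patterns also support wildcards. As a quick illustrative sketch (not needed for this tutorial), the following would match file.logs along with any other tag consisting of a single part after the file. prefix:

 
<match file.*>
  @type stdout
</match>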

After making the changes, save the file.

Before running Fluentd, it's a good practice to validate your configuration file for errors. You can do this with the following command:

 
sudo fluentd -c /etc/fluent/fluentd.conf --dry-run
Output
2023-09-22 05:26:34 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2023-09-22 05:26:34 +0000 [info]: parsing config file is succeeded path="/etc/fluent/fluentd.conf"
2023-09-22 05:26:34 +0000 [info]: gem 'fluentd' version '1.16.2'
...
2023-09-22 05:26:34 +0000 [info]: using configuration file: <ROOT>
  <source>
    @type tail
    path "/var/log/logify/app.log"
    pos_file "/var/log/fluent/file.log.pos"
    tag "file.logs"
    format none
    <parse>
      @type none
      unmatched_lines
    </parse>
  </source>
  <match file.logs>
    @type stdout
  </match>
</ROOT>
2023-09-22 05:26:34 +0000 [info]: starting fluentd-1.16.2 pid=1867 ruby="3.2.2"
2023-09-22 05:26:34 +0000 [info]: spawn command to main:  cmdline=["/opt/fluent/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/sbin/fluentd", "-c", "/etc/fluent/fluentd.conf", "--dry-run", "--under-supervisor"]

If no errors or issues are reported, the configuration file is ready for execution.

If you rebooted your system during Fluentd installation, move into the logify application subdirectory again:

 
cd log-processing-stack/logify

Rerun the script in the background:

 
./logify.sh &

Now, start Fluentd:

 
sudo fluentd

Fluentd will automatically pick up the configuration file in the /etc/fluent directory. If your configuration file is in a different location, provide the full path when starting Fluentd:

 
sudo fluentd -c </path/to/fluentd.conf>

Once Fluentd is running, you will see output that resembles the following:

Output
...
2023-09-22 05:31:19 +0000 [info]: starting fluentd-1.16.2 pid=1914 ruby="3.2.2"
2023-09-22 05:31:19 +0000 [info]: spawn command to main:  cmdline=["/opt/fluent/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/sbin/fluentd", "--under-supervisor"]
2023-09-22 05:31:19 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2023-09-22 05:31:19 +0000 [info]: adding match pattern="file.logs" type="stdout"
2023-09-22 05:31:20 +0000 [info]: adding source type="tail"
2023-09-22 05:31:20 +0000 [info]: #0 starting fluentd worker pid=1922 ppid=1914 worker=0
2023-09-22 05:31:20 +0000 [info]: #0 following tail of /var/log/logify/app.log
2023-09-22 05:31:20 +0000 [info]: #0 fluentd worker is now running worker=0

Following that, the log messages will appear:

Output
2023-09-22 05:31:22.150262685 +0000 file.logs: {"message":"{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Connected to database\", \"pid\": 1896, \"ssn\": \"407-01-2433\", \"timestamp\": 1695360682}"}
2023-09-22 05:31:25.166169434 +0000 file.logs: {"message":"{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Task completed successfully\", \"pid\": 1896, \"ssn\": \"407-01-2433\", \"timestamp\": 1695360685}"}
2023-09-22 05:31:28.179697560 +0000 file.logs: {"message":"{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Initialized application\", \"pid\": 1896, \"ssn\": \"407-01-2433\", \"timestamp\": 1695360688}"}
2023-09-22 05:31:31.187377861 +0000 file.logs: {"message":"{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Connected to database\", \"pid\": 1896, \"ssn\": \"407-01-2433\", \"timestamp\": 1695360691}"}
2023-09-22 05:31:34.198776362 +0000 file.logs: {"message":"{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Initialized application\", \"pid\": 1896, \"ssn\": \"407-01-2433\", \"timestamp\": 1695360694}"}
...

Fluentd is now displaying the log messages along with additional context. You can exit Fluentd by pressing CTRL + C.

With Fluentd configured to read logs from a file and display them in the console, you are now ready to explore log transformations in the next section.

Transforming logs with Fluentd

After Fluentd collects logs from various sources, processing and manipulating the records often becomes necessary. This process can involve transforming unstructured logs in plain text into structured formats, such as JSON or Logfmt, which are easier for machines to parse. Additionally, you may need to enrich the logs with crucial fields, remove unwanted data, or mask sensitive information to ensure privacy.

Fluentd provides a range of filter plugins that allow you to manipulate the event streams. In this section, we will explore how to use these filter plugins to perform the following tasks:

  • Parsing JSON logs.
  • Removing unwanted fields.
  • Adding new fields.
  • Converting Unix timestamps to the ISO format.
  • Masking sensitive data.

Parsing JSON logs with Fluentd

When working with logs in JSON format, it's essential to parse them correctly for structured analysis. In this section, you'll configure Fluentd to parse JSON logs effectively.

Let's begin by examining a log event from the output of the previous section:

Output
2023-09-22 05:31:34.198776362 +0000 file.logs: {
"message":"{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Initialized application\", \"pid\": 1896, \"ssn\": \"407-01-2433\", \"timestamp\": 1695360694}"
}

You'll notice that the log message is enclosed in double quotes, and many of the double quotes within the JSON structure have been escaped with backslashes.

To ensure that Fluentd can work with these logs effectively and parse them as valid JSON, you need to add a <parse> section to your Fluentd directives. This section supports parser plugins and can be placed within the <source>, <match>, or <filter> directive.

Open the Fluentd configuration file:

 
sudo nano /etc/fluent/fluentd.conf

Next, add the <parse> section under the <source> directive to parse the JSON logs:

/etc/fluent/fluentd.conf
<source>
  @type tail
  path /var/log/logify/app.log
  pos_file /var/log/fluent/file.log.pos
  tag file.logs
  format none
  <parse>
    @type json
  </parse>
</source>

<match file.logs>
  @type stdout
</match>

The @type parameter within the <parse> section specifies that the json plugin should be used to parse the log events.

To ensure that Fluentd correctly parses the JSON logs, save your configuration changes and run Fluentd:

 
sudo fluentd

Fluentd will collect the log events as they are generated. You will see output similar to this:

Output
2023-09-22 05:37:50.382134327 +0000 file.logs: {"status":200,"ip":"127.0.0.1","level":30,"emailAddress":"user@mail.com","msg":"Initialized application","pid":1896,"ssn":"407-01-2433","timestamp":1695361070}
2023-09-22 05:37:53.393357890 +0000 file.logs: {"status":200,"ip":"127.0.0.1","level":30,"emailAddress":"user@mail.com","msg":"Task completed successfully","pid":1896,"ssn":"407-01-2433","timestamp":1695361073}
2023-09-22 05:37:56.401670531 +0000 file.logs: {"status":200,"ip":"127.0.0.1","level":30,"emailAddress":"user@mail.com","msg":"Initialized application","pid":1896,"ssn":"407-01-2433","timestamp":1695361076}
2023-09-22 05:37:59.410052978 +0000 file.logs: {"status":200,"ip":"127.0.0.1","level":30,"emailAddress":"user@mail.com","msg":"Connected to database","pid":1896,"ssn":"407-01-2433","timestamp":1695361079}
...

In the output, the log events have been parsed successfully, and the properties are no longer escaped. If you look closely at the output, you will notice that Fluentd adds a timestamp and a tag name before each JSON log event.

To remove this additional information, you can use the <format> section. Stop Fluentd again and open the /etc/fluent/fluentd.conf file:

 
sudo nano /etc/fluent/fluentd.conf

Add the following code under the <match> directive in your configuration file:

/etc/fluent/fluentd.conf
<source>
  @type tail
  path /var/log/logify/app.log
  pos_file /var/log/fluent/file.log.pos
  tag file.logs
  format none
  <parse>
    @type json
  </parse>
</source>

<match file.logs>
  @type stdout
  <format>
    @type json
  </format>
</match>

The <format> section formats the log entries, and the @type parameter specifies the json plugin used for formatting.

Save the changes and start Fluentd again:

 
sudo fluentd

You will observe output similar to this:

Output
{"status":200,"ip":"127.0.0.1","level":30,"emailAddress":"user@mail.com","msg":"Operation finished","pid":1896,"ssn":"407-01-2433","timestamp":1695361256}
{"status":200,"ip":"127.0.0.1","level":30,"emailAddress":"user@mail.com","msg":"Operation finished","pid":1896,"ssn":"407-01-2433","timestamp":1695361259}
...

Notice that the output no longer includes timestamps. You can now stop Fluentd.

In this section, we've explored how to parse JSON logs using Fluentd. However, log records come in various formats, and Fluentd provides a range of parser plugins to handle different log formats effectively:

  • nginx: parses Nginx logs.
  • csv: parses log entries in CSV format.
  • regexp: parses logs according to the given regex pattern.
  • apache2: parses Apache2 log entries.

For the <format> section, Fluentd offers several built-in formatter plugins to customize the output format of log events:

  • csv: outputs log events in the CSV format.
  • ltsv: formats log events in the LTSV format.
  • msgpack: converts logs to the Msgpack binary data format.

With these tools at your disposal, you can effectively parse and format logs in various formats to suit your specific needs. In the next section, we'll explore how to add and remove unwanted fields from log entries, providing you with even greater control over your log data.

Adding and removing fields with Fluentd

In this section, you will enhance data privacy by removing sensitive information from the log entries. Specifically, you'll remove the emailAddress field and add a new hostname field to the log events.

To achieve this, open your /etc/fluent/fluentd.conf file in your text editor:

 
sudo nano /etc/fluent/fluentd.conf

Make the following modifications within the source configuration:

/etc/fluent/fluentd.conf
<source>
  @type tail
  path /var/log/logify/app.log
  pos_file /var/log/fluent/file.log.pos
  tag file.logs
  format none
  <parse>
    @type json
  </parse>
</source>

<filter file.logs>
  @type record_transformer
  remove_keys emailAddress
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>

<match file.logs>
  @type stdout
  <format>
    @type json
  </format>
</match>

The <filter> section is used to modify log records. The @type specifies that the record_transformer plugin will transform log events. To remove a specific property, such as the emailAddress field, you use the remove_keys parameter. Additionally, you introduce a new hostname field using the <record> section, specifying both the field name and its value.

After making these changes, save the configuration file and restart Fluentd:

 
sudo fluentd

With Fluentd running, you can now observe the updated log entries. These logs will no longer contain the emailAddress field, and a new hostname field will be present:

Output
{"status":200,"ip":"127.0.0.1","level":30,"msg":"Operation finished","pid":1896,"ssn":"407-01-2433","timestamp":1695361401,"hostname":"fluentd-host"}
{"status":200,"ip":"127.0.0.1","level":30,"msg":"Initialized application","pid":1896,"ssn":"407-01-2433","timestamp":1695361404,"hostname":"fluentd-host"}
...

This modification ensures that sensitive data is excluded from the log entries while enriching them with relevant information like the hostname. You can now stop Fluentd and proceed to the next section to format dates.

Formatting dates with Fluentd

The Bash script you created earlier generates logs with Unix timestamps, representing the number of seconds elapsed since January 1st, 1970, at 00:00:00 UTC. These timestamps can be challenging to read, so in this section you'll convert them into a more human-readable form: the ISO 8601 format.

To perform this conversion, open your /etc/fluent/fluentd.conf configuration file:

 
sudo nano /etc/fluent/fluentd.conf

Add the following lines to your file:

/etc/fluent/fluentd.conf
...
<filter file.logs>
  @type record_transformer
  enable_ruby true
  remove_keys emailAddress
  <record>
    hostname "#{Socket.gethostname}"
    timestamp ${Time.at(record["timestamp"]).strftime("%Y-%m-%dT%H:%M:%S.%L%z")}
  </record>
</filter>
...

The enable_ruby option lets you use Ruby expressions inside ${...}. You then redefine the timestamp field and use a Ruby expression within ${...} to convert the Unix timestamp to the ISO format. The expression Time.at(record["timestamp"]) creates a Ruby Time object with the Unix timestamp value, and the strftime() method formats the timestamp into the ISO format for readability.
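If you want to verify the expression before wiring it into Fluentd, you can run the same snippet directly with Ruby (assuming a ruby binary is on your PATH; the interpreter bundled with Fluentd at /opt/fluent/bin/ruby also works). The printed UTC offset depends on your system's timezone:

 
ruby -e 'puts Time.at(1695361070).strftime("%Y-%m-%dT%H:%M:%S.%L%z")'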

After saving the new changes, start Fluentd with the following command:

 
sudo fluentd

Fluentd will yield output similar to the following:

Output
{"status":200,"ip":"127.0.0.1","level":30,"msg":"Operation finished","pid":1896,"ssn":"407-01-2433","timestamp":"2023-09-22T05:46:34.000+0000","hostname":"fluentd-host"}
{"status":200,"ip":"127.0.0.1","level":30,"msg":"Connected to database","pid":1896,"ssn":"407-01-2433","timestamp":"2023-09-22T05:46:37.000+0000","hostname":"fluentd-host"}
{"status":200,"ip":"127.0.0.1","level":30,"msg":"Initialized application","pid":1896,"ssn":"407-01-2433","timestamp":"2023-09-22T05:46:40.000+0000","hostname":"fluentd-host"}
...

In the output, the timestamp field is presented in a human-readable ISO format, making it much easier to understand and work with. This adjustment enhances the readability of your log data, making it more user-friendly for analysis and troubleshooting.

Working with conditional statements in Fluentd

Fluentd allows you to leverage the Ruby ternary operator when the enable_ruby option is enabled. This operator packs a conditional statement into a single expression, so Fluentd can make decisions based on specified conditions. In this section, you'll use the ternary operator to check whether the status field equals 200. If the condition is met, Fluentd will add an is_successful field with a value of true; otherwise, it will be set to false.

First, open your /etc/fluent/fluentd.conf configuration file:

 
sudo nano /etc/fluent/fluentd.conf

To implement this conditional statement, enter the following code:

/etc/fluent/fluentd.conf
...
<filter file.logs>
  @type record_transformer
  enable_ruby true
  remove_keys emailAddress
  <record>
    hostname "#{Socket.gethostname}"
    timestamp ${Time.at(record["timestamp"]).strftime("%Y-%m-%dT%H:%M:%S.%L%z")}
    is_successful ${record["status"] == 200 ? "true" : "false"}
  </record>
</filter>
...

In the code snippet above, you use the ternary operator to check if the status field equals 200. If the condition is true, the is_successful field is assigned the value true; conversely, if the condition is false, the is_successful field is assigned the value false.

Start Fluentd again:

 
sudo fluentd
Output
{"status":200,"ip":"127.0.0.1","level":30,"msg":"Task completed successfully","pid":1896,"ssn":"407-01-2433","timestamp":"2023-09-22T05:49:40.000+0000","hostname":"fluentd-host","is_successful":"true"}
{"status":200,"ip":"127.0.0.1","level":30,"msg":"Task completed successfully","pid":1896,"ssn":"407-01-2433","timestamp":"2023-09-22T05:49:43.000+0000","hostname":"fluentd-host","is_successful":"true"}
...

As you observe the log entries, you will notice the presence of the is_successful field, indicating whether a log entry corresponds to a successful event (true) or not (false) based on the status field value.

This addition of conditional statements in Fluentd provides a powerful way to manipulate log data and add context or flags to log entries based on specific conditions.

Redacting sensitive data with Fluentd

Even though you have removed the emailAddress from the log messages, sensitive fields like the IP address and Social Security Number are still present. To ensure that personal information remains secure, you should redact this sensitive data, especially when it cannot be removed entirely.

Open your Fluentd configuration file again:

 
sudo nano /etc/fluent/fluentd.conf

You can redact the IP address and social security number with the following code:

/etc/fluent/fluentd.conf
...
<filter file.logs>
  @type record_transformer
  enable_ruby true
  remove_keys emailAddress
  <record>
    hostname "#{Socket.gethostname}"
    timestamp ${Time.at(record["timestamp"]).strftime("%Y-%m-%dT%H:%M:%S.%L%z")}
    is_successful ${record["status"] == 200 ? "true" : "false"}
    ip ${record["ip"].gsub(/(\d+\.\d+\.\d+\.\d+)/, 'REDACTED')}
    ssn ${record["ssn"].gsub(/(\d{3}-\d{2}-\d{4})/, 'REDACTED')}
  </record>
</filter>
...

The gsub() method locates specific strings based on the provided regular expressions and replaces them with the text 'REDACTED'. The first gsub() operation replaces IP addresses, and the second one replaces SSNs.
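You can experiment with such substitutions in isolation before relying on them in the filter. For example, this quick Ruby one-liner (assuming a ruby binary is available) prints IP: REDACTED:

 
ruby -e 'puts "IP: 192.168.0.1".gsub(/(\d+\.\d+\.\d+\.\d+)/, "REDACTED")'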

After saving these changes, run Fluentd with the following command:

 
sudo fluentd
Output
{"status":200,"ip":"REDACTED","level":30,"msg":"Connected to database","pid":1896,"ssn":"REDACTED","timestamp":"2023-09-22T05:51:01.000+0000","hostname":"fluentd-host","is_successful":"true"}
{"status":200,"ip":"REDACTED","level":30,"msg":"Connected to database","pid":1896,"ssn":"REDACTED","timestamp":"2023-09-22T05:51:04.000+0000","hostname":"fluentd-host","is_successful":"true"}
...

You will observe that both the IP address and SSN fields have been successfully redacted from the log entries.

In scenarios where you have private information within the same string, like this:

Output
{..., "privateInfo": "This is a sample message with SSN: 123-45-6789 and IP: 192.168.0.1"}

You can selectively redact the sensitive portions, such as the SSN and IP address, in a single pass:

 
 ...
 privateInfo ${record["privateInfo"].gsub(/(\d{3}-\d{2}-\d{4})/, 'REDACTED').gsub(/(\d+\.\d+\.\d+\.\d+)/, 'REDACTED')}
 ...

Upon redaction, the output will resemble the following:

Output
{..., "privateInfo": "This is a sample message with SSN: REDACTED and IP: REDACTED"}

By effectively masking sensitive data, Fluentd enhances the security and privacy of your log entries. You can now stop Fluentd and the logify.sh script.

To stop the logify.sh script, first obtain its process ID by entering the following command in your terminal:

 
jobs -l | grep "logify"
Output
[1]+  2113 Running                 ./logify.sh &

Kill the program with the command that follows, and be sure to substitute the process ID:

 
kill -9 <2113>

In the next section, we will explore how to collect logs from Docker containers using Fluentd.

Collecting logs from Docker containers and centralizing logs

In this section, you will containerize the Bash script and use the Nginx hello world Docker image, which is preconfigured to generate JSON Nginx logs for each incoming request. Subsequently, you will employ a Fluentd container to collect logs from both containers and transmit them to Better Stack for monitoring and analysis.

Dockerizing the Bash script

In this section, you'll containerize the Bash script responsible for generating log data. Containerization allows us to encapsulate the script and its dependencies, ensuring consistency and portability across different environments.

First, ensure you are still in the log-processing-stack/logify directory. Then, create a Dockerfile that defines how the script should be included in the container:

 
nano Dockerfile

In your Dockerfile, add the following instructions:

log-processing-stack/logify/Dockerfile
FROM ubuntu:latest

COPY . .

RUN chmod +x logify.sh

RUN mkdir -p /var/log/logify

RUN ln -sf /dev/stdout /var/log/logify/app.log

CMD ["./logify.sh"]

In this Dockerfile, you use the latest version of Ubuntu as the base image. You then copy the script into the container, ensure it's executable, and create a directory for log files. Additionally, you set up a redirection mechanism that sends any data written to /var/log/logify/app.log to the standard output. This configuration lets you conveniently view the container's logs using the docker logs command. Finally, you specify that the script should be executed when the container is launched.
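Before wiring the image into Docker Compose, you can optionally sanity-check it by building and running it directly; the logify image tag below is an arbitrary name for this test:

 
docker build -t logify .
docker run --rm logify

You should see a JSON log line printed to the terminal every three seconds. Stop the container with CTRL + C (or with docker stop from another terminal if the signal isn't forwarded).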

Next, move into the parent project directory:

 
cd ..

Create a docker-compose.yml with your editor:

 
nano docker-compose.yml

Then, define the Bash script and Nginx services:

log-processing-stack/docker-compose.yml
version: '3'
services:
  logify-script:
    build:
      context: ./logify
    container_name: logify
  nginx:
    image: betterstackcommunity/nginx-helloworld:latest
    container_name: nginx
    ports:
      - '80:80'

In this Docker Compose configuration, you define two services: logify-script and nginx. The logify-script service is built from the ./logify directory context. The nginx service uses a pre-built Nginx image, mapping port 80 on the host to port 80 within the container. It's essential to ensure that no other service is currently using port 80 on the host to avoid port conflicts.

Now that you have defined the services, let's build the Docker images and create the containers:

 
docker compose up -d

The -d flag starts the services in the background.

Check the status of the running containers using the following command:

 
docker compose ps

You should observe a "running" status under the "STATUS" column for both containers, similar to this:

Output
NAME                COMMAND              SERVICE             STATUS              PORTS
logify              "./logify.sh"        logify-script       running
nginx               "/runner.sh nginx"   nginx               running             0.0.0.0:80->80/tcp, :::80->80/tcp

With the containers up and running, send HTTP requests to the Nginx service using curl to generate log data:

 
curl http://localhost:80/?[1-5]

To view the logs generated by all running containers, use the following command:

 
docker compose logs
Output
nginx  | {"timestamp":"2023-09-22T07:51:22+00:00","pid":"8","remote_addr":"172.21.0.1","remote_user":"","request":"GET /?1 HTTP/1.1","status": "200","body_bytes_sent":"11109","request_time":"0.000","http_referrer":"","http_user_agent":"curl/7.81.0","time_taken_ms":"1695369082.183"}
...
nginx  | {"timestamp":"2023-09-22T07:51:22+00:00","pid":"8","remote_addr":"172.21.0.1","remote_user":"","request":"GET /?2 HTTP/1.1","status": "200","body_bytes_sent":"11109","request_time":"0.000","http_referrer":"","http_user_agent":"curl/7.81.0","time_taken_ms":"1695369082.190"}
logify  | {"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Operation finished", "pid": 1, "ssn": "407-01-2433", "timestamp": 1695369060}
...
logify  | {"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Task completed successfully", "pid": 1, "ssn": "407-01-2433", "timestamp": 1695369093}

The output will display the logs generated by each service.

With the Bash script containerized and generating log data, the next step is configuring Fluentd to collect and centralize these logs for further analysis.

Defining the Fluentd service with Docker Compose

In this section, you will define a Fluentd service within the Docker Compose setup. This service will collect logs from the existing containers and forward them to Better Stack. To do this, you will create a Fluentd configuration file, containerize Fluentd, and deploy the Fluentd service.

First, you need to update the docker-compose.yml file as follows:

log-processing-stack/docker-compose.yml
version: '3'
services:
  logify-script:
    build:
      context: ./logify
    container_name: logify
    links:
      - fluentd
    depends_on:
      - fluentd
    logging:
      driver: "fluentd"
      options:
        tag: docker.logify
  nginx:
    image: betterstackcommunity/nginx-helloworld:latest
    container_name: nginx
    ports:
      - '80:80'
    links:
      - fluentd
    depends_on:
      - fluentd
    logging:
      driver: "fluentd"
      options:
        tag: docker.nginx
  fluentd:
    build:
      context: ./fluentd
    volumes:
      - ./fluentd/fluent.conf:/fluentd/etc/fluent.conf
    container_name: fluent
    ports:
      - "24224:24224"
      - "24224:24224/udp"

In this updated configuration, the logify-script and nginx services are linked to the fluentd service for log aggregation and forwarding. They are configured to use the Fluentd driver for logging. The logify-script service tags log entries as docker.logify, and the Nginx service tags log entries as docker.nginx. These tags help Fluentd distinguish the source of log entries when processing them.

The fluentd service is built from the fluentd directory context and mounts a volume that maps the Fluentd configuration file (which you'll create shortly) into the container. It exposes port 24224, over both TCP and UDP, for Fluentd's forward input. This setup ensures that Fluentd can receive the log entries from both the logify-script and nginx services.

Following that, create the fluentd directory and move into it:

 
mkdir fluentd && cd fluentd

Now, create the Dockerfile to customize the official Fluentd image:

 
nano Dockerfile

In your Dockerfile, enter the following custom instructions:

log-processing-stack/fluentd/Dockerfile
FROM fluentd
USER root
RUN ["fluent-gem", "install", "fluent-plugin-logtail"]
USER fluent

This Dockerfile uses the Fluentd base image and installs the fluent-plugin-logtail gem, which enables Fluentd to forward logs to Better Stack. The user is set to fluent to ensure Fluentd runs with appropriate permissions.
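If you'd like to confirm that the plugin is baked into the image, you could build it from this directory and list the installed gem; the fluentd-logtail tag is an arbitrary name for this check:

 
docker build -t fluentd-logtail .
docker run --rm fluentd-logtail fluent-gem list fluent-plugin-logtail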

Next, create a fluent.conf configuration file:

 
nano fluent.conf

Add the following input source in the file:

log-processing-stack/fluentd/fluent.conf
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

This source configuration specifies that Fluentd should listen for logs on port 24224 via the forward protocol, accepting log data from services configured to send their logs to Fluentd.
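Once the stack is running, you can also exercise this input by hand with the fluent-cat utility bundled with the host Fluentd installation from earlier. Sending an event with the docker.logify tag tags it the same way as the logify container's logs:

 
echo '{"message": "manual test event"}' | fluent-cat docker.logify --host localhost --port 24224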

Now, you need to set up the destination to forward these logs. For this purpose, you will use Better Stack to centralize and manage the logs.

Before defining the output source, you'll need to create a free Better Stack account. Once you've signed in, navigate to the Sources section:

Screenshot pointing to the **Sources** link

On the Sources page, click the Connect source button:

Screenshot indicating the **Connect source** button

Now, provide your source a name and select "Fluentd" as the platform:

Screenshot with the name field filled as "Logify logs" and the Platform set to "Fluentd"

Once your source has been created, copy the Source Token field to your clipboard:

Screenshot with an arrow pointing to the "Source Token" field

Return to the fluent.conf file and add a match directive to forward Docker logs to Better Stack (remember to update the source token):

log-processing-stack/fluentd/fluent.conf
...
<match docker.logify.**>
  @type logtail
  @id output_logify_logtail
  source_token <your_logify_source_token>
  flush_interval 2 # in seconds
</match>

The <match docker.logify.**> directive tells Fluentd to match log entries whose tags start with docker.logify and forward them to Better Stack using the logtail plugin. You then provide a unique ID and your Better Stack source token. Finally, you set a flush interval of 2 seconds for log forwarding.

Once you've made these changes, save and exit the configuration file.

Return to the root directory:

 
cd ..

Start the Fluentd service with the following command:

 
docker compose up -d

After a few moments, check Better Stack to confirm if the log entries have been successfully forwarded:

Screenshot displaying the log entries in Better Stack

Your Bash script logs are now being forwarded to Better Stack.

To forward Nginx and Fluentd logs, follow similar steps by creating two additional sources—one for Nginx logs and another for Fluentd logs.

When you create these sources, the interface will look like this:

Screenshot of Better Stack with three sources: Logify, Nginx, and Fluentd

Now, add the following match directives to deliver Nginx and Fluentd logs to Better Stack, ensuring you update the source tokens accordingly:

log-processing-stack/fluentd/fluent.conf
...
<match docker.nginx.**>
  @type logtail
  @id output_nginx_logtail
  source_token <your_nginx_source_token>
  flush_interval 2 # in seconds
</match>

<label @FLUENT_LOG>
<match fluent.*>
  @type logtail
  @id output_fluent_logtail
  source_token  <your_fluentd_source_token>
  flush_interval 2 # in seconds
</match>
</label>

The <match docker.nginx.**> directive uses the logtail plugin to forward logs with tags starting with docker.nginx to Better Stack.

To match Fluentd's own internal logs, you define a <label @FLUENT_LOG> section with a <match fluent.*> condition that forwards logs tagged with the fluent prefix to Better Stack. Fluentd emits internal logs under tags such as fluent.info and fluent.warn; you can see these every time you start Fluentd.

After saving these changes, stop and discard all the services:

 
docker compose down

Start the services again:

 
docker compose up -d

Send more requests to the Nginx service:

 
curl http://localhost:80/?[1-5]

You'll notice the Nginx logs being successfully uploaded to Better Stack.

Screenshot of Nginx logs in Better Stack

And Fluentd logs will look similar to this:

Screenshot of Fluentd logs in Better Stack

Monitoring Fluentd health with Better Stack

While Fluentd lacks a built-in /health endpoint for external monitoring, it features a monitoring agent that collects internal metrics in JSON format and exposes them via a /api/plugins.json endpoint.

To access internal Fluentd metrics via the REST API, first open the fluent.conf configuration file:

 
nano fluentd/fluent.conf

Add these lines at the top of the fluent.conf configuration file:

log-processing-stack/fluentd/fluent.conf
<source>
  @type monitor_agent
  bind 0.0.0.0
  port 24220
</source>
...

This <source> block sets up a Fluentd monitoring agent that exposes internal metrics on port 24220.

Next, update the docker-compose.yml file to define the port for the Fluentd internal metrics API endpoint:

log-processing-stack/docker-compose.yml
  fluentd:
    ...
    ports:
      - "24220:24220"
      - "24224:24224"
      - "24224:24224/udp"

When done, restart Fluentd with the following command:

 
docker compose up -d

Verify that Fluentd's /api/plugins.json endpoint works:

 
curl http://localhost:24220/api/plugins.json
Output
{"plugins":[{"plugin_id":"object:8ac","plugin_category":"input","type":"monitor_agent","config":{"@type":"monitor_agent","bind":"0.0.0.0","port":"24220"},"output_plugin":false,"retry_count":null,"emit_records":0,"emit_size":0},
...
buffer_total_queued_size":0,"retry_count":0,"emit_records":3,"emit_size":0,"emit_count":3,"write_count":1,"rollback_count":0,"slow_flush_count":0,"flush_time_count":1429,"buffer_stage_length":0,"buffer_stage_byte_size":0,"buffer_queue_byte_size":0,"buffer_available_buffer_space_ratios":100.0,"retry":{}}]}
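Since the endpoint returns plain JSON, you can slice out just the fields you care about with jq (assuming jq is installed on the host):

 
curl -s http://localhost:24220/api/plugins.json | jq '.plugins[] | {type, retry_count, emit_records}'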

Next, log in to Better Stack.

Once you are on the Monitors page, click the Create monitor button:

Screenshot of the monitors page, providing an option to create a monitor

Afterward, enter the relevant information and click the Create monitor button:

Screenshot of Better Stack configured with the necessary options

In this setup, you choose how the monitor should check the endpoint, and provide the server's IP address or domain name along with the /api/plugins.json endpoint on port 24220. Finally, select how you would like to be notified.

Once the configuration is complete, Better Stack will initiate monitoring of the Fluentd endpoint, delivering valuable performance statistics:

Screenshot of Better Stack monitoring the REST API endpoint

To demonstrate the response when Fluentd stops running, causing the endpoint to cease functioning, stop all the services with:

 
docker compose stop

Upon returning to Better Stack, you will observe the status update to "Down" after a few moments:

Screenshot of Better Stack indicating that the endpoint is down

If you have configured Better Stack to alert you via email, you will receive an email alert:

Screenshot of the email alert from Better Stack notifying of the endpoint's downtime

With that, you can proactively manage Fluentd's health and promptly address any interruptions in its operation.

Final thoughts

In this comprehensive article, you explored Fluentd and how it integrates with Docker, Nginx, and Better Stack for effective log management. You began by writing a Fluentd configuration file to read, transform, and redact logs, then used Fluentd to gather logs from multiple containers and centralize them in Better Stack. Finally, you learned how to monitor Fluentd's health with Better Stack, ensuring proactive alerts in case of any disruptions.

With this knowledge, you can use Fluentd effectively for log collection and forwarding. For further learning, refer to the Fluentd documentation. To deepen your understanding of Docker and Docker Compose, explore their respective documentation pages: Docker and Docker Compose. For additional insights into Docker logging, refer to our comprehensive guide.

If you're curious about Fluentd alternatives, check out our guide on log shippers.

Thanks for reading, and happy logging!

Article by
Stanley Ulili
Stanley is a freelance web developer and researcher from Malawi. He loves learning new things and writing about them to understand and solidify concepts. He hopes that by sharing his experience, others can learn something from them too!
Licensed under CC-BY-NC-SA

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
