How to Collect, Process, and Ship Log Data with Fluentd
In today's complex computing environments, operating systems, applications, and databases generate logs crucial for understanding system behavior, diagnosing issues, and ensuring smooth operations. Centralizing these logs simplifies error analysis and troubleshooting. To achieve this centralization, you need a log shipper: a tool designed to collect logs from multiple sources, process them, and forward them to a centralized location for analysis.
Fluentd is a robust, open-source log shipper developed by Treasure Data. It excels at capturing logs from various sources, unifying them for processing, and forwarding them to multiple destinations for analysis and monitoring. Fluentd distinguishes itself with a lightweight memory footprint, consuming as little as 30-40MB of memory. Its pluggable architecture lets the community extend its capabilities, and the plugin library currently numbers over 1,000 entries. Additionally, Fluentd implements buffering mechanisms to prevent data loss and can handle substantial data volumes. Fluentd is currently used by over 5,000 companies, and the documentation claims that the largest deployment collects logs from more than 50,000 servers.
In this comprehensive guide, you'll use Fluentd to collect, process, and forward logs to various destinations. To begin, you'll create a sample application that generates logs to a file. Next, you'll use Fluentd to read the logs from the file and redirect them to the console. As you progress, you'll transform logs, collect them from containerized environments, and centralize log data. Lastly, you'll monitor Fluentd's health to ensure it operates without issues.
Prerequisites
Before you begin, ensure you have access to a system with a non-root user account with sudo privileges. If you plan to follow along with the later sections that involve collecting logs from Docker containers, you should also install Docker and Docker Compose. If you're not familiar with log shippers, you can explore their benefits by reading this article.
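If you already have Docker installed, you can quickly confirm that both tools are available before proceeding (your version numbers will differ):
docker --version
docker compose version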
With these prerequisites in place, create a root project directory using the following command:
mkdir log-processing-stack
Navigate to the newly created directory:
cd log-processing-stack
Inside this project directory, create a subdirectory for the demo application and move into the directory:
mkdir logify && cd logify
Now you're ready to proceed with creating the demo logging application.
Developing a demo logging application
In this section, you'll create a sample logging application with Bash that generates logs at regular intervals.
In the logify
directory, create a Bash script file named logify.sh
using your preferred text editor:
nano logify.sh
In your logify.sh
file, add the following contents to create the application:
#!/bin/bash

filepath="/var/log/logify/app.log"

create_log_entry() {
    local info_messages=("Connected to database" "Task completed successfully" "Operation finished" "Initialized application")
    local random_message=${info_messages[$RANDOM % ${#info_messages[@]}]}
    local http_status_code=200
    local ip_address="127.0.0.1"
    local emailAddress="user@mail.com"
    local level=30
    local pid=$$
    local ssn="407-01-2433"
    local time=$(date +%s)
    local log='{"status": '$http_status_code', "ip": "'$ip_address'", "level": '$level', "emailAddress": "'$emailAddress'", "msg": "'$random_message'", "pid": '$pid', "ssn": "'$ssn'", "timestamp": '$time'}'
    echo "$log"
}

while true; do
    log_record=$(create_log_entry)
    echo "${log_record}" >> "${filepath}"
    sleep 3
done
The create_log_entry() function creates log entries in JSON format, containing details such as the HTTP status code, IP address, severity level, a random log message, and a timestamp. Sensitive fields like the IP address, Social Security Number (SSN), and email address have been intentionally included to demonstrate Fluentd's ability to filter out sensitive information later. For logging best practices, consult this guide.
Following this, you establish an infinite loop that continuously invokes the create_log_entry()
function to generate log entries and append them to an app.log
file in the /var/log/logify/
directory.
Once you're finished, save your modifications and make the script executable:
chmod +x logify.sh
Afterward, create the /var/log/logify
directory that will contain the application logs:
sudo mkdir /var/log/logify
Change the ownership of the /var/log/logify
directory to the user specified in the $USER environment variable, which represents the currently logged-in user:
sudo chown -R $USER:$USER /var/log/logify/
Now, run the script in the background:
./logify.sh &
The &
puts the running script in the background.
When the program starts, it will display output that looks like this:
[1] 2903
2903
is the process ID, which can be used to terminate the script later.
To view the contents of the app.log
file, type the tail
command:
tail -n 4 /var/log/logify/app.log
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Connected to database", "pid": 3727, "ssn": "407-01-2433", "timestamp": 1695368528}
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Operation finished", "pid": 3727, "ssn": "407-01-2433", "timestamp": 1695368531}
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Task completed successfully", "pid": 13682, "ssn": "407-01-2433", "timestamp": 1695380673}
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Initialized application", "pid": 13682, "ssn": "407-01-2433", "timestamp": 1695380676}
You have now successfully created a logging application that produces sample log entries.
Installing Fluentd
Now that you can generate logs with the demo app, let's install a recent version of Fluentd. This guide will focus on installing Fluentd on an Ubuntu 22.04 system. If you use a different operating system, consult the official documentation page for installation instructions.
Before installing Fluentd, you need to adjust the number of file descriptors, as recommended in the documentation.
To check the current limit, execute the following command:
ulimit -n
1022
If the output displays a low value such as 1022 (the common default is 1024), you should increase this limit.
Open the /etc/security/limits.conf
file:
sudo nano /etc/security/limits.conf
To increase the limit, add the following lines at the end of the file:
root soft nofile 65536
root hard nofile 65536
* soft nofile 65536
* hard nofile 65536
After making these changes, reboot your system and then verify that the new limit has taken effect:
ulimit -n
65536
You should see an output of 65536, indicating that the limit has been successfully increased.
You are now set to install Fluentd on your system.
Fluentd has two variants:
- fluent-package (formerly known as td-agent): maintained by the Fluentd project.
- calyptia-fluentd: maintained by Calyptia.
Historically, the primary difference between the two lay in the bundled Ruby versions: td-agent shipped Ruby 2.7 for compatibility reasons, while calyptia-fluentd shipped Ruby 3. The current fluent-package also bundles Ruby 3, as you'll see in the startup output below.
In this guide, you will install the fluent-package
. Run the following command to install the Fluentd LTS version:
curl -fsSL https://toolbelt.treasuredata.com/sh/install-ubuntu-jammy-fluent-package5-lts.sh | sh
Once the installation is complete, check the Fluentd version to confirm it installed successfully:
fluentd --version
You should see output similar to the following:
fluent-package 5.0.1 fluentd 1.16.2 (d5685ada81ac89a35a79965f1e94bbe5952a5d3a)
When you install Fluentd, it automatically starts as a systemd service. However, for this tutorial, you'll run Fluentd manually. Running Fluentd manually while another instance runs in the background can lead to conflicts.
To prevent conflicts, stop the background service with the following command:
sudo systemctl stop fluentd
Check the status to confirm that the service is now inactive:
sudo systemctl status fluentd
You should see the "Active: inactive (dead)" status in the output, indicating that the service has been stopped:
fluentd.service - fluentd: All in one package of Fluentd
Loaded: loaded (/lib/systemd/system/fluentd.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Fri 2023-09-22 06:16:56 UTC; 6s ago
Docs: https://docs.fluentd.org/
Process: 11992 ExecStop=/bin/kill -TERM ${MAINPID} (code=exited, status=0/SUCCESS)
Main PID: 1763 (code=exited, status=0/SUCCESS)
CPU: 7.671s
...
Fluentd is now installed and ready for configuration.
How Fluentd works
With Fluentd successfully installed, let's explore how it works.
To understand Fluentd, you need to visualize it as a pipeline. Fluentd captures logs from multiple sources at one end of the pipeline and transforms them into a standardized log event format. As these log events traverse through the Fluentd pipeline, they can be processed, enriched, or filtered according to your requirements. Finally, at the end of the pipeline, Fluentd can efficiently forward these log events to various destinations for in-depth analysis.
To implement this concept, you can configure Fluentd by defining the log sources, transformations, and destinations in a configuration file. Depending on your installation, this configuration file can be found at either /etc/fluent/fluentd.conf
or /etc/calyptia-fluentd/calyptia-fluentd.conf
.
The configuration file is structured using the following directives:
<source>
....
</source>
<filter unique.id>
...
</filter>
<match unique.id>
...
</match>
Let's explore these directives in detail:
- <source>...</source>: specifies the log source from which Fluentd should collect logs.
- <filter>...</filter>: defines transformations or modifications to log events.
- <match>...</match>: specifies the destination where Fluentd should forward the processed logs.
Each of these directives requires you to specify a plugin that carries out its respective task.
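To see how these pieces fit together, here is a minimal sketch of a complete pipeline; the tag app.logs and the file paths are arbitrary names chosen for this illustration:
<source>
  @type tail
  path /var/log/app/app.log
  pos_file /var/log/fluent/app.log.pos
  tag app.logs
  <parse>
    @type none
  </parse>
</source>
<filter app.logs>
  @type record_transformer
  <record>
    shipped_by "fluentd"
  </record>
</filter>
<match app.logs>
  @type stdout
</match>
Events flow from <source> through any <filter> whose pattern matches their tag, and finally to the first <match> whose pattern matches.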
Fluentd input plugins
For the <source>
directive, you can choose from a variety of input plugins that suit your needs:
- in_tail: reads log events from the end of a file.
- in_syslog: collects logs via the Syslog protocol over UDP or TCP.
- in_http: accepts log events through a REST endpoint (see the sketch below).
- in_exec: executes external programs and retrieves event logs from them.
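For example, here is a minimal sketch of the in_http plugin; port 9880 is simply the commonly documented default, not something this guide's setup requires:
<source>
  @type http
  port 9880
  bind 0.0.0.0
</source>
You could then submit a test event with curl, where the URL path (http.logs here) becomes the event's tag:
curl -X POST -d 'json={"msg":"hello"}' http://localhost:9880/http.logs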
Fluentd filter plugins
When it comes to processing or filtering the data, Fluentd offers a range of filter plugins to cater to your specific requirements:
- filter_record_transformer: modifies log events.
- grep: filters log events that match a specified pattern, similar to the grep command (see the sketch below).
- geoip: adds geographic information to log events.
- parser: parses event logs.
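As an illustration, here is a minimal grep filter sketch that would keep only events whose msg field contains the word "database" (assuming events tagged file.logs, as elsewhere in this guide):
<filter file.logs>
  @type grep
  <regexp>
    key msg
    pattern /database/
  </regexp>
</filter>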
Fluentd output plugins
To forward logs to various destinations, Fluentd provides a variety of output plugins to choose from:
- out_file: writes log events to files (see the sketch below).
- out_opensearch: delivers log events to OpenSearch.
- out_http: uses HTTP/HTTPS to write log records.
- roundrobin: distributes log entries to multiple outputs in a round-robin fashion.
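For example, a minimal out_file sketch that writes matching events to disk; the path is an arbitrary choice for illustration:
<match file.logs>
  @type file
  path /var/log/fluent/output
</match>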
In the next section, we will demonstrate how to use the in_tail
plugin to read log events from a file and send the log entries to the console using the stdout
plugin.
Getting started with Fluentd
Now that you understand how Fluentd works, let's create a configuration file instructing Fluentd to read log entries from a file and display them in the console.
Open the Fluentd configuration file located at /etc/fluent/fluentd.conf
:
sudo nano /etc/fluent/fluentd.conf
Next, clear any existing contents in the file to start with a clean slate and add the following lines of code:
<source>
@type tail
path /var/log/logify/app.log
pos_file /var/log/fluent/file.log.pos
tag file.logs
format none
</source>
<match file.logs>
@type stdout
</match>
The <source> directive defines where Fluentd collects logs. The @type option specifies the plugin to use, which here is the tail plugin that reads log events from the end of a file. The path option defines the file to be read. The pos_file option specifies a file that Fluentd uses to keep track of its read position. Lastly, the tag option assigns a name to the events from this source, which the <filter> or <match> directives can reference.
The <match file.logs> directive defines a matching rule that tells Fluentd how to handle events carrying a specific tag, which in this case is file.logs. To send these logs to the console, you set @type to the stdout plugin.
After making the changes, save the file.
Before running Fluentd, it's a good practice to validate your configuration file for errors. You can do this with the following command:
sudo fluentd -c /etc/fluent/fluentd.conf --dry-run
2023-09-22 05:26:34 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2023-09-22 05:26:34 +0000 [info]: parsing config file is succeeded path="/etc/fluent/fluentd.conf"
2023-09-22 05:26:34 +0000 [info]: gem 'fluentd' version '1.16.2'
...
2023-09-22 05:26:34 +0000 [info]: using configuration file: <ROOT>
<source>
@type tail
path "/var/log/logify/app.log"
pos_file "/var/log/fluent/file.log.pos"
tag "file.logs"
format none
<parse>
@type none
unmatched_lines
</parse>
</source>
<match file.logs>
@type stdout
</match>
</ROOT>
2023-09-22 05:26:34 +0000 [info]: starting fluentd-1.16.2 pid=1867 ruby="3.2.2"
2023-09-22 05:26:34 +0000 [info]: spawn command to main: cmdline=["/opt/fluent/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/sbin/fluentd", "-c", "/etc/fluent/fluentd.conf", "--dry-run", "--under-supervisor"]
If no errors or issues are reported, the configuration file is ready for execution.
If you rebooted your system during Fluentd installation, move into the logify
application subdirectory again:
cd log-processing-stack/logify
Rerun the script in the background:
./logify.sh &
Now, start Fluentd:
sudo fluentd
Fluentd will automatically pick up the configuration file in the /etc/fluent
directory. If your configuration file is in a different location, provide the full path when starting Fluentd.
sudo fluentd -c </path/to/fluentd.conf>
Once Fluentd is running, you will see output that resembles the following:
...
2023-09-22 05:31:19 +0000 [info]: starting fluentd-1.16.2 pid=1914 ruby="3.2.2"
2023-09-22 05:31:19 +0000 [info]: spawn command to main: cmdline=["/opt/fluent/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/sbin/fluentd", "--under-supervisor"]
2023-09-22 05:31:19 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2023-09-22 05:31:19 +0000 [info]: adding match pattern="file.logs" type="stdout"
2023-09-22 05:31:20 +0000 [info]: adding source type="tail"
2023-09-22 05:31:20 +0000 [info]: #0 starting fluentd worker pid=1922 ppid=1914 worker=0
2023-09-22 05:31:20 +0000 [info]: #0 following tail of /var/log/logify/app.log
2023-09-22 05:31:20 +0000 [info]: #0 fluentd worker is now running worker=0
Following that, the log messages will appear:
2023-09-22 05:31:22.150262685 +0000 file.logs: {"message":"{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Connected to database\", \"pid\": 1896, \"ssn\": \"407-01-2433\", \"timestamp\": 1695360682}"}
2023-09-22 05:31:25.166169434 +0000 file.logs: {"message":"{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Task completed successfully\", \"pid\": 1896, \"ssn\": \"407-01-2433\", \"timestamp\": 1695360685}"}
2023-09-22 05:31:28.179697560 +0000 file.logs: {"message":"{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Initialized application\", \"pid\": 1896, \"ssn\": \"407-01-2433\", \"timestamp\": 1695360688}"}
2023-09-22 05:31:31.187377861 +0000 file.logs: {"message":"{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Connected to database\", \"pid\": 1896, \"ssn\": \"407-01-2433\", \"timestamp\": 1695360691}"}
2023-09-22 05:31:34.198776362 +0000 file.logs: {"message":"{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Initialized application\", \"pid\": 1896, \"ssn\": \"407-01-2433\", \"timestamp\": 1695360694}"}
...
Fluentd is now displaying the log messages along with additional context. You can exit Fluentd by pressing CTRL + C
.
With Fluentd configured to read logs from a file and display them in the console, you are now ready to explore log transformations in the next section.
Transforming logs with Fluentd
After Fluentd collects logs from various sources, processing and manipulating the records often becomes necessary. This process can involve transforming unstructured logs in plain text into structured formats, such as JSON or Logfmt, which are easier for machines to parse. Additionally, you may need to enrich the logs with crucial fields, remove unwanted data, or mask sensitive information to ensure privacy.
Fluentd provides a range of filter plugins that allow you to manipulate the event streams. In this section, we will explore how to use these filter plugins to perform the following tasks:
- Parsing JSON logs.
- Removing unwanted fields.
- Adding new fields.
- Converting Unix timestamps to the ISO format.
- Masking sensitive data.
Parsing JSON logs with Fluentd
When working with logs in JSON format, it's essential to parse them correctly for structured analysis. In this section, you'll configure Fluentd to parse JSON logs effectively.
Let's begin by examining a log event from the output of the previous section:
2023-09-22 05:31:34.198776362 +0000 file.logs: {
"message":"{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Initialized application\", \"pid\": 1896, \"ssn\": \"407-01-2433\", \"timestamp\": 1695360694}"
}
You'll notice that the log message is enclosed in double quotes, and many of the double quotes within the JSON structure have been escaped with backslashes.
To ensure that Fluentd can work with these logs effectively and parse them as valid JSON, you need to add a <parse>
section to your Fluentd directives. This section supports parser plugins and can be placed within the <source>
, <match>
, or <filter>
directive.
Open the Fluentd configuration file:
sudo nano /etc/fluent/fluentd.conf
Next, add the <parse>
section under the <source>
directive to parse the JSON logs:
<source>
@type tail
path /var/log/logify/app.log
pos_file /var/log/fluent/file.log.pos
tag file.logs
format none
<parse>
@type json
</parse>
</source>
<match file.logs>
@type stdout
</match>
The @type
parameter within the <parse>
section specifies that the json
plugin should be used to parse the log events.
To ensure that Fluentd correctly parses the JSON logs, save your configuration changes and run Fluentd:
sudo fluentd
Fluentd will collect the log events as they are generated. You will see output similar to this:
2023-09-22 05:37:50.382134327 +0000 file.logs: {"status":200,"ip":"127.0.0.1","level":30,"emailAddress":"user@mail.com","msg":"Initialized application","pid":1896,"ssn":"407-01-2433","timestamp":1695361070}
2023-09-22 05:37:53.393357890 +0000 file.logs: {"status":200,"ip":"127.0.0.1","level":30,"emailAddress":"user@mail.com","msg":"Task completed successfully","pid":1896,"ssn":"407-01-2433","timestamp":1695361073}
2023-09-22 05:37:56.401670531 +0000 file.logs: {"status":200,"ip":"127.0.0.1","level":30,"emailAddress":"user@mail.com","msg":"Initialized application","pid":1896,"ssn":"407-01-2433","timestamp":1695361076}
2023-09-22 05:37:59.410052978 +0000 file.logs: {"status":200,"ip":"127.0.0.1","level":30,"emailAddress":"user@mail.com","msg":"Connected to database","pid":1896,"ssn":"407-01-2433","timestamp":1695361079}
...
In the output, the log events have been parsed successfully, and the properties are no longer escaped. If you look closely at the output, you will notice that Fluentd adds a timestamp and a tag name next to each JSON log event.
To remove this additional information, you can use the <format>
section. Stop Fluentd again and open the /etc/fluent/fluentd.conf
file:
sudo nano /etc/fluent/fluentd.conf
Add the following code under the <match>
directive in your configuration file:
<source>
@type tail
path /var/log/logify/app.log
pos_file /var/log/fluent/file.log.pos
tag file.logs
format none
<parse>
@type json
</parse>
</source>
<match file.logs>
@type stdout
<format>
@type json
</format>
</match>
The <format>
section formats the log entries, and the @type
parameter specifies the json
plugin used for formatting.
Save the changes and start Fluentd again:
sudo fluentd
You will observe output similar to this:
{"status":200,"ip":"127.0.0.1","level":30,"emailAddress":"user@mail.com","msg":"Operation finished","pid":1896,"ssn":"407-01-2433","timestamp":1695361256}
{"status":200,"ip":"127.0.0.1","level":30,"emailAddress":"user@mail.com","msg":"Operation finished","pid":1896,"ssn":"407-01-2433","timestamp":1695361259}
...
Notice that the output no longer includes timestamps. You can now stop Fluentd.
In this section, we've explored how to parse JSON logs using Fluentd. However, log records come in various formats, and Fluentd provides a range of <parser>
plugins to handle different log formats effectively:
- nginx: parses Nginx logs.
- csv: parses log entries in CSV format.
- regexp: parses logs according to a given regex pattern (see the sketch below).
- apache2: parses Apache2 log entries.
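For instance, if an application wrote plain-text lines like "ERROR something failed", a regexp parser along these lines could structure them; the field names here are assumptions for illustration:
<parse>
  @type regexp
  expression /^(?<level>\w+) (?<message>.*)$/
</parse>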
For the <format>
section, Fluentd offers several built-in formatter plugins to customize the output format of log events:
- csv: outputs log events in the CSV format.
- ltsv: formats log events in the LTSV format.
- msgpack: converts logs to the MessagePack binary format.
With these tools at your disposal, you can effectively parse and format logs in various formats to suit your specific needs. In the next section, we'll explore how to add and remove unwanted fields from log entries, providing you with even greater control over your log data.
Adding and removing fields with Fluentd
In this section, you will enhance data privacy by removing sensitive information from the log entries. Specifically, you'll remove the emailAddress
field and add a new hostname
field to the log events.
To achieve this, open your /etc/fluent/fluentd.conf
file in your text editor:
sudo nano /etc/fluent/fluentd.conf
Make the following modifications within the source configuration:
<source>
@type tail
path /var/log/logify/app.log
pos_file /var/log/fluent/file.log.pos
tag file.logs
format none
<parse>
@type json
</parse>
</source>
<filter file.logs>
@type record_transformer
remove_keys emailAddress
<record>
hostname "#{Socket.gethostname}"
</record>
</filter>
<match file.logs>
@type stdout
<format>
@type json
</format>
</match>
The <filter>
section is used to modify log records. The @type
specifies that the record_transformer
plugin will transform log events. To remove a specific property, such as the emailAddress
field, you use the remove_keys
parameter. Additionally, you introduce a new hostname
field using the <record>
section, specifying both the field name and its value.
After making these changes, save the configuration file and restart Fluentd:
sudo fluentd
With Fluentd running, you can now observe the updated log entries. These logs will no longer contain the emailAddress
field, and a new hostname
field will be present:
{"status":200,"ip":"127.0.0.1","level":30,"msg":"Operation finished","pid":1896,"ssn":"407-01-2433","timestamp":1695361401,"hostname":"fluentd-host"}
{"status":200,"ip":"127.0.0.1","level":30,"msg":"Initialized application","pid":1896,"ssn":"407-01-2433","timestamp":1695361404,"hostname":"fluentd-host"}
...
This modification ensures that sensitive data is excluded from the log entries while enriching them with relevant information like the hostname
. You can now stop Fluentd and proceed to the next section to format dates.
Formatting dates with Fluentd
The Bash script you created earlier generates logs with Unix timestamps, which represent the number of seconds elapsed since January 1, 1970, at 00:00:00 UTC. These timestamps are difficult to read at a glance, so in this section you'll convert them into the human-readable ISO 8601 format.
To perform this conversion, open your /etc/fluent/fluentd.conf
configuration file:
sudo nano /etc/fluent/fluentd.conf
Add the following lines to your file:
...
<filter file.logs>
@type record_transformer
enable_ruby true
remove_keys emailAddress
<record>
hostname "#{Socket.gethostname}"
timestamp ${Time.at(record["timestamp"]).strftime("%Y-%m-%dT%H:%M:%S.%L%z")}
</record>
</filter>
...
The enable_ruby
option lets you use Ruby expressions inside ${...}
. You then redefine the timestamp
field and use a Ruby expression within ${...}
to convert the Unix timestamp to the ISO format. The expression Time.at(record["timestamp"])
creates a Ruby Time
object with the Unix timestamp value, and the strftime()
method formats the timestamp into the ISO format for readability.
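You can sanity-check the same conversion from your shell. For example, with GNU date on Ubuntu, using one of the timestamps from the earlier output:
date -u -d @1695361256 +"%Y-%m-%dT%H:%M:%S%z"
2023-09-22T05:40:56+0000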
After saving the new changes, start Fluentd with the following command:
sudo fluentd
Fluentd will yield output similar to the following:
{"status":200,"ip":"127.0.0.1","level":30,"msg":"Operation finished","pid":1896,"ssn":"407-01-2433","timestamp":"2023-09-22T05:46:34.000+0000","hostname":"fluentd-host"}
{"status":200,"ip":"127.0.0.1","level":30,"msg":"Connected to database","pid":1896,"ssn":"407-01-2433","timestamp":"2023-09-22T05:46:37.000+0000","hostname":"fluentd-host"}
{"status":200,"ip":"127.0.0.1","level":30,"msg":"Initialized application","pid":1896,"ssn":"407-01-2433","timestamp":"2023-09-22T05:46:40.000+0000","hostname":"fluentd-host"}
...
In the output, the timestamp
field is presented in a human-readable ISO format, making it much easier to understand and work with. This adjustment enhances the readability of your log data, making it more user-friendly for analysis and troubleshooting.
Working with conditional statements in Fluentd
Fluentd lets you use the Ruby ternary operator whenever the enable_ruby option is enabled, which makes it easy to write concise conditional statements and have Fluentd make decisions based on specified conditions. In this section, you'll use the ternary operator to check whether the status field equals 200. If the condition is met, Fluentd will add an is_successful field with a value of true; otherwise, the field will be set to false.
First, open your /etc/fluent/fluentd.conf
configuration file:
sudo nano /etc/fluent/fluentd.conf
To implement this conditional statement, enter the following code:
...
<filter file.logs>
@type record_transformer
enable_ruby true
remove_keys emailAddress
<record>
hostname "#{Socket.gethostname}"
timestamp ${Time.at(record["timestamp"]).strftime("%Y-%m-%dT%H:%M:%S.%L%z")}
is_successful ${record["status"] == 200 ? "true" : "false"}
</record>
</filter>
...
In the code snippet above, you use the ternary operator to check if the status
field equals 200
. If the condition is true, the is_successful
field is assigned the value true
; conversely, if the condition is false, the is_successful
field is assigned the value false
.
Start Fluentd again:
sudo fluentd
{"status":200,"ip":"127.0.0.1","level":30,"msg":"Task completed successfully","pid":1896,"ssn":"407-01-2433","timestamp":"2023-09-22T05:49:40.000+0000","hostname":"fluentd-host","is_successful":"true"}
{"status":200,"ip":"127.0.0.1","level":30,"msg":"Task completed successfully","pid":1896,"ssn":"407-01-2433","timestamp":"2023-09-22T05:49:43.000+0000","hostname":"fluentd-host","is_successful":"true"}
...
As you observe the log entries, you will notice the presence of the is_successful field, indicating whether a log entry corresponds to a successful event (true) or not (false) based on the value of the status field.
This addition of conditional statements in Fluentd provides a powerful way to manipulate log data and add context or flags to log entries based on specific conditions.
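The same pattern extends to other conditions. For instance, a hedged sketch that maps the numeric level field to a human-readable label could be added to the same <record> block (level 30 commonly denotes info in JSON loggers, and the threshold of 40 is an assumption for illustration):
<record>
  severity ${record["level"] >= 40 ? "warning" : "info"}
</record>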
Redacting sensitive data with Fluentd
Even though you have removed the emailAddress
from the log messages, sensitive fields like the IP address and Social Security Number are still present. To ensure that personal information remains secure, you should redact this sensitive data, especially when it cannot be removed entirely.
Open your Fluentd configuration file again:
sudo nano /etc/fluent/fluentd.conf
You can redact the IP address and social security number with the following code:
...
<filter file.logs>
@type record_transformer
enable_ruby true
remove_keys emailAddress
<record>
hostname "#{Socket.gethostname}"
timestamp ${Time.at(record["timestamp"]).strftime("%Y-%m-%dT%H:%M:%S.%L%z")}
is_successful ${record["status"] == 200 ? "true" : "false"}
ip ${record["ip"].gsub(/(\d+\.\d+\.\d+\.\d+)/, 'REDACTED')}
ssn ${record["ssn"].gsub(/(\d{3}-\d{2}-\d{4})/, 'REDACTED')}
</record>
</filter>
...
The gsub()
method locates specific strings based on the provided regular expressions and replaces them with the text 'REDACTED'. The first gsub()
operation replaces IP addresses, and the second one replaces SSNs.
After saving these changes, run Fluentd with the following command:
sudo fluentd
{"status":200,"ip":"REDACTED","level":30,"msg":"Connected to database","pid":1896,"ssn":"REDACTED","timestamp":"2023-09-22T05:51:01.000+0000","hostname":"fluentd-host","is_successful":"true"}
{"status":200,"ip":"REDACTED","level":30,"msg":"Connected to database","pid":1896,"ssn":"REDACTED","timestamp":"2023-09-22T05:51:04.000+0000","hostname":"fluentd-host","is_successful":"true"}
...
You will observe that both the IP address and SSN fields have been successfully redacted from the log entries.
In scenarios where you have private information within the same string, like this:
{..., "privateInfo": "This is a sample message with SSN: 123-45-6789 and IP: 192.168.0.1"}
You can selectively redact the sensitive portions, such as the SSN and IP address, in a single pass:
...
privateInfo ${record["privateInfo"].gsub(/(\d{3}-\d{2}-\d{4})/, 'REDACTED').gsub(/(\d+\.\d+\.\d+\.\d+)/, 'REDACTED')}
...
Upon redaction, the output will resemble the following:
{..., "privateInfo": "This is a sample message with SSN: REDACTED and IP: REDACTED"}
By effectively masking sensitive data, Fluentd enhances the security and privacy of your log entries. You can now stop Fluentd and the logify.sh script.
To stop logify.sh, first obtain its process ID by entering the following command in your terminal:
jobs -l | grep "logify"
[1]+ 2113 Running ./logify.sh &
Kill the program with the following command, substituting your own process ID:
kill -9 2113
In the next section, we will explore how to collect logs from Docker containers using Fluentd.
Collecting logs from Docker containers and centralizing logs
In this section, you will containerize the Bash script and use the Nginx hello world Docker image, which is preconfigured to generate JSON Nginx logs for each incoming request. Subsequently, you will employ a Fluentd container to collect logs from both containers and transmit them to Better Stack for monitoring and analysis.
Dockerizing the Bash script
In this section, you'll containerize the Bash script responsible for generating log data. Containerization allows us to encapsulate the script and its dependencies, ensuring consistency and portability across different environments.
First, ensure you are still in the log-processing-stack/logify
directory. Then, create a Dockerfile
that defines how the script should be included in the container:
nano Dockerfile
In your Dockerfile
, add the following instructions:
FROM ubuntu:latest
COPY . .
RUN chmod +x logify.sh
RUN mkdir -p /var/log/logify
RUN ln -sf /dev/stdout /var/log/logify/app.log
CMD ["./logify.sh"]
In this Dockerfile
, you use the latest version of Ubuntu as the base image. You then copy the script into the container, ensure it's executable, and create a directory for log files. Additionally, you set up a redirection mechanism that sends any data written to /var/log/logify/app.log
to the standard output. This configuration lets you conveniently view the container's logs using the docker logs
command. Finally, you specify that the script should be executed when the container is launched.
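Optionally, you can smoke-test the image on its own before wiring it into Docker Compose; logify-test is just a throwaway tag, and you can press Ctrl+C to stop the container:
docker build -t logify-test .
docker run --rm logify-test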
Next, move into the parent project directory:
cd ..
Create a docker-compose.yml
with your editor:
nano docker-compose.yml
Then, define the Bash script and Nginx services:
version: '3'
services:
  logify-script:
    build:
      context: ./logify
    container_name: logify
  nginx:
    image: betterstackcommunity/nginx-helloworld:latest
    container_name: nginx
    ports:
      - '80:80'
In this Docker Compose configuration, you define two services: logify-script
and nginx
. The logify-script
service is built from the ./logify
directory context. The nginx
service uses a pre-built Nginx image, mapping port 80
on the host to port 80
within the container. It's essential to ensure that no other services currently use port 80 on the host to avoid port conflicts.
Now that you have defined the services, let's build the Docker images and create the containers:
docker compose up -d
The -d
flag starts the services in the background.
Check the status of the running containers using the following command:
docker compose ps
You should observe a "running" status under the "STATUS" column for both containers, similar to this:
NAME COMMAND SERVICE STATUS PORTS
logify "./logify.sh" logify-script running
nginx "/runner.sh nginx" nginx running 0.0.0.0:80->80/tcp, :::80->80/tcp
With the containers up and running, send HTTP requests to the Nginx service using curl to generate log data (curl expands the [1-5] range into five separate requests):
curl http://localhost:80/?[1-5]
To view the logs generated by all running containers, use the following command:
docker compose logs
nginx | {"timestamp":"2023-09-22T07:51:22+00:00","pid":"8","remote_addr":"172.21.0.1","remote_user":"","request":"GET /?1 HTTP/1.1","status": "200","body_bytes_sent":"11109","request_time":"0.000","http_referrer":"","http_user_agent":"curl/7.81.0","time_taken_ms":"1695369082.183"}
...
nginx | {"timestamp":"2023-09-22T07:51:22+00:00","pid":"8","remote_addr":"172.21.0.1","remote_user":"","request":"GET /?2 HTTP/1.1","status": "200","body_bytes_sent":"11109","request_time":"0.000","http_referrer":"","http_user_agent":"curl/7.81.0","time_taken_ms":"1695369082.190"}
logify | {"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Operation finished", "pid": 1, "ssn": "407-01-2433", "timestamp": 1695369060}
...
logify | {"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Task completed successfully", "pid": 1, "ssn": "407-01-2433", "timestamp": 1695369093}
The output will display the logs generated by each service.
With the Bash script containerized and generating log data, the next step is configuring Fluentd to collect and centralize these logs for further analysis.
Defining the Fluentd service with Docker Compose
In this section, you will define a Fluentd service within the Docker Compose setup. This service will collect logs from the existing containers and forward them to Better Stack. To do this, you will create a Fluentd configuration file, containerize Fluentd, and deploy the Fluentd service.
First, you need to update the docker-compose.yml
file as follows:
version: '3'
services:
  logify-script:
    build:
      context: ./logify
    container_name: logify
    links:
      - fluentd
    depends_on:
      - fluentd
    logging:
      driver: "fluentd"
      options:
        tag: docker.logify
  nginx:
    image: betterstackcommunity/nginx-helloworld:latest
    container_name: nginx
    ports:
      - '80:80'
    links:
      - fluentd
    depends_on:
      - fluentd
    logging:
      driver: "fluentd"
      options:
        tag: docker.nginx
  fluentd:
    build:
      context: ./fluentd
    volumes:
      - ./fluentd/fluent.conf:/fluentd/etc/fluent.conf
    container_name: fluent
    ports:
      - "24224:24224"
      - "24224:24224/udp"
In this updated configuration, the logify-script
and nginx
services are linked to the fluentd
service for log aggregation and forwarding. They are configured to use the Fluentd driver for logging. The logify-script
service tags log entries as docker.logify
, and the Nginx service tags log entries as docker.nginx
. These tags help Fluentd distinguish the source of log entries when processing them.
The fluentd service is built from the fluentd directory context and mounts a volume that maps Fluentd's configuration file into the container (you'll create this file shortly). It exposes port 24224 over both TCP and UDP for Fluentd's forward input. This setup ensures that Fluentd receives and processes the log entries from both the logify-script and nginx services.
Following that, create the fluentd
directory and move into it:
mkdir fluentd && cd fluentd
Now, create the Dockerfile
to customize the official Fluentd image:
nano Dockerfile
In your Dockerfile
, enter the following custom instructions:
FROM fluentd
USER root
RUN ["fluent-gem", "install", "fluent-plugin-logtail"]
USER fluent
This Dockerfile
uses the Fluentd base image and installs the fluent-plugin-logtail
gem, which enables Fluentd to forward logs to Better Stack. The user is set to fluent
to ensure Fluentd runs with appropriate permissions.
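To verify that the plugin is baked into the image, you could build it and list the installed gems; fluentd-logtail-test is an arbitrary tag used only for this check:
docker build -t fluentd-logtail-test .
docker run --rm --entrypoint fluent-gem fluentd-logtail-test list | grep logtail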
Next, create a fluent.conf
configuration file:
nano fluent.conf
Add the following input source in the file:
<source>
@type forward
port 24224
bind 0.0.0.0
</source>
This configuration source specifies that Fluentd should listen for logs on port 24224
, accepting log data from services configured to send their logs to Fluentd using the forward input.
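Once the stack is running, you can optionally smoke-test this input from the host using the fluent-cat utility that ships with fluent-package (it sends to localhost on port 24224 by default). With only this forward source configured, an event with an unmatched tag will surface as a "no patterns matched" warning in Fluentd's logs:
echo '{"hello":"world"}' | fluent-cat debug.log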
Now, you need to set up the destination to forward these logs. For this purpose, you will use Better Stack to centralize and manage the logs.
Before defining the output source, you'll need to create a free Better Stack account. Once you've signed in, navigate to the Sources section:
On the Sources page, click the Connect source button:
Now, provide your source a name and select "Fluentd" as the platform:
Once your source has been created, copy the Source Token field to your clipboard:
Return to the fluent.conf file and add the following match directive to forward Docker logs to Better Stack (remember to substitute your own source token):
...
<match docker.logify.**>
@type logtail
@id output_logify_logtail
source_token <your_logify_source_token>
flush_interval 2 # in seconds
</match>
The <match docker.logify.**> directive tells Fluentd to match log entries whose tags start with docker.logify and forward them to Better Stack using the logtail plugin. You then provide a unique ID and your Better Stack source token, and finally set a flush interval of two seconds for log forwarding.
Once you've made these changes, save and exit the configuration file.
Return to the root directory:
cd ..
Start the Fluentd service with the following command:
docker compose up -d
After a few moments, check Better Stack to confirm if the log entries have been successfully forwarded:
Your Bash script logs are now being forwarded to Better Stack.
To forward Nginx and Fluentd logs, follow similar steps by creating two additional sources—one for Nginx logs and another for Fluentd logs.
When you create these sources, the interface will look like this:
Now, add the following match
directives to deliver Nginx and Fluentd logs to Better Stack, ensuring you update the source tokens accordingly:
...
<match docker.nginx.**>
@type logtail
@id output_nginx_logtail
source_token <your_nginx_source_token>
flush_interval 2 # in seconds
</match>
<label @FLUENT_LOG>
<match fluent.*>
@type logtail
@id output_fluent_logtail
source_token <your_fluentd_source_token>
flush_interval 2 # in seconds
</match>
</label>
The <match docker.nginx.**> directive uses the logtail plugin to forward logs with tags starting with docker.nginx to Better Stack.
To match Fluentd's internal logs, you define a <label @FLUENT_LOG> section with a <match fluent.*> condition that forwards logs whose tags start with fluent to Better Stack. Fluentd produces internal logs with tags such as fluent.info and fluent.warn; you can see them every time you start Fluentd.
After saving these changes, stop and discard all the services:
docker compose down
Start the services again:
docker compose up -d
Send more requests to the Nginx service:
curl http://localhost:80/?[1-5]
You'll notice the Nginx logs being successfully uploaded to Better Stack.
And Fluentd logs will look similar to this:
Monitoring Fluentd health with Better Stack
While Fluentd lacks a built-in /health
endpoint for external monitoring, it features a monitoring agent that collects internal metrics in JSON format and exposes them via a /api/plugins.json
endpoint.
To access internal Fluentd metrics via the REST API, first open the fluent.conf
configuration file:
nano fluentd/fluent.conf
Add these lines at the top of the fluent.conf
configuration file:
<source>
@type monitor_agent
bind 0.0.0.0
port 24220
</source>
...
The <source>
sets up a Fluentd monitoring agent that exposes internal metrics on port 24220
.
Next, update the docker-compose.yml
file to define the port for the Fluentd internal metrics API endpoint:
fluentd:
  ...
  ports:
    - "24220:24220"
    - "24224:24224"
    - "24224:24224/udp"
When done, restart Fluentd with the following command:
docker compose up -d
Verify that Fluentd's /api/plugins.json
endpoint works:
curl http://localhost:24220/api/plugins.json
{"plugins":[{"plugin_id":"object:8ac","plugin_category":"input","type":"monitor_agent","config":{"@type":"monitor_agent","bind":"0.0.0.0","port":"24220"},"output_plugin":false,"retry_count":null,"emit_records":0,"emit_size":0},
...
buffer_total_queued_size":0,"retry_count":0,"emit_records":3,"emit_size":0,"emit_count":3,"write_count":1,"rollback_count":0,"slow_flush_count":0,"flush_time_count":1429,"buffer_stage_length":0,"buffer_stage_byte_size":0,"buffer_queue_byte_size":0,"buffer_available_buffer_space_ratios":100.0,"retry":{}}]}
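The raw JSON is dense to read. If you have jq installed, you can extract just the fields you care about, for example:
curl -s http://localhost:24220/api/plugins.json | jq '.plugins[] | {plugin: .type, category: .plugin_category, retries: .retry_count}'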
Next, log in to Better Stack.
Once you are on the Monitors page, click the Create monitor button:
Afterward, enter the relevant information and click the Create monitor button:
In this setup, you can choose your preferred method to trigger Better Stack and provide the server's IP address or domain name along with the /api/plugins.json
endpoint on port 24220
. Finally, select how you would like to be notified.
Once the configuration is complete, Better Stack will initiate monitoring of the Fluentd endpoint, delivering valuable performance statistics:
To demonstrate the response when Fluentd stops running, causing the endpoint to cease functioning, stop all the services with:
docker compose stop
Upon returning to Better Stack, you will observe the status updated to "Down" after a few moments pass:
If you have configured Better Stack to alert you via email, you will receive an email alert:
With that, you can proactively manage Fluentd's health and promptly address any interruptions in its operation.
Final thoughts
In this comprehensive article, you explored Fluentd and how it integrates with Docker, Nginx, and Better Stack for effective log management. You began by creating a Fluentd configuration file, then used Fluentd to gather logs from multiple containers and centralize them in Better Stack. You also learned how to monitor Fluentd's health with Better Stack, ensuring proactive alerts in case of any disruptions.
With this knowledge, you can use Fluentd effectively for log collection and forwarding. For further learning, refer to the Fluentd documentation. To deepen your understanding of Docker and Docker Compose, explore their respective documentation pages: Docker and Docker Compose. For additional insights into Docker logging, refer to our comprehensive guide.
If you're curious about Fluentd alternatives, check out our guide on log shippers.
Thanks for reading, and happy logging!