
How to Collect, Process, and Ship Log Data with Fluent Bit

Stanley Ulili
Updated on November 23, 2023

In distributed systems, efficient log shipping is essential. A log shipper is a tool that gathers logs from various sources, such as containers and servers, and directs them to a central location for analysis. Several options, including Logstash and Fluentd, are available for this purpose. Among them, Fluent Bit stands out as a lightweight, high-performance log shipper originally created by Treasure Data.

Fluent Bit was developed in response to the growing need for a log shipper that could operate in resource-constrained environments, such as embedded systems and containers. With a memory footprint of roughly 1MB, Fluent Bit efficiently collects logs from multiple sources, transforms the data, and forwards it to diverse destinations for storage and analysis. Key features include SQL stream processing, backpressure handling, vendor neutrality, and an Apache 2.0 license. Fluent Bit is also highly flexible thanks to its pluggable architecture, which makes integration and customization straightforward. With over 100 built-in plugins, it offers extensive options for collecting, filtering, and forwarding data.

Fluent Bit's reliability is underscored by its adoption by major cloud providers like DigitalOcean, AWS Cloud, and Google Cloud, processing vast amounts of data daily.

In this comprehensive guide, you will use Fluent Bit to gather logs from diverse sources, transform them, and deliver them to various destinations. The tutorial will walk you through reading logs from a file and forwarding them to the console. Subsequently, you will explore how Fluent Bit can collect logs from multiple containers and route them to a centralized location. Finally, you will monitor Fluent Bit's health to ensure its smooth operation.

Prerequisites

To follow this guide, you need access to a system that has a non-root user account with sudo privileges. Optionally, you should install Docker and Docker Compose if you intend to follow along with later parts of this tutorial that involve collecting logs from Docker containers. If you're uncertain about the need for a log shipper, you can read this article on log shippers to understand their benefits, how to choose one, and compare a few options.

Once you've met these prerequisites, create a root project directory that will contain the application and configuration files with the following command:

 
mkdir log-processing-stack

Move into the newly created directory:

 
cd log-processing-stack

Next, create a subdirectory named logify for the demo application you'll be building in the upcoming section:

 
mkdir logify

Change into the subdirectory:

 
cd logify

With these directories in place, you're ready to proceed to the next step, where you'll create the demo logging application.

Developing a demo logging application

In this section, you'll create a sample logging script using Bash that generates log entries at regular intervals and writes them to a file.

Create a logify.sh file within the logify directory. You can use your preferred text editor. This tutorial uses nano:

 
nano logify.sh

In your logify.sh file, enter the following contents to generate log entries with Bash:

log-processing-stack/logify/logify.sh
#!/bin/bash
filepath="/var/log/logify/app.log"

create_log_entry() {
    local info_messages=("Connected to database" "Task completed successfully" "Operation finished" "Initialized application")
    local random_message=${info_messages[$RANDOM % ${#info_messages[@]}]}
    local http_status_code=200
    local ip_address="127.0.0.1"
    local emailAddress="user@mail.com"
    local level=30
    local pid=$$
    local ssn="407-01-2433"
    local time=$(date +%s)
    local log='{"status": "'$http_status_code'", "ip": "'$ip_address'", "level": '$level', "emailAddress": "'$emailAddress'", "msg": "'$random_message'", "pid": '$pid', "ssn": "'$ssn'", "timestamp": '$time'}'
    echo "$log"
}

while true; do
    log_record=$(create_log_entry)
    echo "${log_record}" >> "${filepath}"
    sleep 3
done

The create_log_entry() function generates log entries in JSON format and includes various details such as HTTP status codes, severity levels, and random log messages. It also intentionally includes sensitive fields like IP address, Social Security Number (SSN), and email address to demonstrate Fluent Bit's ability to remove or redact sensitive data. To learn more about best practices for logging sensitive data, refer to our guide.

Next, the infinite loop continuously invokes the create_log_entry() function to generate a log record every three seconds and append it to the specified file in the /var/log/logify/ directory.

When you are finished, save the new changes and make the script executable:

 
chmod +x logify.sh

Create a directory to store the application logs:

 
sudo mkdir /var/log/logify

Assign ownership of the directory to the currently logged-in user:

 
sudo chown -R $USER:$USER /var/log/logify/

Then, run the Bash script in the background:

 
./logify.sh &

The script will start writing logs to the app.log file. To view the last few log entries, use the tail command:

 
tail -n 4 /var/log/logify/app.log
Output
{"status": "200", "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Task completed successfully", "pid": 2833, "ssn": "407-01-2433", "timestamp": 1696071877}
{"status": "200", "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Initialized application", "pid": 2833, "ssn": "407-01-2433", "timestamp": 1696071880}
{"status": "200", "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Operation finished", "pid": 2833, "ssn": "407-01-2433", "timestamp": 1696071883}
{"status": "200", "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Initialized application", "pid": 2833, "ssn": "407-01-2433", "timestamp": 1696071886}

Each line in the output represents a log event or record.

With the log entries being generated, the next step is to install Fluent Bit.

Installing Fluent Bit

In this section, you'll install the latest version of Fluent Bit on your Ubuntu 22.04 system. If you're using a different operating system, refer to the official documentation page for specific installation instructions.

Fluent Bit is not available in Ubuntu's default package repositories.

To install Fluent Bit on your system, first add the Fluent Bit GPG key to your keyring:

 
sudo sh -c 'curl https://packages.fluentbit.io/fluentbit.key | gpg --dearmor > /usr/share/keyrings/fluentbit-keyring.gpg'

Next, check your Ubuntu code name:

 
lsb_release -a
Output
No LSB modules are available.
Distributor ID: Ubuntu
Description:  Ubuntu 22.04.3 LTS
Release:  22.04
Codename: jammy

Then export the codename as an environment variable:

 
export CODENAME="jammy"

Following that, add the Fluent Bit source list to the sources.list.d directory:

 
echo "deb [signed-by=/usr/share/keyrings/fluentbit-keyring.gpg] https://packages.fluentbit.io/ubuntu/$CODENAME/ \
  $CODENAME main" | sudo tee /etc/apt/sources.list.d/fluentbit.list

apt will search for new sources in the sources.list.d directory.

To ensure that apt recognises the Fluent Bit source you just added, update your package list using the following command:

 
sudo apt update

Then install Fluent Bit:

 
sudo apt install fluent-bit

Fluent Bit is now installed on your system.

How Fluent Bit works

Fluent Bit operates as a robust pipeline for handling log data. You can imagine it as a sequence where logs flow through distinct stages, each performing a specific task. Let's break down Fluent Bit's core components and plugins to provide a clearer understanding:

Diagram illustrating the Fluent Bit observability pipeline

At the beginning of the pipeline, Fluent Bit collects logs from various sources. These logs then pass through a Parser, transforming unstructured data into structured log events. Subsequently, the log event stream encounters the Filter, which can enrich, exclude, or modify the data according to project requirements. After filtration, the logs are temporarily stored in a Buffer, either in memory or the filesystem, ensuring smooth processing. Finally, the Router directs the data to diverse destinations for analysis and storage.

To put this into practice, you can define Fluent Bit's behavior in a configuration file located at /etc/fluent-bit/fluent-bit.conf:

 
[SERVICE]
    ...

[INPUT]
    ...

[FILTER]
    ...

[OUTPUT]
    ...

Let's look at these components in detail:

  • [SERVICE]: contains global settings for the running service.
  • [INPUT]: specifies sources of log records for Fluent Bit to collect.
  • [FILTER]: applies transformations to log records.
  • [OUTPUT]: determines the destination where Fluent Bit sends the processed logs.

For these components to do their tasks, they require a plugin. Here is a brief overview of the plugins available for Fluent Bit, followed by a short example that shows how they fit together.

Fluent Bit input plugins

For the [INPUT] component, here are some input plugins that often come in handy:

  • tail: monitors and collects logs from the end of a file, akin to the tail -f command.
  • syslog: gathers Syslog logs from a Unix socket server.
  • http: captures logs via a REST endpoint.
  • opentelemetry: fetches telemetry data from OpenTelemetry sources.

Fluent Bit filter plugins

When you need to transform logs, Fluent Bit provides a range of filter plugins suited for different modifications:

  • record_modifier: modifies log records.
  • lua: alters log records using Lua scripts.
  • grep: matches or excludes log records, similar to the grep command.
  • modify: changes log records based on specified conditions or rules.

Fluent Bit output plugins

To dispatch logs to various destinations, Fluent Bit offers versatile output plugins:

  • file: writes logs to a specified file.
  • amazon_s3: sends logs and metrics to Amazon S3.
  • http: pushes records to an HTTP endpoint.
  • websocket: forwards log records to a WebSocket endpoint.
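
To see how these pieces fit together, here is a hedged sketch of a small pipeline (not part of this tutorial's setup) that tails the demo log file, keeps only records whose log field contains "error" using the grep filter, and writes the result to a file. The paths, the log field name, and the file output options are assumptions based on the examples in this guide:

[INPUT]
    Name    tail
    Path    /var/log/logify/app.log
    Tag     filelogs

[FILTER]
    Name    grep
    Match   filelogs
    # Keep only records whose "log" field matches the regex "error"
    Regex   log error

[OUTPUT]
    Name    file
    Match   filelogs
    # Assumed directory for the file output plugin
    Path    /var/log/fluent-bit/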

Now that you have a rough idea of how Fluent Bit works, you can proceed to the next section to start using Fluent Bit.

Getting started with Fluent Bit

In this section, you will configure Fluent Bit to read logs from a file using the tail input plugin and display them in the console.

First, open the Fluent Bit configuration file located at /etc/fluent-bit/fluent-bit.conf using the following command:

 
sudo nano /etc/fluent-bit/fluent-bit.conf

Clear the existing contents of the file and add the following configuration code:

/etc/fluent-bit/fluent-bit.conf
[SERVICE]
    Flush        1
    Daemon       off
    Log_Level    debug

[INPUT]
    Name         tail
    Path         /var/log/logify/app.log
    Tag          filelogs

[OUTPUT]
    Name         stdout
    Match        filelogs

The [SERVICE] section defines global settings for Fluent Bit. It specifies that Fluent Bit should flush logs every second, run in the foreground, and log at the debug level.

The [INPUT] uses the tail plugin to read logs from the specified file at /var/log/logify/app.log. The Tag allows other Fluent Bit components, such as [FILTER] and [OUTPUT], to identify these log records.

The [OUTPUT] component uses the stdout plugin to forward logs to the console. The Match parameter ensures that only logs with the filelogs tag are delivered to the console.
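
Tags become especially useful once a single Fluent Bit instance collects from several inputs. As a hedged aside (not part of this tutorial's configuration), you could tag each input separately and use a wildcard in Match to route both to the same output; the second file path below is an assumption for illustration:

[INPUT]
    Name    tail
    Path    /var/log/logify/app.log
    Tag     file.app

[INPUT]
    Name    tail
    Path    /var/log/logify/audit.log
    Tag     file.audit

[OUTPUT]
    Name    stdout
    # The wildcard matches any tag that starts with "file."
    Match   file.*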

After making these changes, save the file.

Next, validate your configuration file for errors:

 
sudo /opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf  --dry-run
Output
Fluent Bit v2.1.10
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2023/09/30 11:11:05] [ info] Configuration:
[2023/09/30 11:11:05] [ info]  flush time     | 1.000000 seconds
[2023/09/30 11:11:05] [ info]  grace          | 5 seconds
[2023/09/30 11:11:05] [ info]  daemon         | 0
[2023/09/30 11:11:05] [ info] ___________
[2023/09/30 11:11:05] [ info]  inputs:
[2023/09/30 11:11:05] [ info]      tail
[2023/09/30 11:11:05] [ info] ___________
[2023/09/30 11:11:05] [ info]  filters:
[2023/09/30 11:11:05] [ info] ___________
[2023/09/30 11:11:05] [ info]  outputs:
[2023/09/30 11:11:05] [ info]      stdout.0
[2023/09/30 11:11:05] [ info] ___________
[2023/09/30 11:11:05] [ info]  collectors:
configuration test is successful

If the output displays "configuration test is successful", your configuration file is valid and error-free.

In the logify directory, run the Bash program in the background:

 
./logify.sh &

Now, start Fluent Bit, specifying the path to the configuration file:

 
sudo /opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf

The -c option takes the path to the Fluent Bit configuration file.

When Fluent Bit starts, you should see an output similar to the following:

Output
...
[2023/09/30 11:13:06] [debug] [input:tail:tail.0] scanning path /var/log/logify/app.log
[2023/09/30 11:13:06] [debug] [input:tail:tail.0] inode=255633 with offset=35483 appended as /var/log/logify/app.log
[2023/09/30 11:13:06] [debug] [input:tail:tail.0] scan_glob add(): /var/log/logify/app.log, inode 255633
[2023/09/30 11:13:06] [debug] [input:tail:tail.0] 1 new files found on path '/var/log/logify/app.log'
[2023/09/30 11:13:06] [debug] [stdout:stdout.0] created event channels: read=29 write=30
[2023/09/30 11:13:06] [ info] [sp] stream processor started
[2023/09/30 11:13:06] [debug] [input:tail:tail.0] inode=255633 file=/var/log/logify/app.log promote to TAIL_EVENT
[2023/09/30 11:13:06] [ info] [input:tail:tail.0] inotify_fs_add(): inode=255633 watch_fd=1 name=/var/log/logify/app.log
[2023/09/30 11:13:06] [debug] [input:tail:tail.0] [static files] processed 0b, done
[2023/09/30 11:13:06] [ info] [output:stdout:stdout.0] worker #0 started
[2023/09/30 11:13:09] [debug] [input:tail:tail.0] inode=255633, /var/log/logify/app.log, events: IN_MODIFY
[2023/09/30 11:13:09] [debug] [input chunk] update output instances with new chunk size diff=207, records=1, input=tail.0
[2023/09/30 11:13:09] [debug] [task] created task=0x7f59c2833f80 id=0 OK
[2023/09/30 11:13:09] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0

Following that, you will see the log messages appear:

Output
[0] filelogs: [[1696072389.439042696, {}], {"log"=>"{"status": "200", "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Operation finished", "pid": 2833, "ssn": "407-01-2433", "timestamp": 1696072389}"}]
...
[0] filelogs: [[1696072392.449147983, {}], {"log"=>"{"status": "200", "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Initialized application", "pid": 2833, "ssn": "407-01-2433", "timestamp": 1696072392}"}]

Fluent Bit is now displaying the log messages along with additional context. You can exit Fluent Bit by pressing CTRL + C.

Transforming logs with Fluent Bit

When collecting logs with Fluent Bit, processing them to enhance their utility is often necessary. Fluent Bit provides a powerful array of filter plugins designed to transform event streams effectively. In this section, we will explore various essential log transformation tasks:

  • Parsing JSON logs.
  • Removing unwanted fields.
  • Adding new fields.
  • Converting Unix timestamps to the ISO format.
  • Masking sensitive data.

Parsing JSON logs with Fluent Bit

When working with logs generated in JSON format, it's crucial to parse them accurately so the data keeps its integrity and expected structure. This section focuses on parsing the incoming log records as valid JSON so they have a well-defined structure.

To understand why this is necessary, let's examine a log event from the last section in detail:

Output
[0] filelogs: [[1696072392.449147983, {}], {"log"=>"{"status": "200", "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Initialized application", "pid": 2833, "ssn": "407-01-2433", "timestamp": 1696072392}"}]

Upon close inspection, you will see that Fluent Bit wraps the original JSON payload as a string under a log key and prints it as key=>value pairs, so the data lacks a consistent JSON structure.

You can create a Parser to parse logs as JSON in Fluent Bit.

In your text editor, create a parser_json.conf file:

 
sudo nano /etc/fluent-bit/parser_json.conf

In your parser_json.conf file, add the following code:

/etc/fluent-bit/parser_json.conf
[PARSER]
    Name         json_parser
    Format       json

The [PARSER] component takes the parser's name and the format in which log events should be parsed, which is json here.

In the Fluent Bit configuration file /etc/fluent-bit/fluent-bit.conf, make the following modifications:

/etc/fluent-bit/fluent-bit.conf
[SERVICE]
    Flush        1
    Daemon       off
    Log_Level    debug
    Parsers_File parser_json.conf

[INPUT]
    Name         tail
    Path         /var/log/logify/app.log
    Parser       json_parser
    Tag          filelogs

[OUTPUT]
    Name         stdout
    format       json
    Match        filelogs

The Parsers_File parameter references the parser_json.conf file, which defines the json_parser for parsing JSON logs.

In the [INPUT] component, you add the Parser parameter with the value json_parser. This specifies that the incoming logs should be parsed using the JSON parser defined in parser_json.conf.

Finally, in the [OUTPUT] section, you set the format parameter to json, ensuring that the logs forwarded to the output are in the JSON format.

After making these changes, save the configuration file and restart Fluent Bit using the following command:

 
sudo /opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf
Output
[{"date":1696075163.805419,"status":"200","ip":"127.0.0.1","level":30,"emailAddress":"user@mail.com","msg":"Operation finished","pid":2833,"ssn":"407-01-2433","timestamp":1696075163}]
...
[{"date":1696075166.815878,"status":"200","ip":"127.0.0.1","level":30,"emailAddress":"user@mail.com","msg":"Initialized application","pid":2833,"ssn":"407-01-2433","timestamp":1696075166}]

You can now observe that the logs are formatted in the JSON format.

Now, you can stop Fluent Bit with CTRL + C.

You have learned how to parse incoming JSON logs correctly. Fluent Bit provides various parsers to handle diverse log formats:

  • regex: uses regular expressions to parse log events.
  • logfmt: parses log records in the logfmt format.
  • ltsv: parses log events in the LTSV format.

These parsing methods offer flexibility, allowing Fluent Bit to handle many log formats efficiently.
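
For example, a regex parser uses named capture groups to split a raw line into fields. The following is a hedged sketch with a hypothetical parser name and log line format (not used elsewhere in this tutorial) that would turn a line like "INFO Connected to database" into level and message fields:

[PARSER]
    Name    simple_regex_parser
    Format  regex
    # Named capture groups become field names in the parsed record
    Regex   ^(?<level>[A-Z]+) (?<message>.*)$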

Now that you can parse JSON logs, you will alter log record attributes in the next section.

Adding and removing fields with Fluent Bit

In this section, you'll customize log records by removing sensitive data and adding new fields. Specifically, you will remove the emailAddress field due to its sensitive nature and add a hostname field to enhance log context.

Open your Fluent Bit configuration file in your text editor:

 
sudo nano /etc/fluent-bit/fluent-bit.conf

Integrate the following [FILTER] component into your configuration:

/etc/fluent-bit/fluent-bit.conf
[SERVICE]
    Flush        1
    Daemon       off
    Log_Level    debug
    Parsers_File parser_json.conf

[INPUT]
    Name         tail
    Path         /var/log/logify/app.log
    Parser       json_parser
    Tag          filelogs

[FILTER]
    Name         record_modifier
    Match        filelogs
    Remove_key   emailAddress
    Record       hostname ${HOSTNAME}

[OUTPUT]
    Name         stdout
    format       json
    Match        filelogs

In the [FILTER] component, the Name parameter indicates that the record_modifier plugin is being used. The Remove_key parameter excludes the emailAddress field, while the Record parameter introduces a new hostname field that is automatically populated with the system's hostname.
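
If you would rather keep only an explicit set of fields instead of removing unwanted ones individually, the record_modifier plugin also supports an allowlist approach. The following is a hedged sketch of that alternative (an aside, not used in the rest of this tutorial), assuming the Allowlist_key parameter behaves as its name suggests:

[FILTER]
    Name          record_modifier
    Match         filelogs
    # Keep only these keys; every other field is dropped
    Allowlist_key msg
    Allowlist_key level
    Allowlist_key status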

Save your changes and restart Fluent Bit to apply the modifications:

 
sudo /opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf

When Fluent Bit runs, you will observe the log events without the emailAddress field, and the hostname field will be incorporated into the log events:

Output
[{"date":1696075326.2961,"status":"200","ip":"127.0.0.1","level":30,"msg":"Task completed successfully","pid":2833,"ssn":"407-01-2433","timestamp":1696075326,"hostname":"fluent-bit-host"}]
...
[{"date":1696075329.308298,"status":"200","ip":"127.0.0.1","level":30,"msg":"Connected to database","pid":2833,"ssn":"407-01-2433","timestamp":1696075329,"hostname":"fluent-bit-host"}]
...

That takes care of removing fields and adding new fields. In the next section, you will format the timestamps.

Formatting dates with Fluent Bit

The Bash script generates logs with a Unix timestamp, which represents the number of seconds elapsed since January 1st, 1970, at 00:00:00 UTC. While these timestamps are precise, they aren't user-friendly, so you'll convert them into the more human-readable ISO format.

At the time of writing, it isn't easy to do this with existing plugins. A better option is to use a Lua script to perform the conversion and reference it in the configuration file using the lua plugin.

In your /etc/fluent-bit/ directory, create the convert_timestamp.lua file:

 
sudo nano /etc/fluent-bit/convert_timestamp.lua

Next, add the following code to convert the timestamp field from Unix timestamp to ISO format:

/etc/fluent-bit/convert_timestamp.lua
function append_converted_timestamp(tag, timestamp, record)
  local new_record = record
  -- Replace the Unix timestamp with an ISO 8601 string in UTC
  new_record["timestamp"] = os.date("!%Y-%m-%dT%TZ", record["timestamp"])
  return 2, timestamp, new_record
end

The append_converted_timestamp() function copies the record and sets its timestamp field to the value returned by os.date(), which formats the Unix timestamp as an ISO date in UTC. The leading return value of 2 tells Fluent Bit that the record was modified while the event's original timestamp should be kept.

Save and exit your file. Open the Fluent Bit configuration:

 
sudo nano /etc/fluent-bit/fluent-bit.conf

Update the configuration to include the Lua script in the [FILTER] component:

/etc/fluent-bit/fluent-bit.conf
...
[FILTER]
    Name         lua
    Match        filelogs
    Script       convert_timestamp.lua
    Call         append_converted_timestamp

[OUTPUT]
    Name         stdout
    format       json
    Match        filelogs

The [FILTER] component uses the lua plugin to modify log records dynamically. The Script parameter holds the path to the Lua script file. Meanwhile, the Call parameter specifies the function within the Lua script that will be invoked to perform the conversion.

Upon saving the file, start Fluent Bit:

 
sudo /opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf

Fluent Bit will yield output similar to the following:

Output
[{"date":1696075449.671689,"ip":"127.0.0.1","pid":2833,"ssn":"407-01-2433","timestamp":"2023-09-30T12:04:09Z","hostname":"fluent-bit-host","msg":"Initialized application","level":30,"status":"200"}]
...
[{"date":1696075455.691909,"ip":"127.0.0.1","pid":2833,"ssn":"407-01-2433","timestamp":"2023-09-30T12:04:15Z","hostname":"fluent-bit-host","msg":"Operation finished","level":30,"status":"200"}]

The timestamp field is now in a human-readable ISO format, which makes it much easier to see when each log event occurred.

Working with conditional statements in Fluent Bit

While Fluent Bit doesn't natively support conditional statements, you can achieve similar functionality by leveraging the modify plugin. In this section, you'll learn how to check if the status field equals 200 and add an is_successful field set to true when this condition is met.

First, open your /etc/fluent-bit/fluent-bit.conf configuration file:

 
sudo nano /etc/fluent-bit/fluent-bit.conf

Inside the file, add the following [FILTER] component:

/etc/fluent-bit/fluent-bit.conf
...
[FILTER]
    Name         modify
    Match        filelogs
    Condition    Key_Value_Equals status "200"
    Add          is_successful true

[OUTPUT]
    Name         stdout
    format       json
    Match        filelogs

The modify plugin provides the Condition parameter with a Key_Value_Equals option that checks if the status field value equals "200". If the condition is met, the Add option appends an is_successful field to the log event.
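
The modify plugin supports several other condition types as well. As another hedged sketch (an aside, not part of this tutorial's configuration), you could flag failed requests instead, assuming the Key_Value_Does_Not_Equal condition behaves as its name suggests:

[FILTER]
    Name      modify
    Match     filelogs
    # Flag records whose status is anything other than "200"
    Condition Key_Value_Does_Not_Equal status "200"
    Add       is_successful false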

Save the configuration file and start Fluent Bit:

 
sudo /opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf
Output
[{"date":1696075554.97921,"hostname":"fluent-bit-host","level":30,"msg":"Operation finished","timestamp":"2023-09-30T12:05:54Z","pid":2833,"ssn":"407-01-2433","ip":"127.0.0.1","status":"200","is_successful":"true"}]
...
[{"date":1696075557.990212,"hostname":"fluent-bit-host","level":30,"msg":"Operation finished","timestamp":"2023-09-30T12:05:57Z","pid":2833,"ssn":"407-01-2433","ip":"127.0.0.1","status":"200","is_successful":"true"}]

You will now see the is_successful field, indicating the outcomes where the status field equals 200.

Masking sensitive data with Fluent Bit

In the earlier steps, you successfully removed the emailAddress field from the log records, yet sensitive fields like IP addresses and Social Security Numbers remain. To keep personal information out of your logs, it's crucial to mask this data, especially when sensitive details appear inside a field that can't be removed entirely.

While many built-in plugins can only redact an entire field value, a Lua script lets you selectively mask just the sensitive portions of a field.

Create a redact.lua script with your text editor:

 
sudo nano /etc/fluent-bit/redact.lua

Add the following code to the redact.lua script:

/etc/fluent-bit/redact.lua
-- Function to redact SSNs and IP addresses in any field
function redact_sensitive_portions(record)
    local redacted_record = {}  -- Initialize a new table for the redacted record

    for key, value in pairs(record) do
        local redacted_value = value  -- Initialize redacted_value with the original value

        -- Redact SSNs
        redacted_value, _ = string.gsub(redacted_value, '%d%d%d%-%d%d%-%d%d%d%d', 'REDACTED')

        -- Redact IP addresses
        redacted_value, _ = string.gsub(redacted_value, '%d+%.%d+%.%d+%.%d+', 'REDACTED')

        redacted_record[key] = redacted_value  -- Add the redacted value to the new table
    end

    return redacted_record
end

-- Entry point for Fluent Bit filter
function filter(tag, timestamp, record)
    local redacted_record = redact_sensitive_portions(record)
    return 1, timestamp, redacted_record
end

-- Return the filter object
return {
    filter = filter
}

In this code snippet, the redact_sensitive_portions() function iterates through each field, using the string.gsub() method to locate and replace IP addresses and Social Security Numbers with the text "REDACTED".

The filter() function acts as the entry point: Fluent Bit invokes it for each log record, it delegates to redact_sensitive_portions() to mask sensitive data, and it returns the modified record (the leading return value of 1 tells Fluent Bit that the record was changed).

Now, open your Fluent Bit configuration file:

 
sudo nano /etc/fluent-bit/fluent-bit.conf

Add the [FILTER] component to reference the redact.lua script:

/etc/fluent-bit/fluent-bit.conf
...
[FILTER]
    Name         lua
    Match        filelogs
    Script       redact.lua
    Call         filter

[OUTPUT]
    Name         stdout
    format       json
    Match        filelogs

The [FILTER] component references the redact.lua file, and the Call parameter invokes the filter function as the entry point.

When you are done, start Fluent Bit:

 
sudo /opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf
Output
[{"date":1696075807.732392,"ip":"REDACTED","level":"30","msg":"Connected to database","is_successful":"true","timestamp":"2023-09-30T12:10:07Z","hostname":"fluent-bit-host","pid":"2833","ssn":"REDACTED","status":"200"}]
...
[{"date":1696075810.743,"ip":"REDACTED","level":"30","msg":"Initialized application","is_successful":"true","timestamp":"2023-09-30T12:10:10Z","hostname":"fluent-bit-host","pid":"2833","ssn":"REDACTED","status":"200"}]

The IP address and SSN have now been masked. In scenarios where a single field contains both an IP address and an SSN, like this:

Output
{..., "privateInfo": "This is a sample message with SSN: 123-45-6789 and IP: 192.168.0.1"}

Fluent Bit will redact the sensitive portions only:

Output
{..., "privateInfo": "This is a sample message with SSN: REDACTED and IP: REDACTED"}

Let's now stop the logify.sh script. To do that, you will need the program's process ID:

 
jobs -l | grep "logify"
Output
[1]+  2833 Running                 ./logify.sh &

Then, terminate the program with the kill command, substituting the process ID from the previous output:

 
kill -9 <2833>

Now that you can mask sensitive portions of your logs, you can move on to collecting logs from Docker containers.

Collecting logs from Docker containers and centralizing logs

In this section, you'll containerize the Bash program and use an Nginx hello world Docker image, which has been preconfigured to generate JSON Nginx logs upon each incoming request. Subsequently, you will deploy a Fluent Bit container to collect logs from Bash and Nginx containers and forward them to Better Stack for centralization.

Dockerizing the Bash script

Containerization lets you encapsulate the script and its dependencies, which makes it portable across different environments.

To containerize the Bash program, ensure you are still in the log-processing-stack/logify directory. After that, create a Dockerfile, which will contain instructions on how to build the image.

 
nano Dockerfile

In your Dockerfile, add the following lines of code:

log-processing-stack/logify/Dockerfile
FROM ubuntu:latest

COPY . .

RUN chmod +x logify.sh

RUN mkdir -p /var/log/logify

RUN ln -sf /dev/stdout /var/log/logify/app.log

CMD ["./logify.sh"]

In this Dockerfile, you start with the latest Ubuntu release as the base image. You then copy the script into the container, make it executable, and create the directory where the application writes its logs. Next, you redirect all log data written to /var/log/logify/app.log to the standard output so Docker can capture it. Finally, you specify the command to run when the container starts.

Now, move into the parent project directory:

 
cd ..

Create a docker-compose.yml:

 
nano docker-compose.yml

Now define the Bash Script and Nginx services:

log-processing-stack/docker-compose.yml
version: '3'
services:
  logify-script:
    build:
      context: ./logify
    container_name: logify
  nginx:
    image: betterstackcommunity/nginx-helloworld:latest
    container_name: nginx
    ports:
      - '80:80'

In this configuration file, you create the logify-script and nginx services. The logify-script service gets built from the ./logify directory context. The nginx service uses the pre-built Nginx image, and you then map port 80 on the host to port 80 within the container. Make sure no other application uses port 80 to avoid conflicts.

Next, build the Bash program Docker image and create the containers:

 
docker compose up -d

The -d flag puts the services in the background.

To see if the containers are running, type the following:

 
docker compose ps

The text "running" will be displayed under the "STATUS" column for both containers resembling this:

Output
NAME                COMMAND              SERVICE             STATUS              PORTS
logify              "./logify.sh"        logify-script       running
nginx               "/runner.sh nginx"   nginx               running             0.0.0.0:80->80/tcp, :::80->80/tcp

Now that the containers are running, send HTTP requests to the Nginx service using curl to generate logs:

 
curl http://localhost:80/?[1-5]

Then, view the logs with the following command:

 
docker compose logs
Output
logify  | {"status": "200", "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Initialized application", "pid": 1, "ssn": "407-01-2433", "timestamp": 1696077723}
logify  | {"status": "200", "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Task completed successfully", "pid": 1, "ssn": "407-01-2433", "timestamp": 1696077726}
...
nginx   | {"timestamp":"2023-09-30T12:41:53+00:00","pid":"8","remote_addr":"172.18.0.1","remote_user":"","request":"GET /?1 HTTP/1.1","status": "200","body_bytes_sent":"11109","request_time":"0.000","http_referrer":"","http_user_agent":"curl/7.81.0","time_taken_ms":"1696077713.858"}
nginx   | {"timestamp":"2023-09-30T12:41:53+00:00","pid":"8","remote_addr":"172.18.0.1","remote_user":"","request":"GET /?2 HTTP/1.1","status": "200","body_bytes_sent":"11109","request_time":"0.000","http_referrer":"","http_user_agent":"curl/7.81.0","time_taken_ms":"1696077713.863"}

You will see the logs from the Nginx and Bash program containers in the output.

With both services running and generating log data, it's time to collect these logs using Fluent Bit.

Defining the Fluent Bit service with Docker Compose

In the Docker Compose configuration, you will now integrate a Fluent Bit service to collect logs from the active containers and centralize them to Better Stack. You will define a Fluent Bit configuration, containerize Fluent Bit, and set up the Fluent Bit service.

Begin by opening the docker-compose.yml file:

 
nano docker-compose.yml

Add the following code to the docker-compose.yml file to define the Fluent Bit service:

log-processing-stack/docker-compose.yml
version: '3'
services:
  logify-script:
    build:
      context: ./logify
    container_name: logify
    logging:
      driver: "fluentd"
      options:
        tag: docker.logify
        fluentd-address: 127.0.0.1:24224
    depends_on:
      - fluent-bit
    links:
      - fluent-bit
  nginx:
    image: betterstackcommunity/nginx-helloworld:latest
    container_name: nginx
    ports:
      - '80:80'
    logging:
      driver: "fluentd"
      options:
        tag: docker.nginx
        fluentd-address: 127.0.0.1:24224
    depends_on:
      - fluent-bit
    links:
      - fluent-bit
  fluent-bit:
    image: fluent/fluent-bit:latest
    volumes:
      - ./fluent-bit:/fluent-bit/etc
      - /var/run/docker.sock:/var/run/docker.sock
    command: ["fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.conf"]
    container_name: fluent-bit
    ports:
      - "24224:24224"

In the updated docker-compose.yml configuration, the logify-script and nginx services are linked to the fluent-bit service and depend on it. Both services are configured to use the fluentd driver for logging, and the fluentd-address specifies the address to which Docker will send the logs. Tags are added to each container; the logify-script service is tagged as docker.logify, and the Nginx service tag is docker.nginx. These tags will help identify the source of the Docker logs.

The fluent-bit service uses the pre-built fluent/fluent-bit image and mounts the local fluent-bit directory that will hold Fluent Bit's configuration file (created shortly). The command parameter starts Fluent Bit with fluent-bit.conf when the container launches. Additionally, port 24224 is exposed to receive logs from the other containers.

Next, create the fluent-bit directory and navigate into it:

 
mkdir fluent-bit && cd fluent-bit

Following that, create a fluent-bit.conf file with your text editor:

 
nano fluent-bit.conf

Define the [INPUT] section to listen for logs on port 24224 using the forward plugin:

log-processing-stack/fluent-bit/fluent-bit.conf
[INPUT]
    Name        forward
    Listen      0.0.0.0
    Port        24224

The [INPUT] configuration uses the forward plugin to receive logs sent by services through port 24224.

Next, you will set up the destination to forward the logs. To centralize the logs, you will use Better Stack.

First, create a free Better Stack account. And when you are logged in, visit the Sources section:

Screenshot pointing to the **Sources** link

Once on the Sources page, click the Connect source button:

Screenshot indicating the **Connect source** button

Enter a source name (e.g., "Logify logs") and select "Fluent-bit" as the platform:

Screenshot showing the name field filled as "Logify logs" and the Platform set to "Fluent-bit"

Once the source is created, copy the Source Token field to the clipboard:

Screenshot with an arrow pointing to the "Source Token" field

Return to the fluent-bit.conf file and add the [OUTPUT] component at the end of the file to deliver Docker logs to Better Stack. Make sure to update the source token:

log-processing-stack/fluent-bit/fluent-bit.conf

[INPUT]
    Name        forward
    Listen      0.0.0.0
    Port        24224

[OUTPUT]
    name    http
    match   docker.logify
    tls     On
    host    in.logs.betterstack.com
    port    443
    uri     /
    header  Authorization Bearer <your_logify_source_token>
    header  Content-Type application/msgpack
    format  msgpack
    retry_limit 5

In the [OUTPUT] component, Fluent Bit matches log entries tagged with docker.logify and forwards them to Better Stack using the http plugin. The tag is set in the docker-compose.yml file for the logify-script service, allowing Fluent Bit to identify the log entries correctly. The <your_logify_source_token> should be replaced with the source token obtained from Better Stack during source creation.

After adding the configuration, save and exit the file. Return to the project's root directory using the following command:

 
cd ..

Start the newly configured Fluent Bit service using Docker Compose:

 
docker compose up -d

Check Better Stack to verify if the log entries are being successfully delivered. You should see the log entries uploading to Better Stack's interface:

Screenshot displaying the log entries  uploading to Better Stack

For the Nginx logs, follow similar steps. Create a new source for Nginx logs on Better Stack. After creating the source, the interface will look like this:

Screenshot of Better Stack with two sources: Logify, and Nginx

Obtain the new source token and add a second [OUTPUT] component to the fluent-bit.conf file to match and forward the Nginx logs:

log-processing-stack/fluent-bit/fluent-bit.conf
...
[OUTPUT]
    name    http
    match   docker.nginx
    tls     On
    host    in.logs.betterstack.com
    port    443
    uri     /
    header  Authorization Bearer <your_nginx_source_token>
    header  Content-Type application/msgpack
    format  msgpack
    retry_limit 5

After making the necessary changes, stop all services using the command:

 
docker compose down

Start all the services again:

 
docker compose up -d

Send more requests to the Nginx service:

 
curl http://localhost:80/?[1-5]

The Nginx logs will be uploaded to Better Stack:

Screenshot of Nginx logs delivered to Better Stack

That takes care of centralizing data in Better Stack.

Monitoring Fluent Bit health with Better Stack

Fluent Bit provides a health endpoint that allows you to monitor Fluent Bit's health using external tools like Better Stack. Periodically, these tools send requests to determine if Fluent Bit is functioning correctly.

To enable this endpoint, open the Fluent Bit configuration file:

 
nano fluent-bit/fluent-bit.conf

Add the following lines at the top of the file to enable Fluent Bit's health endpoint and configure its settings:

log-processing-stack/fluent-bit/fluent-bit.conf
[SERVICE]
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_PORT    2020
    Health_Check On
    HC_Errors_Count 5
    HC_Retry_Failure_Count 5
    HC_Period 5
...

These settings start Fluent Bit's built-in HTTP server on port 2020 and enable the health check endpoint. The HC_Errors_Count, HC_Retry_Failure_Count, and HC_Period parameters control how many errors and retry failures are tolerated within each five-second evaluation window before Fluent Bit reports itself as unhealthy.

Next, update the docker-compose.yml file to expose the port that hosts the health endpoint:

log-processing-stack/docker-compose.yml
    container_name: fluent-bit
    ports:
      - "2020:2020"
      - "24224:24224"

Now, start the Fluent Bit service with the updated changes:

 
docker compose up -d

Verify the health endpoint is functioning:

 
curl -s http://127.0.0.1:2020/api/v1/health
Output
ok

Next, log in to Better Stack.

On the Monitors page, click the Create monitor button:

Screenshot of the monitors page, providing an option to create a monitor

Then, select a suitable trigger option and your preferred notification settings, and enter your server's IP address or domain name followed by the /api/v1/health endpoint on port 2020. After that, click the Create monitor button:

Screenshot of Better Stack configured with the necessary options

Upon completion, Better Stack will regularly monitor Fluent Bit's health endpoint:

Screenshot of Better Stack monitoring `health` endpoint

Let's see what happens when Fluent Bit malfunctions. Halt all services using the command:

 
docker compose stop

After a brief interval, check Better Stack. The status will transition to "Down":

Screenshot of Better Stack indicating that the health endpoint doesn't work

When there is an outage, Better Stack will promptly notify you with an email alert detailing the downtime:

Screenshot of the email alert from Better Stack notifying of the endpoint's downtime

With these tools, you can proactively manage Fluent Bit's health and swiftly respond to operational interruptions.

Final thoughts

In this comprehensive article, you learned how Fluent Bit can be integrated with tools like Docker, Nginx, and Better Stack for managing logs. First, you created a Fluent Bit configuration to read logs from a file and display them in the output. You then employed Fluent Bit to collect logs from multiple Docker containers and centralize them on Better Stack. Finally, you set up a health endpoint to monitor Fluent Bit's health using Better Stack.

You can now effectively manage logs on your system using Fluent Bit. To delve deeper into Fluent Bit's capabilities, consult the documentation. Fluent Bit offers powerful features such as SQL stream processing, which you can explore further here. Additionally, to hone your skills in Docker and Docker Compose, refer to their respective documentation pages: Docker and Docker Compose. To gain insights into Docker logging, consult our comprehensive guide.

Thanks for reading, and happy logging!

