
How to Collect, Process, and Ship Log Data with Rsyslog

Stanley Ulili
Updated on November 23, 2023

Modern computing systems generate diverse log messages, encompassing vital information from system logs (including kernel and boot messages), applications, databases, and network services or daemons. These logs play a crucial role in troubleshooting and diagnosing issues when they arise, and they are most useful when centralized.

To centralize logs, you can use a log shipper, a tool designed to collect logs from various sources and forward them to diverse locations. Rsyslog is a prominent log shipper operating based on the syslog protocol.

Rsyslog ships with advanced features, such as filtering, and supports both TCP and UDP protocols for transporting messages. It can handle logs related to mail, authorizations, kernel messages, and more.

This comprehensive tutorial will guide you through using Rsyslog to collect, process, and forward log data to a central location. First, you will configure Rsyslog to read logs from a file. Next, you will explore how to process logs using Rsyslog. Following that, you will centralize the logs to another server where Rsyslog is operational. Finally, you will use an Rsyslog Docker container to collect logs from other containers.

Prerequisites

Before you begin, ensure access to a system with a non-root user account with sudo privileges. Additionally, certain parts of this tutorial involve using Docker and Docker Compose. If you intend to explore these tools, make sure you have Docker and Docker Compose installed on your system.

Once you've confirmed these prerequisites, create a directory to store your configuration files and applications:

 
mkdir log-processing-stack

Next, navigate into the newly created directory:

 
cd log-processing-stack

With the directory set up, you're ready to install Rsyslog.

Installing Rsyslog

Rsyslog is pre-installed on many systems and may sometimes need to be updated. It's considered best practice to install the latest version to ensure you have access to the most recent features and security enhancements.

Below are the installation instructions, tested on Ubuntu 22.04. For other systems, consult the Rsyslog documentation for installation guidelines.

First, install the latest version of Rsyslog:

 
sudo apt-get install rsyslog

If you see the message "rsyslog is already the newest version," it indicates that you have the latest version installed.
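If your distribution's package is older than you need, the Rsyslog project (Adiscon) also publishes newer builds; on Ubuntu these are distributed through a PPA. As a sketch, assuming the ppa:adiscon/v8-stable PPA (verify the current name in the Rsyslog documentation before adding it):

 
sudo add-apt-repository ppa:adiscon/v8-stable
sudo apt-get update
sudo apt-get install rsyslog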

Confirm the installation and check the Rsyslog version with the following:

 
rsyslogd -v

You should see an output similar to this:

 
rsyslogd 8.2112.0 (aka 2021.12) compiled with:
    PLATFORM: x86_64-pc-linux-gnu
    PLATFORM (lsb_release -d):
    FEATURE_REGEXP: Yes
    GSSAPI Kerberos 5 support: Yes
    FEATURE_DEBUG (debug build, slow code): No
    32bit Atomic operations supported: Yes
    64bit Atomic operations supported: Yes
    memory allocator: system default
    Runtime Instrumentation (slow code): No
    uuid support: Yes
    systemd support: Yes
    Config file: /etc/rsyslog.conf
    PID file: /run/rsyslogd.pid
    Number of Bits in RainerScript integers: 64

See https://www.rsyslog.com for more information.

Additionally, ensure that the Rsyslog service is active and running:

 
systemctl status rsyslog

You should see the status as "active (running)," confirming that Rsyslog is operational:

 
● rsyslog.service - System Logging Service
     Loaded: loaded (/lib/systemd/system/rsyslog.service; enabled; vendor prese>
     Active: active (running) since Fri 2023-10-27 08:33:14 UTC; 8min ago
TriggeredBy: ● syslog.socket
       Docs: man:rsyslogd(8)
             man:rsyslog.conf(5)
             https://www.rsyslog.com/doc/
   Main PID: 652 (rsyslogd)
      Tasks: 4 (limit: 2244)
     Memory: 2.9M
        CPU: 47ms
     CGroup: /system.slice/rsyslog.service
             └─652 /usr/sbin/rsyslogd -n -iNONE

Warning: some journal files were not opened due to insufficient permissions.

With Rsyslog successfully installed and running, let's understand how it works.

How Rsyslog works

Before delving into how Rsyslog collects application logs, it's essential to understand how it works with system logs.

Diagram showing daemons sending logs to Rsyslog and redirecting them to separate files

In your system, various applications like SSHD, mail clients/servers, and cron tasks generate logs at frequent intervals. These applications write log messages to /dev/log, a socket that they treat as if it were an ordinary file.

The Rsyslog daemon monitors this socket, collecting logs as they are written, and redirects them to individual plain text files in the /var/log directory, including the /var/log/syslog file. Rsyslog routes logs to the appropriate files by inspecting header information, such as priority and message origin, which it uses for filtering.

The routing of these messages is based on rules defined in the 50-default.conf file, located in the /etc/rsyslog.d/ directory, which we'll explore shortly. These default rules apply whether Rsyslog was freshly installed or already present on the system.
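You can watch this pipeline in action with the logger utility, which submits a message to /dev/log on your behalf using the facility and priority you specify:

 
logger -p user.info "Hello from logger"
tail -n 1 /var/log/syslog

The message should appear at the end of /var/log/syslog, tagged with your username.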

However, logs originate from diverse sources, and some of those sources have no matching rules in the default configuration.

Building on this knowledge, Rsyslog can be extended to collect logs from additional inputs and redirect them to various destinations, including remote ones, as illustrated in the diagram below:

Rsyslog diagram

To understand this process, imagine Rsyslog as a pipeline. On one end, Rsyslog collects inputs, transforms them, and forwards them to the other end—the destination.

This can be achieved with a custom configuration file in the /etc/rsyslog.d/ directory, structured as follows:

 
module(load="<module_name>")

# Collect logs
input(...)

# Modify logs
template(name="<template_name>") {}

# Redirect logs to the destination
action(type="<module_name>")

The main components include:

  • input: collects logs from various sources.
  • template: modifies the log message format.
  • action: delivers logs to different destinations.

Rsyslog uses modules extensively to accomplish its tasks.

Rsyslog inputs

Rsyslog features modules designed to collect logs from various sources, identifiable by names starting with the im prefix. Here are a few examples of these input modules:

  • imhttp: collects plaintext messages via HTTP.

  • imjournal: fetches system journal messages into Syslog.

  • imfile: reads text files and converts their contents into Syslog messages.

  • imdocker: collects logs from Docker containers using the Docker REST API.

Rsyslog Message Modification Modules

For modifying log messages, Rsyslog provides message modification modules typically prefixed with mm:

  • mmjsonparse: parses structured log messages conforming to the CEE/lumberjack spec.

  • mmfields: extracts specific fields from log entries.

  • mmkubernetes: adds Kubernetes metadata to each log event.

  • mmanon: anonymizes IP addresses for privacy.

Rsyslog output modules

Rsyslog offers a wide array of output modules, recognizable by names starting with the om prefix. These modules allow forwarding log messages to various destinations:

  • omfile: writes log entries to a file on the local system.

  • ommysql: sends log entries to a MySQL database.

  • omrabbitmq: forwards log data to RabbitMQ, a popular message broker.

  • omelasticsearch: delivers log output to Elasticsearch, a robust search and analytics engine.

Now that you have an idea of the available Rsyslog modules and what they do, let's analyze the Rsyslog configuration file in greater detail.

Understanding the Rsyslog configuration

When Rsyslog starts running on your system, it operates with a default configuration file. It collects logs from various processes and directs them to plain text files in the /var/log directory.

Rsyslog relies on rules predefined in the default configuration file. You can also define your own rules, global directives, or modules.

Rsyslog rules

To comprehend how rules work, open the 50-default.conf configuration file in your preferred text editor. This tutorial uses nano, a command-line text editor:

 
sudo nano /etc/rsyslog.d/50-default.conf

In the initial part of the file, you'll find contents similar to this (edited for brevity):

/etc/rsyslog.d/50-default.conf
...
auth,authpriv.*                 /var/log/auth.log
*.*;auth,authpriv.none          -/var/log/syslog
#cron.*                         /var/log/cron.log
#daemon.*                       -/var/log/daemon.log
kern.*                          -/var/log/kern.log
...

The lines in the file are rules. A rule comprises a filter for selecting log messages and an action specifying the path to send the logs. Lines starting with # are comments and won't be executed.

Consider this line:

 
kern.*                          -/var/log/kern.log

This line can be divided into a selector (kern.*) that filters syslog messages and an action (-/var/log/kern.log) that specifies where to forward the matching logs.

Let's examine the selector kern.* in detail. kern.* is a Facility/Priority-based filter, a commonly used method for filtering syslog messages.

kern.* can be interpreted as follows:

 
FACILITY.PRIORITY
  • FACILITY: a subsystem generating log messages. kern is an example of a facility alongside other subsystems like authpriv, cron, user, daemon, mail, auth, syslog, lpr, news, uucp, etc. To define all facilities, you can use *.

  • PRIORITY: specifies the log message priority. Priorities include debug, info, notice, warning, warn (same as warning), err, error (same as err), crit, alert, emerg, panic. If you want to send logs with any priority level, you can use *. Optionally, you can use the priority keyword none for facilities without specified priorities.

Filter and action are separated by one or more spaces or tabs.

The last part, -/var/log/kern.log, is the action indicating the target file where matching messages are written. The leading dash is legacy syslog syntax telling the daemon not to sync the file after every write; Rsyslog accepts it for backward compatibility.
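For instance, here are a few more rules in the same style, similar to those shipped (sometimes commented out) in Ubuntu's default file:

 
mail.*                          -/var/log/mail.log
mail.err                        /var/log/mail.err
*.emerg                         :omusrmsg:*

The last rule uses the omusrmsg action to broadcast emergency messages to all logged-in users instead of writing them to a file.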

In this configuration file, most rules direct output to various files, which you can find in /var/log.

Close the configuration file and use the following command to list all contents in the /var/log directory:

 
ls -l /var/log/

The output will include files like:

Output
total 444
-rw-r--r--  1 root      root                 0 Oct 22 04:33 alternatives.log
drwxr-xr-x  2 root      root              4096 Oct 27 08:37 apt
-rw-r-----  1 syslog    adm               7596 Oct 27 08:44 auth.log
-rw-r--r--  1 root      root                 0 Oct 22 04:33 bootstrap.log
-rw-rw----  1 root      utmp                 0 Feb 17  2023 btmp
-rw-r-----  1 syslog    adm             105503 Oct 27 08:33 cloud-init.log
-rw-r-----  1 root      adm               5769 Oct 27 08:33 cloud-init-output.log
drwxr-xr-x  2 root      root              4096 Feb 10  2023 dist-upgrade
-rw-r-----  1 root      adm              46597 Oct 27 08:33 dmesg
-rw-r--r--  1 root      root              6664 Oct 27 08:37 dpkg.log
-rw-r--r--  1 root      root             32032 Oct 27 08:33 faillog
drwxr-sr-x+ 4 root      systemd-journal   4096 Oct 27 08:33 journal
-rw-r-----  1 syslog    adm              70510 Oct 27 08:44 kern.log
drwxr-xr-x  2 landscape landscape         4096 Oct 27 08:33 landscape
-rw-rw-r--  1 root      utmp            292292 Oct 27 08:35 lastlog
drwx------  2 root      root              4096 Feb 17  2023 private
-rw-r-----  1 syslog    adm             136675 Oct 27 08:44 syslog
-rw-r--r--  1 root      root              4748 Oct 27 08:37 ubuntu-advantage.log
-rw-r-----  1 syslog    adm              10487 Oct 27 08:44 ufw.log
drwxr-x---  2 root      adm               4096 Oct 22 04:28 unattended-upgrades
-rw-rw-r--  1 root      utmp              3840 Oct 27 08:35 wtmp

Most files Rsyslog creates belong to the syslog user and the adm group. Other applications besides Rsyslog also create logs in this directory, such as MySQL and Nginx.

This behavior of creating files with these attributes is defined in another default configuration file, /etc/rsyslog.conf.

Rsyslog global directives and modules

When Rsyslog runs, it reads the /etc/rsyslog.conf file, another default configuration already defined. This file contains global directives, modules, and references to all the configuration files in the /etc/rsyslog.d/ directory, including the /etc/rsyslog.d/50-default.conf we examined in the previous section.

Open the /etc/rsyslog.conf configuration file using the following command:

 
nano /etc/rsyslog.conf

Locate the following section near the bottom of the file:

/etc/rsyslog.conf
...
#
# Set the default permissions for all log files.
#
$FileOwner syslog
$FileGroup adm
$FileCreateMode 0640
$DirCreateMode 0755
$Umask 0022
$PrivDropToUser syslog
$PrivDropToGroup syslog
...

In this file, properties such as $FileOwner and $FileGroup specify the file owner and group, along with file permissions. If you need to change ownership, this is the section to look at. Any keyword prefixed with $ is a legacy configuration directive whose value you can modify.

Further down the configuration file, you'll find lines like:

Output
...
#
# Where to place spool and state files
#
$WorkDirectory /var/spool/rsyslog

#
# Include all config files in /etc/rsyslog.d/
#
$IncludeConfig /etc/rsyslog.d/*.conf

The $WorkDirectory specifies the location Rsyslog uses to store state files, and $IncludeConfig includes all the configuration files defined in the /etc/rsyslog.d directory. Rsyslog will read any configuration file you create in this directory. This is where you will define your custom configurations.
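Note that directives prefixed with $ use Rsyslog's legacy configuration format. Modern RainerScript syntax expresses many of them as parameters of the global() object, which is the style you will use in your own configuration later in this tutorial:

 
# Legacy format:
#   $WorkDirectory /var/spool/rsyslog
# Equivalent RainerScript syntax:
global(workDirectory="/var/spool/rsyslog")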

Now that you understand that Rsyslog has default configurations that route most system logs to various files in /var/log, you are ready to create a demo application that generates logs. Later, you'll configure Rsyslog to read these logs.

Developing a demo logging application

In this section, you'll create a logging application built with the Bash scripting language. The application will generate JSON logs at regular intervals, simulating a high-traffic real-world application.

To begin, ensure you are in the log-processing-stack directory and create a subdirectory for the demo logging application:

 
mkdir logify

Navigate into the directory:

 
cd logify

Next, create a logify.sh file:

 
nano logify.sh

In your logify.sh file, add the following code to produce logs:

log-processing-stack/logify/logify.sh
#!/bin/bash
filepath="/var/log/logify/app.log"

# Build a single JSON log entry with a randomly chosen message
create_log_entry() {
    local info_messages=("Connected to database" "Task completed successfully" "Operation finished" "Initialized application")
    local random_message=${info_messages[$RANDOM % ${#info_messages[@]}]}
    local http_status_code=200
    local ip_address="127.0.0.1"
    local level=30
    local pid=$$                # PID of this script
    local ssn="407-01-2433"     # fake SSN, used later to demonstrate redaction
    local time=$(date +%s)      # Unix timestamp
    local log='{"status": '$http_status_code', "ip": "'$ip_address'", "level": '$level', "msg": "'$random_message'", "pid": '$pid', "ssn": "'$ssn'", "time": '$time'}'
    echo "$log"
}

# Append a new log entry to the file every three seconds
while true; do
    log_record=$(create_log_entry)
    echo "${log_record}" >> "${filepath}"
    sleep 3
done

The create_log_entry() function generates structured logs in JSON format with details such as the severity level, message, and HTTP status code. The script then enters an infinite loop that repeatedly calls create_log_entry() and appends each entry to a file in the /var/log/logify directory.

When you finish writing the code, save and exit the file. Then make the file executable:

 
chmod +x logify.sh

Next, create the /var/log/logify directory to store the application logs:

 
sudo mkdir /var/log/logify

Assign the currently logged-in user (referenced by the $USER variable) as the owner of the /var/log/logify directory:

 
sudo chown -R $USER:$USER /var/log/logify/

Run the logify.sh script in the background:

 
./logify.sh &

The & sign tells the OS to run the script in the background, allowing you to continue using the terminal for other tasks while the program runs.

When you press enter, the script will start running and you'll see something like:

 
[1] 960

Here, 960 is the process ID, which can be used to terminate the script if needed.
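When you no longer need the log generator, you can stop it with the kill command, using either the job number or the process ID (yours will differ):

 
kill %1     # by job number
kill 960    # or by process ID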

Now, view the app.log contents with the tail command:

 
tail -n 4 /var/log/logify/app.log

The output will show structured JSON logs similar to this:

 
{"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Task completed successfully", "pid": 2666, "ssn": "407-01-2433", "time": 1698396567}
{"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Connected to database", "pid": 2666, "ssn": "407-01-2433", "time": 1698396570}
{"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Initialized application", "pid": 2666, "ssn": "407-01-2433", "time": 1698396573}
{"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Initialized application", "pid": 2666, "ssn": "407-01-2433", "time": 1698396576}

With the application generating structured JSON logs, you are now ready to use Rsyslog to read these log entries.

Getting started with Rsyslog

Now that you have developed an application to produce logs at regular intervals, you will use Rsyslog to read the logs from a file and transform them into syslog messages stored under the /var/log/syslog file.

To begin, create a configuration file with a name of your choosing in the /etc/rsyslog.d directory:

 
sudo nano /etc/rsyslog.d/51-rsyslog-logify.conf

In the 51-rsyslog-logify.conf file, add the following configuration:

/etc/rsyslog.d/51-rsyslog-logify.conf
global(
  workDirectory="/var/spool/rsyslog"
)

# Load the imfile module to read logs from a file
module(load="imfile")

# Define a new input for reading logs from a file
input(type="imfile"
      File="/var/log/logify/app.log"
      Tag="FileLogs"
      PersistStateInterval="10"
      Facility="local0")

# Send logs with the specified tag to /var/log/syslog
if $syslogtag == 'FileLogs' then {
    action(type="omfile"
           file="/var/log/syslog")
}

In the first line, the global() directive configures the working directory where state files are stored. These state files allow Rsyslog to track how much of each log file it has already processed.

Next, the module() directive loads the imfile module, which reads logs from files.

Following that, you define an input using the imfile module to read logs from the path specified by the File parameter. You then add a tag, FileLogs, to each log entry processed, and the PersistStateInterval parameter specifies how often the state file should be written while reading the logs.

Finally, a conditional expression checks if the log tag equals the FileLogs tag. If true, an action using the omfile module is defined to forward the logs to the /var/log/syslog file.
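As a variation, you could filter on the facility assigned in the input() block instead of the tag. A minimal sketch, assuming the local0 facility set above:

 
# Alternative: match on the facility assigned in the input() block
if $syslogfacility-text == "local0" then {
    action(type="omfile" file="/var/log/syslog")
}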

After you are finished, save and exit the configuration file.

Before restarting Rsyslog, it's a good idea to check the configuration file for syntax errors. Enter the following command to validate it:

 
rsyslogd -f /etc/rsyslog.d/51-rsyslog-logify.conf -N1

When the configuration file has no errors, you will see output similar to this:

Output
rsyslogd: version 8.2112.0, config validation run (level 1), master config /etc/rsyslog.d/51-rsyslog-logify.conf
rsyslogd: End of config validation run. Bye.

Now restart Rsyslog:

 
sudo systemctl restart rsyslog.service

When Rsyslog restarts, it will start sending the logs to /var/log/syslog. To check the logs in real-time as they get written, enter the following command:

 
sudo tail -f /var/log/syslog

The log entries will be displayed, showing the timestamp, hostname, log tag, and the log message:

 
Oct 27 08:52:09 rsyslog-client FileLogs {"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Initialized application", "pid": 2666, "ssn": "407-01-2433", "time": 1698396729}
Oct 27 08:52:09 rsyslog-client FileLogs {"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Initialized application", "pid": 2666, "ssn": "407-01-2433", "time": 1698396729}
Oct 27 08:52:12 rsyslog-client FileLogs {"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Initialized application", "pid": 2666, "ssn": "407-01-2433", "time": 1698396732}
Oct 27 08:52:12 rsyslog-client FileLogs {"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Initialized application", "pid": 2666, "ssn": "407-01-2433", "time": 1698396732}
Oct 27 08:52:15 rsyslog-client FileLogs {"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Task completed successfully", "pid": 2666, "ssn": "407-01-2433", "time": 1698396735}
Oct 27 08:52:08 rsyslog-client kernel: [ 1146.001410] [UFW BLOCK] IN=eth0 OUT= MAC=96:00:02:a8:ea:a1:d2:74:7f:6e:37:e3:08:00 SRC=193.35.18.61 DST=37.27.21.229 LEN=40 TOS=0x00 PREC=0x00 TTL=245 ID=25569 PROTO=TCP SPT=47727 DPT=40209 WINDOW=1024 RES=0x00 SYN URGP=0
...

Since the /var/log/syslog file contains logs from other processes, it's common to see entries from other sources, such as the kernel.

Now that Rsyslog can read application logs, you can further process the log messages as needed.

Transforming Logs with Rsyslog

When Rsyslog reads log entries, you can transform them before sending them to the output. You can enrich them with new fields or format them differently. One common transformation is formatting logs as JSON using Rsyslog templates.

Formatting logs in JSON with Rsyslog templates

Rsyslog allows you to format logs into various formats using templates. By default, Rsyslog automatically formats log messages, even if no templates are specified, using its built-in templates. However, you might want to format your logs in JSON, which is structured and machine parsable.

If you look at the logs Rsyslog is currently formatting, you will notice that the logs are not structured:

 
Oct 27 08:52:15 rsyslog-client FileLogs {"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Task completed successfully", "pid": 2666, "ssn": "407-01-2433", "time": 1698396735}

Many remote destinations prefer structured logs, so it's a good practice to structure the log messages.

In Rsyslog, you can use templates with the template() object to modify and structure logs. Open the configuration file:

 
sudo nano /etc/rsyslog.d/51-rsyslog-logify.conf

Add the template to the configuration file:

/etc/rsyslog.d/51-rsyslog-logify.conf
...
input(type="imfile"
      File="/var/log/logify/app.log"
      Tag="FileLogs"
      PersistStateInterval="10"
      Facility="local0")

template(name="json-template" type="list" option.jsonf="on") {
property(outname="@timestamp" name="timereported" dateFormat="rfc3339" format="jsonf")
property(outname="host" name="hostname" format="jsonf")
property(outname="severity" name="syslogseverity" caseConversion="upper" format="jsonf" datatype="number")
property(outname="facility" name="syslogfacility" format="jsonf" datatype="number")
property(outname="syslog-tag" name="syslogtag" format="jsonf")
property(outname="source" name="app-name" format="jsonf" onEmpty="null")
property(outname="message" name="msg" format="jsonf")
}
if $syslogtag == 'FileLogs' then { action( type="omfile" file="/var/log/syslog"
template="json-template"
) }

In the configuration above, you define a json-template template using the template() object. This template formats the syslog message as JSON. The template includes various property statements to add fields to the syslog message. Each property statement specifies the name of the property to access and the outname, which defines the output field name in the JSON object. The format parameter is set to "jsonf" to format the property as JSON. Some properties include a timestamp, host, syslog-tag, and the syslog message itself.

Finally, you add the template parameter in the action section, referencing the newly defined json-template.
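For comparison, the same kind of output can also be produced with a string-type template, where the property replacer's json option handles the escaping. This is only a sketch of the alternative syntax, not something this tutorial requires:

 
template(name="json-string" type="string"
         string="{\"time\":\"%timereported:::date-rfc3339%\",\"host\":\"%hostname:::json%\",\"message\":\"%msg:::json%\"}\n")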

After saving your file, restart Rsyslog:

 
sudo systemctl restart rsyslog

Now, check the logs being written:

 
sudo tail -f /var/log/syslog

The output shows that the syslog messages are now formatted as JSON. They also include additional fields that provide more context:

 
{"@timestamp":"2023-10-27T08:56:43.209622+00:00", "host":"rsyslog-client", "severity":5, "facility":16, "syslog-tag":"FileLogs", "source":"FileLogs", "message":"{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"msg\": \"Initialized application\", \"pid\": 2666, \"ssn\": \"407-01-2433\", \"time\": 1698397003}"}
...

The logs in the output are now structured in JSON format and contain more detailed information. Next, you will add custom fields to the log event.

Adding Custom Fields with Rsyslog

In Rsyslog, you can add custom fields to log entries using constant statements. These statements allow you to insert fixed values into log messages.

First, open the configuration file:

 
sudo nano /etc/rsyslog.d/51-rsyslog-logify.conf

Add a new constant statement to include a custom field called environment with the value dev:

/etc/rsyslog.d/51-rsyslog-logify.conf
template(name="json-template" type="list" option.jsonf="on") {
    property(outname="@timestamp" name="timereported" dateFormat="rfc3339" format="jsonf")
    property(outname="host" name="hostname" format="jsonf")
    property(outname="severity" name="syslogseverity" caseConversion="upper" format="jsonf" datatype="number")
    property(outname="facility" name="syslogfacility" format="jsonf" datatype="number")
    property(outname="syslog-tag" name="syslogtag" format="jsonf")
    property(outname="source" name="app-name" format="jsonf" onEmpty="null")
    property(outname="message" name="msg" format="jsonf")
    constant(outname="environment" value="dev" format="jsonf")
}

In the configuration above, a constant statement has been added with the outname set to environment and the value set to dev. This constant statement inserts a fixed field named environment with the value dev into each log entry.

Save and exit the configuration file. Then, restart Rsyslog to apply the changes:

 
sudo systemctl restart rsyslog

To verify if the custom field has been added, tail the syslog file:

 
sudo tail -f /var/log/syslog

You will observe that Rsyslog has appended an environment field to the end of each log entry:

 
{"@timestamp":"2023-10-27T09:00:34.693383+00:00", "host":"rsyslog-client", "severity":5, "facility":16, "syslog-tag":"FileLogs", "source":"FileLogs", "message":"{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"msg\": \"Operation finished\", \"pid\": 2666, \"ssn": \"407-01-2433\", \"time\": 1698397234}", "environment": "dev"}

Now that you can add custom fields to log events, you are ready to use an external program to process these log events further.

Processing Rsyslog Messages with an External Program

In this section, you will configure Rsyslog to pass log events to an external Python program for further processing. This enables you to perform complex manipulations not achievable within Rsyslog alone. Specifically, you will use Python to redact sensitive fields like IP addresses and Social Security Numbers (SSN) for data privacy.

Open the Rsyslog configuration file:

 
sudo nano /etc/rsyslog.d/51-rsyslog-logify.conf

Add the following configuration to load the omprog module and specify the external Python program:

/etc/rsyslog.d/51-rsyslog-logify.conf
module(load="imfile")
module(load="omprog")
if $syslogtag == 'FileLogs' then {
action(
type="omprog"
name="rsyslog_redact"
binary="/usr/bin/python3 /opt/rsyslog_redact.py"
output="/var/log/rsyslog_redacted.log"
template="json-template"
)
action( type="omfile" file="/var/log/syslog" template="json-template" ) } ...

First, you load the omprog module, which allows Rsyslog to execute an external Python program. You then define an action block with type="omprog" to indicate that you are using the newly added module. The binary parameter specifies the path to the Python script (/usr/bin/python3 /opt/rsyslog_redact.py). The output parameter defines the location where the processed logs will be written (/var/log/rsyslog_redacted.log). The template parameter specifies the template to format the log events before processing.

Next, create the rsyslog_redact.py script in the /opt directory:

 
sudo nano /opt/rsyslog_redact.py

Add the Python code to redact sensitive fields from log events:

/opt/rsyslog_redact.py
#!/usr/bin/env python3
import sys
import traceback
import json
import re

def redact_sensitive_fields(data):
    # Define regex patterns for IP addresses and SSNs
    ip_pattern = r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b'
    ssn_pattern = r'\b\d{3}-\d{2}-\d{4}\b'

    # Redact IP addresses and SSNs with placeholder ("REDACTED")
    for key in data:
        if isinstance(data[key], str):
            data[key] = re.sub(ip_pattern, 'REDACTED_IP', data[key])
            data[key] = re.sub(ssn_pattern, 'REDACTED_SSN', data[key])
    return data

def redact_sensitive_fields_from_json(input_json):
    try:
        data = json.loads(input_json)  # Parse the JSON message
        redacted_data = redact_sensitive_fields(data)
        return json.dumps(redacted_data)  # Serialize back to a valid JSON string
    except Exception as e:
        err = traceback.format_exc()
        return err

if __name__ == '__main__':
    while True:
        try:
            line = sys.stdin.readline()
            msg = line.strip()
            if msg != "":
                redacted_json = redact_sensitive_fields_from_json(msg)
                print(redacted_json, flush=True)  # Flush so the output file is updated immediately
        except Exception as e:
            err = traceback.format_exc()
            print(err)

The script's main logic is within the __name__ == '__main__': block. Within this block, an infinite loop continually reads lines from the standard input using the sys.stdin.readline() method. If a non-empty message is read, the script invokes the redact_sensitive_fields_from_json() function with the syslog message as input.

Inside redact_sensitive_fields_from_json(), the script attempts to parse the provided message as JSON using json.loads(). Upon successful parsing, the redact_sensitive_fields() function is invoked. This function uses regular expressions to identify IP addresses and SSNs within the message. Any occurrences of these sensitive data fields are replaced with the strings REDACTED_IP and REDACTED_SSN, respectively.

If the redaction process is successful, the redacted data is returned as a JSON string. If an error is thrown, the script captures and returns the traceback information, providing valuable context in case of exceptions.
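Before wiring the script into Rsyslog, you can sanity-check it directly from the shell by piping a sample message into it (the message content here is arbitrary):

 
echo '{"message": "request from 192.168.0.1, ssn 407-01-2433"}' | python3 /opt/rsyslog_redact.py

If everything works, the script prints the message with the IP address and SSN replaced by REDACTED_IP and REDACTED_SSN.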

After saving the script, restart Rsyslog to apply the changes:

 
sudo systemctl restart rsyslog

Check that the /var/log/rsyslog_redacted.log file has been created:

 
ls -l /var/log/rsyslog_redacted.log

You should see an output similar to this:

Output
-rw------- 1 syslog syslog 7878 Oct 27 09:07 /var/log/rsyslog_redacted.log

Finally, tail the log file to observe the redacted logs:

 
tail -f /var/log/rsyslog_redacted.log

The output will show log events with redacted sensitive fields:

 
{"@timestamp":"2023-10-27T09:07:20.528067+00:00", "host":"rsyslog-client", "severity":5, "facility":16, "syslog-tag":"FileLogs", "source":"FileLogs", "message":"{\"status\": 200, \"ip\": \"REDACTED_IP\", "level": 30, "msg": "Initialized application", "pid": 2666, "ssn": "REDACTED_SSN", "time": 1698397640}", "environment": "dev"}
...

The Python script successfully redacts sensitive fields from the log events, ensuring data privacy and security.

The advantage of this script is that it selectively masks only the sensitive portions of a field while leaving the rest of the message intact. For example, if the input contains a field like this:

Output
{..., "privateInfo": "This is a sample message with SSN: 123-45-6789 and IP: 192.168.0.1"}

After processing with the script, only the sensitive data within the field is replaced, maintaining the overall structure of the message:

Output
{..., "privateInfo": "This is a sample message with SSN: [REDACTED_SSN] and IP: [REDACTED_IP]"}

This targeted redaction approach preserves the non-sensitive information in the logs, ensuring that essential context is retained for analysis while safeguarding specific private data elements.

Collecting logs from Docker containers and centralizing logs

In this section, you will containerize the Bash program. Additionally, you will use an Nginx hello world Docker image, which is preconfigured to produce Nginx logs upon each incoming request. Subsequently, you will deploy an Rsyslog container to collect logs from all running containers and centralize them in Better Stack for log management and analysis.

Dockerizing the Bash script

First, navigate to the log-processing-stack/logify directory.

Create a Dockerfile to house instructions on how to build the image:

 
nano Dockerfile

In your Dockerfile, paste the following code:

log-processing-stack/logify/Dockerfile
FROM ubuntu:latest

COPY . .

RUN chmod +x logify.sh

RUN mkdir -p /var/log/logify

RUN ln -sf /dev/stdout /var/log/logify/app.log

CMD ["./logify.sh"]

The first line specifies Ubuntu as the base image. The subsequent instructions copy the script into the container, make it executable, and create a directory to store logs. The ln -sf instruction redirects data written to /var/log/logify/app.log to standard output, so the logs can be viewed with the docker logs command. Finally, the CMD instruction runs the script when the container launches.

Save and exit your file. Then, navigate back to the parent project directory:

 
cd ..
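Optionally, you can build and run the image on its own before wiring it into Docker Compose. A quick sanity check from the parent directory, using the same image tag Compose will use:

 
docker build -t logify:latest ./logify
docker run --rm --name logify-test logify:latest

You should see a JSON log entry printed every three seconds; press Ctrl+C (or run docker stop logify-test from another terminal) to stop it.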

Next, create a docker-compose.yml file:

 
nano docker-compose.yml

In the file, add the following code to define the Bash script and Nginx services:

log-processing-stack/docker-compose.yml
version: '3'
services:
  logify-script:
    build:
      context: ./logify
    image: logify:latest
    container_name: logify
  nginx:
    image: betterstackcommunity/nginx-helloworld:latest
    logging:
      driver: json-file
    container_name: nginx
    ports:
      - '80:80'

The preceding code defines two services: logify-script and nginx. The logify-script service is built from the ./logify directory context and creates an image tagged as logify:latest. The nginx service uses the latest version of the nginx-helloworld image and runs in a container named nginx, logging via the json-file driver. Port 80 of the host system is mapped to port 80 of the container, allowing external access to the Nginx web server running inside the container. Ensure no other services use port 80 to prevent conflicts.

Next, build an image for the Bash program and start the containers for each defined service:

 
docker compose up -d

The -d option runs both containers in the background.

Check the status of the containers:

 
docker compose ps

You should see a "running" status under the "STATUS" column for each container:

Output
NAME                COMMAND              SERVICE             STATUS              PORTS
logify              "./logify.sh"        logify-script       running
nginx               "/runner.sh nginx"   nginx               running             0.0.0.0:80->80/tcp

This confirms that both containers are running. Next, use curl's URL globbing syntax to send five HTTP requests to the Nginx service:

 
curl http://localhost:80/?[1-5]

Now, display the logs from both services:

 
docker compose logs
Output
logify  | {"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Task completed successfully", "pid": 1, "ssn": "407-01-2433", "time": 1698401389}
...
nginx   | {"timestamp":"2023-10-27T10:02:37+00:00","pid":"8","remote_addr":"172.18.0.1","remote_user":"","request":"GET /?1 HTTP/1.1","status": "200","body_bytes_sent":"11109","request_time":"0.000","http_referrer":"","http_user_agent":"curl/7.81.0","time_taken_ms":"1698400957.375"}

The output displays logs produced from both containers.

Now that the Bash script and Nginx services are running and generating logs, you can collect and centralize these logs using Rsyslog.

Defining the Rsyslog service with Docker Compose

In this section, you will define an Rsyslog service and deploy an Rsyslog container that reads logs from the other running containers.

Start by opening the docker-compose.yml file:

 
nano docker-compose.yml

Add the highlighted code to configure syslog logging for the existing services and define a new rsyslog service:

log-processing-stack/docker-compose.yml
version: '3'
services:
  logify-script:
    build:
      context: ./logify
    image: logify:latest
    container_name: logify
    logging:
      driver: syslog # Set the log driver to syslog
      options: # Specify syslog options
        syslog-address: "tcp://127.0.0.1:514"
        tag: "docker/{{.Name}}"
        syslog-format: rfc3164
    networks:
      - rsyslog-network
  nginx:
    image: betterstackcommunity/nginx-helloworld:latest
    container_name: nginx
    ports:
      - '80:80'
    logging:
      driver: syslog # Set the log driver to syslog
      options: # Specify syslog options
        syslog-address: "tcp://127.0.0.1:514"
        tag: "docker/{{.Name}}"
        syslog-format: rfc3164
    networks:
      - rsyslog-network
  rsyslog:
    build:
      context: ./rsyslog
    ports:
      - "514:514/tcp" # TCP port for syslog
    volumes:
      - ./rsyslog/rsyslog.conf:/etc/rsyslog.d/rsyslog.conf
      - ./data:/var/log # this is optional
    networks:
      - rsyslog-network
networks:
  rsyslog-network:
    name: rsyslog-network

In the logify-script and nginx services, you set the log driver to syslog, which sends the logs from the containers to a syslog server. In the options block, you customize the syslog behavior: first, syslog-address specifies the address of the syslog server; then, you set a syslog tag to identify the log source, and finally, syslog-format: rfc3164 specifies the syslog format to be RFC 3164. Next, you add the networks section to attach both services to the rsyslog-network.

Following that, you define an rsyslog service, which will be built from the ./rsyslog directory context. You then map TCP port 514 on the host to port 514 inside the container, allowing syslog communication over this port. Next, you mount ./rsyslog/rsyslog.conf (which you will define soon) to /etc/rsyslog.d/rsyslog.conf, allowing custom configuration for the Rsyslog service. Additionally, the local ./data directory is mounted into the container at /var/log, providing a location for storing log data.

Next, create a rsyslog directory and move into it:

 
mkdir rsyslog && cd rsyslog

After that, create a Dockerfile:

 
nano Dockerfile

Add the following instructions:

log-processing-stack/rsyslog/Dockerfile
FROM rsyslog/rsyslog_dev_base_ubuntu:22.04_previous

USER root

RUN apt-get update \
    && apt-get install -y rsyslog-gnutls \
    && rm -rf /var/lib/apt/lists/*

RUN sed -i '/imklog/s/^/#/' /etc/rsyslog.conf

CMD ["rsyslogd", "-n"]

In this Dockerfile, you set up rsyslog/rsyslog_dev_base_ubuntu:22.04_previous as the base image. You then switch to the root user, update the package index, and install the rsyslog-gnutls package, which provides the TLS support Better Stack requires.

Following that, you modify the /etc/rsyslog.conf file to comment out the line containing imklog, an input module for kernel log messages that is not needed for an Rsyslog instance running in a container.

Finally, you specify the command to start the Rsyslog daemon in the foreground with the -n flag.

Next, create the rsyslog.conf file:

 
nano rsyslog.conf

Add the following code to receive logs from Docker containers:

log-processing-stack/rsyslog/rsyslog.conf
global(DefaultNetstreamDriverCAFile="/etc/ssl/certs/ca-certificates.crt")
# Load modules
module(load="imtcp")

# Configure input for syslog messages over TCP
input(type="imtcp" port="514")

In this code, the first line specifies the location of the Certificate Authority (CA) file for secure connections. Then, you set up Rsyslog to receive syslog messages over TCP on port 514, the port you exposed in the docker-compose.yml file.
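Once the Rsyslog container is up (after the docker compose up step below), you can verify the listener from the host with the logger utility from util-linux, which can send syslog messages over TCP:

 
logger -n 127.0.0.1 -P 514 -T "test message from the host"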

With the server set up, you will configure the destination to forward logs. In this guide, you will centralize all logs on Better Stack.

Begin by creating a free Better Stack account. Once registered, proceed to the Sources section in your dashboard:

Screenshot of Better Stack navigation with the "sources" link pointed at

On the Sources page, click the Connect source button:

Screenshot with an arrow pointing to the "Connect source"

Provide a name for your source, such as "Logify logs," and select "Rsyslog" as the platform:

Screenshot of the Better Stack interface with the source filled as "Logify logs" and platform chosen as "Rsyslog"

After creating the source, copy the Source Token provided by Better Stack:

Screenshot showing an arrow pointing at the "Source Token" field

Now, return to your rsyslog.conf file. Add the destination to forward the logs. Replace <your_logify_source_token> with your actual source token in the highlighted line:

log-processing-stack/rsyslog/rsyslog.conf
...
template(name="LogtailFormat" type="list") {
 constant(value="<")
 property(name="pri")
 constant(value=">")
 constant(value="1")
 constant(value=" ")
 property(name="timestamp" dateFormat="rfc3339")
 constant(value=" ")
 property(name="hostname")
 constant(value=" ")
 property(name="app-name")
 constant(value=" ")
 property(name="procid")
 constant(value=" ")
 property(name="msgid")
 constant(value=" ")
 property(name="structured-data" regex.expression="[^-]" regex.nomatchmode="BLANK" regex.submatch="0")
constant(value="[logtail@11993 source_token=\"<your_logify_source_token>\"]")
constant(value=" ") property(name="msg" droplastlf="on") } if $syslogtag contains "docker/logify" then { action( type="omfwd" protocol="tcp" target="in.logs.betterstack.com" port="6514" template="LogtailFormat" TCP_Framing="octet-counted" StreamDriver="gtls" StreamDriverMode="1" StreamDriverAuthMode="x509/name" StreamDriverPermittedPeers="*.logs.betterstack.com" queue.spoolDirectory="/var/spool/rsyslog" queue.filename="logtail" queue.maxdiskspace="75m" queue.type="LinkedList" queue.saveonshutdown="on" ) }

Here, you define a LogtailFormat template that wraps each message in RFC 5424-style framing and embeds your source token in the structured data. The conditional that follows ensures that only logs tagged docker/logify are forwarded to Better Stack.

After making these changes, save and exit the file.

Return to the parent directory:

 
cd ..

Start the Rsyslog service:

 
docker compose up -d

If you encounter an error like the following:

Output
Error response from daemon: failed to create task for container: failed to initialize logging driver: dial tcp 127.0.0.1:514: connect: connection refused

This happens because the logify and nginx containers attempt to connect to the syslog endpoint before the Rsyslog container is listening on port 514. Rerun the command and the containers should start.

After a few seconds, return to Better Stack to confirm that Rsyslog is forwarding the logs. In the screenshot below, Better Stack is shown receiving the logs from Rsyslog:

Screenshot showing the log entries from Rsyslog in Better Stack

With the Bash program's logs being forwarded successfully, it's time to forward the Nginx service logs as well.

Begin by creating another source named "Nginx logs" following the same steps used for the first source. Make sure to copy and store the source information in a safe place.

When creating the Nginx source, the interface will appear as shown below:

Screenshot of Better Stack with two sources: Logify, and Nginx

Next, navigate to the rsyslog directory:

 
cd rsyslog

Open the rsyslog.conf configuration file with the command:

 
nano rsyslog.conf

Add the highlighted code to collect logs from the nginx service and forward them to Better Stack. Be sure to update the source token in the newly added template:

log-processing-stack/rsyslog/rsyslog.conf

global(DefaultNetstreamDriverCAFile="/etc/ssl/certs/ca-certificates.crt")
# Load modules
module(load="imtcp")

# Configure input for syslog messages over TCP
input(type="imtcp" port="514")

template(name="LogtailFormat" type="list") {
...
}
template(name="LogtailNginxFormat" type="list") {
constant(value="<")
property(name="pri")
constant(value=">")
constant(value="1")
constant(value=" ")
property(name="timestamp" dateFormat="rfc3339")
constant(value=" ")
property(name="hostname")
constant(value=" ")
property(name="app-name")
constant(value=" ")
property(name="procid")
constant(value=" ")
property(name="msgid")
constant(value=" ")
property(name="structured-data" regex.expression="[^-]" regex.nomatchmode="BLANK" regex.submatch="0")
constant(value="[logtail@11993 source_token=\"<your_nginx_token>\"]")
constant(value=" ")
property(name="msg" droplastlf="on")
}
if $syslogtag contains "docker/logify" then { ... }
if $syslogtag contains "docker/nginx" then {
action(
type="omfwd"
protocol="tcp"
target="in.logs.betterstack.com"
port="6514"
template="LogtailNginxFormat"
TCP_Framing="octet-counted"
StreamDriver="gtls"
StreamDriverMode="1"
StreamDriverAuthMode="x509/name"
StreamDriverPermittedPeers="*.logs.betterstack.com"
queue.spoolDirectory="/var/spool/rsyslog"
queue.filename="logtail"
queue.maxdiskspace="75m"
queue.type="LinkedList"
queue.saveonshutdown="on"
)
}

After making these changes, save and exit the file.

Return to the parent directory using the following command:

 
cd ..

Start the services:

 
docker compose up -d

Finally, use curl to send five requests to the Nginx service:

 
curl http://localhost:80/?[1-5]

Return to Better Stack in your browser to confirm that the "Nginx logs" source is receiving the logs.

Screenshot of Nginx logs in Better Stack

With that, you have centralized logs from multiple containers to Better Stack.

Final thoughts

In this comprehensive guide, you explored Rsyslog's functionality and versatility in managing logs effectively. You started by understanding the fundamentals of how Rsyslog operates. Building upon this, you used Rsyslog to read logs generated by various programs, manipulate log data by converting it into JSON format, add custom fields, and employ external programs for further processing. You then deployed an Rsyslog Docker container to collect logs from other containers and forwarded them to Better Stack.

With this knowledge, you are now well-equipped to use Rsyslog in your projects. To further enhance your skills, explore the official Rsyslog documentation. Additionally, for a broader understanding of Docker logging, refer to our comprehensive guide on Docker logging.

While Rsyslog is a powerful log shipper, there are various other log shippers available. To explore alternatives and make informed decisions based on your specific requirements, refer to our log shippers guide.

Thank you and happy logging!
