How to Collect, Process, and Ship Log Data with Rsyslog
Modern computing systems generate diverse log messages, encompassing vital information from system logs (including kernel and boot messages), applications, databases, and network services or daemons. These logs play a crucial role in troubleshooting and diagnosing issues when they arise, and they become far more useful once centralized in one place.
To centralize logs, you can use a log shipper, a tool designed to collect logs from various sources and forward them to diverse locations. Rsyslog is a prominent log shipper operating based on the syslog protocol.
Rsyslog ships with advanced features, such as filtering, and supports both TCP and UDP protocols for transporting messages. It can handle logs related to mail, authorizations, kernel messages, and more.
This comprehensive tutorial will guide you through using Rsyslog to collect, process, and forward log data to a central location. First, you will configure Rsyslog to read logs from a file. Next, you will explore how to process logs using Rsyslog. Following that, you will centralize the logs to another server where Rsyslog is operational. Finally, you will use a Rsyslog Docker container to collect logs from other containers.
Prerequisites
Before you begin, ensure you have access to a system with a non-root user account that has sudo privileges. Additionally, certain parts of this tutorial involve Docker and Docker Compose; if you intend to follow those sections, make sure both are installed on your system.
Once you've confirmed these prerequisites, create a directory to store your configuration files and applications:
mkdir log-processing-stack
Next, navigate into the newly created directory:
cd log-processing-stack
With the directory set up, you're ready to install Rsyslog.
Installing Rsyslog
Rsyslog is pre-installed on many systems and may sometimes need to be updated. It's considered best practice to install the latest version to ensure you have access to the most recent features and security enhancements.
Below are the installation instructions, tested on Ubuntu 22.04. For other systems, consult the Rsyslog documentation for installation guidelines.
First, install the latest version of Rsyslog:
sudo apt-get install rsyslog
If you see the message "rsyslog is already the newest version," it indicates that you have the latest version installed.
Confirm the installation and check the Rsyslog version with the following:
rsyslogd -v
You should see an output similar to this:
rsyslogd 8.2112.0 (aka 2021.12) compiled with:
PLATFORM: x86_64-pc-linux-gnu
PLATFORM (lsb_release -d):
FEATURE_REGEXP: Yes
GSSAPI Kerberos 5 support: Yes
FEATURE_DEBUG (debug build, slow code): No
32bit Atomic operations supported: Yes
64bit Atomic operations supported: Yes
memory allocator: system default
Runtime Instrumentation (slow code): No
uuid support: Yes
systemd support: Yes
Config file: /etc/rsyslog.conf
PID file: /run/rsyslogd.pid
Number of Bits in RainerScript integers: 64
See https://www.rsyslog.com for more information.
Additionally, ensure that the Rsyslog service is active and running:
systemctl status rsyslog
You should see the status as "active (running)," confirming that Rsyslog is operational:
● rsyslog.service - System Logging Service
Loaded: loaded (/lib/systemd/system/rsyslog.service; enabled; vendor prese>
Active: active (running) since Fri 2023-10-27 08:33:14 UTC; 8min ago
TriggeredBy: ● syslog.socket
Docs: man:rsyslogd(8)
man:rsyslog.conf(5)
https://www.rsyslog.com/doc/
Main PID: 652 (rsyslogd)
Tasks: 4 (limit: 2244)
Memory: 2.9M
CPU: 47ms
CGroup: /system.slice/rsyslog.service
└─652 /usr/sbin/rsyslogd -n -iNONE
Warning: some journal files were not opened due to insufficient permissions.
With Rsyslog successfully installed and running, let's understand how it works.
How Rsyslog works
Before delving into how Rsyslog collects application logs, it's essential to understand how it works with system logs.
In your system, various applications like SSHD, mail clients/servers, and cron tasks generate logs at frequent intervals. These applications write log messages to /dev/log, a pseudo device (in practice a Unix domain socket) that they treat as if it were a regular file.
The Rsyslog daemon monitors /dev/log, collecting logs as they are written, and redirects them to individual plain text files in the /var/log directory, including the /var/log/syslog file. Rsyslog routes logs to their appropriate files by inspecting header information, such as priority and message origin, which it uses for filtering.
The routing of these messages is based on rules defined in the 50-default.conf file, located in the /etc/rsyslog.d/ directory, which we'll explore shortly. Rsyslog ships with these default configurations, whether freshly installed or already present on your system.
However, data originates from diverse sources, and these sources might lack rules in the default configurations.
Building on this knowledge, Rsyslog can be extended to collect logs from additional inputs and redirect them to various destinations, including remote ones, as illustrated in the diagram below:
To understand this process, imagine Rsyslog as a pipeline. On one end, Rsyslog collects inputs, transforms them, and forwards them to the other end—the destination.
This can be achieved with a custom configuration file in the /etc/rsyslog.d/
directory, structured as follows:
module(load="<module_name>")
# Collect logs
input(...)
# Modify logs
template(name="<template_name>") {}
# Redirect logs to the destination
action(type="<module_name>")
The main components include:
input: collects logs from various sources.
template: modifies the log message format.
action: delivers logs to different destinations.
Rsyslog uses modules extensively to accomplish its tasks.
Rsyslog inputs
Rsyslog features modules designed to collect logs from various sources, identifiable by names starting with the im
prefix. Here are a few examples of these input modules:
imhttp: collects plaintext messages via HTTP.
imjournal: fetches system journal messages into Syslog.
imfile: reads text files and converts their contents into Syslog messages.
imdocker: collects logs from Docker containers using the Docker REST API.
Rsyslog Message Modification Modules
For modifying log messages, Rsyslog provides message modification modules typically prefixed with mm
:
mmjsonparse: parses structured log messages conforming to the CEE/lumberjack spec.
mmfields: extracts specific fields from log entries.
mmkubernetes: adds Kubernetes metadata to each log event.
mmanon: anonymizes IP addresses for privacy.
Rsyslog output modules
Rsyslog offers a wide array of output modules, recognizable by names starting with the om
prefix. These modules allow forwarding log messages to various destinations:
omfile: writes log entries to a file on the local system.
ommysql: sends log entries to a MySQL database.
omrabbitmq: forwards log data to RabbitMQ, a popular message broker.
omelasticsearch: delivers log output to Elasticsearch, a robust search and analytics engine.
Now that you have an idea of the available Rsyslog modules and what they do, let's analyze the Rsyslog configuration file in greater detail.
Understanding the Rsyslog configuration
When Rsyslog starts running on your system, it operates with a default configuration file. It collects logs from various processes and directs them to plain text files in the /var/log directory.
Rsyslog relies on rules predefined in the default configuration file. You can also define your own rules, global directives, or modules.
Rsyslog rules
To comprehend how rules work, open the 50-default.conf
configuration file in your preferred text editor. This tutorial uses nano
, a command-line text editor:
sudo nano /etc/rsyslog.d/50-default.conf
In the initial part of the file, you'll find contents similar to this (edited for brevity):
...
auth,authpriv.* /var/log/auth.log
*.*;auth,authpriv.none -/var/log/syslog
#cron.* /var/log/cron.log
#daemon.* -/var/log/daemon.log
kern.* -/var/log/kern.log
...
The lines in the file are rules. A rule comprises a filter for selecting log messages and an action specifying the path to send the logs. Lines starting with #
are comments and won't be executed.
Consider this line:
kern.* -/var/log/kern.log
This line can be divided into a selector filtering syslog messages kern.*
and an action specifying the path to forward the logs -/var/log/kern.log
.
Let's examine the selector kern.*
in detail. kern.*
is a Facility/Priority-based filter, a commonly used method for filtering syslog messages.
kern.*
can be interpreted as follows:
FACILITY.PRIORITY
FACILITY: a subsystem generating log messages. kern is one facility, alongside other subsystems such as authpriv, cron, user, daemon, mail, auth, syslog, lpr, news, and uucp. To match all facilities, use *.
PRIORITY: the log message priority. Priorities include debug, info, notice, warning, warn (same as warning), err, error (same as err), crit, alert, emerg, and panic. To match logs of any priority level, use *. Optionally, you can use the priority keyword none to exclude a facility's messages entirely, as in auth,authpriv.none above.
Filter and action are separated by one or more spaces or tabs.
The last part, -/var/log/kern.log, is the action indicating the target file where matching messages are written. The leading - historically told syslogd not to sync the file after each write; Rsyslog accepts it for backward compatibility.
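As an aside, the syslog protocol (RFC 3164/5424) encodes each message's facility and severity together as a single number, PRI = facility × 8 + severity. The following Python sketch uses the standard facility and severity codes from the RFCs (it is an illustration of the protocol, not part of any Rsyslog configuration):

```python
# Map a FACILITY.PRIORITY selector to the numeric PRI value used on the wire.
# PRI = facility * 8 + severity (RFC 3164 / RFC 5424).
FACILITIES = {"kern": 0, "user": 1, "mail": 2, "daemon": 3, "auth": 4,
              "syslog": 5, "lpr": 6, "news": 7, "uucp": 8, "cron": 9,
              "authpriv": 10, "local0": 16}
SEVERITIES = {"emerg": 0, "alert": 1, "crit": 2, "err": 3,
              "warning": 4, "notice": 5, "info": 6, "debug": 7}

def pri(selector: str) -> int:
    """Compute the PRI value for a selector such as 'kern.err'."""
    facility, severity = selector.split(".")
    return FACILITIES[facility] * 8 + SEVERITIES[severity]

print(pri("kern.err"))     # 0 * 8 + 3 = 3
print(pri("local0.info"))  # 16 * 8 + 6 = 134
```

This is the number you see in angle brackets (for example, <134>) at the start of raw syslog messages sent over the network.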
In this configuration file, most rules direct output to various files, which you can find in /var/log
.
Close the configuration file and use the following command to list all contents in the /var/log
directory:
ls -l /var/log/
The output will include files like:
total 444
-rw-r--r-- 1 root root 0 Oct 22 04:33 alternatives.log
drwxr-xr-x 2 root root 4096 Oct 27 08:37 apt
-rw-r----- 1 syslog adm 7596 Oct 27 08:44 auth.log
-rw-r--r-- 1 root root 0 Oct 22 04:33 bootstrap.log
-rw-rw---- 1 root utmp 0 Feb 17 2023 btmp
-rw-r----- 1 syslog adm 105503 Oct 27 08:33 cloud-init.log
-rw-r----- 1 root adm 5769 Oct 27 08:33 cloud-init-output.log
drwxr-xr-x 2 root root 4096 Feb 10 2023 dist-upgrade
-rw-r----- 1 root adm 46597 Oct 27 08:33 dmesg
-rw-r--r-- 1 root root 6664 Oct 27 08:37 dpkg.log
-rw-r--r-- 1 root root 32032 Oct 27 08:33 faillog
drwxr-sr-x+ 4 root systemd-journal 4096 Oct 27 08:33 journal
-rw-r----- 1 syslog adm 70510 Oct 27 08:44 kern.log
drwxr-xr-x 2 landscape landscape 4096 Oct 27 08:33 landscape
-rw-rw-r-- 1 root utmp 292292 Oct 27 08:35 lastlog
drwx------ 2 root root 4096 Feb 17 2023 private
-rw-r----- 1 syslog adm 136675 Oct 27 08:44 syslog
-rw-r--r-- 1 root root 4748 Oct 27 08:37 ubuntu-advantage.log
-rw-r----- 1 syslog adm 10487 Oct 27 08:44 ufw.log
drwxr-x--- 2 root adm 4096 Oct 22 04:28 unattended-upgrades
-rw-rw-r-- 1 root utmp 3840 Oct 27 08:35 wtmp
Most files Rsyslog creates belong to the syslog
user and the adm
group. Other applications besides Rsyslog also create logs in this directory, such as MySQL and Nginx.
This behavior of creating files with these attributes is defined in another default configuration file, /etc/rsyslog.conf
.
Rsyslog global directives and modules
When Rsyslog runs, it reads the /etc/rsyslog.conf
file, another default configuration already defined. This file contains global directives, modules, and references to all the configuration files in the /etc/rsyslog.d/
directory, including the /etc/rsyslog.d/50-default.conf
we examined in the previous section.
Open the /etc/rsyslog.conf
configuration file using the following command:
nano /etc/rsyslog.conf
Locate the following section near the bottom of the file:
...
#
# Set the default permissions for all log files.
#
$FileOwner syslog
$FileGroup adm
$FileCreateMode 0640
$DirCreateMode 0755
$Umask 0022
$PrivDropToUser syslog
$PrivDropToGroup syslog
...
In this file, there are properties such as $FileOwner
and $FileGroup
that specify the file owner and group, along with file permissions. If you need to change ownership, this is the section to look at. Any keyword prefixed with $
is a variable you can modify.
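If you want to double-check what a mode like 0640 or 0755 grants, Python's standard stat module can render it in the familiar ls -l notation (a quick illustration, independent of Rsyslog itself):

```python
import stat

# $FileCreateMode 0640 applies to regular files; $DirCreateMode 0755 to directories.
print(stat.filemode(stat.S_IFREG | 0o640))  # -rw-r-----
print(stat.filemode(stat.S_IFDIR | 0o755))  # drwxr-xr-x
```

These match the permissions you will see shortly on the files under /var/log.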
Further down the configuration file, you'll find lines like:
...
#
# Where to place spool and state files
#
$WorkDirectory /var/spool/rsyslog
#
# Include all config files in /etc/rsyslog.d/
#
$IncludeConfig /etc/rsyslog.d/*.conf
The $WorkDirectory
specifies the location Rsyslog uses to store state files, and $IncludeConfig
includes all the configuration files defined in the /etc/rsyslog.d
directory. Rsyslog will read any configuration file you create in this directory. This is where you will define your custom configurations.
Now that you understand that Rsyslog has default configurations that route most system logs to various files in /var/log
, you are ready to create a demo application that generates logs. Later, you'll configure Rsyslog to read these logs.
Developing a demo logging application
In this section, you'll create a logging application built with the Bash scripting language. The application will generate JSON logs at regular intervals, simulating a high-traffic real-world application.
To begin, ensure you are in the log-processing-stack directory, then create a subdirectory for the demo logging application:
mkdir logify
Navigate into the directory:
cd logify
Next, create a logify.sh
file:
nano logify.sh
In your logify.sh
file, add the following code to produce logs:
#!/bin/bash
filepath="/var/log/logify/app.log"

create_log_entry() {
    local info_messages=("Connected to database" "Task completed successfully" "Operation finished" "Initialized application")
    local random_message=${info_messages[$RANDOM % ${#info_messages[@]}]}
    local http_status_code=200
    local ip_address="127.0.0.1"
    local level=30
    local pid=$$
    local ssn="407-01-2433"
    local time=$(date +%s)
    local log='{"status": '$http_status_code', "ip": "'$ip_address'", "level": '$level', "msg": "'$random_message'", "pid": '$pid', "ssn": "'$ssn'", "time": '$time'}'
    echo "$log"
}

while true; do
    log_record=$(create_log_entry)
    echo "${log_record}" >> "${filepath}"
    sleep 3
done
The create_log_entry() function generates structured logs in JSON format with details such as the severity level, message, and HTTP status code. The script then enters an infinite loop that repeatedly calls create_log_entry() and appends each record to a file in the /var/log/logify directory.
When you finish writing the code, save and exit the file. Then make the file executable:
chmod +x logify.sh
Next, create the /var/log/logify
directory to store the application logs:
sudo mkdir /var/log/logify
Assign ownership of the /var/log/logify directory to the currently logged-in user (referenced through the $USER variable):
sudo chown -R $USER:$USER /var/log/logify/
Run the logify.sh
script in the background:
./logify.sh &
The &
sign tells the OS to run the script in the background, allowing you to continue using the terminal for other tasks while the program runs.
When you press enter, the script will start running and you'll see something like:
[1] 960
Here, 960
is the process ID, which can be used to terminate the script if needed.
Now, view the app.log
contents with the tail
command:
tail -n 4 /var/log/logify/app.log
The output will show structured JSON logs similar to this:
{"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Task completed successfully", "pid": 2666, "ssn": "407-01-2433", "time": 1698396567}
{"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Connected to database", "pid": 2666, "ssn": "407-01-2433", "time": 1698396570}
{"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Initialized application", "pid": 2666, "ssn": "407-01-2433", "time": 1698396573}
{"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Initialized application", "pid": 2666, "ssn": "407-01-2433", "time": 1698396576}
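Because each entry is a single line of valid JSON, any downstream tool can parse it directly. For example, in Python (using one of the sample lines above):

```python
import json

line = '{"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Connected to database", "pid": 2666, "ssn": "407-01-2433", "time": 1698396570}'
record = json.loads(line)
print(record["msg"])     # Connected to database
print(record["status"])  # 200
```

This line-per-record layout is exactly what Rsyslog's imfile module expects to read in the next section.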
With the application generating structured JSON logs, you are now ready to use Rsyslog to read these log entries.
Getting started with Rsyslog
Now that you have developed an application to produce logs at regular intervals, you will use Rsyslog to read the logs from a file and transform them into syslog messages stored under the /var/log/syslog
file.
To begin, create a configuration file with a name of your choosing in the /etc/rsyslog.d
directory:
sudo nano /etc/rsyslog.d/51-rsyslog-logify.conf
In the 51-rsyslog-logify.conf file, add the following configuration:
global(
workDirectory="/var/spool/rsyslog"
)
# Load the imfile module to read logs from a file
module(load="imfile")
# Define a new input for reading logs from a file
input(type="imfile"
File="/var/log/logify/app.log"
Tag="FileLogs"
PersistStateInterval="10"
Facility="local0")
# Send logs with the specified tag to the console
if $syslogtag == 'FileLogs' then {
action(type="omfile"
file="/var/log/syslog")
}
In the first line, the global()
directive configures the working directory to store state files. The files allow Rsyslog to track the parts of the logs it has processed.
Next, the module() directive loads the imfile module, which is used to read logs from files.
Following that, you define an input using the imfile module to read logs from the path specified in the File parameter. You then add a tag, FileLogs, to each log entry processed, and the PersistStateInterval parameter specifies how often the state file is written back to disk (here, after every 10 lines read).
Finally, a conditional expression checks if the log tag equals the FileLogs
tag. If true, an action using the omfile
module is defined to forward the logs to the /var/log/syslog
file.
After you are finished, save and exit the configuration file.
Before restarting Rsyslog, it's a good idea to check the configuration file for syntax errors. Run the following command to validate it:
rsyslogd -f /etc/rsyslog.d/51-rsyslog-logify.conf -N1
When the configuration file has no errors, you will see output similar to this:
rsyslogd: version 8.2112.0, config validation run (level 1), master config /etc/rsyslog.d/51-rsyslog-logify.conf
rsyslogd: End of config validation run. Bye.
Now restart Rsyslog:
sudo systemctl restart rsyslog.service
When Rsyslog restarts, it will start sending the logs to /var/log/syslog
. To check the logs in real-time as they get written, enter the following command:
sudo tail -f /var/log/syslog
The log entries will be displayed, showing the timestamp, hostname, log tag, and the log message:
Oct 27 08:52:09 rsyslog-client FileLogs {"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Initialized application", "pid": 2666, "ssn": "407-01-2433", "time": 1698396729}
Oct 27 08:52:09 rsyslog-client FileLogs {"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Initialized application", "pid": 2666, "ssn": "407-01-2433", "time": 1698396729}
Oct 27 08:52:12 rsyslog-client FileLogs {"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Initialized application", "pid": 2666, "ssn": "407-01-2433", "time": 1698396732}
Oct 27 08:52:12 rsyslog-client FileLogs {"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Initialized application", "pid": 2666, "ssn": "407-01-2433", "time": 1698396732}
Oct 27 08:52:15 rsyslog-client FileLogs {"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Task completed successfully", "pid": 2666, "ssn": "407-01-2433", "time": 1698396735}
Oct 27 08:52:08 rsyslog-client kernel: [ 1146.001410] [UFW BLOCK] IN=eth0 OUT= MAC=96:00:02:a8:ea:a1:d2:74:7f:6e:37:e3:08:00 SRC=193.35.18.61 DST=37.27.21.229 LEN=40 TOS=0x00 PREC=0x00 TTL=245 ID=25569 PROTO=TCP SPT=47727 DPT=40209 WINDOW=1024 RES=0x00 SYN URGP=0
...
Since the /var/log/syslog
file contains logs from other processes, it's common to see logs from sources such as kernel
.
Now that Rsyslog can read application logs, you can further process the log messages as needed.
Transforming Logs with Rsyslog
When Rsyslog reads log entries, you can transform them before sending them to the output. You can enrich them with new fields or format them differently. One common transformation is formatting logs as JSON using Rsyslog templates.
Formatting logs in JSON with Rsyslog templates
Rsyslog allows you to format logs into various formats using templates. By default, Rsyslog automatically formats log messages, even if no templates are specified, using its built-in templates. However, you might want to format your logs in JSON, which is structured and machine parsable.
If you look at the logs Rsyslog is currently formatting, you will notice that the logs are not structured:
Oct 27 08:52:15 rsyslog-client FileLogs {"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Task completed successfully", "pid": 2666, "ssn": "407-01-2433", "time": 1698396735}
Many remote destinations prefer structured logs, so it's a good practice to structure the log messages.
In Rsyslog, you can use templates with the template()
object to modify and structure logs. Open the configuration file:
sudo nano /etc/rsyslog.d/51-rsyslog-logify.conf
Add the template to the configuration file:
...
input(type="imfile"
File="/var/log/logify/app.log"
Tag="FileLogs"
PersistStateInterval="10"
Facility="local0")
template(name="json-template" type="list" option.jsonf="on") {
property(outname="@timestamp" name="timereported" dateFormat="rfc3339" format="jsonf")
property(outname="host" name="hostname" format="jsonf")
property(outname="severity" name="syslogseverity" caseConversion="upper" format="jsonf" datatype="number")
property(outname="facility" name="syslogfacility" format="jsonf" datatype="number")
property(outname="syslog-tag" name="syslogtag" format="jsonf")
property(outname="source" name="app-name" format="jsonf" onEmpty="null")
property(outname="message" name="msg" format="jsonf")
}
if $syslogtag == 'FileLogs' then {
action(
type="omfile"
file="/var/log/syslog"
template="json-template"
)
}
In the configuration above, you define a json-template
template using the template()
object. This template formats the syslog message as JSON. The template includes various property statements to add fields to the syslog message. Each property statement specifies the name
of the property to access and the outname
, which defines the output field name in the JSON object. The format
parameter is set to "jsonf"
to format the property as JSON. Some properties include a timestamp, host, syslog-tag, and the syslog message itself.
Finally, you add the template
parameter in the action section, referencing the newly defined json-template
.
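Conceptually, the template acts like a mapping from Rsyslog property names to output field names. A rough Python sketch of that mapping follows; the property values here are invented for illustration, since in reality Rsyslog supplies them per message:

```python
import json

# Hypothetical syslog properties for one message (illustrative values only).
properties = {
    "timereported": "2023-10-27T08:56:43.209622+00:00",
    "hostname": "rsyslog-client",
    "syslogseverity": 5,
    "syslogfacility": 16,
    "syslogtag": "FileLogs",
    "app-name": "FileLogs",
    "msg": '{"status": 200, "msg": "Initialized application"}',
}

# outname -> property name, mirroring the json-template above.
mapping = {
    "@timestamp": "timereported",
    "host": "hostname",
    "severity": "syslogseverity",
    "facility": "syslogfacility",
    "syslog-tag": "syslogtag",
    "source": "app-name",
    "message": "msg",
}

event = {out: properties[name] for out, name in mapping.items()}
print(json.dumps(event))
```

Note how the application's own JSON ends up nested as a string inside the message field, which is exactly what you will see in the output below.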
After saving your file, restart Rsyslog:
sudo systemctl restart rsyslog
Now, check the logs being written:
sudo tail -f /var/log/syslog
The output shows that the syslog messages are now formatted as JSON. They also include additional fields that provide more context:
{"@timestamp":"2023-10-27T08:56:43.209622+00:00", "host":"rsyslog-client", "severity":5, "facility":16, "syslog-tag":"FileLogs", "source":"FileLogs", "message":"{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"msg\": \"Initialized application\", \"pid\": 2666, \"ssn\": \"407-01-2433\", \"time\": 1698397003}"}
...
The logs in the output are now structured in JSON format and contain more detailed information. Next, you will add custom fields to the log event.
Adding Custom Fields with Rsyslog
In Rsyslog, you can add custom fields to log entries using constant statements. These statements allow you to insert fixed values into log messages.
First, open the configuration file:
sudo nano /etc/rsyslog.d/51-rsyslog-logify.conf
Add a new constant statement to include a custom field called environment
with the value dev
:
template(name="json-template" type="list" option.jsonf="on") {
property(outname="@timestamp" name="timereported" dateFormat="rfc3339" format="jsonf")
property(outname="host" name="hostname" format="jsonf")
property(outname="severity" name="syslogseverity" caseConversion="upper" format="jsonf" datatype="number")
property(outname="facility" name="syslogfacility" format="jsonf" datatype="number")
property(outname="syslog-tag" name="syslogtag" format="jsonf")
property(outname="source" name="app-name" format="jsonf" onEmpty="null")
property(outname="message" name="msg" format="jsonf")
constant(outname="environment" value="dev" format="jsonf")
}
In the configuration above, a constant
statement has been added with the outname
set to environment
and the value
set to dev
. This constant statement inserts a fixed field named environment
with the value dev
into each log entry.
Save and exit the configuration file. Then, restart Rsyslog to apply the changes:
sudo systemctl restart rsyslog
To verify if the custom field has been added, tail the syslog file:
sudo tail -f /var/log/syslog
You will observe that Rsyslog has included an environment
field in each log entry at the end of the log event:
{"@timestamp":"2023-10-27T09:00:34.693383+00:00", "host":"rsyslog-client", "severity":5, "facility":16, "syslog-tag":"FileLogs", "source":"FileLogs", "message":"{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"msg\": \"Operation finished\", \"pid\": 2666, \"ssn\": \"407-01-2433\", \"time\": 1698397234}", "environment": "dev"}
Now that you can add custom fields to log events, you are ready to use an external program to process these log events further.
Processing Rsyslog Messages with an External Program
In this section, you will configure Rsyslog to pass log events to an external Python program for further processing. This enables you to perform complex manipulations not achievable within Rsyslog alone. Specifically, you will use Python to redact sensitive fields like IP addresses and Social Security Numbers (SSN) for data privacy.
Open the Rsyslog configuration file:
sudo nano /etc/rsyslog.d/51-rsyslog-logify.conf
Add the following configuration to load the omprog
module and specify the external Python program:
module(load="imfile")
module(load="omprog")
if $syslogtag == 'FileLogs' then {
action(
type="omprog"
name="rsyslog_redact"
binary="/usr/bin/python3 /opt/rsyslog_redact.py"
output="/var/log/rsyslog_redacted.log"
template="json-template"
)
action(
type="omfile"
file="/var/log/syslog"
template="json-template"
)
}
...
First, you load the omprog
module, which allows Rsyslog to execute an external Python program. You then define an action
block with type="omprog"
to indicate that you are using the newly added module. The binary
parameter specifies the path to the Python script (/usr/bin/python3 /opt/rsyslog_redact.py
). The output
parameter defines the location where the processed logs will be written (/var/log/rsyslog_redacted.log
). The template
parameter specifies the template to format the log events before processing.
Next, create the rsyslog_redact.py
script in the /opt
directory:
sudo nano /opt/rsyslog_redact.py
Add the Python code to redact sensitive fields from log events:
#!/usr/bin/env python3
import sys
import traceback
import json
import re
def redact_sensitive_fields(data):
    # Define regex patterns for IP addresses and SSNs
    ip_pattern = r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b'
    ssn_pattern = r'\b\d{3}-\d{2}-\d{4}\b'
    # Replace IP addresses and SSNs with placeholders
    for key in data:
        if isinstance(data[key], str):
            data[key] = re.sub(ip_pattern, 'REDACTED_IP', data[key])
            data[key] = re.sub(ssn_pattern, 'REDACTED_SSN', data[key])
    return data

def redact_sensitive_fields_from_json(input_json):
    try:
        data = json.loads(input_json)  # Parse the JSON message
        redacted_data = redact_sensitive_fields(data)
        return json.dumps(redacted_data)  # Serialize back to valid JSON
    except Exception:
        return traceback.format_exc()

if __name__ == '__main__':
    while True:
        try:
            line = sys.stdin.readline()
            if not line:  # EOF: Rsyslog closed the pipe, so exit
                break
            msg = line.strip()
            if msg != "":
                print(redact_sensitive_fields_from_json(msg))
                sys.stdout.flush()  # avoid buffering delays when run under omprog
        except Exception:
            print(traceback.format_exc())
The script's main logic is within the __name__ == '__main__':
block. Within this block, an infinite loop continually reads lines from the standard input using the sys.stdin.readline()
method. If a non-empty message is read, the script invokes the redact_sensitive_fields_from_json()
function with the syslog message as input.
Inside redact_sensitive_fields_from_json()
, the script attempts to parse the provided message as JSON using json.loads()
. Upon successful parsing, the redact_sensitive_fields()
function is invoked. This function uses regular expressions to identify IP addresses and SSNs within the message. Any occurrences of these sensitive data fields are replaced with the strings REDACTED_IP
and REDACTED_SSN
, respectively.
If the redaction process is successful, the redacted data is returned. If an error is thrown, the script captures and returns the traceback information, providing valuable context in case of exceptions.
After saving the script, restart Rsyslog to apply the changes:
sudo systemctl restart rsyslog
Check if the /var/log/rsyslog_redacted.log file has been created:
ls -l /var/log/rsyslog_redacted.log
You should see an output similar to this:
-rw------- 1 syslog syslog 7878 Oct 27 09:07 /var/log/rsyslog_redacted.log
Finally, tail the log file to observe the redacted logs:
tail -f /var/log/rsyslog_redacted.log
The output will show log events with redacted sensitive fields:
{"@timestamp":"2023-10-27T09:07:20.528067+00:00", "host":"rsyslog-client", "severity":5, "facility":16, "syslog-tag":"FileLogs", "source":"FileLogs", "message":"{\"status\": 200, \"ip\": \"REDACTED_IP\", \"level\": 30, \"msg\": \"Initialized application\", \"pid\": 2666, \"ssn\": \"REDACTED_SSN\", \"time\": 1698397640}", "environment": "dev"}
...
The Python script successfully redacts sensitive fields from the log events, ensuring data privacy and security.
The advantage of this script is that it selectively masks only the sensitive portions of a field while leaving the rest of the message intact. For example, if the input contains a field like this:
{..., "privateInfo": "This is a sample message with SSN: 123-45-6789 and IP: 192.168.0.1"}
After processing with the script, only the sensitive data within the field is replaced, maintaining the overall structure of the message:
{..., "privateInfo": "This is a sample message with SSN: REDACTED_SSN and IP: REDACTED_IP"}
This targeted redaction approach preserves the non-sensitive information in the logs, ensuring that essential context is retained for analysis while safeguarding specific private data elements.
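You can sanity-check this behavior offline before wiring the script into Rsyslog. A minimal Python check that reuses the same regular expressions:

```python
import re

# Same patterns as in rsyslog_redact.py
ip_pattern = r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b'
ssn_pattern = r'\b\d{3}-\d{2}-\d{4}\b'

sample = "This is a sample message with SSN: 123-45-6789 and IP: 192.168.0.1"
redacted = re.sub(ip_pattern, 'REDACTED_IP', sample)
redacted = re.sub(ssn_pattern, 'REDACTED_SSN', redacted)
print(redacted)
# This is a sample message with SSN: REDACTED_SSN and IP: REDACTED_IP
```

Only the matched substrings are replaced; the surrounding text is untouched.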
Collecting logs from Docker containers and centralizing logs
In this section, you will containerize the Bash program. Additionally, you will use an Nginx hello world Docker image, which is preconfigured to produce Nginx logs upon each incoming request. Subsequently, you will deploy a Rsyslog container to collect logs from all running containers and centralize them to Better Stack for log management and analysis.
Dockerizing the Bash script
First, navigate to the log-processing-stack/logify directory.
Create a Dockerfile
to house instructions on how to build the image:
nano Dockerfile
In your Dockerfile
, paste the following code:
FROM ubuntu:latest
COPY . .
RUN chmod +x logify.sh
RUN mkdir -p /var/log/logify
RUN ln -sf /dev/stdout /var/log/logify/app.log
CMD ["./logify.sh"]
In the first line, you specify Ubuntu as the base image. The subsequent instructions copy the script into the container, make it executable, create a directory to store logs, and symlink /var/log/logify/app.log to standard output so that the logs can be viewed with the docker logs command. Finally, you specify the command to run the script when the container is launched.
Save and exit your file. Then, navigate back to the parent project directory:
cd ..
Next, create a docker-compose.yml
file:
nano docker-compose.yml
In the file, add the following code to define the Bash script and Nginx services:
version: '3'
services:
  logify-script:
    build:
      context: ./logify
    image: logify:latest
    container_name: logify
  nginx:
    image: betterstackcommunity/nginx-helloworld:latest
    logging:
      driver: json-file
    container_name: nginx
    ports:
      - '80:80'
The preceding code defines two services: logify-script and nginx. The logify-script service is built from the ./logify directory context and creates an image tagged logify:latest. The nginx service uses the latest version of the nginx-helloworld image and runs in a container named nginx, logging through the json-file logging driver. Port 80 of the host system is mapped to port 80 of the container, allowing external access to the Nginx web server running inside it. Ensure no other services use port 80 to prevent conflicts.
Next, build an image for the Bash program and start the containers for each defined service:
docker compose up -d
The -d option runs both containers in the background.
Check the status of the containers:
docker compose ps
You should see a "running" status under the "STATUS" column for each container:
NAME COMMAND SERVICE STATUS PORTS
logify "./logify.sh" logify-script running
nginx "/runner.sh nginx" nginx running 0.0.0.0:80->80/tcp
This confirms that both containers are running. Next, send HTTP requests to the Nginx service using the curl command:
curl "http://localhost:80/?[1-5]"
Now, display the logs from both services:
docker compose logs
logify | {"status": 200, "ip": "127.0.0.1", "level": 30, "msg": "Task completed successfully", "pid": 1, "ssn": "407-01-2433", "time": 1698401389}
...
nginx | {"timestamp":"2023-10-27T10:02:37+00:00","pid":"8","remote_addr":"172.18.0.1","remote_user":"","request":"GET /?1 HTTP/1.1","status": "200","body_bytes_sent":"11109","request_time":"0.000","http_referrer":"","http_user_agent":"curl/7.81.0","time_taken_ms":"1698400957.375"}
The output displays logs produced from both containers.
Now that the Bash script and Nginx services are running and generating logs, you can collect and centralize these logs using Rsyslog.
Defining the Rsyslog service with Docker Compose
In this section, you will define an Rsyslog service and deploy an Rsyslog container that reads logs from the running containers.
Start by opening the docker-compose.yml file:
nano docker-compose.yml
Add the highlighted code to update the logify-script and nginx services and to define a new rsyslog service:
version: '3'
services:
  logify-script:
    build:
      context: ./logify
    image: logify:latest
    container_name: logify
    logging:
      driver: syslog # Set the log driver to syslog
      options: # Specify syslog options
        syslog-address: "tcp://127.0.0.1:514"
        tag: "docker/{{.Name}}"
        syslog-format: rfc3164
    networks:
      - rsyslog-network
  nginx:
    image: betterstackcommunity/nginx-helloworld:latest
    container_name: nginx
    ports:
      - '80:80'
    logging:
      driver: syslog # Set the log driver to syslog
      options: # Specify syslog options
        syslog-address: "tcp://127.0.0.1:514"
        tag: "docker/{{.Name}}"
        syslog-format: rfc3164
    networks:
      - rsyslog-network
  rsyslog:
    build:
      context: ./rsyslog
    ports:
      - "514:514/tcp" # TCP port for syslog
    volumes:
      - ./rsyslog/rsyslog.conf:/etc/rsyslog.d/rsyslog.conf
      - ./data:/var/log # this is optional
    networks:
      - rsyslog-network
networks:
  rsyslog-network:
    name: rsyslog-network
In the logify-script and nginx services, you set the log driver to syslog, which sends the logs from the containers to a syslog server. In the options block, you customize the syslog behavior: syslog-address specifies the address of the syslog server, tag identifies the log source, and syslog-format: rfc3164 sets the syslog format to RFC 3164. You then add a networks section to attach both services to the rsyslog-network.
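As a rough illustration of what the driver emits, an RFC 3164 line starts with a PRI value computed as facility * 8 + severity. The sketch below assembles such a line by hand; the hostname, timestamp, and facility/severity values are made up for the example:

```shell
# Assemble an RFC 3164-style syslog line by hand.
# PRI = facility * 8 + severity; the daemon facility (3) at the
# info severity (6) yields <30>.
facility=3
severity=6
pri=$(( facility * 8 + severity ))
# Hypothetical hostname and timestamp, plus the tag set in docker-compose.yml.
printf '<%d>Oct 27 10:02:37 myhost docker/logify[1]: %s\n' \
  "$pri" '{"msg": "Task completed successfully"}'
```

The tag value docker/{{.Name}} expands to docker/logify or docker/nginx, which is what the Rsyslog filters later in this tutorial match on.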
Following that, you define an rsyslog service, which is built from the ./rsyslog directory context. You map TCP port 514 on the host to port 514 inside the container, allowing syslog communication over this port. Next, you mount ./rsyslog/rsyslog.conf (which you will define soon) to /etc/rsyslog.d/rsyslog.conf, allowing custom configuration for the Rsyslog service. Additionally, the local ./data directory is mounted into the container at /var/log, providing a location for storing log data.
Next, create an rsyslog directory and move into it:
mkdir rsyslog && cd rsyslog
After that, create a Dockerfile:
nano Dockerfile
Add the following instructions:
FROM rsyslog/rsyslog_dev_base_ubuntu:22.04_previous
USER root
RUN apt-get update \
    && apt-get install -y rsyslog-gnutls \
    && rm -rf /var/lib/apt/lists/*
RUN sed -i '/imklog/s/^/#/' /etc/rsyslog.conf
CMD ["rsyslogd", "-n"]
In this Dockerfile, you set rsyslog/rsyslog_dev_base_ubuntu:22.04_previous as the base image. You then switch to the root user, update the package index, and install the rsyslog-gnutls package, adding the TLS support that Better Stack requires. Following that, you modify the /etc/rsyslog.conf file to comment out the line containing imklog, an input module for kernel log messages that is not needed for an Rsyslog instance running in a container. Finally, you specify the command to start the Rsyslog daemon in the foreground with the -n flag.
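You can see the effect of that sed expression on a throwaway file. The two module lines below stand in for a typical rsyslog.conf:

```shell
# Reproduce the Dockerfile's sed edit on a scratch copy of a config file:
# any line mentioning imklog gets a leading "#", disabling the module.
conf=$(mktemp)
printf 'module(load="imklog")\nmodule(load="imtcp")\n' > "$conf"
sed -i '/imklog/s/^/#/' "$conf"
cat "$conf"
# → #module(load="imklog")
#   module(load="imtcp")
```

Only matching lines are touched, so the rest of the configuration is left intact.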
Next, create the rsyslog.conf file:
nano rsyslog.conf
Add the following code to receive logs from Docker containers:
global(DefaultNetstreamDriverCAFile="/etc/ssl/certs/ca-certificates.crt")
# Load modules
module(load="imtcp")
# Configure input for syslog messages over TCP
input(type="imtcp" port="514")
In this code, the first line specifies the location of the Certificate Authority (CA) file for secure connections.
Then, you set up Rsyslog to receive syslog messages over TCP on port 514, which you exposed in the docker-compose.yml file.
With the server set up, you will configure the destination to forward logs. In this guide, you will centralize all logs on Better Stack.
Begin by creating a free Better Stack account. Once registered, proceed to the Sources section in your dashboard:
On the Sources page, click the Connect source button:
Provide a name for your source, such as "Logify logs," and select "Rsyslog" as the platform:
After creating the source, copy the Source Token provided by Better Stack:
Now, return to your rsyslog.conf file. Add the destination to forward the logs. Replace <your_logify_source_token> with your actual source token in the highlighted line:
...
template(name="LogtailFormat" type="list") {
    constant(value="<")
    property(name="pri")
    constant(value=">")
    constant(value="1")
    constant(value=" ")
    property(name="timestamp" dateFormat="rfc3339")
    constant(value=" ")
    property(name="hostname")
    constant(value=" ")
    property(name="app-name")
    constant(value=" ")
    property(name="procid")
    constant(value=" ")
    property(name="msgid")
    constant(value=" ")
    property(name="structured-data" regex.expression="[^-]" regex.nomatchmode="BLANK" regex.submatch="0")
    constant(value="[logtail@11993 source_token=\"<your_logify_source_token>\"]")
    constant(value=" ")
    property(name="msg" droplastlf="on")
}
if $syslogtag contains "docker/logify" then {
    action(
        type="omfwd"
        protocol="tcp"
        target="in.logs.betterstack.com"
        port="6514"
        template="LogtailFormat"
        TCP_Framing="octet-counted"
        StreamDriver="gtls"
        StreamDriverMode="1"
        StreamDriverAuthMode="x509/name"
        StreamDriverPermittedPeers="*.logs.betterstack.com"
        queue.spoolDirectory="/var/spool/rsyslog"
        queue.filename="logtail"
        queue.maxdiskspace="75m"
        queue.type="LinkedList"
        queue.saveonshutdown="on"
    )
}
Here, you define a LogtailFormat template that formats each log in RFC 5424 style and embeds your source token in the structured data. The if statement then ensures that only logs whose tag contains docker/logify are forwarded to Better Stack.
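The TCP_Framing="octet-counted" setting in the action refers to the framing scheme from RFC 6587: each message travels over the TCP connection prefixed with its byte length and a space, so the receiver knows exactly where one message ends and the next begins. A minimal sketch with a made-up RFC 5424-style message:

```shell
# Octet-counted framing: prefix the syslog message with its length in
# bytes and a space before writing it to the TCP stream.
msg='<30>1 2023-10-27T10:02:37Z myhost logify 1 - - Task completed'
printf '%d %s' "${#msg}" "$msg"
```

Without framing like this, newline characters inside a message could be mistaken for message boundaries.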
After making these changes, save and exit the file.
Return to the parent directory:
cd ..
Start the Rsyslog service:
docker compose up -d
If you encounter an error like the following:
Error response from daemon: failed to create task for container: failed to initialize logging driver: dial tcp 127.0.0.1:514: connect: connection refused
it means the logify and nginx containers tried to connect to the Rsyslog container before it was listening on port 514. Wait a few seconds, then rerun the command.
After a few seconds, return to Better Stack to confirm that Rsyslog is forwarding the logs. In the screenshot below, Better Stack is shown receiving the logs from Rsyslog:
With successful forwarding of Bash program logs, it's time to forward Nginx service logs as well.
Begin by creating another source named "Nginx logs" following the same steps used for the first source. Make sure to copy and store the source information in a safe place.
When creating the Nginx source, the interface will appear as shown below:
Next, navigate to the rsyslog directory:
cd rsyslog
Open the rsyslog.conf configuration file with the command:
Add the highlighted code to collect logs from the nginx service and forward them to Better Stack. Ensure you update the source token in the newly added template:
global(DefaultNetstreamDriverCAFile="/etc/ssl/certs/ca-certificates.crt")
# Load modules
module(load="imtcp")
# Configure input for syslog messages over TCP
input(type="imtcp" port="514")
template(name="LogtailFormat" type="list") {
...
}
template(name="LogtailNginxFormat" type="list") {
    constant(value="<")
    property(name="pri")
    constant(value=">")
    constant(value="1")
    constant(value=" ")
    property(name="timestamp" dateFormat="rfc3339")
    constant(value=" ")
    property(name="hostname")
    constant(value=" ")
    property(name="app-name")
    constant(value=" ")
    property(name="procid")
    constant(value=" ")
    property(name="msgid")
    constant(value=" ")
    property(name="structured-data" regex.expression="[^-]" regex.nomatchmode="BLANK" regex.submatch="0")
    constant(value="[logtail@11993 source_token=\"<your_nginx_token>\"]")
    constant(value=" ")
    property(name="msg" droplastlf="on")
}
if $syslogtag contains "docker/logify" then {
    ...
}
if $syslogtag contains "docker/nginx" then {
    action(
        type="omfwd"
        protocol="tcp"
        target="in.logs.betterstack.com"
        port="6514"
        template="LogtailNginxFormat"
        TCP_Framing="octet-counted"
        StreamDriver="gtls"
        StreamDriverMode="1"
        StreamDriverAuthMode="x509/name"
        StreamDriverPermittedPeers="*.logs.betterstack.com"
        queue.spoolDirectory="/var/spool/rsyslog"
        queue.filename="logtail"
        queue.maxdiskspace="75m"
        queue.type="LinkedList"
        queue.saveonshutdown="on"
    )
}
After making these changes, save and exit the file.
Return to the parent directory using the following command:
cd ..
Start the services:
docker compose up -d
Finally, use curl to send five requests to the Nginx service:
curl "http://localhost:80/?[1-5]"
Return to Better Stack in your browser to confirm that the "Nginx logs" source is receiving the logs.
With that, you have centralized the logs from both containers to Better Stack.
Final thoughts
In this comprehensive guide, you explored Rsyslog functionality and versatility in managing logs effectively. You started by understanding the fundamentals of how Rsyslog operates. Building upon this, you used Rsyslog to read logs generated by various programs, manipulate log data by converting it into JSON format, add custom fields, and employ external programs for further processing. You then deployed a Rsyslog Docker container to collect logs from other containers and forwarded them to Better Stack.
With this knowledge, you are now well-equipped to use Rsyslog in your projects. To further enhance your skills, explore official Rsyslog documentation. Additionally, for a broader understanding of Docker logging, refer to our comprehensive guide on Docker logging.
While Rsyslog is a powerful log shipper, there are various other log shippers available. To explore alternatives and make informed decisions based on your specific requirements, refer to our log shippers guide.
Thank you and happy logging!