How to Collect, Process, and Ship Log Data with Vector
In most systems, logs are crucial for maintaining system health and troubleshooting issues. While application-specific log records are valuable, they often fall short when it comes to gaining comprehensive insights. To achieve a deeper understanding, you must gather and analyze logs from various sources, including Docker containers, syslog, databases, and more. This is where a log aggregator comes into play. A log aggregator is a tool designed to collect, transform, and route logs from diverse sources to a central location, enhancing your ability to analyze and troubleshoot effectively. Many log aggregators are available, such as Vector, Fluentd, and Filebeat, to name a few. However, in this article, we will focus on Vector.
Vector is a robust open-source log aggregator developed by Datadog. It empowers you to build observability pipelines by seamlessly fetching logs from many sources, transforming the data as needed, and routing it to your preferred destination. Vector stands out for its lightweight nature, exceptional speed, and memory efficiency, largely owing to its implementation in Rust, a programming language renowned for its performance and memory safety.
Vector offers a rich set of features commonly found in log aggregators, including support for plugins that enable integration with various data sources and destinations, real-time monitoring, and robust security features. Additionally, Vector can be configured for high availability, ensuring it can handle substantial volumes of logs without compromising performance.
This comprehensive guide will explore how to leverage Vector to collect, forward, and manage logs effectively. We'll start by building a sample application that writes logs to a file. Next, we'll walk you through using Vector to read and direct the logs to the console. Finally, we'll delve into log transformation, centralization, and monitoring to ensure the health and reliability of your Vector-based log management setup.
Prerequisites
To complete this tutorial, you will need a system with a non-root user that has sudo privileges. Optionally, you can install Docker and Docker Compose on your system. If you're unfamiliar with log shippers, you can read this article to learn more about their advantages.
Once you've met these requirements, create a root project directory to house your application, configurations, and Dockerfiles:
mkdir log-processing-stack
This directory will serve as the foundation for your project as you progress through the tutorial.
Afterward, move into the directory:
cd log-processing-stack
Next, create a directory dedicated to your demo application. Then move into the newly created directory:
mkdir logify && cd logify
Developing a demo logging application
In this section, you will create a sample Bash script that generates logs at regular intervals.
In the logify
directory, create a new file named logify.sh
with the text editor of your choice:
nano logify.sh
In your logify.sh
file, add the following code:
#!/bin/bash
filepath="/var/log/logify/app.log"

create_log_entry() {
    local info_messages=("Connected to database" "Task completed successfully" "Operation finished" "Initialized application")
    local random_message=${info_messages[$RANDOM % ${#info_messages[@]}]}
    local http_status_code=200
    local ip_address="127.0.0.1"
    local emailAddress="user@mail.com"
    local level=30
    local pid=$$
    local ssn="407-01-2433"
    local time=$(date +%s)
    local log='{"status": '$http_status_code', "ip": "'$ip_address'", "level": '$level', "emailAddress": "'$emailAddress'", "msg": "'$random_message'", "pid": '$pid', "ssn": "'$ssn'", "time": '$time'}'
    echo "$log"
}

while true; do
    log_record=$(create_log_entry)
    echo "${log_record}" >> "${filepath}"
    sleep 3
done
The create_log_entry()
function creates a log entry in the JSON format, which includes fields such as the HTTP status code, IP address, a random log message, process ID, social security number, and a timestamp. The script then enters an infinite loop, repeatedly calling this function to generate the log entries and appending them to the specified log file in the /var/log/logify
directory.
Note that while this example includes personal information, such as email addresses, social security numbers, and IP addresses, it is primarily intended for demonstration purposes. Vector can filter out sensitive data by either removing personal information fields or redacting them, which is crucial for maintaining data privacy and security. You'll learn how to implement it later in the tutorial.
Once you are finished, save the changes you've made to the file. Run the following command to make the script executable:
chmod +x logify.sh
Next, create the /var/log/logify
directory where the application will store the logs:
sudo mkdir /var/log/logify
Change the directory ownership to the user specified in the $USER environment variable, which contains the currently logged-in user:
sudo chown -R $USER:$USER /var/log/logify/
Now, execute the script in the background by adding &
at the end:
./logify.sh &
The bash
job control system yields output that includes the process ID:
[1] 2933
The process ID, which is 2933
in this case, will be used to terminate the script later.
Next, view the contents of the log file using the tail
command:
tail -n 4 /var/log/logify/app.log
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Task completed successfully", "pid": 12655, "ssn": "407-01-2433", "time": 1694551048}
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Connected to database", "pid": 12655, "ssn": "407-01-2433", "time": 1694551051}
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Initialized application", "pid": 12665, "ssn": "407-01-2433", "time": 1694551072}
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Initialized application", "pid": 12665, "ssn": "407-01-2433", "time": 1694551075}
Installing Vector
Now that you can generate logs, you will install the latest version of Vector. In this article, we will install Vector on Ubuntu 22.04 through the apt
package manager. If you're using a different system, you can select the appropriate option based on your operating system on the documentation page.
To add the Vector repository, use the following command:
bash -c "$(curl -L https://setup.vector.dev)"
Install Vector with the following command:
sudo apt install vector
Next, confirm that the installation was successful:
vector --version
vector 0.32.1 (x86_64-unknown-linux-gnu 9965884 2023-08-21 14:52:38.330227446)
When you install Vector, it automatically launches in the background as a systemd service. However, in this tutorial, we will run Vector manually, so the service doesn't need to be running; running Vector manually while the background service is active can lead to conflicts.
To stop the Vector service, use the following command:
sudo systemctl stop vector
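If you also want to prevent the service from starting again on the next boot, you can optionally disable it:
sudo systemctl disable vector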
How Vector works
With Vector now installed, let's explore how it works.
To understand Vector, imagine it as a pipeline. At one end, Vector ingests raw logs and standardizes them into a unified log event format. As the log event travels through Vector, it can undergo various manipulations using "transforms" to manipulate and enhance its content. Finally, at the end of the pipeline, the log event can be sent to multiple destinations for storage or analysis.
You can define the data sources, transforms, and destinations in a configuration file at /etc/vector/vector.yaml
. This configuration file is organized into the following components:
sources:
<unique_source_name>:
# source configuration properties go here
transforms:
<unique_transform_name>:
# transform configuration properties go here
sinks:
<unique_destination_name>:
# sink configuration properties go here
This structure allows you to configure and customize Vector to suit your specific log aggregation and processing needs.
Let's analyze the components:
- sources: defines the data sources that Vector should read from.
- transforms: specifies how the data should be manipulated or transformed.
- sinks: defines the destinations where Vector should route the data.
Each component requires you to specify a plugin. For sources, the following are some of the inputs you can use:
- File: fetch logs from files.
- Docker Logs: gather logs from Docker containers.
- Socket: collect logs sent via the socket client.
- Syslog: collect logs via the Syslog protocol.
To process the data, here are some of the transforms that can come in handy:
- Remap with VRL: an expression-oriented language designed to transform your data.
- Lua: use the Lua programming language to transform log events.
- Filter: filter events according to the specified conditions.
- Throttle: rate limit log streams.
Finally, let's look at some of the sinks available for Vector:
- HTTP: forward logs to an HTTP endpoint.
- WebSocket: deliver observability data to a WebSocket endpoint.
- Loki: forward logs to Grafana Loki.
- Elasticsearch: deliver logs to Elasticsearch.
In the next section, you will use the file source to read logs from a file and forward the records to the console using the console sink.
Getting started with Vector
Now that you know how Vector works, you will configure it to read log records from the /var/log/logify/app.log
file and redirect them to the console.
Open the /etc/vector/vector.yaml
file and ensure you have the necessary superuser privileges:
sudo nano /etc/vector/vector.yaml
Remove all the existing contents and add the following lines:
sources:
app_logs:
type: "file"
include:
- "/var/log/logify/app.log"
sinks:
print:
type: "console"
inputs:
- "app_logs"
encoding:
codec: "json"
In the sources
component, you define an app_logs
source that reads logs from a file. The type
option specifies the file
source, and you define the include
option, which contains the path to the file that should be read.
In the sinks
component, you define a print
sink, which specifies the destination to send the logs. To redirect them to the console, you set the type
to the console
sink. Next, you specify the source component from which the logs will originate, which is the app_logs
source in this case. Finally, you specify that logs should be in JSON format using encoding.codec
.
Once you have made these configurations, save the file and validate your changes in the terminal:
sudo vector validate /etc/vector/vector.yaml
√ Loaded ["/etc/vector/vector.yaml"]
√ Component configuration
√ Health check "print"
------------------------------------
Validated
Now you can run Vector:
sudo vector
Upon starting, it will pick up the configuration file automatically.
If you defined vector.yaml
in a different location, you need to pass the full path to the configuration file:
sudo vector --config </path/to/vector.yaml>
When Vector starts, you will see output confirming that it has started:
2023-09-12T05:56:41.803796Z INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=info,rdkafka=info,buffers=info,lapin=info,kube=info"
2023-09-12T05:56:41.804202Z WARN vector::app: DEPRECATED The openssl legacy provider provides algorithms and key sizes no longer recommended for use. Set `--openssl-legacy-provider=false` or `VECTOR_OPENSSL_LEGACY_PROVIDER=false` to disable. See https://vector.dev/highlights/2023-08-15-0-32-0-upgrade-guide/#legacy-openssl for details.
2023-09-12T05:56:41.805079Z INFO vector::app: Loaded openssl provider. provider="legacy"
2023-09-12T05:56:41.805287Z INFO vector::app: Loaded openssl provider. provider="default"
2023-09-12T05:56:41.806105Z INFO vector::app: Loading configs. paths=["/etc/vector/vector.yaml"]
2023-09-12T05:56:41.809530Z INFO vector::topology::running: Running healthchecks.
2023-09-12T05:56:41.810125Z INFO vector: Vector has started. debug="false" version="0.32.1" arch="x86_64" revision="9965884 2023-08-21 14:52:38.330227446"
2023-09-12T05:56:41.810335Z INFO vector::app: API is disabled, enable by setting `api.enabled` to `true` and use commands like `vector top`.
...
After a few seconds, you will start seeing log messages in JSON format appear at the end:
{"file":"/var/log/logify/app.log","host":"vector-test","message":"{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Task completed successfully\", \"pid\": 12655, \"ssn\": \"407-01-2433\", \"time\": 1694551048}","source_type":"file","timestamp":"2023-09-12T20:40:21.582883690Z"}
{"file":"/var/log/logify/app.log","host":"vector-test","message":"{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Connected to database\", \"pid\": 12655, \"ssn\": \"407-01-2433\", \"time\": 1694551051}","source_type":"file","timestamp":"2023-09-12T20:40:21.582980072Z"}
...
The output confirms that Vector can successfully read the log files and route the logs to the console. Vector has automatically added several fields such as file
, host
, message
, source_type
, and timestamp
to each log entry for further context.
You can now press CTRL + C
to exit Vector.
Transforming the logs
It's uncommon to send logs without processing them in some way. Often, you may need to enrich them with important fields, redact sensitive data, or transform plain text logs into a structured format like JSON, which is easier for machines to parse.
Vector offers a powerful language for data manipulation called Vector Remap Language (VRL). VRL is a high-performance, expression-oriented language designed for transforming data. It provides functions for parsing data, converting data types, and even includes conditional statements, among other capabilities.
In this section, you will use VRL to process data in the following ways:
- Parsing JSON logs.
- Removing fields.
- Adding new fields.
- Converting timestamps.
- Redacting sensitive data.
Vector Remap Language (VRL) dot operator
Before we dive into transforming logs with VRL, let's cover some fundamentals that will help you understand how to use it efficiently.
To get familiar with the syntax, Vector provides a vector vrl
subcommand, which starts a Read-Eval-Print Loop (REPL). To use it, you need to provide it with the --input
option, which accepts a JSON file with log events.
First, make sure you are in the log-processing-stack/logify
directory and create an input.json
file:
nano input.json
In your input.json
file, add the following log event from the output in the last section:
{"file":"/var/log/logify/app.log","host":"vector-test","message":"{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Task completed successfully\", \"pid\": 12655, \"ssn\": \"407-01-2433\", \"time\": 1694551048}","source_type":"file","timestamp":"2023-09-12T20:40:21.582883690Z"}
Make sure there are no trailing spaces at the end to avoid errors.
Then, start the REPL:
vector vrl --input input.json
Type a single dot into the REPL prompt:
.
When Vector reads the log event in the input.json
file, the dot operator will return the following:
{ "file": "/var/log/logify/app.log", "host": "vector-test", "message": "{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Task completed successfully\", \"pid\": 12655, \"ssn\": \"407-01-2433\", \"time\": 1694551048}", "source_type": "file", "timestamp": "2023-09-12T20:40:21.582883690Z" }
The .
references the incoming event, and every event that Vector processes can be accessed using the dot notation.
To access a property, you prefix it with a .
like so:
.host
"vector-host"
You can also reassign .
to the value of one of its properties:
. = .host
Now, type the .
again:
.
It will no longer refer to the original object but to the "host" property:
"vector-host"
Now you can exit the REPL by typing exit
:
exit
Now that you are familiar with the dot operator, you will explore VRL in more detail in the upcoming sections, starting with parsing JSON logs.
Parsing JSON logs using Vector
To begin, if you look closely at the message
property in the output, you will notice that even though the log entry was originally in JSON format, Vector has converted it into a string:
{
"file": "/var/log/logify/app.log",
"host": "vector-test",
"message": "{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Task completed successfully\", \"pid\": 12655, \"ssn\": \"407-01-2433\", \"time\": 1694551048}",
"source_type": "file",
"timestamp": "2023-09-12T20:40:21.582883690Z"
}
However, our goal is to have Vector parse the JSON logs. To achieve this, open the configuration file again:
sudo nano /etc/vector/vector.yaml
Next, define a transform and set it to use the remap
transform:
...
transforms:
app_logs_parser:
inputs:
- "app_logs"
type: "remap"
source: |
# Parse JSON logs
., err = parse_json(.message)
sinks:
print:
type: "console"
inputs:
- "app_logs_parser"
encoding:
codec: "json"
You define a transform named app_logs_parser
to process the logs. You specify that the input for this component should come from the source reading the records, which is app_logs
here. Next, you configure the component to use the remap
transform, which enables you to use the Vector Remap Language (VRL).
The source
option contains the VRL program ., err = parse_json(.message)
written as a YAML block scalar (the | character), which lets you include one or more lines of VRL code.
As explored in the previous section, the .
refers to the entire object Vector processes. To select a specific attribute within the object, you use a field name and prefix it with a dot. With that, here is how Vector executes ., err = parse_json(.message)
:
- .message: returns the entire string within the message field.
- parse_json(.message): parses the JSON string.
- ., err: if parsing JSON is successful, the . is set to the result of calling the parse_json() method; otherwise, the err variable is set.
Finally, in the sinks.print
component, you update the inputs
to specify that the logs now come from the transforms.app_logs_parser
component.
Save the changes you have made. Without exiting the configuration file, switch to another terminal and start Vector in watch mode:
sudo vector --watch-config
The --watch-config
option automatically restarts Vector when you save changes in the configuration file. Moving forward, you won't need to stop Vector manually; you can make configuration adjustments in another terminal, streamlining the process.
When Vector runs, you will be able to observe that the log messages are being parsed successfully:
{"emailAddress":"user@mail.com","ip":"127.0.0.1","level":30,"msg":"Initialized application","pid":13611,"ssn":"407-01-2433","status":200,"time":1694551588}
...
In the output, the object has now been parsed, and we no longer see the additional fields that were added by Vector; only the logs remain. If the fields Vector added are helpful, you can replace ., err = parse_json(.message) with .message, err = parse_json(.message) to keep them. However, for brevity in the output, we will keep them removed for the rest of this tutorial.
So far, we've explored how to parse JSON logs. However, Vector also comes with parser functions for various other formats, including:
- parse_csv: useful for parsing CSV log data.
- parse_logfmt: helpful for parsing structured logs in the Logfmt format.
- parse_syslog: suitable for parsing Syslog messages.
- parse_grok: useful for parsing unstructured log data.
These parsers provide flexibility for handling various log formats and structures.
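For instance, here is parse_logfmt applied to a made-up Logfmt line in the vector vrl REPL (a quick sketch; Logfmt values are parsed as strings):
parse_logfmt!("level=info msg=\"Connected to database\" status=200")
{ "level": "info", "msg": "Connected to database", "status": "200" }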
When working with parser functions, it's recommended practice to address potential runtime errors. For additional details, you can refer to the runtime errors documentation on Vector's website.
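For example, rather than assigning the parsed result to . unconditionally, you can branch on the error; this mirrors the pattern shown in Vector's error-handling guide (a sketch):
structured, err = parse_json(.message)
if err != null {
    # Parsing failed: log the error through Vector and keep the raw event
    log("Unable to parse JSON: " + err, level: "error")
} else {
    # Parsing succeeded: replace the event with the structured object
    . = structured
}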
Adding and removing fields with Vector
Now that you can parse JSON logs, you will remove sensitive details, such as the emailAddress
. After that, you will add a new environment
field to indicate whether the logs are from production or development.
Return to the terminal where you have the /etc/vector/vector.yaml
file open. Then, update the source configuration with the following lines:
transforms:
app_logs_parser:
...
source: |
# Parse JSON logs
., err = parse_json(.message)
# Remove emailAddress field
del(.emailAddress)
# Add an environment field
.environment = "dev"
In the above snippet, the del()
function removes the emailAddress
field. Afterward, a new environment
field is added to the JSON object with the value dev
. If the field already exists, its value will be overwritten.
After making the changes to the Vector configuration file, Vector should automatically restart. If it doesn't, you can manually restart Vector. When you do, you will see output similar to the following:
{"environment":"dev","ip":"127.0.0.1","level":30,"msg":"Connected to database","pid":13647,"ssn":"407-01-2433","status":200,"time":1694551695}
...
As you can see in the output, the emailAddress
field has been deleted, and a new environment
field has been added to the object.
The del()
function is one of the path functions that Vector provides. Other helpful functions are listed below:
- exists: helpful when you want to check if a field or an array element exists.
- remove: useful when you want to remove a field whose path you don't know.
- set: helpful when you want to dynamically insert a value into an object or array.
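For instance, exists() can guard a deletion when a field may be absent from some events (a minimal sketch):
# Delete the field only when the event actually contains it
if exists(.emailAddress) {
    del(.emailAddress)
}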
That takes care of modifying attributes on a log event. In the next section, you will format dates using Vector.
Formatting dates with Vector
The application produces logs with Unix timestamps, representing the number of seconds elapsed since January 1, 1970, at 00:00:00 UTC. To make the timestamps human-readable, you need to convert them into a standard date format.
In the configuration file, add the following lines:
transforms:
app_logs_parser:
...
source: |
# Parse JSON logs
., err = parse_json(.message)
# Remove emailAddress field
del(.emailAddress)
# Add an environment field
.environment = "dev"
# Format date to the ISO format
.time = from_unix_timestamp!(.time)
.time = format_timestamp!(.time, format: "%+")
The from_unix_timestamp!()
function converts a Unix timestamp to a VRL timestamp. Its return value overwrites the time field, which is then overwritten again with the value from the format_timestamp!()
function. That function formats the timestamp as an ISO 8601 date according to the %+
format directive.
You may notice that the functions end with !
. This signifies that the functions are fallible, meaning they can fail and require error handling.
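You can try the same two-step conversion on the first sample timestamp in the vector vrl REPL (a sketch; in the REPL, field types are dynamic, so the fallible ! variants apply):
.time = 1694551048
.time = from_unix_timestamp!(.time)
format_timestamp!(.time, format: "%+")
"2023-09-12T20:37:28+00:00"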
After saving the configuration file, Vector will reload, and you'll see output similar to the following:
{"environment":"dev","ip":"127.0.0.1","level":30,"msg":"Connected to database","pid":13691,"ssn":"407-01-2433","status":200,"time":"2023-09-12T20:49:43+00:00"}
...
The date is now in a human-readable ISO format.
The from_unix_timestamp!()
function used in this section is one of the conversion functions that Vector provides. The following functions can also be helpful when converting data of various types:
- to_unix_timestamp: converts a timestamp into a Unix timestamp.
- to_syslog_facility: helpful when converting a value into a Syslog facility code.
- to_syslog_level: coerces a value into a Syslog severity level.
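For example, to_unix_timestamp reverses the conversion from the previous REPL sketch (t'...' is VRL's timestamp literal syntax):
to_unix_timestamp(t'2023-09-12T20:37:28Z')
1694551048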
To better understand date formatting in logs, see our comprehensive log formatting guide.
Working with conditional statements
VRL also provides conditional statements, which let a program make decisions based on conditions. They work similarly to conditionals in other programming languages like JavaScript. In this section, you will use a conditional statement to check if the status
equals 200
and add a success
field if the condition evaluates to true
.
To accomplish this, add the following conditional statement:
transforms:
app_logs_parser:
...
source: |
# Parse JSON logs
., err = parse_json(.message)
# Remove emailAddress field
del(.emailAddress)
# Add an environment field
.environment = "dev"
# Format date to the ISO format
.time = from_unix_timestamp!(.time)
.time = format_timestamp!(.time, format: "%+")
if .status == 200 {
.success = true
}
The if
statement checks if the status
equals 200
and adds a new success
field.
After saving, Vector will reload, and you will see output that looks like this:
{"environment":"dev","ip":"127.0.0.1","level":30,"msg":"Connected to database","pid":13722,"ssn":"407-01-2433","status":200,"success":true,"time":"2023-09-12T20:50:55+00:00"}
The success
field has been added successfully to the object.
When working with conditional statements, it helps to be familiar with VRL's type functions:
- is_json: useful when you want to check if a value is valid JSON.
- is_boolean: handy when you want to check if a value is a boolean.
- is_string: helpful when you want to check if a given value is a string.
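As a brief sketch, a type check can make the earlier conditional more defensive, since parsed fields aren't guaranteed to be integers:
# Only mark success when status is an integer equal to 200
if is_integer(.status) && .status == 200 {
    .success = true
}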
Redacting sensitive data
The log message still contains sensitive fields, like IP addresses and social security numbers. Private user information shouldn't be logged to avoid it falling into the wrong hands. Therefore, redacting sensitive data is a good practice, especially when you can't remove a field entirely. To accomplish this, Vector provides the redact()
function, which can redact any data.
In the configuration, add the following code to redact the IP address and social security number:
transforms:
app_logs_parser:
...
source: |
# Parse JSON logs
., err = parse_json(.message)
# Remove emailAddress field
del(.emailAddress)
# Add an environment field
.environment = "dev"
# Format date to the ISO format
.time = from_unix_timestamp!(.time)
.time = format_timestamp!(.time, format: "%+")
if .status == 200 {
.success = true
}
# Redact field values
. = redact(., filters: ["us_social_security_number", r'^((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}$'])
The redact()
method takes the entire object and applies the filters. Filters can be either regular expressions (regex) or built-in filters. Currently, Vector only has one built-in filter that can redact social security numbers, us_social_security_number
. For other sensitive information, you need to use a regex. In this example, the regex filter matches IPv4 addresses and redacts them.
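Optionally, you can preview the redaction in the vector vrl REPL first; redact() also accepts plain strings (a quick sketch):
redact("ssn: 407-01-2433", filters: ["us_social_security_number"])
"ssn: [REDACTED]"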
Save the changes, and Vector will yield output that looks like this:
{"environment":"dev","ip":"[REDACTED]","level":30,"msg":"Connected to database","pid":13759,"ssn":"[REDACTED]","status":200,"success":true,"time":"2023-09-12T20:52:13+00:00"}
...
You can now stop Vector and close the configuration file. To stop the logify.sh
script, type the following command to obtain the process ID:
jobs -l | grep "logify"
[1]+ 2933 Running ./logify.sh &
Terminate the program with its process ID:
kill -9 2933
Now that you can transform logs, you will use Vector to collect records from multiple sources and forward them to a central location.
Collecting logs from Docker containers and centralizing logs
In this section, you will containerize the Bash script and use an Nginx "hello world" Docker image preconfigured to produce Nginx logs in JSON format every time it receives a request. Then, you will use Vector to collect logs from both containers and centralize them in Better Stack for analysis and monitoring.
Dockerizing the Bash script
In this section, you will create a Dockerfile to containerize the Bash script you wrote earlier.
Make sure you are in the log-processing-stack/logify
directory. Next, create a Dockerfile
, which specifies how the container image should be built and what it runs:
nano Dockerfile
In your Dockerfile
, add the instructions:
FROM ubuntu:latest
COPY . .
RUN chmod +x logify.sh
RUN mkdir -p /var/log/logify
RUN ln -sf /dev/stdout /var/log/logify/app.log
CMD ["./logify.sh"]
In the Dockerfile, you specify the latest Ubuntu image, copy the contents of the local directory into the container, make the script executable, and then create a dedicated directory to store the application logs. To ensure the logs are accessible, you redirect them to the standard output (stdout) using a symbolic link. Lastly, you specify the command to execute the script when the container starts.
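If you'd like to confirm the image builds before wiring it into Docker Compose, you can optionally build it now from the logify directory:
docker build -t logify:latest .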
You will now write a docker-compose.yml
file to define the Bash script and Nginx services.
First, change the directory into the root project directory:
cd ..
Create a docker-compose.yml
in your text editor:
nano docker-compose.yml
Then add the following Docker Compose instructions:
version: '3'
services:
logify-script:
build:
context: ./logify
image: logify:latest
container_name: logify
nginx:
image: betterstackcommunity/nginx-helloworld:latest
logging:
driver: json-file
container_name: nginx
ports:
- '80:80'
In the configuration file, you define a logify-script
service that will build an image with the name logify:latest
based on the Dockerfile in the ./logify
directory. You then define an nginx
service that listens on port 80 for incoming HTTP requests. If another service is already using port 80, terminate it before continuing.
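One way to check whether a process is already listening on port 80 is with lsof, assuming it is installed:
sudo lsof -i :80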
To build the images and create the services, run the following command in the same directory as your docker-compose.yml
file:
docker compose up -d
The -d
flag runs the containers in the background (detached mode).
You can check the status of the containers with this command:
docker compose ps
NAME COMMAND SERVICE STATUS PORTS
logify "./logify.sh" logify-script running
nginx "/runner.sh nginx" nginx running 0.0.0.0:80->80/tcp, :::80->80/tcp
Send five requests to the nginx
service using the curl
command:
curl http://localhost:80/?[1-5]
Following that, check the logs of the containers in your Docker Compose setup:
docker compose logs
logify | {"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Task completed successfully", "pid": 1, "ssn": "407-01-2433", "time": 1695545456}
...
logify | {"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Operation finished", "pid": 1, "ssn": "407-01-2433", "time": 1695545462}
nginx | {"timestamp":"2023-09-12T07:10:04+00:00","pid":"8","remote_addr":"172.19.0.1","remote_user":"","request":"GET /?1 HTTP/1.1","status": "200","body_bytes_sent":"11109","request_time":"0.000","http_referrer":"","http_user_agent":"curl/7.81.0","time_taken_ms":"1694502604.901"}
...
nginx | {"timestamp":"2023-09-12T07:10:04+00:00","pid":"8","remote_addr":"172.19.0.1","remote_user":"","request":"GET /?2 HTTP/1.1","status": "200","body_bytes_sent":"11109","request_time":"0.000","http_referrer":"","http_user_agent":"curl/7.81.0","time_taken_ms":"1694502604.909"}
The output displays all the logs from the nginx
and logify
containers.
With your containers running and producing logs, the next step is to set up a Vector container to read and centralize these logs.
Defining the Vector service with Docker Compose
In this section, you will define the Vector service in your Docker Compose setup to collect logs from the existing containers and centralize them in Better Stack. You will also create a Vector configuration file that specifies how the log records should be collected and processed.
In the root directory, open the docker-compose.yml
file:
nano docker-compose.yml
Then add the following code in the docker-compose.yml
file:
version: '3'
services:
...
vector:
image: timberio/vector:0.32.1-debian
volumes:
- ./vector:/etc/vector
- /var/run/docker.sock:/var/run/docker.sock
command: ["-c", "/etc/vector/vector.yaml"]
ports:
- '8686:8686'
container_name: vector
depends_on:
- logify-script
- nginx
The vector
service definition uses the official timberio/vector Docker image. It mounts the local vector
directory containing the Vector configuration file into the container, along with the Docker socket so that Vector can read container logs from the Docker daemon.
Next, create the vector
directory and move into it:
mkdir vector && cd vector
Afterward, execute the following command to obtain the Docker image names:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
fc30e4a4599f betterstackcommunity/nginx-helloworld:latest "/runner.sh nginx" 4 minutes ago Up 4 minutes 0.0.0.0:80->80/tcp, :::80->80/tcp nginx
7bf40ea91435 logify:latest "./logify.sh" 4 minutes ago Up 4 minutes logify
Create a vector.yaml
configuration file:
nano vector.yaml
Add the code below, making sure the image names in the include_images
option match the ones you noted earlier:
sources:
bash_logs:
type: "docker_logs"
include_images:
- "logify:latest"
nginx_logs:
type: "docker_logs"
include_images:
- "betterstackcommunity/nginx-helloworld:latest"
vector_logs:
type: "internal_logs"
The sources.bash_logs
component uses the docker_logs
source to read logs from a Docker container. The include_images
option tells Vector to collect logs from containers built from the logify:latest
image.
The sources.nginx_logs
component also reads logs from the Nginx Docker container built from the betterstackcommunity/nginx-helloworld:latest
image.
Following this, the sources.vector_logs
component uses the internal_logs
source, which exposes Vector's own logs so that they can be read and forwarded to destinations.
Next, you will define a destination to forward the logs to. We will use Better Stack to centralize the records so you can monitor and analyze them in one place.
Before forwarding the logs, create a free Better Stack account. Once you have logged in, click the Sources link:
Once on the Sources page in Better Stack, click the Connect source button:
Following that, enter a source name of your choice and select "Vector" as the platform:
Upon the creation of the source, copy the Source token field to the clipboard:
Next, return to the vector.yaml
file and add a sink to redirect the logs to Better Stack:
...
sinks:
better_stack_http_sink_bash:
type: "http"
method: "post"
inputs:
- "bash_logs"
uri: "https://in.logs.betterstack.com/"
encoding:
codec: "json"
auth:
strategy: "bearer"
token: "<your_bash_source_token>"
Save and exit the configuration file.
Go back to the root directory:
cd ..
Enter the following command to create the Vector image and start the container:
docker compose up -d
After waiting for a few seconds, return to Better Stack to check if the logs have been successfully sent:
Now that the Bash script logs are centralized, you can follow similar steps to create two more sources for Nginx and Vector logs. Make sure to keep the source tokens in a safe place. Once successfully completed, your interface will look like this:
Following that, open the vector.yaml
file again:
nano vector/vector.yaml
Add two sinks and update the tokens accordingly to ensure logs from Nginx and Vector are correctly sent to Better Stack:
...
sinks:
better_stack_http_sink_bash:
...
better_stack_http_sink_nginx:
type: "http"
method: "post"
inputs:
- "nginx_logs"
uri: "https://in.logs.betterstack.com/"
encoding:
codec: "json"
auth:
strategy: "bearer"
token: "<your_nginx_source_token>"
better_stack_http_sink_vector:
type: "http"
method: "post"
inputs:
- "vector_logs"
uri: "https://in.logs.betterstack.com/"
encoding:
codec: "json"
auth:
strategy: "bearer"
token: "<your_vector_source_token>"
Save the file and run the command once more:
docker compose up -d
Now send five requests to the nginx
service again:
curl http://localhost:80/?[1-5]
The logs from Nginx will be successfully uploaded to Better Stack:
To see if Vector logs are being uploaded, stop all the containers:
docker compose stop
Start the containers again:
docker compose up -d
The Vector logs will be uploaded to Better Stack:
With that, you have centralized your application, Nginx, and Vector logs.
Monitoring Vector health with Better Stack
Vector provides a /health
endpoint that tools like Better Stack can periodically check. If Vector becomes unhealthy or goes down, you can configure Better Stack to send alerts through phone or email, enabling you to address any issues promptly.
To set up health monitoring for Vector, open the vector.yaml
file:
nano vector/vector.yaml
Then add the following code at the top of the configuration file:
api:
enabled: true
address: "0.0.0.0:8686"
...
This configuration enables the API and makes the /health
endpoint accessible on port 8686.
For these changes to take effect, stop and discard the containers:
docker compose down
Start the containers again:
docker compose up -d
Verify that the /health
endpoint works:
curl http://localhost:8686/health
{"ok":true}
Assuming you have a free Better Stack account, log in to Better Stack.
On the Monitors page, click the Create monitor button:
Next, enter the relevant details and then click the Create monitor button:
In the screenshot, you choose what should trigger the alert, provide the server's IP address or domain name along with the /health endpoint on port 8686, and select how you want to be notified.
At this point, Better Stack will start monitoring the endpoint and provide performance statistics:
Let's see what will happen if the endpoint stops working. To do that, stop the services:
docker compose stop
After a minute or two passes, you will see that Better Stack will update the status to "Down":
If you chose to be notified by email, you will receive an email alert:
Final thoughts
In this comprehensive article, we delved deep into Vector and set up a log processing stack using Vector, Docker, Nginx, and Better Stack. We covered various topics, from creating Vector configurations and dockerizing your Bash script and Nginx to centralizing logs with Better Stack.
With the knowledge gained, you are now well-prepared to manage logs efficiently, whether for troubleshooting, enhancing performance, or ensuring compliance with your applications and services.
To further expand your knowledge with Vector, consult the documentation. For more insights into Docker and Docker Compose, refer to their respective documentation pages: Docker and Docker Compose.
Thanks for reading, and happy logging!