How to Collect, Process, and Ship Log Data with Vector
In most systems, logs are crucial for maintaining system health and troubleshooting issues. While application-specific log records are valuable, they often fall short when it comes to gaining comprehensive insights. To achieve a deeper understanding, you must gather and analyze logs from various sources, including Docker containers, syslog, databases, and more. This is where a log aggregator comes into play. A log aggregator is a tool designed to collect, transform, and route logs from diverse sources to a central location, enhancing your ability to analyze and troubleshoot effectively. Many log aggregators are available, such as Vector, Fluentd, and Filebeat, to name a few. However, in this article, we will focus on Vector.
Vector is a robust open-source log aggregator developed by Datadog. It empowers you to build observability pipelines by seamlessly fetching logs from many sources, transforming the data as needed, and routing it to your preferred destination. Vector stands out for its lightweight nature, exceptional speed, and memory efficiency, largely owing to its implementation in Rust, a programming language renowned for its performance and memory safety.
Vector offers a rich set of features commonly found in log aggregators, including support for plugins that enable integration with various data sources and destinations, real-time monitoring, and robust security features. Additionally, Vector can be configured for high availability, ensuring it can handle substantial volumes of logs without compromising performance.
This comprehensive guide will explore how to leverage Vector to collect, forward, and manage logs effectively. We'll start by building a sample application that writes logs to a file. Next, we'll walk you through using Vector to read and direct the logs to the console. Finally, we'll delve into log transformation, centralization, and monitoring to ensure the health and reliability of your Vector-based log management setup.
Prerequisites
To complete this tutorial, you will need a system with a non-root user that has sudo privileges. Optionally, you can install Docker and Docker Compose on your system. If you're unfamiliar with log shippers, you can read this article to learn more about their advantages.
Once you've met these requirements, create a root project directory to house your application, configurations, and Dockerfiles:
mkdir log-processing-stack
This directory will serve as the foundation for your project as you progress through the tutorial.
Afterward, move into the directory:
cd log-processing-stack
Next, create a directory dedicated to your demo application. Then move into the newly created directory:
mkdir logify && cd logify
Developing a demo logging application
In this section, you will create a sample Bash script that generates logs at regular intervals.
In the logify
directory, create a new file named logify.sh
with the text editor of your choice:
nano logify.sh
In your logify.sh
file, add the following code:
#!/bin/bash
filepath="/var/log/logify/app.log"

create_log_entry() {
    local info_messages=("Connected to database" "Task completed successfully" "Operation finished" "Initialized application")
    local random_message=${info_messages[$RANDOM % ${#info_messages[@]}]}
    local http_status_code=200
    local ip_address="127.0.0.1"
    local emailAddress="user@mail.com"
    local level=30
    local pid=$$
    local ssn="407-01-2433"
    local time=$(date +%s)
    local log='{"status": '$http_status_code', "ip": "'$ip_address'", "level": '$level', "emailAddress": "'$emailAddress'", "msg": "'$random_message'", "pid": '$pid', "ssn": "'$ssn'", "time": '$time'}'
    echo "$log"
}

while true; do
    log_record=$(create_log_entry)
    echo "${log_record}" >> "${filepath}"
    sleep 3
done
The create_log_entry()
function creates a log entry in the JSON format, which includes fields such as the HTTP status code, IP address, a random log message, process ID, social security number, and a timestamp. The script then enters an infinite loop, repeatedly calling this function to generate the log entries and appending them to the specified log file in the /var/log/logify
directory.
Note that while this example includes personal information, such as email addresses, social security numbers, and IP addresses, it is primarily intended for demonstration purposes. Vector can filter out sensitive data by either removing personal information fields or redacting them, which is crucial for maintaining data privacy and security. You'll learn how to implement it later in the tutorial.
Once you are finished, save the changes you've made to the file. Run the following command to make the script executable:
chmod +x logify.sh
Next, create the /var/log/logify
directory where the application will store the logs:
sudo mkdir /var/log/logify
Change the directory ownership to the user specified in the $USER environment variable, which contains the currently logged-in user:
sudo chown -R $USER:$USER /var/log/logify/
Now, execute the script in the background by adding &
at the end:
./logify.sh &
The bash
job control system yields output that includes the process ID:
[1] 2933
The process ID, which is 2933
in this case, will be used to terminate the script later.
Next, view the contents of the log file using the tail
command:
tail -n 4 /var/log/logify/app.log
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Task completed successfully", "pid": 12655, "ssn": "407-01-2433", "time": 1694551048}
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Connected to database", "pid": 12655, "ssn": "407-01-2433", "time": 1694551051}
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Initialized application", "pid": 12665, "ssn": "407-01-2433", "time": 1694551072}
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Initialized application", "pid": 12665, "ssn": "407-01-2433", "time": 1694551075}
Installing Vector
Now that you can generate logs, you will install the latest version of Vector. In this article, we will install Vector on Ubuntu 22.04 through the apt
package manager. If you're using a different system, you can select the appropriate option based on your operating system on the documentation page.
To add the Vector repository, use the following command:
bash -c "$(curl -L https://setup.vector.dev)"
Install Vector with the following command:
sudo apt install vector
Next, confirm that the installation was successful:
vector --version
vector 0.32.1 (x86_64-unknown-linux-gnu 9965884 2023-08-21 14:52:38.330227446)
When you install Vector, it automatically launches in the background as a systemd service. However, in this tutorial, we will run Vector manually, so the service doesn't need to be running; running Vector manually while the background service is active can lead to conflicts.
To stop the Vector service, use the following command:
sudo systemctl stop vector
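If you also want to prevent the service from starting again on the next boot, you can optionally disable it:
sudo systemctl disable vector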
How Vector works
With Vector now installed, let's explore how it works.
To understand Vector, imagine it as a pipeline. At one end, Vector ingests raw logs and standardizes them into a unified log event format. As the log event travels through Vector, it can undergo various manipulations using "transforms" to manipulate and enhance its content. Finally, at the end of the pipeline, the log event can be sent to multiple destinations for storage or analysis.
You can define the data sources, transforms, and destinations in a configuration file at /etc/vector/vector.yaml
. This configuration file is organized into the following components:
sources:
<unique_source_name>:
# source configuration properties go here
transforms:
<unique_transform_name>:
# transform configuration properties go here
sinks:
<unique_destination_name>:
# sink configuration properties go here
This structure allows you to configure and customize Vector to suit your specific log aggregation and processing needs.
Let's analyze the components:
- sources: defines the data sources that Vector should read from.
- transforms: specifies how the data should be manipulated or transformed.
- sinks: defines the destinations where Vector should route the data.
Each component requires you to specify a plugin. For sources, the following are some of the inputs you can use:
- File: fetch logs from files.
- Docker Logs: gather logs from Docker containers.
- Socket: collect logs sent via the socket client.
- Syslog: collect logs via the Syslog protocol.
To process the data, here are some of the transforms that can come in handy:
- Remap with VRL: an expression-oriented language designed to transform your data.
- Lua: use the Lua programming language to transform log events.
- Filter: filter events according to the specified conditions.
- Throttle: rate limit log streams.
Finally, let's look at some of the sinks available for Vector:
- HTTP: forward logs to an HTTP endpoint.
- WebSocket: deliver observability data to a WebSocket endpoint.
- Loki: forward logs to Grafana Loki.
- Elasticsearch: deliver logs to Elasticsearch.
In the next section, you will use the file source to read logs from a file and forward the records to the console using the console sink.
Getting started with Vector
Now that you know how Vector works, you will configure it to read log records from the /var/log/logify/app.log
file and redirect them to the console.
Open the /etc/vector/vector.yaml
file and ensure you have the necessary superuser privileges:
sudo nano /etc/vector/vector.yaml
Remove all the existing contents and add the following lines:
sources:
app_logs:
type: "file"
include:
- "/var/log/logify/app.log"
sinks:
print:
type: "console"
inputs:
- "app_logs"
encoding:
codec: "json"
In the sources
component, you define an app_logs
source that reads logs from a file. The type
option specifies the file
source, and you define the include
option, which contains the path to the file that should be read.
In the sinks
component, you define a print
sink, which specifies the destination to send the logs. To redirect them to the console, you set the type
to the console
sink. Next, you specify the source component from which the logs will originate, which is the app_logs
source in this case. Finally, you specify that logs should be in JSON format using encoding.codec
.
Once you have made these configurations, save the file and validate your changes in the terminal:
sudo vector validate /etc/vector/vector.yaml
√ Loaded ["/etc/vector/vector.yaml"]
√ Component configuration
√ Health check "print"
------------------------------------
Validated
Now you can run Vector:
sudo vector
Upon starting, it will pick up the configuration file automatically.
If you defined vector.yaml
in a different location, you need to pass the full path to the configuration file:
sudo vector --config </path/to/vector.yaml>
When Vector starts, you will see output confirming that it has started:
2023-09-12T05:56:41.803796Z INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=info,rdkafka=info,buffers=info,lapin=info,kube=info"
2023-09-12T05:56:41.804202Z WARN vector::app: DEPRECATED The openssl legacy provider provides algorithms and key sizes no longer recommended for use. Set `--openssl-legacy-provider=false` or `VECTOR_OPENSSL_LEGACY_PROVIDER=false` to disable. See https://vector.dev/highlights/2023-08-15-0-32-0-upgrade-guide/#legacy-openssl for details.
2023-09-12T05:56:41.805079Z INFO vector::app: Loaded openssl provider. provider="legacy"
2023-09-12T05:56:41.805287Z INFO vector::app: Loaded openssl provider. provider="default"
2023-09-12T05:56:41.806105Z INFO vector::app: Loading configs. paths=["/etc/vector/vector.yaml"]
2023-09-12T05:56:41.809530Z INFO vector::topology::running: Running healthchecks.
2023-09-12T05:56:41.810125Z INFO vector: Vector has started. debug="false" version="0.32.1" arch="x86_64" revision="9965884 2023-08-21 14:52:38.330227446"
2023-09-12T05:56:41.810335Z INFO vector::app: API is disabled, enable by setting `api.enabled` to `true` and use commands like `vector top`.
...
After a few seconds, you will start seeing log messages in JSON format appear at the end:
{"file":"/var/log/logify/app.log","host":"vector-test","message":"{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Task completed successfully\", \"pid\": 12655, \"ssn\": \"407-01-2433\", \"time\": 1694551048}","source_type":"file","timestamp":"2023-09-12T20:40:21.582883690Z"}
{"file":"/var/log/logify/app.log","host":"vector-test","message":"{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Connected to database\", \"pid\": 12655, \"ssn\": \"407-01-2433\", \"time\": 1694551051}","source_type":"file","timestamp":"2023-09-12T20:40:21.582980072Z"}
...
The output confirms that Vector can successfully read the log files and route the logs to the console. Vector has automatically added several fields such as file
, host
, message
, source_type
, and timestamp
to each log entry for further context.
You can now press CTRL + C
to exit Vector.
Transforming the logs
It's uncommon to send logs without processing them in some way. Often, you may need to enrich them with important fields, redact sensitive data, or transform plain text logs into a structured format like JSON, which is easier for machines to parse.
Vector offers a powerful language for data manipulation called Vector Remap Language (VRL). VRL is a high-performance, expression-oriented language designed for transforming data. It provides functions for parsing data, converting data types, and even includes conditional statements, among other capabilities.
In this section, you will use VRL to process data in the following ways:
- Parsing JSON logs.
- Removing fields.
- Adding new fields.
- Converting timestamps.
- Redacting sensitive data.
Vector Remap Language (VRL) dot operator
Before we dive into transforming logs with VRL, let's cover some fundamentals that will help you understand how to use it efficiently.
To get familiar with the syntax, Vector provides a vector vrl
subcommand, which starts a Read-Eval-Print Loop (REPL). To use it, you need to provide it with the --input
option, which accepts a JSON file with log events.
First, make sure you are in the log-processing-stack/logify
directory and create an input.json
file:
nano input.json
In your input.json
file, add the following log event from the output in the last section:
{"file":"/var/log/logify/app.log","host":"vector-test","message":"{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Task completed successfully\", \"pid\": 12655, \"ssn\": \"407-01-2433\", \"time\": 1694551048}","source_type":"file","timestamp":"2023-09-12T20:40:21.582883690Z"}
Make sure there are no trailing spaces at the end to avoid errors.
Then, start the REPL:
vector vrl --input input.json
Type a single dot into the REPL prompt:
.
When Vector reads the log event in the input.json
file, the dot operator will return the following:
{ "file": "/var/log/logify/app.log", "host": "vector-test", "message": "{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Task completed successfully\", \"pid\": 12655, \"ssn\": \"407-01-2433\", \"time\": 1694551048}", "source_type": "file", "timestamp": "2023-09-12T20:40:21.582883690Z" }
The .
references the incoming event, and every event that Vector processes can be accessed using the dot notation.
To access a property, you prefix it with a .
like so:
.host
"vector-host"
You can also reassign .
to the value of one of its properties:
. = .host
Now, type the .
again:
.
It will no longer refer to the original object but to the "host" property:
"vector-host"
Now you can exit the REPL by typing exit
:
exit
Now that you are familiar with the dot operator, you will explore VRL in more detail in the upcoming sections, starting with parsing JSON logs.
Parsing JSON logs using Vector
To begin, if you look closely at the message
property in the output, you will notice that even though the log entry was originally in JSON format, Vector has converted it into a string:
{
"file": "/var/log/logify/app.log",
"host": "vector-test",
"message": "{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Task completed successfully\", \"pid\": 12655, \"ssn\": \"407-01-2433\", \"time\": 1694551048}",
"source_type": "file",
"timestamp": "2023-09-12T20:40:21.582883690Z"
}
However, our goal is to have Vector parse the JSON logs. To achieve this, open the configuration file again:
sudo nano /etc/vector/vector.yaml
Next, define a transform and set it to use the remap
transform:
...
transforms:
app_logs_parser:
inputs:
- "app_logs"
type: "remap"
source: |
# Parse JSON logs
., err = parse_json(.message)
sinks:
print:
type: "console"
inputs:
- "app_logs_parser"
encoding:
codec: "json"
You define a transform named app_logs_parser
to process the logs. You specify that the input for this component should come from the source reading the records, which is app_logs
here. Next, you configure the component to use the remap
transform, which enables you to use the Vector Remap Language (VRL).
The source
option contains the VRL program ., err = parse_json(.message)
written as a YAML block scalar (the | character), which lets you include one or more lines of VRL code.
As explored in the previous section, the .
refers to the entire object Vector processes. To select a specific attribute within the object, you use a field name and prefix it with a dot. With that, here is how Vector executes ., err = parse_json(.message)
:
- .message: returns the entire string within the message field.
- parse_json(.message): parses the JSON string.
- ., err: if parsing JSON is successful, the . is set to the result of calling the parse_json() method; otherwise, the err variable is set.
Finally, in the sinks.print
component, you update the inputs
to specify that the logs now come from the transforms.app_logs_parser
component.
Save the changes you have made. Without exiting the configuration file, switch to another terminal and start Vector in watch mode:
sudo vector --watch-config
The --watch-config
option automatically restarts Vector when you save changes in the configuration file. Moving forward, you won't need to stop Vector manually; you can make configuration adjustments in another terminal, streamlining the process.
When Vector runs, you will be able to observe that the log messages are being parsed successfully:
{"emailAddress":"user@mail.com","ip":"127.0.0.1","level":30,"msg":"Initialized application","pid":13611,"ssn":"407-01-2433","status":200,"time":1694551588}
...
In the output, the object has now been parsed, and we no longer see the additional fields that were added by Vector; only the logs remain. If the fields Vector added are helpful, you can replace ., err = parse_json(.message) with .message, err = parse_json(.message) to keep them. However, for brevity in the output, we will keep them removed for the rest of this tutorial.
So far, we've explored how to parse JSON logs. However, Vector also comes with parser functions for various other formats, including:
- parse_csv: useful for parsing CSV log data.
- parse_logfmt: helpful for parsing structured logs in the Logfmt format.
- parse_syslog: suitable for parsing Syslog messages.
- parse_grok: useful for parsing unstructured log data.
These parsers provide flexibility for handling various log formats and structures.
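For instance, here is parse_logfmt applied to a made-up Logfmt line in the vector vrl REPL (a quick sketch; Logfmt values are parsed as strings):
parse_logfmt!("level=info msg=\"Connected to database\" status=200")
{ "level": "info", "msg": "Connected to database", "status": "200" }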
When working with parser functions, it's recommended practice to address potential runtime errors. For additional details, you can refer to the runtime errors documentation on Vector's website.
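For example, rather than assigning the parsed result to . unconditionally, you can branch on the error; this mirrors the pattern shown in Vector's error-handling guide (a sketch):
structured, err = parse_json(.message)
if err != null {
    # Parsing failed: log the error through Vector and keep the raw event
    log("Unable to parse JSON: " + err, level: "error")
} else {
    # Parsing succeeded: replace the event with the structured object
    . = structured
}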
Adding and removing fields with Vector
Now that you can parse JSON logs, you will remove sensitive details, such as the emailAddress
. After that, you will add a new environment
field to indicate whether the logs are from production or development.
Return to the terminal where you have the /etc/vector/vector.yaml
file open. Then, update the source configuration with the following lines:
transforms:
app_logs_parser:
...
source: |
# Parse JSON logs
., err = parse_json(.message)
# Remove emailAddress field
del(.emailAddress)
# Add an environment field
.environment = "dev"
In the above snippet, the del()
function removes the emailAddress
field. Afterward, a new environment
field is added to the JSON object with the value dev
. If the field already exists, its value will be overwritten.
After making the changes to the Vector configuration file, Vector should automatically restart. If it doesn't, you can manually restart Vector. When you do, you will see output similar to the following:
{"environment":"dev","ip":"127.0.0.1","level":30,"msg":"Connected to database","pid":13647,"ssn":"407-01-2433","status":200,"time":1694551695}
...
As you can see in the output, the emailAddress
field has been deleted, and a new environment
field has been added to the object.
The del()
function is one of the path functions that Vector provides. Other helpful functions are listed below:
- exists: helpful when you want to check if a field or an array element exists.
- remove: useful when you want to remove a field whose path you don't know.
- set: helpful when you want to dynamically insert a value into an object or array.
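For instance, exists() can guard a deletion when a field may be absent from some events (a minimal sketch):
# Delete the field only when the event actually contains it
if exists(.emailAddress) {
    del(.emailAddress)
}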
That takes care of modifying attributes on a log event. In the next section, you will format dates using Vector.
Formatting dates with Vector
The application produces logs with Unix timestamps, representing the number of seconds elapsed since January 1, 1970, at 00:00:00 UTC. To make the timestamps human-readable, you need to convert them into a standard date format.
In the configuration file, add the following lines:
transforms:
app_logs_parser:
...
source: |
# Parse JSON logs
., err = parse_json(.message)
# Remove emailAddress field
del(.emailAddress)
# Add an environment field
.environment = "dev"
# Format date to the ISO format
.time = from_unix_timestamp!(.time)
.time = format_timestamp!(.time, format: "%+")
The from_unix_timestamp!()
function converts a Unix timestamp to a VRL timestamp. Its return value overwrites the time field, which is then overwritten again with the value from the format_timestamp!()
function. That function formats the timestamp as an ISO 8601 date according to the %+
format directive.
You may notice that the functions end with !
. This signifies that the functions are fallible, meaning they can fail and require error handling.
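You can try the same two-step conversion on the first sample timestamp in the vector vrl REPL (a sketch; in the REPL, field types are dynamic, so the fallible ! variants apply):
.time = 1694551048
.time = from_unix_timestamp!(.time)
format_timestamp!(.time, format: "%+")
"2023-09-12T20:37:28+00:00"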
After saving the configuration file, Vector will reload, and you'll see output similar to the following:
{"environment":"dev","ip":"127.0.0.1","level":30,"msg":"Connected to database","pid":13691,"ssn":"407-01-2433","status":200,"time":"2023-09-12T20:49:43+00:00"}
...
The date is now in a human-readable ISO format.
The from_unix_timestamp!()
function used in this section is one of the conversion functions that Vector provides. The following functions can also be helpful when converting data of various types:
- to_unix_timestamp: converts a timestamp into a Unix timestamp.
- to_syslog_facility: helpful when converting a value into a Syslog facility code.
- to_syslog_level: coerces a value into a Syslog severity level.
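For example, to_unix_timestamp reverses the conversion from the previous REPL sketch (t'...' is VRL's timestamp literal syntax):
to_unix_timestamp(t'2023-09-12T20:37:28Z')
1694551048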
To better understand date formatting in logs, see our comprehensive log formatting guide.
Working with conditional statements
VRL also provides conditional statements, which let a program make decisions based on conditions. They work similarly to conditionals in other programming languages like JavaScript. In this section, you will use a conditional statement to check if the status
equals 200
and add a success
field if the condition evaluates to true
.
To accomplish this, add the following conditional statement:
transforms:
app_logs_parser:
...
source: |
# Parse JSON logs
., err = parse_json(.message)
# Remove emailAddress field
del(.emailAddress)
# Add an environment field
.environment = "dev"
# Format date to the ISO format
.time = from_unix_timestamp!(.time)
.time = format_timestamp!(.time, format: "%+")
if .status == 200 {
.success = true
}
The if
statement checks if the status
equals 200
and adds a new success
field.
After saving, Vector will reload, and you will see output that looks like this:
{"environment":"dev","ip":"127.0.0.1","level":30,"msg":"Connected to database","pid":13722,"ssn":"407-01-2433","status":200,"success":true,"time":"2023-09-12T20:50:55+00:00"}
The success
field has been added successfully to the object.
When working with conditional statements, it helps to be familiar with VRL's type functions:
- is_json: useful when you want to check if a value is valid JSON.
- is_boolean: handy when you want to check if a value is a boolean.
- is_string: helpful when you want to check if a given value is a string.
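As a brief sketch, a type check can make the earlier conditional more defensive, since parsed fields aren't guaranteed to be integers:
# Only mark success when status is an integer equal to 200
if is_integer(.status) && .status == 200 {
    .success = true
}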
Redacting sensitive data
The log message still contains sensitive fields, like IP addresses and social security numbers. Private user information shouldn't be logged to avoid it falling into the wrong hands. Therefore, redacting sensitive data is a good practice, especially when you can't remove a field entirely. To accomplish this, Vector provides the redact()
function, which can redact any data.
In the configuration, add the following code to redact the IP address and social security number:
transforms:
app_logs_parser:
...
source: |
# Parse JSON logs
., err = parse_json(.message)
# Remove emailAddress field
del(.emailAddress)
# Add an environment field
.environment = "dev"
# Format date to the ISO format
.time = from_unix_timestamp!(.time)
.time = format_timestamp!(.time, format: "%+")
if .status == 200 {
.success = true
}
# Redact field values
. = redact(., filters: ["us_social_security_number", r'^((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}$'])
The redact()
method takes the entire object and applies the filters. Filters can be either regular expressions (regex) or built-in filters. Currently, Vector only has one built-in filter that can redact social security numbers, us_social_security_number
. For other sensitive information, you need to use a regex. In this example, the regex filter matches IPv4 addresses and redacts them.
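Optionally, you can preview the redaction in the vector vrl REPL first; redact() also accepts plain strings (a quick sketch):
redact("ssn: 407-01-2433", filters: ["us_social_security_number"])
"ssn: [REDACTED]"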
Save the changes, and Vector will yield output that looks like this:
{"environment":"dev","ip":"[REDACTED]","level":30,"msg":"Connected to database","pid":13759,"ssn":"[REDACTED]","status":200,"success":true,"time":"2023-09-12T20:52:13+00:00"}
...
You can now stop Vector and close the configuration file. To stop the logify.sh
script, type the following command to obtain the process ID:
jobs -l | grep "logify"
[1]+ 2933 Running ./logify.sh &
Terminate the program with its process ID:
kill -9 2933
Now that you can transform logs, you will use Vector to collect records from multiple sources and forward them to a central location.
Collecting logs from Docker containers and centralizing logs
In this section, you will containerize the Bash script and use an Nginx "hello world" Docker image preconfigured to produce Nginx logs in JSON format every time it receives a request. Then, you will use Vector to collect logs from both containers and centralize them in Better Stack for analysis and monitoring.
Dockerizing the Bash script
In this section, you will create a Dockerfile to containerize the Bash script you wrote earlier.
Make sure you are in the log-processing-stack/logify
directory. Next, create a Dockerfile
, which specifies how the container image should be built and what it runs:
nano Dockerfile
In your Dockerfile
, add the instructions:
FROM ubuntu:latest
COPY . .
RUN chmod +x logify.sh
RUN mkdir -p /var/log/logify
RUN ln -sf /dev/stdout /var/log/logify/app.log
CMD ["./logify.sh"]
In the Dockerfile, you specify the latest Ubuntu image, copy the contents of the local directory into the container, make the script executable, and then create a dedicated directory to store the application logs. To ensure the logs are accessible, you redirect them to the standard output (stdout) using a symbolic link. Lastly, you specify the command to execute the script when the container starts.
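If you'd like to confirm the image builds before wiring it into Docker Compose, you can optionally build it now from the logify directory:
docker build -t logify:latest .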
You will now write a docker-compose.yml
file to define the Bash script and Nginx services.
First, change the directory into the root project directory:
cd ..
Create a docker-compose.yml
in your text editor:
nano docker-compose.yml
Then add the following Docker Compose instructions:
version: '3'
services:
logify-script:
build:
context: ./logify
image: logify:latest
container_name: logify
nginx:
image: betterstackcommunity/nginx-helloworld:latest
logging:
driver: json-file
container_name: nginx
ports:
- '80:80'
In the configuration file, you define a logify-script
service that will build an image with the name logify:latest
based on the Dockerfile in the ./logify
directory. You then define an nginx
service that listens on port 80 for incoming HTTP requests. If another service is already using port 80, terminate it before continuing.
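One way to check whether a process is already listening on port 80 is with lsof, assuming it is installed:
sudo lsof -i :80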
To build the images and create the services, run the following command in the same directory as your docker-compose.yml
file:
docker compose up -d
The -d
flag runs the containers in the background (detached mode).
You can check the status of the containers with this command:
docker compose ps
NAME COMMAND SERVICE STATUS PORTS
logify "./logify.sh" logify-script running
nginx "/runner.sh nginx" nginx running 0.0.0.0:80->80/tcp, :::80->80/tcp
Send five requests to the nginx
service using the curl
command:
curl http://localhost:80/?[1-5]
Following that, check the logs of the containers in your Docker Compose setup:
docker compose logs
logify | {"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Task completed successfully", "pid": 1, "ssn": "407-01-2433", "time": 1695545456}
...
logify | {"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Operation finished", "pid": 1, "ssn": "407-01-2433", "time": 1695545462}
nginx | {"timestamp":"2023-09-12T07:10:04+00:00","pid":"8","remote_addr":"172.19.0.1","remote_user":"","request":"GET /?1 HTTP/1.1","status": "200","body_bytes_sent":"11109","request_time":"0.000","http_referrer":"","http_user_agent":"curl/7.81.0","time_taken_ms":"1694502604.901"}
...
nginx | {"timestamp":"2023-09-12T07:10:04+00:00","pid":"8","remote_addr":"172.19.0.1","remote_user":"","request":"GET /?2 HTTP/1.1","status": "200","body_bytes_sent":"11109","request_time":"0.000","http_referrer":"","http_user_agent":"curl/7.81.0","time_taken_ms":"1694502604.909"}
The output displays all the logs from the nginx
and logify
containers.
With your containers running and producing logs, the next step is to set up a Vector container to read and centralize these logs.
Defining the Vector service with Docker Compose
In this section, you will define the Vector service in your Docker Compose setup to collect logs from the existing containers and centralize them in Better Stack. You will also create a Vector configuration file that specifies how the log records should be collected and processed.
In the root directory, open the docker-compose.yml
file:
nano docker-compose.yml
Then add the following code in the docker-compose.yml
file:
version: '3'
services:
...
vector:
image: timberio/vector:0.32.1-debian
volumes:
- ./vector:/etc/vector
- /var/run/docker.sock:/var/run/docker.sock
command: ["-c", "/etc/vector/vector.yaml"]
ports:
- '8686:8686'
container_name: vector
depends_on:
- logify-script
- nginx
The vector
service definition uses the official timberio/vector Docker image. It mounts the local vector
directory containing the Vector configuration file into the container, along with the Docker socket so that Vector can read container logs from the Docker daemon.
Next, create the vector
directory and move into it:
mkdir vector && cd vector
Afterward, execute the following command to obtain the Docker image names:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
fc30e4a4599f betterstackcommunity/nginx-helloworld:latest "/runner.sh nginx" 4 minutes ago Up 4 minutes 0.0.0.0:80->80/tcp, :::80->80/tcp nginx
7bf40ea91435 logify:latest "./logify.sh" 4 minutes ago Up 4 minutes logify
Create a vector.yaml
configuration file:
nano vector.yaml
Add the code below, making sure the image names in the include_images
option match the ones you noted earlier:
sources:
bash_logs:
type: "docker_logs"
include_images:
- "logify:latest"
nginx_logs:
type: "docker_logs"
include_images:
- "betterstackcommunity/nginx-helloworld:latest"
vector_logs:
type: "internal_logs"
The sources.bash_logs
component uses the docker_logs
source to read logs from a Docker container. The include_images
option tells Vector to collect logs from containers built from the logify:latest
image.
The sources.nginx_logs
component also reads logs from the Nginx Docker container built from the betterstackcommunity/nginx-helloworld:latest
image.
Following this, the sources.vector_logs
component uses the internal_logs
source, which exposes Vector's own logs so that they can be read and forwarded to destinations.
Next, you will define a destination to forward the logs to. We will use Better Stack to centralize the records so you can monitor and analyze them in one place.
Before forwarding the logs, create a free Better Stack account. Once you have logged in, click the Sources link:
Once on the Sources page in Better Stack, click the Connect source button:
Following that, enter a source name of your choice and select "Vector" as the platform:
Upon the creation of the source, copy the Source token field to the clipboard:
Next, return to the vector.yaml
file and add a sink to redirect the logs to Better Stack:
...
sinks:
better_stack_http_sink_bash:
type: "http"
method: "post"
inputs:
- "bash_logs"
uri: "https://in.logs.betterstack.com/"
encoding:
codec: "json"
auth:
strategy: "bearer"
token: "<your_bash_source_token>"
Save and exit the configuration file.
Go back to the root directory:
cd ..
Enter the following command to create the Vector image and start the container:
docker compose up -d
After waiting for a few seconds, return to Better Stack to check if the logs have been successfully sent:
Now that the Bash script logs are centralized, you can follow similar steps to create two more sources for Nginx and Vector logs. Make sure to keep the source tokens in a safe place. Once successfully completed, your interface will look like this:
Following that, open the vector.yaml
file again:
nano vector/vector.yaml
Add two sinks and update the tokens accordingly to ensure logs from Nginx and Vector are correctly sent to Better Stack:
...
sinks:
better_stack_http_sink_bash:
...
better_stack_http_sink_nginx:
type: "http"
method: "post"
inputs:
- "nginx_logs"
uri: "https://in.logs.betterstack.com/"
encoding:
codec: "json"
auth:
strategy: "bearer"
token: "<your_nginx_source_token>"
better_stack_http_sink_vector:
type: "http"
method: "post"
inputs:
- "vector_logs"
uri: "https://in.logs.betterstack.com/"
encoding:
codec: "json"
auth:
strategy: "bearer"
token: "<your_vector_source_token>"
Save the file and run the command once more:
docker compose up -d
Now send five requests to the nginx
service again:
curl http://localhost:80/?[1-5]
The logs from Nginx will be successfully uploaded to Better Stack:
To see if Vector logs are being uploaded, stop all the containers:
docker compose stop
Start the containers again:
docker compose up -d
The Vector logs will be uploaded to Better Stack:
With that, you have centralized your application, Nginx, and Vector logs.
Monitoring Vector health with Better Stack
Vector provides a /health
endpoint that tools like Better Stack can periodically check. If Vector becomes unhealthy or goes down, you can configure Better Stack to send alerts through phone or email, enabling you to address any issues promptly.
To set up health monitoring for Vector, open the vector.yaml
file:
nano vector/vector.yaml
Then add the following code at the top of the configuration file:
api:
enabled: true
address: "0.0.0.0:8686"
...
This configuration enables the API and makes the /health
endpoint accessible on port 8686.
For these changes to take effect, stop and discard the containers:
docker compose down
Start the containers again:
docker compose up -d
Verify that the /health
endpoint works:
curl http://localhost:8686/health
{"ok":true}
Assuming you have a free Better Stack account, log in to Better Stack.
On the Monitors page, click the Create monitor button:
Next, enter the relevant details and then click the Create monitor button:
In the screenshot, you choose what should trigger the alert, provide the server's IP address or domain name along with the /health endpoint on port 8686, and select how you want to be notified.
At this point, Better Stack will start monitoring the endpoint and provide performance statistics:
Let's see what will happen if the endpoint stops working. To do that, stop the services:
docker compose stop
After a minute or two passes, you will see that Better Stack will update the status to "Down":
If you chose to be notified by email, you will receive an email alert:
Final thoughts
In this comprehensive article, we delved deep into Vector and set up a log processing stack using Vector, Docker, Nginx, and Better Stack. We covered various topics, from creating Vector configurations and dockerizing your Bash script and Nginx to centralizing logs with Better Stack.
With the knowledge gained, you are now well-prepared to manage logs efficiently, whether for troubleshooting, enhancing performance, or ensuring compliance with your applications and services.
To further expand your knowledge with Vector, consult the documentation. For more insights into Docker and Docker Compose, refer to their respective documentation pages: Docker and Docker Compose.
Thanks for reading, and happy logging!