Monitoring Linux with Prometheus and Node Exporter
Proactive monitoring allows you to identify and address resource bottlenecks, errors, and unexpected load spikes before they impact your users.
This requires a monitoring system capable of collecting and analyzing metrics from various sources, including servers, network devices, applications, and containers.
Prometheus is a popular open-source monitoring system whose core is a time-series database for storing metrics. It is complemented by additional components like exporters for gathering data from various sources, and the Alertmanager for managing and dispatching alerts.
This guide focuses on setting up Prometheus and Node Exporter on Linux servers to monitor system-level metrics. Node Exporter is a lightweight agent that exposes a wealth of hardware and kernel-related metrics for monitoring server health and performance. For Windows environments, the analogous Windows Exporter serves a similar purpose.
By following the steps outlined in this guide, you'll gain the foundational knowledge to effectively monitor your Linux-based infrastructure with Prometheus and Node Exporter.
Let's get started!
Prerequisites
- Basic familiarity with Prometheus monitoring.
- Docker and Docker Compose installed.
Setting up Prometheus
The easiest way to set up Prometheus is through its official Docker image. Here's the basic Docker Compose configuration you need to get started:
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - prometheus_data:/prometheus
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --web.console.libraries=/etc/prometheus/console_libraries
      - --web.console.templates=/etc/prometheus/consoles
      - --web.enable-lifecycle
    expose:
      - 9090
    ports:
      - 9090:9090
    networks:
      - host-monitoring

networks:
  host-monitoring:
    driver: bridge

volumes:
  prometheus_data:
Prometheus is set up to run on http://localhost:9090 with its default configuration. You can start the service with:
docker compose up -d
You'll see output similar to this:
[+] Running 3/3
✔ Network prometheus-node-exporter_host-monitoring Created 0.2s
✔ Volume "prometheus-node-exporter_prometheus_data" Created 0.0s
✔ Container prometheus Created 0.1s
You can head over to http://localhost:9090 in your browser to see the Prometheus interface.
By default, Prometheus is set up to collect its own metrics. You can confirm this by typing http://localhost:9090/targets in your browser address bar:
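If you prefer the terminal, you can also list the active targets through the Prometheus HTTP API; the jq filter below is optional and assumes jq is installed:

# List the job and health of every active scrape target
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'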
In the next step, you will set up Node Exporter to collect host metrics with a custom configuration file.
Setting up Node Exporter
To set up Node Exporter, you can use its official Docker image or download a binary archive here.
Let's set it up with Docker Compose:
services:
  . . .

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - --path.procfs=/host/proc
      - --path.rootfs=/rootfs
      - --path.sysfs=/host/sys
      - --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)
    ports:
      - 9100:9100
    networks:
      - host-monitoring

networks:
  host-monitoring:

volumes:
  prometheus_data:
This sets up a node-exporter service that listens on port 9100 (its default port) and gathers its data from the host system.

The volumes section mounts the host's /proc, /sys, and / directories into the container, which allows the exporter to collect information about the host's system resources, such as CPU usage, memory usage, and disk I/O.

The command section specifies the arguments to be passed to the node-exporter binary. These arguments tell node-exporter where to find the host's system information so that it does not read the container's details instead.
You also need to set up a Prometheus configuration file so that it can scrape the node-exporter metrics:
global:
  scrape_interval: 10s

scrape_configs:
  - job_name: node-exporter
    static_configs:
      - targets:
          - 'node-exporter:9100'
The address is specified as node-exporter:9100, which assumes that the Node Exporter is running as a container named node-exporter in the same Docker network and is listening on port 9100.
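Before mounting this file into the container, you can optionally check its syntax with promtool, which is bundled in the official Prometheus image (a quick sketch; adjust the path if your file lives elsewhere):

# Validate prometheus.yml using the promtool binary from the image
docker run --rm \
  -v "$(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml:ro" \
  --entrypoint promtool \
  prom/prometheus:latest \
  check config /etc/prometheus/prometheus.yml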
You'll need to modify the prometheus service to override the default configuration with:
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - prometheus_data:/prometheus
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --web.console.libraries=/etc/prometheus/console_libraries
      - --web.console.templates=/etc/prometheus/consoles
      - --web.enable-lifecycle
    ports:
      - 9090:9090
    networks:
      - host-monitoring

  . . .
Once you're all set, you can recreate both services with:
docker compose up -d --force-recreate
[+] Running 2/2
✔ Container node-exporter Started 0.9s
✔ Container prometheus Started 0.9s
Return to the Prometheus targets page and you'll see that the node-exporter service is now being scraped:
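Because the prometheus service is started with the --web.enable-lifecycle flag, later edits to prometheus.yml can usually be applied without recreating the container by calling the reload endpoint:

# Ask the running Prometheus instance to reload its configuration
curl -X POST http://localhost:9090/-/reload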
You can also visit the Node Exporter metrics page to view the raw data being scraped by Prometheus:
http://localhost:9100/metrics
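You can also pull the same data from the terminal and filter it for a specific metric family, for example:

# Show the first few CPU time series exposed by the Node Exporter
curl -s http://localhost:9100/metrics | grep '^node_cpu_seconds_total' | head -n 5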
Configuring Node Exporter collectors
The Node Exporter exposes a host of hardware and OS-level metrics through various collectors. These are components within the exporter that gather specific types of metrics about the system.
Some of the collectors built into the Node Exporter include:
- cpu: Gathers metrics about CPU usage, such as idle time, user time, system time, and interrupts.
- meminfo: Collects information about memory usage, including total memory, free memory, and used memory.
- diskstats: Exposes disk I/O operations.
- filesystem: Provides data on filesystem usage, such as available space and inodes.
- time: Reports system time information.
- processes: Gathers information about running processes, such as their number and states.
There are over 70 collectors in Node Exporter. You can find a complete list in the Node Exporter documentation along with their descriptions and supported operating systems.
The majority of the available collectors are enabled by default, but some are disabled due to high cardinality or significant resource demands on the host.
You can enable a collector by providing a --collector.<name> flag, and you can disable one by using --no-collector.<name> instead.
If you'd like to disable all the default collectors and only enable specific ones, you can combine the flags below:
--collector.disable-defaults --collector.<name> . . .
For example, to monitor overall process metrics, you must enable the processes collector with --collector.processes, which aggregates them from /proc:
. . .
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - --path.procfs=/host/proc
      - --path.rootfs=/rootfs
      - --path.sysfs=/host/sys
      - --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)
      - --collector.processes
    ports:
      - 9100:9100
    networks:
      - host-monitoring
Once you recreate the services, you'll start seeing metrics with the node_processes prefix, such as node_processes_state, node_processes_pids, and others.
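A quick way to confirm that the new collector is active is to filter the exporter's output for that prefix:

# List the process-related metrics exposed after enabling --collector.processes
curl -s http://localhost:9100/metrics | grep '^node_processes'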
A few collectors can also be configured to include or exclude certain patterns using dedicated flags. You'll notice the following flag in your Docker Compose configuration:
--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)
This configures the filesystem collector to exclude specific mount points from its metrics collection, including /sys, /proc, /dev, /host, and /etc.
It's necessary to specify this when running Node Exporter in a containerized environment to prevent the collection of metrics about the container's file system.
Let's look at some of the most common Node Exporter metrics you need to know about.
Exploring common Node Exporter metrics
While the Node Exporter provides a wide array of metrics, some are more commonly used than others for monitoring system health and performance.
Here are some of the most common ones:
| Metric Name | Type | Description |
| --- | --- | --- |
| node_cpu_seconds_total | Counter | Total CPU time spent in different modes (user, system, idle, iowait, etc.). |
| node_memory_MemTotal_bytes | Gauge | Total amount of physical memory in bytes. |
| node_memory_MemFree_bytes | Gauge | Amount of unused memory in bytes. |
| node_memory_Buffers_bytes | Gauge | Amount of memory used as buffers for I/O operations. |
| node_memory_Cached_bytes | Gauge | Amount of memory used for caching data. |
| node_filesystem_avail_bytes | Gauge | Amount of free space available to non-root users on a filesystem in bytes. |
| node_filesystem_size_bytes | Gauge | The total filesystem size in bytes. |
| node_filesystem_free_bytes | Gauge | The free space on a filesystem in bytes. |
| node_disk_read_bytes_total | Counter | Number of bytes read from a disk. |
| node_disk_written_bytes_total | Counter | Number of bytes written to a disk. |
| node_disk_reads_completed_total | Counter | Total reads for a partition. |
| node_disk_writes_completed_total | Counter | Total writes for a partition. |
| node_network_receive_bytes_total | Counter | Bytes received on a network interface. |
| node_network_transmit_bytes_total | Counter | Bytes transmitted on a network interface. |
| node_network_receive_packets_total | Counter | Packets in received traffic. |
| node_network_transmit_packets_total | Counter | Packets in sent traffic. |
| node_network_receive_drop_total | Counter | Packets dropped while receiving on a network interface. |
| node_network_transmit_drop_total | Counter | Packets dropped while transmitting on a network interface. |
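To see how these building blocks are typically combined, here are a few example queries you can run against the Prometheus HTTP API with curl. The PromQL expressions below are common patterns, not the only way to compute these values:

# Average CPU utilization per instance over the last 5 minutes (percent)
curl -s http://localhost:9090/api/v1/query \
  --data-urlencode 'query=100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'

# Memory usage as a percentage of total memory
curl -s http://localhost:9090/api/v1/query \
  --data-urlencode 'query=100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)'

# Disk space used on the root filesystem (percent)
curl -s http://localhost:9090/api/v1/query \
  --data-urlencode 'query=100 * (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})'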
As you can see, Node Exporter exposes a wide range of system metrics out of the box. However, there are scenarios where you might need to expose custom metrics specific to your host, such as RAID controller statistics, information about installed packages, or any other specialized metrics that are critical to your monitoring needs.
Let's talk about that next.
Exposing custom host metrics
To expose custom metrics, you can leverage the textfile collector, which is enabled by default in Node Exporter. This collector allows you to include custom metrics by reading them from a set of text files that you can generate as needed.
This approach is particularly useful for gathering data from sources that Node Exporter cannot directly access.
The text files used must:
- Follow the same text-based format that Prometheus uses.
- Have a .prom file extension.
When Prometheus scrapes the Node Exporter, it includes all the metrics from these files alongside the default metrics.
You can create these text files using any program or script based on the specific data you want to collect. To ensure accuracy and consistency during Prometheus scrapes, the file generation process must be atomic.
This means you should write the data to a temporary file first and then move it to the target directory. This avoids scenarios where Prometheus reads partial or incomplete data during a scrape.
To set this up, create a new directory somewhere on your filesystem and configure Node Exporter to read .prom files from this directory as follows:
mkdir textfiles
. . .
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
      - ./textfiles:/textfiles
    command:
      - --path.procfs=/host/proc
      - --path.rootfs=/rootfs
      - --path.sysfs=/host/sys
      - --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)
      - --collector.processes
      - --collector.systemd
      - --collector.textfile.directory=/textfiles
    ports:
      - 9100:9100
    networks:
      - host-monitoring
With this configuration, Node Exporter will read all the .prom files in the textfiles directory and include their metrics in its output. Be sure to restart the services afterward:
docker compose up -d --force-recreate
To test this out, you can create a simple bash script that captures the system uptime in seconds:
#!/bin/bash
# Define the output directory and files
OUTPUT_DIR="textfiles"
TMP_METRIC_FILE="/tmp/uptime.prom"
METRIC_FILE="$OUTPUT_DIR/uptime.prom"
# Ensure the output directory exists
mkdir -p "$OUTPUT_DIR"
# Collect custom metrics
# Example: Capture system uptime in seconds
UPTIME_SECONDS=$(awk '{print $1}' /proc/uptime)
# Define custom metrics
METRIC_NAME="system_uptime_seconds"
METRIC_HELP="# HELP $METRIC_NAME The system uptime in seconds"
METRIC_TYPE="# TYPE $METRIC_NAME gauge"
# Write metrics to the temporary file
{
echo "$METRIC_HELP"
echo "$METRIC_TYPE"
echo "$METRIC_NAME $UPTIME_SECONDS"
} > "$TMP_METRIC_FILE"
# Move the temporary file to the final directory atomically
mv "$TMP_METRIC_FILE" "$METRIC_FILE"
# Print success message
echo "Uptime written to $METRIC_FILE"
This script collects the system's uptime and writes it as a metric to a specified text file (textfiles/uptime.prom).
After saving the file, make it executable with:
chmod +x uptime-script.sh
Then execute it with:
./uptime-script.sh
You will see:
Uptime written to textfiles/uptime.prom
To confirm this, view the contents of the file with:
cat textfiles/uptime.prom
You'll see the following output:
# HELP system_uptime_seconds The system uptime in seconds
# TYPE system_uptime_seconds gauge
system_uptime_seconds 1346404.27
At this point, you'll start seeing this metric in the Node Exporter /metrics page:
You can also query the custom metric in Prometheus:
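If you'd rather check from the terminal, you can confirm the metric both at the exporter and through the Prometheus query API:

# Check that the Node Exporter is exposing the custom metric
curl -s http://localhost:9100/metrics | grep system_uptime_seconds

# Query the same metric through Prometheus
curl -s http://localhost:9090/api/v1/query --data-urlencode 'query=system_uptime_seconds'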
You can then use a cron job or a Systemd timer to run the script periodically so that your custom metric remains up to date:
* * * * * /path/to/script.sh # execute the script every minute
If you're creating such text files in a language that has a Prometheus client library, you don't need to produce the Prometheus text format manually. You can use the WriteToTextfile() function (or similar) to generate the text file in the correct format.
You can also check out further examples of useful scripts that can be used with the textfile collector in this GitHub repository.
Exploring common Node Exporter alerting rules
Collecting and querying Node Exporter metrics is only half the battle. To gain value from this data, you need to define alerting rules that help you detect critical issues promptly.
This section will only list a few alerting rules for host monitoring with Node Exporter. To set up these rules with Alertmanager, you'll need to read our comprehensive guide on the subject.
1. High memory usage
This alert is derived from the ratio of node_memory_MemAvailable_bytes to node_memory_MemTotal_bytes. It fires when memory usage exceeds 90% of total memory (less than 10% available) for 2 minutes:
- alert: HostOutOfMemory
  expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Host out of memory (instance {{ $labels.instance }})
    description: "Node memory is filling up (< 10% left)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
2. Low disk space
This alert is triggered when the available disk space on /
is less than 10% of
the total capacity for 2 minutes:
- alert: HostOutOfDiskSpace
  expr: ((node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes < 10 and ON (instance, device, mountpoint) node_filesystem_readonly == 0) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Host out of disk space (instance {{ $labels.instance }})
    description: "Disk is almost full (< 10% left)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
3. Node is unreachable
To detect host downtime, you can use the following rule:
# Node Down Alert
- alert: NodeDown
  expr: up{job="node-exporter"} == 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Node down ({{ $labels.instance }})"
    description: "The node {{ $labels.instance }} has been down for the last 2 minutes."
For more examples, see the host and hardware section of the Awesome Prometheus alerts collection.
Visualizing Node Exporter metrics in Better Stack
Once you're done setting up Node Exporter to expose your host metrics, the next step is to visualize them in a dashboard so you can see how your hosts are performing at a glance.
While Grafana OSS is a common choice, solutions like Better Stack allow you to query, visualize, and alert on your Prometheus metrics without the complexities of self-hosting.
In this section, you'll learn how to send Prometheus metrics to Better Stack and visualize them in a dashboard. There are several options, but the most straightforward is often configuring the source to push the metrics directly rather than exposing URLs that should be scraped.
You can easily achieve this using the OpenTelemetry Collector, which ingests Prometheus metrics through its prometheus receiver.
To explore this, we'll replace the prometheus service in Docker Compose with the Collector's contrib distribution, then use the prometheus receiver to ingest the Node Exporter metrics and subsequently forward them to Better Stack.
If you prefer the scraping approach, or want to use other agents besides the OTel Collector, please refer to our documentation.
To get started, sign up for a free Better Stack account and navigate to the Telemetry dashboard. From the menu on the left, select Sources and click on Connect Source:
Specify Node exporter as the name and OpenTelemetry as the platform, then scroll to the bottom of the page and click Connect source.
The new source will be created immediately:
You'll get a Source token that should be copied to your clipboard for the next step.
Afterwards, open up your docker-compose.yml config file and update it as follows:
[docker-compose.yml]
services:
  collector:
    container_name: collector
    image: otel/opentelemetry-collector-contrib:latest
    volumes:
      - ./otelcol.yaml:/etc/otelcol-contrib/config.yaml
    networks:
      - host-monitoring

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
      - ./textfiles:/textfiles:ro
    command:
      - --path.procfs=/host/proc
      - --path.rootfs=/rootfs
      - --path.sysfs=/host/sys
      - --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)
      - --collector.processes
      - --collector.systemd
      - --collector.textfile.directory=/textfiles
    ports:
      - 9100:9100
    networks:
      - host-monitoring

networks:
  host-monitoring:

volumes:
  prometheus_data:
The prometheus service has been removed and replaced with the collector service, which is configured through an otelcol.yaml config file:
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: node-exporter
          scrape_interval: 10s
          static_configs:
            - targets: ['node-exporter:9100']

processors:
  attributes/betterstack:
    actions:
      - key: better_stack_source_token
        value: <your_source_token>
        action: insert
  batch:

exporters:
  prometheusremotewrite/betterstack:
    endpoint: https://in-otel.logs.betterstack.com/metrics

service:
  pipelines:
    metrics/betterstack:
      receivers: [prometheus]
      processors: [batch, attributes/betterstack]
      exporters: [prometheusremotewrite/betterstack]
Here, the prometheus receiver is configured to scrape the metrics of the node-exporter service every 10 seconds. This receiver is fully compatible with the prometheus.yml configuration, making it a drop-in replacement.
Once the metrics are collected, they're batch processed and assigned a better_stack_source_token attribute before being sent to the endpoint specified in the prometheusremotewrite/betterstack exporter.
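If you'd like to sanity-check this file before restarting, recent Collector releases include a validate subcommand you can run against it (a sketch; older images may not support this command):

# Validate the Collector configuration without starting the pipelines
docker run --rm \
  -v "$(pwd)/otelcol.yaml:/etc/otelcol-contrib/config.yaml:ro" \
  otel/opentelemetry-collector-contrib:latest \
  validate --config /etc/otelcol-contrib/config.yaml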
Once configured, you can restart your Docker Compose services with the --remove-orphans flag to remove the orphaned prometheus service that is no longer needed:
docker compose up -d --force-recreate --remove-orphans
[+] Running 3/3
✔ Container prometheus Removed 0.3s
✔ Container collector Started 1.0s
✔ Container node-exporter Started 0.9s
After a few seconds, head over to the Better Stack dashboard and scroll to the bottom of the page to confirm that the metrics are now being received:
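If nothing shows up after a while, the Collector's logs are the first place to look for scrape or export errors:

# Tail the Collector logs to troubleshoot scraping or delivery issues
docker compose logs -f collector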
You're now ready to create a dashboard for your Node Exporter metrics.
To get started, select Dashboards on the top-left menu, and click Create dashboard.
In the resulting dialog, select the Host (Prometheus) template, and click Add dashboard.
On the resulting dashboard, ensure that your Node exporter source is selected.
You'll then see a comprehensive overview of how your server is performing in real-time.
From here, you can organize the panels as you see fit, create new ones, and configure alerting as needed. See our documentation for more details.
Final thoughts
This concludes our comprehensive guide to setting up Prometheus and Node Exporter on your Linux servers! By now, you should have a solid foundation for collecting and visualizing critical system metrics.
Remember that this is just the beginning of your Prometheus journey! There's a vast ecosystem of exporters, integrations, and advanced configurations to explore as your monitoring needs evolve.
With Better Stack's Prometheus-compatible platform, you can offload the complexities of self-hosting and focus on extracting valuable insights from your data.
Better Stack also offers a comprehensive suite of observability tools, including log management, incident management, and uptime monitoring, all in one centralized platform.
Take advantage of our free tier to experience the benefits firsthand and effortlessly scale your monitoring as your infrastructure grows.
Thanks for reading, and happy monitoring!