Monitoring Linux with Prometheus and Node Exporter
Proactive monitoring allows you to identify and address resource bottlenecks, errors, and unexpected load spikes before they impact your users.
This requires a monitoring system capable of collecting and analyzing metrics from various sources, including servers, network devices, applications, and containers.
Prometheus is a popular open-source monitoring system whose core is a time-series database for storing metrics. It is complemented by additional components like exporters for gathering data from various sources, and the Alertmanager for managing and dispatching alerts.
This guide focuses on setting up Prometheus and Node Exporter on Linux servers to monitor system-level metrics. Node Exporter is a lightweight agent that exposes a wealth of hardware and kernel-related metrics for monitoring server health and performance. For Windows environments, the analogous Windows Exporter serves a similar purpose.
By following the steps outlined in this guide, you'll gain the foundational knowledge to effectively monitor your Linux-based infrastructure with Prometheus and Node Exporter.
Let's get started!
Prerequisites
- Basic familiarity with Prometheus monitoring.
- Docker and Docker Compose installed.
Setting up Prometheus
The easiest way to set up Prometheus is through its official Docker image. Here's the basic Docker Compose configuration you need to get started:
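A minimal setup looks something like this (the image tag and service name are illustrative; pin the tag to the version you want to run):

```yaml
# docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    ports:
      - 9090:9090
```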
Prometheus is set up to run on http://localhost:9090 with its default
configuration. You can start the service with:
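```bash
docker compose up -d
```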
It will launch accordingly:
You can head over to http://localhost:9090 in your browser to see the
Prometheus interface.
By default, Prometheus is set up to collect its own metrics. You can confirm
this by visiting http://localhost:9090/targets in your browser:
In the next step, you'll set up Node Exporter to collect host metrics with a custom configuration file.
Setting up Node Exporter
To set up Node Exporter, you can use its official Docker image or download a binary archive here.
Let's set it up with Docker Compose:
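The service definition below is a sketch; the mount paths and flags mirror what's described next, and you can adjust them to your environment as needed:

```yaml
services:
  # ...existing prometheus service...

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    ports:
      - 9100:9100
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - --path.procfs=/host/proc
      - --path.sysfs=/host/sys
      - --path.rootfs=/rootfs
      - --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)
```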
This sets up a node-exporter service that listens on port 9100 (its default
port) and gathers its data from the host system.
The volumes section mounts the host's /proc, /sys, and / directories
into the container, giving Node Exporter access to information about the host's
system resources, such as CPU usage, memory usage, and disk I/O.
The command section specifies the arguments to be passed to the node-exporter
binary. These arguments tell the node-exporter where to find the host's system
information so that it does not read the container's details instead.
You also need to set up a Prometheus configuration file so that it can scrape
the node-exporter metrics:
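A minimal prometheus.yml along these lines will do (the global scrape interval is an assumption):

```yaml
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: node-exporter
    static_configs:
      - targets: ["node-exporter:9100"]
```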
The address is specified as node-exporter:9100, which assumes that the Node
Exporter is running as a container named node-exporter in the same Docker
network and is listening on port 9100.
You'll need to modify the prometheus service to override the default
configuration with:
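Assuming prometheus.yml sits next to your docker-compose.yml, the updated service could look like this:

```yaml
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    ports:
      - 9090:9090
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
```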
Once you're all set, you can recreate both services with:
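```bash
docker compose up -d --force-recreate
```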
Return to the Prometheus targets page and you'll see that the node-exporter
service is now being scraped:
You can also visit the Node Exporter metrics page to view the raw data being scraped by Prometheus:
Configuring Node Exporter collectors
The Node Exporter exposes a host of hardware and OS-level metrics through various collectors. These are components within the exporter that gather specific types of metrics about the system.
Some of the collectors built into the Node Exporter include:
- cpu: Gathers metrics about CPU usage, such as idle time, user time, system time, and interrupts.
- meminfo: Collects information about memory usage, including total memory, free memory, and used memory.
- diskstats: Exposes disk I/O operations.
- filesystem: Provides data on filesystem usage, such as available space and inodes.
- time: Reports system time information.
- processes: Gathers information about running processes, such as their number and states.
There are over 70 collectors in Node Exporter. You can find a complete list in the Node Exporter documentation, along with their descriptions and supported operating systems.
The majority of the available collectors are enabled by default, but some are disabled due to high cardinality or significant resource demands on the host.
You can enable a collector by providing a --collector.<name> flag, and you can
disable one by using --no-collector.<name> instead.
If you'd like to disable all the default collectors and only enable specific ones, you can combine the flags below:
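For example, the sketch below disables every default collector and turns on only the cpu and meminfo collectors (the selection here is illustrative):

```yaml
    command:
      - --collector.disable-defaults
      - --collector.cpu
      - --collector.meminfo
```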
For example, to monitor overall process metrics, you must enable the processes
collector with --collector.processes, which aggregates process information from /proc:
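Add the flag to the command section of the node-exporter service:

```yaml
    command:
      # ...existing flags...
      - --collector.processes
```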
Once you recreate the services, you'll start seeing metrics with the
node_processes prefix, such as node_processes_state, node_processes_pids,
and others.
A few collectors can also be configured to include or exclude certain patterns using dedicated flags. You'll notice the following flag in your Docker Compose configuration:
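```yaml
      - --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)
```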
This configures the filesystem collector to exclude specific mount points from
its metrics collection including /sys, /proc, /dev, /host, and /etc.
It's necessary to specify this when running Node Exporter in a containerized environment to prevent the collection of metrics about the container's file system.
Let's look at some of the most common Node Exporter metrics you need to know about.
Exploring common Node Exporter metrics
While the Node Exporter provides a wide array of metrics, some are more commonly used than others for monitoring system health and performance.
Here are some of the most common ones:
| Metric Name | Type | Description |
|---|---|---|
| node_cpu_seconds_total | Counter | Total CPU time spent in different modes (user, system, idle, iowait, etc.). |
| node_memory_MemTotal_bytes | Gauge | Total amount of physical memory in bytes. |
| node_memory_MemFree_bytes | Gauge | Amount of unused memory in bytes. |
| node_memory_Buffers_bytes | Gauge | Amount of memory used as buffers for I/O operations. |
| node_memory_Cached_bytes | Gauge | Amount of memory used for caching data. |
| node_filesystem_avail_bytes | Gauge | Amount of free space available to non-root users on a filesystem in bytes. |
| node_filesystem_size_bytes | Gauge | The total filesystem size in bytes. |
| node_filesystem_free_bytes | Gauge | The free space on a filesystem in bytes. |
| node_disk_read_bytes_total | Counter | Number of bytes read from a disk. |
| node_disk_written_bytes_total | Counter | Number of bytes written to a disk. |
| node_disk_reads_completed_total | Counter | Total reads completed for a partition. |
| node_disk_writes_completed_total | Counter | Total writes completed for a partition. |
| node_network_receive_bytes_total | Counter | Bytes received on a network interface. |
| node_network_transmit_bytes_total | Counter | Bytes transmitted on a network interface. |
| node_network_receive_packets_total | Counter | Packets in received traffic. |
| node_network_transmit_packets_total | Counter | Packets in sent traffic. |
| node_network_receive_drop_total | Counter | Packets dropped while receiving on a network interface. |
| node_network_transmit_drop_total | Counter | Packets dropped while transmitting on a network interface. |
As you can see, Node Exporter exposes a wide range of system metrics out of the box. However, there are scenarios where you might need to expose custom metrics specific to your host, such as RAID controller statistics, information about installed packages, or any other specialized metrics that are critical to your monitoring needs.
Let's talk about that next.
Exposing custom host metrics
To expose custom metrics, you can leverage the textfile collector, which is
enabled by default in Node Exporter. This collector allows you to include custom
metrics by reading them from a set of text files that you can generate as
needed.
This approach is particularly useful for gathering data from sources that Node Exporter cannot directly access.
The text files used must:
- Follow the same text-based format that Prometheus uses.
- Have a .prom file extension.
When Prometheus scrapes the Node Exporter, it includes all the metrics from these files alongside the default metrics.
You can create these text files using any program or script based on the specific data you want to collect. To ensure accuracy and consistency during Prometheus scrapes, the file generation process must be atomic.
This means you should write the data to a temporary file first and then move it to the target directory. This avoids scenarios where Prometheus reads partial or incomplete data during a scrape.
To set this up, create a new directory somewhere on your filesystem and
configure Node Exporter to read .prom files from this directory as follows:
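One way to do this, assuming a local directory named textfiles next to your docker-compose.yml:

```yaml
  node-exporter:
    # ...
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
      - ./textfiles:/textfiles:ro
    command:
      # ...existing flags...
      - --collector.textfile.directory=/textfiles
```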
With this configuration, Node Exporter will read all the .prom files in the
textfiles directory and include their metrics in its output. Be sure to restart
the services afterward:
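```bash
docker compose up -d --force-recreate
```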
To test this out, you can create a simple bash script that captures the system uptime in seconds:
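Here's a minimal sketch; the script name (uptime.sh) and metric name (node_custom_uptime_seconds) are arbitrary choices you can adapt:

```bash
#!/usr/bin/env bash
# uptime.sh: expose the system uptime through the textfile collector
set -euo pipefail

METRIC_FILE="textfiles/uptime.prom"

# Read the uptime (in seconds) from /proc/uptime
uptime_seconds=$(awk '{print $1}' /proc/uptime)

# Write to a temporary file first, then move it into place so that
# Prometheus never scrapes a partially written file
cat <<EOF > "${METRIC_FILE}.$$"
# HELP node_custom_uptime_seconds System uptime in seconds.
# TYPE node_custom_uptime_seconds gauge
node_custom_uptime_seconds ${uptime_seconds}
EOF
mv "${METRIC_FILE}.$$" "${METRIC_FILE}"

echo "Uptime metric written to ${METRIC_FILE}"
```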
This script collects the system's uptime and writes it as a metric to a
specified text file (textfiles/uptime.prom).
After saving the file, make it executable with:
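```bash
chmod +x uptime.sh
```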
Then execute it with:
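```bash
./uptime.sh
```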
You will see a confirmation message similar to:
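```
Uptime metric written to textfiles/uptime.prom
```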
To confirm this, view the contents of the file with:
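```bash
cat textfiles/uptime.prom
```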
You'll see output like the following (the exact value will differ):
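```
# HELP node_custom_uptime_seconds System uptime in seconds.
# TYPE node_custom_uptime_seconds gauge
node_custom_uptime_seconds 1342.57
```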
At this point, you'll start seeing this metric in the Node Exporter /metrics
page:
You can also query the custom metric in Prometheus:
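```
node_custom_uptime_seconds
```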
You can then use a cron job or a Systemd timer to run the script periodically so that your custom metric remains up to date:
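For example, a crontab entry that runs the script every minute might look like this (the project path is an assumption):

```bash
* * * * * cd /opt/monitoring && ./uptime.sh
```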
If you're creating such text files in a language with a functioning Prometheus
client library, you don't need to create the Prometheus text format manually.
You can use the WriteToTextfile() function (or similar) to generate the text
file in the correct format.
You can also check out further examples of useful scripts that can be used with
the textfile collector in
this GitHub repository.
Exploring common Node Exporter alerting rules
Collecting and querying Node Exporter metrics is only half the battle. To gain value from this data, you need to define alerting rules that help you detect critical issues promptly.
This section will only list a few alerting rules for host monitoring with Node Exporter. To set up these rules with Alertmanager, you'll need to read our comprehensive guide on the subject.
1. High memory usage
This alert is derived from the difference between node_memory_MemTotal_bytes
and node_memory_MemAvailable_bytes. It alerts when memory usage exceeds 90% of
the total available memory for 2 minutes:
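A rule along these lines captures that condition (the group name, alert name, and labels are placeholders):

```yaml
groups:
  - name: node-exporter-alerts
    rules:
      - alert: HostHighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes > 0.9
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: High memory usage on {{ $labels.instance }}
```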
2. Low disk space
This alert is triggered when the available disk space on / is less than 10% of
the total capacity for 2 minutes:
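Continuing the same rule group, a sketch could look like this:

```yaml
      - alert: HostLowDiskSpace
        expr: node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} < 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Low disk space on the root filesystem of {{ $labels.instance }}
```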
3. Node is unreachable
To detect host downtime, you can use the following rule:
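This sketch assumes the scrape job is named node-exporter, as configured earlier:

```yaml
      - alert: HostDown
        expr: up{job="node-exporter"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Node Exporter target {{ $labels.instance }} is unreachable
```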
For more examples, see the host and hardware section of the Awesome Prometheus alerts collection.
Visualizing Node Exporter metrics in Better Stack
Once you're done setting up Node Exporter to scrape your host metrics, the next step is to visualize it in a dashboard so you can see how your hosts are performing at a glance.
While Grafana OSS is a common choice, solutions like Better Stack let you query, visualize, and alert on your Prometheus metrics without the complexities of self-hosting.
In this section, you'll learn how to send Prometheus metrics to Better Stack and visualize them in a dashboard. There are several options, but the most straightforward is often configuring the source to push the metrics directly rather than exposing URLs that should be scraped.
You can easily achieve this using the OpenTelemetry Collector which manages Prometheus metrics through the prometheus receiver.
To explore this, we'll replace the prometheus service in Docker Compose with
the
Collector's contrib distribution,
then use the prometheusreceiver to ingest the Node Exporter metrics and
subsequently forward them to Better Stack.
If you prefer the scraping approach, or want to use other agents besides the OTel collector, please refer to our documentation.
To get started, sign up for a free Better Stack account and navigate to the Telemetry dashboard. From the menu on the left, select Sources and click on Connect Source:
Specify Node exporter as the name and OpenTelemetry as the platform, then
scroll to the bottom of the page and click Connect source.
The new source will be created immediately:
You'll get a Source token that should be copied to your clipboard for the next step.
Afterwards, open up your docker-compose.yml config file, and update it as
follows:
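The sketch below adds a collector service alongside the existing node-exporter service, which stays unchanged (the image tag is left for you to pin):

```yaml
  collector:
    image: otel/opentelemetry-collector-contrib:latest
    container_name: collector
    restart: unless-stopped
    volumes:
      - ./otelcol.yaml:/etc/otelcol-contrib/config.yaml
```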
The prometheus service has been removed and replaced with the collector
service, which will be configured using an otelcol.yaml config file:
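Here's a sketch of otelcol.yaml consistent with the description below; the source token and ingestion endpoint are placeholders taken from your Better Stack source page:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: node-exporter
          scrape_interval: 10s
          static_configs:
            - targets: ["node-exporter:9100"]

processors:
  batch:
  attributes/betterstack:
    actions:
      - key: better_stack_source_token
        value: <your_source_token>
        action: insert

exporters:
  prometheusremotewrite/betterstack:
    endpoint: <your_better_stack_ingestion_endpoint>

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch, attributes/betterstack]
      exporters: [prometheusremotewrite/betterstack]
```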
Here, the prometheus receiver is configured to scrape the metrics of the
node-exporter service every 10 seconds. This receiver is fully compatible with
the prometheus.yml configuration, making it a drop-in replacement.
Once the metrics are collected, they're batched and assigned a
better_stack_source_token attribute before being sent to the endpoint
specified in the prometheusremotewrite/betterstack exporter.
Once configured, you can restart your Docker Compose services with the
--remove-orphans flag to remove the orphaned prometheus service that is no
longer needed:
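```bash
docker compose up -d --remove-orphans
```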
After a few seconds, head over to the Better Stack dashboard and scroll to the bottom of the page to confirm that the metrics are now being received:
You're now ready to create a dashboard for your Node Exporter metrics.
To get started, select Dashboards on the top-left menu, and click Create dashboard.
In the resulting dialog, select the Host (Prometheus) template, and click Add dashboard.
On the resulting dashboard, ensure that your Node exporter source is selected.
You'll then see a comprehensive overview of how your server is performing in real-time.
From here, you can organize the panels as you see fit, create new ones, and configure alerting as needed. See our documentation for more details.
Final thoughts
This concludes our comprehensive guide to setting up Prometheus and Node Exporter on your Linux servers! You should have a solid foundation for collecting and visualizing critical system metrics by now.
Remember that this is just the beginning of your Prometheus journey! There's a vast ecosystem of exporters, integrations, and advanced configurations to explore as your monitoring needs evolve.
With Better Stack's Prometheus-compatible platform, you can offload the complexities of self-hosting and focus on extracting valuable insights from your data.
Better Stack also offers a comprehensive suite of observability tools, including log management, incident management, and uptime monitoring, in one centralized platform.
Take advantage of our free tier to experience the benefits firsthand and effortlessly scale your monitoring as your infrastructure grows.
Thanks for reading, and happy monitoring!