Monitoring Node.js Apps with Prometheus
Monitoring your applications is crucial for maintaining infrastructure reliability and stability. Metrics like request counts, database query times, API request durations, and bytes sent offer valuable insights into your apps' health and performance. By instrumenting your application to collect and visualize these metrics, you can understand trends, measure consumption, track error rates, and monitor performance changes. This enables you to respond swiftly to any issues.
To instrument your application, you can use Prometheus, an open-source monitoring and alerting tool that collects and exposes metrics. Prometheus includes a time series database for storing these metrics and offers a query language for dashboard visualization.
This guide will show you how to use Prometheus to monitor your system's health and performance.
Prerequisites
Before diving into this tutorial, it's important to ensure you have the following prerequisites in place:
- Node.js installed on your system.
- Basic knowledge of Node.js web frameworks, such as Fastify. We will use Fastify, but you can still follow along if you have not used it.
- Understanding of Node.js performance hooks. For more information, refer to the introduction guide.
- Docker and Docker Compose installed (required from Step 5 onwards).
Step 1 — Downloading the demo project
To demonstrate the importance of monitoring your application, we'll use a demo project with Fastify. This application handles user login and logout with session management and serves a list of movies from an SQLite database.
Start by cloning the repository with the following command:
git clone https://github.com/betterstack-community/nodejs-prometheus-demo
Move into the newly created directory:
cd nodejs-prometheus-demo
You will find the index.js file, which contains the following code. Some of the most important parts are shown below:
...
// In-memory user store for demo purposes
const users = {
  user1: { username: "user1", password: "password1" },
  user2: { username: "user2", password: "password2" },
};

// Handle login form submissions
fastify.post("/login", async (request, reply) => {
  const { username, password } = request.body;
  ...
});

// Handle logout
fastify.post("/logout", async (request, reply) => {
  ...
});

fastify.get("/", async (request, reply) => {
  const rows = await new Promise((resolve, reject) => {
    db.all("SELECT title, release_date, tagline FROM movies",
      ...
    });
  });

  return rows.splice(0, 8);
});
...
The POST /login
endpoint validates user credentials against an in-memory store
and handles user session data.
The POST /logout
endpoint deletes user session data if the user is logged in;
otherwise, it returns an error message.
In a real-world application, user credentials should be stored in a database, passwords should be hashed, and sessions should be managed with a dedicated mechanism such as JWTs or a session store.
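As a quick illustration only (not how the demo app works), hashing and verifying passwords with a library such as bcrypt could look like the sketch below; the bcrypt dependency is an assumption and is not installed by the demo:

import bcrypt from "bcrypt"; // assumed dependency, not part of the demo

// Hash the password once, e.g. when the user registers
const passwordHash = await bcrypt.hash("password1", 10); // 10 salt rounds

// Verify a login attempt against the stored hash
const isValid = await bcrypt.compare("password1", passwordHash);
console.log(isValid); // true when the password matches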
The /
endpoint retrieves movie titles, release dates, and taglines from an
SQLite database, returning the first eight results.
Now that you understand the application you will monitor, you can install all the dependencies. The command also installs development tools, including nodemon, which automatically restarts the server when files change, and autocannon for load testing:
npm install
After the packages have been installed, launch the development server:
npm run dev
You will see output similar to this:
> nodejs-prometheus-demo@1.0.0 dev
> nodemon
[nodemon] 3.1.3
[nodemon] to restart at any time, enter `rs`
[nodemon] watching path(s): *.*
[nodemon] watching extensions: js,mjs,cjs,json
[nodemon] starting `node index.js`
Connection with SQLite has been established
{"level":30,"time":1718533820757,"pid":14043,"hostname":"MACOOKs-MBP","msg":"Server listening at http://[::1]:3000"}
{"level":30,"time":1718533820758,"pid":14043,"hostname":"MACOOKs-MBP","msg":"Server listening at http://127.0.0.1:3000"}
server listening on 3000
Now, leave the server running and open a second terminal where you will input
subsequent commands. Once open, visit the /
endpoint with curl:
curl http://localhost:3000/
The response will contain a list of movies:
[{"title":"Avatar","release_date":"2009-12-10","tagline":"Enter the World of Pandora."},{"title":"Pirates of the Caribbean: At World's End","release_date":"2007-05-19","tagline":"At the end of the world, the adventure begins."},{"title":"Spectre","release_date":"2015-10-26","tagline":"A Plan No One Escapes"},{"title":"The Dark Knight Rises","release_date":"2012-07-16","tagline":"The Legend Ends"},{"title":"John Carter","release_date":"2012-03-07","tagline":"Lost in our world, found in another."},{"title":"Spider-Man 3","release_date":"2007-05-01","tagline":"The battle within."},{"title":"Tangled","release_date":"2010-11-24","tagline":"They're taking adventure to new lengths."},{"title":"Avengers: Age of Ultron","release_date":"2015-04-22","tagline":"A New Age Has Come."}]
You can log in with the details in the users
variable:
curl -X POST http://localhost:3000/login \
-H "Content-Type: application/json" \
-d '{"username":"user1","password":"password1"}' \
-c cookies.txt
Upon success, you will see the following success message:
{"message":"Login successful","username":"user1"}%
You can also log out with the following command:
curl -X POST http://localhost:3000/logout \
-b cookies.txt \
-H "Content-Type: application/json" \
-d '{}'
You will see a message confirming that the logout was successful:
{"message":"Logout successful"}%
With the application set up, proceed to the next section to instrument the application.
Step 2 — Understanding metrics and Prometheus
Before you start instrumenting your application, let's take a deep dive into Prometheus and the different types of metrics it supports.
What are Metrics and why do we collect them?
Metrics are crucial for understanding the performance, health, and behavior of applications, providing real-time insights into your application's inner workings. For instance, you can track the number of requests your application handles, the latency of these requests, the error rates, and resource utilization like CPU and memory usage.
Imagine a web application that handles e-commerce transactions. By monitoring metrics, you can answer various critical questions, such as how many transactions are processed per minute and the average response time for these transactions to ensure a smooth user experience.
Prometheus supports four main types of metrics: Counter, Gauge, Histogram, and Summary.
A Counter is a cumulative metric that only increases over time, such as the
total number of processed transactions or the number of HTTP requests received.
For example, http_requests_total
is a common counter metric that helps you
understand the traffic volume to your application.
A Gauge represents a value that can go up and down, like the current number of
active sessions or the amount of memory usage. An example would be
memory_usage_bytes
, which can help you monitor how much memory your
application is consuming at any given moment, allowing you to detect and respond
to memory leaks promptly.
Histograms are useful for measuring the distribution of values, such as request
durations. They divide the observations into configurable buckets and provide
counts for each bucket, helping to identify latency patterns. For example,
http_request_duration_seconds
can show you the distribution of request times,
helping you to pinpoint performance bottlenecks.
Summaries, like histograms, track a total count of observations and the sum of all observed values, but they also calculate configurable quantiles, such as the 95th or 99th percentile latency, directly in the application. For instance, a
request_duration_seconds
summary can give you detailed insights into the
distribution of request durations, including average and percentile values,
ensuring that most of your users experience acceptable performance levels.
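To make these types concrete, here is a minimal sketch of how each one is declared with the prom-client library used later in this guide; the metric names and bucket values are illustrative only:

import { Counter, Gauge, Histogram, Summary } from "prom-client";

// Counter: only ever goes up (resets when the process restarts)
const httpRequestsTotal = new Counter({
  name: "http_requests_total",
  help: "Total number of HTTP requests",
});

// Gauge: can go up and down
const activeSessions = new Gauge({
  name: "active_sessions",
  help: "Number of currently active sessions",
});

// Histogram: counts observations into predefined buckets
const requestDuration = new Histogram({
  name: "http_request_duration_seconds",
  help: "Request duration in seconds",
  buckets: [0.01, 0.05, 0.1, 0.5, 1],
});

// Summary: computes quantiles on the client side
const responseSize = new Summary({
  name: "http_response_size_bytes",
  help: "Response size in bytes",
  percentiles: [0.5, 0.95, 0.99],
});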
What is Prometheus?
Prometheus is a monitoring tool composed of various components that work together to collect, store, and process metrics from applications.
The following are the major components:
- Client Libraries: These are libraries available for many programming languages. You can use them to instrument your application and expose metrics on an endpoint.
- Prometheus Server: This scrapes the endpoints with metrics and saves the metrics in a time-series database. It provides a query language called PromQL, which you can use to query the metrics.
- Prometheus UI: You can query, analyze, and visualize the metrics in the user interface using the query language. You can also monitor the status of different targets or jobs.
- Alertmanager: This handles the management and routing of alerts that Prometheus triggers. You can create alerting rules with the same PromQL queries and forward these alerts to different receivers like Slack, Gmail, etc.
Step 3 — Setting up automatic instrumentation
The easiest metrics to start with are those that your Node.js process exposes, which include useful information such as CPU usage, memory usage, event loop utilization, and more. These metrics already provide value without requiring additional work, and you can forward them to Prometheus for analysis. Collecting and exposing these metrics is known as instrumentation.
To get started, add the following code to your index.js
file:
...
import metricsPlugin from "fastify-metrics";
const fastify = Fastify({ logger: true });
await fastify.register(metricsPlugin, { endpoint: "/metrics" });
...
In this code snippet, you configure the Fastify application to integrate the
fastify-metrics
plugin by registering the plugin. This setup creates an
/metrics
endpoint dedicated to exposing metrics data.
To use the plugin, install the fastify-metrics
package:
npm i fastify-metrics
Save the changes, and the server will automatically reload.
Then, load test the application:
npx autocannon --renderStatusCodes http://localhost:3000
When the load testing is done, you can visit the /metrics
endpoint:
curl http://localhost:3000/metrics
The output will look like this (edited for brevity):
# HELP process_cpu_user_seconds_total Total user CPU time spent in seconds.
# TYPE process_cpu_user_seconds_total counter
process_cpu_user_seconds_total 10.783692
...
# HELP nodejs_heap_space_size_available_bytes Process heap space size available from Node.js in bytes.
# TYPE nodejs_heap_space_size_available_bytes gauge
nodejs_heap_space_size_available_bytes{space="read_only"} 0
...
nodejs_heap_space_size_available_bytes{space="shared_large_object"} 0
# HELP nodejs_version_info Node.js version info.
# TYPE nodejs_version_info gauge
nodejs_version_info{version="v20.1.0",major="20",minor="1",patch="0"} 1
# HELP nodejs_gc_duration_seconds Garbage collection duration by kind, one of major, minor, incremental or weakcb.
# TYPE nodejs_gc_duration_seconds histogram
nodejs_gc_duration_seconds_bucket{le="0.001",kind="incremental"} 7
...
nodejs_gc_duration_seconds_sum{kind="minor"} 0.048222003000788397
nodejs_gc_duration_seconds_count{kind="minor"} 100
# HELP http_request_duration_seconds request duration in seconds
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.005",method="GET",route="/",status_code="200"} 0
...
http_request_duration_seconds_count{method="GET",route="/",status_code="200"} 1605
...
You can also view this output in the browser:
The output showcases metrics used to monitor the health and performance of a Node.js application.
Counters like process_cpu_user_seconds_total
track CPU usage, while gauges
like nodejs_heap_space_size_available_bytes
show available heap memory,
helping identify memory leaks.
Histograms such as nodejs_gc_duration_seconds
and
http_request_duration_seconds
provide insights into garbage collection
duration and request processing times, respectively.
Step 4 — Creating custom metrics with prom-client
You might find that the default metrics are insufficient for your needs, so you may want to create custom metrics.
To generate custom metrics, you need to instrument your running application with a client library. In Node.js, you can use the prom-client library.
Install the prom-client
package with the following command:
npm i prom-client
With the client library installed, you can now instrument the code, starting with a Counter metric.
Counter
A counter is a metric type that increments and resets to zero on application restart, making it suitable for tracking cumulative events or occurrences that increase over time. Counters are ideal for monitoring values that only go up, such as the number of requests, completed tasks, or errors. This allows you to check the rate at which the value increases later.
In the application, you will create a counter to track the number of times the homepage is accessed, regardless of whether the user is logged in. This will provide insights into overall traffic and usage patterns.
In the index.js
file, add the highlighted code below to instrument it with a
counter metric:
...
import { register, Counter } from "prom-client";

const fastify = Fastify({ logger: true });

// Prometheus metrics
const requestCounter = new Counter({
  name: "http_requests_total",
  help: "Total number of HTTP requests",
  labelNames: ["method", "route"],
});
...

fastify.get("/", async (request, reply) => {
  requestCounter.labels(request.method, request.routerPath).inc();
  ...
  return rows.splice(0, 8);
});

fastify.listen({ port: 3000 }, (err) => {
  ...
});
You begin by importing register
and Counter
from the prom-client
library.
Then, you define a requestCounter
metric to count HTTP requests. The metric is
named http_requests_total
and tracks the total number of HTTP requests, with
labels for method
(HTTP method) and route
(endpoint path) to categorize
metrics.
In the root endpoint /
, you increment the counter using
requestCounter.labels(...).inc();
, where you specify the HTTP method and the
endpoint path.
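Note that this counter only increments inside the / handler. If you wanted to count requests to every route, one possible approach (not something the demo does) is a global onRequest hook, sketched below; the fallback to request.url is an assumption for unmatched routes:

// Hypothetical: count every incoming request, whatever the route
fastify.addHook("onRequest", async (request) => {
  const route = request.routerPath ?? request.url; // fall back to the raw URL
  requestCounter.labels(request.method, route).inc();
});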
Once you have finished making the changes, save the file. Nodemon
will
automatically restart the server.
Now, you can query the http://localhost:3000/metrics
endpoint:
curl http://localhost:3000/metrics
At the beginning, you will see metrics in the output that look like this:
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
...
The output provides metrics for http_requests_total, a counter that tracks the total number of HTTP requests. However, no value is recorded yet because the homepage has not been visited since the server automatically restarted.
Next, query the /
endpoint:
curl http://localhost:3000/
Then, query the /metrics
endpoint again to check for any changes:
curl http://localhost:3000/metrics
You will see:
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",route="/"} 1
...
This shows that 1 HTTP GET request has been made to the "/" route. The metric includes labels specifying the HTTP method (GET) and the path (/).
Now, send ten more requests to the /
endpoint using the autocannon
load
testing tool:
npx autocannon --renderStatusCodes -a 10 http://localhost:3000/
This command runs load tests on the application with ten requests sent to the specified homepage.
Upon running, the output will resemble the following:
Running 10 requests test @ http://localhost:3000/
10 connections
┌─────────┬───────┬───────┬───────┬───────┬─────────┬──────────┬───────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼───────┤
│ Latency │ 26 ms │ 54 ms │ 81 ms │ 81 ms │ 55.7 ms │ 18.01 ms │ 81 ms │
└─────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴───────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬───────┬─────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼───────┼─────────┤
│ Req/Sec │ 10 │ 10 │ 10 │ 10 │ 10 │ 0 │ 10 │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼───────┼─────────┤
│ Bytes/Sec │ 9.54 kB │ 9.54 kB │ 9.54 kB │ 9.54 kB │ 9.54 kB │ 0 B │ 9.54 kB │
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴───────┴─────────┘
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200 │ 10 │
└──────┴───────┘
Req/Bytes counts sampled once per second.
# of samples: 1
10 requests in 1.02s, 9.54 kB read
Now, return to the http://localhost:3000/metrics
endpoint:
curl http://localhost:3000/metrics
You will see that the counter has been updated to the number of requests sent by autocannon:
...
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",route="/"} 11
The total request count is now 11, and it will keep increasing as more requests come in.
Gauge
The second metric type is a Gauge. A Gauge tracks values that can increase and decrease, making it useful for metrics such as memory usage, the number of active sessions, or items in a queue. Unlike a counter, a gauge's current value is meaningful on its own, so you usually don't need to derive a rate of change from it.
The application supports authentication, and in a real-world scenario, it could have hundreds of thousands of active login sessions, which can go up or down depending on user activity. To monitor this, you will use a Gauge metric.
In the index.js
file, add the following code:
...
import { register, Counter, Gauge } from "prom-client";

// Prometheus metrics
const requestCounter = new Counter({
  ...
});

const loginUsersGauge = new Gauge({
  name: "logged_in_users",
  help: "Number of currently logged-in users",
});
...

fastify.post("/login", async (request, reply) => {
  ...
  if (user && user.password === password) {
    request.session.user = { username: user.username };
    loginUsersGauge.inc(); // Increment gauge on successful login
    return reply.send({ message: "Login successful", username: user.username });
  } else {
    return reply.status(401).send({ error: "Invalid username or password" });
  }
});

// Handle logout
fastify.post("/logout", async (request, reply) => {
  if (request.session.user) {
    loginUsersGauge.dec(); // Decrement gauge on logout
    delete request.session.user;
    return reply.send({ message: "Logout successful" });
  } else {
    return reply.status(401).send({ error: "Not logged in" });
  }
});
...
Here, you import the Gauge class from the prom-client library. Using it, you define a metric named logged_in_users to track the users currently logged into your application. When a user successfully logs in via the /login route, you increase the gauge value with loginUsersGauge.inc(). Conversely, when a user logs out via the /logout route, the gauge decreases with loginUsersGauge.dec(). This lets you see how many users are actively engaged with your application in real time, providing valuable insights into user interaction and application usage.
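A gauge can also be set to an absolute value instead of being incremented and decremented. As a hedged sketch (not part of the demo), prom-client supports a collect callback that samples a value every time the /metrics endpoint is scraped; the sessionStore object below is an assumption:

// Hypothetical: report the session count straight from a session store
const sessionGauge = new Gauge({
  name: "active_sessions",
  help: "Number of active sessions, sampled at scrape time",
  collect() {
    // Runs automatically whenever the registry is scraped
    this.set(Object.keys(sessionStore).length); // sessionStore is assumed to exist
  },
});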
With that, save the modifications, and the server should automatically restart.
To ensure a successful login, remove the cookies.txt
file with the following
command:
rm cookies.txt
Now, first visit the /
endpoint so the counter can increment:
curl http://localhost:3000/
Then log in with one user:
curl -X POST http://localhost:3000/login \
-H "Content-Type: application/json" \
-d '{"username":"user1","password":"password1"}' \
-c cookies.txt
{"message":"Login successful","username":"user1"}
Now check the /metrics
endpoint:
curl http://localhost:3000/metrics
You will see the metrics, including the updated Gauge, at the beginning of the output:
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",route="/"} 1
# HELP logged_in_users Number of currently logged-in users
# TYPE logged_in_users gauge
logged_in_users 1
...
This output shows that 1 HTTP GET request was made to the "/" route and that 1 user is currently logged in.
Now, enter the following command to log out the user:
curl -X POST http://localhost:3000/logout \
-b cookies.txt \
-H "Content-Type: application/json" \
-d '{}'
{"message":"Logout successful"}%
Then check the /metrics
endpoint again:
curl http://localhost:3000/metrics
You will see the Gauge metric updated:
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",route="/"} 1
# HELP logged_in_users Number of currently logged-in users
# TYPE logged_in_users gauge
logged_in_users 0
...
The number of logged-in users is 0 because no users are currently logged in. The
Gauge metric for logged_in_users
will fluctuate up and down based on the
number of active user sessions.
Histogram
Unlike counters that track the frequency of events, histograms offer a detailed breakdown within defined intervals. For instance, while a counter provides a server's total count of HTTP requests, a histogram categorizes these requests based on response times (e.g., <10ms, 10-50ms, 50-100ms, >100ms). This breakdown is essential for pinpointing performance bottlenecks, understanding variations in user experience, and optimizing server response times.
In our application, you'll use a histogram to analyze the distribution of database query durations. Instead of relying on a single average time, a histogram divides these durations into buckets (e.g., 0-100ms, 100-200ms), showing how many queries fall into each time range. This method helps identify whether most queries are fast or if there are significant numbers of slower ones.
Choosing appropriate bucket sizes is crucial for insightful histograms. For example, if you expect database query times ranging from milliseconds to seconds, using buckets like 0-100ms and 100-200ms provides a clearer view of the distribution within the expected range.
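If you would rather not hand-pick bucket boundaries, prom-client also ships helper functions for generating them; the values below are illustrative only:

import { linearBuckets, exponentialBuckets } from "prom-client";

// 10 evenly spaced buckets: 0.005s, 0.015s, 0.025s, ..., 0.095s
const linear = linearBuckets(0.005, 0.01, 10);

// 8 buckets doubling each time: 0.005s, 0.01s, 0.02s, ..., 0.64s
const exponential = exponentialBuckets(0.005, 2, 8);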
To begin, open the index.js file and add the following code:
import { register, Counter, Gauge, Histogram } from "prom-client";
import { performance } from "perf_hooks";

const fastify = Fastify({ logger: true });

// Prometheus metrics
...
const dbQueryDurationHistogram = new Histogram({
  name: "db_query_duration_seconds",
  help: "Histogram of database query durations in seconds",
  labelNames: ["method", "route"],
  buckets: [0.005, 0.01, 0.025, 0.05, 0.075, 0.1],
});

// Define a route for '/'
fastify.get("/", async (request, reply) => {
  requestCounter.labels(request.method, request.routerPath).inc();

  const dbQueryStart = performance.now();
  const rows = await new Promise((resolve, reject) => {
    db.all("SELECT title, release_date, tagline FROM movies", (err, rows) => {
      if (err) {
        console.error(err.message);
        reject(err);
      }
      resolve(rows);
    });
  });
  const dbQueryDuration = (performance.now() - dbQueryStart) / 1000;

  dbQueryDurationHistogram
    .labels(request.method, request.routerPath)
    .observe(dbQueryDuration);

  return rows.splice(0, 8);
});
In this code, you import the Histogram
module from the prom-client
library
and the performance
module from perf_hooks
to track database query
durations.
You then define a histogram metric, dbQueryDurationHistogram
, which records
the duration of database queries labeled by HTTP method and route, with bucket
ranges of 0.005, 0.01, 0.025, 0.05, 0.075, and 0.1 seconds. These buckets
provide a detailed view of how the query times are distributed.
Within the root endpoint (/
), you record the start time of the database query
using performance.now()
. After the query, you calculate the duration by
subtracting the start time from the current time and converting it to seconds.
This duration is then recorded in the dbQueryDurationHistogram
, labelled by
HTTP method and route. This approach allows for precise monitoring of database
query performance, enabling the identification of slow queries and potential
bottlenecks.
When you are done, save the changes, and the server will automatically restart.
Load test the application with as many requests as autocannon
can send:
npx autocannon --renderStatusCodes http://localhost:3000
After the load test, you will see an output indicating the number of requests sent:
Running 10s test @ http://localhost:3000
....
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200 │ 1559 │
└──────┴───────┘
Req/Bytes counts sampled once per second.
# of samples: 10
2k requests in 10.02s, 1.49 MB read
This time, more requests have been sent.
Now visit the /metrics
endpoint:
curl http://localhost:3000/metrics
You will see output similar to the following:
...
# HELP db_query_duration_seconds Histogram of database query durations in seconds
# TYPE db_query_duration_seconds histogram
db_query_duration_seconds_bucket{le="0.005",method="GET",route="/"} 0
db_query_duration_seconds_bucket{le="0.01",method="GET",route="/"} 17
db_query_duration_seconds_bucket{le="0.025",method="GET",route="/"} 190
db_query_duration_seconds_bucket{le="0.05",method="GET",route="/"} 1052
db_query_duration_seconds_bucket{le="0.075",method="GET",route="/"} 1542
db_query_duration_seconds_bucket{le="0.1",method="GET",route="/"} 1559
db_query_duration_seconds_bucket{le="+Inf",method="GET",route="/"} 1569
db_query_duration_seconds_sum{method="GET",route="/"} 70.72987211300241
db_query_duration_seconds_count{method="GET",route="/"} 1569
...
The output presents detailed metrics for db_query_duration_seconds
, a
histogram showing the distribution of query durations. Most queries (1,559 out
of 1,569 tracked) completed within 0.1 seconds. The histogram buckets indicate
query counts across various time intervals, such as 17 queries under 0.01
seconds and 1,542 queries under 0.075 seconds. Overall, the queries totalled
approximately 70.73 seconds of execution time.
Summary
Summary metrics and histograms share similarities in their aim to understand value distributions. However, they differ in how they calculate percentiles: summaries compute them directly on the application server, while histograms delegate this task to the Prometheus server.
This distinction has trade-offs. Summaries offer flexibility by not requiring predefined buckets, which is advantageous when the value range is uncertain. However, unlike histograms, they cannot easily aggregate data across multiple application instances, which limits their utility for analyzing system-wide behaviour involving multiple servers.
Summaries are best suited for quick percentile or average calculations, prioritizing speed over absolute precision. They work well for initial explorations or when exact value distribution is optional.
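Before wiring a summary into the app, note that prom-client lets you choose which quantiles a summary reports and, optionally, a sliding time window for the observations; the values below are illustrative only:

import { Summary } from "prom-client";

// Hypothetical: explicit quantiles and a 10-minute sliding window
const requestDurationSummary = new Summary({
  name: "request_duration_seconds",
  help: "Summary of request durations in seconds",
  percentiles: [0.5, 0.9, 0.95, 0.99],
  maxAgeSeconds: 600, // drop observations older than 10 minutes
  ageBuckets: 5, // rotate the window across 5 buckets
});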
In this section, you'll create a summary metric to track HTTP response sizes in bytes. To do this, update the code in index.js as follows:
import { register, Counter, Gauge, Histogram, Summary } from "prom-client";
...
const responseSizeSummary = new Summary({
  name: "http_response_size_bytes",
  help: "Summary of HTTP response sizes in bytes",
  labelNames: ["method", "route"],
});

// In-memory user store for demo purposes
const users = {
  user1: { username: "user1", password: "password1" },
  user2: { username: "user2", password: "password2" },
};

// Middleware to track response size for '/' endpoint only
const trackResponseSize = async (request, reply, payload) => {
  if (payload && request.routerPath === "/") {
    const responseSizeBytes = JSON.stringify(payload).length;
    responseSizeSummary
      .labels(request.method, request.routerPath)
      .observe(responseSizeBytes);
  }
};

// Apply middleware to track response size
fastify.addHook("onSend", trackResponseSize);
This code tracks the size of HTTP responses using a metric called
responseSizeSummary
. This summary-type metric efficiently calculates
percentiles and averages. It is categorized by HTTP method and route using
labels to analyze the response size distribution for different API calls.
The code defines a middleware function named trackResponseSize
, specifically
triggered for requests to the root path /
. It calculates the size of the
response payload in bytes and records this value in the responseSizeSummary
metric. The HTTP method and route are used as labels to associate the data with
the specific request type. The middleware is attached to the Fastify instance using the onSend hook, which runs after the payload has been serialized but just before the response is sent, so it captures the final payload size.
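One hedged caveat: by the time onSend runs, the payload has already been serialized to a string, so JSON.stringify(payload).length measures the re-stringified value rather than the exact bytes sent. If byte-accurate sizes matter, a variant like the following could be used instead:

// Hypothetical variant: measure the serialized payload's byte length directly
const trackResponseSize = async (request, reply, payload) => {
  if (payload && request.routerPath === "/") {
    const responseSizeBytes = Buffer.byteLength(payload); // payload is a string here
    responseSizeSummary
      .labels(request.method, request.routerPath)
      .observe(responseSizeBytes);
  }
};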
After that, load test the application:
npx autocannon --renderStatusCodes http://localhost:3000
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200 │ 1608 │
└──────┴───────┘
Req/Bytes counts sampled once per second.
# of samples: 10
2k requests in 10.02s, 1.53 MB read
When that finishes, query the metrics endpoint:
curl http://localhost:3000/metrics
You will see output like:
# HELP http_response_size_bytes Summary of HTTP response sizes in bytes
# TYPE http_response_size_bytes summary
http_response_size_bytes{quantile="0.01",method="GET",route="/"} 880
http_response_size_bytes{quantile="0.05",method="GET",route="/"} 880
http_response_size_bytes{quantile="0.5",method="GET",route="/"} 880
http_response_size_bytes{quantile="0.9",method="GET",route="/"} 880
http_response_size_bytes{quantile="0.95",method="GET",route="/"} 880
http_response_size_bytes{quantile="0.99",method="GET",route="/"} 880
http_response_size_bytes{quantile="0.999",method="GET",route="/"} 880
http_response_size_bytes_sum{method="GET",route="/"} 1423840
http_response_size_bytes_count{method="GET",route="/"} 1618
In this example, all quantile values are identical at 880 bytes, indicating that
every GET response to the root path has the same size. This consistency suggests
either static data being served or a highly consistent response format. Given
that the /
endpoint returns eight results and the data likely doesn't change
frequently, a uniform response size is expected. However, if the database
contained regularly updated information, response sizes and quantile values
would likely vary more, reflecting the dynamic nature of the data.
Step 5 — Collecting performance metrics with Prometheus
So far, you have set up automatic Node.js metrics and generated custom metrics, and you have been viewing them on the /metrics endpoint. To make them genuinely useful, you need to set up a Prometheus server. The server will scrape the metrics at specified intervals and store them in its time-series database, allowing you to monitor and query them properly.
To set up Prometheus, begin by creating a prometheus.yml
configuration file in
the root directory of your project:
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    scrape_interval: 5s
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "movies-app"
    scrape_interval: 5s
    static_configs:
      - targets: ["host.docker.internal:3000"]
The prometheus.yml
configuration file sets a global scrape interval of 15
seconds for Prometheus. It includes two scrape configurations: one for the
Prometheus server itself, with a 5-second scrape interval targeting
localhost:9090
, and another for the Fastify application named "movies-app,"
also with a 5-second scrape interval, targeting host.docker.internal:3000
.
Next, create a docker-compose.yml
file with the following content to set up
Prometheus using Docker:
services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - "./prometheus.yml:/etc/prometheus/prometheus.yml"
This configuration sets up a Prometheus service using the prom/prometheus
image and maps port 9090
of the container to port 9090
on your host machine.
It also mounts the prometheus.yml
configuration file into the container.
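One hedged note: host.docker.internal resolves automatically on Docker Desktop for macOS and Windows, but on Linux you may need to map it yourself, for example with an extra_hosts entry like this:

services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - "./prometheus.yml:/etc/prometheus/prometheus.yml"
    extra_hosts:
      - "host.docker.internal:host-gateway" # maps the name to the Docker host on Linux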
Launch the service and run it in the background with the following command:
docker compose up -d
The output will look similar to the following:
[+] Running 13/13
✔ prometheus Pulled 77.3s
✔ 6ce8b87a9754 Pull complete 8.3s
✔ d2f8aae8d80e Pull complete 7.8s
...
✔ 9afdc6cdd365 Pull complete 61.5s
✔ be6157533a37 Pull complete 61.7s
[+] Running 2/2
✔ Network nodejs-prometheus-demo_default Created 0.0s
✔ Container nodejs-prometheus-demo-prometheus-1 Started 0.3s
At this point, open your preferred browser and visit
http://localhost:9090/graph
to see the Prometheus UI homepage:
Next, you need to verify that Prometheus is actively monitoring the
application's /metrics
endpoint. To do that, click on Status, then click
on Targets in the navigation menu:
You will be redirected to a page that shows that the application /metrics
endpoint is being successfully monitored by Prometheus with the label "UP" next
to it:
By setting up Prometheus, you now have a powerful tool to collect and visualize performance metrics from your Fastify application. This allows you to gain insights into your application's behavior and performance over time, enabling proactive monitoring and troubleshooting.
Step 6 — Querying metrics in Prometheus
While you can generate valuable metrics, you need to query them using Prometheus to extract meaningful insights.
Raw metrics data provides a foundational layer, but true understanding comes from analyzing it. Prometheus' query language, PromQL, allows you to ask specific questions about your application's performance.
In this step, you'll focus on two key metrics:
- Total Requests: Indicates how many requests your application handled during the load test, reflecting its overall capacity.
- Database Fetch Latency: Measures the average time spent fetching data from the database.
To gather meaningful data, run another load test for five minutes:
npx autocannon -d 300 --renderStatusCodes http://localhost:3000/
After the load test is complete, visit http://localhost:9090/
in your browser.
To check the total number of requests, provide the input query
http_requests_total
. The result will be similar to this:
You can also check the rate at which requests are increasing over the last two minutes with:
rate(http_requests_total[2m])
This will show you the following output when you switch to the graph tab:
You can also calculate the average duration of database requests using the following expression:
rate(db_query_duration_seconds_sum[3m]) / rate(db_query_duration_seconds_count[3m])
This query calculates the average duration over the last 3 minutes. The result will show the average duration as follows:
To compute the 95th percentile (0.95 quantile) for database query durations, use this expression:
histogram_quantile(0.95, rate(db_query_duration_seconds_bucket[2m]))
The result will provide the 95th percentile of the database query durations, as depicted in the following screenshot:
Step 7 — Setting up alerts with Alertmanager
Now that you can query metrics to analyze the data, it's time to set up alerts. Alerts are a critical part of your monitoring and deployment workflow as they notify you of any critical issues.
Alertmanager is a tool that is part of the Prometheus stack. It allows you to receive alerts from Prometheus. You create rules that trigger alerts, and Alertmanager intercepts these alerts and sends them to configured receivers, such as Gmail, Slack, etc.
In this section, you will configure Alertmanager to forward alerts to Gmail.
To set up Alertmanager, modify the docker-compose.yml
file to include the
Alertmanager service:
services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - "./prometheus.yml:/etc/prometheus/prometheus.yml"
      - ./rules.yml:/etc/prometheus/rules.yml
  alertmanager:
    image: prom/alertmanager
    restart: unless-stopped
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager/alertmanager.yml:/alertmanager.yml
    command: --config.file=/alertmanager.yml --log.level=debug
In this Docker Compose configuration, you set up the Prometheus service with
custom alerting rules by mounting a local rules.yml
file into the container.
You also define the Alertmanager service using the prom/alertmanager
image,
with a policy to restart unless stopped, and expose port 9093. This
configuration ensures that Prometheus can utilize specific alerting rules while
Alertmanager is ready to manage and route alerts based on those rules.
Next, create a rules.yml file in the root directory of your project and define the alerting rules:
groups:
  - name: high_request_rate_alerts
    rules:
      - alert: HighRequestRate
        expr: rate(http_requests_total[1m]) > 120
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "High rate of HTTP requests"
          description: "The per-second rate of HTTP requests, averaged over the last minute, has exceeded 120."
In this configuration, you define a Prometheus alerting rule group named high_request_rate_alerts. Within this group, you specify a rule named HighRequestRate that triggers an alert when the per-second rate of HTTP requests, averaged over the last minute, stays above 120 for one minute. The alert is labeled as critical and includes annotations for a summary ("High rate of HTTP requests") and a detailed description of the alert condition.
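You can add further rules to the same group using any PromQL expression. As an illustration (not part of the demo), a rule that alerts on slow database queries via the histogram from Step 4 might look like this; the 100 ms threshold is an assumption:

      # Added under the same rules: list in rules.yml
      - alert: SlowDbQueries
        expr: histogram_quantile(0.95, rate(db_query_duration_seconds_bucket[5m])) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Slow database queries"
          description: "The 95th percentile database query duration has been above 100 ms for 2 minutes."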
Next, update the prometheus.yml file to reference the rules file and to configure Prometheus to send alerts to Alertmanager:
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    scrape_interval: 5s
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "movies-app"
    scrape_interval: 5s
    static_configs:
      - targets: ["host.docker.internal:3000"]

rule_files:
  - "/etc/prometheus/rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "alertmanager:9093"
In the highlighted code, you specify the location of alerting rules with
rule_files
set to /etc/prometheus/rules.yml
and configure Prometheus to send
alerts to Alertmanager at alertmanager:9093
.
With this done, you are now ready to set up a receiver, which will be Gmail. To do that, you need to configure an app password for your Gmail account so that Alertmanager can send emails.
You can do this by visiting Google My Account → Security and enabling 2-Step Verification.
After this, locate the App passwords section and create a new app password. Type "Alertmanager" or any name of your choice in the resulting text field, then click the Create button.
Now, copy the password presented in the popup dialog and save it somewhere safe, as you won't be able to see it again.
Next, return to your text editor and create the alertmanager/alertmanager.yml
file with the following contents:
global:
  resolve_timeout: 1m

route:
  group_by: ["alertname"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: "gmail-notifications"

receivers:
  - name: 'gmail-notifications'
    email_configs:
      - to: <example2@gmail.com>
        from: <example@gmail.com>
        smarthost: smtp.gmail.com:587
        auth_username: <example@gmail.com>
        auth_identity: <example@gmail.com>
        auth_password: <app_password>
        send_resolved: true
Here, replace all instances of <example@gmail.com>
with the Gmail account that
Alertmanager should use to send emails. Update the to
property with the
receiver's email address. If you do not have another email address, you can use
the same email address everywhere. In the auth_password
property, replace
<app_password>
with the app password you generated with your Google account.
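Optionally, if you have the Prometheus promtool and Alertmanager amtool CLIs installed locally, you can validate the new files before restarting the stack:

promtool check rules rules.yml
amtool check-config alertmanager/alertmanager.yml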
With this, you can stop all the services:
docker compose down
Now recreate the services, including the new Alertmanager service, with the following command:
docker compose up -d --force-recreate --no-deps --build
When the services are up, visit http://localhost:9093/#/alerts
in your
browser. You should see something like this confirming that it works.
Now, let's do a load test that will trigger the alert. With the application server still running, run the following command to perform a load test for 300 seconds:
npx autocannon -d 300 --renderStatusCodes http://localhost:3000/
After a minute, visit http://localhost:9090/alerts?search
to see the
Prometheus alerts. You will observe that the alert is firing:
If you visit the Alertmanager interface, you will see it as well:
After that, check your email, and you will see a message containing the details that triggered the alert:
With this setup, you can successfully monitor your application and be notified of any issues.
Step 8 — Monitoring with Better Stack
Better Stack allows you to forward and store your metrics, enabling comprehensive monitoring and analysis. By leveraging Better Stack, you can create detailed dashboards that provide real-time insights into your application's performance. It also supports logging, helping you stay informed about your system's status and troubleshoot issues more effectively.
In addition, you can set up alerts to notify you of any critical issues or performance degradations, configuring alert rules based on the metrics and logs you are tracking. This setup with Better Stack enables you to monitor the health of your application, quickly identify and diagnose issues, and ensure optimal performance. It provides a comprehensive view of your system's behavior, allowing you to maintain high availability and reliability.
See the Node.js demo dashboard live.
Final thoughts
This article walked you through monitoring a Node.js application using Prometheus and Fastify. It covered generating custom metrics, setting up a Prometheus server to scrape and store these metrics, and querying them with PromQL.
For more information, refer to the Prometheus documentation. Prometheus isn't the only monitoring tool available; you can explore OpenTelemetry and see how it compares to Prometheus in this guide.