Due to Node.js's single-threaded architecture, a Node.js application deployed on a machine with multiple CPUs typically runs as a single instance on one CPU, handling all incoming requests. Under heavy traffic, this setup degrades performance: one CPU bears the entire processing load while the others sit idle.
To tackle this challenge, Node.js introduced the `cluster` module, which lets you run multiple Node.js processes so that all available CPU cores are used efficiently. The module incorporates a load balancer that distributes incoming requests evenly among the instances running on different cores. Employing the `cluster` module allows Node.js to handle increasing loads and traffic while maintaining optimal performance.
In this tutorial, we'll cover the following topics:
- Understanding the performance of a single instance under increasing load.
- Scaling Node.js using the `cluster` module.
- Employing PM2 to scale Node.js.
Prerequisites
To proceed with this tutorial, ensure that your system has at least two CPUs and that you have installed the latest version of Node.js.
To check the number of CPUs available on your system, execute the following command:
nproc
You should see an output similar to:
2
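If `nproc` isn't available on your platform (macOS, for example, doesn't ship it), you can ask Node.js itself, assuming your release is recent enough (18.14+) to include `os.availableParallelism()`:
node -p "require('node:os').availableParallelism()"
This prints the same count that the clustering code later in this tutorial uses to size its worker pool.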
Once you have confirmed these prerequisites, you can set up the demo project.
Setting up the demo project
To illustrate the concepts in this tutorial, I've prepared a sample Express app featuring a single endpoint that reads the contents of a text file and returns a response accordingly. You'll scale this application to handle higher traffic loads through clustering in the upcoming sections.
To begin, clone the repository from GitHub:
git clone https://github.com/betterstack-community/scaling-nodejs.git
Next, navigate into the newly created directory:
cd scaling-nodejs
Then install the necessary dependencies, which comprise:
- Express: A popular web application framework for Node.js.
- nodemon: A tool that automatically restarts the server when it detects file changes.
- autocannon: A load testing tool.
npm install
The root directory contains a `content.txt` file with sample text that the app will read, as shown below:
Content read from a file
The `index.js` file contains the following code, which sets up an endpoint to read the content from the `content.txt` file:
import express from 'express';
import { readFile } from 'node:fs/promises';
const app = express();
const PORT = 3000;
app.get('/read-content', async (req, res) => {
try {
const data = await readFile('content.txt', 'utf8');
res.status(200).send(data);
} catch (error) {
console.error('Error reading file:', error);
res.status(500).send('Internal Server Error');
}
});
app.listen(PORT, () => {
console.log(`App (PID: ${process.pid}) is listening on port ${PORT}`);
});
This code snippet sets up an HTTP GET endpoint at `/read-content` that reads the contents of the `content.txt` file and responds with them. To test it out, start the development server with the following command:
npm start
> scaling-nodejs@1.0.0 start
> nodemon index.js
[nodemon] 3.1.0
[nodemon] to restart at any time, enter `rs`
[nodemon] watching path(s): *.*
[nodemon] watching extensions: js,mjs,cjs,json
[nodemon] starting `node index.js`
App (PID: 97619) is listening on port 3000
Once the application starts, execute the command below in a separate terminal to test the endpoint:
curl http://localhost:3000/read-content
You'll receive a response similar to:
Content read from a file
In the next section, you'll establish the baseline performance of the application without clustering.
Establishing baseline performance without clustering
In this section, you'll measure the application's ability to handle traffic to the `/read-content` endpoint without employing clustering techniques. These findings will later be contrasted with the performance improvements that clustering brings.
Without clustering, the application is limited to just one CPU, leaving the other cores idle.
Let's see how it performs by initiating a load test through the command below:
npx autocannon -d 11 --renderStatusCodes http://localhost:3000/read-content
The `-d 11` flag sets the test duration; autocannon dispatches as many requests as possible within those 11 seconds and produces output similar to the following:
Running 11s test @ http://localhost:3000/read-content
10 connections
┌─────────┬──────┬──────┬───────┬──────┬────────┬─────────┬───────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼──────┼──────┼───────┼──────┼────────┼─────────┼───────┤
│ Latency │ 0 ms │ 1 ms │ 2 ms │ 3 ms │ 0.6 ms │ 0.64 ms │ 12 ms │
└─────────┴──────┴──────┴───────┴──────┴────────┴─────────┴───────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬──────────┬──────────┬─────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼─────────┼─────────┼─────────┼─────────┼──────────┼──────────┼─────────┤
│ Req/Sec │ 7,051 │ 7,051 │ 7,463 │ 10,567 │ 8,563.28 │ 1,506.29 │ 7,049 │
├───────────┼─────────┼─────────┼─────────┼─────────┼──────────┼──────────┼─────────┤
│ Bytes/Sec │ 1.78 MB │ 1.78 MB │ 1.88 MB │ 2.66 MB │ 2.16 MB │ 380 kB │ 1.78 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴──────────┴──────────┴─────────┘
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200 │ 94190 │
└──────┴───────┘
Req/Bytes counts sampled once per second.
# of samples: 11
94k requests in 11.02s, 23.7 MB read
On my test machine, the server successfully processed approximately 94k requests to the `/read-content` endpoint in 11.02 seconds, averaging about 8.5k requests per second.
These results form the baseline against which we'll measure the impact of clustering on the application's performance.
Getting started with the cluster module
Now that you've established the application's baseline performance without clustering, let's implement clustering by deploying multiple instances across the available CPUs through the `cluster` module.
With clustering, each CPU core hosts an isolated Node.js instance, and a load balancer evenly distributes incoming requests among these instances so that no core remains idle.
To cluster your application, create a file named `cluster.js` and populate it with the following code:
import cluster from 'node:cluster';
import { availableParallelism } from 'node:os';
import process from 'node:process';
import { dirname } from 'node:path';
import { fileURLToPath } from 'node:url';
const __dirname = dirname(fileURLToPath(import.meta.url));
const cpuCount = availableParallelism();
console.log(`Primary pid=${process.pid}`);
cluster.setupPrimary({
exec: __dirname + '/index.js',
});
for (let i = 0; i < cpuCount; i++) {
cluster.fork();
}
cluster.on('exit', (worker, code, signal) => {
console.log(`Worker ${worker.process.pid} has terminated.`);
console.log('Initiating replacement worker.');
cluster.fork();
});
This snippet calculates the number of available CPU cores using the `availableParallelism()` function and sets up the primary process details, including the path to the application's entry script (`index.js`). It then forks one worker process per CPU core in a loop and registers a callback that runs whenever a worker process exits.
Upon a worker's exit, a new process is launched immediately to replace the terminated one, ensuring the application continues to utilize all available CPUs effectively.
Return to the first terminal and stop the current server with `Ctrl+C`, then execute the `cluster.js` file as follows:
node cluster.js
You'll observe the following output:
Primary pid=111556
App (PID: 111567) is listening on port 3000
App (PID: 111569) is listening on port 3000
App (PID: 111568) is listening on port 3000
App (PID: 111570) is listening on port 3000
The output indicates five operational processes: one primary process (the script you executed) and four worker processes spawned from it. Each worker process listens on the same port, 3000.
The `cluster` module uses round-robin load balancing to distribute requests among the worker processes. The primary process receives all incoming requests and acts as the dispatcher, forwarding each request to the workers in turn. This ensures an even distribution of requests, preventing any single process from being overloaded while others sit idle.
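Round-robin is the default on every platform except Windows. If you want to pin down the behavior explicitly, the module exposes a scheduling-policy setting; here's a minimal sketch (it must be set before any workers are forked):
import cluster from 'node:cluster';

// SCHED_RR enables round-robin in the primary; SCHED_NONE leaves
// connection handling to the operating system instead.
cluster.schedulingPolicy = cluster.SCHED_RR;

// The same can be done without code changes via the environment:
// NODE_CLUSTER_SCHED_POLICY=rr node cluster.js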
In the second terminal, let's retest the endpoint to gauge the performance improvements:
npx autocannon --renderStatusCodes http://localhost:3000/read-content
The test yields the following results:
Running 10s test @ http://localhost:3000/read-content
10 connections
┌─────────┬──────┬──────┬───────┬──────┬─────────┬─────────┬───────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼──────┼──────┼───────┼──────┼─────────┼─────────┼───────┤
│ Latency │ 0 ms │ 0 ms │ 3 ms │ 5 ms │ 0.42 ms │ 1.02 ms │ 28 ms │
└─────────┴──────┴──────┴───────┴──────┴─────────┴─────────┴───────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬────────┬──────────┬─────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼─────────┼─────────┼─────────┼─────────┼────────┼──────────┼─────────┤
│ Req/Sec │ 4,987 │ 4,987 │ 10,599 │ 12,271 │ 9,928 │ 2,007.92 │ 4,987 │
├───────────┼─────────┼─────────┼─────────┼─────────┼────────┼──────────┼─────────┤
│ Bytes/Sec │ 1.26 MB │ 1.26 MB │ 2.67 MB │ 3.09 MB │ 2.5 MB │ 506 kB │ 1.26 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴────────┴──────────┴─────────┘
┌──────┬────────┐
│ Code │ Count │
├──────┼────────┤
│ 200 │ 109199 │
└──────┴────────┘
Req/Bytes counts sampled once per second.
# of samples: 11
109k requests in 11.02s, 27.5 MB read
The output shows that the server processed about 109,000 requests, up from roughly 94,000 in the baseline test. The average request rate also climbed to 9,928 requests per second from the earlier 8,563. The increased throughput reflects a significant improvement in the application's performance.
This performance improvement is due to the `cluster` module creating several worker processes and distributing the load among them equally. The operating system's scheduler determines how and where to run each worker process, depending on CPU availability, load, and scheduling policies; often, the OS assigns one worker process per CPU.
Stop the `cluster.js` process in the first terminal using `Ctrl+C`.
With clustering implemented, our application can now efficiently handle more requests at a much faster rate.
Interprocess communication
In a cluster setup, each process operates within its own isolated memory space. However, there are scenarios where these processes need to communicate.
Practical examples include one worker process reading data from the filesystem and sending it to the others, or a single process fetching data from the network or handling an offloaded CPU-bound task and then sharing the result with the other workers for further processing.
The worker processes can achieve this via an IPC (Inter-Process Communication) channel, which facilitates the exchange of messages between the parent and worker processes.
To send a message from a worker process, use the `process.send()` method:
process.send({
msgFromWorker: `Message sent from a worker.`
});
The `send()` method transmits a message to the primary process. In the primary process, you can listen for messages using `worker.on('message')`, like so:
worker.on('message', (msg) => {
// Handle received message here
});
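Messages can also flow in the opposite direction: the primary process can call `worker.send()`, and the worker listens with `process.on('message')`. A small sketch:
// In the primary process, after forking a worker:
worker.send({ msgFromPrimary: 'Hello, worker.' });

// In the worker process:
process.on('message', (msg) => {
  console.log('Worker received:', msg);
});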
To observe this messaging in action, create a `messaging.js` file with the following contents:
import cluster from 'node:cluster';
import { availableParallelism } from 'node:os';
import process from 'node:process';
const numCPUs = availableParallelism();
if (cluster.isPrimary) {
console.log(`Primary ${process.pid} is running.`);
for (let i = 0; i < numCPUs; i++) {
const worker = cluster.fork();
// Receive messages from workers and handle them in the Primary process.
worker.on('message', msg => {
console.log(
`Primary ${process.pid} received a message from worker ${worker.process.pid}:`,
msg
);
});
}
} else if (cluster.isWorker) {
console.log(`Worker ${process.pid} is active.`);
// Send a message to the Primary process.
process.send({
msgFromWorker: `Message sent from worker ${process.pid}.`,
});
}
The `cluster.isPrimary` condition ensures that the code inside it executes only in the primary process. Within it, you attach a `worker.on('message')` listener to each forked worker to receive its messages, logging each message along with the sending worker's process ID. The `cluster.isWorker` condition ensures that its code executes only in worker processes. Each worker calls `process.send()` with a message indicating that it was sent from that worker, along with its process ID.
Run the file as follows:
node messaging.js
You'll see output similar to:
Primary 96715 is running.
Worker 96726 is active.
Primary 96715 received a message from worker 96726: { msgFromWorker: 'Message sent from worker 96726.' }
Worker 96727 is active.
Primary 96715 received a message from worker 96727: { msgFromWorker: 'Message sent from worker 96727.' }
Worker 96729 is active.
Primary 96715 received a message from worker 96729: { msgFromWorker: 'Message sent from worker 96729.' }
Worker 96728 is active.
Primary 96715 received a message from worker 96728: { msgFromWorker: 'Message sent from worker 96728.' }
In this output, several workers, identified by their process IDs (PIDs), are active and communicating with the primary process (PID 96715, which will differ on your system). Each logged message carries data under the `msgFromWorker` property.
Now that you are familiar with interprocess communication, it's worth noting that Node.js clustering is one of many available solutions. Another popular option is PM2, which you'll explore in the next section.
Using PM2 to cluster a Node.js application
In the previous section, you enhanced the application's performance using the Node.js `cluster` module. This section explores PM2, a process manager built on top of the `cluster` module. PM2 simplifies scaling by deploying multiple processes across the available CPU cores and efficiently distributing requests among them.
To begin, return to the first terminal and install PM2:
npm install -g pm2
With PM2, scaling an application becomes simpler, eliminating the need for a separate file like the `cluster.js` used in the previous approach.
To scale the application, use the following command:
pm2 start index.js -i 0
The `-i` flag specifies the number of instances PM2 should launch. Setting it to `0` instructs PM2 to create as many instances of the application as there are CPUs on the machine.
Executing the command yields an output similar to this:
[PM2] Starting /home/stanley/scaling-nodejs/index.js in cluster_mode (0 instance)
[PM2] Done.
┌────┬──────────┬─────────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┬──────────┬──────────┬──────────┬──────────┐
│ id │ name │ namespace │ version │ mode │ pid │ uptime │ ↺ │ status │ cpu │ mem │ user │ watching │
├────┼──────────┼─────────────┼─────────┼─────────┼──────────┼────────┼──────┼───────────┼──────────┼──────────┼──────────┼──────────┤
│ 0 │ index │ default │ 1.0.0 │ cluster │ 112344 │ 0s │ 0 │ online │ 0% │ 51.2mb │ stanley │ disabled │
│ 1 │ index │ default │ 1.0.0 │ cluster │ 112351 │ 0s │ 0 │ online │ 0% │ 49.2mb │ stanley │ disabled │
│ 2 │ index │ default │ 1.0.0 │ cluster │ 112362 │ 0s │ 0 │ online │ 0% │ 47.0mb │ stanley │ disabled │
│ 3 │ index │ default │ 1.0.0 │ cluster │ 112373 │ 0s │ 0 │ online │ 0% │ 37.7mb │ stanley │ disabled │
└────┴──────────┴─────────────┴─────────┴─────────┴──────────┴────────┴──────┴───────────┴──────────┴──────────┴──────────┴──────────┘
As observed, PM2 has automatically created a cluster of four worker processes because the system has four cores. These processes are ready to handle incoming requests and efficiently use system resources.
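If you prefer a declarative setup, the same configuration can be captured in a PM2 ecosystem file. The sketch below makes a few assumptions: the app name is illustrative, and the `.cjs` extension is used because this project relies on ES modules while PM2 loads its config as CommonJS:
// ecosystem.config.cjs
module.exports = {
  apps: [
    {
      name: 'scaling-nodejs', // illustrative name
      script: './index.js',
      instances: 0, // 0 (or 'max') means one instance per CPU
      exec_mode: 'cluster',
    },
  ],
};
You would then start it with pm2 start ecosystem.config.cjs.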
Now load test the application to observe its performance with the new changes:
npx autocannon --renderStatusCodes http://localhost:3000/read-content
You'll receive output looking like this:
Running 10s test @ http://localhost:3000/read-content
10 connections
┌─────────┬──────┬──────┬───────┬──────┬─────────┬─────────┬───────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼──────┼──────┼───────┼──────┼─────────┼─────────┼───────┤
│ Latency │ 0 ms │ 0 ms │ 3 ms │ 5 ms │ 0.44 ms │ 1.07 ms │ 24 ms │
└─────────┴──────┴──────┴───────┴──────┴─────────┴─────────┴───────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬──────────┬─────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼──────────┼─────────┤
│ Req/Sec │ 5,415 │ 5,415 │ 10,167 │ 11,591 │ 9,684 │ 1,611.39 │ 5,415 │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼──────────┼─────────┤
│ Bytes/Sec │ 1.36 MB │ 1.36 MB │ 2.56 MB │ 2.92 MB │ 2.44 MB │ 406 kB │ 1.36 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴──────────┴─────────┘
┌──────┬────────┐
│ Code │ Count │
├──────┼────────┤
│ 200 │ 106516 │
└──────┴────────┘
Req/Bytes counts sampled once per second.
# of samples: 11
107k requests in 11.02s, 26.8 MB read
On my test machine, the performance results from using PM2 closely mirror those achieved with the Node.js `cluster` module, though slightly fewer requests were processed (107,000 versus 109,000 in the previous test) at a slightly lower average rate (9,684 requests per second versus 9,928).
Overall, performance changes are minimal when using PM2. However, scaling a Node.js application with PM2 is much simpler and more accessible than manually configuring clustering. Often, setting up PM2 suffices for scaling purposes.
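PM2 can also resize the cluster at runtime. For example, to scale the app from the process table above (named index) down to two instances:
pm2 scale index 2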
To stop PM2 processes, you can use the following command:
pm2 delete all
The command halts all PM2-managed processes and cleans up the environment, producing output like this:
[PM2] Applying action deleteProcessId on app [all](ids: [ 0, 1, 2, 3 ])
[PM2] [index](1) ✓
[PM2] [index](0) ✓
[PM2] [index](2) ✓
[PM2] [index](3) ✓
┌────┬───────────┬─────────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┬──────────┬──────────┬──────────┬──────────┐
│ id │ name │ namespace │ version │ mode │ pid │ uptime │ ↺ │ status │ cpu │ mem │ user │ watching │
└────┴───────────┴─────────────┴─────────┴─────────┴──────────┴────────┴──────┴───────────┴──────────┴──────────┴──────────┴──────────┘
Now that you understand the benefits of configuring a cluster with PM2, the next section will explore the common challenges encountered when clustering.
Common challenges when clustering
To maximize the benefits of clustering, it's essential to be aware of the potential challenges that may arise, allowing you to prevent them or find effective workarounds.
Firstly, the worker processes are distinct entities with isolated memory spaces and do not inherently share data. This can pose challenges, particularly if your application stores session and login data in memory. To mitigate this issue, developing applications with minimal state is advisable.
Another challenge arises because Node.js does not automatically adjust the number of worker processes based on the available CPUs; you must size the worker pool in your own code to ensure optimal use of system resources. As a best practice, use the `os.availableParallelism()` function rather than relying solely on `os.cpus().length` to determine the appropriate number of worker processes. If you can't use `os.availableParallelism()`, keep in mind that the primary process also consumes CPU resources: subtracting one from the total core count leaves a core for the primary process while the remaining cores are allocated to workers, which helps maintain a balanced distribution of processing power across the application.
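As a concrete illustration, here's a minimal sketch of sizing the worker pool, with a fallback for older Node.js releases that lack availableParallelism():
import os from 'node:os';

// Prefer availableParallelism() (Node.js 18.14+); otherwise leave one
// core free for the primary process.
const workerCount =
  typeof os.availableParallelism === 'function'
    ? os.availableParallelism()
    : Math.max(1, os.cpus().length - 1);

console.log(`Forking ${workerCount} workers`);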
In addition to distributing requests with the round-robin approach, the Node.js `cluster` module supports letting workers accept connections directly. However, this can sometimes lead to unbalanced load distribution due to OS scheduler quirks, resulting in uneven performance across worker processes.
Finally, there is limited control over application ports, as each worker process receives the same port. This can be restrictive, particularly if your application requires unique ports for specific functionalities.
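If your setup does require distinct ports per worker, one workaround is to pass the port through the environment when forking. A sketch, reusing the cpuCount variable from cluster.js:
// In the primary: give each worker its own port via its environment.
for (let i = 0; i < cpuCount; i++) {
  cluster.fork({ PORT: 3000 + i });
}

// In the worker (index.js): read it back, falling back to 3000.
const PORT = Number(process.env.PORT) || 3000;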
Preparing applications for clustering
To ensure smooth scaling of your application, it's crucial to prepare it appropriately. Here are some essential tips to consider:
Firstly, aim to design your application to be as stateless as possible. Avoid storing critical information, such as session data, in the memory of a worker process; otherwise, only the instance holding that data can serve the requests that depend on it. Instead, opt for a shared storage solution accessible to all instances, as sketched below. For session-specific data, in-memory stores like Redis or Memcached are commonly used; alternatively, databases like PostgreSQL or MySQL can be suitable depending on the nature of the data.
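As a sketch of what shared session storage can look like, assuming the redis client package (v4+) is installed and a Redis server is running locally; the session ID here is a hypothetical placeholder:
import { createClient } from 'redis';

const client = createClient(); // defaults to redis://localhost:6379
await client.connect();

const sessionId = 'abc123'; // hypothetical session identifier

// Any worker can write session data...
await client.set(`session:${sessionId}`, JSON.stringify({ userId: 42 }));

// ...and any other worker can read it back.
const session = JSON.parse(await client.get(`session:${sessionId}`));
console.log(session); // { userId: 42 }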
Another good practice is implementing graceful shutdown mechanisms to handle shutdown signals effectively. Additionally, ensure that a new worker is spawned if one fails, thus maintaining high availability. In our solution, we incorporated the following code:
cluster.on("exit", (worker, code, signal) => {
console.log(`Worker ${worker.process.pid} has terminated.`);
console.log("Initiating replacement worker.");
cluster.fork();
})
This code snippet ensures that when a worker crashes, a replacement worker is spawned to maintain system stability.
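Graceful shutdown itself can be handled inside each worker. A minimal sketch, assuming server is the value returned by app.listen() in index.js:
process.on('SIGTERM', () => {
  console.log(`Worker ${process.pid} is shutting down...`);
  // Stop accepting new connections; in-flight requests finish first.
  server.close(() => {
    process.exit(0);
  });
});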
Lastly, monitoring and troubleshooting issues can become challenging when dealing with multiple worker processes. To ease this process, consider using a logging tool such as Pino or Winston to generate logs, including process-specific details. These logs can then be aggregated and forwarded to a centralized location like Better Stack for analysis.
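For example, Pino includes the process ID in every log line by default, which makes it easy to tell workers apart once logs are aggregated. A sketch, assuming pino is installed:
import pino from 'pino';

const logger = pino();

// Each entry automatically carries this worker's pid and hostname.
logger.info(`Worker ${process.pid} handled a request`);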
Final thoughts
This article explored various deployment strategies for Node.js applications, focusing on clustering techniques using the `cluster` module and PM2. Each approach underwent load testing to evaluate its impact on application performance.
Clustering with the Node.js `cluster` module showed the stronger performance metrics, processing more requests per second. PM2 followed closely; although it lagged slightly behind in requests processed, the difference was not significant.
In conclusion, the appropriate deployment strategy depends on the application's specific requirements, considering factors such as expected traffic load, resource utilization, and scalability needs. PM2 simplifies deploying and scaling a Node.js application with minimal code adjustments, although it may not quite match the performance of the `cluster` module. The `cluster` module, despite requiring more configuration, offers robust performance when fine-tuned appropriately.