Single-threaded, non-blocking performance in Node works great for a single process. But eventually, one process in one CPU is not going to be enough to handle the increasing workload of your application.
Using multiple processes is the best way to scale a Node application. Node is designed for building distributed applications with many nodes.
The Child Process Module
We can easily spin up a child process using Node's child_process module, and those child processes can easily communicate with each other through a messaging system.

The child_process module enables us to access operating system functionality by running any system command inside a child process.

We can control that child process's input stream and listen to its output stream. We can also control the arguments to be passed to the underlying OS command, and we can do whatever we want with that command's output.

There are four different ways to create a child process in Node: spawn(), fork(), exec(), and execFile().
Spawned Child Process
The spawn function launches a command in a new process, and we can use it to pass that command any arguments. For example, here's code to spawn a new process that will execute the pwd command:
const { spawn } = require('child_process')

const child = spawn('pwd')
We simply destructure the spawn function out of the child_process module and execute it with the OS command as the first argument.

The result of executing the spawn function (the child object above) is a ChildProcess instance, which implements the EventEmitter API. This means we can register handlers for events on this child object directly. For example, we can do something when the child process exits by registering a handler for the exit event:
child.on('exit', function (code, signal) {
  console.log(`child process exited with code ${code} and signal ${signal}`)
})
The handler above gives us the exit code for the child process and the signal, if any, that was used to terminate the child process. This signal variable is null when the child process exits normally.

The other events that we can register handlers for with the ChildProcess instances are disconnect, error, close, and message:
- The disconnect event is emitted when the parent process manually calls the child.disconnect function.
- The error event is emitted if the process could not be spawned or killed (see the sketch after this list).
- The close event is emitted when the stdio streams of a child process get closed.
- The message event is the most important one. It's emitted when the child process uses the process.send() function to send messages. This is how parent/child processes can communicate with each other.
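Handlers for these events are registered the same way as the exit handler above. Here's a minimal sketch, reusing the child object from the spawn example:

child.on('error', (err) => {
  // Emitted if the process could not be spawned or killed
  console.error('Failed to start or kill child process:', err)
})

child.on('close', (code) => {
  // Emitted after the child's stdio streams have been closed
  console.log(`child stdio streams closed, exit code ${code}`)
})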
Every child process also gets the three standard stdio streams, which we can access using child.stdin, child.stdout, and child.stderr.

When those streams get closed, the child process that was using them will emit the close event. This close event is different from the exit event, because multiple child processes might share the same stdio streams, and so one child process exiting does not mean that the streams got closed.

Since all streams are event emitters, we can listen to different events on those stdio streams that are attached to every child process. Unlike in a normal process though, in a child process, the stdout/stderr streams are readable streams while the stdin stream is a writable one. This is basically the inverse of those types as found in a main process. The events we can use for those streams are the standard ones. Most importantly, on the readable streams, we can listen to the data event, which will have the output of the command or any error encountered while executing the command:
child.stdout.on('data', (data) => {
  console.log(`child stdout:\n${data}`)
})

child.stderr.on('data', (data) => {
  console.error(`child stderr:\n${data}`)
})
The two handlers above will log both cases to the main process stdout and stderr. When we execute the spawn function above, the output of the pwd command gets printed and the child process exits with code 0, which means no error occurred.

We can pass arguments to the command that's executed by the spawn function using the second argument of the spawn function, which is an array of all the arguments to be passed to the command:
const child = spawn('find', ['.', '-type', 'f'])
If an error occurs during the execution of the command, for example, if we give find an invalid destination above, the child.stderr data event handler will be triggered and the exit event handler will report an exit code of 1, which signifies that an error has occurred.

A child process's stdin is a writable stream. We can use it to send a command some input. Just like any writable stream, the easiest way to consume it is using the pipe function. We simply pipe a readable stream into a writable stream. Since the main process's stdin is a readable stream, we can pipe that into a child process's stdin stream:
const { spawn } = require('child_process')

const child = spawn('wc')

// Pipe whatever we type in the terminal into the wc command
process.stdin.pipe(child.stdin)

child.stdout.on('data', (data) => {
  console.log(`child stdout:\n${data}`)
})
In the example above, the child process invokes the wc command, which counts lines, words, and characters in Linux. When we pipe the main process stdin (which is a readable stream) into the child process stdin (which is a writable stream), we get a standard input mode where we can type something, and when we hit Ctrl+D, what we typed will be used as the input of the wc command.

We can also pipe the standard input/output of multiple processes into each other, just like we can do with Linux commands. For example, we can pipe the stdout of the find command to the stdin of the wc command to count all the files in the current directory:
const { spawn } = require('child_process')

const find = spawn('find', ['.', '-type', 'f'])
const wc = spawn('wc')

// Equivalent of the shell pipeline: find . -type f | wc
find.stdout.pipe(wc.stdin)

wc.stdout.on('data', (data) => {
  console.log(`child stdout:\n${data}`)
})
Add -l to the wc command to make it count only the lines.
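That is, the second spawn call in the example above becomes:

const wc = spawn('wc', ['-l'])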
Shell Syntax and the exec Function
By default, the spawn function does not create a shell to execute the command we pass into it. This makes it slightly more efficient than the exec function, which does create a shell. The exec function has one other major difference: it buffers the command's generated output and passes the whole output value to a callback function (instead of using streams, which is what spawn does).

Here's the previous find | wc example implemented with the exec function:
const { exec } = require('child_process')

exec('find . -type f | wc -l', (err, stdout, stderr) => {
  if (err) {
    console.error(`exec error: ${err}`)
    return
  }

  console.log(`Number of files ${stdout}`)
})
Since the exec function uses a shell to execute the command, we can use the shell syntax directly here, making use of the shell pipe feature.

Note that using the shell syntax comes at a security risk if you're executing any kind of dynamic input provided externally. A user can simply do a command injection attack using shell syntax characters like ; and $ (for example, command + '; rm -rf ~').

The exec function buffers the output and passes it to the callback function (the second argument to exec) as the stdout argument there. This stdout argument is the command's output that we want to print out.
The exec function is a good choice if you need to use the shell syntax and if the size of the data expected from the command is small. (Remember, exec will buffer the whole data in memory before returning it.)

The spawn function is a much better choice when the size of the data expected from the command is large, because that data will be streamed with the standard IO objects.

We can make the spawned child process inherit the standard IO objects of its parent if we want to, but also, more importantly, we can make the spawn function use the shell syntax as well. Here's the same find | wc command implemented with spawn:
const child = spawn('find . -type f | wc -l', {
  stdio: 'inherit',
  shell: true
})
Because of the stdio: 'inherit' option above, when we execute the code, the child process inherits the main process stdin, stdout, and stderr. This causes the child process data event handlers to be triggered on the main process.stdout stream, making the script output the result right away.

Because of the shell: true option above, we were able to use the shell syntax in the passed command, just like we did with exec. But with this code, we still get the advantage of the streaming of data that the spawn function gives us. This is really the best of both worlds.
There are a few other good options we can use in the last argument to the child_process functions besides shell and stdio. We can, for example, use the cwd option to change the working directory of the script.
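For example, here's a minimal sketch (the /tmp directory is just an assumed target) that runs the same find | wc pipeline from a different directory:

const { spawn } = require('child_process')

// Count files under /tmp instead of the current working directory
const child = spawn('find . -type f | wc -l', {
  stdio: 'inherit',
  shell: true,
  cwd: '/tmp'
})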
Another option we can use is the env option to specify the environment variables that will be visible to the new child process. The default for this option is process.env, which gives any command access to the current process environment. If we want to override that behavior, we can simply pass an empty object as the env option, or new values there, to be considered as the only environment variables:
const child = spawn('echo $ANSWER', {
  stdio: 'inherit',
  shell: true,
  env: { ANSWER: 42 }
})
The echo command above does not have access to the parent process's environment variables. It can't, for example, access $HOME, but it can access $ANSWER because it was passed as a custom environment variable through the env option.
One last important child process option to explain here is the detached option, which makes the child process run independently of its parent process.

Assuming we have a file timer.js that keeps the event loop busy:
setTimeout(() => {
  // keep the event loop busy
}, 20000)
We can execute it in the background using the detached option:
const { spawn } = require('child_process')

const child = spawn('node', ['timer.js'], {
  detached: true,
  stdio: 'ignore'
})

child.unref()
The exact behavior of detached child processes depends on the OS. On Windows, the detached child process will have its own console window, while on Linux the detached child process will be made the leader of a new process group and session.

If the unref function is called on the detached process, the parent process can exit independently of the child. This can be useful if the child is executing a long-running process, but to keep it running in the background, the child's stdio configurations also have to be independent of the parent.

The example above will run a node script (timer.js) in the background by detaching and also ignoring its parent stdio file descriptors, so that the parent can terminate while the child keeps running in the background.
The execFile function
If you need to execute a file without using a shell, the execFile function is what you need. It behaves exactly like the exec function but does not use a shell, which makes it a bit more efficient. On Windows, some files cannot be executed on their own, like .bat and .cmd files. Those files cannot be executed with execFile, and either exec or spawn with shell set to true is required to execute them.
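As a quick illustration (a minimal sketch; the node --version invocation is just an assumed example), execFile takes the file to run, an array of arguments, and a buffered-output callback like exec's:

const { execFile } = require('child_process')

// Run the node binary directly, without spawning a shell
execFile('node', ['--version'], (err, stdout, stderr) => {
  if (err) {
    console.error(`execFile error: ${err}`)
    return
  }

  console.log(`node version: ${stdout}`)
})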
The *Sync Function
The functions spawn, exec, and execFile from the child_process module also have synchronous blocking versions that will wait until the child process exits:
const {
  spawnSync,
  execSync,
  execFileSync,
} = require('child_process')
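For example, here's a minimal sketch of execSync, which blocks until the command completes and returns its buffered output directly instead of passing it to a callback:

const { execSync } = require('child_process')

// Blocks the event loop until the command finishes
const output = execSync('find . -type f | wc -l')

console.log(`Number of files ${output}`)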
The fork() function
The fork function is a variation of the spawn function for spawning node processes. The biggest difference between spawn and fork is that a communication channel is established to the child process when using fork, so we can use the send function on the forked object, along with the global process object itself, to exchange messages between the parent and forked processes. We do this through the EventEmitter module interface. Here's an example:
// parent.js
const { fork } = require('child_process')

const forked = fork('child.js')

forked.on('message', (msg) => {
  console.log('Message from child', msg)
})

forked.send({ hello: 'world' })

// child.js
process.on('message', (msg) => {
  console.log('Message from parent:', msg)
})

let counter = 0

setInterval(() => {
  process.send({ counter: counter++ })
}, 1000)
In parent.js, we fork child.js (which will execute the file with the node command), and then we listen for the message event. The message event will be emitted whenever the child uses process.send, which the child above does every second.

To pass down messages from the parent to the child, we can execute the send function on the forked object itself, and then, in the child script, we can listen to the message event on the global process object.

When executing the parent.js file above, it'll first send down the { hello: 'world' } object to be printed by the forked child process, and then the forked child process will send an incremented counter value every second to be printed by the parent process.
Let's do a more practical example using the fork function.

Let's say we have an HTTP server that handles two endpoints. One of these endpoints (/compute below) is computationally expensive and will take a few seconds to complete. We can use a long for loop to simulate that:
const http = require('http')

const longComputation = () => {
  let sum = 0
  for (let i = 0; i < 1e9; i++) {
    sum += i
  }
  return sum
}

const server = http.createServer()

server.on('request', (req, res) => {
  if (req.url === '/compute') {
    const sum = longComputation()
    return res.end(`Sum is ${sum}`)
  } else {
    res.end('Ok')
  }
})

server.listen(3000)
This program has a big problem: when the /compute endpoint is requested, the server will not be able to handle any other requests because the event loop is busy with the long for loop operation.

There are a few ways we can solve this problem depending on the nature of the long operation, but one solution that works for all operations is to just move the computational operation into another process using fork.

We first move the whole longComputation function into its own file and make it invoke that function when instructed via a message from the main process:
// compute.js
const longComputation = () => {
  let sum = 0
  for (let i = 0; i < 1e9; i++) {
    sum += i
  }
  return sum
}

// Run the long computation only when the parent asks for it
process.on('message', (msg) => {
  const sum = longComputation()
  process.send(sum)
})
When a request to /compute happens now with the above code, we can simply send a message to the forked process to start executing the long operation. The main process's event loop will not be blocked.

Once the forked process is done with the long operation, it can send its result back to the parent process using process.send.
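Here's a minimal sketch of what the updated server could look like (the 'start' message content is an arbitrary assumption; compute.js above only cares that a message arrived):

const http = require('http')
const { fork } = require('child_process')

const server = http.createServer()

server.on('request', (req, res) => {
  if (req.url === '/compute') {
    // Do the heavy work in a forked process so the event loop stays free
    const compute = fork('compute.js')
    compute.send('start')
    compute.on('message', (sum) => {
      res.end(`Sum is ${sum}`)
    })
  } else {
    res.end('Ok')
  }
})

server.listen(3000)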
Node's cluster module is based on this idea of child process forking and load balancing the requests among the many forks.
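As a taste of that idea (a minimal sketch; server.js is an assumed file containing an HTTP server like the one above):

const cluster = require('cluster')
const os = require('os')

if (cluster.isMaster) {
  // Fork one worker per CPU core; incoming connections are
  // load-balanced among the workers by the cluster module
  os.cpus().forEach(() => cluster.fork())
} else {
  require('./server')
}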