Spawning subprocesses smartly and securely
As part of your code, you may be inclined to call an external command to do something. But is it always a good idea? How do you do it safely? What happens behind the scenes?
This article is written from a general perspective, with a Unix/C bias and a very slight Python bias. The problems mentioned apply to all languages in most environments, including Windows.
Use the right tool for the job
By calling another process, you introduce a third-party dependency. That dependency isn’t controlled by your code, and your code becomes more fragile. The problems include:
the program is not installed, or even available, for the user’s OS of choice
the program is not in the $PATH your process gets, or the hard-coded path is not correct on the end user’s system
the program is in a different version (e.g. GNU vs. BSD, updates/patches), which means different option names or other behavior
the program’s output is not what you expected due to user config (including locale)
error reporting is based on numeric exit codes, and the meaning of those differs between programs (if they have meaning besides 0/1 in the first place)
On the other hand, if your code uses a lot of subprocesses, perhaps you should stay with Bash. You can do the harder parts with Python, Ruby, or some other language by calling them from within your Bash script.
Don’t spawn subprocesses if there’s an alternative
Spawning a subprocess always incurs a (minor) performance hit [1] compared to the alternatives. With that in mind, and with the resiliency issues listed above, you should always try to find an alternative to the external command.
The simplest ones are the basic Unix utilities. Replace grep, sed and awk with string operations and regular expressions. Filesystem utilities will have equivalents — for Python, in os or shutil. Your language of choice can also handle things like networking (don’t call curl), file compression, working with date/time…
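For instance, counting matching lines in a file does not need grep at all. A minimal sketch, with the file name and pattern made up for illustration:

import re

# Instead of: subprocess.run(["grep", "-c", "ERROR", "app.log"], ...)
pattern = re.compile(r"ERROR")
with open("app.log", encoding="utf-8") as log:
    matches = sum(1 for line in log if pattern.search(line))
print(matches)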
Similarly, you should check if there are packages available that already do what you want — library bindings or re-implementations. And if there isn’t, perhaps you could help the world by writing one of those and sharing it?
One more important thing: if the program is written in the same language as your code, you should try to import it and run it in the same process instead of spawning a subprocess, if that is feasible.
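A sketch of the idea, using the standard library’s json.tool as a stand-in for such a program (data.json is a made-up file name): instead of spawning a second interpreter, call the underlying module directly.

import json

# Instead of spawning: subprocess.run(["python", "-m", "json.tool", "data.json"])
with open("data.json", encoding="utf-8") as f:
    data = json.load(f)
print(json.dumps(data, indent=4))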
Security considerations: shells, spaces, and command injection
We come to the most important part of this article: how to spawn subprocesses
without compromising your system. When you spawn a subprocess on a typical Unix
system, fork()
is called, and your process is copied. Many modern Unix
systems have a copy-on-write implementation of that syscall, meaning that the
operation does not result in copying all the memory of the host process over.
Forking is (almost) immediately followed by calling execve()
(or a helper
function from the exec family) [2] in the child process — that function
transforms the calling process into a new process [3]. This technique is
called fork-exec and is the typical way to spawn a new process on Unix. [4]
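To make the fork-exec sequence concrete, here is a minimal sketch using Python’s thin wrappers over the same syscalls (Unix only, error handling omitted; the ls command is just an example):

import os

pid = os.fork()  # duplicate the current process
if pid == 0:
    # Child: replace this process image with ls ("ls" is looked up in $PATH).
    # execvp only comes back (by raising OSError) if the replacement failed.
    os.execvp("ls", ["ls", "-l"])
else:
    # Parent: wait for the child and report its exit status.
    _, status = os.waitpid(pid, 0)
    print("child exited with", os.WEXITSTATUS(status))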
There are two ways to access this API, from the C perspective:
directly, by calling fork() and exec*() (or posix_spawn()), and providing an array of arguments passed to the process, or
through the shell (sh), usually by calling system(). As Linux’s manpage for system(3) puts it, the system() library function uses fork(2) to create a child process that executes the shell command specified in command using execl(3) as follows:
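    execl("/bin/sh", "sh", "-c", command, (char *) NULL);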
If you go through the shell, you pass one string argument, whereas exec*() requires you to specify arguments separately. Let’s write a sample program to print all the arguments it receives. I’ll do it in Python to get a more readable output.
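Something like this will do:

#!/usr/bin/env python3
# argv.py: print the argument vector this process received
import sys
print(sys.argv)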
Let’s see what appears:
$ ./argv.py foo bar
['./argv.py', 'foo', 'bar']
$ ./argv.py 'foo bar'
['./argv.py', 'foo bar']
$ ./argv.py foo\ bar baz
['./argv.py', 'foo bar', 'baz']
$ ./argv.py $(date)
['./argv.py', 'Sat', 'Sep', '2', '16:54:52', 'CEST', '2017']
$ ./argv.py "$(date)"
['./argv.py', 'Sat Sep 2 16:54:52 CEST 2017']
$ ./argv.py /usr/*
['./argv.py', '/usr/X11', '/usr/X11R6', '/usr/bin', '/usr/include', '/usr/lib', '/usr/libexec', '/usr/local', '/usr/sbin', '/usr/share', '/usr/standalone']
$ ./argv.py "/usr/*"
['./argv.py', '/usr/*']
$ ./argv.py $EDITOR
['./argv.py', 'nvim']
$ $PWD/argv.py foo bar
['/Users/kwpolska/Desktop/blog/subprocess/argv.py', 'foo', 'bar']
$ ./argv.py a{b,c}d
['./argv.py', 'abd', 'acd']
$ python argv.py foo bar | cat
['argv.py', 'foo', 'bar']
$ python argv.py foo bar > foo.txt
$ cat foo.txt
['argv.py', 'foo', 'bar']
$ ./argv.py foo; ls /usr
['./argv.py', 'foo']
X11@ X11R6@ bin/ include/ lib/ libexec/ local/ sbin/ share/ standalone/
As you can see, the following things are handled by the shell (the process is unaware of this occurring):
quotes and escapes
expanding expressions in braces
expanding variables
wildcards (glob, *)
redirections and pipes (> >> |)
command substitution (backticks or $(…))
running multiple commands on the same line (; && || &)
The list is full of potential vulnerabilities. If end users are in control of the arguments passed, and you go through the shell, they can execute arbitrary commands or even get full shell access. Even in other cases, you’ll have to depend on the shell’s parsing, which introduces an unnecessary indirection.
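To make the risk concrete, here is a small sketch (the command and the malicious input are invented for illustration): with shell=True, a crafted “filename” turns a harmless listing into arbitrary command execution, while the list form passes it through as a single, inert argument.

import subprocess

user_input = "nonexistent; echo pwned"  # imagine this came from a web form

# Dangerous: the shell parses the semicolon and runs the injected command.
subprocess.run("ls -l " + user_input, shell=True)

# Safe: the whole string is passed as one literal argument to ls.
subprocess.run(["ls", "-l", user_input])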
TL;DR: How to do this properly in your language of choice
To ensure subprocesses are spawned securely, do not use the shell in between. If you need any of the operations I listed above as part of your command — wildcards, pipes, etc. — you will need to take care of them in your code; most languages have those features built in.
In C (Unix): Perform fork-exec by yourself, or use posix_spawn(). This also lets you communicate with the process if you open a pipe and make it the stdout of the child process. Never use system().
In Python: Use the subprocess module. Always pass shell=False and give it a list of arguments (see the sketch after this list). With asyncio, use asyncio.create_subprocess_exec (and not _shell), but note it takes *args and not a list. Never use os.system or os.popen.
In Ruby: Pass arrays to IO.popen. Pass multiple arguments to system() (system(["ls", "ls"]) or system("ls", "-l")). Never use %x{command} or backticks.
In Java: Pass arrays to Runtime.exec. Pass multiple arguments or a list to ProcessBuilder.
In PHP: All the standard methods go through the shell. Try escapeshellcmd() and escapeshellarg() — or better, switch to Python. Or anything, really.
In Go: os/exec and os.StartProcess are safe.
In Node.js: Use child_process.execFile or child_process.spawn with shell set to false.
Elsewhere: You should be able to specify multiple strings (using variadic arguments, arrays, or other standard data structures of your language of choice) as the command line. Otherwise, you might be running into something shell-related.
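To illustrate the Python advice above, here is a minimal sketch; the ls invocation is just an example command:

import asyncio
import subprocess

# Synchronous: a list of arguments, no shell involved.
result = subprocess.run(["ls", "-l", "/tmp"], capture_output=True, text=True, check=True)
print(result.stdout)

# Asynchronous: create_subprocess_exec takes *args, not a list.
async def run_ls():
    proc = await asyncio.create_subprocess_exec(
        "ls", "-l", "/tmp", stdout=asyncio.subprocess.PIPE
    )
    out, _ = await proc.communicate()
    print(out.decode())

asyncio.run(run_ls())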
The part where I pretend I know something about Windows
On Windows, argument lists are always passed to processes as strings (Python joins them semi-intelligently if it gets a list). Redirections and variables work in shell mode, but globs (asterisks) are always left for the called process to handle.
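If you are curious what “semi-intelligently” means, the joining is done by subprocess.list2cmdline, the (internal) helper Python’s subprocess uses on Windows, which quotes arguments according to the Microsoft C runtime rules. A rough illustration:

import subprocess

# Roughly what Python does with a list on Windows before handing the
# command line to CreateProcess (quoting follows the MS C runtime rules).
print(subprocess.list2cmdline(["myprog.exe", "foo bar", 'say "hi"']))
# prints: myprog.exe "foo bar" "say \"hi\""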
Some useful functions are implemented as shell built-ins — in that case, you need to call them via the shell.
Internals: There is no fork() on Windows. Instead, CreateProcess(), ShellExecute(), or the lower-level spawn*() functions are used. Shell-based calls go through cmd.exe /c.