Subprocesses Part 9: Exec

If you read any books or articles on UNIX process management, you’ll notice that two system calls come up over and over again: fork and exec. These two calls are the twin pillars of process spawning on POSIX systems, and a lot of documentation simply assumes that you understand how they work.

But maybe you come from a different OS background. Or maybe you’ve always been afraid to ask. Or maybe you have a basic understanding of these calls, but you aren’t entirely clear on how to apply them in practical ways.

Today we’re going to tackle the second of these two essential system calls: exec. We’ll learn how it works by building a practical development tool in Ruby: a script that kicks off a local, directory-specific instance of the Apache web server.

Video transcript & code

Sometimes when we're starting a new program from within Ruby, a subprocess isn't quite what we want.

For instance, let's say we want to be able to start an Apache server process within the current directory. So we've written a small script to make this possible.

This script starts by grabbing a port number from the command line.

Then it creates a temporary directory to function as an Apache server root.

Next, it writes a custom Apache configuration file inside the server root directory.

Finally, it outputs its own process ID…

…and then starts an Apache subprocess, passing in the server root directory path.

require "tmpdir"
require "pathname"

port        = ARGV.shift or fail "Port number argument required"
port        = Integer(port)
server_root = Pathname(Dir.mktmpdir("apache2"))
doc_root    = Pathname.pwd.expand_path

(server_root / "apache2.conf").write <<~"EOF"
  LoadModule mpm_prefork_module /usr/lib/apache2/modules/mod_mpm_prefork.so
  LoadModule authz_core_module /usr/lib/apache2/modules/mod_authz_core.so
  LoadModule dir_module /usr/lib/apache2/modules/mod_dir.so
  ServerName localhost
  PidFile #{server_root}/apache.pid
  ErrorLog /dev/stderr
  TransferLog /dev/stdout
  Listen #{port}
  DirectoryIndex index.html
  ServerAdmin webmaster@localhost
  DocumentRoot #{doc_root}
EOF

puts "**** PID: #{Process.pid}"
system "apache2", *%W[-X -d #{server_root}]

Let's try this out. We execute the script, with a port number as an argument.

$ ./mypache 8881
**** PID: 14478

We can see that it outputs a process ID.

But if we look up that process ID, it isn't for an Apache process.

$ ps 14478
  PID TTY      STAT   TIME COMMAND
14478 pts/18   Sl+    0:00 ruby ./mypache 8881

It's for the Ruby process.

If we dump all the processes for the current user, we can see both the Ruby process…

…and the child Apache process.

$ ps u
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
avdi     14478  0.0  0.1  46564  9908 pts/18   Sl+  20:54   0:00 ruby ./mypache
avdi     14481  0.0  0.0  29324  3296 pts/18   S+   20:54   0:00 apache2 -X -d /

The pstree command makes the relationship more clear.

$ pstree 14478
ruby─┬─apache2
     └─{ruby-timer-thr}
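
We can also check this parent/child relationship from inside Ruby. The following is only a rough sketch: a small sh command stands in for Apache, and a pipe captures its output. Both the stand-in command and the pipe are assumptions for illustration, not part of the mypache script.

```ruby
# Ask a child started with `system` who its parent is.
reader, writer = IO.pipe

# `sh` stands in for Apache here; $PPID is the shell's parent process ID.
system "sh", "-c", 'echo $PPID', out: writer
writer.close

parent_seen_by_child = Integer(reader.read.strip)
puts "our PID:          #{Process.pid}"
puts "child's parent:   #{parent_seen_by_child}"
puts "child of Ruby?    #{parent_seen_by_child == Process.pid}"
```

The child reports our Ruby process as its parent, which is the same picture pstree gave us: Ruby sticks around, and the program it started runs underneath it.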

This isn't the end of the world, but it seems a bit messy. It's also inconvenient: if we wanted to send a signal to the Apache process using kill, and we used the PID dumped by the script, we'd be sending it to the wrong process, and it wouldn't do what we expected.

$ kill -SIGUSR1 14478

Another non-obvious problem is that when the script finally terminates, it hides Apache's exit status: the script exits with its own status, not the one Apache returned.

In order to forward the exit status, we have to add another line at the end of the script.

exit $?.exitstatus
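
To see what there is to forward, here is a minimal sketch of what system leaves behind when the child finishes. The sh command that exits with status 3 is a hypothetical stand-in for Apache, chosen only so the example runs anywhere.

```ruby
# `system` returns true, false, or nil, and sets $? to a Process::Status.
succeeded = system("sh", "-c", "exit 3")   # stand-in child that fails with status 3
status    = $?

puts "succeeded:  #{succeeded.inspect}"    # false, because the status was non-zero
puts "exitstatus: #{status.exitstatus}"    # 3

# A wrapper script that wants to pass this along would finish with something like:
#   exit status.exitstatus
```

Without that final exit, the wrapper would finish with a status of 0 regardless of how the child ended, which is exactly the kind of hiding we're talking about.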

The fact is, once our script is done setting things up for Apache to run, and kicks off the Apache process, it's no longer serving any purpose. It's just getting in the way.

In this case, instead of starting a subprocess, what we'd really like to do is just hand control over to Apache at the end of our script. And that's exactly what we can do, by replacing system with exec.

require "tmpdir"
require "pathname"

port        = ARGV.shift or fail "Port number argument required"
port        = Integer(port)
server_root = Pathname(Dir.mktmpdir("apache2"))
doc_root    = Pathname.pwd.expand_path

(server_root / "apache2.conf").write <<~"EOF"
  LoadModule mpm_prefork_module /usr/lib/apache2/modules/mod_mpm_prefork.so
  LoadModule authz_core_module /usr/lib/apache2/modules/mod_authz_core.so
  LoadModule dir_module /usr/lib/apache2/modules/mod_dir.so
  ServerName localhost
  PidFile #{server_root}/apache.pid
  ErrorLog /dev/stderr
  TransferLog /dev/stdout
  Listen #{port}
  DirectoryIndex index.html
  ServerAdmin webmaster@localhost
  DocumentRoot #{doc_root}
EOF

puts "**** PID: #{Process.pid}"
exec "apache2", *%W[-X -d #{server_root}]

exec is a system call available on UNIX-like systems, and Ruby exposes it as the Kernel#exec method. It behaves like a subprocess-starting command, except that instead of keeping the original process around, the new process replaces the old one.
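
One consequence of this replacement is that nothing written after a successful exec ever runs. Here is a small sketch of that, under some assumptions: we use fork (only so that our own demo process survives to report the result), a pipe to capture output, and echo as a stand-in for Apache. None of these are part of the mypache script.

```ruby
reader, writer = IO.pipe

child_pid = fork do
  reader.close
  $stdout.reopen(writer)            # send the child's stdout into the pipe
  exec "echo", "hello from echo"    # the child's Ruby interpreter is replaced here
  puts "never printed"              # unreachable once exec has succeeded
end

writer.close
output = reader.read                # everything the child wrote before exiting
Process.wait(child_pid)

puts output
```

The only output is the line from echo. By the time the puts would have run, there is no Ruby left in that process to run it.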

We can see what this means if we give the modified script a try.

$ ./mypache 8881
**** PID: 16887

This time, when we request a process listing for the displayed PID, we don't find a Ruby process. We find an Apache process in its place.

This illustrates the fact that when we say that exec "replaces" the previous process, we're really not talking about one process ending when the next one begins. Rather, it's more like the new program takes over the old process. It keeps the old process ID, along with the rest of the process environment that was set up before the exec call, such as environment variables and the current working directory.
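
Here is a rough sketch that checks both claims at once. As assumptions for the demo, sh stands in for Apache, fork keeps our own process alive so it can compare notes, and MYPACHE_DEMO is a made-up environment variable name.

```ruby
reader, writer = IO.pipe
ENV["MYPACHE_DEMO"] = "set before exec"   # hypothetical variable, for illustration only

child_pid = fork do
  reader.close
  $stdout.reopen(writer)
  # In sh, $$ is the shell's own PID. Because exec took over the forked
  # Ruby child, that PID should be the one fork handed back to the parent.
  exec "sh", "-c", 'echo "$$ $MYPACHE_DEMO"'
end

writer.close
reported_pid, reported_env = reader.read.strip.split(" ", 2)
Process.wait(child_pid)

puts "same PID:    #{Integer(reported_pid) == child_pid}"
puts "environment: #{reported_env}"
```

The shell reports the same PID that the Ruby child had, and it can still see the variable we set before the handoff.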

exec is most often used as part of the low-level implementation of higher-level process-management abstractions. But sometimes, when we just need to write a script that sets the stage before handing off to another process, it's just the tool we need.

Oh, and before I go, I want to clarify one thing: in other episodes we've talked about Ruby's #class_exec and #instance_exec methods. Despite the similarity in naming, this exec method has absolutely nothing to do with those.

Happy hacking!
