Subprocesses Part 12: Fork and Exec

In this episode, you’ll learn how the fork() and exec() calls work together to form the twin pillars of the UNIX process-spawning architecture. You’ll learn the theory behind their operation, and then dive into concrete examples with Ruby.

Video transcript & code

As we saw in a recent episode, in most other operating systems that came both before and after UNIX, subprocesses are started by telling the operating system the location of the program to execute. The operating system then allocates a new process to run the specified program from its beginning.

UNIX, and the UNIX-like operating systems it gave rise to, were different. UNIX didn't have a traditional process-spawning system call. Instead, it had the fork and exec system calls.

In earlier episodes we learned about these system calls individually.

We learned about the exec system call. We saw that it can be used to replace the current process with a completely different program, while retaining process state like environment variables and standard input and output streams.

We also learned about the fork call. We saw how it effectively splits the current process into two: the original parent process, and a cloned child process which inherits all of the parent process's state and proceeds forward from the point at which the parent process called fork.

These two system calls form the pillars of subprocess creation on UNIX-like systems. We've seen in earlier episodes how they are each useful on their own. When we put them together, they can be used to spawn new subprocesses.

The essential concept is pretty straightforward. Rather than having a single subprocess-spawning system call, we first fork the current process into a parent and a child process running the same program. Then, in the child process, we invoke exec to replace the original program with a different one.

This is consistent with the overall UNIX philosophy of having small, single-purpose tools which can be composed together to achieve more complex goals.

That's the basic idea. Let's see how this plays out in Ruby code.

We'll write the Ruby program's process ID to standard out.

Then, we'll fork. Inside the child process, we'll write the new process ID. Then we'll exec the UNIX date utility to print the current date and time.

When we run this, we can see that the parent and child have different process IDs, and that the UNIX date utility is executed and does what it is supposed to do.

puts "Parent ID: #{$$}"
fork do
  puts "Child ID: #{$$}"
  exec "date"
end

# >> Parent ID: 19295
# >> Child ID: 19304
# >> Wed Jul 26 08:56:56 EDT 2017

This isn't a very exciting demonstration. Let's do something more interesting.

A common use for child processes is to do some kind of processing and then report the results back to the parent process. To do that, we'll add a pipe, using Ruby's IO.pipe method.

Remember, the child process is an exact duplicate of the parent up to the point that they forked, so both processes have access to the filehandles associated with the pipe. Inside the child process, we'll use the trick we learned in Episode #29 to redirect all standard output to the "write" end of the pipe.

Then, back in the parent, we'll read a line of text from the child process, using the "read" end of the pipe. Reads from a pipe block by default, so this call will wait until the date program running in the child process has written a complete line of output.

Finally, we output the current date, using the text we got from the child process.

When we run this, we can see that rather than being printed directly to standard output, the output of the child date process has been captured and used by the parent.

read, write = IO.pipe

fork do
  $stdout.reopen(write)
  exec "date"
end

output = read.gets
puts "The date is: #{output}"

# >> The date is: Wed Jul 26 08:49:52 EDT 2017

This example illustrates a very important point about the fork-and-exec pattern: we can modify how the program-to-be-executed behaves by altering the state of the child process before calling exec. Many aspects of the process state will be inherited by whatever new program replaces the Ruby process as a result of the exec. In this case we've set up some IO redirection before the exec. But there are lots of other changes we could make.

For instance, let's say we want to see the output of the date command in the UTC time zone instead of the local time zone. But we don't want to modify the environment of the parent process. In the child process, we can set the TZ variable before the exec.

Then when we run the program again, the captured output shows universal time instead of eastern daylight time.

read, write = IO.pipe

fork do
  $stdout.reopen(write)
  ENV["TZ"] = "UTC"
  exec "date"
end

output = read.gets
puts "The date is: #{output}"

# >> The date is: Wed Jul 26 12:51:18 UTC 2017
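The pipe examples above keep things short: neither process closes the pipe ends it doesn't use, and the parent never waits for the child to finish. A more defensive variant might look like the following sketch. The capture_output helper name and its optional setup block are assumptions made for illustration, not something from the episode or from Ruby itself.

```ruby
# Hypothetical helper: run a command via fork and exec and return
# everything it wrote to standard output.
def capture_output(*command)
  read, write = IO.pipe

  pid = fork do
    read.close                # the child only writes
    $stdout.reopen(write)     # point the child's standard output at the pipe
    write.close               # $stdout now holds its own copy of the write end
    yield if block_given?     # optional child-only setup, such as ENV changes
    exec(*command)
  end

  write.close                 # the parent only reads; this lets read reach EOF
  output = read.read          # blocks until every write end has been closed
  read.close
  Process.wait(pid)           # reap the child so it doesn't linger as a zombie
  output
end

puts "The date is: #{capture_output("date") { ENV["TZ"] = "UTC" }}"
```

Reading to end-of-file instead of reading a single line means multi-line output is captured too, and closing the parent's copy of the write end is what allows that end-of-file to arrive.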

What's cool about this fork-and-exec technique is that we don't have to learn anything new in order to alter the setup of the child process. If we wanted to perform a similar environment alteration using the Process.spawn method, we would have to remember that it expects any environment variable overrides to be specified as a hash before the command to be spawned. Any other process-specific setup, such as the IO redirection, would have to be specified in the options hash that follows the command, using option conventions we'd probably have to look up.

# Process.spawn({"TZ" => "UTC"}, "date", ...)
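For comparison, here is one way the complete Process.spawn version could look. This is a sketch under assumptions: the utc_date_via_spawn method name is made up for illustration, and the out: option is used to redirect the spawned command's standard output into the pipe.

```ruby
# Hypothetical helper: capture the UTC date using Process.spawn
# instead of an explicit fork and exec.
def utc_date_via_spawn
  read, write = IO.pipe

  # Environment overrides come first as a hash, then the command,
  # then redirection options such as out:.
  pid = Process.spawn({ "TZ" => "UTC" }, "date", out: write)

  write.close          # the parent only reads from the pipe
  output = read.read   # read until the spawned command closes its end
  read.close
  Process.wait(pid)    # reap the spawned process
  output
end

puts "The date is: #{utc_date_via_spawn}"
```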

By contrast, in the fork-and-exec model, so long as we know how to set an environment variable for the current process, or redirect output for the current process, we already know how to set it for the child process.
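The same reasoning extends to other per-process state, such as the working directory. As a minimal sketch, assuming a hypothetical helper named child_working_directory, we can call Dir.chdir in the child exactly as we would in any Ruby program, and the program we exec starts in that directory while the parent stays put.

```ruby
# Hypothetical helper: run the pwd utility in a child process whose
# working directory has been changed to dir, and return what it prints.
def child_working_directory(dir)
  read, write = IO.pipe

  pid = fork do
    read.close
    $stdout.reopen(write)
    Dir.chdir(dir)     # the same call we'd use in any Ruby process
    exec "pwd"
  end

  write.close
  output = read.read
  read.close
  Process.wait(pid)
  output.chomp
end

before = Dir.pwd
puts "Child ran in: #{child_working_directory("/")}"
puts "Parent unchanged: #{Dir.pwd == before}"
```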

If you do any programming on a UNIX platform, chances are you will run into the term "fork and exec" at some point. Even if you don't use it directly, UNIX-like systems typically implement their higher-level process-spawning functions in terms of this fork-and-exec model. In fact, if you dig into the UNIX-specific C code that underlies Ruby's higher-level process-spawning methods, such as the backtick operator, the system method, or the Process.spawn method, what you'll find when you dig deep enough is fork-and-exec.

result = `date`
system("ls")
Process.spawn("myprog")
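To make that relationship concrete, here is a deliberately simplified sketch of how a system-like method could be built from fork, exec, and wait. The simple_system name is hypothetical, and this is not Ruby's actual implementation, which handles many more cases (for example, Kernel#system returns nil when the command can't be executed at all).

```ruby
# Hypothetical, simplified stand-in for Kernel#system built on fork and exec.
def simple_system(*command)
  pid = fork { exec(*command) }  # child: become the requested program
  Process.wait(pid)              # parent: block until the child exits
  $?.success?                    # true if the child exited with status 0
end

puts simple_system("true")   # prints true
puts simple_system("false")  # prints false
```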

Now you have a basic idea of what it means to fork-and-exec. Be aware that because it relies on the ability to fork the current process, this capability is limited to UNIX-like systems such as Linux, BSD, and modern macOS. It can't be used on Ruby VMs that are compiled natively for Windows, so this is not a technique you should use to write cross-platform subprocess-spawning code. However, if you ever find yourself doing systems-level programming on a program which only targets UNIX-like hosts, this technique may well come in handy. Happy hacking!