Subprocesses Part 11: Fork

If you’re making subprocesses on a UNIX-like OS, you’re using fork(). In this episode you’ll learn about the Ruby bindings for fork(), and how Ruby makes fork() easier to use.

Video transcript & code

In traditional operating system design, the procedure for spawning a new subprocess looked something like this: the parent process tells the operating system the name of the program that it would like to spawn. The operating system loads this program into a separate memory space, and begins a new process running concurrently with the first.

In the early days of the UNIX operating system, there were only two processes: one for each terminal connected to the system. As Ken Thompson and Dennis Ritchie fleshed out what would become the UNIX process architecture, they discovered something that would go on to profoundly influence its design. They realized that, as an outgrowth of some early design decisions, it was almost trivially easy to implement a system call which would make a copy of the current process, including all of its present state.

They named this system call fork. In effect, when a process called fork, a new clone of it would be created, and both processes would proceed forward from the point in their program where fork was invoked.

Ruby inherits most of the UNIX process API, including fork. Let's try it out. First we'll write a program which simply outputs its own process ID, and then exits.

puts "Starting up..."
puts "Hi, I'm process #{$$}"

# >> Starting up...
# >> Hi, I'm process 26771

Now let's add a fork call just before writing out the process ID.

puts "Starting up..."
fork
puts "Hi, I'm process #{$$}"

# >> Starting up...
# >> Hi, I'm process 26789
# >> Hi, I'm process 26798

When we run this again, we see two different process IDs written to the console. That's because at the point of the fork, our program split into two separate processes which proceeded independently from that point on.

By itself, this isn't tremendously useful. In fact, it's pretty easy to imagine how having multiple copies of the same process would do more harm than good. After all, it seems like there will be a lot of opportunities for the separate processes to step on each other's toes, as they all try to do the same thing.

But what if the processes could know which was the parent, and which was the child?

As it happens, fork has a return value. In the parent process, it returns the process ID of the newly created child process. In the child process, Ruby's fork returns nil (the underlying system call returns 0 there).

Let's test this out. We'll capture the result of the fork in a variable. Then, we'll switch on whether the return value is set or not.

If it has a non-nil value, that means we are in the parent process.

Otherwise, we must be in the child process.

Let's run this again.

child_pid = fork
if child_pid
  puts "I'm the parent #{$$}, with child #{child_pid}"
else
  puts "I'm the child #{$$}"
end

# >> I'm the parent 31490, with child 31499
# >> I'm the child 31499

Now that we know which process is the parent, and which process is the child, things get more interesting.

To demonstrate the kind of things we can do with child processes, let's write a classic example program: a forking server.

We start by requiring the socket library.

Then we create a new TCP server bound to port 2000.

Next we start a loop.

In the loop, we will wait for a new TCP client to connect using the accept method.

Then we'll fork the process.

On the parent side, we'll just report the new child process. Once this is done, the conditional will end, and the parent process will loop around to wait for another connection.

In the child process, things are more interesting. We'll start by initializing some session state, in the form of the count variable. Then we'll greet the new client. We'll loop on input from the client. Each time we get a new line, we treat it as an integer and add it to the count. We conclude each pass through the loop by reporting the current count.

We don't ever want the child processes to loop back around the outer loop and start accepting connections of their own. So after the inner loop ends, we exit the process.

require "socket"

server = TCPServer.new("localhost", 2000)

loop do
  client = server.accept
  child_pid = fork
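  if child_pid
    puts "Spawned #{child_pid}"
    # The child has its own copy of the socket, so the parent closes its
    # copy. Otherwise the parent leaks one open socket per connection.
    client.close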
  else
    count = 0
    client.puts "Hello. I'm #{$$}, and I'll be your server"
    while (input = client.gets)
      count += input.to_i
      client.puts "Count: #{count}"
    end
    client.close
    exit
  end
end

Let's open up a new terminal, and give this program a whirl.

Once we have it up and running, we can open up a separate terminal and start a telnet connection to it on port 2000 (for example, with telnet localhost 2000).

Here, we can see the greeting from the child process. We can enter some numbers, and see that the counter is incremented.

Now let's open up a couple more telnet connections.

Let's play around with adding numbers to the different telnet sessions. What we can see as we're doing this is that each connection has its own separate state, thanks to being run in separate child processes.
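
The same isolation can be seen without sockets or telnet. The following is a minimal sketch rather than code from the episode. It assumes a UNIX-like system, the method name isolation_demo is made up for illustration, and it uses Process.wait (not covered in this episode) so that the parent pauses until the child has finished.

```ruby
# Hypothetical helper (not from the episode): shows that a forked child
# works on its own copy of the parent's variables.
def isolation_demo
  count = 0
  child_pid = fork
  if child_pid
    # Parent: pause until the child is done. Process.wait is an addition
    # for this sketch; the episode itself does not cover it.
    Process.wait(child_pid)
    puts "Parent #{$$} still sees count = #{count}"
  else
    # Child: this changes only the child's copy of count.
    count += 100
    puts "Child #{$$} sees count = #{count}"
    exit
  end
  count
end

isolation_demo
```

The child reports a count of 100, while the parent still reports 0. The two processes started from the same state, but nothing the child does to its variables is visible to the parent.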

So far the Ruby code we've written is very similar to the equivalent C code for a forking TCP server. But Ruby is all about programmer happiness and convenience, and so it adds a special Ruby-ish twist to the fork method call.

Instead of having to switch on the return value of fork to determine whether the code is running in the parent or child process, we can simply pass a block to it containing the child process logic. This block will only be run on the child side of the fork.

Ruby will automatically exit the child process at the end of the block, so we don't need to do that explicitly anymore.

Now that we no longer have an if/then conditional to check the result of fork, we put our new process notification after the end of the block.

require "socket"

server = TCPServer.new("localhost", 2000)

loop do
  client = server.accept
  child_pid = fork do
    count = 0
    client.puts "Hello. I'm #{$$}, and I'll be your server"
    while (input = client.gets)
      count += input.to_i
      client.puts "Count: #{count}"
    end
    client.close
  end
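  # The child has its own copy of the socket; close the parent's copy so
  # the parent doesn't leak one open socket per connection.
  client.close
  puts "Spawned #{child_pid}"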
end

This code behaves exactly the same as the original, so I'm not going to demonstrate it a second time.
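
If we want to confirm that the child really does exit on its own when the block ends, one option is to look at its exit status from the parent. This is a sketch under some assumptions, not the episode's code: Process.wait and the $? variable are not covered in this episode, and block_fork_status is a made-up method name.

```ruby
# Hypothetical helper: forks with a block and reports how the child ended.
def block_fork_status
  child_pid = fork do
    puts "Child #{$$} running the block"
    # No explicit exit here: Ruby ends the child when the block returns.
  end
  # Everything below runs only in the parent.
  Process.wait(child_pid)   # sets $? to the child's Process::Status
  $?.exitstatus
end

puts "Child exit status: #{block_fork_status}"
```

Run as a plain script, the status should be 0, which is what Ruby uses when a fork block finishes normally. The parent only ever sees the code after the block, and the child only ever sees the code inside it.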

Of course, this is just one example of how fork can be useful. There are other applications for it; for instance, it's a great way to spin up groups of worker processes when we want to tackle some problem in parallel across multiple CPU cores. In future episodes we may cover strategies for communicating and coordinating work across these kinds of process groups.
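
To make the worker idea a little more concrete, here is a rough sketch of one possible shape for it. It is not from the episode, and it leans on two things the episode leaves for later: IO.pipe to send each result back to the parent, and Process.wait to clean up finished children. The method name parallel_sum is made up, and the sketch assumes a UNIX-like system and a positive worker_count.

```ruby
# Hypothetical sketch: split an array into slices, sum each slice in its own
# forked worker, and add up the partial sums in the parent.
def parallel_sum(numbers, worker_count)
  slice_size = [(numbers.size.to_f / worker_count).ceil, 1].max

  workers = numbers.each_slice(slice_size).map do |slice|
    reader, writer = IO.pipe
    pid = fork do
      reader.close               # the child only writes
      writer.puts slice.sum
      writer.close
    end
    writer.close                 # the parent only reads
    [pid, reader]
  end

  workers.sum do |pid, reader|
    partial = reader.read.to_i   # read returns once the child closes its end
    reader.close
    Process.wait(pid)            # clean up the finished child
    partial
  end
end

puts parallel_sum((1..100).to_a, 4)
```

Each worker gets its own copy of its slice at the moment of the fork, so no data has to be sent to the children. Only the small result travels back over the pipe.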

Before we close this episode, here are three notes you should keep in mind:

  1. First, and most importantly: Remember how at the beginning of the episode, I said that fork evolved based on a peculiarity of the UNIX implementation? Well, as a result, fork is only available on UNIX-like systems. If you try to use fork on, for instance, a Ruby runtime compiled natively for Windows, Ruby will raise a NotImplementedError, because fork is not implemented on that platform. So fork is not something to use in code that you intend to be cross-platform.
  2. Second, we've looked at how to create new clone processes today, but we haven't delved into how to ensure that parent and children are cleanly shut down. We'll talk about coordinating child process termination in a future episode.
  3. Third, so far we've only looked at how fork can be used to clone the current process. But… didn't we begin this episode talking about spawning new programs as subprocesses? We'll take a look at how to combine fork with exec for a complete process-spawning solution in a future episode.
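
On the first note: code that has to run on platforms without fork can check for it before calling it, since Process.respond_to?(:fork) returns false where fork is unavailable. The following is only a sketch of that check. The method name run_in_child_if_possible and its fallback of running the block in the current process are assumptions made for illustration, and Process.wait is not covered in this episode.

```ruby
# Hypothetical helper: run the block in a child process where fork exists,
# and fall back to running it in the current process elsewhere.
def run_in_child_if_possible
  if Process.respond_to?(:fork)
    child_pid = fork { yield }
    Process.wait(child_pid)   # pause until the child has finished
    :forked
  else
    yield
    :inline
  end
end

mode = run_in_child_if_possible { puts "Working in process #{$$}" }
puts "Mode: #{mode}"
```

Whether an in-process fallback is acceptable depends on the job. Work that relies on the isolation of a separate process would need a different strategy on those platforms.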

For now, though… happy hacking!
