
Subprocesses Part 14: Advanced Wait

If you’re writing a forking server, a job-queuing system, or any other program that manages child processes, you need to know how to cleanly handle the shutdown of those processes. In today’s episode you’ll learn the nuances of Ruby’s Process.wait, Process.wait2, and Process.waitall methods. You’ll see how they differ, and get some insight into choosing between them for your use case.

Video transcript & code

Today we're going to pick up roughly where we left off in the last episode in this miniseries.

We've written a basic TCP server program. Every time it accepts a new connection, it forks off a new connection handler process.

Meanwhile, a background thread waits for the user to press Enter. Once that happens, it waits for each child process to finish before exiting the server process.

require "socket"

pids = []

at_exit do
  puts "Shutting down"
end

Thread.new do
  Thread.current.abort_on_exception = true
  $stdin.getc
  pids.each do |pid|
    puts "Waiting on PID #{pid}..."
    Process.wait(pid)
  end
  exit
end

server = TCPServer.new("localhost", 2000)

puts "Accepting connections (server PID: #{$$})"
loop do
  client = server.accept
  child_pid = fork do
    $stdout.reopen(File::NULL)
    count = 0
    client.puts "Hello. I'm #{$$}, and I'll be your server"
    while (input = client.gets)
      count += input.to_i
      client.puts "Count: #{count}"
    end
    client.close
  end
  client.close # the child now owns the connection; release the parent's copy
  pids << child_pid
  puts "Spawned #{child_pid}"
end

Today we're going to learn about some variations on the Process.wait call. The first has to do with how we specify which process ID to wait for.

One drawback of the approach we've used up to now is that it waits on the child processes in the order they were started. But just because a child process started first doesn't mean it will be the first to terminate. If the first child runs for a long time, we won't notice that later children have already finished.

Let's rewrite the process-waiting code slightly. Instead of iterating over the list of PIDs, we'll loop until that list is empty. Then, inside the loop, we'll invoke Process.wait with no arguments at all.

When we don't supply a process ID to Process.wait, it waits for any subprocess of the current process to end. Once a subprocess ends, it returns the PID of the finished process, which we'll capture in a variable.

Then we'll delete that process ID from our list, and log the event.

require "socket"

pids = []

at_exit do
  puts "Shutting down"
end

Thread.new do
  Thread.current.abort_on_exception = true
  $stdin.getc
  puts "Waiting for children"
  until pids.empty?
    pid = Process.wait
    pids.delete(pid)
    puts "Child #{pid} has ended"
  end
  exit
end

server = TCPServer.new("localhost", 2000)

puts "Accepting connections (server PID: #{$$})"
loop do
  client = server.accept
  child_pid = fork do
    $stdout.reopen(File::NULL)
    count = 0
    client.puts "Hello. I'm #{$$}, and I'll be your server"
    while (input = client.gets)
      count += input.to_i
      client.puts "Count: #{count}"
    end
    client.close
  end
  client.close # the child now owns the connection; release the parent's copy
  pids << child_pid
  puts "Spawned #{child_pid}"
end

Let's try this out. We'll start up the server process. Then, in separate terminals, we'll start up two telnet connections. And then we'll hit Enter in the server terminal, to let it know it should start shutting down.

We'll close out the second connection first. When we do this, we can see that the server immediately notes the child process termination, and then goes back to waiting.

Then when we close the first connection, the server logs the occurrence and, with no further children to wait for, shuts down.

$ ruby server1.rb
Accepting connections (server PID: 21702)
Spawned 22090
Spawned 22282

Waiting for children
Child 22090 has ended
Child 22282 has ended
Shutting down
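To see the ordering behavior outside the server, here is a minimal standalone sketch. It uses fork, so it assumes a Unix-like system, and the sleep durations are arbitrary values chosen for illustration. The child started first sleeps longer, yet Process.wait with no argument reaps whichever child exits first.

```ruby
# Start two children; the one started first sleeps longer.
slow_pid = fork { sleep 0.5 }
fast_pid = fork { sleep 0.1 }

# With no argument, Process.wait blocks until *any* child exits
# and returns that child's PID.
reaped = []
2.times { reaped << Process.wait }

puts "started: #{[slow_pid, fast_pid].inspect}"
puts "reaped:  #{reaped.inspect}"
```

On a machine that isn't heavily loaded, the second PID should come back first, which is exactly the situation where waiting in start order would leave us blocked on the slow child.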

Thus far, we've just been assuming that the child processes do whatever they need to do. We haven't been checking up to see if they succeeded or failed.

Now let's add a way for the connection handlers to fail. We'll specify that the counters they maintain cannot exceed a value of 7, or else they will exit with an error status.

Now let's go back to where we wait for subprocesses to finish. How can we check on the exit status of the processes we are waiting on?

One way to do it goes like this: we spawn a process, wait for it to end, and then inspect the $? variable, which holds a Process::Status object describing how the most recently reaped child exited.

pid = spawn("ls")
Process.wait
$?
# => #<Process::Status: pid 284084 exit 0>

But while this may work, there's a better way to do it that doesn't involve a pseudo-global special variable.
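That better way is Process.wait2. In isolation, it looks something like this sketch, which forks a child that exits with status 3 (an arbitrary value for illustration). The child uses exit! so that it skips any at_exit handlers it inherited from the parent.

```ruby
# Fork a child that exits with a non-zero status.
# exit! skips at_exit handlers the child inherited from the parent.
pid = fork { exit!(3) }

# Process.wait2 returns the reaped PID along with its Process::Status,
# so we don't have to read $? afterward.
reaped_pid, status = Process.wait2(pid)

puts "reaped #{reaped_pid} (expected #{pid})"
puts "exit status: #{status.exitstatus}"
puts "success? #{status.success?}"
```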

Back in our server code, we change the Process.wait call to Process.wait2. We can now capture a second return value from the call: a status object. We can use the status when we log the process termination.

require "socket"

pids = []

at_exit do
  puts "Shutting down"
end

Thread.new do
  Thread.current.abort_on_exception = true
  $stdin.getc
  puts "Waiting for children"
  until pids.empty?
    pid, status = Process.wait2
    pids.delete(pid)
    puts "Child #{pid} finished with status #{status.exitstatus}"
  end
  exit
end

server = TCPServer.new("localhost", 2000)

puts "Accepting connections (server PID: #{$$})"
loop do
  client = server.accept
  child_pid = fork do
    $stdout.reopen(File::NULL)
    count = 0
    client.puts "Hello. I'm #{$$}, and I'll be your server"
    while (input = client.gets)
      count += input.to_i
      if count > 7
        client.puts "I can't count that high!!"
        exit 1
      end
      client.puts "Count: #{count}"
    end
    client.close
  end
  client.close # the child now owns the connection; release the parent's copy
  pids << child_pid
  puts "Spawned #{child_pid}"
end

Let's put this through its paces. After starting up the server we make two separate telnet connections to it. Then we tell the server to start shutting down.

In one telnet connection, we close out the telnet session normally.

In the other, we push the count above 7.

At this point, looking at the server terminal, we can see that one child process exited with a successful status of 0, and the other exited with an error status of 1.

$ ruby server2.rb
Accepting connections (server PID: 24672)
Spawned 24676
Spawned 24680

Waiting for children
Child 24676 finished with status 1
Child 24680 finished with status 0
Shutting down
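One caveat when logging status.exitstatus: it returns nil if the child didn't exit normally, for example because it was killed by a signal. Here is a small sketch of that case (again assuming a Unix-like system), using signaled? and termsig to find out what happened instead.

```ruby
# A child that would sleep forever unless something stops it.
pid = fork { sleep }

# Terminate it with SIGTERM instead of letting it exit on its own.
Process.kill(:TERM, pid)
_, status = Process.wait2(pid)

puts "exitstatus: #{status.exitstatus.inspect}" # nil: no normal exit
puts "signaled?:  #{status.signaled?}"
puts "termsig:    #{status.termsig}"
```

If a connection handler might be killed this way, a more defensive log line could fall back to status.termsig when exitstatus is nil, or simply print status.inspect.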

Let's return to our server code. Up to now, our loop that waits for child processes to end has depended on having a record of which child processes we've started. But what if, for some reason, we haven't kept track of child process PIDs? Are we out of luck?

Thankfully not. We can wait for each child process to end even if we don't know their process IDs.

We'll turn our until loop into an infinite loop, and drop the pids array entirely. We'll leave our call to Process.wait2 unchanged, but we'll surround it with a begin…end construct. Then we'll add a rescue clause that catches Errno::ECHILD. When that exception is raised, we break out of the loop.

This code will loop around, waiting for a child process to terminate. As long as subprocesses remain, it keeps going. Eventually all the child processes will have ended, and Process.wait2 will raise Errno::ECHILD to indicate that there is nothing left to wait for. At that point we're done, and we break out of the loop.

require "socket"

at_exit do
  puts "Shutting down"
end

Thread.new do
  Thread.current.abort_on_exception = true
  $stdin.getc
  loop do
    puts "Waiting for children to finish"
    begin
      pid, status = Process.wait2
      puts "Child #{pid} finished with status #{status.exitstatus}"
    rescue Errno::ECHILD
      break
    end
  end
  exit
end

server = TCPServer.new("localhost", 2000)

puts "Accepting connections (server PID: #{$$})"
loop do
  client = server.accept
  child_pid = fork do
    $stdout.reopen(File::NULL)
    count = 0
    client.puts "Hello. I'm #{$$}, and I'll be your server"
    while (input = client.gets)
      count += input.to_i
      client.puts "Count: #{count}"
    end
    client.close
  end
  client.close # the child now owns the connection; release the parent's copy
  puts "Spawned #{child_pid}"
end

Let's quickly try this. We restart the server, start up some clients, tell the server to shut down, and then shut down the clients one by one. There's nothing new about the output, but this time we achieved it without keeping track of the child process IDs we had spawned.

$ ruby server3.rb
Accepting connections (server PID: 5567)
Spawned 5996
Spawned 6348

Waiting for children to finish
Child 5996 finished with status 0
Waiting for children to finish
Child 6348 finished with status 0
Waiting for children to finish
Shutting down
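The same ECHILD-terminated loop works outside of a server. In this sketch, three children are started without recording their PIDs; each one exits with its index as its status, an arbitrary choice that makes the results easy to check. The loop keeps reaping children until Process.wait2 raises Errno::ECHILD.

```ruby
# Start three children without keeping track of their PIDs.
3.times do |i|
  fork do
    sleep 0.1 * (i + 1)
    exit!(i) # exit! skips inherited at_exit handlers
  end
end

exit_codes = []
loop do
  begin
    _pid, status = Process.wait2
    exit_codes << status.exitstatus
  rescue Errno::ECHILD
    # No child processes remain, so there is nothing left to wait for.
    break
  end
end

puts "collected exit codes: #{exit_codes.inspect}"
```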

This code is what we use when we want to report on each child process as it terminates. But if we don't need to track the ending processes in real time, and all we need is to wait for every subprocess to finish, there is a much easier way.

We can replace our entire loop with a call to Process.waitall. This waits for all child processes of the current process to terminate. If we want to report on the exit statuses of the subprocesses, we can capture its return value in a variable.

That return value is an array of [pid, status] pairs, where each status is a Process::Status object, so we can easily loop over it with each.

require "socket"

at_exit do
  puts "Shutting down"
end

Thread.new do
  Thread.current.abort_on_exception = true
  $stdin.getc
  puts "Waiting for children to finish"
  status_list = Process.waitall
  status_list.each do |pid, status|
    puts "Child #{pid} finished with status #{status.exitstatus}"
  end
  exit
end

server = TCPServer.new("localhost", 2000)

puts "Accepting connections (server PID: #{$$})"
loop do
  client = server.accept
  child_pid = fork do
    $stdout.reopen(File::NULL)
    count = 0
    client.puts "Hello. I'm #{$$}, and I'll be your server"
    while (input = client.gets)
      count += input.to_i
      client.puts "Count: #{count}"
    end
    client.close
  end
  client.close # the child now owns the connection; release the parent's copy
  puts "Spawned #{child_pid}"
end
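For comparison, here is a standalone sketch of Process.waitall. Each child exits with a different status (arbitrary values for illustration), and waitall hands back all of the [pid, status] pairs at once, after the last child has ended.

```ruby
# Start three children, each exiting with a different status.
pids = [2, 0, 1].map { |code| fork { exit!(code) } }

# waitall blocks until every child has terminated, then returns an
# array of [pid, Process::Status] pairs.
results = Process.waitall
results.each do |pid, status|
  puts "Child #{pid} finished with status #{status.exitstatus}"
end

# With no children left, waitall returns an empty array
# instead of raising Errno::ECHILD.
leftover = Process.waitall
puts "leftover: #{leftover.inspect}"
```

That empty-array behavior is one practical difference from the Process.wait2 loop in server3.rb: there's no exception to rescue.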

One more time, let's fire up the server, start some connections, tell the server to stop, and then close down the connections.

This time, we can see that the server waits for the last child process to end, and only then logs all of their PIDs and statuses.

$ ruby server4.rb
Accepting connections (server PID: 13568)
Spawned 13760
Spawned 13844

Waiting for children to finish
Child 13760 finished with status 0
Child 13844 finished with status 0
Shutting down

I know that waiting for subprocesses to end isn't the most exciting of technical topics. But if you write any kind of systems automation code in Ruby, or if you ever want to create a forking server or work-queuing system, this knowledge can come in handy. I hope what you've learned today will help you avoid some tedious trial-and-error in the future. Happy hacking!

Note: "The names of the Errno:: classes depend on the environment in which Ruby runs." This solution may not be portable to all operating system environments.
