
Subprocesses Part 5: SIGCHLD

Video transcript & code

So far in this miniseries on subprocesses, we've focused mostly on how to start processes. But we can't always kick off a process and then ignore it. Often, we need a way to keep tabs on it as it does its work.

Here's an example.

We recently wrote some code that kicked off a group of parallel video encoding processes.

Since then we've moved the code that constructs the full avconv command into its own method.

def avconv_preview_command(input_file)
  fade_args = "fade=type=in:nb_frames=15, fade=type=out:start_frame=870"
  output_file = "./previews/" +
                File.basename(input_file, ".mp4") +
                "-preview" +
                File.extname(input_file)
  %W[avconv
      -ss 30
      -i #{input_file}
      -y
      -strict -2
      -t 30
      -vf '#{fade_args}'
      #{output_file}
      > /dev/null 2>&1
    ].join(" ")
end

Once again, we're not going to dig into this code in detail. What's important is that we use this code to construct shell commands, which we then execute asynchronously using Process.spawn.

In the process, we collect process IDs, or PIDs.

Then we loop over the PIDs and wait for each one to finish.

require "fileutils"
require "./video"

pids = []

FileUtils.mkpath "previews"
ARGV.each do |input_file|
  command = avconv_preview_command(input_file)
  pid = Process.spawn(command)
  puts "Started process #{pid}"
  pids << pid
end

pids.each do |pid|
  puts "Waiting for process #{pid}"
  Process.waitpid(pid)
end

This works OK. We can kick it off with a few video files as input.

We see all the processes begin at once.

Then we watch as the code waits for a while on the first PID.

Once the first process finishes, the other PIDs are collected in short order.

$ ./mkpreviews *.mp4
Started process 5201
Started process 5202
Started process 5203
Started process 5205
Waiting for process 5201
Waiting for process 5202
Waiting for process 5203
Waiting for process 5205

This code provides some very coarse-grained monitoring of our child processes. But it's not really ideal.

The most glaring problem is that we simply loop over our list of PIDs to wait on from beginning to end.

But there's no reason to assume that the first process that started is going to be the first one to finish. Each process has its own workload. One of the reasons that we see the processes finish all in a rush at the end may be that some of them had already finished while we were waiting for the first one.

It would be preferable if we could wait for any process to finish and report it as soon as it happens. Happily, this doesn't require a lot of rework to achieve.

Instead of looping over the PIDs, we establish a do-while loop, which we learned about in episode #073.

And then instead of waiting on a specific PID, we supply the special flag value -1 to Process.waitpid.

This tells waitpid to wait for any child process to finish.

In order to determine which process finished, we capture the return value of waitpid.

Then we delete it from the list of PIDs we're waiting on.

require "fileutils"
require "./video"

pids = []

FileUtils.mkpath "previews"
ARGV.each do |input_file|
  command = avconv_preview_command(input_file)
  pid = Process.spawn(command)
  pids << pid
  puts "Started process #{pid}"
end

begin
  pid = Process.waitpid(-1)
  puts "Process #{pid} is done!"
  pids.delete(pid)
end until pids.empty?
puts "All done!"
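If you'd like to see this ordering without any video files, here's a small sketch built on a couple of assumptions that aren't part of the episode: each "job" is a child Ruby process (started with RbConfig.ruby) that just sleeps, and the job names and durations are made up. Because the jobs have different workloads, waitpid(-1) should hand them back in a different order from the one they were started in.

```ruby
require "rbconfig"

# Hypothetical jobs of different lengths. Each child is a Ruby process
# that just sleeps, standing in for an avconv run.
durations = { "slow" => 1.0, "fast" => 0.2, "medium" => 0.6 }

pids = {}
durations.each do |name, seconds|
  pid = Process.spawn(RbConfig.ruby, "-e", "sleep #{seconds}")
  pids[pid] = name
  puts "Started #{name} job as process #{pid}"
end

finished = []
begin
  # -1 means "whichever child exits next"; the return value says which one.
  pid = Process.waitpid(-1)
  name = pids.delete(pid)
  finished << name
  puts "Process #{pid} (#{name}) is done!"
end until pids.empty?

p finished
```

The gaps between the durations are deliberately generous, so interpreter startup jitter shouldn't reorder the results.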

When we run this version, we don't see a huge difference from the first version.

But if we look carefully at the final output, we can see that the processes ended in a slightly different order than they began. This was probably happening in the first version as well, but this time we actually tracked the process terminations in the order they occurred.

This is an improvement, because now we have a more accurate indication of which processes finish when.

But it still means blocking our whole main process while we wait for processes to stop. What if we want the program to do something else while our jobs run?

As a very simple example, let's say we want to display an animated, running status dashboard as the processes do their work.

To do this, we start by changing the format of our PID list, from an array to a hash.

We'll make the PIDs the keys in the hash, and map them to a status value, either :working or :done.

Then we once again rewrite our waiting loop. This time, it repeats until all of the PIDs are marked done.

Inside the loop, we just repaint the screen, then sleep for a tenth of a second, then repeat.

require "fileutils"
require "./video"
require "./display"

pids = {}

start_time = Time.now
FileUtils.mkpath "previews"
ARGV.each do |input_file|
  command = avconv_preview_command(input_file)
  pid = Process.spawn(command)
  pids[pid] = :working
end

# Nothing updates the statuses yet, so on its own this loop never ends.
until pids.values.all?{|v| v == :done}
  repaint(pids, start_time)
  sleep 0.1
end

I'm not going to be spending any time talking about the repainting code today, because it's not essential for understanding the topic at hand.

$animation = %w[| / - \\ | / - \\]
def repaint(pids, start_time)
  print "\e[H\e[2J"
  elapsed = Time.now - start_time
  mins, secs = elapsed.to_i.divmod(60)
  printf "Transcoding... %s %02d:%02d\n\n", $animation.rotate!.first, mins, secs
  template = "%-10s%s\n"
  header   = template % ["PID", "Time"]
  print header
  puts "-" * header.size
  pids.each do |pid, status|
    printf template, pid, status.to_s.upcase
  end
end

This is a classic, simple user interface design pattern. We're just repeatedly displaying the current state of the world, on a set schedule. What's notably missing from the loop now is any code to check on the child processes and update the state of the world.

What we need is a way to asynchronously update the PID statuses as the child processes finish. At this point, we might consider kicking off some monitoring threads to wait on the PIDs and report when they quit. But so long as we're running on a UNIX-style host, there's a much easier way to accomplish our goal.

What we need to do is trap a signal: the SIGCHLD signal.

Whenever a child process exits, the operating system sends us this signal. (If you're not familiar with how trapping signals works, check out episodes #353 and #354.)

All we have to do now is figure out which process just terminated. Remember, one of the limitations of operating system signals, at least as Ruby's trap exposes them, is that they can't carry any extra information with them: our handler isn't told which child exited.

To get the PID, we once again use Process.waitpid.

We pass negative one, as before, indicating that it should look for any exited children of this process.

Then we update the corresponding field in the PIDs hash to the :done status.

This is straightforward enough. And we don't even need to worry about multithreaded race conditions. In Ruby, a trap handler runs on the main thread: our main loop is simply paused while the handler runs, and picks up where it left off once the handler returns.
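One way to check that on CRuby (MRI) is to record which thread executes a trap handler. This is a minimal sketch, not from the episode; it uses SIGUSR1 sent to our own process purely for illustration, since SIGCHLD would require a child to exit.

```ruby
handler_thread = nil

# Record which thread ends up executing the handler.
trap("USR1") { handler_thread = Thread.current }

# Send the signal to our own process.
Process.kill("USR1", Process.pid)

# Give Ruby a moment to run the handler (with a 2-second safety limit).
deadline = Time.now + 2
sleep 0.01 until handler_thread || Time.now > deadline

puts "Handler ran on the main thread: #{handler_thread.equal?(Thread.main)}"
```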

trap("CHLD") do
  pid = Process.waitpid(-1)
  pids[pid] = :done
end

Unfortunately, there's one teensy little complication. Ordinary UNIX signals aren't queued: if several child processes exit around the same time, the operating system may deliver just a single SIGCHLD for all of them. That means we have to account for the possibility that more than one child process is waiting to have its status harvested.
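Here's a rough way to watch this happen, as a sketch that isn't from the episode. It forks a batch of children that exit immediately (fork is a swapped-in shortcut for spawning real commands), lets a CHLD handler do nothing but count how often it runs, and reaps every child from the main program with Process.waitall. The exact numbers vary from run to run, but the handler count can come out lower than the number of children, and in this program it can't come out higher.

```ruby
# Count handler invocations only; the main program does all the reaping.
signals_handled = 0
trap("CHLD") { signals_handled += 1 }

child_count = 20
child_count.times { fork { exit!(0) } }  # each child exits right away

# Process.waitall blocks until every child has exited and returns an
# array of [pid, status] pairs, one per child.
reaped = Process.waitall.size

# Let any SIGCHLD handlers that are still pending get a chance to run.
deadline = Time.now + 2
sleep 0.05 while signals_handled.zero? && Time.now < deadline
sleep 0.2

puts "children reaped: #{reaped}, SIGCHLD handler runs: #{signals_handled}"
```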

Fortunately, this is a pretty simple change. First, we add the Process::WNOHANG constant as a second argument to waitpid.

This flag changes the usual behavior of waitpid: if no terminated child is waiting to be dealt with, it immediately returns nil instead of blocking until another process finishes.

Then we wrap the whole handler in a while loop, which repeats until all the PIDs are marked :done, or until waitpid returns nil because no more exited processes are waiting to be handled. The :working check matters too: once every child has been reaped, calling waitpid again would raise Errno::ECHILD rather than return nil.
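The handler relies on two distinct behaviors of waitpid. This small sketch, which isn't from the episode and uses a sleeping child Ruby process as a stand-in for a real job, checks both: WNOHANG returning nil while a child is still running, and Errno::ECHILD once there are no children left at all.

```ruby
require "rbconfig"

# One child that runs for about half a second (a stand-in for a real job).
pid = Process.spawn(RbConfig.ruby, "-e", "sleep 0.5")

# The child exists but hasn't exited: with WNOHANG, waitpid returns nil
# right away instead of blocking.
early = Process.waitpid(-1, Process::WNOHANG)

# Without WNOHANG, waitpid blocks until the child finishes.
finished = Process.waitpid(-1)

# No children are left at all: waitpid raises Errno::ECHILD instead of
# returning nil.
no_children =
  begin
    Process.waitpid(-1, Process::WNOHANG)
    false
  rescue Errno::ECHILD
    true
  end

puts "early: #{early.inspect}, reaped our child: #{finished == pid}, ECHILD: #{no_children}"
```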

require "fileutils"
require "./video"
require "./display"

pids = {}

trap("CHLD") do
  while pids.values.include?(:working) &&
        pid = Process.waitpid(-1, Process::WNOHANG)
    pids[pid] = :done
  end
end

start_time = Time.now
FileUtils.mkpath "previews"
ARGV.each do |input_file|
  command = avconv_preview_command(input_file)
  pid = Process.spawn(command)
  pids[pid] = :working
end

until pids.values.all?{|v| v == :done}
  repaint(pids, start_time)
  sleep 0.1
end
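To try the same pattern without avconv, the video helpers, or a terminal dashboard, here's a self-contained sketch. The child processes are Ruby interpreters that sleep for made-up durations, and a simple tick counter stands in for repaint; both are assumptions for illustration only. Each child runs long enough that its PID is always recorded in the hash before it can exit.

```ruby
require "rbconfig"

# Hypothetical workloads standing in for the avconv jobs.
durations = [0.3, 0.6, 0.9]

pids = {}

trap("CHLD") do
  # One SIGCHLD may stand for several exited children, so keep reaping
  # until waitpid reports nothing more (nil) or nothing is still working.
  while pids.values.include?(:working) &&
        (pid = Process.waitpid(-1, Process::WNOHANG))
    pids[pid] = :done
  end
end

durations.each do |seconds|
  pid = Process.spawn(RbConfig.ruby, "-e", "sleep #{seconds}")
  pids[pid] = :working
end

ticks = 0
until pids.values.all? { |status| status == :done }
  ticks += 1  # the main loop stays free to do other work here
  sleep 0.1
end

puts "#{pids.size} children done after #{ticks} ticks"
```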

Let's run this third version.

Transcoding... | 00:04

PID       Time
---------------
6824      DONE
6825      DONE
6827      DONE
6829      WORKING

This time, we get an animated, live display of the status of the child processes. We can watch as each process finishes, and the program's main thread is never frozen by the need to wait for a process to exit.

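There's one small wrinkle: the loop stops as soon as the last process is marked done, so the last screen we painted still shows that process as WORKING. To display the final state without adding much complexity to the code, we add one last call to repaint after the loop.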

until pids.values.all?{|v| v == :done}
  repaint(pids, start_time)
  sleep 0.1
end

repaint(pids, start_time)

And one last run:

Transcoding... \ 00:05

PID       Time
---------------
6839      DONE
6840      DONE
6842      DONE
6844      DONE

I know that the code we've written today hasn't been the simplest. Working with processes using system calls can be a slightly messy prospect, filled with edge cases that have to be accounted for. But when we persevere, the results can be very powerful. We can be instantly notified about changes to child process state, without the need to resort to threaded code to track the many different things going on at once.

Happy hacking!
