Subprocesses Part 5: SIGCHLD
Video transcript & code
So far in this miniseries on subprocesses, we've focused mostly on how to start processes. But we can't always kick off a process and then ignore it. Often, we need a way to keep tabs on it as it does its work.
Here's an example.
We recently wrote some code that kicked off a group of parallel video encoding processes.
Since then we've moved the code that constructs the full avconv command into its own method.
def avconv_preview_command(input_file)
  fade_args = "fade=type=in:nb_frames=15, fade=type=out:start_frame=870"
  output_file = "./previews/" +
                File.basename(input_file, ".mp4") +
                "-preview" +
                File.extname(input_file)
  %W[avconv
     -ss 30
     -i #{input_file}
     -y
     -strict -2
     -t 30
     -vf '#{fade_args}'
     #{output_file}
     > /dev/null 2>&1
  ].join(" ")
end
Once again, we're not going to dig into this code in detail. What's important is that we use this code to construct shell commands, which we then execute asynchronously using Process.spawn.
In the process, we collect process IDs, or PIDs.
Then we loop over the PIDs and wait for each one to finish.
require "fileutils"
require "./video"

pids = []
FileUtils.mkpath "previews"
ARGV.each do |input_file|
  command = avconv_preview_command(input_file)
  pid = Process.spawn(command)
  puts "Started process #{pid}"
  pids << pid
end

pids.each do |pid|
  puts "Waiting for process #{pid}"
  Process.waitpid(pid)
end
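To make the split between starting and waiting concrete, here is a minimal sketch. The helper name spawn_then_wait is made up for illustration, and it assumes we stand in for avconv with a child Ruby process that just sleeps. It times how long Process.spawn takes to return versus how long the spawn and the wait take together.

```ruby
require "rbconfig"

# Hypothetical helper (not part of the episode's code): start a child that
# sleeps for `seconds`, then wait on that exact PID.
def spawn_then_wait(seconds)
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  t0 = clock.call
  # Stand-in for the avconv command: a child Ruby that only sleeps.
  pid = Process.spawn(RbConfig.ruby, "-e", "sleep #{seconds}")
  spawn_time = clock.call - t0   # spawn returns without waiting for the child
  reaped = Process.waitpid(pid)  # blocks until this specific child exits
  total_time = clock.call - t0
  { pid: pid, reaped: reaped, spawn_time: spawn_time, total_time: total_time }
end

result = spawn_then_wait(0.3)
printf "spawn took %.3fs, spawn plus wait took %.3fs\n",
       result[:spawn_time], result[:total_time]
```

The spawn time should be a small fraction of the total, which is why all of the "Started process" lines can appear before any waiting begins.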
This works OK. We can kick it off with a few video files as input.
We see all the processes begin at once.
Then we watch as the code waits for a while on the first PID.
Once the first process finishes, the other PIDs are collected in short order.
$ ./mkpreviews *.mp4
Started process 5201
Started process 5202
Started process 5203
Started process 5205
Waiting for process 5201
Waiting for process 5202
Waiting for process 5203
Waiting for process 5205
This code provides some very coarse-grained monitoring of our child processes. But it's not really ideal.
The most glaring problem is that we simply loop over our list of PIDs from beginning to end, waiting on each one in the order it was started.
But there's no reason to assume that the first process that started is going to be the first one to finish. Each process has its own workload. One of the reasons that we see the processes finish all in a rush at the end may be that some of them had already finished while we were waiting for the first one.
It would be preferable if we could wait for any process to finish and report it as soon as it happens. Happily, this doesn't require a lot of rework to achieve.
Instead of looping over the PIDs, we establish a do-while loop, which we learned about in episode #073.
And then instead of waiting on a specific PID, we supply the special flag value -1 to Process.waitpid.
This tells waitpid to wait for any child process to finish.
In order to determine which process finished, we capture the return value of waitpid.
Then we delete it from the list of PIDs we're waiting on.
require "fileutils"
require "./video"

pids = []
FileUtils.mkpath "previews"
ARGV.each do |input_file|
  command = avconv_preview_command(input_file)
  pid = Process.spawn(command)
  pids << pid
  puts "Started process #{pid}"
end

begin
  pid = Process.waitpid(-1)
  puts "Process #{pid} is done!"
  pids.delete(pid)
end until pids.empty?
puts "All done!"
When we run this version, we don't see a huge difference from the first version.
But if we look carefully at the final output, we can see that the processes ended in a slightly different order than they began. This was probably happening in the first version as well, but this time we actually tracked the process terminations in the order they occurred.
This is an improvement, because now we have a more accurate indication of which processes finish when.
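The difference between start order and finish order is easier to see when the children have very different workloads. The following is only a sketch: finish_order is a hypothetical helper, and the sleeping child Ruby processes are stand-ins for real encoding jobs. It starts one child per duration and collects them with waitpid(-1) in whatever order they exit.

```ruby
require "rbconfig"

# Hypothetical helper: spawn one sleeping child per duration, then reap them
# with waitpid(-1), recording the durations in the order the children finish.
def finish_order(durations)
  pending = {}
  durations.each do |seconds|
    pid = Process.spawn(RbConfig.ruby, "-e", "sleep #{seconds}")
    pending[pid] = seconds
  end

  order = []
  until pending.empty?
    pid = Process.waitpid(-1)      # -1 means "any child of this process"
    seconds = pending.delete(pid)  # nil if the PID was not one of ours
    order << seconds if seconds
  end
  order
end

p finish_order([0.9, 0.1, 0.5])
```

On a machine that isn't heavily loaded, the durations should come back shortest first, even though the longest-running child was started first.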
But it still means blocking our whole main process while we wait for processes to stop. What if we want the program to do something else while our jobs run?
As a very simple example, let's say we want to display an animated, running status dashboard as the processes do their work.
To do this, we start by changing the format of our PID list, from an array to a hash.
We'll make the PIDs the keys in the hash, and map them to a status value, either :working or :done.
Then we once again rewrite our waiting loop. This time, it repeats until all of the PIDs are marked done.
Inside the loop, we just repaint the screen, then sleep for a tenth of a second, then repeat.
pids = {}
start_time = Time.now
FileUtils.mkpath "previews"
ARGV.each do |input_file|
  command = avconv_preview_command(input_file)
  pid = Process.spawn(command)
  pids[pid] = :working
end

until pids.values.all?{|v| v == :done}
  repaint(pids, start_time)
  sleep 0.1
end
I'm not going to be spending any time talking about the repainting code today, because it's not essential for understanding the topic at hand.
$animation = %w[| / - \\ | / - \\]

def repaint(pids, start_time)
  print "\e[H\e[2J"
  elapsed = Time.now - start_time
  mins, secs = elapsed.to_i.divmod(60)
  printf "Transcoding... %s %02d:%02d\n\n", $animation.rotate!.first, mins, secs
  template = "%-10s%s\n"
  header = template % ["PID", "Time"]
  print header
  puts "-" * header.size
  pids.each do |pid, status|
    printf template, pid, status.to_s.upcase
  end
end
This is a classic, simple user interface design pattern. We're just repeatedly displaying the current state of the world, on a set schedule. What's notably missing from the loop now is any code to check on the child processes and update the state of the world.
What we need is a way to asynchronously update the PID statuses as the child processes finish. At this point, we might consider kicking off some monitoring threads to wait on the PIDs and report when they quit. But so long as we're running on a UNIX-style host, there's a much easier way to accomplish our goal.
What we need to do is trap a signal: the SIGCHLD signal.
Whenever a child process exits, the operating system sends us this signal. (If you're not familiar with how trapping signals works, check out episodes #353 and #354.)
All we have to do now is figure out which process just terminated. Remember, one of the limitations to operating system signals is that they can't carry any extra information with them.
To get the PID, we once again use Process.waitpid.
We pass negative one, as before, indicating that it should look for any exited children of this process.
Then we update the corresponding field in the PIDs hash to the :done status.
This is straightforward enough. And we don't even need to worry about multithreaded race conditions, because we know that the nature of signal handlers is that the rest of the process will be frozen while the signal is being handled.
trap("CHLD") do
  pid = Process.waitpid(-1)
  pids[pid] = :done
end
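Here's a small, self-contained sketch of the same idea with a single child, so we can try the handler in isolation. The helper name run_with_chld_trap and the sleeping child Ruby process are assumptions made for illustration. The main loop never calls waitpid itself; it only watches the hash that the handler updates.

```ruby
require "rbconfig"

# Hypothetical helper: run one short-lived child and let the CHLD handler,
# not the main loop, record that it has finished.
def run_with_chld_trap(seconds)
  statuses = {}
  previous = trap("CHLD") do
    pid = Process.waitpid(-1)  # find out which child just exited
    statuses[pid] = :done
  end

  pid = Process.spawn(RbConfig.ruby, "-e", "sleep #{seconds}")
  statuses[pid] ||= :working   # don't overwrite :done if the child already exited

  polls = 0
  until statuses[pid] == :done
    polls += 1                 # the main loop stays free to do other work here
    sleep 0.05
  end
  { statuses: statuses, polls: polls }
ensure
  trap("CHLD", previous || "DEFAULT")  # put back whatever handler was there before
end

result = run_with_chld_trap(0.3)
puts "statuses: #{result[:statuses].values.inspect}, main loop iterations: #{result[:polls]}"
```

Restoring the previous handler in the ensure clause is a courtesy for larger programs. A short script like the one in this episode can simply leave its handler installed.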
Unfortunately, there's one teensy little complication. It turns out that if many child processes exit around the same time, the operating system is allowed to optimize by batching them all into a single SIGCHLD signal. This means we have to account for the possibility that more than one child process is waiting to have its status harvested.
Fortunately this is a pretty simple change. First, we add the Process::WNOHANG constant as a second argument to waitpid.
This flag subverts the usual behavior of waitpid: if there are no more terminated processes waiting to be dealt with, it will immediately return nil instead of waiting for another process to finish.
Then we wrap the whole handler in a while loop, which will repeat until all the pids are marked :done or until waitpid returns nil because there are currently no processes left to be handled.
require "fileutils"
require "./video"
require "./display"

pids = {}
trap("CHLD") do
  while pids.values.include?(:working) &&
        pid = Process.waitpid(-1, Process::WNOHANG)
    pids[pid] = :done
  end
end

start_time = Time.now
FileUtils.mkpath "previews"
ARGV.each do |input_file|
  command = avconv_preview_command(input_file)
  pid = Process.spawn(command)
  pids[pid] = :working
end

until pids.values.all?{|v| v == :done}
  repaint(pids, start_time)
  sleep 0.1
end
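To see why the handler's loop checks both conditions, it helps to probe waitpid with WNOHANG in each situation it can encounter. This is just a sketch: wnohang_probe is a hypothetical helper, and it assumes the process has no other children at the time it runs. It shows the nil return while a child is still running, the PID once the child has exited, and the Errno::ECHILD error that waitpid raises when there are no children left at all. That last case is one plausible reason to stop looping once nothing is marked :working.

```ruby
require "rbconfig"

# Hypothetical helper: call waitpid(-1, WNOHANG) in three situations and
# record what happens each time. Assumes this process has no other children.
def wnohang_probe
  results = {}
  pid = Process.spawn(RbConfig.ruby, "-e", "sleep 0.3")

  # 1. Child still running: returns nil immediately instead of blocking.
  results[:while_running] = Process.waitpid(-1, Process::WNOHANG)

  # 2. Poll until the child has exited: now we get its PID back.
  reaped = nil
  sleep 0.05 until (reaped = Process.waitpid(-1, Process::WNOHANG))
  results[:after_exit] = (reaped == pid)

  # 3. No children left at all: waitpid raises rather than returning nil.
  begin
    Process.waitpid(-1, Process::WNOHANG)
    results[:no_children] = :returned
  rescue Errno::ECHILD
    results[:no_children] = :echild
  end
  results
end

p wnohang_probe
```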
Let's run this third version.
Transcoding... | 00:04

PID       Time
---------------
6824      DONE
6825      DONE
6827      DONE
6829      WORKING
This time, we get an animated, live display of the status of the child processes. We can watch as each process finishes, and the program's main thread is never frozen by the need to wait for a process to exit.
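There's one cosmetic problem left: the loop exits as soon as the last status flips to :done, before that final state has been painted, so the last process still shows as WORKING. In order to get all the processes marked as DONE without adding too much complexity to the code, we'll add a last call to repaint after the loop.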
until pids.values.all?{|v| v == :done}
  repaint(pids, start_time)
  sleep 0.1
end
repaint(pids, start_time)
And one last run:
Transcoding... \ 00:05

PID       Time
---------------
6839      DONE
6840      DONE
6842      DONE
6844      DONE
I know that the code we've written today hasn't been the simplest. Working with processes using system calls can be a slightly messy prospect, filled with edge cases that have to be accounted for. But when we persevere, the results can be very powerful. We can be instantly notified about changes to child process state, without the need to resort to threaded code to track the many different things going on at once.