Unit 1, Lesson 21

Subprocesses Part 3: Spawn

Video transcript & code

In the first episode of this series, we saw how the Kernel.system method was great for executing subprocesses where we didn't care about the output. But we also learned that the system method is not so great when the command being run might take a while to complete.

For instance, here's a script for generating short 30-second preview files from a list of MP4 video files.

I'm not going to go into detail about how this is accomplished. All you need to understand is that this script assembles a fairly elaborate invocation of the avconv utility.

It first echoes the command to standard output, and then executes it using system.

fade_args = "fade=type=in:nb_frames=15, fade=type=out:start_frame=870"

ARGV.each do |input_file|
  output_file = File.basename(input_file, ".mp4") +
                "-preview" +
                File.extname(input_file)
  command = %W[avconv
               -ss 30
               -i #{input_file}
               -y
               -strict -2
               -t 30
               -vf '#{fade_args}'
               #{output_file}
               > /dev/null 2>&1
            ].join(" ")
  puts command
  system(command)
end
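
As a small aside, the output filename logic can be tried on its own. This is only a sketch: the preview_name helper is a hypothetical name introduced here for illustration and is not part of the original script.

```ruby
# Hypothetical helper wrapping the filename expression from the script above.
def preview_name(input_file)
  File.basename(input_file, ".mp4") +
    "-preview" +
    File.extname(input_file)
end

puts preview_name("example/408-leaning-toothpicks.mp4")
# prints 408-leaning-toothpicks-preview.mp4
```

Note that File.basename drops the directory part, so each preview file is written to the current directory rather than next to its input.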

When we run this script with some video files as input, we see the first command echoed.

Then we have to wait for that command to finish before the next command starts.

…and so on, until it runs out of input files.

$ ./mkpreviews4 example/*.mp4
avconv -ss 30 -i example/408-leaning-toothpicks.mp4 -y -strict -2 -t 30 -vf 'fade=type=in:nb_frames=15, fade=type=out:start_frame=870' 408-leaning-toothpicks-preview.mp4 > /dev/null 2>&1
avconv -ss 30 -i example/409-optional-gem.mp4 -y -strict -2 -t 30 -vf 'fade=type=in:nb_frames=15, fade=type=out:start_frame=870' 409-optional-gem-preview.mp4 > /dev/null 2>&1
avconv -ss 30 -i example/410-stay-positive.mp4 -y -strict -2 -t 30 -vf 'fade=type=in:nb_frames=15, fade=type=out:start_frame=870' 410-stay-positive-preview.mp4 > /dev/null 2>&1
avconv -ss 30 -i example/411-stay-positive-2.mp4 -y -strict -2 -t 30 -vf 'fade=type=in:nb_frames=15, fade=type=out:start_frame=870' 411-stay-positive-2-preview.mp4 > /dev/null 2>&1

But this seems inefficient. After all, everyone has multiple processor cores these days. And it's not like the processes are dependent on each other. Can't we just kick off each process in the background, and immediately move on to starting the next one?

In fact we can. In order to do so, we have to switch from using system to using Process.spawn.

fade_args = "fade=type=in:nb_frames=15, fade=type=out:start_frame=870"

ARGV.each do |input_file|
  output_file = File.basename(input_file, ".mp4") +
                "-preview" +
                File.extname(input_file)
  command = %W[avconv
               -ss 30
               -i #{input_file}
               -y
               -strict -2
               -t 30
               -vf '#{fade_args}'
               #{output_file}
               > /dev/null 2>&1
            ].join(" ")
  puts command
  Process.spawn(command)
end

When we run this version, we see that all the commands start executing immediately.

$ ./mkpreviews2 example/*.mp4
avconv -ss 30 -i example/408-leaning-toothpicks.mp4 -y -strict -2 -t 30 -vf 'fade=type=in:nb_frames=15, fade=type=out:start_frame=870' 408-leaning-toothpicks-preview.mp4 > /dev/null 2>&1
avconv -ss 30 -i example/409-optional-gem.mp4 -y -strict -2 -t 30 -vf 'fade=type=in:nb_frames=15, fade=type=out:start_frame=870' 409-optional-gem-preview.mp4 > /dev/null 2>&1
avconv -ss 30 -i example/410-stay-positive.mp4 -y -strict -2 -t 30 -vf 'fade=type=in:nb_frames=15, fade=type=out:start_frame=870' 410-stay-positive-preview.mp4 > /dev/null 2>&1
avconv -ss 30 -i example/411-stay-positive-2.mp4 -y -strict -2 -t 30 -vf 'fade=type=in:nb_frames=15, fade=type=out:start_frame=870' 411-stay-positive-2-preview.mp4 > /dev/null 2>&1

After all the avconv commands have been kicked off, the script exits.

Is this what we want to happen? Maybe. But maybe not. Maybe we'd like all these conversion processes to start up in parallel, but still have the script wait until they are done.

When we kick off a command with Process.spawn, the method returns the process ID, or "PID", of the new process.

Process.spawn("ls /")
# => 15002

# >> bin
# >> boot
# >> cdrom
# >> dev
# >> etc
# >> home
# >> initrd.img
# >> lib
# >> lib32
# >> lib64
# >> lost+found
# >> media
# >> mnt
# >> opt
# >> proc
# >> root
# >> run
# >> sbin
# >> snap
# >> srv
# >> sys
# >> tapas
# >> tmp
# >> usr
# >> var
# >> vmlinuz
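
To see the difference in blocking behavior directly, here is a rough timing sketch. It assumes a child Ruby process that sleeps is an acceptable stand-in for a slow command like avconv; the slow_command and timed helpers are hypothetical names made up for this illustration.

```ruby
require "rbconfig"

# Hypothetical stand-in for a slow command: a Ruby child process that
# sleeps for the given number of seconds.
def slow_command(seconds)
  [RbConfig.ruby, "-e", "sleep #{seconds}"]
end

# Runs the block and returns [result, elapsed_seconds].
def timed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

# system blocks until the child exits, so this takes at least half a second.
_ok, system_elapsed = timed { system(*slow_command(0.5)) }

# Process.spawn returns the PID as soon as the child has been started.
pid, spawn_elapsed = timed { Process.spawn(*slow_command(0.5)) }

puts "system took #{system_elapsed.round(2)}s"
puts "spawn took #{spawn_elapsed.round(2)}s and returned PID #{pid}"

# Wait for the spawned child to finish before moving on.
Process.waitpid(pid)
```

The exact timings will vary from machine to machine, but the spawn call should return long before the child has finished sleeping.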

When we have a PID, we can tell Ruby to wait until that process exits, using Process.waitpid.

If the process has already exited at the point of the waitpid, that's fine. It'll just return immediately.

Once a process has exited and we have waited on it, we can use the special child status variable, $?, to learn more about how that process terminated.

pid = Process.spawn("ls /")
# => 24966
Process.waitpid(pid)
# => 24966
$?
# => #<Process::Status: pid 24966 exit 0>

# >> bin
# >> boot
# >> cdrom
# >> dev
# >> etc
# >> home
# >> initrd.img
# >> lib
# >> lib32
# >> lib64
# >> lost+found
# >> media
# >> mnt
# >> opt
# >> proc
# >> root
# >> run
# >> sbin
# >> snap
# >> srv
# >> sys
# >> tapas
# >> tmp
# >> usr
# >> var
# >> vmlinuz
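
The status object answers more specific questions, too. As a minimal sketch, assuming a child Ruby process that exits with a chosen code stands in for a real command (the status_of_child_exiting_with helper is a hypothetical name used only here):

```ruby
require "rbconfig"

# Spawns a tiny Ruby child that exits with the given code, waits for it,
# and returns the Process::Status describing how it ended.
def status_of_child_exiting_with(code)
  pid = Process.spawn(RbConfig.ruby, "-e", "exit #{code}")
  Process.waitpid(pid)
  $? # set by waitpid to the status of the child we just waited on
end

status = status_of_child_exiting_with(3)
puts status.exitstatus # prints 3
puts status.success?   # prints false
```

A nonzero exit status conventionally signals failure, which is why success? is false here.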

Now that we know this, let's update our script. We'll set up an array to collect PIDs.

Each time we spawn a new process, we'll capture the PID into our list.

Then, once all of the processes are spun up, we'll go over our list of PIDs.

For each one, we'll output that we are waiting on that process.

Then we'll use Process.waitpid to wait until the process identified by that PID has finished.

fade_args = "fade=type=in:nb_frames=15, fade=type=out:start_frame=870"

pids = []

ARGV.each do |input_file|
  output_file = File.basename(input_file, ".mp4") +
                "-preview" +
                File.extname(input_file)
  command = %W[avconv
               -ss 30
               -i #{input_file}
               -y
               -strict -2
               -t 30
               -vf '#{fade_args}'
               #{output_file}
               > /dev/null 2>&1
            ].join(" ")
  puts command
  pids << Process.spawn(command)
end

pids.each do |pid|
  puts "Waiting for process #{pid}"
  Process.waitpid(pid)
end

Let's run this.

All of the processes are kicked off immediately just like before. But this time, the program then pauses, waiting for the first process to finish.

After that process is done, it waits for the next, and the next, and so on.

$ ./mkpreviews3 example/*.mp4
avconv -ss 30 -i example/408-leaning-toothpicks.mp4 -y -strict -2 -t 30 -vf 'fade=type=in:nb_frames=15, fade=type=out:start_frame=870' 408-leaning-toothpicks-preview.mp4 > /dev/null 2>&1
avconv -ss 30 -i example/409-optional-gem.mp4 -y -strict -2 -t 30 -vf 'fade=type=in:nb_frames=15, fade=type=out:start_frame=870' 409-optional-gem-preview.mp4 > /dev/null 2>&1
avconv -ss 30 -i example/410-stay-positive.mp4 -y -strict -2 -t 30 -vf 'fade=type=in:nb_frames=15, fade=type=out:start_frame=870' 410-stay-positive-preview.mp4 > /dev/null 2>&1
avconv -ss 30 -i example/411-stay-positive-2.mp4 -y -strict -2 -t 30 -vf 'fade=type=in:nb_frames=15, fade=type=out:start_frame=870' 411-stay-positive-2-preview.mp4 > /dev/null 2>&1
Waiting for process 26747
Waiting for process 26748
Waiting for process 26749
Waiting for process 26752
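
If we also wanted to know whether each conversion succeeded, one possible variation is to collect the exit statuses while waiting. This is a sketch, not the script from the episode: it uses Process.wait2, which returns the PID together with its Process::Status, and the run_in_parallel helper and the stand-in commands are hypothetical.

```ruby
require "rbconfig"

# Starts every command in parallel, then waits for each one in turn.
# Returns a hash mapping each PID to its Process::Status.
def run_in_parallel(commands)
  pids = commands.map { |command| Process.spawn(*command) }
  pids.each_with_object({}) do |pid, statuses|
    _pid, status = Process.wait2(pid)
    statuses[pid] = status
  end
end

# Hypothetical stand-ins for the avconv invocations: one succeeds, one fails.
commands = [
  [RbConfig.ruby, "-e", "exit 0"],
  [RbConfig.ruby, "-e", "exit 1"]
]

run_in_parallel(commands).each do |pid, status|
  puts "Process #{pid} #{status.success? ? 'succeeded' : 'failed'}"
end
```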

But maybe we decide we don't want the script to wait for these processes after all.

So we go through and remove all of our PID-waiting code.

There's actually a potential problem with this code as it is. Remember earlier, when we took a look at the exit status information of a subprocess after waiting for it to end?

The operating system doesn't know if and when we might use waitpid to capture a child process's exit status. So it has to keep that information around indefinitely, just in case. Even if the subprocess exited long ago.

Processes like this, which are no longer running, but for which the OS still has a record, are known as zombie processes. Zombies clutter up process listings, and each one holds on to an entry in the OS process table until its status is collected.

If we kick off a process and we really don't care what happens to it after that, there's a way to do it politely, such that we don't leave any zombies behind.

To do this, we call Process.detach, passing it the PID of the new process.

fade_args = "fade=type=in:nb_frames=15, fade=type=out:start_frame=870"

ARGV.each do |input_file|
  output_file = File.basename(input_file, ".mp4") +
                "-preview" +
                File.extname(input_file)
  command = %W[avconv
               -ss 30
               -i #{input_file}
               -y
               -strict -2
               -t 30
               -vf '#{fade_args}'
               #{output_file}
               > /dev/null 2>&1
            ].join(" ")
  puts command
  pid = Process.spawn(command)
  Process.detach(pid)
end

In the background, this starts a Ruby thread to automatically "reap" the subprocess status information when it exits. Once the OS knows we have this information, it can clear it out of the process table.
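
Process.detach also returns that thread, so we can still check on the child later if we change our minds. A small sketch, assuming a child Ruby process that exits right away stands in for a real command; spawn_detached is a hypothetical helper name introduced for this example.

```ruby
require "rbconfig"

# Spawns the command and detaches it. Returns the "reaper" thread that
# Process.detach creates to collect the child's exit status.
def spawn_detached(*command)
  pid = Process.spawn(*command)
  Process.detach(pid)
end

reaper = spawn_detached(RbConfig.ruby, "-e", "exit 0")

# Asking for the thread's value is optional. It blocks until the child
# exits and then returns its Process::Status.
puts reaper.value.success? # prints true
```

If we never touch the returned thread, the child is still reaped in the background once it exits.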

So, that's the Process.spawn method. It's useful for starting up asynchronous child processes. We can then either check in on those processes later on; or, using Process.detach, we can disavow any interest in them and let them run to completion on their own.

Process.spawn also has a number of interesting options that can be specified along with the command to run. But that's a topic for another day. Happy hacking!
