Subprocesses Part 13: Detach, Wait, Kill

If you are starting subprocesses in Ruby, you also need to know how to cleanly shut them down. Different applications have different needs, so in this episode, you’ll learn three different approaches for managing child process lifetime.

Video transcript & code

In a previous episode, we wrote some code to implement a simple forking TCP server.

In order to service multiple clients simultaneously, every time it accepts a new connection, it immediately forks the process and creates a new child dedicated to handling that connection.

require "socket"

server = TCPServer.new("localhost", 2000)

puts "Accepting connections (server PID: #{$$})"
loop do
  client = server.accept
  child_pid = fork do
    count = 0
    client.puts "Hello. I'm #{$$}, and I'll be your server"
    while (input = client.gets)
      count += input.to_i
      client.puts "Count: #{count}"
    end
    client.close
  end
  puts "Spawned #{child_pid}"
end

One thing we haven't done in this code is any kind of clean-up for child processes which have finished. That was a little sloppy of us. Today, we're going to examine a few different ways of handling child process termination.

But first, let's talk about how the parent process is shut down.

At the moment, the code accepts new connections forever. The only way to get it to stop is to send an interrupt signal with Ctrl-C.

$ ruby server.rb
Accepting connections
server.rb:12:in `accept': Interrupt
    from server.rb:12:in `block in <main>'
    from server.rb:11:in `loop'
    from server.rb:11:in `<main>'
interrupt

Let's make it possible to exit this process cleanly. We'll add a new background thread. To ensure we see any problems in the thread, we'll enable the abort_on_exception option we learned about in Episode #136. Then, we'll have the thread wait for the user to press enter at the console. Once they do, the thread calls exit, which causes the whole process to exit.

require "socket"

at_exit do
  puts "Shutting down"
end

Thread.new do
  Thread.current.abort_on_exception = true
  $stdin.getc
  exit
end

server = TCPServer.new("localhost", 2000)

puts "Accepting connections (server PID: #{$$})"
loop do
  client = server.accept
  child_pid = fork do
    count = 0
    client.puts "Hello. I'm #{$$}, and I'll be your server"
    while (input = client.gets)
      count += input.to_i
      client.puts "Count: #{count}"
    end
    client.close
  end
  puts "Spawned #{child_pid}"
end

Now we can fire up the server and when we press enter, it shuts itself down.

$ ruby server2.rb
Accepting connections

Shutting down

Let's start it up again. This time, we'll connect to it from a telnet program. Then we'll close the telnet session.

We repeat this process again two more times.

The server terminal now shows that three child processes have been spawned.

But now that we have closed the telnet sessions associated with each child process, what status are they in? Let's find out using the ps command.

$ ps ux 25380 25493 25610
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
avdi     25380  0.0  0.0      0     0 pts/6    Z+   09:09   0:00 [ruby] <defunct>
avdi     25493  0.0  0.0      0     0 pts/6    Z+   09:09   0:00 [ruby] <defunct>
avdi     25610  0.0  0.0      0     0 pts/6    Z+   09:09   0:00 [ruby] <defunct>

If we look under the "STAT" column, we see a "Z" for each process. This is short for "zombie".

What is a zombie process? Well, as the name suggests, it means that these processes are dead. But the operating system knows that sometimes, parent processes need to check up on the status of child processes even after they die, to see what the exit status of each child was.

For that reason, the operating system can't remove the child processes from the process table. Like the custodian of a police evidence locker, it is keeping them around just in case the parent process ever decides to ask the operating system about the circumstances of their death.
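To make this concrete, here is a minimal sketch, separate from the server code, of a parent asking about a child after the child has already exited. The helper name exit_status_of_finished_child is made up for this illustration, and it assumes a UNIX-like system where fork is available.

```ruby
# Fork a child that exits right away with the given code, give it time to
# die, and only then ask the operating system how it ended.
# (Hypothetical helper, not part of the episode's server code.)
def exit_status_of_finished_child(code)
  pid = fork { exit!(code) }      # exit! skips any inherited at_exit handlers
  sleep 0.2                       # by now the child has almost certainly exited
  _, status = Process.wait2(pid)  # collects the saved exit status
  status.exitstatus
end

puts exit_status_of_finished_child(3)
```

Between the child's exit and the call to Process.wait2, the child sits in the process table as a zombie. Collecting its status is what allows the operating system to finally discard the entry.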

If the parent process is a long-running daemon that spawns a lot of short-lived children, then zombie processes can clutter up the process table over time. If we know we don't care about the exit status of a child process, we should let the operating system know this too, so that it doesn't have to keep process information around indefinitely just in case.

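To do that, we call Process.detach, giving it the PID of the child process, right after spawning it. Under the hood, Ruby starts a background thread that waits on the child and collects its exit status as soon as it terminates.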

require "socket"

at_exit do
  puts "Shutting down"
end

Thread.new do
  Thread.current.abort_on_exception = true
  $stdin.getc
  exit
end

server = TCPServer.new("localhost", 2000)

puts "Accepting connections (server PID: #{$$})"
loop do
  client = server.accept
  child_pid = fork do
    count = 0
    client.puts "Hello. I'm #{$$}, and I'll be your server"
    while (input = client.gets)
      count += input.to_i
      client.puts "Count: #{count}"
    end
    client.close
  end
  Process.detach(child_pid)
  puts "Spawned #{child_pid}"
end

Let's restart the server process. Then let's start up a few connections, and immediately shut them back down.

This time when we check the process table for information about the now-exited connection processes, there is nothing to be found. Since we detached the processes after starting them, the operating system hasn't kept any process info around. We've prevented zombie processes from being created.

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
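Here is a small sketch of what detaching looks like in isolation. It relies on Process.detach returning a thread whose value is the child's Process::Status. The helper name detach_and_collect is hypothetical and exists only for this example.

```ruby
# Detach a short-lived child, then check that nothing is left to wait on.
# (Hypothetical helper, not part of the episode's server code.)
def detach_and_collect(code)
  pid = fork { exit!(code) }     # exit! skips any inherited at_exit handlers
  reaper = Process.detach(pid)   # a thread that reaps the child for us
  status = reaper.value          # blocks until the child has been reaped

  already_reaped =
    begin
      Process.wait(pid)          # nothing left to collect for this PID...
      false
    rescue Errno::ECHILD         # ...so the wait fails with "no child processes"
      true
    end

  [reaper.is_a?(Thread), status.exitstatus, already_reaped]
end

p detach_and_collect(0)
```

In the server we never look at the thread that Process.detach returns. We only rely on its side effect: the child's process table entry is cleaned up promptly instead of lingering as a zombie.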

Let's start up another client connection. Now, without ending that connection, let's hit enter to shut down the server process. Back in the telnet client, we enter a few numbers and see the counter incremented.

What this reveals is that we can kill the parent process and the child goes right on executing as if nothing has happened.

In some cases, this may be what we want. But for other applications we might prefer that the parent process doesn't shut down until all of its children have finished.

To make this possible, we need to set things up a little differently.

First, we'll initialize a new variable to hold a list of child process IDs.

Then, instead of detaching the child process after forking, we add its process ID to the list.

In the thread that waits on user input to shut down the server, we add a new loop over the PID list. For each child process ID, we log the fact that we are waiting on it. Then we call Process.wait with the PID.

What does it mean to wait on a process? In the language of the UNIX process API, which Ruby's Process module inherits, it means to wait for that process to end.

require "socket"

pids = []

at_exit do
  puts "Shutting down"
end

Thread.new do
  Thread.current.abort_on_exception = true
  $stdin.getc
  pids.each do |pid|
    puts "Waiting on PID #{pid}..."
    Process.wait(pid)
  end
  exit
end

server = TCPServer.new("localhost", 2000)

puts "Accepting connections (server PID: #{$$})"
loop do
  client = server.accept
  child_pid = fork do
    count = 0
    client.puts "Hello. I'm #{$$}, and I'll be your server"
    while (input = client.gets)
      count += input.to_i
      client.puts "Count: #{count}"
    end
    client.close
  end
  pids << child_pid
  puts "Spawned #{child_pid}"
end

Let's start up this new version of the server.

Then we'll establish a new client connection, which we immediately close back down. After that, we create a second client connection. We'll leave this one open for the moment.

In the terminal running our server process, we hit enter to tell it to shut down. It tells us that it is waiting for the first child process, but then immediately moves on to waiting on the second one.

What this shows is that when a child process has already terminated, a call to Process.wait with that process ID will immediately return.
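We can measure this behavior with a rough sketch. The helper time_spent_waiting and its keyword arguments are invented for this illustration, and the exact timings will vary from machine to machine.

```ruby
# Measure how long Process.wait blocks, depending on whether the child
# is already finished when we start waiting.
# (Hypothetical helper, not part of the episode's server code.)
def time_spent_waiting(child_runs_for:, parent_delay:)
  pid = fork do
    sleep child_runs_for
    exit!(0)                     # skip any inherited at_exit handlers
  end
  sleep parent_delay
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  Process.wait(pid)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# The child is long gone by the time we wait: the call returns right away.
already_finished = time_spent_waiting(child_runs_for: 0, parent_delay: 0.3)

# The child is still running: the call blocks until the child exits.
still_running = time_spent_waiting(child_runs_for: 0.5, parent_delay: 0)

puts format("finished child: %.3fs, running child: %.3fs",
            already_finished, still_running)
```

The first measurement should be close to zero, while the second should be roughly as long as the child's remaining lifetime.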

Now let's quit out of the second telnet connection.

As soon as we do this, we can see that the server process is no longer waiting, and it finishes shutting down.

$ ruby server4.rb
Accepting connections (server PID: 11195)
Spawned 11420
Spawned 11649

Waiting on PID 11420...
Waiting on PID 11649...
Shutting down

OK, so now we know how to delay the shutdown of the server process until all of the connection handler processes have finished.

But this means that server shutdown may pause indefinitely, waiting for clients to close their connections. In some cases this might be what we want. But in other cases we might prefer that shutting down the server also terminates all the active client connections.

It's easy to modify our program to shut down child processes at the same time the server is shut down. All we need to do is add a call to Process.kill.

Despite its name, kill doesn't inherently terminate processes. Instead, it's the somewhat misleading name inherited from the UNIX process API for sending a signal to another process. For our purposes, we'll go with the "interrupt" signal, which is the signal processes normally receive when we type Ctrl-C at the terminal. As we saw in Episode #353, the default behavior for a Ruby process when receiving an interrupt signal is to terminate with an Interrupt exception.
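To see that Process.kill only delivers a signal, here is a sketch in which the child survives being "killed". It uses the USR1 signal and a pipe for reporting back, both of which are choices made for this illustration rather than anything from the server code. The helper name signal_without_terminating is hypothetical.

```ruby
# Send a child a signal that it handles without exiting, confirm it is
# still alive, and only then terminate it for real.
# (Hypothetical helper, not part of the episode's server code.)
def signal_without_terminating
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    trap("USR1") { writer.syswrite("got USR1\n") }
    writer.syswrite("ready\n")   # tell the parent the handler is installed
    loop { sleep 1 }             # keep running until forcibly stopped
  end
  writer.close

  reader.gets                    # wait for "ready"
  Process.kill("USR1", pid)      # "kill" here just means "send a signal"
  message = reader.gets.chomp
  still_running = Process.wait(pid, Process::WNOHANG).nil?

  Process.kill("KILL", pid)      # KILL cannot be trapped: this one is fatal
  _, status = Process.wait2(pid)
  reader.close
  [message, still_running, status.signaled?]
end

p signal_without_terminating
```

Whether a signal ends the receiving process depends on the signal and on how the receiver handles it, which is why the choice of "INT" matters for our server.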

Now we just need to specify which processes should receive this interrupt. The Process.kill method can accept multiple process IDs as separate arguments, so we just splat out our list of PIDs.

require "socket"

pids = []

at_exit do
  puts "Shutting down"
end

Thread.new do
  Thread.current.abort_on_exception = true
  $stdin.getc
  puts "Killing connections..."
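  # Process.kill raises ArgumentError without at least one PID, so skip it
  # when no connections were ever spawned.
  Process.kill("INT", *pids) unless pids.empty?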
  pids.each do |pid|
    puts "Waiting on PID #{pid}..."
    Process.wait(pid)
  end
  exit
end

server = TCPServer.new("localhost", 2000)

puts "Accepting connections (server PID: #{$$})"
loop do
  client = server.accept
  child_pid = fork do
    count = 0
    client.puts "Hello. I'm #{$$}, and I'll be your server"
    while (input = client.gets)
      count += input.to_i
      client.puts "Count: #{count}"
    end
    client.close
  end
  pids << child_pid
  puts "Spawned #{child_pid}"
end

Let's run the server, then start up a couple of telnet connections to it in separate terminals.

Back in the server terminal, we tell it it's time to shut down. Immediately, we can see that both telnet sessions are disconnected, as their handling processes are interrupted.

$ ruby server5.rb
Accepting connections (server PID: 16179)
Spawned 16551
Spawned 17031

Killing connections...
Waiting on PID 16551...
Shutting down
server4.2.rb:63:in `gets': Interrupt
    from server4.2.rb:63:in `block (2 levels) in <main>'
    from server4.2.rb:60:in `fork'
    from server4.2.rb:60:in `block in <main>'
    from server4.2.rb:58:in `loop'
    from server4.2.rb:58:in `<main>'
Waiting on PID 17031...
Shutting down
server4.2.rb:63:in `gets': Interrupt
    from server4.2.rb:63:in `block (2 levels) in <main>'
    from server4.2.rb:60:in `fork'
    from server4.2.rb:60:in `block in <main>'
    from server4.2.rb:58:in `loop'
    from server4.2.rb:58:in `<main>'
Shutting down

We also see a lot of shutdown noise in the server terminal. This is because by default, subprocesses still inherit their parent process' standard output and standard error streams. So when the child processes are interrupted, they dump their stack traces to the parent process' terminal.

Let's make the child processes shut down less messily when they are interrupted. To do that, we'll add a call to trap after each connection handler process is forked. It will handle the interrupt signal by exiting the program.

If you're not familiar with trapping signals, check out Episode #353 and Episode #354 to learn all about it.

require "socket"

pids = []

at_exit do
  puts "Shutting down"
end

Thread.new do
  Thread.current.abort_on_exception = true
  $stdin.getc
  puts "Killing connections..."
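  # Process.kill raises ArgumentError without at least one PID, so skip it
  # when no connections were ever spawned.
  Process.kill("INT", *pids) unless pids.empty?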
  pids.each do |pid|
    puts "Waiting on PID #{pid}..."
    Process.wait(pid)
  end
  exit
end

server = TCPServer.new("localhost", 2000)

puts "Accepting connections (server PID: #{$$})"
loop do
  client = server.accept
  child_pid = fork do
    trap("INT") { exit }
    count = 0
    client.puts "Hello. I'm #{$$}, and I'll be your server"
    while (input = client.gets)
      count += input.to_i
      client.puts "Count: #{count}"
    end
    client.close
  end
  pids << child_pid
  puts "Spawned #{child_pid}"
end
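The combination of trap in the children and a splatted Process.kill in the parent can be tried out on its own. The sketch below is an approximation under a few assumptions: the helper name interrupt_children is hypothetical, a pipe is used so the parent knows every handler is installed before it sends the signal, and the children call exit! rather than exit so that inherited at_exit handlers stay out of the picture.

```ruby
# Fork several children that exit cleanly on INT, interrupt them all with
# one Process.kill call, and collect their exit statuses.
# (Hypothetical helper; count must be at least 1.)
def interrupt_children(count)
  reader, writer = IO.pipe
  pids = Array.new(count) do
    fork do
      reader.close
      trap("INT") { exit!(0) }   # clean exit, no Interrupt stack trace
      writer.syswrite("r")       # one byte per child: "handler is ready"
      loop { sleep 1 }
    end
  end
  writer.close

  reader.read(count)             # block until every child has reported in
  signaled = Process.kill("INT", *pids)   # returns how many PIDs it signaled
  statuses = pids.map { |pid| Process.wait2(pid).last }
  reader.close
  [signaled, statuses.map(&:success?)]
end

p interrupt_children(2)
```

Because each child handles the interrupt by exiting, every status reports success, and nothing is written to the parent's terminal by the children.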

This time, when we restart the server, fire up some connections, and then kill the server, we don't see as much output.

$ ruby server6.rb
Accepting connections (server PID: 21296)
Spawned 21485
Spawned 21562

Killing connections...
Waiting on PID 21485...
Shutting down
Shutting down
Waiting on PID 21562...
Shutting down

We still do see "Shutting down" messages from both the parent and child processes. Let's take one last step to eliminate child process output unless they experience an error.

In the child process code, we'll use what we learned in Episode #501 to redirect the standard output stream into the system's null device.

require "socket"

pids = []

at_exit do
  puts "Shutting down"
end

Thread.new do
  Thread.current.abort_on_exception = true
  $stdin.getc
  puts "Killing connections..."
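  # Process.kill raises ArgumentError without at least one PID, so skip it
  # when no connections were ever spawned.
  Process.kill("INT", *pids) unless pids.empty?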
  pids.each do |pid|
    puts "Waiting on PID #{pid}..."
    Process.wait(pid)
  end
  exit
end

server = TCPServer.new("localhost", 2000)

puts "Accepting connections (server PID: #{$$})"
loop do
  client = server.accept
  child_pid = fork do
    trap("INT") { exit }
    $stdout.reopen(File::NULL)
    count = 0
    client.puts "Hello. I'm #{$$}, and I'll be your server"
    while (input = client.gets)
      count += input.to_i
      client.puts "Count: #{count}"
    end
    client.close
  end
  pids << child_pid
  puts "Spawned #{child_pid}"
end

One more time, we restart the server, and make some connections to it. One more time, we shut it down.

This time, all we see is parent process output, as the child processes are cleanly shut down behind the scenes.

$ ruby server7.rb
Accepting connections (server PID: 25585)
Spawned 25853
Spawned 25930

Killing connections...
Waiting on PID 25853...
Waiting on PID 25930...
Shutting down
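If we want to convince ourselves that the redirect really silences a child, we can try a sketch like the following. A pipe stands in for the terminal so that the parent can inspect what the child printed. That substitution, and the helper name child_output_after_null_redirect, are assumptions made for this illustration.

```ruby
# Capture what a child prints before and after it points $stdout at the
# null device. (Hypothetical helper, not part of the episode's server code.)
def child_output_after_null_redirect
  reader, writer = IO.pipe
  $stdout.flush                  # don't let the child inherit buffered output
  pid = fork do
    reader.close
    $stdout.reopen(writer)       # the pipe plays the role of the terminal
    $stdout.sync = true
    puts "before redirect"
    $stdout.reopen(File::NULL)   # same call the connection handlers use
    puts "after redirect"        # goes nowhere
    exit!(0)                     # skip any inherited at_exit handlers
  end
  writer.close

  captured = reader.read         # everything the child wrote to the pipe
  reader.close
  Process.wait(pid)
  captured
end

p child_output_after_null_redirect
```

Only the line printed before the redirect reaches the pipe. The redirect happens inside the child, so the parent's own standard output is unaffected.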

In today's episode, we've learned about three different strategies for handling child process lifetime:

  1. If we don't care about what happens to child processes, we can detach them.
  2. If we want the parent process to wait on its children for completion, we can use Process.wait.
  3. And if we prefer to have children shut down when the parent process shuts down, we can send them appropriate signals.

While the example TCP server we've used today uses a UNIX-only forking model to spawn subprocesses, all three of the strategies we've seen for dealing with subprocess shutdown should work across most platforms.

In an upcoming episode we'll talk more about some advanced features for waiting on processes. But for now… happy hacking!
