Subprocesses Part 15: Capture

Ruby’s backtick operator is great for quick-and-dirty output capture, but what do you use when you need more power? In this episode you’ll learn to use Ruby’s Open3.capture* family of methods for precise control over executing subprocesses and making use of their output.

Video transcript & code

Ruby has always been optimized to make a good "glue" language: a programming language for gluing other programs together. As part of its "glue nature", Ruby makes it very easy to capture output from another program's execution. All we have to do is put the program command line in backticks, and the result of the expression will be the command's output as a string.

output = `cowsay "Quack"`
puts output

# >>  _______
# >> < Quack >
# >>  -------
# >>         \   ^__^
# >>          \  (oo)\_______
# >>             (__)\       )\/\
# >>                 ||----w |
# >>                 ||     ||

As convenient as the backtick operator is for quick scripts, it has some significant limitations.

First, it only captures the command's standard output stream. Anything the subprocess writes to standard error will go to the Ruby program's standard error stream, which by default is the console. For programs that normally output information to both standard output and standard error, such as the curl command, this can be a problem.

page = `curl -v google.com`

# !> * About to connect() to google.com port 80 (#0)
# !> *   Trying 216.58.216.238...
# !>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
# !>                                  Dload  Upload   Total   Spent    Left  Speed
# !>
# !>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* connected
# !> * Connected to google.com (216.58.216.238) port 80 (#0)
# !> > GET / HTTP/1.1
# !>
# !> > User-Agent: curl/7.28.1
# !>
# !> > Host: google.com
# !>
# !> > Accept: */*
# !>
# !> >
# !>
# !> < HTTP/1.1 301 Moved Permanently
# !>
# !> < Location: http://www.google.com/
# !>
# !> < Content-Type: text/html; charset=UTF-8
# !>
# !> < Date: Fri, 18 Aug 2017 14:02:02 GMT
# !>
# !> < Expires: Sun, 17 Sep 2017 14:02:02 GMT
# !>
# !> < Cache-Control: public, max-age=2592000
# !>
# !> < Server: gws
# !>
# !> < Content-Length: 219
# !>
# !> < X-XSS-Protection: 1; mode=block
# !>
# !> < X-Frame-Options: SAMEORIGIN
# !>
# !> <
# !>
# !> { [data not shown]
# !>
# !> 100   219  100   219    0     0    237      0 --:--:-- --:--:-- --:--:--   667
# !> * Connection #0 to host google.com left intact
# !> * Closing connection #0
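
To see the same effect without depending on curl or a network connection, here is a minimal sketch. It assumes a POSIX shell, and it uses the current Ruby interpreter (located with RbConfig.ruby) as a stand-in for an external program; the child script is made up for illustration.

```ruby
require "rbconfig"
require "shellwords"

# Hypothetical child program: writes one line to each output stream.
child = '$stdout.puts "to stdout"; $stderr.puts "to stderr"'

# Backticks always take a single shell string, so we escape each part ourselves.
command = "#{Shellwords.escape(RbConfig.ruby)} -e #{Shellwords.escape(child)}"

# Only standard output ends up in the string. The "to stderr" line
# bypasses the capture and shows up on the console instead.
captured = `#{command}`
```

The captured string holds only the line written to standard output; nothing the child wrote to standard error is available to the program.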

Second, the command supplied in the backticks will be run through the system shell. As we talked about in Episode #389, this can lead to security vulnerabilities when dealing with user-supplied data, as well as to confusing behavior and code that fails to function reliably across platforms.

Unlike Ruby's other methods for starting subprocesses, backticks give us no way to supply the command line as a list. That means we have no way to tell Ruby to skip the shell intermediary and execute the command directly, as we can with system.

system("curl", "-v", "google.com")
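
As a rough illustration of why the list form matters, the sketch below passes a hostile-looking string as a separate argument. The input value and the checking script are invented for this example, and the current Ruby interpreter stands in for a real external command.

```ruby
require "rbconfig"

# Hypothetical user-supplied value that a shell would split into two commands.
user_input = "quack; echo INJECTED"

# The child exits 0 only if it received the input as one untouched argument.
check = "exit(ARGV == [#{user_input.inspect}] ? 0 : 1)"

# With more than one argument, system runs the program directly, with no shell,
# so the semicolon is never interpreted.
arrived_intact = system(RbConfig.ruby, "-e", check, user_input)
```

Because no shell is involved, the semicolon is just another character in the argument, and system returns true.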

The third drawback of the backticks is that there is no way to pass extra process-spawning arguments. For instance, with Ruby's other process-spawning methods, we can specify the initial working directory for the process. We can't do that with backticks.

system("curl -v --trace trace.txt google.com", chdir: "tmp")

So… how do we capture subprocess output as a string, while retaining more control over how the subprocess is executed?

The answer is in Ruby's open3 library.

Using Open3, we can invoke the capture2 method. This method accepts all of the usual Process.spawn argument conventions, including a list of separate command line parts and various keyword arguments that modify the execution environment.

This method will return two values: the subprocess output and the process exit status.

When we run this, we can see that the exit status was a successful zero.

We also see that the output written to the standard error stream is not captured, and instead goes straight to our original main program's standard error. This is the same behavior we saw with backticks, and we'll look at a way to change it in a moment.

require "open3"
output, status =
  Open3.capture2("curl", "-v", "--trace", "trace.txt", "google.com",
                 chdir: "tmp")
status
# => #<Process::Status: pid 40988 exit 0>

# !> Warning: --trace overrides an earlier trace/verbose option
# !>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
# !>                                  Dload  Upload   Total   Spent    Left  Speed
# !>
# !>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
# !> 100   219  100   219    0     0    259      0 --:--:-- --:--:-- --:--:--   425

First let's check the content of the output variable. We see that it is a string containing everything the subprocess wrote to standard out.

output
# => "<HTML><HEAD><meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n" +
#    "<TITLE>301 Moved</TITLE></HEAD><BODY>\n" +
#    "<H1>301 Moved</H1>\n" +
#    "The document has moved\n" +
#    "<A HREF=\"http://www.google.com/\">here</A>.\n" +
#    "</BODY></HTML>\n"
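
The status value is a Process::Status object, so we can query it rather than just inspect it. Here is a small sketch that exercises both the chdir option and the exit status. It uses the current Ruby interpreter as a stand-in for curl, and the system temp directory as an arbitrary working directory; both are assumptions made so the example runs anywhere.

```ruby
require "open3"
require "rbconfig"
require "tmpdir"

# Assumption: any existing directory will do; we use the system temp dir.
workdir = File.realpath(Dir.tmpdir)

# List form plus a spawn option: the child starts out in workdir.
pwd_output, pwd_status =
  Open3.capture2(RbConfig.ruby, "-e", "puts Dir.pwd", chdir: workdir)

# A hypothetical child that prints something and then fails with exit code 3.
fail_output, fail_status =
  Open3.capture2(RbConfig.ruby, "-e", 'puts "partial"; exit 3')

pwd_status.success?    # true when the child exited with 0
fail_status.success?   # false for any non-zero exit
fail_status.exitstatus # the numeric exit code, 3 here
```

Note that a failed command still hands back whatever it managed to write, so it is worth checking the status before trusting the output.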

What if we want to capture a full log of everything the process wrote to standard output and standard error? To make that happen, all we have to do is change from calling capture2 to calling capture2e.

When we do this, both output streams are merged into the resulting output string.

require "open3"
output, status =
  Open3.capture2e("curl", "-v",
             "--trace", "trace.txt", "google.com", chdir: "tmp")

output
# => "Warning: --trace overrides an earlier trace/verbose option\n" +
#    "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n" +
#    "                                 Dload  Upload   Total   Spent    Left  Speed\n" +
#    "\n" +
#    "  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\n" +
#    "100   219  100   219    0     0    241      0 --:--:-- --:--:-- --:--:--   341\n" +
#    "100   219  100   219    0     0    241      0 --:--:-- --:--:-- --:--:--   341<HTML><HEAD><meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n" +
#    "<TITLE>301 Moved</TITLE></HEAD><BODY>\n" +
#    "<H1>301 Moved</H1>\n" +
#    "The document has moved\n" +
#    "<A HREF=\"http://www.google.com/\">here</A>.\n" +
#    "</BODY></HTML>\n" +
#    "\n"
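
Here is a network-free sketch of the same idea, again using the current Ruby interpreter in place of curl. The child script is hypothetical; it just writes one line to each stream.

```ruby
require "open3"
require "rbconfig"

# Hypothetical child: one line on standard output, one on standard error.
script = '$stdout.puts "page body"; $stderr.puts "progress info"'

# capture2e still returns two values, but the string now holds both streams.
merged, status = Open3.capture2e(RbConfig.ruby, "-e", script)

merged.include?("page body")     # written to standard output
merged.include?("progress info") # written to standard error
```

The relative order of the two streams in the merged string depends on how the child buffers its output, so it is safest not to rely on a particular interleaving.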

Another possibility is that we want to capture both streams, but we need to keep them separate. In that case, we switch to using capture3. This method returns three values: a string for standard out, a string for standard error, and an exit status.

This time when we take a look at the contents of output, it contains only the HTML that curl downloaded. And the err variable contains just the status information that curl dumped as it was running.

require "open3"
output, err, status =
  Open3.capture3("curl", "-v",
                 "--trace", "trace.txt", "google.com", chdir: "tmp")
output
# => "<HTML><HEAD><meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n" +
#    "<TITLE>301 Moved</TITLE></HEAD><BODY>\n" +
#    "<H1>301 Moved</H1>\n" +
#    "The document has moved\n" +
#    "<A HREF=\"http://www.google.com/\">here</A>.\n" +
#    "</BODY></HTML>\n"

err
# => "Warning: --trace overrides an earlier trace/verbose option\n" +
#    "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n" +
#    "                                 Dload  Upload   Total   Spent    Left  Speed\n" +
#    "\n" +
#    "  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\n" +
#    "100   219  100   219    0     0    333      0 --:--:-- --:--:-- --:--:--   737\n"
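
One place where keeping the streams separate pays off is error reporting. The sketch below wraps capture3 in a helper named run!, which is a hypothetical name invented for this example and not part of Open3. It returns standard output on success and raises with the captured standard error otherwise.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper (not part of Open3): returns stdout on success,
# raises an error carrying the captured stderr on failure.
def run!(*command, **options)
  out, err, status = Open3.capture3(*command, **options)
  return out if status.success?

  raise "#{command.first} exited with #{status.exitstatus}: #{err.strip}"
end

# The current Ruby interpreter stands in for a real external command.
greeting = run!(RbConfig.ruby, "-e", 'puts "hello"')

failure =
  begin
    run!(RbConfig.ruby, "-e", 'warn "boom"; exit 2')
  rescue RuntimeError => e
    e.message
  end
```

Because the error text comes from its own variable, it can go into the exception message without ever being mixed into the data we wanted from standard output.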

For quick one-off scripts, the Ruby backticks are often all we need to grab the output of a command. But when we need more safety and more flexibility, the open3 library makes available all the variations on process output-capturing that we are likely to need. Happy hacking!
