
Subprocesses Part 4: Redirection

Video transcript & code

Recently we were introduced to Ruby's Process.spawn method, which kicks off a new subprocess that runs independently of the parent Ruby process. spawn can take a number of interesting options. Today we're going to dig into just one set of options: the ones having to do with input and output redirection.

In a previous episode we used the example of the avprobe command, which dumps all of its output to the standard error stream.

Standard error output is indicated here by the "bang-waka" marker (!>) at the beginning of each line.

pid = Process.spawn "avprobe", "410-stay-positive.mp4"
Process.waitpid(pid)

# !> ffprobe version 2.8.6-1ubuntu2 Copyright (c) 2007-2016 the FFmpeg developers
# !>   built with gcc 5.3.1 (Ubuntu 5.3.1-11ubuntu1) 20160311
# !>   configuration: --prefix=/usr --extra-version=1ubuntu2 --build-suffix=-ffmpeg --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --cc=cc --cxx=g++ --enable-gpl --enable-shared --disable-stripping --disable-decoder=libopenjpeg --disable-decoder=libschroedinger --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-librtmp --enable-libschroedinger --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxvid --enable-libzvbi --enable-openal --enable-opengl --enable-x11grab --enable-libdc1394 --enable-libiec61883 --enable-libzmq --enable-frei0r --enable-libx264 --enable-libopencv
# !>   libavutil      54. 31.100 / 54. 31.100
# !>   libavcodec     56. 60.100 / 56. 60.100
# !>   libavformat    56. 40.101 / 56. 40.101
# !>   libavdevice    56.  4.100 / 56.  4.100
# !>   libavfilter     5. 40.101 /  5. 40.101
# !>   libavresample   2.  1.  0 /  2.  1.  0
# !>   libswscale      3.  1.101 /  3.  1.101
# !>   libswresample   1.  2.101 /  1.  2.101
# !>   libpostproc    53.  3.100 / 53.  3.100
# !> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '410-stay-positive.mp4':
# !>   Metadata:
# !>     major_brand     : mp42
# !>     minor_version   : 0
# !>     compatible_brands: mp42mp41
# !>     creation_time   : 2016-04-22 15:10:53
# !>   Duration: 00:03:00.95, start: 0.000000, bitrate: 1117 kb/s
# !>     Stream #0:0(eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709), 1280x720, 923 kb/s, 29.97 fps, 29.97 tbr, 30k tbn, 59.94 tbc (default)
# !>     Metadata:
# !>       creation_time   : 2016-04-22 15:10:53
# !>       handler_name    : Alias Data Handler
# !>       encoder         : AVC Coding
# !>     Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 189 kb/s (default)
# !>     Metadata:
# !>       creation_time   : 2016-04-22 15:10:53
# !>       handler_name    : Alias Data Handler

Say we want to redirect the output to the standard output stream instead.

We can do that by supplying an option mapping the symbol :err to the symbol :out.

If we look carefully, we can see that the lines are now prefixed with "waka-wakas" (>>), indicating that the output has been switched to the standard output stream.

pid = Process.spawn "avprobe", "410-stay-positive.mp4", :err => :out
Process.waitpid(pid)

# >> ffprobe version 2.8.6-1ubuntu2 Copyright (c) 2007-2016 the FFmpeg developers
# >>   built with gcc 5.3.1 (Ubuntu 5.3.1-11ubuntu1) 20160311
# >>   configuration: --prefix=/usr --extra-version=1ubuntu2 --build-suffix=-ffmpeg --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --cc=cc --cxx=g++ --enable-gpl --enable-shared --disable-stripping --disable-decoder=libopenjpeg --disable-decoder=libschroedinger --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-librtmp --enable-libschroedinger --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxvid --enable-libzvbi --enable-openal --enable-opengl --enable-x11grab --enable-libdc1394 --enable-libiec61883 --enable-libzmq --enable-frei0r --enable-libx264 --enable-libopencv
# >>   libavutil      54. 31.100 / 54. 31.100
# >>   libavcodec     56. 60.100 / 56. 60.100
# >>   libavformat    56. 40.101 / 56. 40.101
# >>   libavdevice    56.  4.100 / 56.  4.100
# >>   libavfilter     5. 40.101 /  5. 40.101
# >>   libavresample   2.  1.  0 /  2.  1.  0
# >>   libswscale      3.  1.101 /  3.  1.101
# >>   libswresample   1.  2.101 /  1.  2.101
# >>   libpostproc    53.  3.100 / 53.  3.100
# >> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '410-stay-positive.mp4':
# >>   Metadata:
# >>     major_brand     : mp42
# >>     minor_version   : 0
# >>     compatible_brands: mp42mp41
# >>     creation_time   : 2016-04-22 15:10:53
# >>   Duration: 00:03:00.95, start: 0.000000, bitrate: 1117 kb/s
# >>     Stream #0:0(eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709), 1280x720, 923 kb/s, 29.97 fps, 29.97 tbr, 30k tbn, 59.94 tbc (default)
# >>     Metadata:
# >>       creation_time   : 2016-04-22 15:10:53
# >>       handler_name    : Alias Data Handler
# >>       encoder         : AVC Coding
# >>     Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 189 kb/s (default)
# >>     Metadata:
# >>       creation_time   : 2016-04-22 15:10:53
# >>       handler_name    : Alias Data Handler

Another option is to use file descriptor numbers instead of their symbolic equivalents.

pid = Process.spawn "avprobe", "410-stay-positive.mp4", 2 => 1
Process.waitpid(pid)

I'm pointing this out because you might find it useful in certain obscure scenarios. In most cases, I think it's much clearer to stick to the symbols.
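If you'd like to convince yourself that the symbolic and numeric forms do the same thing, here is a minimal sketch. It makes a few assumptions: a tiny child Ruby script stands in for avprobe (so nothing extra needs to be installed), and capture_stdout is a hypothetical helper written just for this example, not something Ruby provides. It briefly points our own standard output at a scratch file so we can inspect what the child wrote there.

```ruby
require "rbconfig"
require "tempfile"

# Hypothetical helper, not part of Ruby: temporarily points this process's
# standard output at a scratch file, runs the block, and returns what was
# written to that file.
def capture_stdout
  saved = $stdout.dup
  Tempfile.create("stdout") do |scratch|
    $stdout.reopen(scratch.path, "w")
    begin
      yield
    ensure
      # Put the real standard output back, even if the block raised.
      $stdout.flush
      $stdout.reopen(saved)
      saved.close
    end
    File.read(scratch.path)
  end
end

# Stand-in for avprobe: a child Ruby that writes only to standard error.
child = [RbConfig.ruby, "-e", 'warn "probe report"']

# Symbolic form: the child's stderr goes wherever our stdout points.
via_symbols = capture_stdout do
  Process.waitpid(Process.spawn(*child, :err => :out))
end

# Numeric form: file descriptor 2 (stderr) redirected to descriptor 1 (stdout).
via_numbers = capture_stdout do
  Process.waitpid(Process.spawn(*child, 2 => 1))
end
```

In both cases the text the child sent to standard error ends up in whatever our standard output was pointing at when spawn was called.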

Here's a different command example. It's an invocation of the pygmentize utility to mark up some Ruby code as HTML.

pid = Process.spawn "pygmentize",  "-f", "html", "hello.rb"
Process.waitpid(pid)

# >> <div class="highlight"><pre><span class="nb">puts</span> <span class="s2">&quot;hello, world&quot;</span>
# >> </pre></div>

Let's say we want to capture the output into a file. We can do this really easily, by telling spawn to redirect the process' standard output stream into a file named out.html.

The use of a string argument is what cues spawn to interpret it as a file to open.

Then we can read back the file, and see that it contains the command output.

pid = Process.spawn "pygmentize",  "-f", "html", "hello.rb", :out => "out.html"
Process.waitpid(pid)
File.read("out.html")
# => "<div class=\"highlight\"><pre><span class=\"nb\">puts</span> <span class=\"s2\">&quot;hello, world&quot;</span>\n</pre></div>\n"
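If you don't have pygmentize handy, the same redirect can be tried with any command. Here's a small sketch under some assumptions of my own: a one-line child Ruby script plays the part of pygmentize, and the output lands in a scratch directory (the out.txt name is arbitrary) rather than the current directory.

```ruby
require "rbconfig"
require "tmpdir"

# Stand-in for pygmentize: a child Ruby that prints one line to standard output.
child = [RbConfig.ruby, "-e", 'puts "hello, world"']

# Use a scratch directory so the example leaves the working directory alone.
out_path = File.join(Dir.mktmpdir, "out.txt")

# Because the value is a String, spawn opens that path for writing and
# connects the child's standard output to it.
pid = Process.spawn(*child, :out => out_path)
Process.waitpid(pid)

contents = File.read(out_path)
```

After the child exits, contents holds the single line the child printed.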

Maybe we want to have separate output files for both standard output and standard error, so that we capture a log of any problems.

We can do that by specifying separate redirects for :out and :err.

If we then try to mark up a missing file…

…we can find the error report in the log file.

pid = Process.spawn "pygmentize",  "-f", "html", "missing.rb",
                    :out => "out.html",
                    :err => "pyg.log"
Process.waitpid(pid)
File.read("pyg.log")
# => "Error: cannot read infile: [Errno 2] No such file or directory: 'missing.rb'\n"
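Here's a sketch of the same two-file arrangement that runs without pygmentize. The assumptions are mine: a child Ruby script writes one line to each stream and then exits with a failing status, and the file names are arbitrary names inside a scratch directory.

```ruby
require "rbconfig"
require "tmpdir"

# Stand-in for a failing pygmentize run: writes to both streams, then exits 1.
child = [RbConfig.ruby, "-e",
         '$stdout.puts "result"; $stderr.puts "problem"; exit 1']

dir      = Dir.mktmpdir
out_path = File.join(dir, "out.txt")
err_path = File.join(dir, "err.log")

# Each stream gets its own destination file.
pid = Process.spawn(*child, :out => out_path, :err => err_path)
Process.waitpid(pid)

# waitpid sets $?, so we can check how the child exited as well.
exit_code  = $?.exitstatus
out_text   = File.read(out_path)
error_text = File.read(err_path)
```

Keeping the error stream in its own file means the regular output stays clean, and the exit status tells us whether it's worth opening the log at all.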

As you might suspect by now, this works for the standard input stream as well.

Instead of supplying the input file as an argument, we can pipe it in by using spawn redirection.

pid = Process.spawn "pygmentize",  "-f", "html", "-l", "ruby",
                    :in => "hello.rb",
                    :out => "out.html",
                    :err => "pyg.log"
Process.waitpid(pid)
File.read("out.html")
# => "<div class=\"highlight\"><pre>puts &quot;hello, world&quot;\n</pre></div>\n"
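To see input redirection in isolation, here's a minimal sketch. As before, the pieces are assumptions for illustration: the child is a small Ruby script that upcases whatever arrives on standard input, and the input and output file names are arbitrary.

```ruby
require "rbconfig"
require "tmpdir"

dir      = Dir.mktmpdir
in_path  = File.join(dir, "input.txt")
out_path = File.join(dir, "output.txt")
File.write(in_path, "hello, world\n")

# Stand-in for pygmentize: reads everything from stdin and prints it upcased.
child = [RbConfig.ruby, "-e", "print $stdin.read.upcase"]

# The child never sees a filename argument. It just reads its standard input,
# which spawn has connected to the file.
pid = Process.spawn(*child, :in => in_path, :out => out_path)
Process.waitpid(pid)

result = File.read(out_path)
```

This is handy for commands that only know how to read from standard input.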

So far we've seen how to redirect using file descriptors and filenames. But we can also use IO objects.

As a practical example, let's say that instead of dumping our pygmentize output into a file, we want to read it back into our program.

So we create an operating system pipe, using the IO.pipe method.

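IO.pipe returns two IO objects: the first is the read end, where data comes out of the pipe (which is why we'll call it output), and the second is the write end, where data goes in (which we'll call input).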

We can tell spawn to redirect the program's output into the input end of the pipe.

Then after the command finishes, we can close the input end and read the output.

output, input = IO.pipe
output                          # => #<IO:fd 7>
input                           # => #<IO:fd 14>
pid = Process.spawn "pygmentize",  "-f", "html", "hello.rb", :out => input
Process.waitpid(pid)
input.close
output.read
# => "<div class=\"highlight\"><pre><span class=\"nb\">puts</span> <span class=\"s2\">&quot;hello, world&quot;</span>\n</pre></div>\n"
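Here's a variation on the pipe example that you can run without pygmentize. It's a sketch with a couple of choices of my own: the child is a one-line Ruby script, the two ends are named reader and writer, and the read happens before waitpid. That ordering matters once a child produces more output than the operating system's pipe buffer can hold, because a child blocked on a full pipe will never exit while we sit in waitpid.

```ruby
require "rbconfig"

# Stand-in for pygmentize: a child Ruby that prints one line to standard output.
child = [RbConfig.ruby, "-e", 'puts "hello from the child"']

reader, writer = IO.pipe

# Connect the child's standard output to the write end of the pipe.
pid = Process.spawn(*child, :out => writer)

# Close our own copy of the write end. Until every copy is closed,
# reading from the pipe will never reach end-of-file.
writer.close

# Drain the pipe first, then reap the child.
captured = reader.read
reader.close
Process.waitpid(pid)
```

For small outputs either order works, which is why the simpler wait-then-read version is fine for a short snippet of HTML.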

This is just a very basic example. Pipes and asynchronous processes also allow us to do sophisticated things like processing output in a streaming fashion, as it is produced. We might dig into some examples of this kind of asynchronous process architecture in a future episode.

As you might imagine, this ability to redirect into arbitrary IO objects means we can also redirect into already-opened files. Here's an example of opening a file and then redirecting into it.

file = open("output.html", "w")
pid = Process.spawn "pygmentize",  "-f", "html", "hello.rb", :out => file
Process.waitpid(pid)
file.close
File.read("output.html")
# => "<div class=\"highlight\"><pre><span class=\"nb\">puts</span> <span class=\"s2\">&quot;hello, world&quot;</span>\n</pre></div>\n"
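A slightly more defensive way to write this is with the block form of File.open, which closes the file for us even if something goes wrong along the way. This is just a sketch: the child is a stand-in Ruby script and the path is an arbitrary file in a scratch directory.

```ruby
require "rbconfig"
require "tmpdir"

# Stand-in for pygmentize: a child Ruby that prints one line to standard output.
child = [RbConfig.ruby, "-e", 'puts "hello, world"']
path  = File.join(Dir.mktmpdir, "output.txt")

# The block form closes the file when the block ends, even on an exception.
File.open(path, "w") do |file|
  pid = Process.spawn(*child, :out => file)
  Process.waitpid(pid)
end

written = File.read(path)
```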

We saw earlier that we can redirect one standard stream into another. But what if we wanted to pipe both standard out and standard error into a single output sink?

For instance, the curl command normally produces output to both the standard output and error streams.

To accommodate this need, spawn allows us to redirect more than one stream at a time into a single destination, by using an array as a key.

When we read the logfile back, we see a composite of both standard out and standard error output.

pid = Process.spawn "curl", "-v", "google.com", [:out, :err] => "command.log"
Process.waitpid(pid)
puts File.read("command.log")


# >> * Rebuilt URL to: google.com/
# >>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
# >>                                  Dload  Upload   Total   Spent    Left  Speed
# >>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 64.233.176.102...
# >> * Connected to google.com (64.233.176.102) port 80 (#0)
# >> > GET / HTTP/1.1
# >> > Host: google.com
# >> > User-Agent: curl/7.47.0
# >> > Accept: */*
# >> >
# >> < HTTP/1.1 301 Moved Permanently
# >> < Location: http://www.google.com/
# >> < Content-Type: text/html; charset=UTF-8
# >> < Date: Tue, 17 May 2016 22:11:54 GMT
# >> < Expires: Thu, 16 Jun 2016 22:11:54 GMT
# >> < Cache-Control: public, max-age=2592000
# >> < Server: gws
# >> < Content-Length: 219
# >> < X-XSS-Protection: 1; mode=block
# >> < X-Frame-Options: SAMEORIGIN
# >> <
# >> { [219 bytes data]
# >> 100   219  100   219    0     0   2665      0 --:--:-- --:--:-- --:--:--  2670
# >> * Connection #0 to host google.com left intact
# >> <HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
# >> <TITLE>301 Moved</TITLE></HEAD><BODY>
# >> <H1>301 Moved</H1>
# >> The document has moved
# >> <A HREF="http://www.google.com/">here</A>.
# >> </BODY></HTML>
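Here's a small offline sketch of the array-as-key form, for when you don't want to hit the network with curl. The assumptions are mine: a child Ruby script writes one line to each stream, and the log lives in a scratch directory. The order of the two lines in the file depends on how the child buffers its output, so the sketch sorts them before looking at them.

```ruby
require "rbconfig"
require "tmpdir"

# Stand-in for curl -v: writes one line to each of the two output streams.
child = [RbConfig.ruby, "-e",
         '$stdout.puts "to stdout"; $stderr.puts "to stderr"']

log_path = File.join(Dir.mktmpdir, "command.log")

# An Array key sends every listed stream to the same destination.
pid = Process.spawn(*child, [:out, :err] => log_path)
Process.waitpid(pid)

# Sort the lines, since their relative order depends on buffering.
log_lines = File.readlines(log_path).sort
```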

Oh, but things don't stop there. What if we want to control how a redirect file is opened?

To do that, we can specify an array of arguments as the value of the redirect. Here, we specify that command.log should be truncated and opened for writing.

Then, for the second command, we specify that the file should be re-opened in append mode, so as to retain the first command's output.

When we read the log back, we can see output from both commands.

pid = Process.spawn "curl", "-v", "google.com",
                    [:out, :err] => ["command.log", "w"]
Process.waitpid(pid)

pid = Process.spawn "curl", "-v", "bing.com",
                    [:out, :err] => ["command.log", "a"]
Process.waitpid(pid)

puts File.read("command.log")


# >> * Rebuilt URL to: google.com/
# >>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
# >>                                  Dload  Upload   Total   Spent    Left  Speed
# >>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 216.58.218.14...
# >> * Connected to google.com (216.58.218.14) port 80 (#0)
# >> > GET / HTTP/1.1
# >> > Host: google.com
# >> > User-Agent: curl/7.47.0
# >> > Accept: */*
# >> >
# >> < HTTP/1.1 301 Moved Permanently
# >> < Location: http://www.google.com/
# >> < Content-Type: text/html; charset=UTF-8
# >> < Date: Tue, 17 May 2016 22:15:45 GMT
# >> < Expires: Thu, 16 Jun 2016 22:15:45 GMT
# >> < Cache-Control: public, max-age=2592000
# >> < Server: gws
# >> < Content-Length: 219
# >> < X-XSS-Protection: 1; mode=block
# >> < X-Frame-Options: SAMEORIGIN
# >> <
# >> { [219 bytes data]
# >> 100   219  100   219    0     0   2667      0 --:--:-- --:--:-- --:--:--  2703
# >> * Connection #0 to host google.com left intact
# >> <HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
# >> <TITLE>301 Moved</TITLE></HEAD><BODY>
# >> <H1>301 Moved</H1>
# >> The document has moved
# >> <A HREF="http://www.google.com/">here</A>.
# >> </BODY></HTML>
# >> * Rebuilt URL to: bing.com/
# >>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
# >>                                  Dload  Upload   Total   Spent    Left  Speed
# >>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 204.79.197.200...
# >> * Connected to bing.com (204.79.197.200) port 80 (#0)
# >> > GET / HTTP/1.1
# >> > Host: bing.com
# >> > User-Agent: curl/7.47.0
# >> > Accept: */*
# >> >
# >> < HTTP/1.1 301 Moved Permanently
# >> < Location: http://www.bing.com/
# >> < Server: Microsoft-IIS/8.5
# >> < X-MSEdge-Ref: Ref A: 1A6FF1CDE0B04DDE859462A5F9E78A46 Ref B: 8AF6F9EDD91179D4827B37DC586CCE52 Ref C: Tue May 17 15:15:46 2016 PST
# >> < Date: Tue, 17 May 2016 22:15:45 GMT
# >> < Content-Length: 0
# >> <
# >>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
# >> * Connection #0 to host bing.com left intact
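The difference between the two modes is easier to see with less output. In this sketch (child scripts and file name are stand-ins of my own), the log starts out with stale contents. The "w" run wipes them, and the "a" run adds to what the first run left behind.

```ruby
require "rbconfig"
require "tmpdir"

# Stand-ins for the two curl commands: one writes to stdout, one to stderr.
first_cmd  = [RbConfig.ruby, "-e", 'puts "first run"']
second_cmd = [RbConfig.ruby, "-e", 'warn "second run"']

log_path = File.join(Dir.mktmpdir, "command.log")
File.write(log_path, "stale contents\n")

# "w" truncates the file, so the stale contents disappear.
pid = Process.spawn(*first_cmd, [:out, :err] => [log_path, "w"])
Process.waitpid(pid)

# "a" appends, so the first run's output is kept.
pid = Process.spawn(*second_cmd, [:out, :err] => [log_path, "a"])
Process.waitpid(pid)

log = File.read(log_path)
```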

It's important to understand that the arguments spawn accepts are actually common across a number of Ruby's subprocess-starting methods. So, to use an example we've already explored in the first part of this series, we can avoid all these waitpid calls by using system instead of Process.spawn.

system "curl", "-v", "google.com",
       [:out, :err] => ["command.log", "w"]

system "curl", "-v", "bing.com",
       [:out, :err] => ["command.log", "a"]

The options are still completely valid and respected, even though the method is different. Later on in this series, we'll see even more methods which accept the same kinds of redirection options.
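As a quick sanity check, here's a sketch showing system honoring a redirect while still giving us its usual true-or-false return value. The child script and the file name are stand-ins of my own.

```ruby
require "rbconfig"
require "tmpdir"

# Stand-in command: a child Ruby that prints one line to standard output.
child    = [RbConfig.ruby, "-e", 'puts "via system"']
out_path = File.join(Dir.mktmpdir, "system-out.txt")

# system waits for the command itself and returns true when it exits
# successfully, so there's no pid to keep track of.
succeeded = system(*child, :out => out_path)

system_output = File.read(out_path)
```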

But I think this is plenty for today. Happy hacking!
