Process Substitution

In this Bash-focused episode, we’ll explore a nifty trick for reading from subcommands as if they were files!

Video transcript & code

If you’ve spent much time in the Bash shell, there’s a good chance you’ve used command substitution at some point. Just as a refresher for how that works, let’s say I want to see how a command on my system is implemented.

First, I could use which to find out where the command lives in the filesystem.

which bundle

Then I might want to know whether this is a binary executable, or a script of some kind. For this I could use the file command.

file /usr/local/bin/bundle
/usr/local/bin/bundle: Ruby script, ASCII text executable

…except I wouldn’t type out the whole path like this, because there’s an easier way.

I would instead use backticks to perform command substitution on the which command.

file `which bundle`
/usr/local/bin/bundle: Ruby script, ASCII text executable

Bash takes the contents of the backticks, runs them as a separate command, and then interpolates that command’s output into the command line.

We can see this clearly if we insert an echo at the beginning.

echo file `which bundle`
file /usr/local/bin/bundle

Backticks are convenient for working at the command line.

For shell scripts, there’s a more formal and more visible version of command substitution syntax that uses a dollar sign and parentheses.

file $(which bundle)
/usr/local/bin/bundle: Ruby script, ASCII text executable
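Besides being easier to spot, the $(...) form nests without any escaping, which backticks can’t do cleanly. Here’s a small illustration. It only manipulates the path string from above, so bundle doesn’t need to be installed, and the variable names are just for this sketch:

```shell
#!/usr/bin/env bash
# Nesting command substitutions: $( ... ) nests with no escaping needed.
path=/usr/local/bin/bundle          # the path from the example above
parent=$(basename "$(dirname "$path")")
echo "$parent"                      # prints: bin

# The backtick form needs the inner backticks escaped with backslashes.
parent_bt=`basename \`dirname "$path"\``
echo "$parent_bt"                   # prints: bin
```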

So that’s command substitution. Today, I want to show you a related Bash feature which is a little less well-known.

Let’s say we’re putting together an ebook out of Markdown source files.

tapas$ tree
|-- Dockerfile
|-- chapter01.assets
|   `-- IMG_20200423_165418.jpg
|-- chapter04.assets
|   `-- image-20200513095255753.png

The Markdown files reference various images in the assets directories.


One thing we want to watch out for is that these references don’t fall out of sync. We don’t want extra image files in the assets directories that we are no longer using. And we especially don’t want broken image links that point to a file that’s missing or moved.

A straightforward way to audit for these discrepancies is to get a list of image link paths and a list of asset paths, and then run a diff between the two lists.

We can get the list of link paths with some fancy sed scripting.

I’m not going to go over this command in detail, but it uses a regular expression to look for Markdown image links and spits out just the link paths.

tapas$ sed -n 's/\!\[.*\](\(.*\))/\1/p' *.md | sort -u
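To see what that sed expression does, here’s a quick check against a few invented Markdown lines (the sample text is made up; only the sed command comes from above). The -n flag turns off sed’s default printing, and the trailing p prints only lines where the substitution actually happened, which is why the non-image lines vanish:

```shell
#!/usr/bin/env bash
# Run the sed expression from above on a few made-up Markdown lines.
# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded, i.e. lines containing an image link.
links=$(printf '%s\n' \
  '# Chapter 4' \
  'Some prose with no image.' \
  '![A diagram](chapter04.assets/image-20200513095255753.png)' \
  | sed -n 's/\!\[.*\](\(.*\))/\1/p')
echo "$links"   # prints: chapter04.assets/image-20200513095255753.png
```

One caveat worth knowing for an audit: the expression assumes each image link sits on a line of its own. Any other text on the same line stays in the output, and a line holding two image links won’t produce both paths.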

Let’s dump the output of this command into a file.

sed -n 's/\!\[.*\](\(.*\))/\1/p' *.md | sort -u > image_links.txt

Next, we can use ls and glob patterns to get the list of actual image assets.

tapas$ ls *.assets/*.{png,jpg} | sort
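The {png,jpg} part is Bash brace expansion, which happens before globbing. Here’s a sketch that recreates a tiny version of the project in a temporary directory so you can watch both expansions. The layout mirrors the tree above, but the setup itself is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical project layout, created in a throwaway directory.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir chapter01.assets chapter04.assets
touch chapter01.assets/IMG_20200423_165418.jpg \
      chapter04.assets/image-20200513095255753.png

# Brace expansion runs first: *.assets/*.{png,jpg} becomes the two glob
# patterns *.assets/*.png and *.assets/*.jpg, and each glob is then
# expanded to the matching paths (png matches first, then jpg).
expanded=$(echo *.assets/*.{png,jpg})
echo "$expanded"

# Piping through sort gives one path per line in a stable order.
files=$(ls *.assets/*.{png,jpg} | sort)
echo "$files"

cd / && rm -rf "$tmp"
```

If one of the two patterns matches nothing, Bash passes it through literally (unless the nullglob option is set), and ls will report that *.assets/*.jpg or *.assets/*.png doesn’t exist.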

Let’s save that output to a file as well.

tapas$ ls *.assets/*.{png,jpg} | sort > image_files.txt

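Finally, we can run diff on these two files to see whether the two lists match up.

Looks like we have a stray, unreferenced image file! The leading > means the line appears only in the second file, image_files.txt: the image exists on disk, but no Markdown file links to it.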

tapas$ diff image_links.txt image_files.txt 
> chapter01.assets/IMG_20200423_165418.jpg

This method of finding discrepancies requires two intermediate files.

tapas$ ls *.txt
image_files.txt  image_links.txt

After we’re done auditing, these files become so much out-of-date clutter in our project.

tapas$ rm *.txt

Before we automate this audit into our build pipeline, it would be nice to find a way that avoids leaving temporary files lying around.

And that’s where process substitution comes in.

Instead of passing the image link list file to our diff command, we can pass the command that generates the list, wrapped in parentheses and preceded by a less-than sign: <(...).

And instead of passing in the image file listing, we can pass the command that generates that list, again wrapped in <(...).

The output is the same as before.

tapas$ diff <(sed -n 's/\!\[.*\](\(.*\))/\1/p' *.md | sort -u) <(ls *.assets/*.{png,jpg} | sort)
> chapter01.assets/IMG_20200423_165418.jpg
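Process substitution works with any command that expects file name arguments, not just our sed and ls pipelines. In this stripped-down sketch, two printf calls stand in for the link list and the file list; a.png plays a broken link and c.jpg an unused image (both names are invented):

```shell
#!/usr/bin/env bash
# Each <( ... ) lets diff read a command's output as if it were a file,
# so no intermediate .txt files are needed.
result=$(diff <(printf '%s\n' a.png b.png) <(printf '%s\n' b.png c.jpg))
echo "$result"
# Lines starting with < exist only in the first input (a link with no file);
# lines starting with > exist only in the second (a file with no link).
```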

So what happened here? Well, running echo once again clarifies things. Sort of.

In place of the substituted commands, we can see two different files in the /dev/fd namespace. That’s interesting!

tapas$ echo diff <(sed -n 's/\!\[.*\](\(.*\))/\1/p' *.md | sort -u) <(ls *.assets/*.{png,jpg} | sort)
diff /dev/fd/63 /dev/fd/62

But if we try to cat one of these files, the OS says it doesn’t exist.

tapas$ cat /dev/fd/63
cat: /dev/fd/63: No such file or directory

What happened here is that Bash ran each substituted command in a subshell with its output connected to a pipe, and passed diff a /dev/fd path naming that pipe’s file descriptor. Those descriptors stay open only while the outer command line is executing, and then Bash cleans them up. That’s why cat couldn’t find /dev/fd/63: by the time we ran it, that descriptor was gone.
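You can see this for yourself with a couple of throwaway commands. This sketch assumes a system that provides /dev/fd, such as Linux or macOS, where the printed descriptor number may differ from 63:

```shell
#!/usr/bin/env bash
# The <( ... ) word is replaced by a path before echo runs.
fd_path=$(echo <(true))
echo "$fd_path"            # e.g. /dev/fd/63; the number can vary

# While the receiving command is running, the path reads like a file.
greeting=$(cat <(echo hello))
echo "$greeting"           # prints: hello
```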

The upshot is that we effectively got to treat two subsidiary shell commands as if they were files, for the purpose of diffing.

This command is nice and self-contained. It doesn’t leave any messy temporary files lying around.

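It’s a bit long to type out by hand.

But it’s now a great candidate for automation, such as a Makefile. One caveat: process substitution is a Bash feature rather than part of POSIX sh, and make runs recipes with /bin/sh by default, so the Makefile should set SHELL := /bin/bash.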

    diff <(sed -n 's/\!\[.*\](\(.*\))/\1/p' *.md | sort -u) \
         <(ls *.assets/*.{png,jpg} | sort)
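For a build pipeline, it can help to wrap the command in a Bash script and lean on diff’s exit status: 0 when the inputs match, 1 when they differ. Here’s a sketch; the audit_images function name and the sample project are hypothetical, and only the diff command itself comes from above. Run it with bash, not sh:

```shell
#!/usr/bin/env bash
# Sketch: the image audit as a reusable step. audit_images and the sample
# project below are hypothetical; the diff command is the one from above.

audit_images() {
  # diff exits 0 when the two lists match and 1 when they differ,
  # so the caller (or a CI job) can act on the function's status.
  diff <(sed -n 's/\!\[.*\](\(.*\))/\1/p' *.md | sort -u) \
       <(ls *.assets/*.{png,jpg} | sort)
}

# Build a throwaway project: one referenced image, one unreferenced.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir chapter01.assets chapter04.assets
touch chapter01.assets/IMG_20200423_165418.jpg \
      chapter04.assets/image-20200513095255753.png
echo '![Diagram](chapter04.assets/image-20200513095255753.png)' > chapter04.md

report=$(audit_images)
status=$?
echo "$report"
echo "exit status: $status"   # 1 here, since one image is never referenced

cd / && rm -rf "$tmp"
```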

Process substitution isn’t useful as often as command substitution. But if you find yourself wanting to treat a process as a file, it’s there when you need it. Happy hacking!