Unit 1, Lesson 21

You Could Have Sed It in Ruby

Sometimes I feel like I ought to be more skilled with the classic UNIX “small, sharp tools”. But the truth is, when I’m working on a Ruby project, I already have a tool at hand that replaces almost all of those tools. Let’s explore how using Ruby at the command-line can give you the best of UNIX-style tools, while building on your existing language knowledge!

Video transcript & code

Say we've got a directory full of markdown files,

like this one.

This is just an example file to contain some more image references.

Like this one:

![ruby](chapter05.assets/ruby-1595526744861.png)

And this one!

![pry](chapter05.assets/pry.png)

Some of these files contain references to embedded images. Let's say we want to get a list of all the image references from every file.

This is a text-processing problem. The traditional way to tackle a problem like this from the command-line is to use a specialized tool for it.

In this case, a sed command works well to get the list we want.

$ sed -n 's/\!\[.*\](\(.*\))/\1/p' *.md

If you're not familiar with it, sed is short for "Stream EDitor", and it's one of the classic UNIX command-line tools.

I'm not going to do a sed tutorial today. But in a nutshell, what this command says is:

look for a regular expression pattern on every line,

replace the line with part of the regular expression match,

and then print the resulting text.

-n says, don't print all the other lines.

sed is an example of what's often called "The Unix Way": having lots of small, sharp tools that each do one job well. Sed is great for munging or extracting data from lines of text. UNIX has lots of these little tools: sed and awk and sort and uniq and cat and head and tail and m4 and tr and so on and so forth.

Individually, they can be very useful for little tasks like the one we're tackling today. But if you try to build larger automations with them, tying them together with Bash scripts and Makefiles, it can be a bit like jamming your hand deep into a kitchen drawer full of knives and skewers and cheese graters. You can cut yourself on all these sharp little tools!

You have to keep track of all their little incompatibilities: How do you escape strings in this tool? How do you interpolate variables into strings? Which values are considered "truthy"? What regular expression syntax do they support?

And there's no way to pass around complex data structures between them. Every time you move data from one program to another, you have to reduce it to text in a form that the consumer program must then parse.

That's why, back in the late 1980s, Larry Wall created the Perl programming language. Perl rolled shell scripting and all these other little tools together into one giant Swiss-army chainsaw. And no matter what you might think or have heard about Perl, in many ways it really WAS a huge improvement, not having to duct-tape a bunch of idiosyncratic UNIX tools together in order to automate tasks.

Why are we talking about Perl? Well, back before Ruby was a go-to language for web programming, it was a "glue language" for system automations, inspired directly by Perl. A huge amount of Ruby's syntax and built-in functionality is copied from Perl.

And that includes the ability to use Ruby one-liners to replace shell commands that would otherwise use tools like sed or awk.

Let's see if we can replace this sed command with a Ruby one-liner.

We'll start with ruby -n.

ruby -n

This is a common way to start Ruby one-liners. It tells Ruby to automatically loop over every line of input. Any code we tell Ruby to execute will be run inside this implicit input loop.

We can see this if we add an -e flag to tell Ruby to execute some code. Our code will output some stars followed by the contents of the special $_ variable.

We'll apply this to every Markdown file in the directory.

When we run this, we see that it reads in every file and then outputs the lines with the prefix we added.

$ ruby -ne 'puts "*** " + $_' *.md
*** This is just an example file to contain some more image references.
*** Like this one:
*** ![ruby](chapter05.assets/ruby-1595526744861.png)
*** And this one!
*** ![pry](chapter05.assets/pry.png)

What this illustrates is that with -n, Ruby reads every line of input and puts it into the $_ variable. This is a feature that comes straight from Perl. It's technically known as the "last read line" variable, and I'm going to call it that from here on out.
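As a rough sketch of what that implicit loop amounts to: -n behaves approximately as if our code were wrapped in `while gets ... end`. The version below spells the loop out by hand. The method name `prefix_lines` and the sample text are made up for illustration, and a `StringIO` stands in for the Markdown files so the snippet runs on its own.

```ruby
require "stringio"

# Illustrative helper (not part of Ruby): mimics what
#   ruby -ne 'puts "*** " + $_'
# does, but reads from any IO-like object instead of files or STDIN.
def prefix_lines(io)
  prefixed = []
  # `gets` returns the next line (nil at end of input) and also
  # stores that line in the "last read line" variable, $_.
  while io.gets
    prefixed << "*** " + $_
  end
  prefixed
end

# Made-up sample input standing in for a Markdown file.
sample = StringIO.new("And this one!\n![pry](chapter05.assets/pry.png)\n")
puts prefix_lines(sample)
```

With -n we get that loop for free, which is why the one-liner only needs the body.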

We don't actually want to print out every line. We want to find image file references.

Let's print the last read line IF the line matches a regular expression.

Let's see, that's a literal bang, followed by some text in square brackets, followed by some more text in parentheses...

Run it, and there are our image links!

$ ruby -ne 'puts $_ if $_ =~ /!\[.*\]\(.*\)/' *.md

We don't actually have to type this much, though. When we evaluate code from the command-line with the -n flag, Ruby lets us take a bunch of shortcuts, many of which don't exist in regular Ruby programs.

One of these shortcuts is that when an if statement has a condition that's a regular expression by itself, Ruby assumes we meant to match that regex against the last-read line.

$ ruby -ne 'puts $_ if /!\[.*\]\(.*\)/' *.md

This is another of those Perl-inspired features we were talking about. In regular Ruby coding this would be confusing and counter-intuitive, which is why it's normally turned off. But in one-liners, it saves typing for a common use-case.

We're part of the way there. We don't want to print out entire lines. We want to output the bare image filenames.

We can capture the filenames with a regex group, and then print just that group.

$ ruby -ne 'puts $1 if /!\[.*\]\((.*)\)/' *.md

This is taking advantage of the fact that Ruby assigns a bunch of pseudo-global variables every time it does a regular expression match, including numbered variables for each captured group. This is yet another feature Ruby inherits from Perl, for easier command-line scripting.
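These match variables aren't limited to one-liners. Here's a small sketch of the same capture in an ordinary Ruby method, using the explicit `=~` form. The method name `image_path` is just an illustrative choice.

```ruby
# Illustrative helper: returns the path inside a Markdown image
# reference, or nil when the line doesn't contain one.
def image_path(line)
  # A successful =~ sets $~ (the MatchData) and $1, $2, ... (the groups).
  if line =~ /!\[.*\]\((.*)\)/
    $1
  end
end

p image_path("![pry](chapter05.assets/pry.png)")  # prints "chapter05.assets/pry.png"
p image_path("And this one!")                     # prints nil
```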

At this point, we've successfully replicated our sed command, in Ruby.

sed -n 's/\!\[.*\](\(.*\))/\1/p' *.md
ruby -ne 'puts $1 if /!\[.*\]\((.*)\)/' *.md

Is it as concise to type as the sed version? Not quite. But it's very close! And I'd argue that it's more readable than the sed version, with only a minimal increase in size.

But more important than the readability is the fact that when we use Ruby for one-liners like this, we get to leverage all of our existing Ruby muscle-memory. We know that puts prints. We know how to use if as a statement modifier. And my favorite part: we get to re-use our knowledge of Ruby regular expressions. We don't have to go looking up the specific syntactical idiosyncrasies of sed regular expressions.

Like this idiosyncrasy: in sed's default regular expression syntax, you use escaped parentheses to group, and unescaped parens to match literal parentheses. In Ruby it's the opposite.

Keeping track of that distinction is friction to our work process.
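To make the Ruby side of that distinction concrete, here's a minimal sketch. The two method names are invented for the example; the only thing that matters is which parentheses are escaped.

```ruby
# Escaped parens match literal "(" and ")" characters.
def with_parens(text)
  text[/\(.*\)/]
end

# Unescaped parens form a capture group; index 1 selects that group.
def inside_parens(text)
  text[/\((.*)\)/, 1]
end

p with_parens("![pry](chapter05.assets/pry.png)")    # prints "(chapter05.assets/pry.png)"
p inside_parens("![pry](chapter05.assets/pry.png)")  # prints "chapter05.assets/pry.png"
```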

Using Ruby also means we can take advantage of Ruby extensions to regular expressions.

For instance, if we decide to make this command just a little bit more self-documenting, we can give the match group for markdown image references the identifier ref, and then reference that capture group by name to print it out.

ruby -ne 'puts $~[:ref] if /!\[.*\]\((?<ref>.*)\)/' *.md

($~ is another of the special variables Ruby assigns to when it matches a regular expression.)
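If the punctuation-heavy `$~` feels too cryptic outside of a one-liner, `Regexp.last_match` reads the same match data by a friendlier name. The sketch below shows both spellings side by side; the method names `image_ref` and `image_ref_verbose` are assumptions made for the example.

```ruby
# Illustrative helper: the same lookup the one-liner does, via $~.
def image_ref(line)
  return nil unless line =~ /!\[.*\]\((?<ref>.*)\)/
  $~[:ref]
end

# Same lookup, using Regexp.last_match instead of $~.
def image_ref_verbose(line)
  return nil unless line =~ /!\[.*\]\((?<ref>.*)\)/
  Regexp.last_match(:ref)
end

p image_ref("![ruby](chapter05.assets/ruby-1595526744861.png)")
p image_ref_verbose("![ruby](chapter05.assets/ruby-1595526744861.png)")
```

Both calls print the same path, since both read from the match Ruby just performed.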

The moral of this story is that if you ever feel like you "ought" to get more comfortable with the classic UNIX text-processing tools such as sed and awk, consider instead investing in Ruby one-liner skills. You can do all the same command-line tricks you could do with those tools, while building on your established base of Ruby knowledge. Happy hacking!