ARGF, That’s Some AWK-ward Ruby!

Today we’re diving deeper into the world of Ruby one-liners: we’ll learn about more special variables, querying the input stream, and… ghoooooost loooooops! 👻

Video transcript & code

So we have this directory full of markdown files.

ls *.md

Recently we used a Ruby one-liner to find all the image references.

$ ruby -ne 'puts $1 if /!\[.*\]\((.*)\)/' *.md
chapter04.assets/image-20200513095255753.png
chapter05.assets/ruby-1595526744861.png
chapter05.assets/pry.png

This command makes use of the -n flag.

ruby -n

This flag tells Ruby to wrap whatever code we evaluate at the command line in a "ghostly loop". On each pass through the loop, the next line of input is placed in the $_ variable, AKA the "last-read line" variable. Any code from -e flags gets implicitly inserted inside this ghost loop.

while $_ = ARGF.gets
  # --- code from -e flags goes here ---
  puts $1 if /!\[.*\]\((.*)\)/
  # --- end code from -e flags ---
end
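For a closer look, here is a rough sketch of that ghost loop written out as an ordinary Ruby script. It is an approximation, not Ruby's literal internals. The two markdown files are hypothetical sample data in a temporary directory, and ARGV is filled in by hand to stand in for the shell's `*.md` expansion. The implicit match against $_ is spelled out with `=~`. Matches are collected in an array (named `found` purely for illustration) instead of being printed as they turn up.

```ruby
require "tmpdir"

# Roughly what `ruby -ne 'puts $1 if /!\[.*\]\((.*)\)/' *.md` does.
found = []
Dir.mktmpdir do |dir|
  # Hypothetical sample files standing in for the real chapters.
  a = File.join(dir, "a.md")
  b = File.join(dir, "b.md")
  File.write(a, "# Intro\n![diagram](a.assets/diagram.png)\n")
  File.write(b, "Text only\n![shot](b.assets/shot.png)\n")

  # The shell would expand *.md into these arguments; ARGF reads them in order.
  ARGV.replace([a, b])

  while ($_ = ARGF.gets)
    # --- code from -e flags goes here ---
    # (the bare regex in the one-liner implicitly matches $_; here it is explicit,
    # and matches are collected instead of printed)
    found << $1 if $_ =~ /!\[.*\]\((.*)\)/
    # --- end code from -e flags ---
  end
end

puts found
```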

Let's take this image reference report a step further: let's add the filenames and line numbers where each image reference is found!

But let's work up to this goal slowly. We'll start by stepping back and just printing the entire contents of the files with their filenames and line numbers. Before that, let's start even more simply: by outputting the contents of the files!

ruby -ne 'puts $_' *.md

At this point we've re-created the UNIX cat command.

Now let's start to build up from here, and add numbering for each line.

First we'll convert the puts to a printf so we have access to advanced number formatting.

Our format string will specify an integer zero-padded to at least three digits, followed by a space and then a string.

For the line number, we'll start out with a placeholder.

That's followed by the original last-read line, $_.

Now we have every line prefixed with our placeholder!

ruby -ne 'printf "%03d %s", 42, $_' *.md
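printf prints exactly the string that `format` (also known as `sprintf`) returns. That means we can check what this format string does by calling `format` directly. A small sketch with made-up sample strings:

```ruby
# printf(fmt, ...) prints what format(fmt, ...) returns,
# so format lets us inspect the result as a plain string.
padded = format("%03d %s", 42, "some line\n")
wide   = format("%03d %s", 1234, "another line\n")

print padded  # 042 some line
print wide    # 1234 another line  (3 is a minimum width, not a maximum)
```

Note that the `%s` slot passes the trailing newline through untouched. That's why the one-liner doesn't need a `\n` in its format string.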

As a next tiny step, let's use a dynamic value instead of a fixed placeholder.

Let's insert the number of characters in the current line (including its trailing newline).

ruby -ne 'printf "%03d %s", $_.size, $_' *.md
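The line in $_ still carries its trailing newline, so `$_.size` counts that character too. Here is a quick check on a hypothetical line, with `chomp` shown for comparison:

```ruby
line = "# Title\n"  # hypothetical line, as -n would place it in $_

with_newline    = line.size        # 8: seven visible characters plus "\n"
without_newline = line.chomp.size  # 7: chomp strips the trailing newline

print format("%03d %s", with_newline, line)  # 008 # Title
```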

Now we move on to an actual line count.

For this, we need to initialize a counter variable to zero. This is a little tricky! Remember, everything we evaluate at the command line is executed once for every line of input. So how do we initialize a counter to zero just once, at the beginning?

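One way is to use the ||= operator. On the first pass, the variable's value is nil because it hasn't been assigned yet, so ||= sets it to zero. After that, the variable holds an integer, and Ruby treats every integer (even zero) as truthy, so the ||= has no effect.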

Then, in the printf slot for the line number, we increment the counter in place with c += 1, which also evaluates to the new value.

ruby -ne 'c ||= 0; printf "%03d %s", c += 1, $_' *.md
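Why does `c` survive from one line to the next? A `while` loop, like the ghost loop, does not open a new scope for local variables. Here is a minimal sketch that simulates the loop over a hypothetical array of lines instead of real input:

```ruby
# Hypothetical input lines standing in for what -n would read.
lines = ["alpha\n", "beta\n", "gamma\n"]
output = []

i = 0
while (line = lines[i])
  # First pass: c is nil, so ||= assigns 0.
  # Later passes: c holds an Integer, which is truthy in Ruby (even 0),
  # so ||= leaves it alone.
  c ||= 0
  output << format("%03d %s", c += 1, line)
  i += 1
end

print output.join  # 001 alpha, 002 beta, 003 gamma, one per line
```

If the loop body were a block instead (for example `lines.each do ... end`), a `c` first assigned inside it would be local to each call of the block. The counter would then restart on every line.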

This gives us line numbers!

We used kind of a sneaky trick to avoid constantly re-setting the counter to zero.

But there's another, more idiomatic way to do this. We can add the keyword BEGIN in all-caps, followed by curly braces, followed by a semicolon to make it a separate statement.

Inside, we set the counter to zero with a plain assignment; no ||= required.

ruby -ne 'BEGIN{ c = 0 }; printf "%03d %s", c += 1, $_' *.md

This odd-looking construction, which you'll probably never see in Ruby outside of the command line, is inherited from the Perl programming language, which in turn got it from AWK.

Remember the "ghostly loop" that Ruby puts around our code when we use the -n flag?

while $_ = ARGF.gets
  BEGIN{ c = 0 }; printf "%03d %s", c += 1, $_
end

The BEGIN block effectively "pulls" its code out of that loop, so it runs just once, before the loop starts.

c = 0
while $_ = ARGF.gets
  printf "%03d %s", c += 1, $_
end

That way we can use the -n mode but still have code that's executed once.

When we run this, it gives us line numbers, just as before.
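To confirm mechanically that the two versions agree, here is a sketch that runs both one-liners in a child `ruby` process and compares their output. It locates the interpreter with `RbConfig.ruby`, feeds both versions the same made-up input on STDIN, and assumes the environment allows spawning subprocesses. The helper name `run_n` is hypothetical.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: runs `ruby -ne CODE`, feeding INPUT on standard input,
# and returns whatever the child printed to standard output.
def run_n(code, input)
  out, status = Open3.capture2(RbConfig.ruby, "-ne", code, stdin_data: input)
  raise "ruby exited with status #{status.exitstatus}" unless status.success?
  out
end

input = "alpha\nbeta\ngamma\n"  # hypothetical sample input

or_assign = run_n('c ||= 0; printf "%03d %s", c += 1, $_', input)
begin_blk = run_n('BEGIN{ c = 0 }; printf "%03d %s", c += 1, $_', input)

puts or_assign == begin_blk  # true if both versions print the same thing
print begin_blk
```

Because BEGIN is only permitted at the top level of a program, the sketch hands both versions to a separate `ruby -ne` process rather than evaluating them in place.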

So, something you might have noticed: the line numbers we're printing are for the concatenation of all of the files being output. They don't reset when we switch to a new input file.

If this were really what we wanted, it turns out there's a more concise way to get it.

Ruby has a special $. variable that provides precisely this running line count.

ruby -ne 'printf "%03d %s", $., $_' *.md

That's fine when the input is a single file. With multiple files, though, we want each line's number relative to its own source file, along with that file's name.

I promise we'll get there! First, let's talk about filenames. How can we print out the source file name for each line?

Well, let's refer back to that "ghost loop" again.

This loop reads from the special ARGF constant. ARGF is a file-like object representing the concatenation of all the files named on the command line, or of STDIN if no filenames are given.
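Here is a small sketch of that concatenation outside of a one-liner. It assumes two hypothetical files in a temporary directory. It sets ARGV by hand, since ARGV is the list ARGF consults to decide what to read.

```ruby
require "tmpdir"

combined = nil
Dir.mktmpdir do |dir|
  # Two hypothetical input files.
  first  = File.join(dir, "first.md")
  second = File.join(dir, "second.md")
  File.write(first, "line A\n")
  File.write(second, "line B\nline C\n")

  ARGV.replace([first, second])  # as if run with: ruby script.rb first.md second.md
  combined = ARGF.read           # both files, back to back, as one stream
end

print combined
```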

We can also reference ARGF directly in our command-line code.

Let's use ARGF to incorporate the current input filename into our output! We'll add a new 12-character-wide slot in our printf and reference ARGF.filename.

ruby -ne 'printf "%12s:%03d %s", ARGF.filename, $., $_' *.md
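The `%12s` slot fits these chapter files neatly because names like `chapter04.md` are exactly 12 characters long. A small sketch with a couple of hypothetical shorter and longer names shows what happens otherwise:

```ruby
# "%12s" right-aligns a string in a field at least 12 characters wide.
exact = format("%12s:%03d", "chapter04.md", 23)  # already 12 chars, no padding
short = format("%12s:%03d", "intro.md", 7)       # padded with 4 leading spaces
long  = format("%12s:%03d", "appendix-a.md", 1)  # 13 chars, not truncated

puts exact, short, long
```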

Perfect! Now each line of output is prefixed with its source filename.

Hmmm... does this mean we can do the same for line numbers? Let's try the lineno method on ARGF!

ruby -ne 'printf "%12s:%03d %s", ARGF.filename, ARGF.lineno, $_' *.md

This runs, but unfortunately, it gives us the exact same numbering as before. So what happened here?

As it turns out, ARGF.lineno provides a running line count for the entire ARGF input stream, the same count that $. reports.

However, ARGF also gives us the file object it's currently reading from, via ARGF.file.

And this object has its own lineno attribute!

When we execute this, we get the line number relative to the current file.

ruby -ne 'printf "%12s:%03d %s", ARGF.filename, ARGF.file.lineno, $_' *.md
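To compare all three counters side by side, here is a sketch that reads two hypothetical files through ARGF and records $., ARGF.lineno, and ARGF.file.lineno for every line. The file names and contents are made up, and ARGV is set by hand in place of the shell glob.

```ruby
require "tmpdir"

rows = []
Dir.mktmpdir do |dir|
  a = File.join(dir, "a.md")
  b = File.join(dir, "b.md")
  File.write(a, "one\ntwo\n")
  File.write(b, "three\n")
  ARGV.replace([a, b])

  while ($_ = ARGF.gets)
    # $. and ARGF.lineno keep counting across files;
    # ARGF.file.lineno restarts with each newly opened file.
    rows << [File.basename(ARGF.filename), $., ARGF.lineno, ARGF.file.lineno, $_.chomp]
  end
end

rows.each { |row| p row }
```

The third line, read from `b.md`, reports 3 for both $. and ARGF.lineno, while ARGF.file.lineno is back to 1.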

Now we have all the information we need! Let's add the if-statement from our original image reference search.

And print just the image reference rather than the whole line.

ruby -ne 'printf "%12s:%03d %s", ARGF.filename, ARGF.file.lineno, $1 if /!\[.*\]\((.*)\)/' *.md

We run this and... whoops. All the output runs together on one line. Unlike $_, the captured $1 doesn't include a trailing newline.
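A quick check on a made-up line confirms that the capture group stops short of the newline. The capture ends at the closing parenthesis, and `.` doesn't match a newline in any case. `String#[]` with a capture index returns the same text that $1 would hold after the match.

```ruby
line  = "![hypothetical alt text](chapter05.assets/pry.png)\n"  # as $_ would hold it
image = line[/!\[.*\]\((.*)\)/, 1]  # same text as $1 after the match

p image                  # "chapter05.assets/pry.png"
p image.end_with?("\n")  # false: the capture stops before the newline
```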

Let's add an explicit newline to the printf format and try again.

ruby -ne 'printf "%12s:%03d %s\n", ARGF.filename, ARGF.file.lineno, $1 if /!\[.*\]\((.*)\)/' *.md
chapter04.md:023 chapter04.assets/image-20200513095255753.png
chapter05.md:005 chapter05.assets/ruby-1595526744861.png
chapter05.md:009 chapter05.assets/pry.png
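To try the final one-liner without the original chapter files, here is a sketch that writes two hypothetical markdown files into a temporary directory. It then runs the exact command in a child `ruby` process and assumes subprocesses are allowed. The process starts inside that directory (`chdir:`), so ARGF.filename shows bare names, as it would after the shell expands `*.md`.

```ruby
require "open3"
require "rbconfig"
require "tmpdir"

ONE_LINER = 'printf "%12s:%03d %s\n", ARGF.filename, ARGF.file.lineno, $1 if /!\[.*\]\((.*)\)/'

report = nil
Dir.mktmpdir do |dir|
  # Hypothetical chapter files; only their structure matters here.
  File.write(File.join(dir, "chapter01.md"), "# One\n\n![logo](chapter01.assets/logo.png)\n")
  File.write(File.join(dir, "chapter02.md"),
             "![a](chapter02.assets/a.png)\ntext\n![b](chapter02.assets/b.png)\n")

  # Run from inside the directory so ARGF.filename is just the file name.
  report, status = Open3.capture2(RbConfig.ruby, "-ne", ONE_LINER,
                                  "chapter01.md", "chapter02.md", chdir: dir)
  raise "ruby exited with status #{status.exitstatus}" unless status.success?
end

print report
```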

Bingo! Filename, line number, and image reference. All from a one-liner! Cool, huh?

Today we've learned more about the "ghost loop" that Ruby puts around scripts when run in -n mode. We've seen how BEGIN{ ... } blocks can lift initialization code out of that loop. We've discovered the $. variable, which tracks the running line number across all input. And we've seen how we can query the ARGF object to find out even more about our current input context.

We may never need awk or sed or a complicated grep again!

Happy hacking!
