
Autosplit

In our series on one-liners, we’ve seen Ruby’s tools for filtering and summarizing text right from the command-line. But what about columnar input, like process listings or CSV? Today we’ll see how “autosplit mode” makes short work of data with fields!

Video transcript & code

Hm, I wonder what’s using up all my memory.

ps aux

Oh, look at all those processes.

One of these columns is about memory. How can I see only the processes using the most memory?

I think ps has some flags for sorting and filtering…

man ps

but uuuuugh who has time.

You know what I already know how to use to sort and filter things? Ruby.

I know that I can pipe the output of ps into a Ruby one-liner, with code in a -e flag.

I know that I can get this “ghost loop”, which reads every line of input into the $_ variable,

while $_ = ARGF.gets
  # ...code from -e flags...
end

by using the -n flag.

In the -e code I’ll split each line into fields. Ruby’s String#split defaults to splitting on whitespace, so that should work fine.
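To make that concrete, here is a small sketch of what a bare split does to a single line. The sample line is invented for illustration and is only shaped like ps aux output; real output will vary by system.

```ruby
# A made-up line shaped like `ps aux` output (hypothetical values).
line = "avdi  4242  1.5  3.2  1203456  98304  ?  Sl  09:15  0:42 /usr/bin/slack\n"

# With no arguments, String#split breaks on runs of whitespace and
# ignores leading and trailing whitespace, including the newline.
fields = line.split

p fields          # every column as a String
p fields[5].to_i  # the sixth column (index 5), converted to an Integer
```

One thing to watch: a command with spaces in its arguments spills into extra fields at the end of the array. That is harmless here, because we only look at an early column.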

Let’s see, uh, the memory column is called RSS for “Resident Set Size.” That’s column number 0… 1… 2… 3… 4… 5.

We’ll print the current line but only if the RSS is greater than 64k. (ps reports RSS in kilobytes, so that threshold works out to roughly 64 megabytes.)

ps aux | ruby -ne 'fields = $_.split; print if fields[5].to_i > 2**16'

Perfect.

But wait, there’s more.

If we add the -a flag, Ruby will do some of this for us.

a stands for “autosplit”. Just as -n causes each input line to be assigned to the $_ variable, autosplit mode causes each line to be split into the special $F variable.

while $_ = ARGF.gets
  $F = $_.split
  # ...code from -e flags...
end

F stands for “fields”.

We can get a concrete picture of this if we use p to do an inspected print of the fields for each line of input.

ps aux | ruby -nae 'p $F'

See how each line has been converted to a whitespace-delimited array?

So using autosplit, we can make an even more concise command to filter for memory hogs.

ps aux | ruby -nae 'print if $F[5].to_i > 2**16'

Ah-ha! It turns out that Slack is using all my memory. Well, no surprise there.
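If you'd like to poke at that filter without a live process list, here is a long-hand sketch of roughly what the -n and -a flags set up for us. The SAMPLE text is made up, and I'm using a StringIO and a local variable named fields as stand-ins for ARGF and $F.

```ruby
require "stringio"

# Hypothetical sample standing in for `ps aux` output; the values are made up.
SAMPLE = <<~PS
  USER   PID %CPU %MEM     VSZ    RSS TTY STAT START TIME COMMAND
  avdi  4242  1.5  3.2 1203456  98304 ?   Sl   09:15 0:42 slack
  avdi  4300  0.0  0.1   10240   2048 ?   S    09:16 0:00 bash
PS

input = StringIO.new(SAMPLE)
kept = []

# Long-hand version of: ruby -nae 'print if $F[5].to_i > 2**16'
while (line = input.gets)
  fields = line.split                     # what -a would store in $F
  kept << line if fields[5].to_i > 2**16  # 2**16 == 65_536
end

print kept.join
```

Notice that the header row drops out on its own: "RSS".to_i is 0, so it never passes the comparison.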

Now, I hear what you’re wondering. You’re about to ask:

“Hey Avdi! Can we use this feature to work with CSV data from the command line?”

Why yes, convenient question-asker! We can!

Let’s say we want to get a total of new COVID-19 cases from this daily snapshot.

cat snapshot.csv

Let’s see what happens when we run Ruby with autosplit over this data.

ruby -nae 'p $F' snapshot.csv

Well there’s an obvious problem here. Autosplit divides columns on whitespace. But these rows are comma-delimited.

Fortunately, we can customize how autosplit breaks up lines.

We use the -F flag with an argument to tell Ruby what delimiter to look for.

Let’s specify a comma.

ruby -F',' -nae 'p $F' snapshot.csv

This looks a lot better. But if we look closely we can still see a small issue at the ends of lines: the last field contains the carriage-return linefeed from the end of each line.

Not to worry. The argument to -F is actually interpreted as a regular expression!

So we can say the delimiter is a comma or a carriage-return linefeed sequence.

We supply single-quotes here so that the shell doesn’t try to interpret these characters, but note that the argument still needs to be right up against the -F flag with no spaces between them.

ruby -F',|\r\n' -nae 'p $F' snapshot.csv

That looks right.
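To see why the regular expression helps, the three splits below approximate what autosplit does with no -F, with a plain comma, and with the combined pattern. The row is invented for illustration, and I'm assuming Windows-style line endings like the ones in this file.

```ruby
# A made-up CSV row with a carriage-return linefeed ending (hypothetical data).
line = "2020-04-01,Pennsylvania,PA,5805,962\r\n"

by_space = line.split            # default: whitespace only
by_comma = line.split(",")       # comma: last field keeps the "\r\n"
by_regex = line.split(/,|\r\n/)  # comma OR carriage-return linefeed

p by_space
p by_comma
p by_regex
```

The last split comes out clean because String#split drops trailing empty strings, so matching the line ending as a delimiter doesn't leave an empty field behind.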

Now that we’ve got our field parsing sorted out, we can write some actual logic in our one-liner.

We can prep a variable to track the total.

Add the appropriate field from each line to it.

And finish by outputting the total.

ruby -F',|\r\n' -nae 'BEGIN{total=0}; total+=$F[4].to_i; END{puts total}' snapshot.csv

And there’s our total, which we calculated without even enlisting the help of the csv standard library.
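For comparison, here is roughly the same logic written out long-hand. The rows are hypothetical stand-ins for snapshot.csv, with the new-cases count assumed to be in the fifth column, and a StringIO stands in for the file.

```ruby
require "stringio"

# Hypothetical rows standing in for snapshot.csv; the numbers are made up.
rows = [
  "date,state,abbrev,cases,new_cases",
  "2020-04-01,Pennsylvania,PA,5805,962",
  "2020-04-01,New York,NY,83712,1200",
  "2020-04-01,Vermont,VT,321,38"
]
input = StringIO.new(rows.join("\r\n") + "\r\n")

total = 0                        # BEGIN{total=0}
while (line = input.gets)        # the loop that -n provides
  fields = line.split(/,|\r\n/)  # what -a and -F',|\r\n' provide
  total += fields[4].to_i        # the header's "new_cases".to_i is 0
end
puts total                       # END{puts total}
```

This simple split is fine for tidy data, but it would break up a quoted field that contains a comma. That is the point where the csv library starts to earn its keep.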

If you’re an old UNIX hand, you might be reminded of the AWK tool right now. And that’s not a coincidence—this functionality is inspired by AWK, by way of Ruby’s Perl lineage.

So today we’ve seen how the -a flag turns on “autosplit” mode for Ruby one-liners, giving us access to delimited fields in an array called $F, and how the -F flag lets us customize the rule for splitting lines into fields. It’s just another convenience Ruby offers for quick, one-off data manipulations from the command line. Happy hacking!
