
Special Variables Part 3: Regex Matching

As you learned in the last episode in this series, Ruby’s “Perl-style” special variables are purpose-built for use in one-off command-line invocations of Ruby. Some of the most common uses for Ruby at the command line involve regular expression matching, and in this episode you’ll see how Ruby’s regex-specific variables can come in handy. After that, you’ll learn some guidelines for when, and when not, to use these concise-but-cryptic variables.

Video transcript & code

In the last two episodes in this series, we learned about two different broad categories of special variables in Ruby: those for global process and error state, and those for controlling I/O. There's one other category of special variables in Ruby, and they have to do with regular expression matching.

Here's a regex pattern for matching a telephone number, and here's some text to match it against.

After we perform a regular expression comparison…

…we can find the value of each match group in the match data assigned to the numbered special variables $1, $2, $3, and so on.

We can also see the entire match in the $& variable.

To get at the matchdata object, we use $~.

And to see the text that came before and after the match, we use $` (dollar-backquote) and $' (dollar-single-quote), respectively.

pattern = /\((\d{3})\) (\d{3})-(\d{4})/
text    = "Call me! (555) 867-5309 (Jenny)"

pattern =~ text

$1                              # => "555"
$2                              # => "867"
$3                              # => "5309"
$&                              # => "(555) 867-5309"
$~
# => #<MatchData "(555) 867-5309" 1:"555" 2:"867" 3:"5309">
$`                              # => "Call me! "
$'                              # => " (Jenny)"
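
If the punctuation is hard to remember, the same information is also available from the MatchData object itself. Here's a small sketch, reusing the pattern and text from above, that reads each value through Regexp.last_match instead. The local variable names are only for illustration. It also shows that a failed match resets the last-match variables to nil, so stale values don't linger.

```ruby
pattern = /\((\d{3})\) (\d{3})-(\d{4})/
text    = "Call me! (555) 867-5309 (Jenny)"

pattern =~ text
match = Regexp.last_match       # the MatchData that $~ refers to

area_code = match[1]            # same value as $1
whole     = match[0]            # same value as $&
before    = match.pre_match     # same value as $`
after     = match.post_match    # same value as $'

# A failed match resets the last-match variables to nil.
pattern =~ "no phone number here"
stale = [$~, $1, $&]            # => [nil, nil, nil]
```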

Again, while these variables are sometimes put to use in regular Ruby programs, they are particularly handy when writing short inline scripts on the command line. For instance, here's a command that redacts email addresses for privacy. This command makes use of the Kernel#sub method. This is another of the methods that implicitly look at and update the $_ last-read-line variable.

When we pipe our list into the command, we can see that the email addresses are hidden.

$ ruby -p -a -e \
"sub(/\b[^,@]+@[^,@]+\b/, 'REDACTED')" < list.csv
Stephan D'Amore,REDACTED
Katherine Cremin Jr.,REDACTED
Theron Ortiz,REDACTED
Neal Yundt,REDACTED
Laura Herzog,REDACTED
Mr. Adela Rice,REDACTED
Simeon Pollich,REDACTED
Daphnee Kling,REDACTED
Ms. Mike Klein,REDACTED
Devyn Bernhard V,REDACTED

What if, while redacting the input file, we also wanted to save a list of all the redacted email addresses to a separate file? We'll once again use an all-caps BEGIN block, this time to open the email list file. Then, after performing the filtering, we'll append the last matched text, via the $& variable, to the list of emails.

$ ruby -p -a -e "BEGIN { emails = open('emails.txt', 'w') };
                 sub(/\b[^,@]+@[^,@]+\b/, 'REDACTED');
                 emails.puts($&)" < list.csv
Stephan D'Amore,REDACTED
Katherine Cremin Jr.,REDACTED
Theron Ortiz,REDACTED
Neal Yundt,REDACTED
Laura Herzog,REDACTED
Mr. Adela Rice,REDACTED
Simeon Pollich,REDACTED
Daphnee Kling,REDACTED
Ms. Mike Klein,REDACTED
Devyn Bernhard V,REDACTED

After running this, we can dump the email list and see that it has indeed been populated.

$ cat emails.txt
christelle.heaney@example.com
maryjane.buckridge@example.org
paul@example.org
bella.bahringer@example.org
christian.dach@example.net
carolyn@example.com
rosamond_fadel@example.com
brady@example.org
modesto_bradtke@example.org
garett.kozey@example.com

Notice that in this command we didn't have to perform an explicit regex match to get access to the last-match data. The fact that sub performs regex matching internally was enough to populate the last-match variables.
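
The same thing holds in an ordinary script, where String#sub plays the role that Kernel#sub plays under the -p flag. Here's a rough long-form sketch of the one-liner. To keep it self-contained, it assumes StringIO objects standing in for list.csv, standard output, and emails.txt, and the two input rows are made-up sample data rather than lines from the real file.

```ruby
require "stringio"

email_pattern = /\b[^,@]+@[^,@]+\b/

# Hypothetical stand-ins for list.csv, standard output, and emails.txt.
input    = StringIO.new("Ada Example,ada@example.com\nBob Sample,bob@example.org\n")
redacted = StringIO.new
emails   = StringIO.new

input.each_line do |line|
  # String#sub matches internally, which populates $~, $&, and friends.
  redacted.print line.sub(email_pattern, "REDACTED")
  # $& is nil when the line contained no email address, so skip those lines.
  emails.puts($&) if $&
end

redacted.string  # => "Ada Example,REDACTED\nBob Sample,REDACTED\n"
emails.string    # => "ada@example.com\nbob@example.org\n"
```

The nil check is one difference from the one-liner, which would write a blank line to the email list for any input line without an address.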

As I've mentioned a couple of times, I'm only showing a sampling of special variables here. The best documentation I know of for the full list of special variables is the book The Ruby Programming Language by Matz and David Flanagan. If you just want to see a list of all the special variables, you can filter Ruby's global variable list for them.

global_variables.select { |v| v.to_s.length == 2 }
# => [:$;,
#     :$@,
#     :$!,
#     :$~,
#     :$&,
#     :$`,
#     :$',
#     :$+,
#     :$=,
#     :$,,
#     :$/,
#     :$\,
#     :$_,
#     :$>,
#     :$<,
#     :$.,
#     :$*,
#     :$:,
#     :$",
#     :$?,
#     :$$,
#     :$0,
#     :$1,
#     :$2,
#     :$3,
#     :$4,
#     :$5,
#     :$6,
#     :$7,
#     :$8,
#     :$9]

OK, now that we've taken a tour of Ruby's Perl-style special variables, let's talk about why we should care, and when we should use them.

First off, it's important to be aware of these variables simply because at some point you will run into them in code you read. And it will be helpful to know what they mean or at least know that you're looking at one of Ruby's special variables.

When should you use them? Well, the easy answer is "never, except in one-off command-line scripts". They have a tendency to make code unreadable.

But, really, it's a question of context. For instance, some of the special variables, like $? and $$, are familiar to anyone who has done a lot of shell scripting. If you're writing code in the context of an organization where there is lots of institutional understanding of shell scripts, it may be fine to use these.

`ruby hello.rb`
$?                              # => #<Process::Status: pid 8416 exit 0>
$$                              # => 11712
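
For anyone less steeped in shell conventions, here's a short sketch of what those two variables hold. It assumes we launch a child Ruby process that exits with status 3, a value picked arbitrarily for illustration, and then inspects the Process::Status object left behind in $?.

```ruby
require "rbconfig"

# Run a child Ruby process that exits with status 3 (arbitrary, for illustration).
system(RbConfig.ruby, "-e", "exit 3")

status    = $?                    # Process::Status of the child that just finished
exit_code = status.exitstatus     # => 3
succeeded = status.success?       # => false

# $$ is the process ID of the current Ruby process.
same_pid  = ($$ == Process.pid)   # => true
```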

And as we saw in Episode #490, it is sometimes essential to access the $! error info variable.

begin
  fail
rescue
  $!
  # => RuntimeError
end
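
For comparison, here's a small sketch showing that, inside a rescue clause, $! refers to the very same exception object that a named rescue variable would. The message "boom" and the variable names are just placeholders.

```ruby
begin
  fail "boom"
rescue => error
  # Inside the rescue clause, $! and the named variable are the same object.
  same_object = error.equal?($!)  # => true
  message     = $!.message        # => "boom"
end
```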

As we'll see in the last episode of this series, there is actually a way to access all of these variables without sacrificing readability. But depending on the kind of code you're working on and the experience of your team, you may find that there are a few of these special variables which are so widely recognized that they don't need to be flagged in code reviews. It's probably a good idea to codify which special variables are acceptable in your team's coding standard.

In the next and final episode of this series, we'll meet a library which can clear up the confusion created by over-using these short, mnemonic variables. Until then, happy hacking!
