Unit 1, Lesson 21

Match

Video transcript & code

Here's a regular expression that matches local phone numbers in a common US format. We can use it to match against a string that contains a phone number. Then we can output the found phone number.

r = /\d{3}-\d{4}/
match = r.match("my number is 555-1234")
puts "Found #{match}"
# >> Found 555-1234

Of course, it's always possible we might try to match against a string that doesn't contain a recognizable phone number. When a match is made, the #match method returns a MatchData object. If the regular expression isn't found, it returns nil.

r = /\d{3}-\d{4}/
r.match("my number is 555-1234") # => #<MatchData "555-1234">
r.match("no number")             # => nil
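To make the two return values a little more concrete, here is a small sketch that pokes at each of them. The variable names hit and miss are just illustrative; the methods used (#class, #[], #pre_match, and #nil?) are standard Ruby.

```ruby
r = /\d{3}-\d{4}/

hit  = r.match("my number is 555-1234")
miss = r.match("no number")

hit.class      # => MatchData
hit[0]         # => "555-1234"       (the full matched text)
hit.pre_match  # => "my number is "  (everything before the match)
miss.nil?      # => true
```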

Since nil is "falsy" we can exploit this fact and insert a conditional. This if statement ensures we'll only try to use the match data if a match was found.

r = /\d{3}-\d{4}/
match = r.match("my number is 555-1234")
if match
  puts "Found #{match}"
end
# >> Found 555-1234

This code illustrates an extremely common idiom in Ruby code: first determine if a match for a pattern was found. Then, if it was, do something with the resulting data. In effect the return value of #match plays double duty: first as a flag indicating success or failure, and then as a holder of data.

This pattern is common enough that I find the code as we've typed it here to be unnecessarily awkward. I much prefer to combine the test for a match and the subsequent use of the match into a single statement. We can accomplish this by inlining the invocation of #match into the if statement's test.

r = /\d{3}-\d{4}/
if match = r.match("my number is 555-1234")
  puts "Found #{match}"
end
# >> Found 555-1234

The only downside to this is that many programmers, and some automated code linting tools, will flag it as a possible bug. That's because it's a common typo to use a single equals sign in an if statement when we intended to use a double-equals to test for equality.

To mitigate this objection to the code I sometimes enclose the whole match assignment expression in parentheses. This doesn't alter the behavior of the code at all. It's just a way of visually differentiating code where an assignment is intentionally performed inside an if-test clause. It's a way of saying "yes, I meant to do this".

r = /\d{3}-\d{4}/
if (match = r.match("my number is 555-1234"))
  puts "Found #{match}"
end
# >> Found 555-1234

However, just the other day I discovered an alternative to this idiom that I'm pretty excited about. It turns out that #match accepts an optional block. The block will be executed only if the match is successful. When it is, it will receive the match data as a block argument.

We can use this block to remove the need for an if statement entirely.

r = /\d{3}-\d{4}/
r.match("my number is 555-1234") do |match|
  puts "Found #{match}"
end
# >> Found 555-1234
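As a quick sanity check that the block really is skipped when nothing matches, here is a minimal sketch that runs the same block against one matching and one non-matching string. The PHONE constant and the found array are names made up for this illustration.

```ruby
PHONE = /\d{3}-\d{4}/  # hypothetical constant name for the pattern above
found = []

["my number is 555-1234", "no number"].each do |text|
  # The block only runs for the string that contains a phone number.
  PHONE.match(text) { |match| found << match[0] }
end

found # => ["555-1234"]
```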

This isn't a complete replacement for an if statement though. Consider the case where we also want to take a specific action when the match fails. If we get clever, we might think to make use of the or control flow operator. We tack on an or and a clause raising an exception. We expect this to work, based on the fact that the call to #match will return nil on failure. And indeed, at first this appears to work.

r = /\d{3}-\d{4}/
r.match("my number is 1234") do |match|
  puts "Found #{match}"
end or fail "No number found"
# ~> -:4:in `<main>': No number found (RuntimeError)

But when we test the code with a string that contains a valid phone number we see something weird: Both the success and failure actions are triggered.

r = /\d{3}-\d{4}/
r.match("my number is 555-1234") do |match|
  puts "Found #{match}"
end or fail "No number found"
# ~> -:4:in `<main>': No number found (RuntimeError)
# >> Found 555-1234

Why is this? The answer is that when we pass a block to #match and the regular expression match is successful, #match doesn't return the MatchData object. Instead, it returns whatever the return value of the block was.

r = /\d{3}-\d{4}/
r.match("my number is 555-1234") { 42 } # => 42

In the case of our phone-number match block, we called puts. puts always returns nil. The nil became the return value of the block, and was passed through to become the return value of #match. This then triggered the right-hand side of the or operator.
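Here is a small sketch of the three cases side by side, assuming the same pattern as before. The variable names are only for illustration.

```ruby
r = /\d{3}-\d{4}/

# Block ends with puts, so the block (and therefore #match) returns nil.
with_puts = r.match("my number is 555-1234") { |m| puts "Found #{m}" }

# Block ends with a string, so #match returns that string.
with_value = r.match("my number is 555-1234") { |m| m.to_s }

# No match: the block never runs and #match returns nil.
no_match = r.match("no number") { |m| m.to_s }

with_puts  # => nil
with_value # => "555-1234"
no_match   # => nil
```

Notice that the first and last results are indistinguishable, which is exactly why the or clause can't tell success from failure here.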

So in the case where we need to take action either for the success or failure branches, we should stick to a traditional if statement.

r = /\d{3}-\d{4}/
if match = r.match("my number is 1234")
  puts "Found #{match}"
else
  fail "No number found"
end
# ~> -:5:in `<main>': No number found (RuntimeError)

This raises the question: why does #match pass through the block return value, anyway?

Here's why. Very often, when we match against a regular expression, the very next thing we do is extract some specific piece of information from the matched text. For instance, consider a situation in which all we are really interested in is the exchange portion of matched telephone numbers. The exchange is represented by the first three digits of the phone number.

To isolate the exchange, we add a capture group to our regular expression. Then we use [1] on the match data to pull out that group.

r = /(\d{3})-\d{4}/
match = r.match("my number is 555-1234")
exchange = match[1]
exchange                        # => "555"

Of course, if there is no match, we'll get a NoMethodError when we try to index into a nil value.

r = /(\d{3})-\d{4}/
match = r.match("my number is 1234")
exchange = match[1]
exchange                        # =>
# ~> -:3:in `<main>': undefined method `[]' for nil:NilClass (NoMethodError)

To avoid this, we add an if statement to only do the subscripting if a match is found. Then we assign the result of the whole if statement to the exchange variable. If there is no match, there will be no exception and the exchange variable will be set to nil.

r = /(\d{3})-\d{4}/
exchange = if match = r.match("my number is 1234")
             match[1]
           end
exchange                        # => nil

We can accomplish the same thing more concisely using a block passed to #match. If there is a match, the block will be executed and the resulting value assigned to the exchange variable. If there is no match, the block will be ignored and nil will be assigned.

r = /(\d{3})-\d{4}/
exchange = r.match("my number is 555-1234") { |match| match[1] }
exchange                        # => "555"
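One way this might be packaged up for reuse is as a tiny helper method. This is only a sketch: the exchange_of method and the EXCHANGE_PATTERN constant are hypothetical names invented for this example.

```ruby
EXCHANGE_PATTERN = /(\d{3})-\d{4}/  # hypothetical constant name

# Hypothetical helper: returns the exchange digits, or nil if the
# text contains no recognizable phone number.
def exchange_of(text)
  EXCHANGE_PATTERN.match(text) { |match| match[1] }
end

exchange_of("my number is 555-1234") # => "555"
exchange_of("my number is 1234")     # => nil
```

Because the block's value is passed straight through, the method body stays a single expression with no explicit nil check.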

And there you have it: the regular expression #match method is just one more example of how blocks are used by Ruby core classes to make common operations easy. Happy hacking!
