
Pattern Matching

Video transcript & code

As we have seen in episode 264, Ruby has some fairly sophisticated data destructuring capabilities, at least for a dynamic object-oriented language. In that episode, we saw how we could take apart and bind the different parts of a dependency specification in a single statement. The parenthesized expression on the left of the assignment mimics the shape of the data, and Ruby takes apart the data structure and assigns the matching parts of it accordingly.

dep = {"hello" => ["hello.c", "foo.o", "bar.o"]}

a = *dep                        # => [["hello", ["hello.c", "foo.o", "bar.o"]]]

((target, (first_preq, *rest_preqs))) = *dep

target                          # => "hello"
first_preq                      # => "hello.c"
rest_preqs                      # => ["foo.o", "bar.o"]
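Ruby accepts the same kind of nested pattern anywhere it performs multiple assignment, including a block's parameter list. The sketch below redefines the dependency hash so it runs on its own. The `rows` variable and the short block parameter names are just for illustration.

```ruby
dep = {"hello" => ["hello.c", "foo.o", "bar.o"]}

# Plain multiple assignment without the splat: Hash#first returns the first
# key/value pair as a two-element array, which the nested pattern takes apart.
target, (first_preq, *rest_preqs) = dep.first

# The same nested pattern in a block's parameter list: each key/value pair
# is destructured as it is yielded.
rows = dep.map do |tgt, (first, *rest)|
  [tgt, first, rest]
end

rows   # => [["hello", "hello.c", ["foo.o", "bar.o"]]]
```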

We have also seen, in various episodes, that Ruby has special tools for matching arbitrary objects against a pattern. The "case equality" operator, or "threequals", lets us apply all kinds of tests to an object. We can test its class, whether it matches a regular expression, whether it is within a given range, and so on.

obj = 23

Integer === obj                 # => true
/foo/   === obj                 # => false
(0...100) === obj               # => true
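A case statement is where Ruby most often calls threequals on our behalf: each `when` clause runs `candidate === obj`, with the candidate on the left. Here is a small sketch using the same three kinds of tests; the `classify` method is hypothetical.

```ruby
# Each `when` clause evaluates `candidate === obj` and takes the first hit.
def classify(obj)
  case obj
  when /foo/     then "matches /foo/"        # Regexp#===
  when (0...100) then "between 0 and 99"     # Range#===
  when Integer   then "some other integer"   # Module#===, a class check
  else                "something else"
  end
end

classify(23)       # => "between 0 and 99"
classify(1_000)    # => "some other integer"
classify("food")   # => "matches /foo/"
classify(:bar)     # => "something else"
```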

If you have used any functional programming languages with pattern-matching capabilities, such as Haskell or Elixir, you know that they combine the features of matching and destructuring assignment such that they can both be performed at once. Once you've used a language in the pattern-matching family, you might miss having it in Ruby. I know I do, so I thought it might be fun today to look at how we might add this capability to Ruby.

We will start by defining a placeholder object. A placeholder is a simple creature: it has a context, stored in a member called bindings, and a name. It also has a custom equivalence operator, and this operator behaves a bit peculiarly. Instead of comparing itself to the value it is given, it assigns that value to the placeholder's name inside the context, which it assumes is a hash-like object. Then it returns true, regardless of what the value was. In effect, this is a wildcard object like the one we saw in episode #215, except that it also "captures" values as a side effect.

Next we'll define a MatchContext. It descends from BasicObject, so that it has a minimal set of methods defined and nearly any message we send it falls through to method_missing. It has an internal hash, called bindings. This hash is specialized: when someone asks it for a key it doesn't have, it returns a new placeholder for that key instead. It does not store that placeholder; the key only gets a value once a placeholder's equivalence operator assigns one.

The class also has a method_missing which takes any message sent to the object and looks up the message's name as a key in the bindings hash.

Placeholder = Struct.new(:bindings, :name) do
  def ==(other)
    bindings[name] = other
    true
  end
end

class MatchContext < BasicObject
  def initialize
    @bindings = ::Hash.new { |hash, key| ::Placeholder.new(hash, key) }
  end

  def method_missing(name, *)
    @bindings[name]
  end
end
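The MatchContext relies on one detail of `Hash.new` with a block: the block supplies a value for a missing key, but nothing is stored unless the block (or some other code) assigns to the hash. This sketch shows that behavior with a plain string standing in for a placeholder.

```ruby
# The default block computes a value for a missing key but does not store it.
h = Hash.new { |hash, key| "placeholder for #{key}" }

h[:foo]          # => "placeholder for foo"
h.key?(:foo)     # => false, nothing was stored

# An explicit assignment, like the one Placeholder#== performs on its bindings.
h[:foo] = 23
h[:foo]          # => 23
h.key?(:foo)     # => true
```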

Let's play around with these classes a little bit. We'll create a new MatchContext. Then we'll send it some random messages. Each time, it returns a placeholder named for that message.

If we try to match one of these placeholders against an arbitrary value, it succeeds. We are able to do this with the case-equality operator even though we didn't explicitly define it, because the default threequals that objects inherit from Object delegates to the double-equals equivalence operator.

If we then send the same message as before, we no longer get a placeholder. Instead, we get the value that was "captured" by performing an equality test.

require "./pmatch"

m = MatchContext.new

m.foo                           # => #<struct Placeholder bindings={}, name=:foo>
m.bar                           # => #<struct Placeholder bindings={}, name=:bar>

m.foo === 23                    # => true
m.foo                           # => 23
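The fallback from `===` to `==` is not specific to placeholders; any object that defines only `==` gets a working `===` from Object. A minimal sketch with a hypothetical `AlwaysEqual` class:

```ruby
# Hypothetical class that defines ==, but never ===.
class AlwaysEqual
  def ==(_other)
    true
  end
end

AlwaysEqual.new === 42   # => true, the inherited === falls back to ==

# case/when relies on the same fallback.
label = case 42
        when AlwaysEqual.new then "matched via =="
        else "no match"
        end

label   # => "matched via =="
```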

Let's put these classes to work to do some very basic pattern matching. We'll define an old favorite, a Point struct. We'll instantiate a Point object. Then we'll use our MatchContext and do a pattern match against a Point with placeholder values.

The result of the case equality test is true. And when we examine the placeholders, we can see that they are now bound to the x and y values of the Point we matched on. In other words, we have successfully checked that an object is a Point and bound its x and y values to variables, all in one go. Well, OK, not actually to variables, per se, but to something close enough.

require "./pmatch"

Point = Struct.new(:x, :y)

p = Point.new(5, 10)
m = MatchContext.new

Point.new(m.x, m.y) === p       # => true

m.x                             # => 5
m.y                             # => 10

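How did this happen? Well, Struct-derived objects don't define their own threequals; they inherit the default one, which delegates to equivalence. And the equivalence test for a Struct checks whether the two objects are of the same class and whether their attributes are pairwise equivalent. So comparing one Point to another implicitly delegates to the equivalence operators for the x and y attributes. Because our pattern Point is on the left-hand side of the threequals, it is the pattern's attributes, our placeholders, that receive those equivalence messages, with the real Point's x and y as arguments.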
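To watch a Struct delegate to its members' equivalence operators, here is a sketch using a hypothetical `Probe` class whose `==` accepts anything and records each value it is compared with.

```ruby
# Hypothetical Probe: == always succeeds and remembers its argument.
class Probe
  attr_reader :seen

  def initialize
    @seen = []
  end

  def ==(other)
    @seen << other
    true
  end
end

Point = Struct.new(:x, :y)

px = Probe.new
py = Probe.new

# The pattern is the receiver, so Struct#== ends up calling px == 5 and py == 10.
result = Point.new(px, py) === Point.new(5, 10)

result    # => true
px.seen   # => [5]
py.seen   # => [10]
```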

So far, our placeholders match anything at all. But we'd like the option to be a little more discerning in our matches. For instance, we'd like to be able to assert that the X and Y values of a Point must be integers (and not nil) for the match to succeed.

To make this possible, we need to flesh out the Placeholder class a little bit. We add a guards method, which lazily initializes the list of guards to an empty array. And we define an operator method for adding new guards, somewhat arbitrarily choosing the right-shift operator for this purpose. In this method we add the right operand to the guards list, and return self to make further chaining possible.

We then add a guard clause to the definition of the equivalence operator. It will perform case-equality matches of all of the guards against the supplied value, and return false if any of those matches fail.

Placeholder = Struct.new(:bindings, :name) do
  def ==(other)
    return false unless guards.all?{ |g| g === other }
    bindings[name] = other
    true
  end

  def guards
    @guards ||= []
  end

  def >>(guard)
    guards << guard
    self
  end
end

class MatchContext < BasicObject
  def initialize
    @bindings = ::Hash.new { |hash, key| ::Placeholder.new(hash, key) }
  end

  def method_missing(name, *)
    @bindings[name]
  end
end

Now when we match using placeholders, we can annotate the placeholders with extra patterns that the corresponding value must match in order to succeed. In this case, we specify that both attributes of a point must be integers in order to match. This succeeds with the point we've been using. But when we change one of the coordinates to nil, the match no longer returns true.

require "./pmatch2"

Point = Struct.new(:x, :y)

p = Point.new(5, 10)
m = MatchContext.new

Point.new(m.x >> Integer, m.y >> Integer) === p       # => true

m.x                             # => 5
m.y                             # => 10

p = Point.new(5, nil)
m = MatchContext.new

Point.new(m.x >> Integer, m.y >> Integer) === p # => false
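One consequence of binding as a side effect is worth keeping in mind: a match that fails partway through can still leave earlier placeholders bound. The sketch below repeats the pmatch2 definitions so it runs on its own, then matches against a Point whose y is nil. Since a Struct compares its members in order, x is captured before the y guard rejects nil. Using a fresh MatchContext for each match attempt keeps such leftovers from leaking into later code.

```ruby
# Same Placeholder and MatchContext as in pmatch2, repeated for self-containment.
Placeholder = Struct.new(:bindings, :name) do
  def ==(other)
    return false unless guards.all? { |g| g === other }
    bindings[name] = other
    true
  end

  def guards
    @guards ||= []
  end

  def >>(guard)
    guards << guard
    self
  end
end

class MatchContext < BasicObject
  def initialize
    @bindings = ::Hash.new { |hash, key| ::Placeholder.new(hash, key) }
  end

  def method_missing(name, *)
    @bindings[name]
  end
end

Point = Struct.new(:x, :y)

m = MatchContext.new
matched = Point.new(m.x >> Integer, m.y >> Integer) === Point.new(5, nil)

matched   # => false
m.x       # => 5, bound before the y guard failed
m.y       # still an unbound Placeholder
```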

Now let's use our placeholders on something slightly more practical. Let's say we have a method, get_account_balance. This method may fail, which we'll simulate in this example by giving it an explicit fail argument. If it succeeds, it returns an account balance as a string. If it fails, it returns an array of two elements: first, a symbol indicating that this is an error. And second, a string explaining the problem. Using different return types to indicate success or failure is a common style in pattern-matching programming languages.

We then open a case statement, with the return value of a call to get_account_balance as the object to switch on. For the first case, we specify a placeholder that is constrained to be a String. In that branch, we print out the account balance. For the next case, we specify a pattern which will match an error return. For the second element in the array, we use another placeholder to capture the error explanation.

If we run this code without the fail argument, the case statement matches the success return value, binding the account balance in the process. If we then change the method call to force a failure, as the listing below does, and run the code again, it matches the error case instead, this time binding the error info to a pseudo-variable that can be used in the error handling code.

As with the struct, this works because Ruby arrays inherit the default case-equality, which delegates to equivalence. Arrays determine equivalence by going over their members one by one, asking each member of the pattern array whether it is equivalent to its counterpart in the other array.
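Because both arrays and structs delegate to their elements' equivalence operators, the two kinds of pattern nest. This sketch matches a tagged array containing a Point, constraining x to be an Integer and capturing both coordinates. The `:moved` tag is purely illustrative, and the pmatch2 definitions are repeated so the snippet runs on its own.

```ruby
# Same Placeholder and MatchContext as in pmatch2.
Placeholder = Struct.new(:bindings, :name) do
  def ==(other)
    return false unless guards.all? { |g| g === other }
    bindings[name] = other
    true
  end

  def guards
    @guards ||= []
  end

  def >>(guard)
    guards << guard
    self
  end
end

class MatchContext < BasicObject
  def initialize
    @bindings = ::Hash.new { |hash, key| ::Placeholder.new(hash, key) }
  end

  def method_missing(name, *)
    @bindings[name]
  end
end

Point = Struct.new(:x, :y)

m = MatchContext.new
# Array#== compares :moved with :moved, then hands the Point pattern to Struct#==.
pattern = [:moved, Point.new(m.x >> Integer, m.y)]
matched = pattern === [:moved, Point.new(3, 4)]

matched   # => true
m.x       # => 3
m.y       # => 4
```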

require "./pmatch2"

def get_account_balance(fail: false)
  if fail
    [:error, "I literally can't even"]
  else
    "$1234.56"
  end
end

m = MatchContext.new

case get_account_balance(fail: true)
when m.balance >> String
  puts "Balance: #{m.balance}"
when [:error, m.info]
  puts "Error: #{m.info}"
end

# >> Error: I literally can't even
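To exercise both branches in a single run, here is a sketch that wraps the case statement in a hypothetical `describe_balance` method. It creates a fresh MatchContext on every call, so bindings captured by one call cannot leak into the next, and it returns strings instead of printing them so the results are easy to check.

```ruby
# Same Placeholder and MatchContext as in pmatch2.
Placeholder = Struct.new(:bindings, :name) do
  def ==(other)
    return false unless guards.all? { |g| g === other }
    bindings[name] = other
    true
  end

  def guards
    @guards ||= []
  end

  def >>(guard)
    guards << guard
    self
  end
end

class MatchContext < BasicObject
  def initialize
    @bindings = ::Hash.new { |hash, key| ::Placeholder.new(hash, key) }
  end

  def method_missing(name, *)
    @bindings[name]
  end
end

def get_account_balance(fail: false)
  fail ? [:error, "I literally can't even"] : "$1234.56"
end

# Hypothetical helper: one fresh MatchContext per call.
def describe_balance(result)
  m = MatchContext.new
  case result
  when m.balance >> String then "Balance: #{m.balance}"
  when [:error, m.info]    then "Error: #{m.info}"
  else "Unrecognized result"
  end
end

describe_balance(get_account_balance)              # => "Balance: $1234.56"
describe_balance(get_account_balance(fail: true))  # => "Error: I literally can't even"
```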

This example really shows off the power of pattern matching: whereas in normal Ruby code we would have had to separately extract the values we were interested in after a case match, here we are able to do a case match and bind those values for later use all at the same time.

And that's plenty for today. Happy hacking!
