Pattern Matching

One of the things I enjoy most in functional programming languages is the rich support for pattern matching. While Ruby has long had some rudimentary pattern matching and deconstruction capabilities, it hasn’t come close to the kind of deep structural matching and data extraction available in languages like Haskell or Elixir.

As of Ruby 2.7, however, all that is starting to change. In today’s episode, guest chef Jordan Raine will show you a practical application of Ruby’s new pattern-matching case statements. Enjoy!

Video transcript & code

Pattern matching in Ruby

Today, we're going to look at a new, experimental feature in Ruby 2.7 called Pattern Matching. To do that, we'll be writing some code to sort a catalog of books by author.

It sounds easy enough but, unfortunately, because the catalog has been added to slowly over many years, the format of each book is different.

At first, it was text-only.

Then it was CSV.

Then it was a hash.

But, of course, no one anticipated books with two authors...

...or the need to store data about the author...

...or authors with multiple names...

...or textbooks with an editor and many contributors. So, the shape of the data kept changing over time.


books = [
  "To Kill a Mockingbird by Harper Lee",
  ["Of Mice and Men", "John Steinbeck"],
  {title: "Slaughterhouse-Five", author: "Kurt Vonnegut Jr."},
  {title: "Good Omens", author: ["Terry Pratchett", "Neil Gaiman"]},
  {title: "Lord of the Flies", author: {name: "William Golding", birthplace: "England"}},
  {title: "The Great Gatsby", author: {first: "J.", middle: "Scott", last: "Fitzgerald"}},
  {title: "Biology Textbook", author: {editor: "Dannielle Horton", contributors: ["Lance Smyth", "Brent Mansell", "Bethan Bains", "Michael Keeling"]}},
]

First, let's try to extract a list of authors without pattern matching. We'll take each case one at a time, starting with the text-only book entry.

Let's split the book string on " by " and grab the author name from the end.


def extract_author(book)
  book.split(" by ").last
end

extract_author(books[0])                          # => "Harper Lee"

No problem but...

...when we pass in the second book, an error is raised.


extract_author(books[1])
# ~> -:12:in `extract_author': undefined method `split' for ["Of Mice and Men", "John Steinbeck"]:Array (NoMethodError)
# ~>    from -:26:in `<main>'

We can't split an array so let's account for this by adding a case statement that switches on string and array. Then, we grab the last element of the array to return the author.


def extract_author(book)
  case book
  when String then book.split(" by ").last
  when Array then book.last
  end
end

extract_author(books[0])                          # => "Harper Lee"
extract_author(books[1])                          # => "John Steinbeck"

The next book entry is a hash, so let's add another case, grab the author, and return it.


def extract_author(book)
  case book
  when String then book.split(" by ").last
  when Array then book.last
  when Hash then book[:author]
  end
end

extract_author(books[0])                          # => "Harper Lee"
extract_author(books[1])                          # => "John Steinbeck"
extract_author(books[2])                          # => "Kurt Vonnegut Jr."

Because the next book entry is also a hash, in order to handle it we need to add a nested case that switches on whether the author is a String or an Array.


def extract_author(book)
  case book
  when String then book.split(" by ").last
  when Array then book.last
  when Hash
    case book[:author]
    when String then book[:author]
    when Array then book[:author].join(" & ")
    end
  end
end

extract_author(books[0])                          # => "Harper Lee"
extract_author(books[1])                          # => "John Steinbeck"
extract_author(books[2])                          # => "Kurt Vonnegut Jr."
extract_author(books[3])                          # => "Terry Pratchett & Neil Gaiman"

It's not pretty but it works so let's continue.

This one is also a hash but instead of a string or an array, the author is stored inside another hash. We're starting to repeat ourselves but let's add another branch that switches on a hash, grabs the author name, and returns it.


def extract_author(book)
  case book
  when String then book.split(" by ").last
  when Array then book.last
  when Hash
    case book[:author]
    when String then book[:author]
    when Array then book[:author].join(" & ")
    when Hash then book[:author][:name]
    end
  end
end

extract_author(books[0])                          # => "Harper Lee"
extract_author(books[1])                          # => "John Steinbeck"
extract_author(books[2])                          # => "Kurt Vonnegut Jr."
extract_author(books[3])                          # => "Terry Pratchett & Neil Gaiman"
extract_author(books[4])                          # => "William Golding"

Since we only use the author key, we can clean the branch up a bit with a temporary variable.


def extract_author(book)
  case book
  when String then book.split(" by ").last
  when Array then book.last
  when Hash
    author = book[:author]

    case author
    when String then author
    when Array then author.join(" & ")
    when Hash then author[:name]
    end
  end
end

Next, it's another author hash but this one stores the name across three different keys. Let's add a conditional that returns :name if the key is present or combines and returns first, middle, and last, if they are set.


def extract_author(book)
  case book
  when String then book.split(" by ").last
  when Array then book.last
  when Hash
    case book[:author]
    when String then book[:author]
    when Array then book[:author].join(" & ")
    when Hash
      author = book[:author]

      if author.key?(:name)
        author[:name]
      elsif author.key?(:first) && author.key?(:middle) && author.key?(:last)
        "#{author[:first]} #{author[:middle]} #{author[:last]}"
      end
    end
  end
end

extract_author(books[0])                          # => "Harper Lee"
extract_author(books[1])                          # => "John Steinbeck"
extract_author(books[2])                          # => "Kurt Vonnegut Jr."
extract_author(books[3])                          # => "Terry Pratchett & Neil Gaiman"
extract_author(books[4])                          # => "William Golding"
extract_author(books[5])                          # => "J. Scott Fitzgerald"

Okay, it's getting really dense but we're almost there. Just one more book.

This one uses an author hash with an :editor and a :contributors key. We only need the editor, so let's add one more branch to the conditional.


def extract_author(book)
  case book
  when String then book.split(" by ").last
  when Array then book.last
  when Hash
    case book[:author]
    when String then book[:author]
    when Array then book[:author].join(" & ")
    when Hash
      author = book[:author]

      if author.key?(:name)
        author[:name]
      elsif author.key?(:first) && author.key?(:middle) && author.key?(:last)
        "#{author[:first]} #{author[:middle]} #{author[:last]}"
      elsif author.key?(:editor)
        author[:editor]
      end
    end
  end
end

extract_author(books[0])                          # => "Harper Lee"
extract_author(books[1])                          # => "John Steinbeck"
extract_author(books[2])                          # => "Kurt Vonnegut Jr."
extract_author(books[3])                          # => "Terry Pratchett & Neil Gaiman"
extract_author(books[4])                          # => "William Golding"
extract_author(books[5])                          # => "J. Scott Fitzgerald"
extract_author(books[6])                          # => "Dannielle Horton"

Okay, we made it and while I wouldn't be eager to send this to code review, it does the trick.

Let's try that again using pattern matching.

The first book is handled in the same way but when we add the second book, instead of using a case...when, let's use case...in.


def extract_author(book)
  case book
  in String then book.split(" by ").last
  in Array then book.last
  end
end

extract_author(books[0])                          # => "Harper Lee"
extract_author(books[1])                          # => "John Steinbeck"

Aside from the in keyword, this looks identical to our original case statement. But we can go a step further and, instead of matching on type...

...we can deconstruct the array by matching against its two elements—title and author—then return the author.


def extract_author(book)
  case book
  in String then book.split(" by ").last
  in [title, author] then author
  end
end

extract_author(books[0])                          # => "Harper Lee"
extract_author(books[1])                          # => "John Steinbeck"

Since we're not using the title variable...

...we can make that explicit and change title to _.


def extract_author(book)
  case book
  in String then book.split(" by ").last
  in [_, author] then author
  end
end

extract_author(books[0])                          # => "Harper Lee"
extract_author(books[1])                          # => "John Steinbeck"

Using an array pattern in this way has two notable differences: we get to use a variable name instead of book.last, and this branch now matches only an array with exactly two elements, unlike our case statement, which matched any array.

We can test this out by...

...passing a three-element array to extract_author.

Unlike a normal case statement, which returns nil when none of its branches match, a case...in statement raises a NoMatchingPatternError when the value matches none of its patterns. This means that pattern matching statements should always be exhaustive.


extract_author(["foo", "bar", "baz"])             # =>
# ~> -:12:in `extract_author': ["foo", "bar", "baz"] (NoMatchingPatternError)
# ~>    from -:20:in `<main>'
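To see the two behaviors side by side, here is a minimal sketch. The method names with_when and with_in are hypothetical helpers made up for this comparison; they aren't part of the catalog code.

```ruby
# Hypothetical helper: a classic case...when with no matching branch.
def with_when(value)
  case value
  when String then "string"
  end
end

# Hypothetical helper: a case...in with a single two-element array pattern.
def with_in(value)
  case value
  in [_, author] then author
  end
end

with_when(42)                                     # => nil
with_in(["Of Mice and Men", "John Steinbeck"])    # => "John Steinbeck"

# A three-element array matches no pattern, so an error is raised
# instead of nil being returned.
outcome =
  begin
    with_in(["foo", "bar", "baz"])
  rescue NoMatchingPatternError
    "no pattern matched"
  end

outcome                                           # => "no pattern matched"
```

The case...when version silently falls through, while the case...in version forces us to decide what should happen with unexpected input.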

If we want to explicitly allow input that doesn't match, we can add an else branch and return nil or a default value.


def extract_author(book)
  case book
  in String then book.split(" by ").last
  in [_, author] then author
  else
    "Unknown author"
  end
end

extract_author(["foo", "bar", "baz"])             # => "Unknown author"

Back to the books. The next entry is a hash with an author key so let's match against a hash and return the author value.


def extract_author(book)
  case book
  in String then book.split(" by ").last
  in [_, author] then author
  in Hash then book[:author]
  end
end

extract_author(books[0])                          # => "Harper Lee"
extract_author(books[1])                          # => "John Steinbeck"
extract_author(books[2])                          # => "Kurt Vonnegut Jr."

This works but we can go further and...

...deconstruct the hash by specifying a key and variable name.

Like the array pattern, this allows us to match against a hash containing a known key and assign its value to a variable. But unlike the array pattern, where the number of elements must match, a hash pattern matches so long as the requested keys exist. Even though the hash has a title key, we don't need to include it in the pattern.


def extract_author(book)
  case book
  in String then book.split(" by ").last
  in [_, author] then author
  in {author: author} then author
  end
end

extract_author(books[0])                          # => "Harper Lee"
extract_author(books[1])                          # => "John Steinbeck"
extract_author(books[2])                          # => "Kurt Vonnegut Jr."
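Here's a small sketch of that subset behavior on its own. The helpers author_of and only_author and the extra :year key are assumptions added for illustration. The second helper uses **nil, which, as far as I know, is how a hash pattern asks for no other keys.

```ruby
# Hypothetical helper: matches any hash that has an :author key.
def author_of(book)
  case book
  in {author: author} then author
  end
end

# Hypothetical helper: **nil means "and no other keys are allowed".
def only_author(book)
  case book
  in {author: author, **nil} then author
  else
    "has other keys"
  end
end

# Illustrative entry; the :year key is not in the original catalog.
book = {title: "Slaughterhouse-Five", author: "Kurt Vonnegut Jr.", year: 1969}

author_of(book)                                   # => "Kurt Vonnegut Jr."
only_author(book)                                 # => "has other keys"
only_author({author: "Kurt Vonnegut Jr."})        # => "Kurt Vonnegut Jr."
```

For the catalog, the looser default is what we want, since every hash entry carries a title alongside the author.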

There's also a shorthand for this kind of pattern which allows us to drop the braces and variable name to clean up the line.


def extract_author(book)
  case book
  in String then book.split(" by ").last
  in [_, author] then author
  in author: then author
  end
end

extract_author(books[0])                          # => "Harper Lee"
extract_author(books[1])                          # => "John Steinbeck"
extract_author(books[2])                          # => "Kurt Vonnegut Jr."

If we try the next book it almost works, returning an array of authors.

Let's add another pattern to handle this case, matching an author key containing an array. Using the splat operator, we can match against an array of any size, assigning the contents of the array to a variable called authors and then join the authors together.


def extract_author(book)
  case book
  in String then book.split(" by ").last
  in [_, author] then author
  in author: [*authors] then authors.join(" & ")
  in author: then author
  end
end

extract_author(books[0])                          # => "Harper Lee"
extract_author(books[1])                          # => "John Steinbeck"
extract_author(books[2])                          # => "Kurt Vonnegut Jr."
extract_author(books[3])                          # => "Terry Pratchett & Neil Gaiman"
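Notice that the new array pattern sits above the plain author: pattern. Patterns are tried from top to bottom, so the order matters. As a rough sketch, here are two hypothetical methods (specific_first and general_first, named only for this comparison) that differ in nothing but branch order.

```ruby
# Hypothetical helper: the more specific array pattern is tried first.
def specific_first(book)
  case book
  in author: [*authors] then authors.join(" & ")
  in author: then author
  end
end

# Hypothetical helper: the catch-all author: pattern is tried first,
# so the array pattern below it can never be reached.
def general_first(book)
  case book
  in author: then author
  in author: [*authors] then authors.join(" & ")
  end
end

good_omens = {title: "Good Omens", author: ["Terry Pratchett", "Neil Gaiman"]}

specific_first(good_omens)                        # => "Terry Pratchett & Neil Gaiman"
general_first(good_omens)                         # => ["Terry Pratchett", "Neil Gaiman"]
```

So as we add more patterns, the most general one needs to stay at the bottom.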

Because Ruby allows patterns to be nested within other patterns, it's possible to combine a hash pattern and an array pattern in a single line. Nesting helps us with the next book too...

...which returns an author hash instead of a string or array.

This time, let's match on a hash with an author key containing another hash with a name key.


def extract_author(book)
  case book
  in String then book.split(" by ").last
  in [_, author] then author
  in author: [*authors] then authors.join(" & ")
  in author: {name:} then name
  in author: then author
  end
end

extract_author(books[0])                          # => "Harper Lee"
extract_author(books[1])                          # => "John Steinbeck"
extract_author(books[2])                          # => "Kurt Vonnegut Jr."
extract_author(books[3])                          # => "Terry Pratchett & Neil Gaiman"
extract_author(books[4])                          # => "William Golding"

The next two books can be matched using a similar approach, matching only the keys that we expect to see and using the values to return the author.


def extract_author(book)
  case book
  in String then book.split(" by ").last
  in [_, author] then author
  in author: [*authors] then authors.join(" & ")
  in author: {name:} then name
  in author: {first:, middle:, last:} then "#{first} #{middle} #{last}"
  in author: {editor:} then editor
  in author: then author
  end
end

extract_author(books[0])                          # => "Harper Lee"
extract_author(books[1])                          # => "John Steinbeck"
extract_author(books[2])                          # => "Kurt Vonnegut Jr."
extract_author(books[3])                          # => "Terry Pratchett & Neil Gaiman"
extract_author(books[4])                          # => "William Golding"
extract_author(books[5])                          # => "J. Scott Fitzgerald"
extract_author(books[6])                          # => "Dannielle Horton"
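With every entry handled, we can come back to the original goal of sorting the catalog by author. The episode doesn't show this step, so treat the following as one possible sketch: sort_by_author is a hypothetical helper, and sorting on the full extracted string (first name first) is an assumption.

```ruby
books = [
  "To Kill a Mockingbird by Harper Lee",
  ["Of Mice and Men", "John Steinbeck"],
  {title: "Slaughterhouse-Five", author: "Kurt Vonnegut Jr."},
  {title: "Good Omens", author: ["Terry Pratchett", "Neil Gaiman"]},
  {title: "Lord of the Flies", author: {name: "William Golding", birthplace: "England"}},
  {title: "The Great Gatsby", author: {first: "J.", middle: "Scott", last: "Fitzgerald"}},
  {title: "Biology Textbook", author: {editor: "Dannielle Horton", contributors: ["Lance Smyth", "Brent Mansell", "Bethan Bains", "Michael Keeling"]}},
]

def extract_author(book)
  case book
  in String then book.split(" by ").last
  in [_, author] then author
  in author: [*authors] then authors.join(" & ")
  in author: {name:} then name
  in author: {first:, middle:, last:} then "#{first} #{middle} #{last}"
  in author: {editor:} then editor
  in author: then author
  end
end

# Hypothetical helper: orders the catalog by the extracted author string.
def sort_by_author(catalog)
  catalog.sort_by { |book| extract_author(book) }
end

sort_by_author(books).map { |book| extract_author(book) }
# => ["Dannielle Horton", "Harper Lee", "J. Scott Fitzgerald", "John Steinbeck",
#     "Kurt Vonnegut Jr.", "Terry Pratchett & Neil Gaiman", "William Golding"]
```

Sorting by last name instead would need the patterns to return the name parts separately, which goes beyond what we've built here.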

When compared with the previous implementation, pattern matching is more concise with one branch per use case. Rather than spreading the logic across multiple conditionals, pattern matching allows us to express the shape of the data we expect and extract what we need in a single line.

It's worth mentioning that even though the first in branch is identical to the normal case statement, changing it to a when clause causes a syntax error because the two can't be mixed in a single case statement.
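One way to check this without breaking our own file is to hand the code to eval as a string and rescue the error. This is only a sketch: parses? is a hypothetical helper, and the two snippets are made up for the demonstration.

```ruby
# Hypothetical helper: returns true if the source evaluates without a
# SyntaxError. SyntaxError is not a StandardError, so a bare rescue
# would miss it and it has to be named explicitly.
def parses?(source)
  eval(source)
  true
rescue SyntaxError
  false
end

only_in = <<~RUBY
  case "To Kill a Mockingbird by Harper Lee"
  in String then :string
  in Array then :array
  end
RUBY

mixed = <<~RUBY
  case "To Kill a Mockingbird by Harper Lee"
  when String then :string
  in Array then :array
  end
RUBY

parses?(only_in)                                  # => true
parses?(mixed)                                    # => false
```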

So that's a quick look at pattern matching. In cases where primitives with different formats need to be handled in a single place, pattern matching can make your code easier to read and write.
