Unit 1, Lesson 21

Arrays To Hashes

(see also Ruby 2.4 Update below)

Video transcript & code

In the previous episode, we were talking about named regular expression captures. We saw a bunch of different ways to extract captures from MatchData objects by their names.

patt = /\A(?<num>\d{3})-(?<name>[\w-]+)\.(?<ext>\w{3,4})\z/
filename = "375-named-capture.mp4"
patt =~ filename                # => 0
$~[:num]                        # => "375"
Regexp.last_match[:name]        # => "named-capture"
md = patt.match(filename)       # => #<MatchData "375-named-capture.mp4" num:...
md[:ext]                        # => "mp4"

What we didn't look at was any way of getting lists of all of the captures at once. As you might expect, these methods exist.

If we send the #names message to a MatchData object, we get a list of the capture group names.

If we send the #captures message, we get a list of the captured values, in the same order as the names.

patt = /\A(?<num>\d{3})-(?<name>[\w-]+)\.(?<ext>\w{3,4})\z/
filename = "375-named-capture.mp4"
md = patt.match(filename)       # => #<MatchData "375-named-capture.mp4" num:...
md.names                        # => ["num", "name", "ext"]
md.captures                     # => ["375", "named-capture", "mp4"]
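Two related methods round out the picture. The small sketch below is not from the episode. It shows that the Regexp itself can report its group names without any match, and that MatchData#to_a is like #captures but with the whole matched string in front:

```ruby
patt = /\A(?<num>\d{3})-(?<name>[\w-]+)\.(?<ext>\w{3,4})\z/
md   = patt.match("375-named-capture.mp4")

# The Regexp knows its own capture group names; no match is needed.
patt.names   # => ["num", "name", "ext"]

# #to_a returns the whole match first, followed by the captures.
md.to_a      # => ["375-named-capture.mp4", "375", "named-capture", "mp4"]
md.captures  # => ["375", "named-capture", "mp4"]
```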

There is one thing missing, though: there is no way to ask a MatchData object for a hash of capture names to capture values.

(see Ruby 2.4 Update below)

I find this to be a glaring and rather inexplicable omission. However, it does give me a good excuse to talk about converting arrays into hashes.

I thought we could go through several iterations of the array-to-hash transformation, and see what we can learn along the way.

The first step for any of our solutions involves combining two arrays into one.

That's what the #zip method is for. We send it to our first array, and pass the second as an argument. The result is an array of two-element arrays, "zipped" together from the inputs like the two sides of a zipper on a jacket.

names  = ["num", "name", "ext"]
values = ["375", "named-capture", "mp4"]
zipped = names.zip(values)
# => [["num", "375"], ["name", "named-capture"], ["ext", "mp4"]]
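Our two arrays happen to be the same length. As a quick aside not covered in the episode, here is a sketch of what #zip does when they aren't. The receiver's length always decides how many pairs come out:

```ruby
names = ["num", "name", "ext"]

# A shorter argument gets padded with nil.
names.zip(["375"])
# => [["num", "375"], ["name", nil], ["ext", nil]]

# A longer argument has its extra elements dropped.
["num"].zip(["375", "named-capture"])
# => [["num", "375"]]
```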

This is a lot like a Hash. And indeed, we can treat it as one using the #assoc method on Array.

names  = ["num", "name", "ext"]
values = ["375", "named-capture", "mp4"]
zipped = names.zip(values)
# => [["num", "375"], ["name", "named-capture"], ["ext", "mp4"]]
zipped.assoc("ext")             # => ["ext", "mp4"]

Notice that the return value is the whole pair, not just the value member of the pair.
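To get at just the value, we can take the last element of the returned pair. The sketch below also shows that #assoc returns nil for a missing key, and that its sibling #rassoc searches by the second element instead of the first:

```ruby
zipped = [["num", "375"], ["name", "named-capture"], ["ext", "mp4"]]

zipped.assoc("ext").last  # => "mp4"
zipped.assoc("size")      # => nil (no pair starts with "size")
zipped.rassoc("375")      # => ["num", "375"] (matched on the second element)
```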

That's nice and all, but we set out to create a hash. Let's pretend we don't know any shortcuts to converting this data to a hash, and think about some of the ways we might go about it. With data like this, my first thought is to look for an iterative solution of some kind.

One such solution is to use #reduce to build a hash.

We send #reduce to the zipped array, and pass in an empty hash as the "seed" value. Inside the block, we receive the result hash that we're building, and the next key-value pair.

We split the pair into name and value using destructuring assignment, as we learned about in episodes #80 and #81.

Then we use Hash#merge to take the existing hash and create a new one with the new key and value added.

The #merge method returns this hash, so it becomes the next value of the result argument. At the end, we have our hash of keys to values.

names  = ["num", "name", "ext"]
values = ["375", "named-capture", "mp4"]
zipped = names.zip(values)

zipped.reduce({}){|result, pair|
  name, value = pair
  result.merge(name => value)
}
# => {"num"=>"375", "name"=>"named-capture", "ext"=>"mp4"}

There's a line we can eliminate from this code if we want to. We can get rid of the line that extracts name and value from the pair, and instead destructure the passed array right in the block parameter list, using a parenthesized group.

names  = ["num", "name", "ext"]
values = ["375", "named-capture", "mp4"]
zipped = names.zip(values)

zipped.reduce({}){|result, (name, value)|
  result.merge(name => value)
}
# => {"num"=>"375", "name"=>"named-capture", "ext"=>"mp4"}

This solution creates a new hash at each step, which makes it a very functional solution. If we profiled our code and found that these hash allocations were a significant source of overhead, we could avoid them by switching to #merge!.

names  = ["num", "name", "ext"]
values = ["375", "named-capture", "mp4"]
zipped = names.zip(values)

zipped.reduce({}){|result, (name, value)|
  result.merge!(name => value)
}
# => {"num"=>"375", "name"=>"named-capture", "ext"=>"mp4"}

Now instead of constantly making new hashes, the original hash is repeatedly modified.
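The difference between the two methods is easy to check with #equal?, which tests object identity. This is a small illustrative sketch, not code from the episode:

```ruby
seed = {}

merged = seed.merge("num" => "375")
merged.equal?(seed)  # => false: #merge allocated a new Hash
seed                 # => {} (the original is untouched)

bang = seed.merge!("num" => "375")
bang.equal?(seed)    # => true: #merge! returned the same Hash, now modified
seed                 # => {"num"=>"375"}
```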

But there is still hash allocation going on with each iteration. Can you spot it? It's in the single-element hash that we construct in order to merge it into the result hash.

Let's pretend we've profiled this and found the remaining hash allocations to still be a problem. It seems odd, anyway, to be using #merge when all we are adding on each iteration is a single key and value.

Why not use ordinary hash assignment?

We can, with one caveat: we have to add a line to explicitly return the result hash from the block.

The #reduce method depends on the block's return value, which it carries forward to the next iteration. And the value of a hash element assignment expression is the assigned value, not the whole hash.

names  = ["num", "name", "ext"]
values = ["375", "named-capture", "mp4"]
zipped = names.zip(values)

zipped.reduce({}){|result, (name, value)|
  result[name] = value
  result
}
# => {"num"=>"375", "name"=>"named-capture", "ext"=>"mp4"}
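To see why that trailing `result` line matters, here is a sketch of what happens without it. The first iteration returns "375", so #reduce hands that string to the second iteration as `result`. String#[]= then fails because "375" contains no substring "name":

```ruby
h = {}
assigned = (h["num"] = "375")
assigned  # => "375" (the assigned value, not the hash)

zipped = [["num", "375"], ["name", "named-capture"]]

# Missing the explicit `result` at the end of the block.
outcome =
  begin
    zipped.reduce({}) { |result, (name, value)| result[name] = value }
  rescue IndexError => e
    e.class
  end
outcome   # => IndexError
```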

But, now that we're using assignment, that calls into question whether we ought to be using #reduce at all. After all, #reduce is aimed at the kind of operation where we need to keep building on the result of the last iteration.

If all we need to do is perform iterative modifications on one object, a more appropriate method might be #each_with_object.

As with #reduce, we pass an empty hash as a seed value to #each_with_object. Unlike #reduce, the block parameters come in the opposite order: first the key-value pair, then the result object.

One way to remember this order is by recalling that it is suggested by the name of the method: #each_with_object has the object at the end.

Inside the block, we assign our key and value to the result hash.

The resulting hash is the same as before.

#each_with_object ignores any block return value and just keeps yielding the same object. So we don't need to worry about what the last line of the block returns.

You might notice that I've switched to using do/end instead of curly braces.

That's because I follow the Jim Weirich convention for blocks. I'll get around to doing an episode on this at some point. But to put it briefly: I've switched to using do/end because now I'm using the block for its side-effects rather than for its return value.

names  = ["num", "name", "ext"]
values = ["375", "named-capture", "mp4"]
zipped = names.zip(values)

zipped.each_with_object({}) do |(name, value), result|
  result[name] = value
end
# => {"num"=>"375", "name"=>"named-capture", "ext"=>"mp4"}
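To confirm that #each_with_object really does discard the block's return value, we can make the block deliberately end with nil. This is just a sketch for illustration; the result is unchanged:

```ruby
zipped = [["num", "375"], ["name", "named-capture"], ["ext", "mp4"]]

h = zipped.each_with_object({}) do |(name, value), result|
  result[name] = value
  nil # this return value is ignored; the same hash is yielded every time
end
h  # => {"num"=>"375", "name"=>"named-capture", "ext"=>"mp4"}
```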

All of the approaches we've looked at so far involve iterating through the zipped arrays, and operating on each element with a block. This style of processing is nice because it's very flexible and open to extension.

For instance, if we decided that we wanted the resulting hash to have symbol keys instead of string keys, we could just make a quick change inside our block.

names  = ["num", "name", "ext"]
values = ["375", "named-capture", "mp4"]
zipped = names.zip(values)

zipped.each_with_object({}) do |(name, value), result|
  result[name.to_sym] = value
end
# => {:num=>"375", :name=>"named-capture", :ext=>"mp4"}

But what if we don't have any need to modify the keys or values before adding them to the hash? Is there a more concise way to do this transformation?

Of course there is! In fact, there are at least two ways. Before Array#to_h existed, we would have done it like this: we would have passed the zipped array to the special Hash square-bracket constructor method, Hash.[]. The result is our familiar hash.

names  = ["num", "name", "ext"]
values = ["375", "named-capture", "mp4"]
zipped = names.zip(values)

Hash[zipped]
# => {"num"=>"375", "name"=>"named-capture", "ext"=>"mp4"}
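Two more details about Hash.[] that this sketch illustrates (neither is from the episode): it also accepts a flat, even-length list of alternating keys and values, and when the same key appears more than once, the last value wins:

```ruby
# Flat list of alternating keys and values.
Hash["num", "375", "ext", "mp4"]
# => {"num"=>"375", "ext"=>"mp4"}

# Duplicate keys: the later pair overwrites the earlier one.
Hash[[["ext", "mp4"], ["ext", "mov"]]]
# => {"ext"=>"mov"}
```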

But since Ruby 2.1, there is an even more concise and idiomatic way to convert an array of pairs into a hash.

All we have to do is send it the #to_h message, and we get the hash we're looking for.

names  = ["num", "name", "ext"]
values = ["375", "named-capture", "mp4"]
zipped = names.zip(values)

zipped.to_h
# => {"num"=>"375", "name"=>"named-capture", "ext"=>"mp4"}
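Newer Rubies narrow the gap between the flexible block-based versions and the concise one. Assuming Ruby 2.6 or later, #to_h accepts a block that maps each element to a [key, value] pair, so the symbol-key variant from earlier could be written like this sketch:

```ruby
zipped = [["num", "375"], ["name", "named-capture"], ["ext", "mp4"]]

# Ruby 2.6+: each pair is destructured into the block parameters,
# and the block returns the [key, value] pair to store.
zipped.to_h { |name, value| [name.to_sym, value] }
# => {:num=>"375", :name=>"named-capture", :ext=>"mp4"}
```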

So, to make a long story very short, to turn a MatchData object into a hash, we can zip its names to its values, and send #to_h to the result.

patt = /\A(?<num>\d{3})-(?<name>[\w-]+)\.(?<ext>\w{3,4})\z/
filename = "375-named-capture.mp4"
md = patt.match(filename)       # => #<MatchData "375-named-capture.mp4" num:...
md.names.zip(md.captures).to_h
# => {"num"=>"375", "name"=>"named-capture", "ext"=>"mp4"}
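If we needed this in more than one place, we might wrap it in a small helper. The name `captures_hash` is hypothetical, and the nil guard is our own addition, since Regexp#match returns nil when nothing matches:

```ruby
# Hypothetical helper: a Hash of capture names to values, or nil on no match.
def captures_hash(md)
  return nil unless md
  md.names.zip(md.captures).to_h
end

patt = /\A(?<num>\d{3})-(?<name>[\w-]+)\.(?<ext>\w{3,4})\z/
captures_hash(patt.match("375-named-capture.mp4"))
# => {"num"=>"375", "name"=>"named-capture", "ext"=>"mp4"}
captures_hash(patt.match("not-a-match"))
# => nil
```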

Ruby 2.4 Update: In Ruby 2.4.0 and above, you can use MatchData#named_captures to create a Hash directly from a Regexp match.

hmd = patt.match(filename).named_captures
# => {"num"=>"375", "name"=>"named-capture", "ext"=>"mp4"}
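Because #named_captures only exists on Ruby 2.4 and later, code that must also run on older versions could choose whichever approach is available. This is a sketch under that assumption; the respond_to? feature check is our own choice, not something from the episode:

```ruby
patt = /\A(?<num>\d{3})-(?<name>[\w-]+)\.(?<ext>\w{3,4})\z/
md   = patt.match("375-named-capture.mp4")

# Prefer the built-in on 2.4+, fall back to zip + to_h otherwise.
captures =
  if md.respond_to?(:named_captures)
    md.named_captures
  else
    md.names.zip(md.captures).to_h
  end
captures  # => {"num"=>"375", "name"=>"named-capture", "ext"=>"mp4"}
```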


None of the approaches we've looked at today are the "right" one or the "wrong" one. They are all different ways of approaching the same problem; and which one you choose depends on your needs, as well as your personal tastes. Happy hacking!
