Dig

Video transcript & code

Last week I attended RubyConf 2015 in San Antonio, Texas. In between gorging on great Tex-Mex food and remembering the Alamo, I also took in a few Ruby talks. One of those was Matz's annual keynote, in which I learned about a few more of the features coming to Ruby 2.3. I want to share with you one of those features that I'm particularly excited about.

Let's say we're querying a web service for weather forecasts. Like many modern web services, it returns JSON data.

And the JSON data is pretty complex: there are maps containing lists containing more maps, and so on.

In fact, just to get the predicted low temperature on a given day and time in the future, we have to traverse four levels of nesting.

Let's read in this data in order to play with it.

We ask for the "list" of predictions, then specify the index of one of the predictions, then descend into the "main" section of that prediction, then ask for the "temp_min" member.

Finally, we get the temperature back.

If at any point along the way we specify a key or index that can't be found, that lookup returns nil, and the next subscript in the chain raises a NoMethodError instead.

For now let's leave aside the fact that this is a particularly ambiguous error, since we don't know which of the four subscript messages on this line wound up being sent to nil. Maybe in our use case we don't consider a missing temperature to be an exceptional case. Maybe we really just want a nil when some segment of the data path turns out to lead to a dead end.

require "json"

forecast = JSON.parse(File.read("forecast.json"))

forecast["list"][39]["main"]["temp_min"]
# => 284.37

forecast["list"][40]["main"]["temp_min"]
# =>

# ~> NoMethodError
# ~> undefined method `[]' for nil:NilClass
# ~>
# ~> xmptmp-in25552U3h.rb:8:in `<main>'
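
To try this without the forecast.json file, here is a minimal sketch that uses a small stand-in hash. The hash contents are hypothetical and only mimic the shape of the data described above. It also makes visible which step of the path quietly produced the nil.

```ruby
# Hypothetical stand-in for the parsed forecast data; only the shape matters.
forecast = {
  "list" => [
    { "main" => { "temp_min" => 284.37 } }
  ]
}

# Index 0 exists, so every step of the path succeeds.
found = forecast["list"][0]["main"]["temp_min"]

# Index 1 is past the end of the array, so this step returns nil...
missing_entry = forecast["list"][1]

# ...and the next subscript is sent to nil, which raises NoMethodError.
error = begin
  forecast["list"][1]["main"]["temp_min"]
rescue NoMethodError => e
  e
end
```

The begin/rescue here is only a way to capture the exception for inspection; it is not a suggestion for how to handle the missing data.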

There are a few ways to rewrite this code to return nil when the full data path is not found.

I'll start with a technique that I'm going to demonstrate only in order to tell you why you should never use it.

We can append a rescue nil statement modifier to the end of the whole expression, like this:

require "json"

forecast = JSON.parse(File.read("forecast.json"))

forecast["list"][40]["main"]["temp_min"] rescue nil
# => nil

There are very few techniques in Ruby which I will unequivocally call "antipatterns", but this is one of them. Because the rescue statement modifier indiscriminately quashes StandardError and all of its subclasses, which covers the vast majority of exception types, it is far too easy for code like this to accidentally hide mistakes such as method typos. I can't tell you how many times I've seen this technique conceal bugs in programs.
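
Here is a small sketch of how that kind of concealment can happen. The data is a hypothetical stand-in, and the misspelled method name is deliberate. Nothing is actually missing here, yet we still get nil.

```ruby
# Hypothetical stand-in data; index 0 and its temperature are both present.
forecast = { "list" => [{ "main" => { "temp_min" => 284.37 } }] }

# Deliberate typo: "fetchh" instead of "fetch". The resulting NoMethodError
# is a StandardError, so the rescue modifier swallows it and we get nil,
# exactly as if the data had been missing.
result = forecast["list"][0]["main"].fetchh("temp_min") rescue nil
```

From the outside, this result is indistinguishable from a genuinely missing forecast.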

And anyway, this doesn't really express our intent. The whole point here is that for our purposes, failing to get a temperature data point here does not represent an error. So dealing with it as an exception is misleading.

Instead, we should be digging down through the layers of data in a way that responds benignly to a missing key or index.

If you've been following this show for a while, you probably know that one way we can do this is using the #fetch method.

We can rewrite our query to use #fetch at each level. For every call, we specify a default blank value for when the key is not found. For the "list" key, we supply an empty array. Then for the prediction index, we specify an empty hash as a default in case that index turns out to be out of bounds. And so on down the line, until the final segment gets a default of nil.

require "json"

forecast = JSON.parse(File.read("forecast.json"))

forecast.fetch("list", [])
  .fetch(40, {})
  .fetch("main", {})
  .fetch("temp_min", nil)
# => nil

This works, and expresses our intent to handle missing data benignly. And without adding new methods to built-in classes, it's one of the best options we have in Ruby versions 2.2 and below. But we have to admit, it's not very concise.
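
For comparison, a more concise option on older Rubies might be a small helper along these lines. This is only a sketch: the name dig_path is made up for illustration, it is a standalone method rather than an extension to Hash or Array, and the data is a hypothetical stand-in.

```ruby
# Hypothetical helper (not part of Ruby): walk a path of keys and indexes,
# returning nil as soon as a step is missing.
def dig_path(data, *path)
  path.reduce(data) do |node, key|
    # Stop if we've hit nil or a leaf value that can't be descended into.
    return nil unless node.respond_to?(:fetch)
    # Hash#fetch and Array#fetch both accept a block for the missing case.
    node.fetch(key) { return nil }
  end
end

forecast = { "list" => [{ "main" => { "temp_min" => 284.37 } }] }

hit  = dig_path(forecast, "list", 0, "main", "temp_min")
miss = dig_path(forecast, "list", 1, "main", "temp_min")
```

The trade-off is one more helper to carry around and explain, which is part of why a built-in solution is appealing.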

Now, as you might recall from episode #362, Ruby 2.3 is getting a new "safe navigation" operator for traversing chains of objects. That seems like the sort of thing we could use here.

forecast&.["list"]&.[39]&.["main"]&.["temp_min"]

Unfortunately, this syntax is only supported for ordinary method calls, not for the subscript operator.
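
Since subscripting is itself just a method named [], the safe navigation operator can be combined with an explicit call to that method. The following is a sketch assuming Ruby 2.3 or later and a hypothetical stand-in for the forecast data. It runs, but it is hardly pleasant to read.

```ruby
# Hypothetical stand-in for the parsed forecast data.
forecast = { "list" => [{ "main" => { "temp_min" => 284.37 } }] }

# Call the [] method by name so that &. can be applied at each step.
hit  = forecast&.[]("list")&.[](0)&.[]("main")&.[]("temp_min")

# Index 1 is out of bounds, so the chain short-circuits to nil from there on.
miss = forecast&.[]("list")&.[](1)&.[]("main")&.[]("temp_min")
```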

Never fear, though. Ruby 2.3 has not left us out in the cold!

Instead of the safe navigation operator, we can use the new #dig method. This method is available on all Hash, Array, and Struct objects in Ruby 2.3.

Instead of a series of subscript operations, we send the #dig message to our data root. As arguments, we specify each index or key in the path to the data. When the keys are all found, this method returns the final value found at the end of the chain.

But if one of the keys is not found, at any point in the path, we get a nil return value.

require "json"

forecast = JSON.parse(File.read("forecast.json"))

forecast.dig("list", 39, "main", "temp_min")
# => 284.37

forecast.dig("list", 40, "main", "temp_min")
# => nil
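
One caveat worth knowing about: a missing key gives us nil, but a path that tries to descend into a value that has no #dig method of its own, such as a number, raises a TypeError instead. Here is a sketch using a hypothetical stand-in for the forecast data. The extra "celsius" key is made up purely to go one level too deep.

```ruby
# Hypothetical stand-in for the parsed forecast data.
forecast = { "list" => [{ "main" => { "temp_min" => 284.37 } }] }

hit  = forecast.dig("list", 0, "main", "temp_min")
miss = forecast.dig("list", 1, "main", "temp_min")

# 284.37 is a Float, which has no #dig, so digging past it raises.
error = begin
  forecast.dig("list", 0, "main", "temp_min", "celsius")
rescue TypeError => e
  e
end
```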

This is a lot nicer than our chain of #fetch messages.

Just to demonstrate that #dig isn't only for hashes, let's write an array and send #dig to it:

a = %w[apple pear banana]
a.dig(1)                        # => "pear"

#dig is also available on structs. Here's a simple struct that has three attributes. When we instantiate it with some data, we can then send #dig to dig down into that data.

Faves = Struct.new(:color, :fruits, :number)
myfaves = Faves.new("pink", %w[apple pear banana], 23)
myfaves.dig("fruits", 2)        # => "banana"

This is consistent with the fact that in many ways, Structs can behave like fancy hashes.
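
As a quick sketch of that hash-like behavior, using the same example struct: a Struct can be subscripted by symbol, by string, or by position, and it can be converted to a real Hash.

```ruby
Faves = Struct.new(:color, :fruits, :number)
myfaves = Faves.new("pink", %w[apple pear banana], 23)

by_symbol = myfaves[:color]   # subscript with a symbol, like a hash
by_string = myfaves["color"]  # string names work too
by_index  = myfaves[0]        # and so do positional indexes, like an array
as_hash   = myfaves.to_h      # a real Hash, keyed by symbols
```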

And that's what you need to know about the #dig method coming to Ruby 2.3. It's a long-overdue way of easily drilling down into complex data structures, and I'm very happy to see it arrive.

Happy hacking!
