Unit 1, Lesson 21

Dig Implementation

Video transcript & code

In the last episode, I showed you the #dig method that's coming to arrays and hashes in Ruby 2.3. But what if you wanted to use this method today, in codebases that haven't been updated to the latest version of Ruby?

One of the best ways to fully understand a Ruby feature is to re-create the functionality ourselves. I thought it might be fun and instructive to do that with the #dig method. And as a by-product, by the end of today's episode we'll have a fully-functional dig implementation we can drop into any Ruby 2.0 program.

OK, let's get started.

We'll begin by taking a look at the Ruby 2.3 code to see how the #dig method is implemented there.

#dig is implemented for hashes, arrays, and structs. Here's the Hash C source code. In it we find the definition of the #dig method as the C function rb_hash_dig().

VALUE
rb_hash_dig(int argc, VALUE *argv, VALUE self)
{
    rb_check_arity(argc, 1, UNLIMITED_ARGUMENTS);
    self = rb_hash_aref(self, *argv);
    if (!--argc) return self;
    ++argv;
    return rb_obj_dig(argc, argv, self, Qnil);
}

This definition is basically just a stub: it performs the first key lookup itself and, if any arguments remain, hands them off. Because hashes, arrays, and structs all share a common implementation of #dig, this method just forwards to the shared definition, which is called rb_obj_dig().

Let's go ahead and mimic this setup, only in Ruby. We define a module. We'll call it Diggable.

In it, we add refinements for the three classes which support #dig in Ruby 2.3: Array, Hash, and Struct.

In the Ruby C implementation, each class gets a stub method which forwards to a master definition of #dig which is defined elsewhere.

We're going to copy the spirit of this code by having each refined class include the Diggable module. This shows a capability of refinements we haven't explored before: not only can we add methods to refined classes, we can also add new mixin modules.

module Diggable
  def dig(*segments)
  end

  refine Array do
    include Diggable
  end

  refine Hash do
    include Diggable
  end

  refine Struct do
    include Diggable
  end
end

Now we need to know how to write our master #dig method. In order to get an idea, let's look at the Ruby 2.3 source code for it.

Here we can see the real definition of the #dig functionality in the function rb_obj_dig().

The bulk of the function is a big loop, iterating over the path segments given as arguments.

Inside the loop, there is a type-casing switch on the type of the current object. Let's look at the first switch case, which is triggered when the object is a Hash.

There is some checking to make sure that the basic #dig method definition hasn't been overridden on the object. Then, the segment is used as a hash key, via the rb_hash_aref() function. The returned object becomes the new current object, and the loop continues from the beginning.

Now let's look at the next case, for Ruby arrays. At first glance, it looks exactly the same. In fact, there's only one difference: instead of calling the rb_hash_aref() function, it calls rb_ary_at(), the C function behind Array#at, which fetches a single element by index just as the subscript operator does.

The same is true of the Struct case: the only difference is in the function which is ultimately called to fetch data by a key.

What this tells us is that this case statement exists because in C, different functions have to be called to execute the same operation on varying object types. This means that in our Ruby implementation, we won't need to mimic this case statement.
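
As a small illustration of that point, here is a sketch that sends the same subscript message to a Hash, an Array, and a Struct. The SamplePoint struct, the lookup_all helper, and the sample values are made up for this example; they are not part of the episode's code.

```ruby
# Hypothetical sample data; none of these names come from the episode.
SamplePoint = Struct.new(:x, :y)

CONTAINERS_AND_KEYS = [
  [{ "x" => 1 }, "x"],          # Hash: key lookup
  [[10, 20, 30], 1],            # Array: integer index
  [SamplePoint.new(5, 6), :y]   # Struct: member name
]

# One message, :[], covers all three types, so no type switch is needed.
def lookup_all(pairs)
  pairs.map { |container, key| container[key] }
end

lookup_all(CONTAINERS_AND_KEYS)
# => [1, 20, 6]
```

In C, each of those three lookups is a different function call; in Ruby, the receiver decides what the subscript means.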

If the object turns out not to be a Hash, Array, or Struct, or if it turns out to have its own custom implementation of #dig, execution falls through to the last function call in the loop body, rb_check_funcall_default().

Instead of continuing the loop, this call simply delegates to the special implementation of #dig on the current object, if any. If no such method is found, the call returns nil, which is the value of the notfound default argument passed in from higher up.

VALUE
rb_obj_dig(int argc, VALUE *argv, VALUE obj, VALUE notfound)
{
    struct dig_method hash = {Qnil}, ary = {Qnil}, strt = {Qnil};

    for (; argc > 0; ++argv, --argc) {
        if (!SPECIAL_CONST_P(obj)) {
            switch (BUILTIN_TYPE(obj)) {
              case T_HASH:
                if (dig_basic_p(obj, &hash)) {
                    obj = rb_hash_aref(obj, *argv);
                    continue;
                }
                break;
              case T_ARRAY:
                if (dig_basic_p(obj, &ary)) {
                    obj = rb_ary_at(obj, *argv);
                    continue;
                }
                break;
              case T_STRUCT:
                if (dig_basic_p(obj, &strt)) {
                    obj = rb_struct_lookup(obj, *argv);
                    continue;
                }
                break;
            }
        }
        return rb_check_funcall_default(obj, id_dig, argc, argv, notfound);
    }
    return obj;
}
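
The final call in that function can be approximated in Ruby along the following lines. This is only a rough sketch: check_funcall_default is a hypothetical Ruby helper named after the C function, CustomDigger is a made-up class, and the sketch glosses over details such as method visibility.

```ruby
# Hypothetical Ruby approximation of rb_check_funcall_default():
# call the method if the object responds to it, otherwise return the default.
def check_funcall_default(object, method_name, args, notfound)
  return notfound unless object.respond_to?(method_name)
  object.public_send(method_name, *args)
end

# Made-up class with its own #dig, standing in for a custom implementation.
class CustomDigger
  def dig(*segments)
    "custom dig got #{segments.inspect}"
  end
end

check_funcall_default(CustomDigger.new, :dig, ["main", "temp_min"], nil)
# => "custom dig got [\"main\", \"temp_min\"]"

check_funcall_default(nil, :dig, ["main"], nil)
# => nil
```

Note that the C code passes argc and argv along unchanged at that point, so the delegated #dig receives the segment that was being processed as well as everything after it.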

Let's go ahead and write a Ruby version of this code. We'll enable it to take a variable number of arguments, and we'll call those arguments "segments", because they make up the segments of a kind of "data path".

Remember, because we're operating at a higher level of abstraction, the Ruby logic can be a little bit simplified compared to the C version.

We'll start by defining a current object variable, which will begin by pointing at self.

Then we will loop over segments. We'll do this a little differently than you might expect: we use a while loop to check whether there are any segments left to process.

We'll talk about why we do it this way in a moment.

On the next line, we update the current segment and the remaining list of segments using destructuring assignment.

If this code isn't clear to you, you might want to review episodes #80 and #81.
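
As a quick refresher, here is the destructuring step on its own. The peel method is a hypothetical wrapper that exists only so the result is easy to inspect.

```ruby
# Hypothetical helper: performs one destructuring step and reports the result.
def peel(segments)
  # The first element goes to current_segment; the splat collects the rest
  # back into segments, replacing the old list.
  current_segment, *segments = segments
  [current_segment, segments]
end

peel(["list", 39, "main", "temp_min"])
# => ["list", [39, "main", "temp_min"]]

peel(["temp_min"])
# => ["temp_min", []]
```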

Then we begin a conditional. First we check to see if the object has its own implementation of #dig. If it does, we delegate forward to that #dig method.

Notice that we splat out the remaining segments in the arguments to this message. This is why we're using a while loop and destructuring assignment to iterate over segments: it's important to have access to both the current segment and the list of remaining, unprocessed segments. If we just used a typical #each iteration, we wouldn't have the list of remaining segments right at hand. This style also keeps us a little closer to the spirit of the C version.
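
To make the shape of that loop concrete, here is a sketch that records what is available on each pass. The trace_segments method is hypothetical and does no digging at all; it only shows the current segment and the remaining list side by side.

```ruby
# Hypothetical tracer: same loop shape as described above, but it only
# records [current_segment, remaining_segments] for each iteration.
def trace_segments(*segments)
  steps = []
  while !segments.empty?
    current_segment, *segments = segments
    steps << [current_segment, segments]
  end
  steps
end

trace_segments("list", 0, "main")
# => [["list", [0, "main"]], [0, ["main"]], ["main", []]]
```

With a plain #each we would see only the first element of each pair, and would have to slice the original list to rebuild the second.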

We use a loop-breaking return here because at this point we are delegating the processing of all remaining segments to this next #dig method, so we don't need to process any more segments in our loop.

Now, there's one thing you might be wondering about at this point. Won't this line always be triggered, since we are adding Diggable to all hashes, arrays, and structs?

This is where we get into the subtleties of how refinements work, and why they are an improvement on monkey-patching. As far as the arrays and hashes we are operating on are concerned, they are 100% pure, unmodified Ruby core objects. Refinements may make it appear, from outside, as if the object has a new or modified method. But they do not alter an object's internal sense of itself. When we ask the object if it responds to #dig, it's still going to truthfully say no.
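
Here is a minimal sketch of that behavior, using a made-up refinement unrelated to Diggable. The names Shouting, ShoutingDemo, and responds_to_shout? are all hypothetical. Because the details of how respond_to? interacts with active refinements have varied between Ruby versions, the sketch asks the question from a helper defined outside the using scope, which is also the situation our #dig method is in. It also assumes a Ruby that allows using inside a module body (2.1 or later).

```ruby
# Hypothetical refinement, used only for illustration.
module Shouting
  refine String do
    def shout
      upcase + "!"
    end
  end
end

# Defined where the refinement is NOT active, like the body of Diggable#dig.
def responds_to_shout?(object)
  object.respond_to?(:shout)
end

module ShoutingDemo
  using Shouting

  # Inside this scope the refined method can be called directly...
  CALL_RESULT = "hello".shout
  # ...but code outside the scope still sees a plain, unmodified String.
  RESPONDS = responds_to_shout?("hello")
end

ShoutingDemo::CALL_RESULT # => "HELLO!"
ShoutingDemo::RESPONDS    # => false
```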

We'll probably talk about this behavior more in a future episode.

If the current object turns out not to have its own special #dig implementation, we move on to the next case. This is the case where, in the C code, the object was tested to see if it was a Hash, Array, or Struct. We're going to be a little more broad than that, and just ask the object if it responds to the subscript operator.

If it does, we send the subscript message, with the current segment as the key, or index. The result value replaces the current object, and we loop around again.
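
To get a feel for how much broader that check is, here is a sketch that asks a few different objects the same question. The subscriptable? helper is a hypothetical name introduced for this example.

```ruby
# Hypothetical helper: the same duck-typing question our #dig will ask.
def subscriptable?(object)
  object.respond_to?(:[])
end

subscriptable?({ "a" => 1 })  # => true
subscriptable?([1, 2, 3])     # => true
subscriptable?("text")        # => true  (String#[] returns a substring)
subscriptable?(42)            # => true  (Integer#[] returns a single bit)
subscriptable?(nil)           # => false
```

One consequence of this design choice is that a path which runs past a string or integer leaf will try to index into that leaf rather than stopping, which is looser than the type switch in the C version.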

If there is neither a specialized #dig nor a subscript operator available (if, for instance, we have hit a nil value), we simply return nil.

Finally, assuming none of the cases above have resulted in an early return, we return the final object that we got back from the last subscript call.

module Diggable
  def dig(*segments)
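    object = self
    # Check emptiness directly: any? would stop early if the only
    # segments left were nil or false.
    while !segments.empty?
      # Peel off the next segment; the rest stay queued for later passes.
      current_segment, *segments = segments
      if object.respond_to?(:dig)
        # Delegate to the object's own #dig. Like the C version, pass the
        # current segment too, so the custom #dig sees the whole rest of the path.
        return object.dig(current_segment, *segments)
      elsif object.respond_to?(:[])
        object = object[current_segment]
      else
        return nil
      end
    end
    object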
  end

  refine Array do
    include Diggable
  end

  refine Hash do
    include Diggable
  end

  refine Struct do
    include Diggable
  end
end

Let's test this out. We have to specify that we are using our Diggable refinement.

Then we'll read in some deeply nested weather forecast data, as we did in the last episode.

Next we'll dig down into the data to find a particular low temperature prediction. It works!

If we modify one of the steps in the data path to be invalid, we get a nil result.

Now let's test an edge case. We'll modify one of the prediction objects in the data set to have a special, singleton definition of #dig which just returns a string.

When we again dig into the data, including our modified singleton in the path, we can see that we get the specialized return value.

Every key past that point has been ignored, because we delegated to the specialized #dig.

require "json"
require "./diggable"

using Diggable

forecast = JSON.parse(File.read("forecast.json"))

forecast.dig("list", 39, "main", "temp_min")
# => 284.37

forecast.dig("list", 40, "main", "temp_min")
# => nil

def (forecast["list"][0]).dig(*)
  "TOO COLD!"
end

forecast.dig("list", 0, "main", "temp_min")
# => "TOO COLD!"

So there we have it: our own reusable definition of #dig, which should work in any Ruby 2.0 codebase. We've learned a little more about using refinements, as well as about how Ruby features are implemented in C code. I think that's plenty for today. Happy hacking!
