Unit 1, Lesson 21

Ruby Spelunking

One of my favorite things about working with Ruby is that it is a tremendously reflective language. What I mean by that is that instead of coming to understand what a program is doing purely by reading code and consulting documentation, there are many ways in Ruby to dynamically *ask* a program about itself. Unfortunately, all too often I fail to take advantage of these features and I miss out on the insights they could provide.

If you have the same problem, well, put on your overalls and your carbide lamp, because in today’s episode, guest chef Jordan Raine joins us to take us on a little cave-diving expedition into Ruby’s reflective capabilities. You’ll see how, with the help of some standard Ruby methods and one extra gem, we can map out unfamiliar code without having to grep through the source files. Enjoy!

Video transcript & code


I often find myself working with code that surprises me. Whether it be a complex gem or my own code from six months ago, one of my favorite approaches to getting unstuck is to use method introspection to go spelunking.

Spelunking is the act of jumping from method to method using method introspection, reading the source code as you go. Much like real spelunking, spelunking through Ruby code can lead us to dark and exciting places. While documentation may be inaccurate or incomplete, source code reveals exactly what happens when you call a method.

Before we get started, let's learn a few things about method introspection to make the trip a success.

We have a single method called foo that returns the string "Hello!".

Using the method method, we can get the method object for foo.

If we call source_location, we can see the file and line number where the method was defined.
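
As a minimal sketch of the steps so far, using plain Ruby with no gems (the variable name foo_method is only for illustration):

```ruby
def foo
  "Hello!"
end

# Kernel#method looks up foo and wraps it in a Method object.
foo_method = method(:foo)
puts foo_method.class     # prints "Method"

# source_location reports the file and line where foo was defined.
file, line = foo_method.source_location
puts "#{file}:#{line}"
```

The path and line number printed depend on where the script lives, so they will differ from machine to machine.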

A natural next step is to call source, but when we do, we get a NoMethodError. Thankfully, a gem called method_source provides the exact functionality we want. Let's require it and try again.

Nice! It returns a string of the method definition. To make it more readable, let's add a call to Object#display, which prints the string to STDOUT and returns nil. Now we see the method with the same formatting as we used in the code.


require "method_source"

def foo
  "Hello!"
end

method(:foo).source.display # => nil

# >> def foo
# >>   "Hello!"
# >> end
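
To get a rough feel for where that string can come from, here is a sketch that uses source_location to read the first line of a definition back from disk. It is a simplified stand-in, not the method_source implementation; the helper first_source_line and the method greet are hypothetical names, and a temporary file is used only so the example has real source to read.

```ruby
require "tempfile"

# Hypothetical helper: return the line on which a method's definition starts.
def first_source_line(meth)
  file, line = meth.source_location
  File.readlines(file)[line - 1]
end

# Write a definition to a real file, load it, then read its first line back.
Tempfile.create(["spelunk", ".rb"]) do |f|
  f.write(<<~RUBY)
    def greet
      "Hello!"
    end
  RUBY
  f.flush
  load f.path
  puts first_source_line(method(:greet))   # prints "def greet"
end
```

A real tool also has to work out where the definition ends, which is the part the gem takes care of for us.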

There's one other trick we need to learn before going spelunking. Let's wrap foo in a class and add a subclass that overrides the foo method and calls super.

Now, let's update our code to get the foo method object from the child. It returns the source of the Child#foo method but how do we follow the code path up into the super method? We can use the super_method method.

Going back to the foo method object, we can see it is defined on Child.

If we add super_method, it now returns a method defined on Parent.

And now we can get the source code from the super method.


require "method_source"

class Parent
  def foo
    "Hello!"
  end
end

class Child < Parent
  def foo
    super.reverse
  end
end

Child.new.method(:foo).super_method.source.display # => nil

# >>   def foo
# >>     "Hello!"
# >>   end
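
The "defined on Child" and "defined on Parent" observations can also be checked with Method#owner, which needs no gem. A small sketch that walks the whole super chain (the variable names are illustrative):

```ruby
class Parent
  def foo
    "Hello!"
  end
end

class Child < Parent
  def foo
    super.reverse
  end
end

# Follow super_method until it returns nil, recording who owns each method.
owners = []
current = Child.new.method(:foo)
while current
  owners << current.owner
  current = current.super_method
end

p owners   # => [Child, Parent]
```

super_method returns nil once there is nothing further up the chain, which is what ends the loop.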

Okay, now we're ready to go. It's not very interesting to go spelunking in a one-room cave, so let's step into some code that is deeply layered. Let's look at ActiveRecord.

We have an example Rails app with a Book model and while writing code, we noticed some oddities with the count, size, and length methods.

All three return the same value but the underlying queries are different. count and size run SELECT COUNT(*) while length runs a SELECT *.

To make things more confusing, the queries change when we reorder the method calls, moving size last. Now we only see two queries. Why is that?


require_relative "./example-app"

books = Book.all
books.count # => 2
books.length # => 2
books.size # => 2

# >>    (0.1ms)  SELECT COUNT(*) FROM "books"
# >>   Book Load (0.1ms)  SELECT "books".* FROM "books"

Let's investigate count first by getting the count method. Here we can see it is defined by the ActiveRecord::Calculations module.


require_relative "./example-app"

books = Book.all
books.method(:count) # => #
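
If the inspect output is hard to read, Method#owner answers the "where is this defined?" question directly. A sketch with hypothetical stand-ins (ToyCalculations and ToyRelation are not ActiveRecord classes, they only mimic the module-plus-class shape):

```ruby
# A module that supplies count, mixed into a class, mimicking the layout
# described above.
module ToyCalculations
  def count
    2
  end
end

class ToyRelation
  include ToyCalculations
end

count_method = ToyRelation.new.method(:count)
p count_method.owner            # prints ToyCalculations
p count_method.receiver.class   # prints ToyRelation
```

The owner is the module or class that holds the definition, while the receiver is the object we asked.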

Displaying the source, we can see it takes a column name and when there is a block, it does something special, otherwise it calls calculate.


require_relative "./example-app"

books = Book.all
books.method(:count).source.display # => nil

# >>     def count(column_name = nil)
# >>       if block_given?
# >>         unless column_name.nil?
# >>           ActiveSupport::Deprecation.warn \
# >>             "When `count' is called with a block, it ignores other arguments. " \
# >>             "This behavior is now deprecated and will result in an ArgumentError in Rails 6.0."
# >>         end
# >>
# >>         return super()
# >>       end
# >>
# >>       calculate(:count, column_name)
# >>     end

Looking at calculate next, it takes an operation and a column name and does some work. In our case, we're not using any includes so it calls perform_calculation.


require_relative "./example-app"

books = Book.all
books.method(:calculate).source.display # => nil

# >>     def calculate(operation, column_name)
# >>       if has_include?(column_name)
# >>         relation = apply_join_dependency
# >>
# >>         if operation.to_s.downcase == "count"
# >>           relation.distinct!
# >>           # PostgreSQL: ORDER BY expressions must appear in SELECT list when using DISTINCT
# >>           if (column_name == :all || column_name.nil?) && select_values.empty?
# >>             relation.order_values = []
# >>           end
# >>         end
# >>
# >>         relation.calculate(operation, column_name)
# >>       else
# >>         perform_calculation(operation, column_name)
# >>       end
# >>     end

Here's where things start to get complex, with some special cases for count and distinct operations. Ignoring the bulk of the method, let's dive one level deeper into execute_simple_calculation.


require_relative "./example-app"

books = Book.all
books.method(:perform_calculation).source.display # => nil

# >>       def perform_calculation(operation, column_name)
# >>         operation = operation.to_s.downcase
# >>
# >>         # If #count is used with #distinct (i.e. `relation.distinct.count`) it is
# >>         # considered distinct.
# >>         distinct = distinct_value
# >>
# >>         if operation == "count"
# >>           column_name ||= select_for_count
# >>           if column_name == :all
# >>             if distinct && (group_values.any? || select_values.empty? && order_values.empty?)
# >>               column_name = primary_key
# >>             end
# >>           elsif column_name =~ /\s*DISTINCT[\s(]+/i
# >>             distinct = nil
# >>           end
# >>         end
# >>
# >>         if group_values.any?
# >>           execute_grouped_calculation(operation, column_name, distinct)
# >>         else
# >>           execute_simple_calculation(operation, column_name, distinct)
# >>         end
# >>       end
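
Helpers this deep in a call chain are often private. That does not get in the way of spelunking, because Object#method can fetch private methods too, whereas public_method refuses. A sketch with a hypothetical Report class:

```ruby
class Report
  def total
    compute_total
  end

  private

  def compute_total
    [1, 2, 3].sum
  end
end

report = Report.new

# method ignores visibility, so private helpers are reachable.
helper = report.method(:compute_total)
p helper.owner   # prints Report
p helper.call    # prints 6

# public_method only returns public methods and raises otherwise.
begin
  report.public_method(:compute_total)
rescue NameError => e
  puts e.class   # prints NameError
end
```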

It's a big one but, finally, four levels down, we've found the code that generates and executes the query using select_all. We might not know the ins and outs of these code paths, but that's okay. We now understand enough to go back and look at how length is different.


require_relative "./example-app"

books = Book.all
books.method(:execute_simple_calculation).source.display # => nil

# >>       def execute_simple_calculation(operation, column_name, distinct) #:nodoc:
# >>         column_alias = column_name
# >>
# >>         if operation == "count" && (column_name == :all && distinct || has_limit_or_offset?)
# >>           # Shortcut when limit is zero.
# >>           return 0 if limit_value == 0
# >>
# >>           query_builder = build_count_subquery(spawn, column_name, distinct)
# >>         else
# >>           # PostgreSQL doesn't like ORDER BY when there are no GROUP BY
# >>           relation = unscope(:order).distinct!(false)
# >>
# >>           column = aggregate_column(column_name)
# >>
# >>           select_value = operation_over_aggregate_column(column, operation, distinct)
# >>           if operation == "sum" && distinct
# >>             select_value.distinct = true
# >>           end
# >>
# >>           column_alias = select_value.alias
# >>           column_alias ||= @klass.connection.column_name_for_operation(operation, select_value)
# >>           relation.select_values = [select_value]
# >>
# >>           query_builder = relation.arel
# >>         end
# >>
# >>         result = skip_query_cache_if_necessary { @klass.connection.select_all(query_builder, nil) }
# >>         row    = result.first
# >>         value  = row && row.values.first
# >>         type   = result.column_types.fetch(column_alias) do
# >>           type_for(column_name)
# >>         end
# >>
# >>         type_cast_calculated_value(value, type, operation)
# >>       end

So, let's grab the length method source. Immediately, we can see a difference -- instead of a normal def keyword, we see a call to delegate, which forwards a method to another object, in this case the records object.


require_relative "./example-app"

books = Book.all
books.method(:length).source.display # => nil

# >>     delegate :to_xml, :encode_with, :length, :each, :uniq, :join,
# >>              :[], :&, :|, :+, :-, :sample, :reverse, :rotate, :compact, :in_groups, :in_groups_of,
# >>              :to_sentence, :to_formatted_s, :as_json,
# >>              :shuffle, :split, :slice, :index, :rindex, to: :records

If we look at the records method, we can see it loads data from the database then returns the records.


require_relative "./example-app"

books = Book.all
books.method(:records).source.display # => nil

# >>     def records # :nodoc:
# >>       load
# >>       @records
# >>     end

So books.length is equivalent to calling books.records.length. Okay, makes sense! Even though the method names are similar between count and length, the code paths are wildly different.


require_relative "./example-app"

books = Book.all
books.records.length # => 2

# >>   Book Load (0.1ms)  SELECT "books".* FROM "books"
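
The forwarding idea can be sketched without Rails. This example swaps in the standard library's Forwardable in place of ActiveSupport's delegate; DelegatingShelf, load_records, and load_count are hypothetical names used only to make the loading side effect visible.

```ruby
require "forwardable"

class DelegatingShelf
  extend Forwardable

  # Forward length and each to whatever the records method returns.
  def_delegators :records, :length, :each

  attr_reader :load_count

  def initialize(titles)
    @titles = titles
    @records = nil
    @load_count = 0
  end

  # Like the records method above: load first, then hand back the array.
  def records
    load_records
    @records
  end

  private

  def load_records
    return if @records
    @load_count += 1        # stands in for running SELECT *
    @records = @titles.dup
  end
end

shelf = DelegatingShelf.new(%w[Dune Emma])
p shelf.load_count   # prints 0
p shelf.length       # prints 2
p shelf.load_count   # prints 1
```

Calling length never touches the collection directly. It goes through records, and that is where the load happens.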

Now what about the oddity with size? Why did the query disappear if size was called after length? Once we take a peek at the source, it becomes obvious: when the records are loaded, return the length, otherwise execute a count. Because calling length loads the records, it also has the side effect of changing how size behaves.


require_relative "./example-app"

books = Book.all
books.method(:size).source.display # => nil

# >>     def size
# >>       loaded? ? @records.length : count(:all)
# >>     end
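
To see the ordering effect in isolation, here is a toy model of the three methods. LazyBooks is a hypothetical class, not ActiveRecord; it records a string each time it would have queried the database.

```ruby
class LazyBooks
  attr_reader :queries

  def initialize(rows)
    @rows = rows
    @records = nil
    @queries = []
  end

  def loaded?
    !@records.nil?
  end

  # Always asks the "database" for a count.
  def count
    @queries << "SELECT COUNT(*)"
    @rows.length
  end

  # Loads every row once, then measures the in-memory array.
  def length
    unless loaded?
      @queries << "SELECT *"
      @records = @rows.dup
    end
    @records.length
  end

  # Same shape as the size method above: reuse loaded records if present.
  def size
    loaded? ? @records.length : count
  end
end

books = LazyBooks.new(%w[Dune Emma])
books.count
books.length
books.size
p books.queries   # => ["SELECT COUNT(*)", "SELECT *"]

books = LazyBooks.new(%w[Dune Emma])
books.count
books.size
books.length
p books.queries   # => ["SELECT COUNT(*)", "SELECT COUNT(*)", "SELECT *"]
```

With size last, the records are already loaded and no third query is recorded. With size before length, it falls back to count and a second COUNT query appears.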

So that's a quick look at how method introspection, or spelunking, can help you debug problems or explore libraries you're unfamiliar with. Thanks for watching!
