In Progress
Unit 1, Lesson 1
In Progress

Memoize

Meta-programming is the art of writing code that writes code. Easy meta-programming is one of Ruby’s great strengths, but getting started at it can seem a little daunting.

In this episode, you’ll get your feet wet with meta-programming by tackling a classic problem: optimizing a method so that it reuses previously calculated results.

Video transcript & code

Memoizing the return value of a method is one of the classic examples used to demonstrate metaprogramming in Ruby as well as in other languages. We're going to go through this example today, partly because some viewers may never have seen it, and partly because it will lay the groundwork for demonstrating some other, more advanced techniques in future episodes.

So what do we mean by "memoize"? Well, consider a function which is expensive, in terms of CPU time, to execute. As an example we'll use this extremely naive anagram-finding method. Given a word, this method will discover anagrams by simply calculating all possible permutations of the letters, then checking each permutation to see if it exists in the system dictionary.

On my system, it takes a second or two to find anagrams for my name. Adding just one more letter causes it to take considerably longer.

class Dictionary
  def initialize(dict_path)
    @dict_path = dict_path
  end

  def anagrams_for(word)
    word.downcase.chars.to_a.permutation.select do |perm|
      perm_word = perm.join
      IO.foreach(@dict_path).any?{|dict_word| 
        dict_word.downcase.chomp == perm_word.downcase
      }
    end.map(&:join).uniq
  end
end
dict = Dictionary.new("/usr/share/dict/words")

require 'benchmark'
puts Benchmark.measure {
  dict.anagrams_for("avdi")          # => ["avid", "diva"]
}

Now, given that the dictionary we use doesn't change, the anagrams for a given word will also never change. So once we've calculated anagrams for a word, we could cache them somewhere. Then the next time that word was specified, instead of searching for anagrams again, we could just return the cached results.

That is memoization in a nutshell. We can modify our anagram-finding method to memoize its results by having it store cached results in a Hash, keyed on the word given, and stored in an instance variable. Then, when a previously-requested word is seen, the method returns the cached value rather than proceeding.

class Dictionary
  def initialize(dict_path)
    @dict_path = dict_path
  end

  def anagrams_for(word)
    @memoized_anagrams ||= {}
    if @memoized_anagrams.key?(word)
      return @memoized_anagrams[word]
    end
    results = word.downcase.chars.to_a.permutation.select do |perm|
      perm_word = perm.join
      IO.foreach(@dict_path).any?{|dict_word| 
        dict_word.downcase.chomp == perm_word.downcase
      }
    end.map(&:join).uniq
    @memoized_anagrams[word] = results
    results
  end
end

dict = Dictionary.new("/usr/share/dict/words")

require 'benchmark'
Benchmark.bm { |bm|
  bm.report("1st call:") { dict.anagrams_for("avdi") }
  bm.report("2nd call:") { dict.anagrams_for("avdi") }
}

This is such a useful technique that we might well want to apply it to more methods than just our anagram example. To that end, we decide to create a "macro" which will modify arbitrary methods so that they are memoized.

We create a module that we call Memoizable. Inside, we define a method memoize which takes the name of a method to modify. The first thing it does is grab a reference to the method's original implementation, using the method named instance_method().

module Memoizable
  def memoize(method_name)
    original_method = instance_method(method_name)
  end
end

As you may recall from episode 16, this returns an UnboundMethod object. In case you don't remember, here's a quick refresher.

Sending instance_method to a class, with the name of a method, results in an UnboundMethod object. That object can then be bound to a given instance of the class. Then the resulting Method object can be called. Even if the method is later overwritten in the class, the UnboundMethod continues to remember the old definition.

class Greeter
  def hello
    "Hello, world"
  end
end

um = Greeter.instance_method(:hello) # => #<UnboundMethod: Greeter#hello>
g = Greeter.new                     # => #<Greeter:0x000000009ad670>
m = um.bind(g)                      # => #<Method: Greeter#hello>
g.hello                             # => "Hello, world"
m.call                              # => "Hello, world"

class Greeter
  def hello
    "Goodbye, world"
  end
end

g.hello                         # => "Goodbye, world"
um.bind(g).call                 # => "Hello, world"

Getting back to our memoize method, after getting an object representing the original implementation of the method being memoized, it proceeds to replace that method using #define_method. Inside, it proceeds to conditionally initialize an instance variable, whose name is based on the method name, with an empty Hash. It checks to see if there are any elements in the Hash stored under a key matching the method's current arguments. If so, it returns early with the value stored under that key. Otherwise, it proceeds to call the original method implementation. It stores the result of the call in the memoization hash, for future retrieval, and then returns the result.

module Memoizable
  def memoize(method_name)
    original_method = instance_method(method_name)
    cache_ivar      = "@memoized_#{method_name}"
    define_method(method_name) do |*args, &block|
      cache = if instance_variable_defined?(cache_ivar)
                instance_variable_get(cache_ivar)
              else
                instance_variable_set(cache_ivar, {})
              end
      if cache.key?(args)
        return cache[args]
      else
        result = original_method.bind(self).call(*args, &block)
        cache[args] = result
        result
      end  
    end
  end
end

With this module finished, we proceed to extend our Dictionary class with it, and tell it to memoize the #anagrams_for method. We then re-run our benchmarks, and see that indeed, the second call with identical arguments finishes much faster than the first.

class Dictionary
  extend Memoizable

  memoize(:anagrams_for)
end

dict = Dictionary.new("/usr/share/dict/words")

require 'benchmark'
Benchmark.bm { |bm|
  bm.report("1st call:") { dict.anagrams_for("avdi") }
  bm.report("2nd call:") { dict.anagrams_for("avdi") }
}

We now have a generalized macro for making arbitrary methods into memoizing methods. There lots of variations on this theme that we can explore, but I'll save those for other episodes. Happy hacking!

Responses