Unit 1, Lesson 21

Identity Map

In the previous episode we encountered the problem of "aliasing" when dealing with object-relational mapper libraries. Because more than one model object may exist to represent a given row in the database, it is possible to update the wrong instance and lose changes, or to introduce other incorrect behavior.

Today we're going to take the code we introduced in that episode, and rewrite it to prevent aliasing problems before they can even get started. As before, we have an author and three stories. We want to update the author with pen-name information from each of the stories. Unfortunately, when we try to do this, the changes to the author are never recorded. This is because when we follow the author association back from one of the stories, we get a different object than the author object we started with—the one we are saving back to the database.

author = AM.find(1)
update_pen_names(author)
AM.store(author)
AM.find(1).pen_names
# => nil

author.object_id                              # => 22795980
author.stories.first.author.__get__.object_id # => 22747020

In order to fix this issue, we're going to use the Identity Map pattern. An identity map ensures that only one object corresponds to a given record in the database at a time.

The Identity Map itself is nothing special. In fact, it's so un-special we'll just use a Hash. Then we'll pass that hash in when we create the data mapper objects.

ID_MAP = {}
AM = AuthorMapper.new(ID_MAP)
SM = StoryMapper.new(ID_MAP)

Then we move to the data mapper code. We enable it to accept the identity map as a constructor argument. We change the #store method to update the identity map when it is done writing an object. As a key to the map, we use a two-element array containing the type of the object and its ID. Note that this is not its Ruby object ID; this is the id field which is used as a primary key in the database table.

Using an array as a Hash key shows off the fact that Ruby Hashes can accept any kind of object as a key.
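As a small illustration of that point, here is a sketch of an array-keyed Hash on its own. The Author struct below is a hypothetical stand-in for a model class, not the one from the episode's code.

```ruby
# Hypothetical stand-in for a model class; not part of the episode's code.
Author = Struct.new(:id, :name)

id_map = {}
author = Author.new(1, "Robert")

# The key is a two-element array: [type, database id].
id_map[[Author, 1]] = author

# A freshly built array with equal contents finds the same entry, because
# Hash lookup compares keys with #hash and #eql?, not by object identity.
found = id_map[[Author, 1]]
same_object = found.equal?(author)

# The same id under a different type is a different key.
other_type_present = id_map.key?([String, 1])
```

Because the type is part of the key, an author with ID 1 and a story with ID 1 can live in the same map without colliding.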

Next we update the #load method, which is responsible for making new model objects using data returned from the database. We surround the whole method in a call to @id_map.fetch. Again, we use an array of object type and ID as the key. If the identity map already contains an object matching that key, it will be returned immediately. Otherwise, a new model object will be constructed, and added to the ID map before it is returned.

class TeenyMapper
  # ...
  def initialize(id_map)
    @id_map = id_map
  end

  def store(object)
    data  = object.to_h
    store_row(object.class, data)
    object.id = data[:id]
    @id_map[[object.class, object.id]] = object
  end

  # ...

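  def load(type, data)
    # Return the already-loaded object for this type and ID, if there is one.
    @id_map.fetch([type, data[:id]]) do
      object = type.new
      data.each_with_object(object) { |(key, value), o|
        o[key] = value
      }
      # Hash#[]= takes a single key, so the type/ID pair is wrapped in an array.
      @id_map[[type, object.id]] = object
    end
  end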
end
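It may help to see the fetch-with-a-block idiom in isolation. Hash#fetch returns the block's value on a miss but does not store it, which is why the block has to write into the map itself. The following is a minimal sketch; load_record and the builds counter are made-up names used only to count how often a new object is constructed.

```ruby
id_map = {}
builds = 0

# Hypothetical helper standing in for the mapper's #load method.
load_record = lambda do |type, id|
  id_map.fetch([type, id]) do
    builds += 1
    # fetch does not remember the block's result, so we store it ourselves.
    # The assignment evaluates to the new object, which fetch then returns.
    id_map[[type, id]] = type.new
  end
end

first  = load_record.call(Object, 1)
second = load_record.call(Object, 1)

same_object = first.equal?(second)
```

The second call finds the key already present, so the block never runs and no second object is built.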

We also update the AuthorMapper and StoryMapper classes to pass their identity maps in when they create new mapper objects internally.
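The episode does not show that change, so the following is only a sketch of what it could look like. It assumes AuthorMapper and StoryMapper both inherit from TeenyMapper, and story_mapper is a hypothetical helper name. The TeenyMapper here is a stripped-down stand-in containing only the constructor.

```ruby
# Stripped-down stand-in for the episode's TeenyMapper (constructor only).
class TeenyMapper
  attr_reader :id_map

  def initialize(id_map)
    @id_map = id_map
  end
end

# Assumption: both concrete mappers inherit from TeenyMapper.
class StoryMapper < TeenyMapper
end

class AuthorMapper < TeenyMapper
  # Hypothetical helper: when the author mapper needs a story mapper,
  # it hands over its own identity map rather than creating a new one.
  def story_mapper
    StoryMapper.new(@id_map)
  end
end

shared = {}
author_mapper = AuthorMapper.new(shared)
shares_map = author_mapper.story_mapper.id_map.equal?(shared)
```

The point of the design is that every mapper involved in following an association looks in the same Hash, so an object loaded by one mapper is visible to the others.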

Those are all the changes we need to make to the code. Now when we compare the author object ID to the ID of the object we get from following associations from author to stories and back to author, we can see that it is the exact same object. There are no more aliases. And when we once again run our code to update author pen names, this time the changes "stick". Since all the changes took effect on the sole, authoritative Author instance, no matter how that instance was found, none of the changes were lost.

author = AM.find(1)
author.object_id                              # => 15423800
author.stories.first.author.__get__.object_id # => 15423800
author = AM.find(1)
update_pen_names(author)
AM.store(author)
AM.find(1).pen_names
# => "Anson Macdonald, Lyle Monroe, Caleb Saunders"

Note that the way we've implemented an identity map here will not reduce the number of database requests that are made. However, with a little bit more effort we could set things up so that the identity map also functions as a cache, avoiding repeated database requests for objects which have already been loaded once.
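One way that extra effort might look is to consult the identity map before going to the database at all. This is a sketch under stated assumptions: FakeTable and CachingMapper are illustrative names that do not appear in the episode, and the fake table simply counts lookups in place of real queries.

```ruby
# Hypothetical model and classes for illustration only.
Author = Struct.new(:id, :name)

class FakeTable
  attr_reader :query_count

  def initialize(rows)
    @rows = rows
    @query_count = 0
  end

  # Pretend to select a row by primary key, counting each call.
  def find_row(id)
    @query_count += 1
    @rows.find { |row| row[:id] == id }
  end
end

class CachingMapper
  def initialize(id_map, table, type)
    @id_map = id_map
    @table  = table
    @type   = type
  end

  # Check the identity map first; only touch the table on a miss.
  def find(id)
    @id_map.fetch([@type, id]) do
      row = @table.find_row(id)
      next nil if row.nil? # missing rows are not cached
      @id_map[[@type, id]] = @type.new(row[:id], row[:name])
    end
  end
end

table  = FakeTable.new([{ id: 1, name: "Robert" }])
mapper = CachingMapper.new({}, table, Author)

first  = mapper.find(1)
second = mapper.find(1)
```

Here the second find is answered from the map. The trade-off is that a cached object can go stale if the row changes behind the application's back, which is one more reason to keep the map short-lived.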

It's worth noting that for best results, you should carefully control the lifetime and scope of an identity map. If a web server had a single, global identity map it could easily balloon in size as it grew to contain every object that had been loaded since the app was started. For web applications, it's best to use a single, fresh identity map per request. In the RubyTapas.com app every thread has its own Identity Map, and the map is cleared at the beginning of every request.

before do
  RubyTapas.clear_id_map
  RubyTapas.base_url = Addressable::URI.parse(request.url).site.to_s
  self.current_user = load_user
end
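The implementation behind RubyTapas.clear_id_map is not shown in the episode. As a rough sketch of how a per-thread map could be kept, the hypothetical IdMapRegistry module below stores a Hash in a thread-local variable and empties it on demand.

```ruby
# Hypothetical module; not the actual RubyTapas implementation.
module IdMapRegistry
  KEY = :identity_map

  # Lazily create one Hash per thread, stored as a thread-local variable.
  def self.id_map
    thread = Thread.current
    map = thread.thread_variable_get(KEY)
    if map.nil?
      map = {}
      thread.thread_variable_set(KEY, map)
    end
    map
  end

  # Empty the current thread's map, e.g. at the start of a request.
  def self.clear_id_map
    id_map.clear
  end
end

IdMapRegistry.id_map[[String, 1]] = "loaded in the main thread"
size_before_clear = IdMapRegistry.id_map.size

# Another thread sees its own, initially empty, map.
other_thread_size = Thread.new { IdMapRegistry.id_map.size }.value

IdMapRegistry.clear_id_map
size_after_clear = IdMapRegistry.id_map.size
```

Clearing at the start of each request, rather than the end, means a request that crashed partway through cannot leak its objects into the next one served by the same thread.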

If you want to start using an identity map in your apps today, but you don't want to write your ORM from scratch, you have a few options. ActiveRecord briefly had an optional identity map, but unfortunately it was removed because of bugs. The DataMapper gem, which confusingly does not implement the true Data Mapper pattern, uses an identity map. That project's successor, the Ruby Object Mapper, is still young but it contains an identity map in the rom-session gem. Jamie Gaskins' Perpetuity gem, which is another young but promising Ruby Data Mapper implementation, uses an identity map. So does the Mongoid gem for MongoDB interaction, where it is available as an optional configuration setting.

Aliasing is a pernicious problem affecting apps that talk to a database through an ORM. However, it's not a problem we have to accept. Using the Identity Map pattern, we can ensure that for the lifetime of a request, there will be only one live object corresponding to a given record in the database. We can be confident that all of our code will see the same, consistent state for that object.

And that's it for today. Happy hacking!
