Unit 1, Lesson 21

Aliasing

Video transcript & code

There's an old saying: "There's many a slip 'twixt the cup and the lip". In web application development, we might revise that to say there's many a slip 'twixt request and the database. In this episode I want to talk about one such "slip": a particularly tricky family of bugs that can interfere with successfully saving application data to a database.

Let's say we have a database with tables for authors and stories. An author "has many" stories, related to it via the story :author_id column.

require "sequel"

DB = Sequel.sqlite

DB.create_table "Author" do
  primary_key :id
  String      :name
  String      :pen_names
end

DB.create_table "Story" do
  primary_key :id
  String      :title
  String      :author_name
  Integer     :author_id
end

We also have classes to represent both authors and stories.

Author = Struct.new(:name, :pen_names, :stories, :id)

Story   = Struct.new(:title, :author_name, :author, :id)

These classes are intended to be domain models, and are ignorant of the database. For instance, the Story class has no author_id field; it just has an author field which is intended to point to an Author instance.

In order to map these objects to and from our database tables, we have an extremely basic set of Data Mapper classes.

class TeenyMapper
  class LazyProxy
    def initialize(&fetcher)
      @fetcher = fetcher
    end

    def method_missing(name, *args, &block)
      __get__.public_send(name, *args, &block)
    end

    def __get__
      @object ||= @fetcher.call
    end
  end

  def store(object)
    data  = object.to_h
    store_row(object.class, data)
    object.id = data[:id]
  end

  def store_row(type, data)
    table = DB[type.to_s.to_sym]
    if data[:id]
      # Leave :id in data so that #store can copy it back onto the object.
      table.where(id: data[:id]).update(data.reject { |key, _| key == :id })
    else
      data[:id] = table.insert(data)
    end
  end

  def find(type, id)
    table = DB[type.to_s.to_sym]
    data = table[id: id]
    load(type, data)
  end

  def find_all(type, query)
    table = DB[type.to_s.to_sym]
    table.where(query).map{|data|
      load(type, data)
    }
  end

  def load(type, data)
    object = type.new
    data.each_with_object(object) { |(key, value), o|
      o[key] = value
    }
  end
end

class AuthorMapper < TeenyMapper
  def find(id)
    super(Author, id)
  end

  def store_row(type, data)
    data.delete(:stories)
    super(type, data)
  end

  def load(type, data)
    author = super
    stories = LazyProxy.new do
      StoryMapper.new.find_all_by_author_id(author.id)
    end
    author.stories = stories
    author
  end
end

class StoryMapper < TeenyMapper
  def find(id)
    super(Story, id)
  end

  def store_row(type, data)
    author = data.delete(:author)
    data[:author_id] = author.id
    super(type, data)
  end

  def find_all_by_author_id(author_id)
    find_all(Story, author_id: author_id)
  end

  def load(type, data)
    author_id = data.delete(:author_id)
    super.tap do |story|
      story.author = LazyProxy.new do
        AuthorMapper.new.find(author_id)
      end
    end
  end
end

We'll skip over the details of these mapper classes for now. All we need to know is that we can instantiate AuthorMapper and StoryMapper objects, and use them to both load and store authors and stories.

Let's set up some records in the database. First we create and store an author. Then we store three stories written by this author. Each story was written under a pseudonym, so they each have a different value in their :author_name field. But they still all link back to the one author object.

AM = AuthorMapper.new
SM = StoryMapper.new
AM.store(heinlein = Author.new("Robert Heinlein"))
SM.store(Story.new("Waldo", "Anson Macdonald", heinlein))
SM.store(Story.new("Lost Legion", "Lyle Monroe", heinlein))
SM.store(Story.new("Elsewhere", "Caleb Saunders", heinlein))

Let's check that we've successfully written these objects to the database. We'll fetch the author first.

require "./setup"

author = AM.find(1)
author
# => #<struct Author
#     name="Robert Heinlein",
#     pen_names=nil,
#     stories=
#      #<TeenyMapper::LazyProxy:0x000000018d0d28
#       @fetcher=
#        #<Proc:0x000000018d0cd8@/home/avdi/Dropbox/rubytapas/177-identity-map/data_mappers.rb:62>>,
#     id=1>

Notice that the stories field contains a LazyProxy object. This is a proxy object provided by our Data Mapper layer which only loads associations when they are needed.
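
To see that laziness in isolation, here's a minimal sketch that doesn't touch the database at all. DemoLazyProxy is a hypothetical copy of the LazyProxy idea, and the fetch counter is an assumption standing in for a real query; neither is part of the episode's setup code.

```ruby
# Hypothetical stand-in for TeenyMapper::LazyProxy, so this runs on its own.
class DemoLazyProxy
  def initialize(&fetcher)
    @fetcher = fetcher
  end

  # Forward unknown messages to the real object, loading it on first use.
  def method_missing(name, *args, &block)
    __get__.public_send(name, *args, &block)
  end

  # Run the fetcher once and remember what it returned.
  def __get__
    @object ||= @fetcher.call
  end
end

# Returns [fetches before first use, first title, fetches after two uses].
def lazy_demo
  fetch_count = 0
  stories = DemoLazyProxy.new do
    fetch_count += 1 # stands in for a database query
    ["Waldo", "Lost Legion", "Elsewhere"]
  end

  before = fetch_count   # nothing has been fetched yet
  first  = stories.first # the first real use triggers the fetch
  stories.size           # later calls reuse the loaded object
  [before, first, fetch_count]
end

p lazy_demo
# >> [0, "Waldo", 1]
```

The fetcher block runs only when the first message arrives, and only once after that.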

Let's check out the stories association.

require "./setup"

author = AM.find(1)
author.stories.to_a
# => [#<struct Story
#      title="Waldo",
#      author_name="Anson Macdonald",
#      author=
#       #<TeenyMapper::LazyProxy:0x0000000234f380
#        @fetcher=
#         #<Proc:0x0000000234f768@/home/avdi/Dropbox/rubytapas/177-identity-map/data_mappers.rb:88>>,
#      id=1>,
#     #<struct Story
#      title="Lost Legion",
#      author_name="Lyle Monroe",
#      author=
#       #<TeenyMapper::LazyProxy:0x00000002347ea0
#        @fetcher=
#         #<Proc:0x00000002347db0@/home/avdi/Dropbox/rubytapas/177-identity-map/data_mappers.rb:88>>,
#      id=2>,
#     #<struct Story
#      title="Elsewhere",
#      author_name="Caleb Saunders",
#      author=
#       #<TeenyMapper::LazyProxy:0x00000002347ab8
#        @fetcher=
#         #<Proc:0x00000002347a18@/home/avdi/Dropbox/rubytapas/177-identity-map/data_mappers.rb:88>>,
#      id=3>]

We can see that all three stories were successfully saved and associated with the author. We can also see LazyProxy at work again. This time, it's holding the place of the author association in each story object.

Author objects have a pen_names field. We can see that the field is presently blank in our author.

require "./setup"

author = AM.find(1)
author.pen_names
# => nil

We'd like to use the information in our story objects to update this field. So we write a method to do this, called update_pen_names. It takes an author argument. It loops through the stories associated with the author, and calls another helper method called update_author_pen_names.

This method, in turn, takes a story, gets the current comma-separated list of pen names, adds a new name to the list, and then updates the author's pen_names attribute.

def update_pen_names(author)
  author.stories.each do |story|
    update_author_pen_names(story)
  end
end

def update_author_pen_names(story)
  author = story.author
  names = author.pen_names.to_s.split(", ")
  names << story.author_name
  author.pen_names = names.join(", ")
end

Let's try these helpers out. We pull our author out of the database. Then we call update_pen_names on it. Finally, we store the updated author back to the database.

Then we verify that this process worked, by re-fetching the author from the database and checking its pen names.

require "./setup"
require "./helpers"

author = AM.find(1)
update_pen_names(author)
AM.store(author)
AM.find(1).pen_names
# => nil

But something has gone wrong! The pen_names field is still empty.

As a sanity check, we take a look at the value of the pen_names field after running the update, but before saving it back to the database. We get an even bigger shock: after running the update_pen_names method, the pen_names field is still nil!

require "./setup"
require "./helpers"

author = AM.find(1)
update_pen_names(author)
author.pen_names                # => nil
AM.store(author)

Did we mess something up? We throw a debugging line in the update_author_pen_names method just to check that we really are updating the field.

require "./setup"
require "./helpers"

def update_author_pen_names(story)
  author = story.author
  names = author.pen_names.to_s.split(", ")
  names << story.author_name
  author.pen_names = names.join(", ")
  p author.__get__
end

author = AM.find(1)
update_pen_names(author)
author.pen_names                # => nil
AM.store(author)
# >> #<struct Author name="Robert Heinlein", pen_names="Anson Macdonald", stories=#<TeenyMapper::LazyProxy:0x000000025de948 @fetcher=#<Proc:0x000000025de8f8@/home/avdi/Dropbox/rubytapas/177-identity-map/data_mappers.rb:65>>, id=1>
# >> #<struct Author name="Robert Heinlein", pen_names="Lyle Monroe", stories=#<TeenyMapper::LazyProxy:0x000000025d3e08 @fetcher=#<Proc:0x000000025d3db8@/home/avdi/Dropbox/rubytapas/177-identity-map/data_mappers.rb:65>>, id=1>
# >> #<struct Author name="Robert Heinlein", pen_names="Caleb Saunders", stories=#<TeenyMapper::LazyProxy:0x000000025d1298 @fetcher=#<Proc:0x000000025d1220@/home/avdi/Dropbox/rubytapas/177-identity-map/data_mappers.rb:65>>, id=1>

This just gets weirder and weirder: at each execution of update_author_pen_names, the pen_names field contains just one name. And yet once all the calls are done, the author once again has no pen names associated with it.

So what the heck is going on here? It's only once we start examining the object IDs that Ruby assigns to individual instances that we get our first clue. When we annotate our code with calls to output the object_id of author objects at different points in execution, we see that every time we update the author, the object ID is different. Then we save the author object we found in the first place, but the updates never happened to this object! They were all applied to those mysterious other author instances.

require "./setup"
require "./helpers"

def update_author_pen_names(story)
  author = story.author
  names = author.pen_names.to_s.split(", ")
  names << story.author_name
  author.pen_names = names.join(", ")
  puts "Updated #{author.__get__.object_id}"
end

author = AM.find(1)
puts "Found #{author.object_id}"
update_pen_names(author)
author.pen_names                # => nil
AM.store(author)
puts "Stored #{author.object_id}"
# >> Found 15848980
# >> Updated 15816260
# >> Updated 15814980
# >> Updated 15789280
# >> Stored 15848980

How did this happen? The secret is in how the Story objects find their author. When we take a look at the code that loads up a Story object from the database, we can see that in order to set up the author association, it uses a LazyProxy object. This object takes a block, which it will use to fetch the author association if any code actually tries to access it.

This is a common pattern in object-relational mappers. It serves a couple of purposes: for one thing, it functions as an optimization to keep from requesting data from the database unless and until it is actually needed. It also helps to eliminate infinite recursion caused by loading up models with circular associations to each other. You'll find similar objects in ActiveRecord, where they are known as "association proxies".

def load(type, data)
  author_id = data.delete(:author_id)
  super.tap do |story|
    story.author = LazyProxy.new do
      AuthorMapper.new.find(author_id)
    end
  end
end

Anyway, the important part here isn't really the lazy proxy. It's the fact that in order to load up its author, a Story object will use the AuthorMapper to find and load up a new Author object. Emphasis on the new part. If we dive down into the code which actually loads new objects once the relevant database row is retrieved, we can see that it uses .new to make a new instance of the model class.

def load(type, data)
  object = type.new
  data.each_with_object(object) { |(key, value), o|
    o[key] = value
  }
end

Now that we have an idea of what is going on, we can demonstrate the problem in just a couple lines of code.

require "./setup"
require "./helpers"

author = AM.find(1)
author.object_id                              # => 22795980
author.stories.first.author.__get__.object_id # => 22747020

When we walk down to an author's stories, then pick the first story, then follow the association back up to the author, we get an object which looks in every way like the original author—except that it is a different instance, with its own copies of all of its attributes. Updating this second instance does us no good if we're only going to save the first instance.

The name for this phenomenon is aliasing, and it's one of the most insidious sources of defects in code that talks to a database through some kind of ORM layer. The author data represented in one row of the database has multiple aliases in the form of multiple instances of the Author class. We have to be careful to write the same instance that we update if we want to see our changes reflected in the database.
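
We can boil aliasing down even further, without any mapper or database. In this sketch, DemoAuthor, DEMO_ROWS, and demo_find are hypothetical names made up for illustration; a plain hash plays the part of the author table, and demo_find builds a brand-new object on every call, much as our load method does.

```ruby
DemoAuthor = Struct.new(:name, :pen_names, :id)

# A hash standing in for the "Author" table: one row, keyed by id.
DEMO_ROWS = { 1 => { name: "Robert Heinlein", pen_names: nil, id: 1 } }

# Like TeenyMapper#load, this instantiates a new object every time.
def demo_find(id)
  DemoAuthor.new(*DEMO_ROWS.fetch(id).values_at(:name, :pen_names, :id))
end

first  = demo_find(1)
second = demo_find(1)

first == second      # => true
first.equal?(second) # => false

second.pen_names = "Anson Macdonald"
first.pen_names      # => nil
```

Struct#== compares field values, so the two objects look identical. But equal? compares identity, and it tells us these are two separate instances of the same row. The update landed on one alias; the other never saw it.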

The example I'm showing you today is ultra-simplified in order to make it easier to follow. The aliasing-based defects I've seen in real world code have been far, far more subtle, and required tracking the traversal of associations through many more layers of code. And things can get really complicated if multiple aliases of the same record are written, for instance as a result of automatic saving of associations.

Aliasing can also lead to performance problems, since every alias of the same record may potentially represent an extra database query that could have been avoided.
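
Here's a rough sketch of that cost. The query log and demo_fetch_author are hypothetical stand-ins for real database traffic, not part of our mapper code; the SQL string is only there to make the log readable.

```ruby
# Hypothetical query log standing in for real database traffic.
DEMO_QUERY_LOG = []

# Pretend to fetch an author row, recording the query we would have run.
def demo_fetch_author(id)
  DEMO_QUERY_LOG << "SELECT * FROM Author WHERE id = #{id}"
  { id: id, name: "Robert Heinlein" }
end

# Three stories by the same author, each resolving its own author alias.
story_author_ids = [1, 1, 1]
story_author_ids.each { |author_id| demo_fetch_author(author_id) }

puts DEMO_QUERY_LOG.size      # one query per alias
puts DEMO_QUERY_LOG.uniq.size # every query asked for the same row
# >> 3
# >> 1
```

Three queries went out, but two of them fetched a row we already had in memory.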

Our most powerful weapon against aliasing is awareness: once we realize that it can happen, we can recognize bugs that are likely the result of aliasing. And we can start checking on object IDs to test our hypothesis.

We can also be careful to write code to avoid aliasing problems. For instance, we could rewrite our author-updating methods to pass the original author instance down:

def update_pen_names(author)
  author.stories.each do |story|
    update_author_pen_names(author, story)
  end
end

def update_author_pen_names(author, story)
  names = author.pen_names.to_s.split(", ")
  names << story.author_name
  author.pen_names = names.join(", ")
end

This way only one instance of the author model is being updated. When we run the updated method on an author object, this time we can see that the change is correctly persisted.
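
As a quick check of the idea, here's a database-free sketch. DemoWriter, DemoTale, and the demo_ method names are hypothetical, and plain structs stand in for mapped objects, so this only shows the names accumulating on a single instance in memory; it doesn't exercise the save step.

```ruby
DemoWriter = Struct.new(:name, :pen_names)
DemoTale   = Struct.new(:title, :author_name)

# Pass the one author instance down instead of re-fetching it per story.
def demo_update_pen_names(author, stories)
  stories.each do |story|
    demo_add_pen_name(author, story)
  end
end

def demo_add_pen_name(author, story)
  names = author.pen_names.to_s.split(", ")
  names << story.author_name
  author.pen_names = names.join(", ")
end

author  = DemoWriter.new("Robert Heinlein")
stories = [
  DemoTale.new("Waldo", "Anson Macdonald"),
  DemoTale.new("Lost Legion", "Lyle Monroe"),
  DemoTale.new("Elsewhere", "Caleb Saunders")
]

demo_update_pen_names(author, stories)
author.pen_names
# => "Anson Macdonald, Lyle Monroe, Caleb Saunders"
```

Because every update goes to the same object, each call sees the names added by the one before it.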

Some ORMs provide tools to minimize aliasing. In ActiveRecord, we can make use of the :inverse_of option to associations. For instance, our Author and Story classes might look something like this in ActiveRecord:

class Author < ActiveRecord::Base
  has_many :stories, inverse_of: :author
end

class Story < ActiveRecord::Base
  belongs_to :author, inverse_of: :stories
end

The :inverse_of options give ActiveRecord the extra information it needs to avoid repeatedly fetching and aliasing the same record when following associations. However, this isn't a complete solution; you can find out more about its limitations in the ActiveRecord documentation.

There's a more advanced strategy that we can use to eliminate aliasing once and for all. But we'll talk about that in the next episode. Happy hacking!
