Aliasing
Video transcript & code
There's an old saying: "There's many a slip 'twixt the cup and the lip". In web application development, we might revise that to say there's many a slip 'twixt request and the database. In this episode I want to talk about one such "slip": a particularly tricky family of bugs that can interfere with successfully saving application data to a database.
Let's say we have a database with tables for authors and stories. An author "has many" stories, related to it via the story's :author_id column.
require "sequel"

DB = Sequel.sqlite

DB.create_table "Author" do
  primary_key :id
  String :name
  String :pen_names
end

DB.create_table "Story" do
  primary_key :id
  String :title
  String :author_name
  Integer :author_id
end
We also have classes to represent both authors and stories.
Author = Struct.new(:name, :pen_names, :stories, :id)
Story = Struct.new(:title, :author_name, :author, :id)
These classes are intended to be domain models, and are ignorant of the database. For instance, the Story class has no author_id field; it just has an author field which is intended to point to an Author instance.
In order to map these objects to and from our database tables, we have an extremely basic set of Data Mapper classes.
class TeenyMapper
  # Stands in for an association and loads it on first use.
  class LazyProxy
    def initialize(&fetcher)
      @fetcher = fetcher
    end

    def method_missing(name, *args, &block)
      __get__.public_send(name, *args, &block)
    end

    def __get__
      @object ||= @fetcher.call
    end
  end

  def store(object)
    data = object.to_h
    store_row(object.class, data)
    # store_row removes :id from data when updating, so only
    # assign the id when the insert branch has just generated one.
    object.id = data[:id] if data[:id]
  end

  def store_row(type, data)
    table = DB[type.to_s.to_sym]
    if data[:id]
      table.where(id: data.delete(:id)).update(data)
    else
      data[:id] = table.insert(data)
    end
  end

  def find(type, id)
    table = DB[type.to_s.to_sym]
    data = table[id: id]
    load(type, data)
  end

  def find_all(type, query)
    table = DB[type.to_s.to_sym]
    table.where(query).map { |data|
      load(type, data)
    }
  end

  # Builds a brand-new model instance from a database row.
  def load(type, data)
    object = type.new
    data.each_with_object(object) { |(key, value), o|
      o[key] = value
    }
  end
end
class AuthorMapper < TeenyMapper
  def find(id)
    super(Author, id)
  end

  def store_row(type, data)
    data.delete(:stories)
    super(type, data)
  end

  def load(type, data)
    author = super
    stories = LazyProxy.new do
      StoryMapper.new.find_all_by_author_id(author.id)
    end
    author.stories = stories
    author
  end
end
class StoryMapper < TeenyMapper
  def find(id)
    super(Story, id)
  end

  def store_row(type, data)
    author = data.delete(:author)
    data[:author_id] = author.id
    super(type, data)
  end

  def find_all_by_author_id(author_id)
    find_all(Story, author_id: author_id)
  end

  def load(type, data)
    author_id = data.delete(:author_id)
    super.tap do |story|
      story.author = LazyProxy.new do
        AuthorMapper.new.find(author_id)
      end
    end
  end
end
We'll skip over the details of these mapper classes for now. All we need to know is that we can instantiate AuthorMapper and StoryMapper objects, and use them to both load and store authors and stories.
Let's set up some records in the database. First we create and store an author. Then we store three stories written by this author. Each story was written under a pseudonym, so they each have a different value in their :author_name field. But they still all link back to the one author object.
AM = AuthorMapper.new
SM = StoryMapper.new
AM.store(heinlein = Author.new("Robert Heinlein"))
SM.store(Story.new("Waldo", "Anson Macdonald", heinlein))
SM.store(Story.new("Lost Legion", "Lyle Monroe", heinlein))
SM.store(Story.new("Elsewhere", "Caleb Saunders", heinlein))
Let's check that we've successfully written these objects to the database. We'll fetch the author first.
require "./setup"
author = AM.find(1)
author
# => #<struct Author
# name="Robert Heinlein",
# pen_names=nil,
# stories=
# #<TeenyMapper::LazyProxy:0x000000018d0d28
# @fetcher=
# #<Proc:0x000000018d0cd8@/home/avdi/Dropbox/rubytapas/177-identity-map/data_mappers.rb:62>>,
# id=1>
Notice that the stories field contains a LazyProxy object. This is a proxy object provided by our Data Mapper layer which only loads associations when they are needed.
Let's check out the stories association.
require "./setup"
author = AM.find(1)
author.stories.to_a
# => [#<struct Story
# title="Waldo",
# author_name="Anson Macdonald",
# author=
# #<TeenyMapper::LazyProxy:0x0000000234f380
# @fetcher=
# #<Proc:0x0000000234f768@/home/avdi/Dropbox/rubytapas/177-identity-map/data_mappers.rb:88>>,
# id=1>,
# #<struct Story
# title="Lost Legion",
# author_name="Lyle Monroe",
# author=
# #<TeenyMapper::LazyProxy:0x00000002347ea0
# @fetcher=
# #<Proc:0x00000002347db0@/home/avdi/Dropbox/rubytapas/177-identity-map/data_mappers.rb:88>>,
# id=2>,
# #<struct Story
# title="Elsewhere",
# author_name="Caleb Saunders",
# author=
# #<TeenyMapper::LazyProxy:0x00000002347ab8
# @fetcher=
# #<Proc:0x00000002347a18@/home/avdi/Dropbox/rubytapas/177-identity-map/data_mappers.rb:88>>,
# id=3>]
We can see that all three stories were successfully saved and associated with the author. We can also see LazyProxy at work again. This time, it's holding the place of the author association in each story object.
Author objects have a pen_names field. We can see that the field is presently blank in our author.
require "./setup"
author = AM.find(1)
author.pen_names
# => nil
We'd like to use the information in our story objects to update this field. So we write a method to do this, called update_pen_names. It takes an author argument. It loops through the stories associated with the author, and calls another helper method called update_author_pen_names.
This method, in turn, takes a story, gets the current comma-separated list of pen names, adds a new name to the list, and then updates the author's pen_names attribute.
def update_pen_names(author)
  author.stories.each do |story|
    update_author_pen_names(story)
  end
end

def update_author_pen_names(story)
  author = story.author
  names = author.pen_names.to_s.split(", ")
  names << story.author_name
  author.pen_names = names.join(", ")
end
Let's try these helpers out. We pull our author out of the database. Then we call update_pen_names on it. Finally, we store the updated author back to the database.
Then we verify that this process worked, by re-fetching the author from the database and checking its pen names.
require "./setup"
require "./helpers"
author = AM.find(1)
update_pen_names(author)
AM.store(author)
AM.find(1).pen_names
# => nil
But something has gone wrong! The pen_names field is still empty.
As a sanity check, we take a look at the value of the pen_names field after running the update, but before saving it back to the database. We get an even bigger shock: after running the update_pen_names method, the pen_names field is still nil!
require "./setup"
require "./helpers"
author = AM.find(1)
update_pen_names(author)
author.pen_names # => nil
AM.store(author)
Did we mess something up? We throw a debugging line in the update_author_pen_names method just to check that we really are updating the field.
require "./setup"
require "./helpers"
def update_author_pen_names(story)
  author = story.author
  names = author.pen_names.to_s.split(", ")
  names << story.author_name
  author.pen_names = names.join(", ")
  p author.__get__
end
author = AM.find(1)
update_pen_names(author)
author.pen_names # => nil
AM.store(author)
# >> #<struct Author name="Robert Heinlein", pen_names="Anson Macdonald", stories=#<TeenyMapper::LazyProxy:0x000000025de948 @fetcher=#<Proc:0x000000025de8f8@/home/avdi/Dropbox/rubytapas/177-identity-map/data_mappers.rb:65>>, id=1>
# >> #<struct Author name="Robert Heinlein", pen_names="Lyle Monroe", stories=#<TeenyMapper::LazyProxy:0x000000025d3e08 @fetcher=#<Proc:0x000000025d3db8@/home/avdi/Dropbox/rubytapas/177-identity-map/data_mappers.rb:65>>, id=1>
# >> #<struct Author name="Robert Heinlein", pen_names="Caleb Saunders", stories=#<TeenyMapper::LazyProxy:0x000000025d1298 @fetcher=#<Proc:0x000000025d1220@/home/avdi/Dropbox/rubytapas/177-identity-map/data_mappers.rb:65>>, id=1>
This just gets weirder and weirder: at each execution of update_author_pen_names, the pen_names field contains just one name. And yet once all the calls are done, the author once again has no pen names associated with it.
So what the heck is going on here? It's only once we start examining the object IDs that Ruby assigns to individual instances that we get our first clue. When we annotate our code with calls to output the object_id of author objects at different points in execution, we see that every time we update the author, the object ID is different. Then we are saving the author object we found in the first place, but the updates never happened to this object! They were all applied to those mysterious other author instances.
require "./setup"
require "./helpers"
def update_author_pen_names(story)
  author = story.author
  names = author.pen_names.to_s.split(", ")
  names << story.author_name
  author.pen_names = names.join(", ")
  puts "Updated #{author.__get__.object_id}"
end
author = AM.find(1)
puts "Found #{author.object_id}"
update_pen_names(author)
author.pen_names # => nil
AM.store(author)
puts "Stored #{author.object_id}"
# >> Found 15848980
# >> Updated 15816260
# >> Updated 15814980
# >> Updated 15789280
# >> Stored 15848980
How did this happen? The secret is in how the Story objects find their author. When we take a look at the code that loads up a Story object from the database, we can see that in order to set up the author association, it uses a LazyProxy object. This object takes a block, which it will use to fetch the author association if any code actually tries to access it.
This is a common pattern in object-relational mappers. It serves a couple of purposes: for one thing, it functions as an optimization to keep from requesting data from the database unless and until it is actually needed. It also helps to avoid infinite recursion caused by loading up models with circular associations to each other. You'll find similar objects in ActiveRecord, where they are known as "association proxies".
def load(type, data)
  author_id = data.delete(:author_id)
  super.tap do |story|
    story.author = LazyProxy.new do
      AuthorMapper.new.find(author_id)
    end
  end
end
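To see the deferred loading on its own, here is a minimal sketch that needs no database. The proxy mirrors the LazyProxy class from earlier; the fetch_count counter and the hard-coded array of titles are assumptions made for illustration, standing in for a real query.

```ruby
# A stripped-down lazy proxy, modeled on TeenyMapper::LazyProxy.
class LazyProxy
  def initialize(&fetcher)
    @fetcher = fetcher
  end

  # Forward any unknown message to the real object, loading it on first use.
  def method_missing(name, *args, &block)
    __get__.public_send(name, *args, &block)
  end

  def __get__
    @object ||= @fetcher.call
  end
end

fetch_count = 0 # hypothetical stand-in for "number of database queries"

stories = LazyProxy.new do
  fetch_count += 1
  ["Waldo", "Lost Legion", "Elsewhere"]
end

fetch_count_before = fetch_count # creating the proxy fetched nothing
first_title = stories.first      # first real use triggers the fetch
stories.size                     # later calls reuse the loaded object
fetch_count_after = fetch_count

puts "fetches before use: #{fetch_count_before}"
puts "fetches after use:  #{fetch_count_after}"
```

The block only runs when a message is actually forwarded, and the result is memoized, so a proxy that is never touched costs no query at all.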
Anyway, the important part here isn't really the lazy proxy. It's the fact that in order to load up its author, a Story object will use the AuthorMapper to find and load up a new Author object. Emphasis on the new part. If we dive down into the code which actually loads new objects once the relevant database row is retrieved, we can see that it uses .new to make a new instance of the model class.
def load(type, data)
  object = type.new
  data.each_with_object(object) { |(key, value), o|
    o[key] = value
  }
end
Now that we have an idea of what is going on, we can demonstrate the problem in just a couple lines of code.
require "./setup"
require "./helpers"
author = AM.find(1)
author.object_id # => 22795980
author.stories.first.author.__get__.object_id # => 22747020
When we walk down to an author's stories, then pick the first story, then follow the association back up to the author, we get an object which looks in every way like the original author—except that it is a different instance, with its own copies of all of its attributes. Updating this second instance does us no good if we're only going to save the first instance.
The name for this phenomenon is aliasing, and it's one of the most insidious sources of defects in code that talks to a database through some kind of ORM layer. The author data represented in one row of the database has multiple aliases in the form of multiple instances of the Author class. We have to be careful to write the same instance that we update if we want to see our changes reflected in the database.
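The lost update can be reproduced without any mapper at all. In this sketch the ROWS hash plays the part of the database table, and find_author and store_author are hypothetical helpers invented for the example; the only thing they share with our real mappers is that every find builds a fresh object.

```ruby
Author = Struct.new(:name, :pen_names, :id)

# Hypothetical in-memory "table": one row per author id.
ROWS = { 1 => { name: "Robert Heinlein", pen_names: nil, id: 1 } }

# Like TeenyMapper#load, this builds a brand-new object on every call.
def find_author(id)
  row = ROWS.fetch(id)
  Author.new(row[:name], row[:pen_names], row[:id])
end

def store_author(author)
  ROWS[author.id] = author.to_h
end

original = find_author(1)
aliased  = find_author(1) # same row, different instance

aliased.pen_names = "Anson Macdonald" # the update lands on the alias...
store_author(original)                # ...but we save the original

saved_pen_names = find_author(1).pen_names
puts saved_pen_names.inspect # prints "nil": the update was lost
```

Storing aliased instead of original would have persisted the change, which is exactly the "write the same instance that we update" rule.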
The example I'm showing you today is ultra-simplified in order to make it easier to follow. The aliasing-based defects I've seen in real-world code have been far, far more subtle, and required tracking the traversal of associations through many more layers of code. And things can get really complicated if multiple aliases of the same record are written, for instance as a result of automatic saving of associations.
Aliasing can also lead to performance problems, since every alias of the same record may potentially represent an extra database query that could have been avoided.
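As a rough illustration of the cost, this sketch counts lookups when three stories each resolve their author independently. QUERY_LOG and find_author are assumptions made up for the example; no real database is involved, and the SQL strings are only there to make the log readable.

```ruby
Author = Struct.new(:name, :id)
Story  = Struct.new(:title, :author_id)

QUERY_LOG = [] # hypothetical record of every "query" we issue

# Pretend lookup: logs a query and returns a fresh Author each time.
def find_author(id)
  QUERY_LOG << "SELECT * FROM Author WHERE id = #{id}"
  Author.new("Robert Heinlein", id)
end

stories = [
  Story.new("Waldo", 1),
  Story.new("Lost Legion", 1),
  Story.new("Elsewhere", 1)
]

# Each story follows its association back to the same row.
authors = stories.map { |story| find_author(story.author_id) }

query_count    = QUERY_LOG.size
instance_count = authors.map(&:object_id).uniq.size

puts "queries issued:     #{query_count}"
puts "distinct instances: #{instance_count}"
```

One row in the table ends up costing three queries and three objects, one per alias.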
Our most powerful weapon against aliasing is awareness: once we realize that it can happen, we can recognize bugs that are likely the result of aliasing. And we can start checking on object IDs to test our hypothesis.
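One wrinkle when testing that hypothesis: Struct-based models compare equal whenever their fields match, so == will not reveal an alias. A small sketch of the difference, using a cut-down Author struct defined just for this example:

```ruby
Author = Struct.new(:name, :pen_names, :id)

first  = Author.new("Robert Heinlein", nil, 1)
second = Author.new("Robert Heinlein", nil, 1)

same_values   = (first == second)    # Struct#== compares field by field
same_instance = first.equal?(second) # equal? compares object identity
same_ids      = (first.object_id == second.object_id)

puts "same values?    #{same_values}"
puts "same instance?  #{same_instance}"
puts "same object_id? #{same_ids}"
```

If one side is a LazyProxy, we need to unwrap it with __get__ first, as in the debugging output earlier, since the proxy is a separate object with an object_id of its own.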
We can also be careful to write code to avoid aliasing problems. For instance, we could rewrite our author-updating methods to pass the original author instance down:
def update_pen_names(author)
  author.stories.each do |story|
    # Pass the original author along instead of re-fetching it via the story.
    update_author_pen_names(author, story)
  end
end

def update_author_pen_names(author, story)
  names = author.pen_names.to_s.split(", ")
  names << story.author_name
  author.pen_names = names.join(", ")
end
This way only one instance of the author model is being updated. When we run the updated method on an author object, this time we can see that the change is correctly persisted.
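Here is a database-free sketch of that run. It only checks that the updates accumulate on the one instance we would go on to store; the in-memory setup, where each story is handed its own separate copy of the author, is an assumption meant to mimic what the lazy proxies load.

```ruby
Author = Struct.new(:name, :pen_names, :stories, :id)
Story  = Struct.new(:title, :author_name, :author, :id)

def update_pen_names(author)
  author.stories.each do |story|
    update_author_pen_names(author, story)
  end
end

def update_author_pen_names(author, story)
  names = author.pen_names.to_s.split(", ")
  names << story.author_name
  author.pen_names = names.join(", ")
end

heinlein = Author.new("Robert Heinlein", nil, [], 1)

story_data = [
  ["Waldo", "Anson Macdonald"],
  ["Lost Legion", "Lyle Monroe"],
  ["Elsewhere", "Caleb Saunders"]
]

# Each story deliberately points at a *different* Author instance,
# just as the stories loaded through the mappers do.
story_data.each_with_index do |(title, pen_name), index|
  alias_of_author = Author.new("Robert Heinlein", nil, [], 1)
  heinlein.stories << Story.new(title, pen_name, alias_of_author, index + 1)
end

update_pen_names(heinlein)
puts heinlein.pen_names
# prints "Anson Macdonald, Lyle Monroe, Caleb Saunders"
```

The aliases hanging off each story are never touched, so all three names end up on the instance we started with.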
Some ORMs provide tools to minimize aliasing. In ActiveRecord, we can make use of the :inverse_of option to associations. For instance, our Author and Story classes might look something like this in ActiveRecord:
class Author < ActiveRecord::Base
  has_many :stories, inverse_of: :author
end

class Story < ActiveRecord::Base
  belongs_to :author, inverse_of: :stories
end
The :inverse_of options give ActiveRecord the extra information it needs to avoid repeatedly fetching and aliasing the same record when following associations. However, this isn't a complete solution; you can find out more about its limitations in the ActiveRecord documentation.
There's a more advanced strategy that we can use to eliminate aliasing once and for all. But we'll talk about that in the next episode. Happy hacking!