Ghost Load

Video transcript & code

Back in episode 164, I showed you how I used the Mapper pattern to separate data scraped from the DPD website and my Episode domain model. I have a ContentPostGateway which is only concerned with getting data from DPD; I have an Episode class which is only concerned with representing one episode of this show; and I have an EpisodeMapper which is only worried about mapping the one to the other.

But things never stay this simple. When I fetch a list of episodes, in the background the data comes from a gateway method called #content_post_list which gets a summary of all of the episodes. There's a lot of data which is missing from this request, though. For instance, I don't have each episode's full description. Right now, the episodes that are returned by the mapper are incomplete.

module DPD
  module ContentPostGateway
    # ...
    def content_post_list
      # fetch summary of all posts...
    end
    # ...
  end
end

require "./episode"
require "./episode_mapper"

mapper = RubyTapas::EpisodeMapper.new
ep = mapper.all.last
ep
# => #<RubyTapas::Episode:0x00000001a51198
#     @id=456,
#     @name="Mapper",
#     @number=164>
ep.description                  # => nil

There's another method on the gateway, #find_content_post_by_id, which returns a complete set of data for a given episode. But the price is an extra HTTP request for each episode retrieved. When building a complete list of episodes, I can't afford to use this method to get complete episode data for every single one.

module DPD
  module ContentPostGateway
    # ...
    def find_content_post_by_id(id)
      # fetch complete info for post...
    end
    # ...
  end
end

Which means I'm sometimes going to be working with incomplete Episode objects. Ideally, what I'd like to happen is for episodes to start out incomplete, but magically "fill themselves in" if I ask for an episode attribute which hasn't yet been loaded. But remember, these Episode objects have no knowledge of the EpisodeMapper, let alone of the ContentPostGateway!

The solution I turn to is a family of patterns collectively known as "Lazy Load". Martin Fowler lists four different types of lazy loading in his book Patterns of Enterprise Application Architecture. I'm going to present just one of those patterns today: the "ghost object" style of lazy loading. Unlike other lazy loading patterns which use some kind of proxy object, "ghost" objects are real domain model objects in a partial state.

To my Episode model, I add a :load_state attribute and a :data_source attribute. I make :load_state default to a state named :ghost.

I pick one of the attributes which may need to be lazily loaded: the description attribute. I override the getter method with one that does two things. First, it calls a method called #load. Then it returns the value of the @description instance variable.

Next I define the #load method. It returns early if the object is in a :loaded state already. Otherwise, it sends the #load message to the :data_source, with self as the argument.

module RubyTapas
  class Episode

    attr_accessor :video,
                  :id,
                  :number,
                  :name,
                  :description,
                  :synopsis,
                  :video_url,
                  :publish_time,
                  :load_state,
                  :data_source


    def initialize(attributes={})
      # Start as a ghost unless the caller passes an explicit load_state.
      @load_state = :ghost
      attributes.each do |key, value|
        public_send("#{key}=", value)
      end
    end

    def to_s
      inspect
    end

    def ==(other)
      other.is_a?(Episode) && other.id == id
    end

    def published?(time_now=Time.now)
      publish_time <= time_now
    end

    def load
      return if load_state == :loaded
      data_source.load(self)
    end

    def description
      load
      @description
    end
  end

end

Those are all the changes I make to the model class for now. Next up, I need to update the mapper.

In order to keep the focus on lazy loading, the mapper you see here is an ultra-simplified fake which just returns fixed values.

module RubyTapas
  class EpisodeMapper
    def all
      [
        Episode.new(
          id:     123,
          name:   "YAML::Store",
          number: 163),
        Episode.new(
          id:     456,
          name:   "Mapper",
          number: 164),
      ]
    end
  end
end

The first thing I do is change the method which returns a summary list of all episodes. In it, I set the data_source attribute on each returned Episode. In effect, this is the mapper's way of saying "if you ever need more data, here's my number".

Next I add the #load method. First, it extracts the id attribute from the Episode. If this were a real implementation, it would then use the ContentPostGateway to fetch data using that ID and then transform it into domain terms. For this example I'm going to pretend I've already taken care of that, and instead move straight on to filling in the model.

Before I do that, though, I set the load_state attribute to :loading. This isn't strictly necessary right now. But it's part of the pattern, and it may become important down the road. That's because once we start loading networks of associated objects, we may need to flag objects that are already being loaded in order to avoid infinite recursion.

Next I load up the episode object with detailed field data. Again, I'm just using hardcoded values here to keep this demonstration simple.

When I'm done fully loading the episode, I set the load_state to :loaded, and return.

Notice here that it is the mapper, not the object itself, which is responsible for telling an object when it is fully loaded.

module RubyTapas
  class EpisodeMapper
    def all
      [
        Episode.new(
          id:     123,
          name:   "YAML::Store",
          number: 163,
          data_source: self),
        Episode.new(
          id:     456,
          name:   "Mapper",
          number: 164,
          data_source: self)
      ]
    end

    def load(episode)
      id = episode.id
      # ...retrieve episode data based on ID...
      episode.load_state   = :loading
      episode.description  = "Today we explore a pattern for bridging "\
                             "the gap between different domain models."
      episode.synopsis     = "Bridging the gap between domain models"
      episode.publish_time = Time.new(2013, 12, 30, 9, 11)
      episode.load_state   = :loaded
    end
  end
end
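
The listing above skips the actual retrieval step. As a rough sketch of what a gateway-backed #load might look like, here is a self-contained version that uses a stand-in gateway. Everything in it is an assumption made for illustration: the SketchEpisode, FakeContentPostGateway, and GatewayBackedMapper names are hypothetical, and the "body" and "teaser" hash keys are invented, not the real DPD response format.

```ruby
# Minimal stand-in for the Episode model so this sketch runs on its own.
class SketchEpisode
  attr_accessor :id, :description, :synopsis, :load_state, :data_source

  def initialize(attributes = {})
    @load_state = :ghost
    attributes.each { |key, value| public_send("#{key}=", value) }
  end
end

# Hypothetical gateway double. The hash keys are assumptions,
# not the format returned by the real ContentPostGateway.
class FakeContentPostGateway
  attr_reader :request_count

  def initialize
    @request_count = 0
  end

  def find_content_post_by_id(id)
    @request_count += 1 # stands in for one HTTP request
    { "id"     => id,
      "body"   => "Full description for post #{id}",
      "teaser" => "Short synopsis" }
  end
end

class GatewayBackedMapper
  def initialize(gateway)
    @gateway = gateway
  end

  def load(episode)
    post = @gateway.find_content_post_by_id(episode.id)
    episode.load_state  = :loading
    # Translate gateway terms into domain terms.
    episode.description = post.fetch("body")
    episode.synopsis    = post.fetch("teaser")
    episode.load_state  = :loaded
  end
end

gateway = FakeContentPostGateway.new
mapper  = GatewayBackedMapper.new(gateway)
episode = SketchEpisode.new(id: 456, data_source: mapper)

mapper.load(episode)
episode.description   # => "Full description for post 456"
gateway.request_count # => 1
```

The request counter is only there to make the cost visible: one full load corresponds to one call to the detailed gateway method.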

Now, when I grab an episode object from the list that the mapper returns, I can see that it is in the :ghost state. Then I can access the description attribute, and get text back. When I check the load state, the object is now :loaded. In the background, fetching the description attribute triggered the object to go from a ghost to a fully-loaded model.

require "./ghost_episode"
require "./episode_mapper"

mapper = RubyTapas::EpisodeMapper.new
ep = mapper.all.last
ep.load_state                   # => :ghost
ep.description
# => "Today we explore a pattern for bridging the gap between different domain models."
ep.load_state                   # => :loaded
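
The :loading state does no work in this demonstration. One way it could start to matter, sketched here under assumptions, is when a getter is called on an object while the mapper is still filling it in. In this variation (not the code from the episode) #load only fires for objects in the :ghost state, so a re-entrant call returns early instead of recursing. GuardedEpisode and CuriousMapper are hypothetical names.

```ruby
class GuardedEpisode
  attr_accessor :id, :load_state, :data_source
  attr_writer :description

  def initialize(attributes = {})
    @load_state = :ghost
    attributes.each { |key, value| public_send("#{key}=", value) }
  end

  def load
    # Only a ghost triggers a load; :loading and :loaded both return early.
    return unless load_state == :ghost
    data_source.load(self)
  end

  def description
    load
    @description
  end
end

# Hypothetical mapper that reads an attribute while the load is in progress.
class CuriousMapper
  attr_reader :load_calls

  def initialize
    @load_calls = 0
  end

  def load(episode)
    @load_calls += 1
    episode.load_state = :loading
    previous = episode.description # re-enters #load, which returns early
    episode.description = previous || "Loaded on demand"
    episode.load_state = :loaded
  end
end

mapper = CuriousMapper.new
ep = GuardedEpisode.new(id: 456, data_source: mapper)
ep.description    # => "Loaded on demand"
ep.description    # second access does not reload
mapper.load_calls # => 1
```

With the original guard, which only checks for :loaded, the read inside the mapper would call back into the mapper and never terminate.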

So far, I've only implemented ghost-loading for the description attribute. But it would be a simple matter to extend that to other fields, like the synopsis and the publish_time.
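
A minimal sketch of that extension, assuming the simplest approach of repeating the same two-step getter for each lazy field. MultiGhostEpisode and StubMapper are hypothetical stand-ins so the example runs on its own, and the hardcoded values mirror the ones used earlier.

```ruby
class MultiGhostEpisode
  attr_accessor :id, :name, :load_state, :data_source
  attr_writer :description, :synopsis, :publish_time

  def initialize(attributes = {})
    @load_state = :ghost
    attributes.each { |key, value| public_send("#{key}=", value) }
  end

  def load
    return if load_state == :loaded
    data_source.load(self)
  end

  # Each lazily loaded getter follows the same two steps: load, then read.
  def description
    load
    @description
  end

  def synopsis
    load
    @synopsis
  end

  def publish_time
    load
    @publish_time
  end
end

# Stand-in mapper with hardcoded values, as in the demonstration above.
class StubMapper
  def load(episode)
    episode.load_state   = :loading
    episode.description  = "Today we explore a pattern for bridging "\
                           "the gap between different domain models."
    episode.synopsis     = "Bridging the gap between domain models"
    episode.publish_time = Time.new(2013, 12, 30, 9, 11)
    episode.load_state   = :loaded
  end
end

ep = MultiGhostEpisode.new(id: 456, name: "Mapper", data_source: StubMapper.new)
ep.name         # => "Mapper"
ep.load_state   # => :ghost (summary attributes do not trigger a load)
ep.synopsis     # => "Bridging the gap between domain models"
ep.load_state   # => :loaded
```

Whichever lazy attribute is read first triggers the load, and the other getters then find the object already in the :loaded state.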

When I started out, my Episode objects were totally isolated from the messy world of loading and transforming data from external sources. With this new design, I've compromised a little bit. But what I like about this pattern is that it's a compromise with very clear limits. It's not a slippery slope.

Episode objects are now aware that they might start out without all of their data. But they still have no idea how to retrieve or process that data. All they know is that there is some object out there to which they can say: "load me up, please!". I could add a dozen more lazy-loaded attributes to the Episode class, and it still wouldn't need to know anything more than this about how to fill those attributes in. The responsibility of mapping from the ContentPostGateway to Episode data is still firmly in the EpisodeMapper's court.

As it stands now, this code only triggers lazy loading from a single attribute of a single model class. In a future episode I'll talk about generalizing this approach to many attributes and arbitrary model classes. But this is enough for now. Happy hacking!
