Unit 1, Lesson 21

Caching Gateway

Video transcript & code

On and off for several months, I've shown you some of my behind-the-scenes work on the rubytapas.com website. When we last left off, I'd constructed a screen-scraping Gateway to the DPD website, and an EpisodeMapper class responsible for mapping from DPD data to Episode objects.

Let's skip forward a bit. Since writing that code, I've expanded the gateway and mapper classes considerably. The Gateway is still capable of fetching a list of content posts, as you can see here. But it can also find a specific content post by its DPD-assigned ID and return extended information about it. In both cases, I've arranged it so that the data returned is in the form of hashes, strings, integers, and other "basic" data types.

require "~/dev/rubytapas.com/environment"
require "ruby_tapas"

gateway = DPD::ContentPostGateway.new(
  RubyTapas.dpd_admin_session,
  logger: RubyTapas.logger)

gateway.content_post_list.take(3)
# => [{:title=>"001 Binary Literals",
#      :publish_time=>2012-09-24 09:00:00 -0400,
#      :show_url=>"https://getdpd.com/plan/showpost/10?post_id=18",
#      :id=>18},
#     {:title=>"002 Large Integer Literals",
#      :publish_time=>2012-09-26 09:00:00 -0400,
#      :show_url=>"https://getdpd.com/plan/showpost/10?post_id=20",
#      :id=>20},
#     {:title=>"003 Character Literals",
#      :publish_time=>2012-09-28 09:00:00 -0400,
#      :show_url=>"https://getdpd.com/plan/showpost/10?post_id=21",
#      :id=>21}]

gateway.find_content_post_by_id(18)
# => {:title=>"001 Binary Literals",
#     :content=>
#      "<p>In this inaugural episode, a look at a handy syntax for writing out binary numbers.</p>",
#     :synopsis=>"",
#     :publish_time=>2012-09-24 09:00:00 -0400,
#     :send_email=>false}

Now, screen-scraping is a time-consuming process. The code has to log in to the site, navigate to the appropriate page or pages, download lots of HTML, parse the HTML, and then locate the bits of information it actually cares about. And this is data that doesn't change very often. I add a new episode to the list twice a week. Other than that and the occasional correction to an older episode, the list of content posts remains static.

There is clearly an opportunity to cache data here. To make this happen, I create a new class, CachedContentPostGateway. It is initialized with a gateway object and a cache object. I implement the content_post_list method, matching the one in the original ContentPostGateway class. The content of this method is short and straightforward: it constructs a cache key. Then it checks to see if the key is present in the cache, using #fetch. If so, #fetch will return the cached value. If not, it delegates to the "real" #content_post_list method, and caches the result before returning it. If you watched episode 66 on caching an API, you probably recognize this pattern.

I also write a cached version of #find_content_post_by_id. The only difference in this method is that it uses the passed ID as part of the cache key.

module DPD
  class CachedContentPostGateway
    def initialize(gateway, cache)
      @gateway = gateway
      @cache   = cache
    end

    def content_post_list
      cache_key = "ContentPostGateway:content_post_list"
      @cache.fetch(cache_key) do
        @cache[cache_key] = @gateway.content_post_list
      end
    end

    def find_content_post_by_id(id)
      cache_key = "ContentPostGateway:find_content_post_by_id:#{id}"
      @cache.fetch(cache_key) do
        @cache[cache_key] = @gateway.find_content_post_by_id(id)
      end
    end
  end
end
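
To see the caching behavior in isolation, here is a minimal sketch. It assumes a plain Hash standing in for the cache (Hash responds to both #fetch with a block and #[]=) and a hypothetical CountingGateway stub that records how often it is called. Neither is part of the real system, and CachedGatewaySketch is just a trimmed copy of the class above.

```ruby
# Hypothetical stand-in for the real screen-scraping gateway.
# It counts calls so we can tell whether the cache was used.
class CountingGateway
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def content_post_list
    @calls += 1
    [{ title: "001 Binary Literals", id: 18 }]
  end
end

# Trimmed copy of the caching wrapper, reduced to a single method.
class CachedGatewaySketch
  def initialize(gateway, cache)
    @gateway = gateway
    @cache   = cache
  end

  def content_post_list
    cache_key = "ContentPostGateway:content_post_list"
    @cache.fetch(cache_key) do
      # Only runs on a cache miss: delegate, store, and return the result.
      @cache[cache_key] = @gateway.content_post_list
    end
  end
end

gateway = CountingGateway.new
cached  = CachedGatewaySketch.new(gateway, {}) # a Hash plays the cache role

first  = cached.content_post_list # miss: delegates to the gateway and stores
second = cached.content_post_list # hit: served straight from the Hash
```

After both calls, gateway.calls is 1: the second request never reaches the underlying gateway.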

What remains is to glue this together with the existing EpisodeMapper and ContentPostGateway. I update the method that makes a content post gateway available to the rest of the system to wrap the returned object in a CachedContentPostGateway.

def self.content_post_gateway
  scope[:content_post_gateway] ||= DPD::CachedContentPostGateway.new(
    DPD::ContentPostGateway.new(dpd_admin_session, logger: logger),
    cache)
end

The method that provisions an EpisodeMapper uses this method, so any new episode mappers will be equipped with caching gateways.

def self.episode_mapper
  scope[:episode_mapper] ||= EpisodeMapper.new(content_post_gateway, id_map: id_map)
end

The cache method that's used when instantiating the CachedContentPostGateway is defined here. As you can see, it uses Moneta to create a cache object backed by Memcache.

def self.cache
  scope[:cache] ||= Moneta.new(
    :MemcachedDalli,
    server: ENV.fetch('MEMCACHE_SERVERS'){"localhost"},
    expires: default_cache_expire_seconds,
    logger: {out: log_io})
end
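
The caching gateway only ever calls #fetch with a block and #[]= on its cache, so for experiments or tests a small in-memory object can play that role. The sketch below is a hypothetical stand-in with expiry, not Moneta's implementation. The ExpiringMemoryCache name and the injectable clock parameter are assumptions made for illustration.

```ruby
# Hypothetical in-memory cache with expiry. It exposes only the two
# methods the caching gateway uses: #fetch (block form) and #[]=.
class ExpiringMemoryCache
  def initialize(expires:, clock: -> { Time.now })
    @expires = expires # lifetime of an entry, in seconds
    @clock   = clock   # injectable so expiry can be exercised without sleeping
    @entries = {}
  end

  def []=(key, value)
    @entries[key] = [value, @clock.call + @expires]
  end

  def fetch(key)
    value, deadline = @entries[key]
    return value if deadline && @clock.call < deadline

    @entries.delete(key) # drop the stale entry, if there was one
    yield key
  end
end

now   = Time.at(0)
cache = ExpiringMemoryCache.new(expires: 60, clock: -> { now })

cache["k"] = "fresh"
hit  = cache.fetch("k") { "recomputed" } # still within 60 seconds
now += 61
miss = cache.fetch("k") { "recomputed" } # entry has expired, block runs
```

Like the real cache in this setup, the stand-in does not store the block's result on a miss, which is why the gateway methods assign to the cache explicitly inside the block.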

And that's it: from now on, requests to the ContentPostGateway will be cached in Memcache.

Now that I'm caching the results of these gateway methods, the decision to return only simple data structures from the methods really starts to shine. There won't be any difficulty serializing these data structures to any caching backend I might choose. I don't have to worry about circular references, references to un-serializable objects, or object versioning.
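
Here is a rough illustration of that difference. Marshal is used as an example serializer, which is an assumption: the actual wire format depends on the cache backend. SmartPost is a hypothetical "smart" object invented for contrast.

```ruby
# A record shaped like the gateway's output: only hashes, strings,
# integers, booleans, and Time values.
post = {
  title:        "001 Binary Literals",
  publish_time: Time.utc(2012, 9, 24, 13, 0, 0),
  id:           18,
  send_email:   false
}

# Simple data survives a serialization round trip unchanged.
restored      = Marshal.load(Marshal.dump(post))
round_trip_ok = (restored == post)

# Hypothetical "smart" object that holds on to a callback.
class SmartPost
  def initialize(title, &on_change)
    @title     = title
    @on_change = on_change # a Proc, which Marshal cannot dump
  end
end

# Capture the error class raised when dumping the smart object.
smart_dump_error =
  begin
    Marshal.dump(SmartPost.new("001 Binary Literals") { })
    nil
  rescue TypeError => e
    e.class
  end
```

The plain hash round-trips cleanly, while the object carrying a Proc raises a TypeError as soon as it is dumped.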

I started out with an EpisodeMapper that depended on a ContentPostGateway. Then I inserted a caching layer between the mapper and the gateway. Let's talk about something I didn't do: I didn't have to change either EpisodeMapper or ContentPostGateway at all. This is the ideal of object-oriented design: composable parts that enable us to add functionality by adding new classes or methods, without changing existing code.

Let's dive a little deeper into the design decisions that enabled this composability. There are four that come to mind:

First, there's the choice to carefully delineate roles: one class is strictly concerned with talking to the DPD site, and another only cares about mapping DPD data to RubyTapas domain concepts.

Second, the decision to return data in a simplified format from the gateway, rather than invent "smart" classes to represent a foreign site's domain model.

Third, the choice to strictly constrain the interface of the ContentPostGateway class. It doesn't try to be a universal interface to all things DPD; instead, it offers all of two public methods, each returning simple, regularized data. These are the bare minimum needed to get the information the EpisodeMapper depends on.

Fourth, EpisodeMapper is not a collection of singleton methods for finding episodes, as you might see in some data-access frameworks. Instead, I have to create instances of it like any other object. This makes it straightforward to pass in different versions of the content post gateway it relies on when setting up mapper instances that the rest of the app will use.

As a result of these design decisions, building and integrating a caching wrapper for the ContentPostGateway was straightforward.
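
The fourth point can be sketched as follows. Everything here is hypothetical: the real EpisodeMapper's internals are not shown in this episode, so MapperSketch, EpisodeSketch, and FakeGateway are invented names, and the title-splitting logic is an assumption made only to give the mapper something to do.

```ruby
# Hypothetical domain object standing in for Episode.
EpisodeSketch = Struct.new(:number, :name)

# Hypothetical mapper. The point is the constructor: it accepts any object
# that responds to #content_post_list, cached or not.
class MapperSketch
  def initialize(gateway)
    @gateway = gateway
  end

  def all
    @gateway.content_post_list.map do |post|
      # Assumed title format: "<number> <name>", e.g. "001 Binary Literals".
      number, name = post[:title].split(" ", 2)
      EpisodeSketch.new(number.to_i, name)
    end
  end
end

# Hypothetical gateway returning canned data in the same simple format.
class FakeGateway
  def content_post_list
    [{ title: "001 Binary Literals", id: 18 },
     { title: "002 Large Integer Literals", id: 20 }]
  end
end

episodes = MapperSketch.new(FakeGateway.new).all
```

Because the gateway is handed in at construction time, swapping a plain gateway for a caching one (or a fake one in tests) requires no change to the mapper.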

I wasn't on my own in making these choices. In creating each of these classes, I consulted the book Patterns of Enterprise Application Architecture, and chose patterns that fit the problem at hand. And this is the true beauty of design patterns and pattern languages: at their best, they coalesce the hard-won wisdom of many programmers into an easy-to-follow set of steps. Sometimes it isn't until we start to combine or extend a pattern-based architecture that we realize just how carefully chosen the small decisions that make up each pattern really are.

Happy hacking!
