Mapper

Video transcript & code

In this episode I'm returning to the code that powers the RubyTapas.com public website. As you might recall from episode 100, I created a ContentPostGateway class. This class hides the details of screen-scraping DPD, the third-party app that powers the RubyTapas subscriber side.

I can use this class to get a list of "content posts". Each "content post" corresponds to a RubyTapas episode. The result is a list of simple hashes, each holding the attributes of a single content post. This data structure distills, in a very simple way, data gleaned from the original HTML admin pages. However, while this form is simplified, it still reflects the DPD data model. The method is called #content_post_list, not #episode_list. There is an id and a URL that are DPD-specific. And the episode number is still embedded in the title. This class doesn't know anything about RubyTapas episode numbering.

require "~/dev/rubytapas.com/environment"
require "ruby_tapas"
RubyTapas.configure do |config|
  config.log_file = "/dev/null"
end
gw = DPD::ContentPostGateway.new(
  RubyTapas.dpd_admin_session, 
  logger: RubyTapas.logger)
gw.content_post_list.first
# => {:title=>"001 Binary Literals",
#     :publish_time=>2012-09-24 09:00:00 -0400,
#     :show_url=>"https://getdpd.com/plan/showpost/10?post_id=18",
#     :id=>18}

As I'm coding up the RubyTapas website, I want to work with abstractions that accurately represent my concept of an episode. Toward that end, I introduce a new class to model this concept. I use Struct to quickly give it a number of attributes, including the episode number, the name, the full description, and so on.

I add only one custom method to this model. I override the default Struct-provided constructor to accept a hash of attributes.

Episode = Struct.new(
  :id,
  :number,
  :name,
  :description,
  :video_url,
  :publish_time,
  :free) do

  def initialize(attributes={})
    attributes.each do |key, value|
      self[key] = value
    end
  end
end

This is so that I can initialize Episode objects using key-value properties.

require "./episode"

e = Episode.new(number: 164, name: "Mapper")
e
# => #<struct RubyTapas::Episode
#     id=nil,
#     number=164,
#     name="Mapper",
#     description=nil,
#     video_url=nil,
#     publish_time=nil,
#     free=nil>
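
One consequence of this constructor is worth seeing in isolation. The following is a minimal sketch, repeating the Episode definition so it runs on its own. It shows that keys matching a Struct member are assigned, while a key that is not a member is rejected by Struct#[]= with a NameError. The :title key is a deliberately wrong key chosen for illustration only.

```ruby
# Same Episode definition as in the listing above, repeated so this
# sketch is self-contained.
Episode = Struct.new(
  :id,
  :number,
  :name,
  :description,
  :video_url,
  :publish_time,
  :free) do

  def initialize(attributes={})
    attributes.each do |key, value|
      self[key] = value
    end
  end
end

# Keys that match a member are assigned; the rest stay nil.
episode = Episode.new(number: 164, name: "Mapper")
episode.number      # => 164
episode.description # => nil

# :title is not a member of Episode (illustrative wrong key), so
# Struct#[]= raises NameError instead of silently ignoring it.
error = begin
  Episode.new(title: "164 Mapper")
  nil
rescue NameError => e
  e
end
error.class # => NameError
```

This means a typo in an attribute name fails loudly at construction time rather than producing an Episode with a silently missing value.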

At this point I have a class responsible for dealing with content post data in a DPD-centric way. And I have a class responsible for representing RubyTapas episodes. Now I need to somehow translate from one to the other.

I could make the Episode class responsible for knowing how to fill itself in using DPD data. But that sounds like cramming two distinct responsibilities into one class. That could lead to a lot of complication down the road.

Instead, I turn to the Mapper pattern. "Mapper", as defined in the book "Patterns of Enterprise Application Architecture", is a term for a broad family of objects whose role is to set up communications between two subsystems which need to stay ignorant of each other. In my case, I want my domain model class to remain ignorant of where its data comes from.

My EpisodeMapper will start out relatively simple. First of all, it will receive a gateway object on initialization, to use as the source of its data. It will provide an #all method, which should return a list of all episodes. For this method I choose a pattern that I sometimes like to use, where the method delegates to another enumerator-generating method, and then eagerly evaluates the enumerator by sending it the #to_a message. I'll talk more about why I opt for this pattern in a minute.

The heart of this class comes next. The #each_episode method starts out with a line that we first saw in episode #64. This line makes it possible to use this method either with or without a block. When no block is given, it will return an Enumerator. The latter is the form that the #all method makes use of.

Next it initializes a variable called last_number. This variable exists to help assign episode numbers to any episodes that are missing numbers in their titles.

The method then asks the gateway for its list of content posts. For each one, it does some munging on the post title to extract a separate episode number and name. Then it fills in a brand new Episode object with the extracted information. Finally, it yields the new Episode.

class EpisodeMapper
  attr_reader :gateway

  def initialize(gateway)
    @gateway = gateway
  end

  def all
    each_episode.to_a
  end

  def each_episode
    return to_enum(__callee__) unless block_given?
    last_number = 0
    gateway.content_post_list.each do |ep_summary|
      title_parts = /^(\d{3})\W+(.*)/.match(ep_summary[:title])
      if title_parts
        name   = title_parts[2]
        number = title_parts[1].to_i
      else
        name   = ep_summary[:title]
        number = last_number + 1
      end
      last_number = number
      episode = Episode.new(
        id:     ep_summary[:id],
        number: number,
        name:   name,
        publish_time: ep_summary[:publish_time])
      yield(episode)
    end
  end
end
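
Because the mapper only asks its gateway for #content_post_list, it can be exercised without touching DPD at all. The sketch below assumes a hypothetical FakeGateway class and made-up post data (the second title deliberately has no number, to trigger the last_number fallback). Episode and EpisodeMapper are repeated from the listings above so the example runs on its own.

```ruby
Episode = Struct.new(
  :id, :number, :name, :description,
  :video_url, :publish_time, :free) do

  def initialize(attributes={})
    attributes.each do |key, value|
      self[key] = value
    end
  end
end

class EpisodeMapper
  attr_reader :gateway

  def initialize(gateway)
    @gateway = gateway
  end

  def all
    each_episode.to_a
  end

  def each_episode
    return to_enum(__callee__) unless block_given?
    last_number = 0
    gateway.content_post_list.each do |ep_summary|
      title_parts = /^(\d{3})\W+(.*)/.match(ep_summary[:title])
      if title_parts
        name   = title_parts[2]
        number = title_parts[1].to_i
      else
        name   = ep_summary[:title]
        number = last_number + 1
      end
      last_number = number
      episode = Episode.new(
        id:     ep_summary[:id],
        number: number,
        name:   name,
        publish_time: ep_summary[:publish_time])
      yield(episode)
    end
  end
end

# Hypothetical stand-in for DPD::ContentPostGateway. The data is
# invented for illustration; the middle title has no episode number.
class FakeGateway
  def content_post_list
    [
      {id: 18, title: "001 Binary Literals",
       publish_time: Time.utc(2012, 9, 24)},
      {id: 20, title: "Large Integer Literals",
       publish_time: Time.utc(2012, 9, 26)},
      {id: 21, title: "003 Character Literals",
       publish_time: Time.utc(2012, 9, 28)}
    ]
  end
end

mapper  = EpisodeMapper.new(FakeGateway.new)
summary = mapper.all.map { |episode| [episode.number, episode.name] }
summary
# => [[1, "Binary Literals"],
#     [2, "Large Integer Literals"],
#     [3, "Character Literals"]]
```

The unnumbered post receives number 2 because it follows episode 1, which is the gap-filling behavior that last_number exists to provide.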

I use this mapper by plugging in a ContentPostGateway. When I then ask it for a list of all episodes, that's exactly what I get.

require "~/dev/rubytapas.com/environment"
require "ruby_tapas"
RubyTapas.configure do |config|
  config.log_file = "/dev/null"
end
gw = DPD::ContentPostGateway.new(
  RubyTapas.dpd_admin_session, 
  logger: RubyTapas.logger)

require "./episode"
require "./episode_mapper"
mapper = EpisodeMapper.new(gw)
mapper.all.take(10)
# => [#<struct Episode
#      id=18,
#      number=1,
#      name="Binary Literals",
#      description=nil,
#      video_url=nil,
#      publish_time=2012-09-24 09:00:00 -0400,
#      free=nil>,
#     #<struct Episode
#      id=20,
#      number=2,
#      name="Large Integer Literals",
#      description=nil,
#      video_url=nil,
#      publish_time=2012-09-26 09:00:00 -0400,
#      free=nil>,
#     #<struct Episode
#      id=21,
#      number=3,
#      name="Character Literals",
#      description=nil,
#      video_url=nil,
#      publish_time=2012-09-28 09:00:00 -0400,
#      free=nil>,
#     #<struct Episode
#      id=26,
#      number=4,
#      name="Barewords",
#      description=nil,
#      video_url=nil,
#      publish_time=2012-10-01 09:00:00 -0400,
#      free=nil>,
#     #<struct Episode
#      id=27,
#      number=5,
#      name="Array Literals",
#      description=nil,
#      video_url=nil,
#      publish_time=2012-10-03 09:00:00 -0400,
#      free=nil>,
#     #<struct Episode
#      id=31,
#      number=6,
#      name="Forwardable",
#      description=nil,
#      video_url=nil,
#      publish_time=2012-10-05 09:00:00 -0400,
#      free=nil>,
#     #<struct Episode
#      id=32,
#      number=7,
#      name="Constructors",
#      description=nil,
#      video_url=nil,
#      publish_time=2012-10-08 09:00:00 -0400,
#      free=nil>,
#     #<struct Episode
#      id=38,
#      number=8,
#      name="fetch as an Assertion",
#      description=nil,
#      video_url=nil,
#      publish_time=2012-10-10 09:00:00 -0400,
#      free=nil>,
#     #<struct Episode
#      id=39,
#      number=9,
#      name="Symbol Literals",
#      description=nil,
#      video_url=nil,
#      publish_time=2012-10-12 09:00:00 -0400,
#      free=nil>,
#     #<struct Episode
#      id=43,
#      number=10,
#      name="Finding $HOME",
#      description=nil,
#      video_url=nil,
#      publish_time=2012-10-15 09:00:00 -0400,
#      free=nil>]

There are over 150 RubyTapas episodes at present, but here I just wanted to look at the first 10, so I added #take(10). Fetching a limited subset of records is a pretty common scenario; for instance, consider the common case of paginated listings in a web app. However, in the background the code I wrote here still processed all 150-some-odd episodes before the #take method skimmed off the first 10.

As the code works right now, that's not too big a deal. The gateway.content_post_list message causes just one HTTP request to be made, which fetches the entire content post list at once. But as you may have noticed, it's not presently filling in all the attributes of an Episode. Filling in attributes like the detailed description is going to require making another HTTP request for every single episode. Multiply that by 150, and suddenly that call to EpisodeMapper#all could take a very, very long time.

This is why I prefer to base the #all method on another method that generates an Enumerator. As we've seen in episodes #59 and #60, a method that returns an Enumerator only runs for as many iterations as the caller actually consumes. So if I change the invocation to use .each_episode instead of .all, the output remains the same, but under the covers the #each_episode method only iterates 10 times instead of over a hundred. This could become an important capability down the road, and I didn't have to add much code to make it possible.

require "~/dev/rubytapas.com/environment"
require "ruby_tapas"
RubyTapas.configure do |config|
  config.log_file = "/dev/null"
end
gw = DPD::ContentPostGateway.new(
  RubyTapas.dpd_admin_session, 
  logger: RubyTapas.logger)

require "./episode"
require "./episode_mapper"
mapper = EpisodeMapper.new(gw)
mapper.each_episode.take(10)
# => [#<struct Episode
#      id=18,
#      number=1,
#      name="Binary Literals",
#      description=nil,
#      video_url=nil,
#      publish_time=2012-09-24 09:00:00 -0400,
#      free=nil>,
#     #<struct Episode
#      id=20,
#      number=2,
#      name="Large Integer Literals",
#      description=nil,
#      video_url=nil,
#      publish_time=2012-09-26 09:00:00 -0400,
#      free=nil>,
#     #<struct Episode
#      id=21,
#      number=3,
#      name="Character Literals",
#      description=nil,
#      video_url=nil,
#      publish_time=2012-09-28 09:00:00 -0400,
#      free=nil>,
#     #<struct Episode
#      id=26,
#      number=4,
#      name="Barewords",
#      description=nil,
#      video_url=nil,
#      publish_time=2012-10-01 09:00:00 -0400,
#      free=nil>,
#     #<struct Episode
#      id=27,
#      number=5,
#      name="Array Literals",
#      description=nil,
#      video_url=nil,
#      publish_time=2012-10-03 09:00:00 -0400,
#      free=nil>,
#     #<struct Episode
#      id=31,
#      number=6,
#      name="Forwardable",
#      description=nil,
#      video_url=nil,
#      publish_time=2012-10-05 09:00:00 -0400,
#      free=nil>,
#     #<struct Episode
#      id=32,
#      number=7,
#      name="Constructors",
#      description=nil,
#      video_url=nil,
#      publish_time=2012-10-08 09:00:00 -0400,
#      free=nil>,
#     #<struct Episode
#      id=38,
#      number=8,
#      name="fetch as an Assertion",
#      description=nil,
#      video_url=nil,
#      publish_time=2012-10-10 09:00:00 -0400,
#      free=nil>,
#     #<struct Episode
#      id=39,
#      number=9,
#      name="Symbol Literals",
#      description=nil,
#      video_url=nil,
#      publish_time=2012-10-12 09:00:00 -0400,
#      free=nil>,
#     #<struct Episode
#      id=43,
#      number=10,
#      name="Finding $HOME",
#      description=nil,
#      video_url=nil,
#      publish_time=2012-10-15 09:00:00 -0400,
#      free=nil>]
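
The difference in work done is invisible in the output, so here is a minimal sketch that makes it countable. It uses a hypothetical CountingSource class in place of the real mapper, with the same #all and to_enum structure, and a counter that records how many records were actually processed. The size of 150 is just a stand-in for the episode count.

```ruby
# Hypothetical stand-in for EpisodeMapper. It yields plain integers
# and counts how many of them it actually produces.
class CountingSource
  attr_reader :processed

  def initialize(size)
    @size      = size
    @processed = 0
  end

  # Eager: builds the complete list before returning.
  def all
    each_record.to_a
  end

  # On demand: without a block, returns an Enumerator over this method.
  def each_record
    return to_enum(__callee__) unless block_given?
    1.upto(@size) do |n|
      @processed += 1
      yield(n)
    end
  end
end

eager = CountingSource.new(150)
eager.all.take(10)
eager.processed # => 150

on_demand = CountingSource.new(150)
on_demand.each_record.take(10)
on_demand.processed # => 10
```

Enumerable#take stops iterating as soon as it has collected the requested number of elements, which is why the second counter stops at 10. If each iteration involved an HTTP request, that would be 10 requests instead of 150.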

I now have three classes: a DPD content post gateway, an Episode representation, and a mapper that knows how to map from one to the other. The mapper doesn't do much yet. In the future I'll add more functionality, such as the ability to fetch individual episodes by ID or number. But this separation of concerns feels like a healthy foundation to build on as I flesh this system out.

And that's it for today. Happy hacking!
