Client Session Object

Video transcript & code

In episode 100 I showed you how I had built a screen-scraping gateway class to encapsulate the details of communicating with the service I use to manage RubyTapas subscriptions. I spent most of my time on the #content_post_list method. It starts out by instantiating an agent object from the Mechanize screen-scraping gem. Then it steps the agent through the process of logging in like a human user. Only then can it proceed to download the episode list and extract the data I'm interested in.

def content_post_list
  agent         = Mechanize.new
  login_page    = agent.get('https://getdpd.com/login')
  form          = login_page.form_with(action: '/login')
  form.username = @login
  form.password = @password
  home_page     = agent.submit(form)
  unless home_page.title =~ /^Dashboard/
    raise "DPD admin session login failed for user #{@login}"
  end
  list_page = agent.get('https://getdpd.com/plan')
  content_post_table = list_page.search('table').detect { |t|
    headings = t.search('th').map(&:text)
    headings == ['Name', 'Release Date']
  }
  content_post_rows  = content_post_table.search('tbody tr')
  content_post_rows[0..-2].reverse_each.map { |row|
    columns      = row.search('td')
    title        = columns[0].text.strip
    published_at = columns[1].text.strip
    show_path    = columns[0].at('a')['href']
    show_url     = URI.join('https://getdpd.com', show_path)
    id           = show_url.query[/post_id=(\d+)/, 1]
    {
      title:        title,
      published_at: Time.parse(published_at),
      show_url:     show_url.to_s,
      id:           id.to_i
    }
  }
end

This method is really long. And almost half of it is taken up with login logic that is only tangentially related to getting a list of content posts. Nor is it obvious to the casual reader where the login processing stops, and the reading of the target data begins.

Since writing this method, I've also added another method called #find_content_post_by_id. This method takes a content post ID, looks up the ID in the list of posts returned by #content_post_list, and finds the URL for a post "edit" page. That page includes various details for the post which aren't included in the main post list. The method then fetches that page, extracts those details, and returns them.

def find_content_post_by_id(id)
  content_post_summary = content_post_list.detect { |ep| ep[:id] == id }
  return nil unless content_post_summary
  show_url                 = content_post_summary[:show_url]
  edit_url                 = show_url.sub(%r(/showpost/), '/editpost/')

  # ??? Get the contents of the edit_page somehow ???

  post_form                = edit_page.form_with(id: 'postform')
  published_at             = post_form.published_at.strip
  publish_time_parts       = DateTime._strptime(published_at,
                                                '%m/%d/%Y %k:%M')
  publish_time             = Time.new(
      *publish_time_parts.values_at(:year, :mon, :mday, :hour, :min))
  send_publish_email_value = edit_page.at('#send_publish_email')[:checked]
  {
      title:        post_form.title,
      content:      post_form.content,
      synopsis:     post_form.synopsis,
      publish_time: publish_time,
      send_email:   send_publish_email_value == 'checked'
  }
end
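
The date handling in the middle of this method is easy to skim past, so here is a minimal standalone sketch of just that step. The sample timestamp is made up for illustration; it assumes the form field holds text in the same month/day/year format that the method parses.

```ruby
require 'date'

# Hypothetical sample value; the real field content may differ.
published_at = '03/14/2013 14:30'

# _strptime returns a Hash of the parsed fields rather than a DateTime.
parts = DateTime._strptime(published_at, '%m/%d/%Y %k:%M')

# values_at pulls the fields out in the order Time.new expects.
publish_time = Time.new(*parts.values_at(:year, :mon, :mday, :hour, :min))
```

Because the Time is built from individual parts with no zone given, it is interpreted in the local time zone of the machine running the code.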

As you can see, I've left out the part of this method that actually fetches the content of the post edit page. I have a few options for filling this part in.

First, I could just copy and paste the lines for logging in and requesting a page using a Mechanize agent from the #content_post_list method. But I probably don't need to tell you that using copy and paste is a bad idea. Down that road lies unnecessary duplication and a maintenance nightmare.

It's tempting to simply extract the login-and-fetch logic to its own method, and call that method here. But the more I think about it, the more I don't like this idea. Fetching the login page, filling in the form, and submitting it is a very slow process. Normally I avoid premature optimization, but in this case I know that logging in can easily take several seconds. Do I really want to add all that overhead to every request for content post details?

And it really shouldn't be necessary to log in multiple times in a single session. After all, I don't have to log in over and over when I open the DPD site in my browser. After I've logged in once, I have a cookie in my browser that lets the DPD site know that I have an authenticated session. Why should it be any different for this gateway class? And in fact, maintaining client state is precisely why Mechanize has an "agent" class.

I also think about the fact that logging in is a separate concern from looking up info about content posts. I would need to use the same login logic for any other admin operations, such as listing RubyTapas subscribers.

It feels like I've identified a distinct responsibility—logging in as a DPD admin—which has its own associated state. When I put it this way, it starts to sound like this belongs in a separate object.

So I create a new class, called AdminSession, inside the DPD module. The initializer saves a login and password, and initializes a Mechanize agent. It also configures logging for the agent.

The #agent method first checks to make sure the session has been established, and then returns the agent object. This way the actual logging in is deferred until it is first needed.

The #established? predicate method simply checks a status variable.

The core of this class is the #establish method, which performs the simulated user login that I had originally done inside the #content_post_list method. Once it has succeeded, it sets the @status variable to indicate as much.

require 'logger'
require 'mechanize'

module DPD
  class AdminSession
    def initialize(login, password, options={})
      @login        = login
      @password     = password
      @logger       = options.fetch(:logger) { Logger.new($stderr) }
      @agent        = Mechanize.new
      @agent.log    = @logger
    end

    def agent
      establish unless established?
      @agent
    end

    def established?
      @status == :established
    end

    private

    def establish(login=@login, password=@password)
      login_page    = @agent.get('https://getdpd.com/login')
      form          = login_page.form_with(action: '/login')
      form.username = login
      form.password = password
      home_page     = @agent.submit(form)
      unless home_page.title =~ /^Dashboard/
        raise "DPD admin session login failed for user #{login}"
      end
      @status = :established
    end

  end
end
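
To see the deferred login behave as described without touching the network, here is a stripped-down sketch of the same lazy shape. FakeAgent and LazySession are hypothetical stand-ins invented for this illustration; they are not part of the DPD code or of Mechanize.

```ruby
# Hypothetical stand-in for a Mechanize agent: it only counts logins.
class FakeAgent
  attr_reader :login_count

  def initialize
    @login_count = 0
  end

  # Stands in for fetching and submitting the login form.
  def log_in
    @login_count += 1
  end
end

# Hypothetical, simplified version of the session's lazy structure.
class LazySession
  def initialize(agent)
    @agent = agent
  end

  # Establishes the session on first access, then just returns the agent.
  def agent
    establish unless established?
    @agent
  end

  def established?
    @status == :established
  end

  private

  def establish
    @agent.log_in
    @status = :established
  end
end

fake    = FakeAgent.new
session = LazySession.new(fake)

session.established?  # false: nothing has happened yet
session.agent         # first access triggers the login
session.agent         # later accesses reuse the established session
fake.login_count      # 1
```

Because the check lives inside #agent, callers never need to know whether a login has already happened.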

Now that I've represented a DPD admin session as its own object, I can retrofit the ContentPostGateway to collaborate with a session object.

Instead of initializing it with a username and password, I'll be passing a session object in. I add an #agent method, which simply delegates to the session's #agent method. All of the login logic in #content_post_list goes away. Where agent had once been a local variable, it is now a call to the method I just defined. And in #find_content_post_by_id, I also use the agent to fetch the needed post edit page.

module DPD
  class ContentPostGateway
    def initialize(session, options={})
      @session = session
      @logger  = options.fetch(:logger) { Logger.new($stderr) }
    end

    def find_content_post_by_id(id)
      content_post_summary = content_post_list.detect { |ep| ep[:id] == id }
      return nil unless content_post_summary
      show_url                 = content_post_summary[:show_url]
      edit_url                 = show_url.sub(%r(/showpost/), '/editpost/')
      edit_page                = agent.get(edit_url)
      # ...
    end

    def content_post_list
      list_page          = agent.get('https://getdpd.com/plan')
      # ...
    end

    private

    # ...

    def agent
      @session.agent
    end
  end
end

My responsibilities are now better separated, and I've ensured that a given ContentPostGateway will only ever log in once—which should save my program considerable time. I've laid the groundwork for re-using the session object among multiple gateway objects. And if, in the future, I decide to persist DPD session cookies across requests to the rubytapas.com app, I've ensured that I'll only have to make that change in one place.

Notice that once AdminSession has initialized an agent, it doesn't try to mediate calls to that agent. This class is not intended to be yet another abstraction layer for communicating with the DPD site; we already have the ContentPostGateway for that. I've limited this class's responsibilities to setting up a logged-in session. Once it has taken care of that, it exposes its agent to the ContentPostGateway to be used directly.
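
The idea of re-using one session among several gateways can be sketched roughly as follows. CountingSession, PostGateway, and SubscriberGateway are hypothetical names made up for this illustration, and the agent is a bare placeholder object rather than a real Mechanize agent. Only the shape of the collaboration mirrors the gateway shown earlier.

```ruby
# Hypothetical session that counts how many times it "logs in".
class CountingSession
  attr_reader :logins

  def initialize
    @logins = 0
  end

  # Logs in on first use, then hands back the same agent object.
  def agent
    @agent ||= begin
      @logins += 1
      Object.new # placeholder for a logged-in agent
    end
  end
end

# Two hypothetical gateways that both delegate to the shared session.
# #agent is left public here only so the sharing is easy to inspect.
class PostGateway
  def initialize(session)
    @session = session
  end

  def agent
    @session.agent
  end
end

class SubscriberGateway
  def initialize(session)
    @session = session
  end

  def agent
    @session.agent
  end
end

session     = CountingSession.new
posts       = PostGateway.new(session)
subscribers = SubscriberGateway.new(session)

posts.agent.equal?(subscribers.agent) # true: both share one agent
session.logins                        # 1
```

Since each gateway only asks the session for its agent, adding another gateway does not add another login.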

As I promised the last time we talked about this subject, the next time we return to the RubyTapas.com code I'll dig into how I mapped from the DPD business domain to the RubyTapas business domain. Until then, happy hacking!
