Client Session Object
Video transcript & code
In episode 100 I showed you how I had built a screen-scraping gateway class to encapsulate the details of communicating with the service I use to manage RubyTapas subscriptions. I spent most of my time on the #content_post_list method. It starts out by instantiating an agent object from the Mechanize screen-scraping gem. Then it steps the agent through the process of logging in like a human user. Only then can it proceed to download the episode list and extract the data I'm interested in.
# Assumes the mechanize gem and Ruby's 'time' library are loaded.
def content_post_list
  # Log in by filling out the login form, just as a human would.
  agent = Mechanize.new
  login_page = agent.get('https://getdpd.com/login')
  form = login_page.form_with(action: '/login')
  form.username = @login
  form.password = @password
  home_page = agent.submit(form)
  unless home_page.title =~ /^Dashboard/
    raise "DPD admin session login failed for user #{@login}"
  end
  # Only now can we fetch the page that lists the content posts.
  list_page = agent.get('https://getdpd.com/plan')
  content_post_table = list_page.search('table').detect { |t|
    headings = t.search('th').map(&:text)
    headings == ['Name', 'Release Date']
  }
  content_post_rows = content_post_table.search('tbody tr')
  # Skip the table's final row, and walk the rest in reverse order.
  content_post_rows[0..-2].reverse_each.map { |row|
    columns = row.search('td')
    title = columns[0].text.strip
    published_at = columns[1].text.strip
    show_path = columns[0].at('a')['href']
    show_url = URI.join('https://getdpd.com', show_path)
    # The numeric post ID is carried in the link's query string.
    id = show_url.query[/post_id=(\d+)/, 1]
    {
      title: title,
      published_at: Time.parse(published_at),
      show_url: show_url.to_s,
      id: id.to_i
    }
  }
end
This method is really long, and almost half of it is taken up with login logic that is only tangentially related to getting a list of content posts. Nor is it obvious to the casual reader where the login processing stops and the reading of the target data begins.
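One way to see where the login processing stops is to pull the data-extraction half out into something that never touches the network. The sketch below is a hypothetical helper, build_post_summary, that takes the text already scraped from one table row and builds the same hash the loop builds. The sample title, date, and /plan/showpost?post_id=42 path are made up for illustration; the real DPD markup may differ.

```ruby
require 'uri'
require 'time'

# Hypothetical helper: the data-extraction half of #content_post_list as a
# pure function. It receives the text of one table row and knows nothing
# about logging in or HTTP.
def build_post_summary(title, published_at, show_path)
  show_url = URI.join('https://getdpd.com', show_path)
  # The numeric post ID lives in the query string, e.g. "post_id=42".
  id = show_url.query[/post_id=(\d+)/, 1]
  {
    title: title.strip,
    published_at: Time.parse(published_at.strip),
    show_url: show_url.to_s,
    id: id.to_i
  }
end

summary = build_post_summary(' Client Session Object ', '2013-06-14',
                             '/plan/showpost?post_id=42')
summary[:id]       # => 42
summary[:show_url] # => "https://getdpd.com/plan/showpost?post_id=42"
```

Because it is a pure function, it can be exercised without a DPD account or a Mechanize agent.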
Since writing this method, I've also added another method called #find_content_post_by_id. This method takes a content post ID, looks up the ID in the list of posts returned by #content_post_list, and finds the URL for a post "edit" page that includes various details for that post, details which aren't included in the main post list. Then it proceeds to fetch that page, extract those details, and return them.
def find_content_post_by_id(id)
  # Find the post's summary, then derive its "edit" page URL from it.
  content_post_summary = content_post_list.detect { |ep| ep[:id] == id }
  return nil unless content_post_summary
  show_url = content_post_summary[:show_url]
  edit_url = show_url.sub(%r(/showpost/), '/editpost/')
  # ??? Get the contents of the edit_page somehow ???
  post_form = edit_page.form_with(id: 'postform')
  # Publish times read as month/day/year hour:minute (needs 'date').
  published_at = post_form.published_at.strip
  publish_time_parts = DateTime._strptime(published_at,
                                          '%m/%d/%Y %k:%M')
  publish_time = Time.new(
    *publish_time_parts.values_at(:year, :mon, :mday, :hour, :min))
  send_publish_email_value = edit_page.at('#send_publish_email')[:checked]
  {
    title: post_form.title,
    content: post_form.content,
    synopsis: post_form.synopsis,
    publish_time: publish_time,
    send_email: send_publish_email_value == 'checked'
  }
end
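The least obvious step in that method is turning the edit form's publish time into a Ruby Time. Here is a standalone sketch of just that step, using the same DateTime._strptime format string. The helper name parse_publish_time, the sample timestamps, and the nil check are my own additions: _strptime returns nil when the string doesn't match, which the method above would otherwise trip over on the very next line.

```ruby
require 'date'

# Hypothetical helper: parse "month/day/year hour:minute" into a local Time,
# mirroring the _strptime step in #find_content_post_by_id.
def parse_publish_time(published_at)
  parts = DateTime._strptime(published_at.strip, '%m/%d/%Y %k:%M')
  # _strptime returns nil rather than raising when the format doesn't match.
  raise ArgumentError, "unrecognized publish time: #{published_at}" unless parts
  # With no zone argument, Time.new builds the time in the local zone.
  Time.new(*parts.values_at(:year, :mon, :mday, :hour, :min))
end

parse_publish_time('06/14/2013 9:05').strftime('%Y-%m-%d %H:%M')
# => "2013-06-14 09:05"
```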
As you can see I've left out the part of this method that actually fetches the content of the post edit page. I have a few options for filling this part in.
First, I could just copy and paste the lines for logging in and requesting a page using a Mechanize agent from the #content_post_list method. But I probably don't need to tell you that using copy and paste is a bad idea. Down that road lies unnecessary duplication and a maintenance nightmare.
It's tempting to simply extract the login-and-fetch logic to its own method, and call that method here. But the more I think about it, the more I don't like this idea. Fetching the login page, filling in the form, and submitting it is a very slow process. Normally I avoid premature optimization, but in this case I know that logging in can easily take several seconds. Do I really want to add all that overhead to every request for content post details?
And it really shouldn't be necessary to log in multiple times in a single session. After all, I don't have to log in over and over when I open the DPD site in my browser. Once I've logged in, my browser holds a cookie that lets the DPD site know that I have an authenticated session. Why should it be any different for this gateway class? And in fact, maintaining client state, cookies included, is precisely what a Mechanize agent is for.
I also think about the fact that logging in is a separate concern from looking up info about content posts. I would need to use the same login logic for any other admin operations, such as listing RubyTapas subscribers.
It feels like I've identified a distinct responsibility—logging in as a DPD admin—which has its own associated state. When I put it this way, it starts to sound like this belongs in a separate object.
So I create a new class, called AdminSession, inside the DPD module. The initializer saves a login and password, and initializes a Mechanize agent. It also configures logging for the agent.
The #agent method first checks to make sure the session has been established, and then returns the agent object. This way the actual logging in is deferred until it is first needed.
The #established? predicate method simply checks a status variable.
The core of this class is the private #establish method, which performs the simulated user login that I had originally done inside the #content_post_list method. Once the login has succeeded, it sets the @status variable to indicate as much.
module DPD
  # Assumes the mechanize gem and Ruby's 'logger' library are loaded.
  class AdminSession
    def initialize(login, password, options={})
      @login = login
      @password = password
      @logger = options.fetch(:logger) { Logger.new($stderr) }
      @agent = Mechanize.new
      @agent.log = @logger
    end
    # Log in lazily: the first caller pays for the login, later callers
    # get the same already-authenticated agent.
    def agent
      establish unless established?
      @agent
    end
    def established?
      @status == :established
    end
    private
    # Simulate a human user submitting the login form.
    def establish(login=@login, password=@password)
      login_page = @agent.get('https://getdpd.com/login')
      form = login_page.form_with(action: '/login')
      form.username = login
      form.password = password
      home_page = @agent.submit(form)
      unless home_page.title =~ /^Dashboard/
        raise "DPD admin session login failed for user #{login}"
      end
      @status = :established
    end
  end
end
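To watch the lazy login happen without hitting getdpd.com, here is a runnable sketch with the same shape as AdminSession. FakeLoginAgent stands in for Mechanize, its log_in method replaces the form-filling steps, and the agent: option is a hypothetical seam for injecting it; none of these exist in the episode's code.

```ruby
# Hypothetical stand-in for a Mechanize agent: "logging in" just counts
# attempts and returns the title of the page we would land on.
class FakeLoginAgent
  attr_reader :login_count

  def initialize
    @login_count = 0
  end

  def log_in(username, password)
    @login_count += 1
    password == 'secret' ? 'Dashboard' : 'Login'
  end
end

# Same structure as DPD::AdminSession, minus the network.
class SketchAdminSession
  def initialize(login, password, options = {})
    @login = login
    @password = password
    @agent = options.fetch(:agent) { FakeLoginAgent.new } # hypothetical seam
  end

  # Log in on first use only; afterwards hand back the same agent.
  def agent
    establish unless established?
    @agent
  end

  def established?
    @status == :established
  end

  private

  def establish
    title = @agent.log_in(@login, @password)
    unless title =~ /^Dashboard/
      raise "DPD admin session login failed for user #{@login}"
    end
    @status = :established
  end
end

session = SketchAdminSession.new('admin', 'secret')
session.established?      # => false
3.times { session.agent }
session.agent.login_count # => 1
```

However many times #agent is called, the login runs once, and a failed login raises just as #establish does.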
Now that I've represented a DPD admin session as its own object, I can retrofit the ContentPostGateway to collaborate with a session object.
Instead of initializing the gateway with a username and password, I'll pass a session object in. I add a private #agent method, which simply delegates to the session's #agent method. All of the login logic in #content_post_list goes away. The agent that was once a local variable is now a call to the method I just defined. And in #find_content_post_by_id, I also use the agent to fetch the needed post edit page.
module DPD
  class ContentPostGateway
    def initialize(session, options={})
      @session = session
      @logger = options.fetch(:logger) { Logger.new($stderr) }
    end
    def find_content_post_by_id(id)
      content_post_summary = content_post_list.detect { |ep| ep[:id] == id }
      return nil unless content_post_summary
      show_url = content_post_summary[:show_url]
      edit_url = show_url.sub(%r(/showpost/), '/editpost/')
      # The session's already-logged-in agent fetches the edit page.
      edit_page = agent.get(edit_url)
      # ...
    end
    def content_post_list
      # No login code here any more; the session takes care of it.
      list_page = agent.get('https://getdpd.com/plan')
      # ...
    end
    private
    # ...
    # Delegate to the session, which logs in on first use.
    def agent
      @session.agent
    end
  end
end
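The private #agent method above is a one-line delegator. Ruby's standard Forwardable module can generate the same delegation; this is a swapped-in idiom, not what the episode uses. StubSession and EchoAgent are hypothetical stand-ins so the sketch runs on its own.

```ruby
require 'forwardable'

# Hypothetical stand-ins for a logged-in session and its agent.
StubSession = Struct.new(:agent)

class EchoAgent
  def get(url)
    "page at #{url}"
  end
end

class DelegatingGateway
  extend Forwardable

  # Defines #agent, which returns @session.agent; then hide it, as the
  # hand-written version in ContentPostGateway is private.
  def_delegator :@session, :agent
  private :agent

  def initialize(session)
    @session = session
  end

  def fetch(url)
    agent.get(url)
  end
end

gateway = DelegatingGateway.new(StubSession.new(EchoAgent.new))
gateway.fetch('https://getdpd.com/plan') # => "page at https://getdpd.com/plan"
```

For a single delegated method, writing it out by hand is arguably just as clear; Forwardable pays off when several methods are forwarded to the same collaborator.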
My responsibilities are now better separated, and I've ensured that a given ContentPostGateway will only ever log in once, which should save my program considerable time. I've laid the groundwork for re-using the session object among multiple gateway objects. And if, in the future, I decide to persist DPD session cookies across requests to the rubytapas.com app, I've ensured that I'll only have to make that change in one place.
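Re-using one session among several gateways might look like the following sketch. CountingSession plays the part of DPD::AdminSession but memoizes its agent with ||= instead of a status variable, and SubscriberGateway (with its /subscribers URL) is invented to echo the subscriber-listing example; none of it comes from the RubyTapas code.

```ruby
# Hypothetical agent that "fetches" pages by echoing the URL.
class StubPageAgent
  def get(url)
    "page at #{url}"
  end
end

# Stand-in for DPD::AdminSession: logs in (here, just counts) on first use.
class CountingSession
  attr_reader :login_count

  def initialize
    @login_count = 0
    @agent = nil
  end

  def agent
    @agent ||= begin
      @login_count += 1 # the slow login would happen here, once
      StubPageAgent.new
    end
  end
end

class PostListGateway
  def initialize(session)
    @session = session
  end

  def list_page
    @session.agent.get('https://getdpd.com/plan')
  end
end

# Hypothetical second gateway; the URL is made up.
class SubscriberGateway
  def initialize(session)
    @session = session
  end

  def subscriber_page
    @session.agent.get('https://getdpd.com/subscribers')
  end
end

session = CountingSession.new
posts = PostListGateway.new(session)
subscribers = SubscriberGateway.new(session)
posts.list_page
subscribers.subscriber_page
posts.list_page
session.login_count # => 1
```

Both gateways see the same agent, so with a real Mechanize agent the second gateway would reuse the cookies from the first login.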
Notice that once AdminSession has initialized an agent, it doesn't try to mediate calls to that agent. AdminSession is not intended to be yet another abstraction layer for communicating with the DPD site; we already have the ContentPostGateway for that. I've limited this class's responsibilities to setting up a logged-in session. Once it has taken care of that, it exposes its agent to the ContentPostGateway to be used directly.
As I promised the last time we talked about this subject, the next time we return to the RubyTapas.com code I'll dig into how I mapped from the DPD business domain to the RubyTapas business domain. Until then, happy hacking!