Unit 1, Lesson 1

Gem Logging

Video transcript & code

If I say "let's talk about using logs to diagnose problems", there are a couple of ideas that probably come to mind immediately. You probably think about adding logging statements to your own code, in order to keep track of what it's doing at runtime. And you probably also think about reading and analyzing framework logging output: for instance, logs of client requests or database queries.

But there's another rich vein of data that we sometimes forget to tap into, and that's log messages from third-party libraries.

Let's say I want to get a list of the books I own for my Kobo e-reader. There's no API that I know of, so I have to use screen-scraping to get at the information.

Here's a program which does just that.

I've tried to shrink this example code down to the bare essentials, but it's still a bit long. Let's go over it a piece at a time.

First, we require the libraries we'll be using. These are all libraries we've met in previous episodes.

We talked about the mechanize screen-scraping gem in episode #100.

We talked about the moneta library for interfacing with key/value stores in episode #67.

stringio is a standard library we've mentioned in several episodes. It's a way to simulate an IO object like a file using a Ruby string.
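As a quick refresher, a StringIO behaves like a file opened on an in-memory string, for both reading and writing:

```ruby
require "stringio"

# Reading: StringIO responds to the usual IO read methods.
io = StringIO.new("line one\nline two\n")
io.gets        # => "line one\n"

# Writing: output accumulates in a string we can retrieve with #string.
out = StringIO.new
out.puts "captured"
out.string     # => "captured\n"
```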

And then there's the logger library, which we met in episode #339 and elaborated on in many later episodes.

We're also going to use the UnpagingFacade class that we built in episode #347, to help us work with data spread across multiple web pages.
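The UnpagingFacade source isn't reproduced in this lesson, but based on how it's used below — constructed with page_size: and page_fetcher: keyword arguments, then indexed with [] — a minimal version might look something like this. This is a hypothetical reconstruction for context, not the actual episode #347 code:

```ruby
# Hypothetical sketch: maps a flat index onto a (page, offset) pair,
# fetching and caching pages on demand via the injected page_fetcher.
class UnpagingFacade
  def initialize(page_size:, page_fetcher:)
    @page_size    = page_size
    @page_fetcher = page_fetcher
    @pages        = {}
  end

  def [](index)
    page_num, offset = index.divmod(@page_size)
    page = (@pages[page_num] ||= @page_fetcher.call(page_num))
    page[offset]
  end
end
```

With a page size of 10, asking for index 10 would request zero-based page 1 from the fetcher and return that page's first element.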

Our code starts by creating a logger.

Then it sets up a connection to a key-value store, using Moneta.

We're going to use this store as a way to persist our website session, so we don't have to log in over and over again. In production applications I've used a key-value database, such as Redis, to store session information. But here we're just using a local YAML file. Regardless of back-end, Moneta gives us a consistent API.

The session information is stored under the "cookies" key.

Next, we create a Mechanize agent to do the screen-scraping.

If we found cookie information, we note that we are re-using an existing session.

We load the cookie data into the agent, with the help of a StringIO.

Otherwise, if no cookie data is found, we log that we are starting a new session.

Then we use the Mechanize agent to go to a login page, fill in the requisite fields, submit the form, and finally save the new session information back to the key-value store.

Once we have a logged-in agent, we create a page-fetching lambda, very similar to the one we wrote in episode #347.

We plug this page fetcher into a new UnpagingFacade.

And then, finally, we use the facade object to index into the list of scraped book titles.

require "mechanize"
require "moneta"
require "stringio"
require "logger"
require "./unpaging_facade"

logger = Logger.new($stdout)

store = Moneta.new(:YAML, file: "store.yaml")
cookies = store[:cookies]

agent = Mechanize.new

if cookies
  logger.info "Re-using session"
  cookies_file = StringIO.new(cookies)
  agent.cookie_jar.load_cookiestxt(cookies_file)
else
  logger.info "Starting new session"
  page = agent.get("https://secure.kobobooks.com/auth/Kobo/login")
  form = page.forms[0]
  form.field_with(name: "EditModel.Email").value = ENV.fetch("KOBO_EMAIL")
  form.field_with(name: "EditModel.Password").value = ENV.fetch("KOBO_PASSWORD")
  agent.submit(form)

  cookies_file = StringIO.new
  agent.cookie_jar.dump_cookiestxt(cookies_file)
  store["cookies"] = cookies_file.string
end


fetcher = ->(page_num) {
  page_url = "https://www.kobobooks.com/library/purchased/index.html?t={1}}&p=1&s=titleasc&g=both"
  page = agent.get(page_url)
  book_elts = page.search("ul.LibraryItemList h3 a")
  titles = book_elts.map{|elt| elt.text}
  titles
}

titles = UnpagingFacade.new(page_size: 10, page_fetcher: fetcher)

titles[1]
# => "Ancillary Mercy"

Now we run this program a few times, and discover a problem: every time we run it, we are getting the log message "Starting new session".

I, [2016-03-10T10:39:22.011404 #31249]  INFO -- : Starting new session

Why aren't we re-using saved session information after the first time?

We could pore over the program text. We could fire up a debugger. We could add lots of temporary puts statements.

Or, we could give the Moneta object access to our logger object.

When we run the program again, we can quickly spot the problem: we're storing under the string "cookies", but we're loading using the symbol :cookies.

# ...
store = Moneta.new(:YAML, file: "store.yaml", logger: logger)
# ...

# >> Moneta load(:cookies) -> nil
# >> I, [2016-03-10T10:41:45.003929 #4686]  INFO -- : Starting new session
# >> Moneta store("cookies", "# HTTP Cookie File\n.kobobooks...) -> "# HTTP C...
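The root cause is the everyday Ruby distinction between string and symbol keys — Moneta doesn't treat them as interchangeable, any more than a plain Hash does:

```ruby
store = {}
store["cookies"] = "session data"
store[:cookies]   # => nil -- the symbol is a different key entirely
store["cookies"]  # => "session data"
```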

We fix this by changing the key used to load cookies from a symbol to a string.

But after playing around some more, we discover a new problem: our code can't seem to retrieve more than the first page of 10 book titles. After that it just cycles around.

# ...
cookies = store["cookies"]
# ...

titles[0]
# => "Abaddon's Gate"
titles[10]
# => "Abaddon's Gate"

Once again, there are various ways we could painstakingly debug our code. Or, we could give the mechanize library access to our logger, and let it tell us more about what's going on.

Plugging our logger into mechanize works a little differently than it did for Moneta. After instantiating an agent, we assign our logger to its log attribute.

When we run the code again, we get a massive amount of logging information from mechanize. There's information on every request, every response header, every cookie set, and more.

It turns out we don't need quite this level of detail to debug this particular problem.

We change the logger's level to only show messages of INFO priority or above.

When we run it again, we have a much more manageable amount of information to work with.

And as we compare the two HTTP requests our code triggered, we realize something: it requested the exact same URL each time.

# ...
logger = Logger.new($stdout)
logger.level = Logger::INFO

# ...

agent = Mechanize.new
agent.log = logger

# ...

# >> Moneta load("cookies") -> "# HTTP Cookie File\n.kobobooks...
# >> I, [2016-03-10T10:57:54.475061 #9203]  INFO -- : Re-using session
# >> I, [2016-03-10T10:57:54.478870 #9203]  INFO -- : Net::HTTP::Get: /library/purchased/index.html?t=%7B1%7D%7D&p=1&s=titleasc&g=both
# >> I, [2016-03-10T10:57:55.111174 #9203]  INFO -- : status: Net::HTTPOK 1.1 200 OK
# >> I, [2016-03-10T10:57:55.159781 #9203]  INFO -- : Net::HTTP::Get: /library/purchased/index.html?t=%7B1%7D%7D&p=1&s=titleasc&g=both
# >> I, [2016-03-10T10:57:55.548230 #9203]  INFO -- : status: Net::HTTPOK 1.1 200 OK

So we go back to our code, and sure enough: we forgot to actually interpolate the page_num variable into our page URL anywhere.

page_url = "https://www.kobobooks.com/library/purchased/index.html?t={1}}&p=1&s=titleasc&g=both"

No matter what page is requested, it will always return data from page 1.

When we fix this, we are able to get to the second page of titles.

# ...
fetcher = ->(page_num) {
  page_url = "https://www.kobobooks.com/library/purchased/index.html?t={1}&p=#{page_num+1}&s=titleasc&g=both"
  page = agent.get(page_url)
  book_elts = page.search("ul.LibraryItemList h3 a")
  titles = book_elts.map{|elt| elt.text}
  titles
}
# ...

titles[0]
# => "Abaddon's Gate"
titles[10]
# => "House of Suns"

When we're deep in the throes of debugging a problem, we can sometimes get so focused we fail to notice or use all of the resources available to us. When dealing with code that interacts with third-party libraries, the logging built-in to those libraries can be a goldmine of useful information. And often all we need to do to tap into that information is to configure the library with a logger object.

My recommendation is this: When you start to use a new Ruby library in your code, look to see if there is a way to pass a logger in. If there is, configure the library to log to your standard logging object as part of your program's initialization. Do this before you run into problems, so that when something does go wrong, you'll be able to comb through the logs and know, not just what the framework and your own application code were doing, but also what was going on in the libraries your code makes use of.

In fact, I'd go so far as to say that robust internal logging is one of the criteria for gauging the quality of a third-party library. If you're evaluating a gem for potential use, take a look at its code. Does it make use of a logger object, and log useful data about notable events? Does it have a configurable logging option so you can plug your own logger in?

This isn't the only criterion a library should be judged on. But copious and configurable internal logging is a big mark in any gem's favor.

And that's all I have for today. Happy hacking!
