Unit 1, Lesson 21

Unpaging Facade

Once upon a time, building web apps was simple. Then our apps had to start talking to other web apps, and nothing has been quite as nice since.

Case in point: often, we find we need to get lists of things from other web services. For instance, we might need to get a list of videos from a media hosting service.

It would be great if we could just make a request and get the whole list. But if there are a thousand videos to list, that might not be practical, either for the service or for us. On their end, they probably don't want to be bogged down with long, massive queries all the time. And on our end, we may only need to present or process a slice of the collection at a time.

And so, what we often end up dealing with are paged APIs. In a paged API, we can only request one page of items at a time. Either there is a fixed page size defined by the foreign service, or we can specify how large we want our pages to be. But even when we get to choose a page size, there is usually an upper limit. After all, the whole point of paging is to limit the load placed on the service, so it wouldn't make sense to let clients specify an arbitrarily large page size.
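
To make this concrete, here is a minimal sketch of what a paged service looks like from the client's side. The FakePagedService class, its #page method, and the MAX_PER_PAGE limit are all hypothetical stand-ins invented for illustration; they don't describe any real service's API.

```ruby
# Hypothetical in-memory stand-in for a remote paged service.
class FakePagedService
  MAX_PER_PAGE = 100 # assumed upper limit; real services choose their own

  def initialize(items)
    @items = items
  end

  # Returns one page of items. In this sketch pages are numbered from 1,
  # and per_page is silently capped at MAX_PER_PAGE.
  def page(number, per_page: 10)
    per_page = [per_page, MAX_PER_PAGE].min
    @items[(number - 1) * per_page, per_page] || []
  end
end

service = FakePagedService.new((1..250).to_a)
service.page(1).size                  # => 10
service.page(3, per_page: 1000).size  # => 50 (request capped at 100 per page)
service.page(4, per_page: 1000)       # => [] (past the end)
```

No matter how large a page we ask for, we never get more than the service's limit in one response, which is why the client has to think in pages at all.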

Paging is one of those things that can quickly complicate gateway objects. Code that mixes HTTP request details with paging logic can test our abilities to keep track of multiple concerns at once.

Fortunately, paging is also a problem that lends itself to being handled orthogonally from the rest of our gateway logic. Let's look at one way we might go about it.

First, let's define a class to act as a facade over a paged list of some kind. We'll call it an UnpagingFacade, since its job is to hide the paging logic and make the collection appear to be continuous. Maybe not the best name ever, but it will serve.

Our class will take two required arguments on initialization. One is the page size. This class will not control page sizes; we're just telling it what size of pages it can expect.

The other argument is a page_fetcher. This object is responsible for actually retrieving a page of data. We'll talk more about the API we expect this object to expose a little bit later.

We assign these arguments to instance variables, and also initialize a blank array of pages. We could get by without this last variable, but it will function as a cache so that the same pages aren't fetched multiple times.

Next up, we define the subscript operator. We intend our UnpagingFacade to behave a bit like a Ruby Array, and in order to do that we need to be able to handle this operator.

Just like Array, our subscript operator takes an integer index. But if all we have is an index into a theoretical continuous collection, how do we convert that into paged terms? We're going to need to convert our absolute index into two bits of information: a page number, and an offset.

Let's take a sidebar and talk about how to extract these values. Say the absolute index being requested is "23", and the page size is 10.

To get the page number, we can simply divide the index by the page_size. Since we are dealing strictly with integers, any fractional part is stripped off, and we get page 2.

To get the offset, we need to again divide the index by the page size, but this time take the remainder instead of the quotient. We do this using the modulo operator. The result is the offset 3.

Now, as you might already know, there is a way to do this in one operation instead of two. But since this show is all about learning one thing at a time, we'll leave that for the next episode.

index = 23
page_size = 10
page = index / page_size              # => 2
offset = index % page_size              # => 3
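
As a quick check, we can run the same arithmetic for indexes on either side of a page boundary. The to_page_and_offset helper is a hypothetical name used only for this sketch. Note that pages are counted from zero here.

```ruby
# Hypothetical helper: converts an absolute index into [page, offset]
# using integer division and modulo, exactly as above.
def to_page_and_offset(index, page_size)
  [index / page_size, index % page_size]
end

to_page_and_offset(0, 10)   # => [0, 0]
to_page_and_offset(9, 10)   # => [0, 9]
to_page_and_offset(10, 10)  # => [1, 0]
to_page_and_offset(23, 10)  # => [2, 3]
```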

Let's jump back to our class under construction and add the code for extracting page and offset.

Now that we know what page of items we need, we have to fetch it somehow. We'll delegate that to another method, which we'll call #get_page.

Once we have the page of items, we just need to index into it with our offset value, and we have our result.

Now we need to define the #get_page method. For this, we'll first look up the page in the @pages array. If it's not found, resulting in a nil, we'll update the value by using the page fetcher.

Again, our facade doesn't know anything about HTTP or other ways of fetching pages. All it knows is that it has a page fetcher collaborator. We'll have it send the #call message to the fetcher, with the desired page number as an argument. The fetcher object, in turn, should return a single page of results in the form of an array.

class UnpagingFacade
  def initialize(page_size:, page_fetcher:)
    @page_size    = page_size
    @page_fetcher = page_fetcher
    @pages        = []
  end

  def [](index)
    page_num = index / @page_size
    offset   = index % @page_size
    page     = get_page(page_num)
    page[offset]
  end

  def get_page(page_num)
    @pages[page_num] ||= @page_fetcher.call(page_num)
  end
end
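
Because the facade only ever sends #call to its fetcher, we can exercise it without touching the network. Here is a minimal sketch that repeats the class so it runs on its own. The in-memory fake_fetcher, the letters data, and the fetched log are all made up for illustration.

```ruby
class UnpagingFacade
  def initialize(page_size:, page_fetcher:)
    @page_size    = page_size
    @page_fetcher = page_fetcher
    @pages        = []
  end

  def [](index)
    page_num = index / @page_size
    offset   = index % @page_size
    get_page(page_num)[offset]
  end

  def get_page(page_num)
    @pages[page_num] ||= @page_fetcher.call(page_num)
  end
end

items   = ("a".."z").to_a # 26 items, served 5 to a page
fetched = []              # records which pages were actually requested

# Hypothetical fetcher: any object that responds to #call with a page
# number and returns an array will do.
fake_fetcher = ->(page_num) {
  fetched << page_num
  items.each_slice(5).to_a.fetch(page_num, [])
}

letters = UnpagingFacade.new(page_size: 5, page_fetcher: fake_fetcher)
letters[7]   # => "h"
letters[9]   # => "j"
letters[25]  # => "z"
fetched      # => [1, 5]
```

Indexes 7 and 9 both live on page 1, so that page is fetched only once.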

Now let's put this class through its paces. We have a fetcher, which is just a lambda that takes a page_num, logs the page it is fetching, constructs a URL, makes an HTTP request, and parses the result. We're using the http gem here, which we might talk about in a future episode.

To plug this into our UnpagingFacade, we construct a new object with a page size of 10, and the lambda as the page_fetcher.

Then we can specify any index we want. We can see from the logging that the appropriate page—and only that page—is fetched. If we ask for another item from the same page of results, the page is only fetched once, demonstrating our caching in action.

require "./unpaging_facade"

require "http"
require "json"

fetcher = ->(page_num) {
  puts "Fetching page #{page_num}"
  project_id = ENV.fetch("PROJECT_ID")
  pass = ENV.fetch("WISTIA_PASS")
  # The facade counts pages from 0; the request below asks for page_num + 1.
  url = "https://api.wistia.com/v1/medias.json?project_id=#{project_id}&type=Video&per_page=10&page=#{page_num + 1}"
  response = HTTP.basic_auth(user: "api", pass: pass)
             .get(url)
  # The parsed JSON array is the page of results the facade expects.
  JSON.parse(response.body.to_s)
}

videos = UnpagingFacade.new(page_size: 10, page_fetcher: fetcher)
videos[82]["name"]
# => "075 Tail Part 4: copy_stream"
videos[87]["name"]
# => "069 Gem-Love Part 4"

# >> Fetching page 8

So far, we've provided random access, and only random access. But another way we might like to access this paged resource is by iterating over it.

The usual Ruby idiom for iteration is to define the #each method. Then we usually include the Enumerable module, which adds lots of handy methods built on top of #each. We'll go ahead and include Enumerable now, and then define an #each implementation.

We start #each with an idiom we learned in episode #64, where we enable the method to return an Enumerator if no block is passed. This is consistent with the behavior of Ruby's built-in #each methods.

Now, how do we iterate over a collection when the only operation we have defined is the subscript operator? We can use a technique we learned in episode #254 to iterate over all integers starting with 0 using the #step method. At each iteration, we can use our subscript operator to fetch the item at that index. If this ever returns nil, we know we've gone past the last item available, and so we break off using the or control structure we mastered in episode #125.

So long as we have a non-nil item, we continue and yield it, then go on to the next index.

class UnpagingFacade
  include Enumerable

  def initialize(page_size:, page_fetcher:)
    @page_size    = page_size
    @page_fetcher = page_fetcher
    @pages        = []
  end

  def [](index)
    page_num = index / @page_size
    offset   = index % @page_size
    page     = get_page(page_num)
    page[offset]
  end

  def get_page(page_num)
    @pages[page_num] ||= @page_fetcher.call(page_num)
  end

  def each
    return to_enum(__callee__) unless block_given?
    0.step.each do |index|
      item = self[index] or break
      yield item
    end
  end
end
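
To see what including Enumerable buys us, here is a minimal offline sketch. It repeats the class so that it runs on its own; the in-memory fetcher, the numbers data, and the fetched log are hypothetical stand-ins for a real remote service.

```ruby
class UnpagingFacade
  include Enumerable

  def initialize(page_size:, page_fetcher:)
    @page_size    = page_size
    @page_fetcher = page_fetcher
    @pages        = []
  end

  def [](index)
    page_num = index / @page_size
    offset   = index % @page_size
    get_page(page_num)[offset]
  end

  def get_page(page_num)
    @pages[page_num] ||= @page_fetcher.call(page_num)
  end

  def each
    return to_enum(__callee__) unless block_given?
    0.step.each do |index|
      item = self[index] or break
      yield item
    end
  end
end

numbers = (1..25).to_a # 25 items, served 10 to a page
fetched = []           # records which pages were actually requested

fetcher = ->(page_num) {
  fetched << page_num
  numbers.each_slice(10).to_a.fetch(page_num, [])
}

facade = UnpagingFacade.new(page_size: 10, page_fetcher: fetcher)

facade.first(3)              # => [1, 2, 3]
fetched                      # => [0]
facade.select(&:even?).size  # => 12
fetched                      # => [0, 1, 2]
facade.each.next             # => 1
```

Methods that stop early, such as #first, only trigger the page loads they need. Methods that walk the whole collection, such as #select, load the remaining pages and reuse the ones already cached.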

Let's give this a whirl. We'll change the page size to a larger 100 elements, in order to minimize the number of requests. Then we'll convert the unpaged collection to an array using to_a. This is one of the conveniences we get from including Enumerable.

Let's put this in a file and watch it live. We can see that it triggers consecutive page loads as it iterates through indexes. Finally it steps past the last item, gets a nil, stops iterating, and we get our item count.

require "./unpaging_facade2"

require "http"
require "json"

fetcher = ->(page_num) {
  puts "Fetching page #{page_num}"
  project_id = ENV.fetch("PROJECT_ID")
  pass = ENV.fetch("WISTIA_PASS")
  # The facade counts pages from 0; the request below asks for page_num + 1.
  url = "https://api.wistia.com/v1/medias.json?project_id=#{project_id}&type=Video&per_page=100&page=#{page_num + 1}"
  response = HTTP.basic_auth(user: "api", pass: pass)
             .get(url)
  # The parsed JSON array is the page of results the facade expects.
  JSON.parse(response.body.to_s)
}

videos = UnpagingFacade.new(page_size: 100, page_fetcher: fetcher)
videos.to_a.size              # => 339

# >> Fetching page 0
# >> Fetching page 1
# >> Fetching page 2
# >> Fetching page 3
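
The way iteration stops rests on one detail of the fetcher contract: a page past the end of the collection has to come back as an empty array, not nil. The sketch below repeats the class so it runs on its own and uses a hypothetical in-memory fetcher to show the edge case where the item count is an exact multiple of the page size.

```ruby
class UnpagingFacade
  include Enumerable

  def initialize(page_size:, page_fetcher:)
    @page_size    = page_size
    @page_fetcher = page_fetcher
    @pages        = []
  end

  def [](index)
    get_page(index / @page_size)[index % @page_size]
  end

  def get_page(page_num)
    @pages[page_num] ||= @page_fetcher.call(page_num)
  end

  def each
    return to_enum(__callee__) unless block_given?
    0.step.each do |index|
      item = self[index] or break
      yield item
    end
  end
end

items   = (1..20).to_a # exactly two full pages of 10
fetched = []

fetcher = ->(page_num) {
  fetched << page_num
  # Past the end we return [], never nil, so that page[offset] is nil.
  items.each_slice(10).to_a.fetch(page_num, [])
}

facade = UnpagingFacade.new(page_size: 10, page_fetcher: fetcher)
facade.to_a.size  # => 20
fetched           # => [0, 1, 2]
```

With two full pages, the facade has to ask for a third, empty page before it can tell that the collection has ended. If a fetcher returned nil for that page instead, page[offset] would raise a NoMethodError. Note also that #each treats any nil or false item as the end of the collection, so this approach assumes the items themselves are never nil or false.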

In the end, we have an object that behaves a lot like a typical Ruby array representing the entire collection. But it's backed by a paged remote resource, and it fetches pages only as they are needed.

This class is pretty small. And there's nothing wildly clever about it. What's most notable about it is how we have isolated the responsibility of dealing with paged data. Again, this class knows absolutely nothing of HTTP requests.

What I love about this approach is that it demonstrates yet again how, with a little thought, we can extract classes that handle exactly one concern. And then we can compose those objects with other focused objects, and suddenly we have something with impressive functionality.

I hope this example inspires you to build more tiny, composable objects of your own. Happy hacking!
