In Progress
Unit 1, Lesson 21
In Progress

Two Refactorings

Video transcript & code

Today I want to show you some code that has multiple responsibilities, and then walk through two different possibilities for refactoring it. One way is a refactoring in a functional style, the other embraces objects and state.

Here's some code from the rubytapas.com codebase. If you're a faithful viewer you may recognize it. But I've drastically simplified it for the purpose of this episode.

This is code from the EpisodeMapper, which takes data from a gateway object and maps that data into RubyTapas domain terms. The central method is each_episode. In a nutshell, it takes a list of episode summaries, which are simple hashes. It builds an Episode object out of each one, and yields it to a passed block.

Along the way, it performs some other tasks. For one thing, it checks to see if an object already exists in the identity map, and if so returns that object instead.

As you can see, there is also a bunch of code here having to do with episode numbers. This is because at one point there were some episodes that lacked numbers in their titles. This method tries to deal gracefully with this added complication.

In order to do so, it instantiates an episode number variable. When an episode is found that lacks a number, its number is set to the last episode number plus one. Then the last episode number is updated for the next cycle.

Let's take a look at the input data. For the purpose of this demonstration, the data is simply hardcoded instead of being sourced form another object. As you can see, the second episode is missing a number.

When we put this method through its paces, we can see that it successfully uses the preceding episode number, 182, to guess that the second episode's number should be 183.

Episode = Struct.new(:number, :name, :data_source)

class EpisodeMapper
  def episode_summaries
    [
      {id: 1, title: "182 Schwartzian Transform"},
      {id: 2, title: "Macro"},
      {id: 3, title: "184 Sequel, Postgres, JSON"}
    ]
  end

  def id_map
    @id_map ||= {}
  end

  def new_ghost_episode_from_summary(summary)
    title, number, name = */(\d{3})?\s*(.*)$/.match(summary[:title]) # !> assigned but unused variable - title
    Episode.new(number && number.to_i, name)
  end

  def each_episode
    last_number = 0
    episode_summaries.each do |ep_summary|
      id = ep_summary[:id]
      episode = id_map.fetch([Episode, id]) do
        ep = new_ghost_episode_from_summary(ep_summary)
        ep.data_source      = self
        id_map[[Episode, id]] = ep
        ep
      end
      unless episode.number
        episode.number = last_number + 1
      end
      last_number = episode.number
      yield(episode)
    end
  end
end

em = EpisodeMapper.new
em.each_episode do |ep|
  p ep
end
# >> #<struct Episode number=182, name="Schwartzian Transform", data_source=#<EpisodeMapper:0x0000000235b8d8 @id_map={[Episode, 1]=>#<struct Episode:...>}>>
# >> #<struct Episode number=183, name="Macro", data_source=#<EpisodeMapper:0x0000000235b8d8 @id_map={[Episode, 1]=>#<struct Episode number=182, name="Schwartzian Transform", data_source=#<EpisodeMapper:0x0000000235b8d8 ...>>, [Episode, 2]=>#<struct Episode:...>}>>
# >> #<struct Episode number=184, name="Sequel, Postgres, JSON", data_source=#<EpisodeMapper:0x0000000235b8d8 @id_map={[Episode, 1]=>#<struct Episode number=182, name="Schwartzian Transform", data_source=#<EpisodeMapper:0x0000000235b8d8 ...>>, [Episode, 2]=>#<struct Episode number=183, name="Macro", data_source=#<EpisodeMapper:0x0000000235b8d8 ...>>, [Episode, 3]=>#<struct Episode:...>}>>

The #each_episode method is overcomplicated for my tastes. I'd like to extract out at least some of the episode numbering code.

But this is an example of one of the more difficult types of refactorings to think through, because it involves state. We can't simply factor out a method to guess the number for a given episode. Without also keeping track of the last episode number, from the previous iteration, we don't have anything to base that guess on.

let's look at one possible refactoring. It's the first one that occurred to me when I took on this problem. For this one we'll use an approach inspired by the functional programming style.

We create a new method, #enumerate_episodes. It will receive an array of episode summaries and a starting value for the "last episode number". It uses #inject to iterate over episode summaries while also passing the last_episode variable forward to each iteration. For each episode it yields both the current summary and the last number plus one to the caller. It expects that the passed block will return an episode object to it. Once the block is finished, it uses the number in the returned episode object to update the last_number counter.

Over in the #each_episode method, we update the code to use #enumerate_episodes. We remove most of the code for keeping track of episode numbers. Instead, we use the yielded episode number guess as a fallback in case an episode number is not found. And then we are careful to have the episode be the last expression in the block, so that it is returned to #enumerate_episodes=.

Episode = Struct.new(:number, :name, :data_source)

class EpisodeMapper
  def episode_summaries
    [
      {id: 1, title: "182 Schwartzian Transform"},
      {id: 2, title: "Macro"},
      {id: 3, title: "184 Sequel, Postgres, JSON"}
    ]
  end

  def id_map
    @id_map ||= {}
  end

  def new_ghost_episode_from_summary(summary)
    title, number, name = */(\d{3})?\s*(.*)$/.match(summary[:title]) # !> assigned but unused variable - title
    Episode.new(number && number.to_i, name)
  end

  def enumerate_episodes(summaries, last_number)
    summaries.inject(last_number) do |last_number, ep_summary| # !> shadowing outer local variable - last_number
      episode = yield ep_summary, last_number + 1
      episode.number
    end
  end

  def each_episode
    enumerate_episodes(episode_summaries, 0) do |ep_summary, guessed_number|
      id = ep_summary[:id]
      episode = id_map.fetch([Episode, id]) do
        ep = new_ghost_episode_from_summary(ep_summary)
        ep.data_source      = self
        id_map[[Episode, id]] = ep
        ep
      end
      episode.number ||= guessed_number
      yield(episode)
      episode
    end
  end
end

em = EpisodeMapper.new
em.each_episode do |ep|
  p ep
end
# >> #<struct Episode number=182, name="Schwartzian Transform", data_source=#<EpisodeMapper:0x0000000172a280 @id_map={[Episode, 1]=>#<struct Episode:...>}>>
# >> #<struct Episode number=183, name="Macro", data_source=#<EpisodeMapper:0x0000000172a280 @id_map={[Episode, 1]=>#<struct Episode number=182, name="Schwartzian Transform", data_source=#<EpisodeMapper:0x0000000172a280 ...>>, [Episode, 2]=>#<struct Episode:...>}>>
# >> #<struct Episode number=184, name="Sequel, Postgres, JSON", data_source=#<EpisodeMapper:0x0000000172a280 @id_map={[Episode, 1]=>#<struct Episode number=182, name="Schwartzian Transform", data_source=#<EpisodeMapper:0x0000000172a280 ...>>, [Episode, 2]=>#<struct Episode number=183, name="Macro", data_source=#<EpisodeMapper:0x0000000172a280 ...>>, [Episode, 3]=>#<struct Episode:...>}>>

When we execute this code, it works just fine. And there are some nice things about this method. We've eliminated the need for a counter variable, and replaced it with this nice, functional, mutation-free code. By wrapping the #enumerate_episodes around our other code, we're able to maintain state as we iterate, but without actually updating any variables.

But there's also a lot I don't like about this refactoring. If you're a longtime viewer you've probably gathered by now that I'm quite comfortable with solutions that involve introducing methods that take blocks. And yet, I still find this code difficult to think about, with its nested levels of blocks and give-and-take between blocks.

In addition, there is a lot of dependency between these methods. #enumerate_episodes has to know about both episode summaries and episode objects. It assumes the responsibility for iterating over the episode summary array.

Add the coupling goes both ways. #enumerate_episodes knows a lot about #each_episode's job. But #each_episode also has to remember to return the episode object back to #enumerate_episodes in order for it to work correctly. With all this coupling, it's difficult to imagine factoring this logic further out, into its own object.

Now let's look at another possible refactoring. First, we revert back to the original state of the code. Then we introduce a new inner class: EpisodeEnumeration. On initialization, it sets an instance variable @next_number to 1.

This class has one other method: number_episode. This method receives an episode as an argument. It grabs the episode's current number. If the number was already set, it updates its own @next_number counter to equal the episode's number. Otherwise, it sets the episode's number. Either way, it finishes by incrementing the @next_number.

In the #each_episode code, we replace all of the episode numbering code with two lines: one to instantiate an EpisodeEnumeration. And another to tell that object to number the current episode.

That's it. When we run the code, is correctly numbers the episode with a missing number.

Episode = Struct.new(:number, :name, :data_source)

class EpisodeMapper
  class EpisodeEnumeration
    def initialize
      @next_number = 1
    end

    def number_episode(episode)
      number = episode.number
      if number
        @next_number = number
      else
        episode.number = @next_number
      end
      @next_number += 1
    end
  end

  def episode_summaries
    [
      {id: 1, title: "182 Schwartzian Transform"},
      {id: 2, title: "Macro"},
      {id: 3, title: "184 Sequel, Postgres, JSON"}
    ]
  end

  def id_map
    @id_map ||= {}
  end

  def new_ghost_episode_from_summary(summary)
    title, number, name = */(\d{3})?\s*(.*)$/.match(summary[:title]) # !> assigned but unused variable - title
    Episode.new(number && number.to_i, name)
  end

  def each_episode
    episode_enumeration = EpisodeEnumeration.new
    episode_summaries.each do |ep_summary|
      id = ep_summary[:id]
      episode = id_map.fetch([Episode, id]) do
        ep = new_ghost_episode_from_summary(ep_summary)
        ep.data_source      = self
        id_map[[Episode, id]] = ep
        ep
      end
      episode_enumeration.number_episode(episode)
      yield(episode)
    end
  end
end

em = EpisodeMapper.new
em.each_episode do |ep|
  p ep
end
# >> #<struct Episode number=182, name="Schwartzian Transform", data_source=#<EpisodeMapper:0x00000001b1e0a8 @id_map={[Episode, 1]=>#<struct Episode:...>}>>
# >> #<struct Episode number=183, name="Macro", data_source=#<EpisodeMapper:0x00000001b1e0a8 @id_map={[Episode, 1]=>#<struct Episode number=182, name="Schwartzian Transform", data_source=#<EpisodeMapper:0x00000001b1e0a8 ...>>, [Episode, 2]=>#<struct Episode:...>}>>
# >> #<struct Episode number=184, name="Sequel, Postgres, JSON", data_source=#<EpisodeMapper:0x00000001b1e0a8 @id_map={[Episode, 1]=>#<struct Episode number=182, name="Schwartzian Transform", data_source=#<EpisodeMapper:0x00000001b1e0a8 ...>>, [Episode, 2]=>#<struct Episode number=183, name="Macro", data_source=#<EpisodeMapper:0x00000001b1e0a8 ...>>, [Episode, 3]=>#<struct Episode:...>}>>

Unlike the functional solution, this approach explicitly embraces state. It also embraces the object-oriented principle of "tell, don't ask". The relationship between EpisodeMapper and EpisodeEnumeration is strictly one-way: the mapper creates an enumeration, and tells it to do its thing. That's it.

As a result, the EpisodeEnumeration is far less tied to the internals of EpisodeMapper. It knows nothing about summaries or lists of summaries. All it knows about are Episode objects.

I don't know about you, but I find this solution much easier to read and understand. I would much rather see this when coming to an unfamiliar codebase, than see the functional solution.

The naming of EpisodeEnumeration is very important. I originally wanted to call it EpisodeEnumerator. But that would likely have lead me down a very different road, one that would have looked a lot more like the original, functional refactoring.

The key decision in making the EpisodeEnumeration as simple as it is was choosing to represent, not a "thing that enumerates episodes", but a single enumeration session as an object. EpisodeEnumeration is a one-shot, throwaway object. After iterating through one series of episodes, it's done. Yes it embraces state, but it's short-lived state.

I've found this to be a recurring pattern. Some of the most elegant applications of object-oriented design involve wrapping up state in small, disposable objects that last only for the length of a single method call, or less.

Now this is not to say that the therefore object-oriented approaches are better than functional ones. Ruby is an object-oriented language, so the deck is stacked against the functional solution to begin with. For instance, we started out with the assumption of performing an imperative iteration over episodes. In many functional programming languages, the starting code would have been modeled using recursion rather than iteration. This would have changed things.

One more thing: I know I said that we'd look at two solutions. But there's a third approach to this problem. That approach is to go to the source of the data, and make sure there are no episodes with the number missing from their titles in the first place!

That solution probably would have saved me the most time. I point this out just as a reminder that the simplest possible solution to a problem is often to avoid having the problem to begin with!

And that's it for today. Happy hacking!

Responses