Two Refactorings
Video transcript & code
Today I want to show you some code that has multiple responsibilities, and then walk through two different possibilities for refactoring it. One way is a refactoring in a functional style, the other embraces objects and state.
Here's some code from the rubytapas.com codebase. If you're a faithful viewer you may recognize it. But I've drastically simplified it for the purpose of this episode.
This is code from the EpisodeMapper
, which takes data from a gateway object and maps that data into RubyTapas domain terms. The central method is each_episode
. In a nutshell, it takes a list of episode summaries, which are simple hashes. It builds an Episode
object out of each one, and yields it to a passed block.
Along the way, it performs some other tasks. For one thing, it checks to see if an object already exists in the identity map, and if so returns that object instead.
As you can see, there is also a bunch of code here having to do with episode numbers. This is because at one point there were some episodes that lacked numbers in their titles. This method tries to deal gracefully with this added complication.
In order to do so, it instantiates an episode number variable. When an episode is found that lacks a number, its number is set to the last episode number plus one. Then the last episode number is updated for the next cycle.
Let's take a look at the input data. For the purpose of this demonstration, the data is simply hardcoded instead of being sourced form another object. As you can see, the second episode is missing a number.
When we put this method through its paces, we can see that it successfully uses the preceding episode number, 182, to guess that the second episode's number should be 183.
Episode = Struct.new(:number, :name, :data_source)
class EpisodeMapper
def episode_summaries
[
{id: 1, title: "182 Schwartzian Transform"},
{id: 2, title: "Macro"},
{id: 3, title: "184 Sequel, Postgres, JSON"}
]
end
def id_map
@id_map ||= {}
end
def new_ghost_episode_from_summary(summary)
title, number, name = */(\d{3})?\s*(.*)$/.match(summary[:title]) # !> assigned but unused variable - title
Episode.new(number && number.to_i, name)
end
def each_episode
last_number = 0
episode_summaries.each do |ep_summary|
id = ep_summary[:id]
episode = id_map.fetch([Episode, id]) do
ep = new_ghost_episode_from_summary(ep_summary)
ep.data_source = self
id_map[[Episode, id]] = ep
ep
end
unless episode.number
episode.number = last_number + 1
end
last_number = episode.number
yield(episode)
end
end
end
em = EpisodeMapper.new
em.each_episode do |ep|
p ep
end
# >> #<struct Episode number=182, name="Schwartzian Transform", data_source=#<EpisodeMapper:0x0000000235b8d8 @id_map={[Episode, 1]=>#<struct Episode:...>}>>
# >> #<struct Episode number=183, name="Macro", data_source=#<EpisodeMapper:0x0000000235b8d8 @id_map={[Episode, 1]=>#<struct Episode number=182, name="Schwartzian Transform", data_source=#<EpisodeMapper:0x0000000235b8d8 ...>>, [Episode, 2]=>#<struct Episode:...>}>>
# >> #<struct Episode number=184, name="Sequel, Postgres, JSON", data_source=#<EpisodeMapper:0x0000000235b8d8 @id_map={[Episode, 1]=>#<struct Episode number=182, name="Schwartzian Transform", data_source=#<EpisodeMapper:0x0000000235b8d8 ...>>, [Episode, 2]=>#<struct Episode number=183, name="Macro", data_source=#<EpisodeMapper:0x0000000235b8d8 ...>>, [Episode, 3]=>#<struct Episode:...>}>>
The #each_episode
method is overcomplicated for my tastes. I'd like to extract out at least some of the episode numbering code.
But this is an example of one of the more difficult types of refactorings to think through, because it involves state. We can't simply factor out a method to guess the number for a given episode. Without also keeping track of the last episode number, from the previous iteration, we don't have anything to base that guess on.
let's look at one possible refactoring. It's the first one that occurred to me when I took on this problem. For this one we'll use an approach inspired by the functional programming style.
We create a new method, #enumerate_episodes
. It will receive an array of episode summaries and a starting value for the "last episode number". It uses #inject
to iterate over episode summaries while also passing the last_episode
variable forward to each iteration. For each episode it yields both the current summary and the last number plus one to the caller. It expects that the passed block will return an episode object to it. Once the block is finished, it uses the number in the returned episode object to update the last_number
counter.
Over in the #each_episode
method, we update the code to use #enumerate_episodes
. We remove most of the code for keeping track of episode numbers. Instead, we use the yielded episode number guess as a fallback in case an episode number is not found. And then we are careful to have the episode be the last expression in the block, so that it is returned to #enumerate_episodes=
.
Episode = Struct.new(:number, :name, :data_source)
class EpisodeMapper
def episode_summaries
[
{id: 1, title: "182 Schwartzian Transform"},
{id: 2, title: "Macro"},
{id: 3, title: "184 Sequel, Postgres, JSON"}
]
end
def id_map
@id_map ||= {}
end
def new_ghost_episode_from_summary(summary)
title, number, name = */(\d{3})?\s*(.*)$/.match(summary[:title]) # !> assigned but unused variable - title
Episode.new(number && number.to_i, name)
end
def enumerate_episodes(summaries, last_number)
summaries.inject(last_number) do |last_number, ep_summary| # !> shadowing outer local variable - last_number
episode = yield ep_summary, last_number + 1
episode.number
end
end
def each_episode
enumerate_episodes(episode_summaries, 0) do |ep_summary, guessed_number|
id = ep_summary[:id]
episode = id_map.fetch([Episode, id]) do
ep = new_ghost_episode_from_summary(ep_summary)
ep.data_source = self
id_map[[Episode, id]] = ep
ep
end
episode.number ||= guessed_number
yield(episode)
episode
end
end
end
em = EpisodeMapper.new
em.each_episode do |ep|
p ep
end
# >> #<struct Episode number=182, name="Schwartzian Transform", data_source=#<EpisodeMapper:0x0000000172a280 @id_map={[Episode, 1]=>#<struct Episode:...>}>>
# >> #<struct Episode number=183, name="Macro", data_source=#<EpisodeMapper:0x0000000172a280 @id_map={[Episode, 1]=>#<struct Episode number=182, name="Schwartzian Transform", data_source=#<EpisodeMapper:0x0000000172a280 ...>>, [Episode, 2]=>#<struct Episode:...>}>>
# >> #<struct Episode number=184, name="Sequel, Postgres, JSON", data_source=#<EpisodeMapper:0x0000000172a280 @id_map={[Episode, 1]=>#<struct Episode number=182, name="Schwartzian Transform", data_source=#<EpisodeMapper:0x0000000172a280 ...>>, [Episode, 2]=>#<struct Episode number=183, name="Macro", data_source=#<EpisodeMapper:0x0000000172a280 ...>>, [Episode, 3]=>#<struct Episode:...>}>>
When we execute this code, it works just fine. And there are some nice things about this method. We've eliminated the need for a counter variable, and replaced it with this nice, functional, mutation-free code. By wrapping the #enumerate_episodes
around our other code, we're able to maintain state as we iterate, but without actually updating any variables.
But there's also a lot I don't like about this refactoring. If you're a longtime viewer you've probably gathered by now that I'm quite comfortable with solutions that involve introducing methods that take blocks. And yet, I still find this code difficult to think about, with its nested levels of blocks and give-and-take between blocks.
In addition, there is a lot of dependency between these methods. #enumerate_episodes
has to know about both episode summaries and episode objects. It assumes the responsibility for iterating over the episode summary array.
Add the coupling goes both ways. #enumerate_episodes
knows a lot about #each_episode
's job. But #each_episode
also has to remember to return the episode object back to #enumerate_episodes
in order for it to work correctly. With all this coupling, it's difficult to imagine factoring this logic further out, into its own object.
Now let's look at another possible refactoring. First, we revert back to the original state of the code. Then we introduce a new inner class: EpisodeEnumeration. On initialization, it sets an instance variable @next_number
to 1.
This class has one other method: number_episode
. This method receives an episode as an argument. It grabs the episode's current number. If the number was already set, it updates its own @next_number
counter to equal the episode's number. Otherwise, it sets the episode's number. Either way, it finishes by incrementing the @next_number
.
In the #each_episode
code, we replace all of the episode numbering code with two lines: one to instantiate an EpisodeEnumeration
. And another to tell that object to number the current episode.
That's it. When we run the code, is correctly numbers the episode with a missing number.
Episode = Struct.new(:number, :name, :data_source)
class EpisodeMapper
class EpisodeEnumeration
def initialize
@next_number = 1
end
def number_episode(episode)
number = episode.number
if number
@next_number = number
else
episode.number = @next_number
end
@next_number += 1
end
end
def episode_summaries
[
{id: 1, title: "182 Schwartzian Transform"},
{id: 2, title: "Macro"},
{id: 3, title: "184 Sequel, Postgres, JSON"}
]
end
def id_map
@id_map ||= {}
end
def new_ghost_episode_from_summary(summary)
title, number, name = */(\d{3})?\s*(.*)$/.match(summary[:title]) # !> assigned but unused variable - title
Episode.new(number && number.to_i, name)
end
def each_episode
episode_enumeration = EpisodeEnumeration.new
episode_summaries.each do |ep_summary|
id = ep_summary[:id]
episode = id_map.fetch([Episode, id]) do
ep = new_ghost_episode_from_summary(ep_summary)
ep.data_source = self
id_map[[Episode, id]] = ep
ep
end
episode_enumeration.number_episode(episode)
yield(episode)
end
end
end
em = EpisodeMapper.new
em.each_episode do |ep|
p ep
end
# >> #<struct Episode number=182, name="Schwartzian Transform", data_source=#<EpisodeMapper:0x00000001b1e0a8 @id_map={[Episode, 1]=>#<struct Episode:...>}>>
# >> #<struct Episode number=183, name="Macro", data_source=#<EpisodeMapper:0x00000001b1e0a8 @id_map={[Episode, 1]=>#<struct Episode number=182, name="Schwartzian Transform", data_source=#<EpisodeMapper:0x00000001b1e0a8 ...>>, [Episode, 2]=>#<struct Episode:...>}>>
# >> #<struct Episode number=184, name="Sequel, Postgres, JSON", data_source=#<EpisodeMapper:0x00000001b1e0a8 @id_map={[Episode, 1]=>#<struct Episode number=182, name="Schwartzian Transform", data_source=#<EpisodeMapper:0x00000001b1e0a8 ...>>, [Episode, 2]=>#<struct Episode number=183, name="Macro", data_source=#<EpisodeMapper:0x00000001b1e0a8 ...>>, [Episode, 3]=>#<struct Episode:...>}>>
Unlike the functional solution, this approach explicitly embraces state. It also embraces the object-oriented principle of "tell, don't ask". The relationship between EpisodeMapper
and EpisodeEnumeration
is strictly one-way: the mapper creates an enumeration, and tells it to do its thing. That's it.
As a result, the EpisodeEnumeration
is far less tied to the internals of EpisodeMapper
. It knows nothing about summaries or lists of summaries. All it knows about are Episode
objects.
I don't know about you, but I find this solution much easier to read and understand. I would much rather see this when coming to an unfamiliar codebase, than see the functional solution.
The naming of EpisodeEnumeration
is very important. I originally wanted to call it EpisodeEnumerator
. But that would likely have lead me down a very different road, one that would have looked a lot more like the original, functional refactoring.
The key decision in making the EpisodeEnumeration
as simple as it is was choosing to represent, not a "thing that enumerates episodes", but a single enumeration session as an object. EpisodeEnumeration
is a one-shot, throwaway object. After iterating through one series of episodes, it's done. Yes it embraces state, but it's short-lived state.
I've found this to be a recurring pattern. Some of the most elegant applications of object-oriented design involve wrapping up state in small, disposable objects that last only for the length of a single method call, or less.
Now this is not to say that the therefore object-oriented approaches are better than functional ones. Ruby is an object-oriented language, so the deck is stacked against the functional solution to begin with. For instance, we started out with the assumption of performing an imperative iteration over episodes. In many functional programming languages, the starting code would have been modeled using recursion rather than iteration. This would have changed things.
One more thing: I know I said that we'd look at two solutions. But there's a third approach to this problem. That approach is to go to the source of the data, and make sure there are no episodes with the number missing from their titles in the first place!
That solution probably would have saved me the most time. I point this out just as a reminder that the simplest possible solution to a problem is often to avoid having the problem to begin with!
And that's it for today. Happy hacking!
Responses