Video transcript & code
Today I want to tell you a story about rewriting code. About how I decided to rewrite some classes, and then proceeded to code myself into a corner. And how I then thought better of my approach, and found a way to carry out the rewrite smoothly and successfully.
I had this mapper class I was working on. I think I've shown it in a few episodes before. Its responsibility was finding and filling in Episode objects. In order to do so, it pulled information from two different external sources: DPD, the site that manages RubyTapas subscriptions; and Wistia, a video host. It did this via some intermediary classes, but they are tangential to our story today.
This all worked fine.
Then one day I added some functionality that required more information to be associated with episodes. This information was metadata, such as whether or not an episode was available as a free sample. This was information that I would control, and which couldn't be retrieved from either of the existing data sources.
So in addition to mapping data from two web services, I also needed to map in data from a database.
Since I was just going to be taking the episode info I already had and then "adding a little extra on top", I thought it might make sense to simply wrap the existing
EpisodeMapper in a new
EpisodeDataMapper. It would proxy all of the
EpisodeMapper methods, but add database-stored information to the episodes on the way through.
While this seemed like a good idea at the time, it quickly became overcomplicated and difficult to reason about. I was spending long periods of time puzzling over how to add new behavior in a way that kept responsibilities correctly separated. And when things broke, I found it difficult to identify the cause of the breakage.
I realized that the incrementally evolved design I had arrived at wasn't working out. It was time for a new design, one which put all three data sources on equal footing, and in which a new, unified mapper could draw from these three data sources directly in order to find, fill in, and store
So I set about to refactor my code into this new arrangement. I figured that since the rest of the application was already talking to the "wrapper"
EpisodeDataMapper class, I would just move parts of the
EpisodeDataMapper until there was nothing left of the original
This sounded good in theory. Unfortunately, as I carried it out I ran into problems. The methods I was trying to move over had never been written with a unified mapper in mind. I found I was breaking numerous tests. And it was difficult to tell which test failures were to be expected because I had moved code and functionality and the test was now superfluous, and which ones were indicating legitimate defects introduced as part of the refactoring effort. In fact, the word "refactoring" had ceased to apply at all, since refactorings ordinarily keep all the tests green.
I found myself in an all-too familiar situation: in the middle of a major change, with my code "on the floor" like the parts of a car in the middle of a big repair job. I had broken existing functionality, and at the same time the newly changed parts of the code weren't ready for prime time. My tests were failing, and I couldn't get them to pass without changing them or throwing some of them away. But to rip up my tests in the middle of this major redesign would be to lose confidence that the final product was correct.
I'd been down this road before, and I knew that it usually led to a having a long-lived rewrite branch in my source control system, and working on the rewrite in fits and starts while the master branch steadily diverged from it. In my experience, more often than not these rewrite branches eventually end up getting thrown away.
So I decided to cut to the chase and throw away all of my changes. I needed a new strategy. Here's what I did.
First, I beefed up my high-level acceptance tests. I added some extra coverage of the user-visible functionality that the episode mapper classes supported. Since these tests interacted with the system at a user-visible level, I knew that these tests could stay the same even as I redesigned the code that supported the scenarios.
Next, I began a brand new class I called
UnifiedEpisodeMapper. I drove it out with new tests, mostly integration tests. I used the existing code as inspiration, but with a few exceptions I wrote the new class from scratch. I also didn't try to re-use any of my old tests for
I built it out just enough to support fetching episodes, but not enough to support listing, filtering, or storing them. Then I switched just the web server action for showing individual episodes over to use the new mapper class. With just a little final tweaking I was able to get the tests passing. At this point, some server requests were using the old dual-nested-mapper code, and some were exercising my brand new unified mapper.
I then proceeded to expand the unified mapper to support listing episodes. Once it was ready I switched the action for showing episode lists over to use the new mapper as well. I made sure the tests still ran, and did some manual checking to make sure there were no surprises.
At this point, I was pretty confident that the new mapper was just about done. There were still one or two places in the code that referenced the old mapper. It was easy to update them, however. As I discussed in Episode #172, in my codebase the RubyTapas module function as as a registry for various objects. Only the registry instantiates these objects. It is also responsible for injecting any dependencies the objects might need, such as gateway objects, a reference to the database, or a logger. Everywhere else in the app where an episode mapper is needed, the code just calls
RubyTapas.episode_map rather than explicitly instantiating a paired
This provided a handy choke point, or as Michael Feathers calls it, a "seam". I didn't have to change multiple points in the code to instantiate a new kind of object, including passing down any dependencies needed by a
UnifiedEpisodeMapper. Instead, I just replaced the internals of the
RubyTapas.episode_map registry method to return a
UnifiedEpisodeMapper instead. Then I checked that everything still worked.
Now I had effectively rendered the old episode mapper classes redundant. I removed them as well as their tests. With the old classes out of the way, I renamed the UnifiedEpisodeMapper to simply
This whole process turned out to be one of the smoothest re-designs I've performed. There were few surprises, and I was able to keep the tests passing the whole time. And at no point did I feel the need fork off a special redesign branch - I did the whole thing on the master branch. I didn't actually deploy any code during the redesign process, but I easily could have—even when both the new mapper and the old mappers were working in parallel. I had some performance regressions as I brought the new code up to par with the old, but at no point did I lose functionality.
I attribute this smoothness to several factors:
- I recognized that it was a redesign, not a refactoring.
- I had acceptance tests to give me confidence that my redesign was not breaking functionality.
- I built up the new code in parallel with the still-working old code. I didn't try to update the old code in-place. I accepted the temporary duplication imposed by this approach.
- I didn't try to make the new code have exactly the same interface as the old, or try to make it pass the same tests. This would have hampered my ability to make it a true redesign.
- I replaced uses of the old code with uses of the new code piecemeal, rather than going for a single "big bang" replacement.
- My code contained seams abstracting the instantiation and provisioning of major roles like that of "episode mapper". This made it trivial to switch to a new provider for that role.
The reason I tell you this story is not to trumpet how clever I am to have achieved this smooth redesign. There were other factors at work here besides the ones I've already mentioned, including the fact that it is a relatively small and young codebase, and I'm the only programmer on it.
Rather, I thought I'd share this tale in hopes it inspires you to tackle the parts of your projects that are slowing you down, leaking bugs and giving you a headache every time you have to think about them. With the right set of approaches it is possible to rewrite small parts of a system without throwing the whole system away, and without spending months on a divergent branch with tests permanently failing. It is also possible to structure a system to make those piecemeal redesigns easier in the future.
I hope you find some of the tactics I've presented today useful. Happy hacking!