In Progress
Unit 1, Lesson 1
In Progress

Coincidental Duplication

You have probably heard the DRY mantra: Don’t Repeat Yourself! Wherever we find duplication, we try to eliminate it and create a single source for a given piece of knowledge.

But not all duplication is created equal. While some duplication indicates that a piece of knowledge has been copied from one place to another, other times two pieces of code can be similar purely by coincidence.

In this episode, you’ll see an example of this second form of duplication. You’ll learn that just because two things look the same, that doesn’t necessarily imply that they need to be collapsed into one.

Video transcript & code

If there's one acronym you're familiar with as a Ruby programmer, it's probably "DRY" - which stands for "Don't Repeat Yourself". As originally put forth in the book The Pragmatic Programmer, the DRY principle says that "every piece of knowledge must have a single, unambiguous, authoritative representation within a system".

This seems straightforward enough to follow. Anytime we see information or logic duplicated, we should find a way to eliminate that duplication. But as with any other rule, it's possible to go too far in the pursuit of DRYness.

Consider a library for making RESTful connections to HTTP services. The base Client class has methods for GETing, PUTing, and POSTing. We'll ignore the other HTTP verbs for brevity.

module RoflRest
  class Client
    def initialize(uri)
      @uri = uri
    end

    def get(params={})
      # GET a representation
    end

    def put(params={})
      response = do_request(:put, params)
    end

    def post(params={})
      response = do_request(:post, params)
    end

    def do_request(method, params)
      # ...
    end
  end
end

One of the services this library provides over and above Ruby's Net::HTTP library is the ability to log all errors resulting from a request.

module RoflRest
  class Client
    def initialize(uri, logger=Logger.new($stderr))
      @uri    = uri
      @logger = logger
    end

    def get(params={})
      response = do_request(:get, params)
      if response.code.to_i >= 500
        @logger.error "Request #{@uri} failed: #{response.code}"
      end
      response
    end

    def put(params={})
      response = do_request(:put, params)
      if response.code.to_i >= 500
        @logger.error "Request #{@uri} failed: #{response.code}"
      end
      response
    end

    def post(params={})
      response = do_request(:post, params)
      if response.code.to_i >= 500
        @logger.error "Request #{@uri} failed: #{response.code}"
      end
      response
    end
  end
end

Another service it provides is to raise an exception when the response status code is in the 500 range.

module RoflRest
  class ServerError < StandardError
  end

  class Client
    def initialize(uri, logger=Logger.new($stderr))
      @uri    = uri
      @logger = logger
    end

    def get(params={})
      response = do_request(:get, params)
      if response.code.to_i >= 500
        @logger.error "Request #{@uri} failed: #{response.code}"
        raise ServerError, response.code
      end
      response
    end

    def put(params={})
      response = do_request(:put, params)
      if response.code.to_i >= 500
        @logger.error "Request #{@uri} failed: #{response.code}"
        raise ServerError, response.code
      end
      response
    end

    def post(params={})
      response = do_request(:post, params)
      if response.code.to_i >= 500
        @logger.error "Request #{@uri} failed: #{response.code}"
        raise ServerError, response.code
      end
      response
    end
  end
end

Of course, as the maintainers of this library we are smart enough to see that there is a lot of duplication here, so we have extracted out a #handle_error method which is called from every top-level method.

module RoflRest
  class RequestError < StandardError
    attr_reader :original
    def initialize(original)
      @original = original
    end
  end

  class Client   
    def initialize(uri, logger=Logger.new($stderr))
      @uri    = uri
      @logger = logger
    end

    def get(params={})
      response = do_request(:get, params)
      handle_error(response)
    end

    def put(params={})
      response = do_request(:put, params)
      handle_error(response)
    end

    def post(params={})
      response = do_request(:post, params)
      handle_error(response)
    end

    def handle_error(response)
      if response.code.to_i >= 500
        @logger.error "Request #{@uri} failed: #{response.code}"
        raise ServerError, response.code
      end
      response
    end
  end
end

Let's say we're using this library in an application. One day we realize we need to do more than instantiate a client object. We're hitting a web service which sometimes gets overwhelmed and starts returning 503 errors to GET requests, and we'd like our client object to retry a few times before giving up. So we inherit a new class from the Client class.

The new #get method keeps a count of tries and keeps retrying, with a wait in between, until either the request succeeds or three attempts have been made.

class MyClient = RoflRest::Client
  def initialize
    super('http://example.com/foo/bar')
  end

  def get(params={})
    tries = 0
    begin
      response = do_request(:get, params)
      return response if response.code == '200'
      tries += 1
      sleep 1
    end while tries < 3
    raise ServerError, response.code
  end
end

We'd like our new method to log errors just like the old one. Unfortunately, since the base class bundles up logging errors with raising exceptions, we have to duplicate the error logging portion in our own code.

class MyClient = RoflRest::Client
  def initialize
    super('http://example.com/foo/bar')
  end

  def get(params={})
    tries = 0
    begin
      response = do_request(:get, params)
      return response if response.code == '200'
      if response.code.to_i >= 500
        @logger.error "Request #{@uri} failed: #{response.code}"
      end
      tries += 1
      sleep 5
    end while tries < 3
    raise ServerError, response.code
  end
end

This is not ideal. Now if the base class error logging code ever changes, we'll have to take note and update our own copy of it in order to stay consistent.

The root of the problem is that the base class #handle_error method bundles up both true duplication and coincidental duplication. The logging portion is true duplication - chances are, we'll always want to log errors the same way. But the policy for dealing with server error codes is quite likely to change for certain services and resources. The fact that it was always handled the same way by default in the Client class is coincidental, not a truly shared trait across the different HTTP verbs.

def handle_error(response)
  if response.code.to_i >= 500
    @logger.error "Request #{@uri} failed: #{response.code}"
    raise ServerError, response.code
  end
  response
end

The coincidental duplication made it look like this was all one common concern: handling errors. But in fact, there are two independent concerns here - logging error codes, and choosing how to react to them.

If we were to rewrite the base class, we could represent these concerns with two different methods: #raise_on_error and #log_error.

def raise_on_error(response)
  if response.code.to_i >= 500
    raise ServerError, response.code
  end
  response
end

def log_error(response)
  if response.code.to_i >= 500
    @logger.error "Request #{@uri} failed: #{response.code}"
  end
  response
end

With this change, we could re-use #log_error in our subclass code, while substituting our own error retrying policy.

class MyClient = RoflRest::Client
  def initialize
    super('http://example.com/foo/bar')
  end

  def get(params={})
    tries = 0
    begin
      response = do_request(:get, params)
      return response if response.code == '200'
      log_error(response)
      tries += 1
      sleep 5
    end while tries < 3
    raise ServerError, response.code
  end
end

As we've seen here, it's not always easy to tell coincidental duplication from real knowledge duplication. Often it's easiest to tell the difference in hindsight as we build upon existing code. The important thing is to stay alert to the signs, and be ready to refactor when we see them. The last thing we want is to build real duplication into our code out of a misplaced sense of loyalty to our past decisions.

Hopefully this has given you some food for thought. Happy hacking!

Responses