Consistency

Video transcript & code

A few weeks back I published a blog post about whether it's a good idea to use a coding standard in your programming projects. I'm not going to bore you with the whole article today; you can go and read it if you want. But I do want to take a closer look at one of the examples I used.

In the article, I showed these two blocks of code. I'll give you a moment to look them over.

# Block #1
output = []
while word = input.shift
  unless STOP_WORDS.include?(word.downcase)
    output << word
  end
end

# Block #2
output = input.map(&:downcase) - STOP_WORDS

These two examples look so different that it takes a bit of reading to realize they do substantially the same thing. Both take a list of words as input and return a new list with a set of "stop words" removed. This is an operation often performed on text documents before indexing them for search.

That's the first realization about these two samples: they both perform essentially the same task.
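Block #2 leans on `Array#-`, Ruby's array difference operator. Here is a quick standalone illustration of how it behaves; the sample words are made up for this sketch.

```ruby
STOP_WORDS = %w[in the of]

# Array#- returns a new array containing every element of the receiver
# that does not appear in the argument. It removes *all* occurrences and
# keeps the remaining elements in their original order.
words = %w[the cat in the hat]
remaining = words - STOP_WORDS
p remaining    # prints ["cat", "hat"]
p words        # receiver is unchanged: ["the", "cat", "in", "the", "hat"]

# The comparison is case-sensitive, which is why Block #2 downcases first.
capitalized = %w[The cat] - STOP_WORDS
p capitalized  # prints ["The", "cat"]
```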

But there's a second, even less obvious realization, which is that their behavior actually differs in two subtle but important ways.

If you want, you can pause the video and see if you can spot the differences.

Have you figured it out? The first difference in behavior is how these two samples treat their input. The first one clears out the input array, whereas the second leaves it intact.

STOP_WORDS = %w[in the of]

# Block #1
input = %w"When in the course of human events"
output = []
while word = input.shift
  unless STOP_WORDS.include?(word.downcase)
    output << word
  end
end
input                           # => []

# Block #2
input = %w"When in the course of human events"
output = input.map(&:downcase) - STOP_WORDS
input                           # => ["When", "in", "the", "course", "of", "h...
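One cheap way to catch accidental mutation like this is to freeze the input before handing it over. This is a sketch of that idea, not something the original article does; it assumes Ruby 2.5 or later, where mutating a frozen object raises `FrozenError`.

```ruby
STOP_WORDS = %w[in the of]

# Freezing the input makes any attempt to mutate it raise FrozenError,
# so the destructive loop from Block #1 fails loudly on its first shift.
input = %w[When in the course of human events].freeze

loop_error =
  begin
    output = []
    while (word = input.shift)
      output << word unless STOP_WORDS.include?(word.downcase)
    end
    nil
  rescue FrozenError => e
    e
  end
p loop_error.class   # prints FrozenError

# Block #2 only reads from input, so a frozen array is fine.
chained = input.map(&:downcase) - STOP_WORDS
p chained            # prints ["when", "course", "human", "events"]
```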

The second difference is in the output. The first example reuses the words from the original input array, whereas the second one produces downcased copies.

You can see the difference in the 'W' at the beginning of the sentence.

If you were testing this code only with lowercase sentences, you might not notice this difference for a long time.

STOP_WORDS = %w[in the of]

# Block #1
input = %w"When in the course of human events"
output = []
while word = input.shift
  unless STOP_WORDS.include?(word.downcase)
    output << word
  end
end
output
# => ["When", "course", "human", "events"]

# Block #2
input = %w"When in the course of human events"
output = input.map(&:downcase) - STOP_WORDS
output
# => ["when", "course", "human", "events"]
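To see how a lowercase-only test could miss this, here is a sketch that wraps the two blocks in methods. The method names `remove_stop_words_loop` and `remove_stop_words_chain` are hypothetical, and the loop version works on a copy of its argument so the comparison isn't skewed by the emptied input.

```ruby
STOP_WORDS = %w[in the of]

# Hypothetical wrappers around Block #1 and Block #2.
def remove_stop_words_loop(input)
  input = input.dup # work on a copy so the caller's array survives
  output = []
  while (word = input.shift)
    output << word unless STOP_WORDS.include?(word.downcase)
  end
  output
end

def remove_stop_words_chain(input)
  input.map(&:downcase) - STOP_WORDS
end

# An all-lowercase sample can't tell the two apart...
lower = %w[in the course of human events]
p remove_stop_words_loop(lower) == remove_stop_words_chain(lower)  # prints true

# ...but a capitalized word exposes the difference.
mixed = %w[When in the course of human events]
p remove_stop_words_loop(mixed)   # prints ["When", "course", "human", "events"]
p remove_stop_words_chain(mixed)  # prints ["when", "course", "human", "events"]
```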

Let's code up a version that's in the style of the second block, but which behaves like the first one.

STOP_WORDS = %w[in the of]

# Block #2
input = %w"When in the course of human events"
output = input.map(&:downcase) - STOP_WORDS
input
# => ["When", "in", "the", "course", "of", "human", "events"]
output
# => ["when", "course", "human", "events"]

# Block #3
input = %w"When in the course of human events"
output = input.shift(input.size).reject{|w| STOP_WORDS.include?(w.downcase)}
input
# => []
output
# => ["When", "course", "human", "events"]

Here it's clearer that both samples take an input list and produce a modified list based on that input. But it's also very clear where they differ.

In particular, there's no idiomatic way in this functional, chaining style to wind up with an empty input array. To force that behavior, we had to start by shifting every element into a new array.
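Block #3 relies on the count argument to `Array#shift`. Here is a quick illustration with made-up data:

```ruby
# Array#shift(n) removes the first n elements from the receiver and
# returns them as a new array.
letters = %w[a b c d]
first_two = letters.shift(2)
p first_two   # prints ["a", "b"]
p letters     # prints ["c", "d"]

# Passing the array's own size moves everything out, which is how
# Block #3 empties its input.
rest = letters.shift(letters.size)
p rest        # prints ["c", "d"]
p letters     # prints []
```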

That awkwardness is a good thing, because clearing out the input array was probably an accident in the first place.
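If clearing the input really was an accident, the chaining style makes the fix almost trivial: drop the `shift`. This is a sketch under that assumption, not code from the original article.

```ruby
STOP_WORDS = %w[in the of]

input = %w[When in the course of human events]

# Assuming the emptied input was unintended, a plain reject gives Block #1's
# output (original casing preserved) without touching the input array.
output = input.reject { |w| STOP_WORDS.include?(w.downcase) }

p output  # prints ["When", "course", "human", "events"]
p input   # prints ["When", "in", "the", "course", "of", "human", "events"]
```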

In Ruby, there's always more than one way to do anything. More important than doing everything some mythical "right" way is doing things consistently within a given project. On a project team, this is an area where practices like code review, a coding standard, and frequent pair programming can help.

Consistency seems like an obvious win when it comes to identifying parts of the code that are identical. But the major point here is that consistency is even more important for making small differences obvious. Make same things look the same, and make different things look different.
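To make that principle concrete, here is a sketch (our own illustration, not from the original article) of two pipelines laid out identically, so the one line that differs is the one that changes behavior.

```ruby
STOP_WORDS = %w[in the of]
input = %w[When in the course of human events]

# Both pipelines share the same reject line. In the second one its downcase
# call is redundant but harmless; keeping it identical means the only visible
# difference, the map(&:downcase) step, is also the only behavioral difference.
preserved = input
  .reject { |w| STOP_WORDS.include?(w.downcase) }

downcased = input
  .map(&:downcase)
  .reject { |w| STOP_WORDS.include?(w.downcase) }

p preserved  # prints ["When", "course", "human", "events"]
p downcased  # prints ["when", "course", "human", "events"]
```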

This example also makes an interesting case for keeping code concise. Because the method chaining style takes up less space than the looping version, the difference between our last two code samples is relatively more pronounced. The same difference in behavior between two more verbose examples might have been harder to spot.

And that's all I have for today. Happy hacking!
