In Progress
Unit 1, Lesson 1
In Progress

Reduce Redux

Video transcript & code

In episode 149 we used the Enumerable#reduce method to sum up lists of numbers. I showed that we can pass a "seed" value to #reduce in order to give it something to combine the first element of the list with. In that episode I also claimed that you could leave off the "seed" value and the reduction would still work.

[1,2,3].reduce(0, :+)           # => 6
[1,2,3].reduce(:+)              # => 6

But as several astute viewers reminded me, this can lead to surprising behavior when the collection being reduced happens to be empty. Given a seed value of zero, summing an empty array simply returns the zero value back. But when we leave the seed value off, we get nil back. This is because, as a dynamically typed language, Ruby has no way of knowing what sort of objects we expected to be in that empty collection, so it can't come up with a sensible default seed value on its own.

[].reduce(0, :+)                # => 0
[].reduce(:+)                   # => nil

This can be a real problem in production code. Consider the statistical calculations we introduced in episode 150. When we replace the list of data samples with an empty array, the code doesn't get very far before blowing up. The sum value is nil, because we didn't pass a seed to #reduce. On the next line, we try to divide sum by the number of samples. This fails, because nil doesn't understand the / operator.

Unfortunately, this particular code doesn't get any further if we add an explicit seed value, because the size of the array is zero and dividing anything by zero raises an exception. But at least this failure is more immediately comprehensible. We can instantly see that the size must have been zero, rather than scratching our heads over how a nil snuck into the code. In other contexts, returning a zero may be the difference between a successful result and a crash.

samples = []
samples.size                    # => 0
min = samples.min               # => nil
max = samples.max               # => nil
sum = samples.reduce(:+)        # => nil
avg = sum / samples.size        # => 
sorted = samples.sort
half_size = samples.size / 2
middle = if samples.size.even?
           samples[half_size - 1, 2]
         else
           samples[half_size, 1]
         end
11.divmod(2)                    # => 
10.divmod(2)                    # => 
q, r = samples.size.divmod(2)
middle2 = samples[q - 1 + r, 2 - r]
middle                          # => 
middle2                         # => 
med = middle.reduce(:+) / middle.size # => 
dev = Math.sqrt(samples.map{|n| (n - avg) ** 2}.reduce(:+) / samples.size)
# => 
# ~> -:6:in `<main>': undefined method `/' for nil:NilClass (NoMethodError)
samples = []
samples.size                    # => 0
min = samples.min               # => nil
max = samples.max               # => nil
sum = samples.reduce(0, :+)        # => 0
avg = sum / samples.size        # => 
sorted = samples.sort
half_size = samples.size / 2
middle = if samples.size.even?
           samples[half_size - 1, 2]
         else
           samples[half_size, 1]
         end
11.divmod(2)                    # => 
10.divmod(2)                    # => 
q, r = samples.size.divmod(2)
middle2 = samples[q - 1 + r, 2 - r]
middle                          # => 
middle2                         # => 
med = middle.reduce(0, :+) / middle.size # => 
dev = Math.sqrt(samples.map{|n| (n - avg) ** 2}.reduce(:+) / samples.size)
# => 
# ~>    from -:6:in `<main>'
# ~> -:6:in `/': divided by 0 (ZeroDivisionError)
# ~>    from -:6:in `<main>'

As a general rule, code that works on collections is more robust and less surprising when it behaves similarly for any size collection, including empty ones. For this reason, I'm amending what I said in episode 149 to say that while you can omit the seed value to #reduce, in most cases you probably shouldn't.

I hate to leave you with just a correction today, though. So let's talk about another use of the #reduce method.

It's easy to think of ways to use #reduce to produce various mathematical reductions, such as sums, products, factorials, or even checksums using binary XOR. But #reduce is an elegant solution to more exotic problems as well.

[1,2,3].reduce(0, :+)           # => 6
(1..3).reduce(:*)               # => 6
[0xA, 0x5].reduce(0, :^)       # => 15

Here's an example: let's say we have some deeply nested data in the forms of hashes and arrays, perhaps retrieved as JSON data from a web service.

data = {
  "books" => [
    { 
      "title" => "Exceptional Ruby",
      "revisions" => [
        {"date" => "2011-12-01"},
        {"date" => "2012-06-15"}
      ]
    },
    {
      "title" => "Confident Ruby",
      "revisions" => []
    }
  ]
}

Now let's say we'd like to build a helper method to do "deep fetches" into this data structure. It receives a list of arguments, each indicating the next key or index to retrieve at each layer of the data.

deep_fetch(data, "books", 0, "revisions", 1, "date")

#reduce is perfect for this kind of traversal. We write our helper to do a #reduce across the given keys. It uses the original hunk of nested data as its seed value. At each iteration, it pulls out the next level down using the #fetch method on the last value. (We use #fetch rather than square brackets because #fetch will alert us if we try to access a missing key or index.) When it gets to the end of the keys, the method returns whatever it found at the last level.

We can pass this #deep_fetch method a list of keys and array indexes, forming a "path" of sorts. It will return the data item corresponding to the end of that "path".

def deep_fetch(data, *keys)
  keys.reduce(data) {|value, key|
    value.fetch(key)
  }
end
deep_fetch(data, "books", 0, "revisions", 1, "date") # => "2012-06-15"

We can shrink this code still further by simply passing the symbol :fetch to #reduce, and doing away with the block entirely.

def deep_fetch(data, *keys)
  keys.reduce(data, :fetch)
end
deep_fetch(data, "books", 0, "revisions", 1, "date") # => "2012-06-15"

As we can see, #reduce has unexpected depth: not only is it great for mathematical reductions, it provides an elegant way to traverse data structures. Happy hacking!

Responses