In Progress
Unit 1, Lesson 1
In Progress


Video transcript & code

According to Larry Wall, laziness is one of the three virtues of programming, along with impatience and hubris. Laziness is the quality that causes us to write programs that do work for us instead of doing the work ourselves.

But laziness isn't just for programmers. Programs can exhibit this trait as well.

We've seen lazy programs before on this show. As you know, I like to talk about Enumerators a lot, and Enumerators are often a way to iterate over some stream or collection in a lazy fashion.

For instance, say we want to get the first 100 words out of the system dictionary. We could use the readlines  method to read all 10,000 or so lines of the dictionary, and then select just the first 100.


Or, we could save our program some effort and memory, and use the #each_line method. When called without a block, this method returns an Enumerator over the lines of the file. Ruby hasn't actually read the lines yet; but it's ready to. We can then as the Enumerator for the first 100 lines, and it will read just as far as it needs, and no farther.

enum = open('/usr/share/dict/words').each_line
# => #<Enumerator: #<File:/usr/share/dict/words>:each_line>


This is the essence of laziness from a program's perspective: only doing as much work as necessary.

Laziness becomes even more important when dealing with collections that have no end.

For instance, take the set of all prime numbers. Ruby represents this set with the Primes class. We can iterate over this set with #each, and the iteration will never terminate. Obviously, asking this set to convert itself to an array would be a bad idea.

require "prime"

Prime.each do |n|
  puts n

However, if we convert #each to an Enumerator by calling it without a block, we can pluck prime numbers as we need them. We only cause the computer to do as much calculation as are needed to get as many primes as we want.

require "prime"

enum = Prime.each               # => #<Prime::EratosthenesGenerator:0x2614f98 @last_prime_index=-1, @ubound=nil>                       # => 2                       # => 3                       # => 5

In this case, laziness is important not just to avoid unnecessary work. We need it in order to avoid an infinite iteration.

The examples we've looked at so far have demonstrated that right out of the box, Ruby gives us tools for dealing with collections and streams of data lazily. But you may have heard about a feature in Ruby called "lazy enumerators". If we can already deal with sets in a lazy way, what do these lazy enumerators add?

To explain the role of lazy enumerators, let's introduce a third example. This time, we want to build a set of numbers produced by squaring positive integers. Let's say we need the first 10 of these squares.

This is straightforward enough. We start with one, and get an Enumerator for incrementing it by sending #step without a block. We saw this technique in episode #254. Then we take the first 10 numbers produced by this Enumerator. Next we map these numbers through a block that squares each one. The result is an array of integers.

1.step                          # => #<Enumerator: 1:step>
1.step.first(10).map{|n| n**2}
# => [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

OK, this is fine, but what if we want to make the size of the resulting set variable? We have to introduce the number of squares needed at the beginning of the message chain. So we might encapsulate this into a method, which takes the desired size as an argument.

def squares(size)
  1.step.first(size).map{|n| n**2}

# => [1, 4, 9, 16, 25]

Again, this works fine. But it conflates two different concepts in a single method: how to generate a sequence of squared numbers, and how many squares to generate. Any code that needs to work with squares has to first generate a fixed array of squares and then pass that array on to whatever other calculations need to be performed. Unlike the notional set of all prime numbers that we worked with before, we can't just send an object which represents the set of all squared positive integers into some other method, which can then take as many as it needs.

In order to do this, we need a lazy enumerator. We begin, as before, with the step message to produce an ordinary enumerator. But then instead of taking a fixed number of items, we interject the lazy message. The result of this send is a lazy enumerator object.

Now we add the map. When we look at the result of the map, instead of a concrete array of numbers we see that it has produced another lazy enumerator.

This object represents the potential to produce squared integers, but it hasn't actually produced any yet. Now that we have constructed this chain, we can decide how many squares we need and ask our lazy enumerator for that many.

1.step                          # => #<Enumerator: 1:step>
1.step.lazy                     # => #<Enumerator::Lazy: #<Enumerator: 1:step>>
squares ={|n| n**2}       # => #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator: 1:step>>:map>
# => [1, 4, 9, 16, 25, 36, 49, 64]

Let's introduce a third requirement. Let's say that for some reason, we really need pairs of two consecutive squares. To make this happen, after the map we add another step, using #each_slice to group the numbers into sets of 2. Then we can ask the resulting enumerator for its first four items, and get a list of pairs.

squares ={|n| n**2}.each_slice(2)
# => #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator: 1:step>>:map>:each_slice(2)>

# => [[1, 4], [9, 16], [25, 36], [49, 64]]

You've probably started to see what's going on here. Each time we add a new link to the chain of enumerable messages, we get another lazy enumerator. Each enumerator has a reference to the one "before" it in the chain. As a result, when we ask the final enumerator for four pairs, it is able to ask the map enumerator for as 8 items in order to make up its slices. The map enumerator, in turn, asks the step enumerator to cough up 8 items.

(You might be wondering why #first produces a concrete array instead of yet another lazy enumerator. We selected the #first method specifically for this reason: it is defined by Ruby as always producing an actual array. As such, it "forces" the chain of enumerators to actually produce some values. If we wanted to limit the size of the final collection without actually producing it, we would use #take instead.)

We can think of the #first message at the end, which demands that actual concrete results be produced, as a way of "pulling" on this chain of enumerators we've produced. Pulling on each link in the chain then pulls on the one before it, until finally it reaches the end where it becomes a demand for more integers from the step method. In the end, we pull exactly as many numbers through the chain of enumerators as we need, and no more.

So this, then, is what makes lazy enumerators special. It's not that they let us produce values lazily; lots of Ruby objects do that without any help. What lazy enumerators add to the picture is a kind of infectious laziness. Once we add a call to #lazy, every Enumerable method afterwards is infected by this laziness and produces another lazy enumerator.

In this way we can represent arbitrarily complex streams of data, which may be expensive to produce, have multiple transformations performed on each element, and which may have no end, in a way that is transparent to the algorithms using these streams. From the point of view of client code, our lazy stream of squares can be treated exactly the same as a simple array of squares.

And that, in a nutshell, is what Ruby's lazy enumerators are all about. Happy hacking!