In Progress
Unit 1, Lesson 21
In Progress

Flat Map

Video transcript & code

Our tasks today involve extracting data from an expense report. The report is in a slighlt peculiar format. There are payment objects, which have a payee, an amount in cents, and optionally some tags for categorization. But interspersed with the payments are "dateline" objects, which serve to separate charges made on different days.

Payment  = Struct.new(:payee, :amount, :tags)
Dateline = Struct.new(:date)
TRANSACTIONS = [
  Dateline.new("2015-01-01"),
  Payment.new("Milliways", 25476, "dining travel"),
  Payment.new("Sirius Cybernetics", 7839),
  Payment.new("Old Pink Dog Bar", 2790, "dining"),
  Dateline.new("2015-01-02"),
  Payment.new("Magrathea", 49900000000000000, "real-estate"),
  Payment.new("Big Bang Burger Bar", 4780),
  Payment.new("Megadodo", 7850, "travel"),
]

The first thing we want to do with this data is to generate a human-readable report. To do this, first we declare a variable for keeping track of the current transaction date. Then we map over the transactions. For each one, we check to see if it is a Dateline object. If so, we use it to update the current date and then skip to the next item. Otherwise, we emit a string formatted for human consumption.

require "./register"

date = nil
puts TRANSACTIONS.map{|t|
  case t
  when Dateline
    date = t.date
    next
  else
    "%s: %.2f to %s" % [date, t.amount / 100.0, t.payee]
  end
}
# >>
# >> 2015-01-01: 254.76 to Milliways
# >> 2015-01-01: 78.39 to Sirius Cybernetics
# >> 2015-01-01: 27.90 to Old Pink Dog Bar
# >>
# >> 2015-01-02: 499000000000000.00 to Magrathea
# >> 2015-01-02: 47.80 to Big Bang Burger Bar
# >> 2015-01-02: 78.50 to Megadodo

The resulting output in it has gaps, because everywhere we noted a Dateline change resulted in a nil value in the output. As you might recall from episodes #261 and #262, nil is what the next keyword generates when invoked without an argument.

require "./register"

date = nil
TRANSACTIONS.map{|t|
  case t
  when Dateline
    date = t.date
    next
  else
    "%s: %.2f to %s" % [date, t.amount / 100.0, t.payee]
  end
}
# => [nil, "2015-01-01: 254.76 to Milliways", "2015-01-01: 78.39 to Sirius Cybernetics", "2015-01-01: 27.90 to Old Pink Dog Bar", nil, "2015-01-02: 499000000000000.00 to Magrathea", "2015-01-02: 47.80 to Big Bang Burger Bar", "2015-01-02: 78.50 to Megadodo"]

In order to make out output seamless, we have to remember to tack #compact onto the end of the map.

require "./register"

date = nil
puts TRANSACTIONS.map{|t|
  case t
  when Dateline
    date = t.date
    next
  else
    "%s: %.2f to %s" % [date, t.amount / 100.0, t.payee]
  end
}.compact

# >> 2015-01-01: 254.76 to Milliways
# >> 2015-01-01: 78.39 to Sirius Cybernetics
# >> 2015-01-01: 27.90 to Old Pink Dog Bar
# >> 2015-01-02: 499000000000000.00 to Magrathea
# >> 2015-01-02: 47.80 to Big Bang Burger Bar
# >> 2015-01-02: 78.50 to Megadodo

Now let's move on to another task. We also want to get a summary of all the categorization tags used in this report. For this purpose, we don't need to take note of datelines at all, so we can filter them out using #grep. Then we map over the transactions. We grab the #tags field for each one. Tags are in the form of either a string of space-separated flags, or (in the case of payments that haven't been categorized), nil values. In order to handle the nil fields gracefully we explicitly convert the fields to strings before splitting them on whitespace into an array of tags.

The result is a an array of arrays, some of which are empty. We flatten these arrays into a single array of strings, then use #uniq to remove duplicates. We now have our global list of tags.

require "./register"

TRANSACTIONS.grep(Payment).map{|t|
  t.tags.to_s.split
}
# => [["dining", "travel"], [], ["dining"], ["real-estate"], [], ["travel"]]


TRANSACTIONS.grep(Payment).map{|t|
  t.tags.to_s.split
}.flatten
# => ["dining", "travel", "dining", "real-estate", "travel"]


TRANSACTIONS.grep(Payment).map{|t|
  t.tags.to_s.split
}.flatten.uniq
# => ["dining", "travel", "real-estate"]

Both of the operations we've just performed can be optimized, both in terms of code size and in terms of resource usage, using the #flat_map method.

Let's take a look at how #flat_map works in the abstract. If we take an array of strings, then map over that array and break each one into an array of its component characters, we end up with an array of arrays.

(We can tighten up this code by using Symbol#to_proc to get rid of the block.)

%W[foo bar baz].map(&:chars)
# => [["f", "o", "o"], ["b", "a", "r"], ["b", "a", "z"]]

But if we change the #map to a #flat_map, the result is a flattened array with all of the characters together.

%W[foo bar baz].map{|s| s.chars}
# => [["f", "o", "o"], ["b", "a", "r"], ["b", "a", "z"]]

%W[foo bar baz].flat_map{|s| s.chars}
# => ["f", "o", "o", "b", "a", "r", "b", "a", "z"]

Flat map, as its name suggests, combines the operations of mapping and flattening. But unlike when we do a map followed by a flatten, #flat_map does not build an intermediate array-of-arrays of the whole output before flattening it. Instead, it takes the array generated by each iteration of the block and concatenates it to the last one. For very large data sets, this could result in a significant savings in memory.

Also unlike #flatten, because #flat_map is a method that yields to a block, it can participate in lazy, streaming-style computations. If we make our array of strings lazy and then send #flat_map to it, we get back a lazy enumerator over the characters of the strings. We can take as many as we need, without forcing the entire source collection to be processed. As we saw in episode #278, this can be especially helpful when the source collection is very large, or when it is a stream that has no end.

chars = %W[foo bar baz].lazy.flat_map{|s| s.chars}

chars.first(6)                  # => ["f", "o", "o", "b", "a", "r"]

Now back to the tasks at hand. Let's apply #flat_map to our human-readable report. We change the #map to a #flat_map. We give the next keyword an empty array as an argument, so that #flat_map will have something to stitch into the the resulting concatenation of arrays. Then we get rid of the call to #compact. The result is the same report as before, but now we don't have to worry about remembering to #compact nils out of the data before presenting it.

require "./register"

date = nil
puts TRANSACTIONS.flat_map{|t|
  case t
  when Dateline
    date = t.date
    next []
  else
    "%s: %.2f to %s" % [date, t.amount / 100.0, t.payee]
  end
}

# >> 2015-01-01: 254.76 to Milliways
# >> 2015-01-01: 78.39 to Sirius Cybernetics
# >> 2015-01-01: 27.90 to Old Pink Dog Bar
# >> 2015-01-02: 499000000000000.00 to Magrathea
# >> 2015-01-02: 47.80 to Big Bang Burger Bar
# >> 2015-01-02: 78.50 to Megadodo

Moving on to our list of tags: again, we replace #map with #flat_map. We are then able to do away with the call to #flatten. The resultant array is pre-flattened, and all we need to do is remove duplicates and we're done.

require "./register"

TRANSACTIONS.grep(Payment).flat_map{|t|
  t.tags.to_s.split
}
# => ["dining", "travel", "dining", "real-estate", "travel"]

TRANSACTIONS.grep(Payment).flat_map{|t|
  t.tags.to_s.split
}.uniq
# => ["dining", "travel", "real-estate"]

One last thing: if you prefer the Smalltalk-style names to Lisp-style names, #flat_map is also aliased to #collect_concat.

So that is #flat_map, and that is all for today. Happy hacking!

Responses