Unit 1, Lesson 1

Zip

Today’s topic might feel like fundamentals if you have a background in functional programming. On the other hand, if you’re not as familiar with FP styles, it might be brand new to you. We’ll be digging into techniques for interleaving elements from multiple source collections at once. What does Ruby have to do with a classic fabric fastener? Read or watch to find out!

Video transcript & code

Let's say we need to add two arrays together.

a1 = [0, 2, 4, 6, 8, 10]
a2 = [1, 3, 5, 7, 9, 11]

And when I say "add", I mean: we want to add every number in the first array to the corresponding number in the second array.

How do we go about this?

Well, when we iterate over arrays in Ruby we usually default to the each method.

But wait, how do we iterate over the second array at the same time?

a1 = [0, 2, 4, 6, 8, 10]
a2 = [1, 3, 5, 7, 9, 11]

a1.each do |n|
  a2 #...  
end

Given an element from the first array, what do we need to get the corresponding element from the second? We need the index.

We could kick it old-school here and make a for loop from zero to the last index of the first array.

a1 = [0, 2, 4, 6, 8, 10]
a2 = [1, 3, 5, 7, 9, 11]

for i in (0...a1.size)
  puts a1[i] + a2[i]
end

# >> 1
# >> 5
# >> 9
# >> 13
# >> 17
# >> 21

A more Ruby-ish idiom for this is each_with_index, where we iterate over every element of the first array along with the index of that element.

a1 = [0, 2, 4, 6, 8, 10]
a2 = [1, 3, 5, 7, 9, 11]

a1.each_with_index do |e, i|
  puts e + a2[i]
end

# >> 1
# >> 5
# >> 9
# >> 13
# >> 17
# >> 21

OK, but actually we don't want to print out the results, we want a new array with the results in it.

When we want to functionally transform input collections into an output collection, we usually turn to map.

a1 = [0, 2, 4, 6, 8, 10]
a2 = [1, 3, 5, 7, 9, 11]

a1.map {
  # ...
}

I'm following the Weirich block convention here, where blocks used for their imperative side-effects use do...end, and blocks used for their functional return values use curly braces.

There's no map_with_index in Ruby...

a1 = [0, 2, 4, 6, 8, 10]
a2 = [1, 3, 5, 7, 9, 11]

a1.map_with_index { # ~> NoMethodError: undefined method `map_with_index' for [0, 2, 4, 6, 8, 10]:Array
  # ...
}

# ~> NoMethodError
# ~> undefined method `map_with_index' for [0, 2, 4, 6, 8, 10]:Array
# ~>
# ~> tapas.rb:4:in `<main>'

But there's map.with_index. We get both an element and an index yielded to the block.

a1 = [0, 2, 4, 6, 8, 10]
a2 = [1, 3, 5, 7, 9, 11]

a1.map.with_index { |e, i|
  e + a2[i]
}
# => [1, 5, 9, 13, 17, 21]

So that's a fun little review tour of iteration options in Ruby, but as you've probably guessed from the title, we're actually here to talk about a method that's purpose-built for this kind of array-merging operation.

The zip method is available on any enumerable Ruby collection.

We give it a second collection.

And it returns an array of sub-arrays.

Each one contains an element from the first array, and an element from the second.

a1 = [0, 2, 4, 6, 8, 10]
a2 = [1, 3, 5, 7, 9, 11]

a1.zip(a2)
# => [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, 11]]
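
Since zip comes from Enumerable, the receiver doesn't have to be an Array. Here is a minimal sketch using two ranges; the ranges and variable names are made up purely for illustration and aren't part of the episode's data.

```ruby
# Ranges are Enumerable too, so they can be zipped just like arrays.
# (Illustrative values only, not from the episode.)
numbers = (1..3)
letters = ("a".."c")

pairs = numbers.zip(letters)
# => [[1, "a"], [2, "b"], [3, "c"]]
```

Whatever the input types, the result is still a plain array of sub-arrays.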

This operation is known as a "zip", because it's kind of like how a zipper on a jacket interleaves two separate sets of teeth.

This array-of-arrays is a good candidate for mapping over.

We could then take each sub-array and sum it.

a1 = [0, 2, 4, 6, 8, 10]
a2 = [1, 3, 5, 7, 9, 11]

a1.zip(a2).map { |elements|
  elements.sum
}
# => [1, 5, 9, 13, 17, 21]

This can be abbreviated with the ampersand symbol-to-proc operator.

a1 = [0, 2, 4, 6, 8, 10]
a2 = [1, 3, 5, 7, 9, 11]

a1.zip(a2).map(&:sum)
# => [1, 5, 9, 13, 17, 21]

zip can also accept a block directly, in which case it returns nil instead of an array. It imperatively executes the block for each set of input elements.

a1 = [0, 2, 4, 6, 8, 10]
a2 = [1, 3, 5, 7, 9, 11]

a1.zip(a2) do
  puts _1 + _2
end
# => nil

# >> 1
# >> 5
# >> 9
# >> 13
# >> 17
# >> 21

zip is not limited to zipping together just two collections.

Let's say we have a set of arrays for temperature data, each with one entry for each day of the week.

We can zip all of these together.

#      S   M   T   W   T   F   S
STL = [85, 58, 65, 70, 63, 59, 66]
TYS = [76, 77, 63, 64, 73, 77, 79]
BNA = [83, 75, 64, 68, 74, 79, 76]
SFO = [86, 86, 94, 84, 76, 71, 71]

STL.zip(TYS, BNA, SFO).map {
  # ...
}

And then calculate daily averages.

#      S   M   T   W   T   F   S
STL = [85, 58, 65, 70, 63, 59, 66]
TYS = [76, 77, 63, 64, 73, 77, 79]
BNA = [83, 75, 64, 68, 74, 79, 76]
SFO = [86, 86, 94, 84, 76, 71, 71]

STL.zip(TYS, BNA, SFO).map { |temps|
  temps.sum / temps.size
}
# => [82, 74, 71, 71, 71, 71, 73]
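
One thing worth knowing about these averages: both operands of / are Integers, so the division truncates. Tuesday's true mean is 71.5, but it shows up as 71. If fractional averages matter, one option is Integer#fdiv. The sketch below assumes a hypothetical helper named mean, and uses lowercase local variables in place of the constants above.

```ruby
# Hypothetical helper (not from the episode): a floating-point mean.
def mean(values)
  values.sum.fdiv(values.size)
end

#      S   M   T   W   T   F   S
stl = [85, 58, 65, 70, 63, 59, 66]
tys = [76, 77, 63, 64, 73, 77, 79]
bna = [83, 75, 64, 68, 74, 79, 76]
sfo = [86, 86, 94, 84, 76, 71, 71]

daily_means = stl.zip(tys, bna, sfo).map { |temps| mean(temps) }
# => [82.5, 74.0, 71.5, 71.5, 71.5, 71.5, 73.0]
```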

Or we could zip the averages together with their day of the week.

#      S   M   T   W   T   F   S
STL = [85, 58, 65, 70, 63, 59, 66]
TYS = [76, 77, 63, 64, 73, 77, 79]
BNA = [83, 75, 64, 68, 74, 79, 76]
SFO = [86, 86, 94, 84, 76, 71, 71]

"SMTWTFS".chars.zip(STL, TYS, BNA, SFO).map { |day, *temps|
  [day, temps.sum / temps.size]
}
# => [["S", 82],
#     ["M", 74],
#     ["T", 71],
#     ["W", 71],
#     ["T", 71],
#     ["F", 71],
#     ["S", 73]]

So far we've been using arrays with uniform length as inputs.

But what happens when some of the arrays are longer than others?

Well, as you can see here, in that case zip pads any missing elements from shorter arrays with nil.

#      S   M   T   W   T   F   S
STL = [85, 58, 65, 70, 63, 59, 66]
TYS = [76, 77, 63, 64, 73]
BNA = [83, 75, 64, 68, 74, 79, 76]
SFO = [86, 86, 94, 84, 76, 71]

"SMTWTFS".chars.zip(STL, TYS, BNA, SFO)
# => [["S", 85, 76, 83, 86],
#     ["M", 58, 77, 75, 86],
#     ["T", 65, 63, 64, 94],
#     ["W", 70, 64, 68, 84],
#     ["T", 63, 73, 74, 76],
#     ["F", 59, nil, 79, 71],
#     ["S", 66, nil, 76, nil]]

If we try to do our averaging now, we run into trouble.

#      S   M   T   W   T   F   S
STL = [85, 58, 65, 70, 63, 59, 66]
TYS = [76, 77, 63, 64, 73]
BNA = [83, 75, 64, 68, 74, 79, 76]
SFO = [86, 86, 94, 84, 76, 71]

"SMTWTFS".chars.zip(STL, TYS, BNA, SFO).map { |day, *temps|
  [day, temps.sum / temps.size] # ~> TypeError: nil can't be coerced into Integer
}
# =>

# ~> TypeError
# ~> nil can't be coerced into Integer
# ~>
# ~> tapas.rb:8:in `+'
# ~> tapas.rb:8:in `sum'
# ~> tapas.rb:8:in `block in <main>'
# ~> tapas.rb:7:in `map'
# ~> tapas.rb:7:in `<main>'

If we want our zip code to work with inputs of varying length, we need to make provisions for it. In this example, we can compact out any missing data points.

#      S   M   T   W   T   F   S
STL = [85, 58, 65, 70, 63, 59, 66]
TYS = [76, 77, 63, 64, 73]
BNA = [83, 75, 64, 68, 74, 79, 76]
SFO = [86, 86, 94, 84, 76, 71]

"SMTWTFS".chars.zip(STL, TYS, BNA, SFO).map { |day, *temps|
  temps.compact!
  [day, temps.sum / temps.size]
}
# => [["S", 82],
#     ["M", 74],
#     ["T", 71],
#     ["W", 71],
#     ["T", 71],
#     ["F", 69],
#     ["S", 71]]
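
Compacting works here because every day still has at least one reading. If a day had no readings at all, temps would be empty after compacting and the integer division would raise ZeroDivisionError. A minimal sketch of one way to guard against that, assuming a hypothetical helper named average_or_nil:

```ruby
# Hypothetical helper (not from the episode): integer average of the
# non-nil readings, or nil when no readings are available at all.
def average_or_nil(temps)
  present = temps.compact
  return nil if present.empty?

  present.sum / present.size
end

average_or_nil([59, nil, 79, 71]) # => 69
average_or_nil([nil, nil])        # => nil
```

Returning nil is just one choice; depending on the application, skipping the day entirely might be a better fit.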

It's important to note, though, that .zip doesn't iterate up to the length of the longest array in its set of inputs. It always iterates once for every element in the collection that .zip was called on. In this case, that's the set of weekday abbreviations.
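
To see that rule in isolation, here is a small sketch with made-up arrays (short and long are illustrative names, not part of the episode's data). A short receiver drops the extra elements of a longer argument, while a long receiver gets nil padding.

```ruby
# Illustrative inputs of different lengths.
short = [1, 2]
long  = [:a, :b, :c, :d]

# The receiver has two elements, so we get two sub-arrays;
# :c and :d are silently dropped.
by_short = short.zip(long)
# => [[1, :a], [2, :b]]

# The receiver has four elements, so we get four sub-arrays;
# the missing values from `short` are padded with nil.
by_long = long.zip(short)
# => [[:a, 1], [:b, 2], [:c, nil], [:d, nil]]
```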

So if we wanted to limit the output to the days for which we have complete data, we could first find the length of the shortest dataset by mapping over the array sizes and taking the smallest.

Then we could iterate through only that many days of the week.

The result is an output array that stops at Thursday.

#      S   M   T   W   T   F   S
STL = [85, 58, 65, 70, 63, 59, 66]
TYS = [76, 77, 63, 64, 73]
BNA = [83, 75, 64, 68, 74, 79, 76]
SFO = [86, 86, 94, 84, 76, 71]

datasets = [STL, TYS, BNA, SFO]
shortest = datasets.map(&:size).min  # => 5

"SMTWTFS".chars[0,shortest].zip(*datasets).map { |day, *temps|
  [day, temps.sum / temps.size]
}
# => [["S", 82], ["M", 74], ["T", 71], ["W", 71], ["T", 71]]

And there you have it: the .zip method for interleaving collections. Happy hacking!
