In Progress
Unit 1, Lesson 1
In Progress

Logical Matrix with Piotr Szotkowski

In today’s episode, you’ll learn how to drastically reduce the amount of memory necessary to store large matrices of boolean data. It’s a technique that normally requires substantially rewriting code in order to do binary-level bit manipulation. But guest chef Piotr Szotkowski will show you how to accomplish it with just some trivial changes, using an obscure Ruby integer method.

Video transcript & code

There are so many little hidden gems of functionality in the Ruby core libraries. Sometimes I hold off on demonstrating one of these unsung features simply because I don't feel like I have just the right practical example to illustrate it.

For instance, in today's episode you're going to learn about a rarely-used capability of Ruby's integer classes for manipulating their internal bit representation. To show you why you might want to use this feature, we're joined today by guest chef 

Piotr originally started dabbling in Ruby for his PhD thesis and went on to work as an assistant professor at Warsaw University of Technology. In his other life he runs a Ruby and JavaScript software house, develops a system for operating solar power plants, speaks at various Ruby conferences, coaches at Rails Girls Warsaw, organises Warsaw Ruby Users Group, co-maintains the Bogus and Reek gems, and writes small gems for personal use – like the word-wrapping Lovely Rufus we talked about in Episode #355 and Episode #356.

Here he is to talk about implementing logical matrices in Ruby. Enjoy!

--- Avdi

Hello and welcome to another guest chef episode of RubyTapas. My name is Piotr (that’s Peter in Polish) and I’d love to talk to you about logical matrices and the Integer#[] method.

As any avid RubyTapas connoisseur knows, there were many guest chefs before; to name but a few, Sam Livingston-Gray taught us about using Structs and refactoring our code, Rob Miller showed us how to use Kernel.open in anger and Noel Rappin told us wonderful tall tales about building our own RSpec matchers. If we were to present those three chefs vis-à-vis the three topics of core Ruby features, refactoring practices and proper testing we could come up with a table not unlike this one.

#       │ core │ ideas │ tests
# ──────┼──────┼───────┼───────
#  Sam  │  ✓   │   ✓   │
#  Rob  │  ✓   │       │
#  Noel │      │       │   ✓

This table is an example of a logical matrix: a rectangular array of true/false values describing the relationships between a row (representing a given chef) and a column (representing a potential topic). A tick in any given cell means that the given chef covered the given topic.

Let’s figure out how we could build a system which we can ask whether a given chef covered a given topic – while storing this information in a memory-efficient manner. We don’t want to assume anything about the density of the matrix (the number of ‘true’ values relative to the size of the matrix; in the above case there are four ticks out of possible nine, so the density is ⁴⁄₉ or ~44%); there are better representations for dense and sparse matrices, but we want to go with something more universal.

Let’s put our chefs in one Array, our topics in another and let’s represent the matrix with an Array of Arrays of true and false values.

chefs  = %w[Sam Rob Noel]
topics = %w[core ideas tests]

matrix = [
  [true,  true,  false],
  [true,  false, false],
  [false, false, true ],
]

Let’s create a Guests class that takes chefs, topics and the boolean matrix; the constructor will turn the chefs and topics Arrays into lookup Hashes. We can now lookup whether a given chef covered a given topic by first looking up their row and column numbers and subsequently indexing into the matrix.

We can now query our Guests object on whether a given chef covered a given topic.

class Guests
  def initialize(chefs, topics, matrix)
    @chefs  = chefs.map.with_index.to_h
    @topics = topics.map.with_index.to_h
    # => {"core"=>0, "ideas"=>1, "tests"=>2}
    @matrix = matrix
  end

  def covered?(chef, topic)
    row = @chefs.fetch(chef)
    col = @topics.fetch(topic)
    @matrix[row][col]
  end
end

guests = Guests.new(chefs, topics, matrix)
guests.covered?('Sam', 'core')  # => true
guests.covered?('Rob', 'tests') # => false

Not diving too much into the internals of Ruby object representation (especially given that they differ between various implementations of the language), we can see that the boolean matrix requires us to store an Array containing three more Arrays, which in turn contain three boolean values each.

But booleans are not the only way to represent truth and falshood in a logical matrix. Rather than using booleans we could use zeroes and ones; traditionally zeroes represent falshood and ones represent truth.

If we were to switch from booleans to Integers in the matrix, we would have to also adjust the #covered? method; we still want it to be a predicate, not return the literal 0 or 1 from the table.

We can verify that our new implementation works with the zero- and one-based matrix.

matrix = [
  [1, 1, 0],
  [1, 0, 0],
  [0, 0, 1],
]

class Guests
  def covered?(chef, topic)
    row = @chefs.fetch(chef)
    col = @topics.fetch(topic)
    @matrix[row][col] == 1
  end
end

guests = Guests.new(chefs, topics, matrix)
guests.covered?('Sam', 'core')  # => true
guests.covered?('Rob', 'tests') # => false

Memory-wise this does not get us anything, though: we still need an Array containing three other Arrays and nine Integers between them. If anything, those Integers might take more space than booleans (although they do not in CRuby)!

But wait a minute. We don’t really need full, 64-bit Integers to represent truth and falshood; all we need are zero and one bits/… Is there a way for us to represent the cells of the logic table on /bits without forfeiting too much of its readability?

The true RubyTapas connoisseurs are shouting now ‘well of course there is, just go all the way back to The Original Tapa: Episode 1 on Binary Literals!’. In that episode we learned how to construct Ruby Integers by prefixing their binary representation with 0b. Let’s try to use that with our matrix; we will squash all of the zeroes and ones of every row into a single binary Integer.

Rather than having an Array holding three other Arrays and nine Integers we now have a single Array of three Integers (that incidentally happen to equal six, four and one) – but we don’t lose any of the relevant information.

matrix = [
  0b110,
  0b100,
  0b001,
]
# => [6, 4, 1]

That’s (relatively) nice, but surely querying it will require some dark incantations of integer shifting and bitwise masking? Well, it turns out it doesn’t: thanks to the relatively-obscure Integer#[] method our #covered? method can stay exactly the same. As it turns out Integer#[] – like most decent #[] methods in Ruby – allows us to ‘index’ into the given Integer; the index in question refers to the n-th youngest (or rightmost) bit position and the return value is the value of the bit – either a 1 or a 0.

0b110    # => 6
0b110[0] # => 0
0b110[1] # => 1
0b110[2] # => 1
0b110[3] # => 0

To work with this kind of matrix we need another subtle, but crucial change: we’re now doing the column lookup from right to left (because the index of 0 is the youngest, rightmost bit), so we need to reverse the topics list – but if we do that early enough our Guests implementation can stay the same.

matrix = [
  0b110,
  0b100,
  0b001,
]

guests = Guests.new(chefs, topics.reverse, matrix)
guests.covered?('Sam', 'core')  # => true
guests.covered?('Rob', 'tests') # => false

Is this the best possible representation of a logical matrix in Ruby? Probably not, but it’s relatively memory-efficient while still being very readable – all thanks to Integer#[].

The gain might seem trivial for a 3×3 matrix; with CRuby’s forty-byte objects and eight-byte immediates the original boolean matrix implemented as an Array of three Arrays and nine booleans takes 232 bytes, while the final binary matrix (an Array of three Integers) takes 64 bytes – as much as 28% of the original.

But an 62-row-and-column boolean matrix will take over 32 kilobyts, while the binary representation will take 536 bytes – 1.6% of the original. Things get a little more complicated above 62 columns, as we stop representing Integers as the good old Fixnum immediate values and start using Bignums, but that’s a story for another tapa – and for some more happy hacking!

Responses