# Logical Matrix with Piotr Szotkowski

In today’s episode, you’ll learn how to drastically reduce the amount of memory necessary to store large matrices of boolean data. It’s a technique that normally requires substantially rewriting code in order to do binary-level bit manipulation. But guest chef Piotr Szotkowski will show you how to accomplish it with just some trivial changes, using an obscure Ruby integer method.

## Video transcript & code

*There are so many little hidden gems of functionality in the Ruby core libraries. Sometimes I hold off on demonstrating one of these unsung features simply because I don't feel like I have just the right practical example to illustrate it.*

*For instance, in today's episode you're going to learn about a rarely-used capability of Ruby's integer classes for manipulating their internal bit representation. To show you why you might want to use this feature, we're joined today by guest chef *

*Piotr originally started dabbling in Ruby for his PhD thesis and went on to work as an assistant professor at Warsaw University of Technology. In his other life he runs a Ruby and JavaScript software house, develops a system for operating solar power plants, speaks at various Ruby conferences, coaches at Rails Girls Warsaw, organises Warsaw Ruby Users Group, co-maintains the Bogus and Reek gems, and writes small gems for personal use – like the word-wrapping Lovely Rufus we talked about in Episode #355 and Episode #356.*

*Here he is to talk about implementing logical matrices in Ruby. Enjoy!*

*--- Avdi*

Hello and welcome to another guest chef episode of RubyTapas. My name is Piotr (that’s Peter in Polish) and I’d love to talk to you about logical matrices and the `Integer#[]`

method.

As any avid RubyTapas connoisseur knows, there were many guest chefs before; to name but a few, Sam Livingston-Gray taught us about using Structs and refactoring our code, Rob Miller showed us how to use `Kernel.open`

in anger and Noel Rappin told us wonderful tall tales about building our own RSpec matchers. If we were to present those three chefs vis-à-vis the three topics of core Ruby features, refactoring practices and proper testing we could come up with a table not unlike this one.

```
# │ core │ ideas │ tests
# ──────┼──────┼───────┼───────
# Sam │ ✓ │ ✓ │
# Rob │ ✓ │ │
# Noel │ │ │ ✓
```

This table is an example of a logical matrix: a rectangular array of true/false values describing the relationships between a row (representing a given chef) and a column (representing a potential topic). A tick in any given cell means that the given chef covered the given topic.

Let’s figure out how we could build a system which we can ask whether a given chef covered a given topic – while storing this information in a memory-efficient manner. We don’t want to assume anything about the density of the matrix (the number of ‘true’ values relative to the size of the matrix; in the above case there are four ticks out of possible nine, so the density is ⁴⁄₉ or ~44%); there are better representations for dense and sparse matrices, but we want to go with something more universal.

Let’s put our `chefs`

in one `Array`

, our `topics`

in another and let’s represent the `matrix`

with an `Array`

of `Arrays`

of `true`

and `false`

values.

```
chefs = %w[Sam Rob Noel]
topics = %w[core ideas tests]
matrix = [
[true, true, false],
[true, false, false],
[false, false, true ],
]
```

Let’s create a `Guests`

class that takes `chefs`

, `topics`

and the boolean `matrix`

; the constructor will turn the `chefs`

and `topics`

`Arrays`

into lookup `Hashes`

. We can now lookup whether a given `chef`

covered a given `topic`

by first looking up their row and column numbers and subsequently indexing into the `matrix`

.

We can now query our `Guests`

object on whether a given `chef`

covered a given `topic`

.

```
class Guests
def initialize(chefs, topics, matrix)
@chefs = chefs.map.with_index.to_h
@topics = topics.map.with_index.to_h
# => {"core"=>0, "ideas"=>1, "tests"=>2}
@matrix = matrix
end
def covered?(chef, topic)
row = @chefs.fetch(chef)
col = @topics.fetch(topic)
@matrix[row][col]
end
end
guests = Guests.new(chefs, topics, matrix)
guests.covered?('Sam', 'core') # => true
guests.covered?('Rob', 'tests') # => false
```

Not diving too much into the internals of Ruby object representation (especially given that they differ between various implementations of the language), we can see that the boolean matrix requires us to store an `Array`

containing three more `Arrays`

, which in turn contain three boolean values each.

But booleans are not the only way to represent truth and falshood in a logical matrix. Rather than using booleans we could use zeroes and ones; traditionally zeroes represent falshood and ones represent truth.

If we were to switch from booleans to `Integers`

in the `matrix`

, we would have to also adjust the `#covered?`

method; we still want it to be a predicate, not return the literal `0`

or `1`

from the table.

We can verify that our new implementation works with the zero- and one-based `matrix`

.

```
matrix = [
[1, 1, 0],
[1, 0, 0],
[0, 0, 1],
]
class Guests
def covered?(chef, topic)
row = @chefs.fetch(chef)
col = @topics.fetch(topic)
@matrix[row][col] == 1
end
end
guests = Guests.new(chefs, topics, matrix)
guests.covered?('Sam', 'core') # => true
guests.covered?('Rob', 'tests') # => false
```

Memory-wise this does not get us anything, though: we still need an `Array`

containing three other `Arrays`

and nine `Integers`

between them. If anything, those `Integers`

might take more space than booleans (although they do not in CRuby)!

But wait a minute. We don’t really need full, 64-bit `Integers`

to represent truth and falshood; all we need are zero and one *bits/… Is there a way for us to represent the cells of the logic table on /bits* without forfeiting too much of its readability?

The *true* RubyTapas connoisseurs are shouting now ‘well of course there is, just go all the way back to The Original Tapa: Episode 1 on Binary Literals!’. In that episode we learned how to construct Ruby `Integers`

by prefixing their binary representation with `0b`

. Let’s try to use that with our `matrix`

; we will squash all of the zeroes and ones of every row into a single binary `Integer`

.

Rather than having an `Array`

holding three other `Arrays`

and nine `Integers`

we now have a single `Array`

of three `Integers`

(that incidentally happen to equal six, four and one) – but we don’t lose any of the relevant information.

```
matrix = [
0b110,
0b100,
0b001,
]
# => [6, 4, 1]
```

That’s (relatively) nice, but surely querying it will require some dark incantations of integer shifting and bitwise masking? Well, it turns out it doesn’t: thanks to the relatively-obscure `Integer#[]`

method our `#covered?`

method can stay *exactly the same*. As it turns out `Integer#[]`

– like most decent `#[]`

methods in Ruby – allows us to ‘index’ into the given `Integer`

; the index in question refers to the *n*-th youngest (or rightmost) bit position and the return value is the value of the bit – either a `1`

or a `0`

.

```
0b110 # => 6
0b110[0] # => 0
0b110[1] # => 1
0b110[2] # => 1
0b110[3] # => 0
```

To work with this kind of matrix we need another subtle, but crucial change: we’re now doing the column lookup from right to left (because the index of `0`

is the youngest, rightmost bit), so we need to reverse the `topics`

list – but if we do that early enough our `Guests`

implementation can stay the same.

```
matrix = [
0b110,
0b100,
0b001,
]
guests = Guests.new(chefs, topics.reverse, matrix)
guests.covered?('Sam', 'core') # => true
guests.covered?('Rob', 'tests') # => false
```

Is this the best possible representation of a logical matrix in Ruby? Probably not, but it’s relatively memory-efficient while still being very readable – all thanks to `Integer#[]`

.

The gain might seem trivial for a 3×3 matrix; with CRuby’s forty-byte objects and eight-byte immediates the original boolean matrix implemented as an `Array`

of three `Arrays`

and nine booleans takes 232 bytes, while the final binary matrix (an `Array`

of three `Integers`

) takes 64 bytes – as much as 28% of the original.

But an 62-row-and-column boolean matrix will take over 32 kilobyts, while the binary representation will take 536 bytes – 1.6% of the original. Things get a little more complicated above 62 columns, as we stop representing `Integers`

as the good old `Fixnum`

immediate values and start using Bignums, but that’s a story for another tapa – and for some more happy hacking!

## Responses