In Progress
Unit 1, Lesson 1
In Progress

Loop Golf

Video transcript & code

In episode #381, we went back to basics and worked through how to translate integer numbers to ASCII string representations. At the end of that episode, we took a look at Ruby's internal C implementation of this functionality.

VALUE
rb_fix2str(VALUE x, int base)
{
    char buf[SIZEOF_VALUE*CHAR_BIT + 2], *b = buf + sizeof buf;
    long val = FIX2LONG(x);
    int neg = 0;

    if (base < 2 || 36 < base) {
        rb_raise(rb_eArgError, "invalid radix %d", base);
    }
    if (val == 0) {
        return rb_usascii_str_new2("0");
    }
    if (val < 0) {
        val = -val;
        neg = 1;
    }
    *--b = '\0';
    do {
        *--b = ruby_digitmap[(int)(val % base)];
    } while (val /= base);
    if (neg) {
        *--b = '-';
    }

    return rb_usascii_str_new2(b);
}

A lot of this code is setup for handling a range of number bases, as well as for handling negative numbers. These are both aspects we didn't tackle in our own code, but they don't substantially complicate the core algorithm. That algorithm is found on these 3 lines:

do {
    *--b = ruby_digitmap[(int)(val % base)];
} while (val /= base);

Let's compare these lines to the Ruby equivalent that we wrote.

codepoints = {
  0b0000 => 48,
  0b0001 => 49,
  0b0010 => 50,
  0b0011 => 51,
  0b0100 => 52,
  0b0101 => 53,
  0b0110 => 54,
  0b0111 => 55,
  0b1000 => 56,
  0b1001 => 57,
}
# => {0=>48, 1=>49, 2=>50, 3=>51, 4=>52, 5=>53, 6=>54, 7=>55, 8=>56, 9=>57}
number    = 0b01111011            # => 123
result    = ""
begin
  remainder = number % 10   # => 3, 2, 1
  codepoint = codepoints[remainder] # => 51, 50, 49
  result.prepend(codepoint.chr) # => "3", "23", "123"
end until (number = number / 10) == 0
result                          # => "123"

In particular these five lines:

begin
  remainder = number % 10
  codepoint = codepoints[remainder]
  result.prepend(codepoint.chr)
end until (number = number / 10) == 0

Now, I submit to you that the Ruby code here is substantially more readable than the C version. I like this Ruby code, and I don't think it calls for any particular stylistic improvement.

That said… Ruby is a higher level language than C. In general, Ruby code ought to be able to accomplish more in fewer lines and characters than C. It might be fun to see how much more concise we could make our Ruby code. And in particular, how much we can tighten it up without rendering the code too obscure.

Let's see what we can do. First off, we can look for some low-hanging fruit. The remainder variable is just an "explaining variable". It exists to give an explanatory name to an intermediate value.

As such, I like it. But it won't hurt our code's readability too badly if we inline the remainder variable.

begin
  codepoint = codepoints[number % 10] # => 51, 50, 49
  result.prepend(codepoint.chr) # => "3", "23", "123"
end until (number = number / 10) == 0
result                          # => "123"

For that matter, we can do the same with the codepoints variable.{{{shot(6}}}}

begin
  result.prepend(codepoints[number % 10].chr)
end until (number = number / 10) == 0
result                          # => "123"

I want to stop here for a second. I think this code is still pretty clear, but it's hard to argue that it's more so than it was before. There's a lot more going on in a single line, and we don't have the explaining variables around to give us little reminders, and to separate out the steps.

I stopped here because it often feels like overkill to extract an inner expression out to an explaining variable. I think it's easier to see the value of explaining variables when we go the opposite direction: When we trade the explicitness of the variable for greater concision.

Anyway, moving on…

There's another virtual freebie we can take advantage of: we can reduce number = number / 10 to "number /= 10".

This works exactly like "plus-equal" and "minus-equal" syntax sugars: it divides and assigns all at once.

begin
  result.prepend(codepoints[number % 10].chr)
end until (number /= 10) == 0
result                          # => "123"

Now, at this point we only have a single line in the begin…end block. Normally, we use a begin…end block to group a series of statements together as if they were a single expression.

Since there's only one statement left, it would seem like we could remove the block and make all the code into a one-liner.

But this breaks our code, producing "12" instead of "123".

result.prepend(codepoints[number % 10].chr) until (number /= 10) == 0
result                          # => "12"

What happened here? Well, it turns out that by using the until keyword at the end of a begin…end block, we were taking advantage of a special Ruby parsing rule. Ordinarily, the while and until statement modifiers work exactly like while and until loops, evaluating their condition first, and then evaluating their bodies. The fact that the statement modifier versions put their conditions after their bodies is just a syntactical nicety; they are still executed in the usual order.

But when Ruby sees a statement modifier immediately following a begin…end block, it changes the rules.

In that case, it treats the whole construct like a do...while loop in other languages. We talked about this a bit in episode #73.

begin
  result.prepend(codepoints[number % 10].chr)
end until (number /= 10) == 0
result                          # => "123"

So we can't simply strip off the begin...end keywords, despite the fact that they only contain one line.

Here's something that I learned as I was researching this episode: Matz actually considers this special do…while behavior of statement modifiers to be a mistake, and it may be removed from the language at some point in the future. I rather like having a do…while type loop in the language. But I do have to agree with Matz that this obscure special rule is a pretty clear violation of the principle of least surprise.

So what does Matz suggest in its place? A simple loop, with an explicit conditional break following the main loop work.

codepoints = {
  0b0000 => 48,
  0b0001 => 49,
  0b0010 => 50,
  0b0011 => 51,
  0b0100 => 52,
  0b0101 => 53,
  0b0110 => 54,
  0b0111 => 55,
  0b1000 => 56,
  0b1001 => 57,
}
# => {0=>48, 1=>49, 2=>50, 3=>51, 4=>52, 5=>53, 6=>54, 7=>55, 8=>56, 9=>57}
number    = 0b01111011            # => 123
result    = ""
loop do
  result.prepend(codepoints[number % 10].chr)
  break if (number /= 10) == 0
end
result                          # => "123"

Both this solution and the one before it using a begin…end block are still a bit longer than the equivalent C code. So we haven't quite achieved our original goal of golfing the code down smaller than the C version. We've still slimmed down our quite a bit. And I'd say that either version is still far more readable than the C code.

The C code's extreme brevity stems largely from four attributes:

  • Shorter variable names. We could shorten our variables as well, but I like the fact that ours are more self-explanatory.
  • C has a briefer syntax for do-while loops. We can't do much about that one.
  • The use of post-increment operators, something Ruby doesn't have. And for good reason; post-increments are one of the most common sources of confusion in comprehending C code.
  • The fact that 0 in C doubles as a "falsey" value. Again, this is a case where Ruby's stricter interpretation mean we have to do a bit more work, but which makes our code more purpose-revealing.
do {
    *--b = ruby_digitmap[(int)(val % base)];
} while (val /= base);

An experienced code golfer could probably go a lot further to trim our code down. If you want to, you're welcome to try your hand at it, and submit your version in the episode comments. But I suspect that any further golfing is going to be at the expense of readability.

Ruby is, if nothing else, a highly expressive language. You can say a lot in a few tokens, and still have it read reasonably well. I can't recommend putting a high priority on minimizing the line or token counts in your production code. But a little code golf now and then can be highly satisfying, and can help keep our language understanding sharp.

Happy hacking!

Responses