In Progress
Unit 1, Lesson 1
In Progress

Integer To String, Part 2

Video transcript & code

In the episode #381, we started exploring exactly what work the computer has to do when it shows us a string representation of an integer number. Today, we'll generalize the solution we came up with to handle any size number, and then talk a bit about how we went about coming up with a solution.

We're going to pick up right where we left off in episode #381, without doing any review. So if you haven't seen that episode yet, you might want to go back and watch it before we go any further.

Ready? Here we go.

OK, that wasn't so hard. Now how about a bigger number, like 123?

codepoints = {
  0b0000 => 48,
  0b0001 => 49,
  0b0010 => 50,
  0b0011 => 51,
  0b0100 => 52,
  0b0101 => 53,
  0b0110 => 54,
  0b0111 => 55,
  0b1000 => 56,
  0b1001 => 57,
}
# => {0=>48, 1=>49, 2=>50, 3=>51, 4=>52, 5=>53, 6=>54, 7=>55, 8=>56, 9=>57}
number    = 0b01111011            # => 123
remainder = number % 10           # => 3
codepoint = codepoints[remainder] # => 51
result    = codepoint.chr         # => "3"
if number > 9                     # => true
  tens      = number / 10
  codepoint = codepoints[tens]
  result.prepend(codepoint.chr)
end
result                          # =>

# ~> NoMethodError
# ~> undefined method `chr' for nil:NilClass
# ~>
# ~> xmptmp-in7940rNU.rb:21:in `<main>'

This is the same problem we had when we moved from single-digit numbers to double digits. We could add another branch to our conditional for handling hundreds, but we can't just keep adding branches forever.

When we find ourselves having to extend a conditional indefinitely, that's usually a sign that it's time to switch to a loop. Let's go ahead and do that.

First, we'll get rid of the existing conditional.

codepoints = {
  0b0000 => 48,
  0b0001 => 49,
  0b0010 => 50,
  0b0011 => 51,
  0b0100 => 52,
  0b0101 => 53,
  0b0110 => 54,
  0b0111 => 55,
  0b1000 => 56,
  0b1001 => 57,
}
# => {0=>48, 1=>49, 2=>50, 3=>51, 4=>52, 5=>53, 6=>54, 7=>55, 8=>56, 9=>57}
number    = 0b01111011            # => 123
remainder = number % 10           # => 3
codepoint = codepoints[remainder] # => 51
result    = codepoint.chr         # => "3"
tens      = number / 10
codepoint = codepoints[tens]
result.prepend(codepoint.chr)
result                          # =>

Then, we'll replace that section with a loop.

This section is no longer going to be specific to just tens, so we rename that variable. We decide to just re-use the number variable: once we've finished with the ones place, it will represent the number of tens, then the number of hundreds, and so on.

In a number like "123", there are 12 tens. We can't just look up "12" in the codepoints map. We need to isolate just the tens place. So once again, we use modulo to get the remainder after division by ten. That remainder is what we look up.

As before, we'll prepend the digit we looked up to the result string.

Right now this is an infinite loop. We need to introduce a rule for when to finish the loop. We'll do that right after performing the first division. If the result is zero, there are no places left, so we'll break at that point.

Let's execute this code.

It works!

If we play around with it, we can see it also works for other numbers.

codepoints = {
  0b0000 => 48,
  0b0001 => 49,
  0b0010 => 50,
  0b0011 => 51,
  0b0100 => 52,
  0b0101 => 53,
  0b0110 => 54,
  0b0111 => 55,
  0b1000 => 56,
  0b1001 => 57,
}
# => {0=>48, 1=>49, 2=>50, 3=>51, 4=>52, 5=>53, 6=>54, 7=>55, 8=>56, 9=>57}
number    = 0b01111011            # => 123
remainder = number % 10           # => 3
codepoint = codepoints[remainder] # => 51
result    = codepoint.chr         # => "3"
loop do
  number        = number / 10
  break if number == 0
  remainder     = number % 10
  codepoint     = codepoints[remainder]
  result.prepend(codepoint.chr)
end
result                          # => "123"

Now that we have this code working, let's tighten it up a little bit.

First off, we constructed a loop before we knew exactly what condition we'd break on. Now that we have an answer, let's take it from a generic loop to a more traditional until loop. We'll do that by moving both the opening division and the test for a zero result into the test part of the until loop.

codepoints = {
  0b0000 => 48,
  0b0001 => 49,
  0b0010 => 50,
  0b0011 => 51,
  0b0100 => 52,
  0b0101 => 53,
  0b0110 => 54,
  0b0111 => 55,
  0b1000 => 56,
  0b1001 => 57,
}
# => {0=>48, 1=>49, 2=>50, 3=>51, 4=>52, 5=>53, 6=>54, 7=>55, 8=>56, 9=>57}
number    = 0b01111011            # => 123
remainder = number % 10           # => 3
codepoint = codepoints[remainder] # => 51
result    = codepoint.chr         # => "3"
until (number = number / 10) == 0
  remainder     = number % 10
  codepoint     = codepoints[remainder]
  result.prepend(codepoint.chr)
end
result                          # => "123"

When we look at what we have left, we see an interesting pattern. The lines inside the loop almost duplicate the lines above the loop.

Let's eliminate the last remaining difference. We'll move the initialization of the result variable up above the rest of the logic.

Once we do this, we can use the same prepending code both above the loop and inside the loop.

codepoints = {
  0b0000 => 48,
  0b0001 => 49,
  0b0010 => 50,
  0b0011 => 51,
  0b0100 => 52,
  0b0101 => 53,
  0b0110 => 54,
  0b0111 => 55,
  0b1000 => 56,
  0b1001 => 57,
}
# => {0=>48, 1=>49, 2=>50, 3=>51, 4=>52, 5=>53, 6=>54, 7=>55, 8=>56, 9=>57}
number    = 0b01111011            # => 123
result    = ""
remainder = number % 10           # => 3
codepoint = codepoints[remainder] # => 51
result.prepend(codepoint.chr)     # => "3"
until (number = number / 10) == 0
  remainder     = number % 10   # => 2, 1
  codepoint     = codepoints[remainder] # => 50, 49
  result.prepend(codepoint.chr)         # => "23", "123"
end
result                          # => "123"

At this point, we have a pattern which may be familiar to you if you remember episode #73. Whenever we have a loop where we see a hunk of code that should always be executed at least once, and then optionally some additional number of times, we can consolidate it into a "do-while" loop.

In Ruby, do-while loops are constructed using begin, end, and a statement modifier. We can easily modify our loop by putting begin at the beginning, and moving the condition to the end.

Once our do-while loop is in place, we can get rid of the duplicated first-round code.

codepoints = {
  0b0000 => 48,
  0b0001 => 49,
  0b0010 => 50,
  0b0011 => 51,
  0b0100 => 52,
  0b0101 => 53,
  0b0110 => 54,
  0b0111 => 55,
  0b1000 => 56,
  0b1001 => 57,
}
# => {0=>48, 1=>49, 2=>50, 3=>51, 4=>52, 5=>53, 6=>54, 7=>55, 8=>56, 9=>57}
number    = 0b01111011            # => 123
result    = ""
begin
  remainder = number % 10   # => 3, 2, 1
  codepoint = codepoints[remainder] # => 51, 50, 49
  result.prepend(codepoint.chr) # => "3", "23", "123"
end until (number = number / 10) == 0
result                          # => "123"

Once again, we can try various numbers on this code, and see that they are correctly converted to ASCII strings.

So now you know how to do something that Ruby already gives you for free! But hopefully, you've picked up one or two tricks along the way. As well as gaining some insight into aspects of computer operation that we don't often think about.

Most of all, I hope this episode has given you some inspiration for how to approach algorithmic problems. All too often, when I'm confronted with a problem like this one, I jump too quickly into building abstract solutions. I do things like trying to figure out the looping version immediately, skipping any prior incremental steps. As a result, I do a lot of backtracking and head-scratching when things don't work quite as I'd imagined in my head.

In today's episode, we didn't succumb to that temptation. We started out with flat, straight-line, naive code that only handled a representative subset of the problem space. This enabled us to get a good handle on the problem, before we abstracted and consolidated into an elegant loop.

By the way, in case you were wondering, Ruby's implementation of this functionality is found in a C function called rb_fix2str, in the source file numeric.c. It has some additional code for handling bases other than 10, and for handling negative numbers. Other than that, the core logic works exactly the same as what we wrote here.

VALUE
rb_fix2str(VALUE x, int base)
{
    char buf[SIZEOF_VALUE*CHAR_BIT + 2], *b = buf + sizeof buf;
    long val = FIX2LONG(x);
    int neg = 0;

    if (base < 2 || 36 < base) {
        rb_raise(rb_eArgError, "invalid radix %d", base);
    }
    if (val == 0) {
        return rb_usascii_str_new2("0");
    }
    if (val < 0) {
        val = -val;
        neg = 1;
    }
    *--b = '\0';
    do {
        *--b = ruby_digitmap[(int)(val % base)];
    } while (val /= base);
    if (neg) {
        *--b = '-';
    }

    return rb_usascii_str_new2(b);
}

And that's it for today. Happy hacking!

Responses