Video transcript & code
A few episodes back we compared separate division and modulo operations to using the #divmod method. In that episode, I said: "Only if this code were at the heart of some serious number-crunching logic could this doubled operation possibly matter from a performance standpoint." And I was careful to say that using #divmod was really a choice of style rather than an optimization.
index = 42
page_size = 10

# Separate operators
page   = index / page_size
offset = index % page_size

# Divmod
page, offset = index.divmod(page_size)
In the comments on that episode, alert viewer Federico Martin did some benchmarking and discovered another reason not to adopt #divmod purely as an optimization: on MRI, it's actually slower than the separate-operators version!
I had to try this for myself. Here we can see a benchmark set up. In one example, there is a division followed by a remainder operation. In the second, there is a single call to #divmod.
require "benchmark/ips"

index = 42
page_size = 10

Benchmark.ips do |x|
  x.report("operators") do |times|
    1.upto(times) do
      index / page_size
      index % page_size
    end
  end

  x.report("divmod") do |times|
    1.upto(times) do
      index.divmod(page_size)
    end
  end

  x.compare!
end
When we run this benchmark and wait for the results, we can see that the #divmod version is, indeed, a bit slower.
Now, intellectually we know that we shouldn't make assumptions about the relative speed of things without benchmarking them. But this result seems really counter-intuitive. After all, in Ruby, all operators are really just message sends, which means that the operator version performs two message sends, whereas the #divmod version performs just one.
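To see what "operators are message sends" means concretely, here's a small illustration (my own example, not from the episode) showing that the operator syntax and an explicit #send are interchangeable:

```ruby
index = 42
page_size = 10

# Operator syntax and explicit message sends are equivalent in Ruby:
index / page_size          # => 4
index.send(:/, page_size)  # => 4

index % page_size          # => 2
index.send(:%, page_size)  # => 2
```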
The first thought I had when I saw this result was that I wanted to see what kind of Ruby bytecode instructions were being produced for these two different examples. Fortunately, Ruby 2 includes tools to invoke the compiler directly from inside Ruby, and then dump a disassembled view. Let's go ahead and take a look at how the Ruby VM sees these two different examples.
First, the separate division and modulo.
code = <<EOL
index = 42
page_size = 10
index / page_size
index % page_size
EOL

puts RubyVM::InstructionSequence.compile(code).disasm
I'm not going to get deep into the nitty-gritty of what all of this means. But it's pretty much the code we'd expect to see. There is a method call for the division operation, followed by a method call for the modulo operation.
Now let's look at the #divmod version.
code = <<EOL
index = 42
page_size = 10
index.divmod(page_size)
EOL

puts
puts
puts RubyVM::InstructionSequence.compile(code).disasm
We can see right off the bat that this version has fewer instructions. And it contains just a single method call, to #divmod.
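If we don't want to eyeball the two listings, we can also compare their sizes programmatically. This is my own quick sanity check, not code from the episode:

```ruby
ops_code    = "index = 42; page_size = 10; index / page_size; index % page_size"
divmod_code = "index = 42; page_size = 10; index.divmod(page_size)"

ops_disasm    = RubyVM::InstructionSequence.compile(ops_code).disasm
divmod_disasm = RubyVM::InstructionSequence.compile(divmod_code).disasm

# The operator version compiles to a longer instruction listing
# than the divmod version.
puts ops_disasm.lines.count
puts divmod_disasm.lines.count
```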
It seems our culprit is not to be found in the bytecode. It must be in the implementation of #divmod itself.
In MRI, the #divmod method is defined in numeric.c, a fact I found out with a little grepping. It's a one-liner, and there are no surprises to be found here. It calls the division and modulo operator implementations internally. Then it uses rb_assoc_new to return a new two-element array containing the resulting values.
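In Ruby terms, that C one-liner amounts to something like the following sketch. The name divmod_sketch is my own illustration, not anything from MRI's source:

```ruby
# Ruby-level sketch of what numeric.c's divmod does internally.
def divmod_sketch(a, b)
  quotient  = a / b
  remainder = a % b
  [quotient, remainder]  # rb_assoc_new builds a two-element array like this
end

divmod_sketch(42, 10)  # => [4, 2]
```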
And that last bit is almost certainly the source of the slowdown we're seeing. The single meaningful difference between the separate operators and #divmod is that #divmod always allocates a new two-element array for its return value. This one extra object allocation accounts for the difference in speed.
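We can even observe that extra allocation directly. This is my own rough sketch, not from the episode: GC.stat(:total_allocated_objects) counts every object ever allocated in the process, so the difference across a block approximates how many objects that block allocated.

```ruby
# Approximate the number of objects allocated while running a block.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

index = 42
page_size = 10

ops_allocs    = allocations { index / page_size; index % page_size }
divmod_allocs = allocations { index.divmod(page_size) }

# Small-integer division and modulo allocate nothing, while #divmod
# must allocate its two-element result array.
puts ops_allocs
puts divmod_allocs
```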
So what's the moral of this story? It's just another reminder that we should never, ever make assumptions about what code is "faster" without first benchmarking to be sure. Even in cases where it seems transparently obvious which version ought to be faster, our assumptions often turn out to be wrong.
My conclusion is that we should focus first on writing clear code, without making trade-offs based on what we think is faster. Our best bet is still, in the words of Kent Beck, to "Make it work; then make it right; then make it fast".