Abstraction and Performance – Part 1
Years ago there was a Ruby language feature which I found powerful, but which people warned me about using because it was also quite inefficient in the Ruby implementations of the time. I remember asking Matz, the creator of Ruby, about this at a conference. His answer was that Ruby programmers should use the features that make them happy, and that it’s the job of language implementers, like Matz and his team, to ensure that those features are also fast.
This idea—that the language should serve programmer happiness first and foremost—is one of the things that makes Ruby unique. Chris Seaton, our guest chef today, is keeping that spirit alive. Chris is the head of the TruffleRuby project. Today in the first of a two-part series he’s going to give us an example of how abstraction and runtime efficiency can sometimes be at odds. Then, in part two, he’ll show how he and other language implementers are tackling this problem so we don’t have to choose one or the other. Enjoy!
Video transcript & code
Abstraction and the Past and Future of Ruby
First Episode
The clamp
method
Here's a method that I think says something about where Ruby is now and where it could be going in the future.
def clamp(num, min, max)
[min, num, max].sort[1]
end
clamp(89, 90, 100) # => 90
clamp(101, 90, 100) # => 101
clamp(110, 90, 100) # => 100
It's a routine from the PSD.rb library for reading Photoshop documents, written by Ryan LeFevre.
The method takes a minimum and maximum number, and returns a third number that was either already in this range, or the minimum or maximum if it was outside the range. It's used as part of converting a value from the CMYK to RGB colour space.
What's interesting to me about this method is the level of abstraction at which it has been written, and what this has meant for its performance and goals for Ruby implementations to improve. I found it while writing my PhD and this one little method ended up motivating several years of my research at university and Oracle Labs.
What is good about this method?
What is it that I think is good about this method?
The method works by creating an array from a literal of the three numbers, call
the Array#sort
method, which returns a new sorted array, and then the middle
value of the sorted array is returned.
[A blank screen is displayed here]
A lot of different people from across different approaches to programming would have their own terminology to describe what is good about writing code in this way.
The method is a single line so you can understand all of what it does by
looking at just that one line. It couldn't reasonably be simplified or focused
any further.
The code is a simple composition of core routines, creating an array,
sorting it, and indexing the new array. There's not really any logic or
algorithm in this method - it delegates all that to the core routines and
only combines them to make a useful new method.
The code has linear control-flow, meaning that there are no branches such as
if
statements. This allows you to look at the method and to only have to think
about the values flowing through it, not which branches it will take. There is
only one return point in the method, and it's implicit.
The code is declarative, in that it says what it produces, rather than how. It
produces an array, that is sorted, and the middle value from that array is
returned. You can read those three things across from the start to the end of
the line, and understand it without having to think much more.
The code is immutable, meaning that the objects created are not modified after
being created. This means that you never need to worry about what state an
object is in. The sorted array is always sorted because it's created like that.
The code is in single assignment form, meaning that names only ever have one
value. The input parameter names are assigned a value when the method is called,
and we never modify these names. We don't have any other local variables at all.
What is the problem with it?
The clamp
method has these excellent qualities, but it also has some serious
problems.
To give some context, this method is called for each of the R, G and B colour values for every pixel in potentially some very large images. It's definitely an inner-loop operation and we can reasonably predict that its performance therefore makes a difference to the responsiveness of apps using this library.
A first problem with the method is the number of objects it allocates. We can rewrite it to make these allocations a bit more clear.
def clamp(num, min, max)
object_1 = Array.new(3) # object allocated
object_1[0] = num
object_1[1] = min
object_1[2] = max
object_2 = Array.new(3) # object allocated
sort_from_into object_1, object_2
object_2[1]
end
It allocates one array for the literal, and then a second array for the new
sorted version (remember that the Array#sort
routine returns a new array, it
doesn't modify the existing one). Objects are central to Ruby and a normal Ruby
program will be allocating a lot of them, but allocating an object in
traditional Ruby implementations is not free - in fact it can become a serious
cost. You may have heard about people doing a lot of work to try to improve
Ruby's garbage collection - well the best way to improve garbage collection is
to not generate garbage. And in the context of how it is being used - called
multiple times for every pixel in an image - this method is generating an
avalanche of garbage. To convert a photo of the size captured by a normal iPhone
from CMYK to RGB, the clamp
method alone will generate about a gigabyte of
garbage.
A second problem is that the method calls into a core library routine,
Array#sort
. It's a bit trickier to explain what my problem with this is, and
I'll show more later, but essentially the problem with using core library
routines is that makes it much harder for a Ruby implementation to optimise this
logic. Most Ruby implementations have special handling for making operations
like <
fast - but there are are too many core library routines to do this for
all of them, and methods like sort
aren't optimised in the same way.
We know that the performance of the method has been a problem in practice, because LeFevre has rewritten it twice.
VALUE psd_native_util_clamp(VALUE self, VALUE r_num, VALUE r_min, VALUE r_max) {
int num = FIX2INT(r_num);
int min = FIX2INT(r_min);
int max = FIX2INT(r_max);
return num > max ? r_max : (num < min ? r_min : r_num);
}
The first rewrite was a C extension version of the method. I really like the way
that this was done - the PSD.rb gem has a C extension counterpart called
psd_native, which replaces just a few key methods such as clamp
. It leaves the
original Ruby code there, so you can read it and understand what it does without
having to read the C code.
The C code loses all the qualities that we said we liked about the original Ruby
code. That nested ternary (the ?
operator) is quite hard to read even for
someone who knows C. But it does remove the allocation, and the call to the
core library routine.
def clamp(num, min, max)
return max if num > max
return min if num < min
num
end
The second rewrite was to the original Ruby code, and also removed both the allocation and the use of the core library routine. But like in C we've lost those qualities. This code now has two branches, three return points, is three-times as long, and I think much harder to understand.
What does this tell us about Ruby?
What do I think this method tells us about Ruby?
I think it's a really good thing that Ruby allows and encourages people to write
the high-level, abstracted code that they want to with the qualities that we
found in the original clamp
method.
The only problem is that this code can be inefficient - the performance of idiomatic Ruby in the existing implementations is not good enough and this unfortunately means that people have to back away from the good code that they want to write.
We can see LeFevre trying two solutions - rewriting in C, which makes the code base much more complicated, and rewriting in a version of Ruby without the good qualities that we liked.
There's a particular problem in rewriting in C, in that those implementations which are trying to optimise Ruby, like JRuby, are then unable to run that C code because they don't have the same internal implementation which C extensions expect. It's ironic that the past solution to Ruby's performance problems now prevent other efforts to improve the root cause of the problem by optimising the original Ruby code.
Summary
In this first of two episodes, we've shown that the performance of Ruby implementations does not always allow people to write the high-level code that they would like to.
Using a tiny example from a real gem, we've illustrated what can be great about writing expressive Ruby code, but then also some problems caused by this high-level code. We've shown that people are rewriting their code in a way that they wouldn't chose to unless performance was a problem.
In the next episode we'll continue to use the clamp
method to talk about where
Ruby is going in the future, and how we can remove the need for rewriting
methods like clamp
.
Responses