In Progress
Unit 1, Lesson 21
In Progress

Abstraction and Performance – Part 1

Years ago there was a Ruby language feature which I found powerful, but which people warned me about using because it was also quite inefficient in the Ruby implementations of the time. I remember asking Matz, the creator of Ruby, about this at a conference. His answer was that Ruby programmers should use the features that make them happy, and that it’s the job of language implementers, like Matz and his team, to ensure that those features are also fast.

This idea—that the language should serve programmer happiness first and foremost—is one of the things that makes Ruby unique. Chris Seaton, our guest chef today, is keeping that spirit alive. Chris is the head of the TruffleRuby project. Today in the first of a two-part series he’s going to give us an example of how abstraction and runtime efficiency can sometimes be at odds. Then, in part two, he’ll show how he and other language implementers are tackling this problem so we don’t have to choose one or the other. Enjoy!

Video transcript & code

Abstraction and the Past and Future of Ruby

First Episode

The clamp method

Here's a method that I think says something about where Ruby is now and where it could be going in the future.

def clamp(num, min, max)
[min, num, max].sort[1]
end

clamp(89, 90, 100) # => 90
clamp(101, 90, 100) # => 101
clamp(110, 90, 100) # => 100

It's a routine from the PSD.rb library for reading Photoshop documents, written by Ryan LeFevre.

The method takes a minimum and maximum number, and returns a third number that was either already in this range, or the minimum or maximum if it was outside the range. It's used as part of converting a value from the CMYK to RGB colour space.

What's interesting to me about this method is the level of abstraction at which it has been written, and what this has meant for its performance and goals for Ruby implementations to improve. I found it while writing my PhD and this one little method ended up motivating several years of my research at university and Oracle Labs.

What is good about this method?

What is it that I think is good about this method?

The method works by creating an array from a literal of the three numbers, call the Array#sort method, which returns a new sorted array, and then the middle value of the sorted array is returned.

Slide title

[A blank screen is displayed here]

A lot of different people from across different approaches to programming would have their own terminology to describe what is good about writing code in this way.

Slide title The method is a single line so you can understand all of what it does by looking at just that one line. It couldn't reasonably be simplified or focused any further.

Slide title The code is a simple composition of core routines, creating an array, sorting it, and indexing the new array. There's not really any logic or algorithm in this method - it delegates all that to the core routines and only combines them to make a useful new method.

Slide title The code has linear control-flow, meaning that there are no branches such as if statements. This allows you to look at the method and to only have to think about the values flowing through it, not which branches it will take. There is only one return point in the method, and it's implicit.

Slide title The code is declarative, in that it says what it produces, rather than how. It produces an array, that is sorted, and the middle value from that array is returned. You can read those three things across from the start to the end of the line, and understand it without having to think much more.

Slide title The code is immutable, meaning that the objects created are not modified after being created. This means that you never need to worry about what state an object is in. The sorted array is always sorted because it's created like that.

Slide title The code is in single assignment form, meaning that names only ever have one value. The input parameter names are assigned a value when the method is called, and we never modify these names. We don't have any other local variables at all.

What is the problem with it?

The clamp method has these excellent qualities, but it also has some serious problems.

To give some context, this method is called for each of the R, G and B colour values for every pixel in potentially some very large images. It's definitely an inner-loop operation and we can reasonably predict that its performance therefore makes a difference to the responsiveness of apps using this library.

A first problem with the method is the number of objects it allocates. We can rewrite it to make these allocations a bit more clear.

def clamp(num, min, max)
object_1 = Array.new(3) # object allocated
object_1[0] = num
object_1[1] = min
object_1[2] = max
object_2 = Array.new(3) # object allocated
sort_from_into object_1, object_2
object_2[1]
end

It allocates one array for the literal, and then a second array for the new sorted version (remember that the Array#sort routine returns a new array, it doesn't modify the existing one). Objects are central to Ruby and a normal Ruby program will be allocating a lot of them, but allocating an object in traditional Ruby implementations is not free - in fact it can become a serious cost. You may have heard about people doing a lot of work to try to improve Ruby's garbage collection - well the best way to improve garbage collection is to not generate garbage. And in the context of how it is being used - called multiple times for every pixel in an image - this method is generating an avalanche of garbage. To convert a photo of the size captured by a normal iPhone from CMYK to RGB, the clamp method alone will generate about a gigabyte of garbage.

A second problem is that the method calls into a core library routine, Array#sort. It's a bit trickier to explain what my problem with this is, and I'll show more later, but essentially the problem with using core library routines is that makes it much harder for a Ruby implementation to optimise this logic. Most Ruby implementations have special handling for making operations like < fast - but there are are too many core library routines to do this for all of them, and methods like sort aren't optimised in the same way.

We know that the performance of the method has been a problem in practice, because LeFevre has rewritten it twice.

VALUE psd_native_util_clamp(VALUE self, VALUE r_num, VALUE r_min, VALUE r_max) {
int num = FIX2INT(r_num);
int min = FIX2INT(r_min);
int max = FIX2INT(r_max);
return num > max ? r_max : (num < min ? r_min : r_num);
}

The first rewrite was a C extension version of the method. I really like the way that this was done - the PSD.rb gem has a C extension counterpart called psd_native, which replaces just a few key methods such as clamp. It leaves the original Ruby code there, so you can read it and understand what it does without having to read the C code.

The C code loses all the qualities that we said we liked about the original Ruby code. That nested ternary (the ? operator) is quite hard to read even for someone who knows C. But it does remove the allocation, and the call to the core library routine.

def clamp(num, min, max)
return max if num > max
return min if num < min
num
end

The second rewrite was to the original Ruby code, and also removed both the allocation and the use of the core library routine. But like in C we've lost those qualities. This code now has two branches, three return points, is three-times as long, and I think much harder to understand.

What does this tell us about Ruby?

What do I think this method tells us about Ruby?

I think it's a really good thing that Ruby allows and encourages people to write the high-level, abstracted code that they want to with the qualities that we found in the original clamp method.

The only problem is that this code can be inefficient - the performance of idiomatic Ruby in the existing implementations is not good enough and this unfortunately means that people have to back away from the good code that they want to write.

We can see LeFevre trying two solutions - rewriting in C, which makes the code base much more complicated, and rewriting in a version of Ruby without the good qualities that we liked.

There's a particular problem in rewriting in C, in that those implementations which are trying to optimise Ruby, like JRuby, are then unable to run that C code because they don't have the same internal implementation which C extensions expect. It's ironic that the past solution to Ruby's performance problems now prevent other efforts to improve the root cause of the problem by optimising the original Ruby code.

Summary

In this first of two episodes, we've shown that the performance of Ruby implementations does not always allow people to write the high-level code that they would like to.

Using a tiny example from a real gem, we've illustrated what can be great about writing expressive Ruby code, but then also some problems caused by this high-level code. We've shown that people are rewriting their code in a way that they wouldn't chose to unless performance was a problem.

In the next episode we'll continue to use the clamp method to talk about where Ruby is going in the future, and how we can remove the need for rewriting methods like clamp.

Responses