Abstraction and Performance – Part 2

In part one of this series, guest chef Chris Seaton demonstrated how expressiveness in code can sometimes come at the expense of execution speed. Today, you’ll see how he and other language implementers are using just-in-time compilation to ensure that developers can choose idioms based on their own preferences, and leave optimization to the Ruby runtime. Enjoy!

Video transcript & code

Abstraction and the Past and Future of Ruby

Second Episode

Recap

In the previous episode we showed this clamp routine from the PSD.rb library for reading Photoshop documents:

```ruby
def clamp(num, min, max)
  [min, num, max].sort[1]
end
```

The method takes a number along with a minimum and a maximum, and returns the original number if it was already within that range, or the minimum or maximum if it fell outside it. It's used as part of converting a value from the CMYK to the RGB colour space.
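
For example, with some illustrative bounds (the 0–255 range here is just for demonstration, not taken from PSD.rb's colour conversion code):

```ruby
clamp(300, 0, 255) # => 255, above the range, clamped to the maximum
clamp(-20, 0, 255) # => 0,   below the range, clamped to the minimum
clamp(128, 0, 255) # => 128, already in the range, returned unchanged
```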

We talked about some qualities that we might appreciate in the method, but then also some problems with performance, and showed how the method has been rewritten a couple of times in different ways, losing the qualities that we said we liked in the original.

Just-in-time compilers and removing abstraction

The ideal situation would be if we could run the original version of the clamp method as fast as the C version or the rewritten Ruby version, automatically.

That's optimisation, and it's what JRuby and other implementations of Ruby with just-in-time compilers have been trying to do. And now MRI, the standard implementation of Ruby, is also developing a couple of optimising just-in-time compilers to try to achieve the same thing.

If we want to run the original clamp as fast as the rewritten versions, a just-in-time compiler is going to have to do several extremely complicated things, simultaneously:

Firstly, we're going to have to remove the allocation of these arrays so that we don't need to run the memory allocator, and we don't need to run the garbage collector later. This is called escape analysis and scalar replacement of aggregates. You can think of it as replacing the array's elements with local variables in the method - one local variable for each array element.

```ruby
def clamp(num, min, max)
  a0 = min
  a1 = num
  a2 = max
  b0 = ... # first sorted element
  b1 = ... # second sorted element
  b2 = ... # third sorted element
  return b1
end
```

We can then remove the unused local variables - the array elements which we never use - and whatever code would be needed to calculate them.

```ruby
def clamp(num, min, max)
  a0 = min
  a1 = num
  a2 = max
  b1 = ... # second sorted element
  return b1
end
```

Secondly, we're going to need to remove the sort routine itself. The default sort routine uses loops and calculates the sort ordering of all three elements, even though we only want one of them. Instead we can include just the logic for calculating the one element we need - the second one.

```ruby
def clamp(num, min, max)
  a0 = min
  a1 = num
  a2 = max
  b1 = num > max ? max : (num < min ? min : num)
  return b1
end
```

We can continue to automatically simplify this method until it eventually looks the same as the manual rewrite into C or the newer Ruby version.

```ruby
def clamp(num, min, max)
  a0 = min
  a1 = num
  a2 = max
  a1 > a2 ? a2 : (a1 < a0 ? a0 : a1)
end
```

Once the remaining local variables are folded away, we're left with:

```ruby
def clamp(num, min, max)
  num > max ? max : (num < min ? min : num)
end
```

This version uses the ternary operator (the ? and : again), but it's functionally the same as the version with if expressions.
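
As a reminder, here's a minimal sketch of the same clamp written with if expressions (assuming the same argument order as the original):

```ruby
def clamp(num, min, max)
  if num > max
    max
  elsif num < min
    min
  else
    num
  end
end
```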

That looked like a complicated process, and to make things worse, we need to do both of these optimisations simultaneously. We can't remove the arrays until the sort routine has been optimised away, and we can't optimise away a sort routine that operates on an array someone else might still be using, so each optimisation depends on the other having already happened.

Finally, we're also going to have to be prepared to undo both of these optimisations if something changes, such as someone monkey patching array sorting, indexing or comparing numbers. In larger programs it may be necessary to do this while the method is executing, reaching into the method's state in the processor and recreating objects that were previously optimised away.
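
For example, a hypothetical monkey patch like this one would invalidate the assumptions behind the optimised clamp, forcing the implementation to throw the compiled code away and fall back to the original method:

```ruby
# Hypothetical monkey patch: redefining Array#sort changes the behaviour
# that the optimised clamp was compiled against, so the just-in-time
# compiler has to discard the optimised code and deoptimise.
class Array
  def sort
    reverse # deliberately not a real sort
  end
end
```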

Achieving this kind of optimisation is the huge challenge that optimising Ruby implementations such as the new just-in-time compilers being developed for MRI have ahead of them.

TruffleRuby, the new implementation of Ruby which I work on and which came out of my PhD research, is today the only implementation that is able to do this. It can literally compile the Ruby code in the first version of clamp to the same machine code bytes as the Ruby code from the rewritten version. This frees Ruby programmers to write the high-level code they want, and get the same performance as the lower-level version.

Hopefully this capability is the future of MRI as well.

Keep in mind, though - this was all about a single-line method. We're still looking at the easy stuff here and it needs to scale all the way up to be effective on larger applications.

Simpler examples

A simpler example of this idea can be seen in a 2015 Ruby Kaigi keynote by Evan Phoenix.

```ruby
[1, 2].min
```

Phoenix argued that this code should reduce to a constant, and called it the canary test for a sufficiently powerful Ruby just-in-time compiler. It's a lower bar than our clamp method, because it produces a scalar value rather than an array, so it's easier to remove.
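
One rough way to probe the canary (assuming the benchmark-ips gem is installed) is to compare it against a literal constant; if the just-in-time compiler reduces the expression to a constant, the two reports should converge to roughly the same speed:

```ruby
require "benchmark/ips"

Benchmark.ips do |x|
  x.report("constant")   { 1 }
  x.report("[1, 2].min") { [1, 2].min }
  x.compare!
end
```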

Getting this working in MRI and JRuby would be a first step towards optimising clamp. Phoenix recommended applying a compilation technique called partial evaluation, using the LLVM toolchain, and integrating this into MRI. TruffleRuby also uses the partial evaluation technique, but applies it on the Java Virtual Machine.
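
As a rough illustration of what partial evaluation means (a hand-worked toy example, not how TruffleRuby or LLVM actually represent programs): specialising a general routine with respect to inputs that are known ahead of time lets everything that depends only on those inputs be computed in advance.

```ruby
# A general routine: pick the element at `index` from the sorted values.
def pick_sorted(values, index)
  values.sort[index]
end

# If a partial evaluator knows that values is [1, 2] and index is 0, it
# can fold the sort and the indexing away, leaving only the constant.
def pick_sorted_specialised
  1 # the result of evaluating [1, 2].sort[0] ahead of time
end
```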

More complicated examples

A more complicated example of this idea was described in my PhD thesis, where I took clamp and several other parts of PSD.rb, and a related library called ChunkyPNG, and composed them into a single acid test that also involves hashes and metaprogramming.

```ruby
module Foo
  def self.foo(a, b, c)
    hash = {a: a, b: b, c: c}
    array = hash.map { |k, v| v }
    x = array[0]
    y = [a, b, c].sort[1]
    x + y
  end
end

class Bar
  def method_missing(method, *args)
    if Foo.respond_to?(method)
      Foo.send(method, *args)
    else
      0
    end
  end
end

bar = Bar.new
bar.foo(14, 8, 6) # => 22
```

I think an optimising Ruby implementation should be able to reduce this program to a single constant value using the optimisations we've talked about and some more - and TruffleRuby can today.
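
A hand-sketched intermediate form shows roughly how that could happen for this particular call (my own illustration of the stages, not the compiler's actual internal representation):

```ruby
# After inlining method_missing, respond_to? and send for the call
# bar.foo(14, 8, 6), the work is effectively:
def foo_for_this_call
  a, b, c = 14, 8, 6
  hash = {a: a, b: b, c: c}     # the hash can be removed...
  array = hash.map { |k, v| v } # ...along with the map and the array
  x = array[0]                  # x is just a, i.e. 14
  y = [a, b, c].sort[1]         # y folds to 8, as in the clamp example
  x + y                         # 14 + 8 folds to the constant 22
end
```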

What this means

What I'm saying here is that one of Ruby's strengths is that it allows people to write the code they want to write. One of Ruby's current weaknesses is that this code is then slower than it needs to be.

Ruby programmers shouldn't have to rewrite their code to make it faster if we can avoid it by having the Ruby implementation do that work for them. If a small group of experts can implement powerful just-in-time compilers for Ruby that effectively rewrite code at runtime, using some of the techniques I've described here, then Ruby programmers can focus on their high-level code.

TruffleRuby, JRuby, Rubinius, and now MRI with the MJIT and RTL just-in-time compiler projects, are all efforts working towards this in the next few years of Ruby's development.
