Unit 1, Lesson 21

Performance Testing – Staying Fast

When we fix a bug in the code we typically add a test in order to ensure that the bug never gets accidentally re-introduced. But what about verifying performance optimizations? Typically when we optimize a performance bottleneck, we manually profile the results and then commit our changes.

What if we could add tests that characterize the code’s expected performance profile and fail if it ever deviates from those expectations? Our guest today, Piotr Murach, has created some tools to do just that, and he’s going to show you how to use them to make your performance expectations testable. Enjoy!

Video transcript & code

Performance Testing

FiniteMachine logo

FiniteMachine is a Ruby gem that I’ve created to help me implement finite state machines.

FiniteMachine example DSL

It has an intuitive DSL to describe states, transitions and callbacks triggered by these transitions.
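For a rough idea of what such a DSL expresses, here is a toy stand-in (not FiniteMachine’s actual implementation): a state machine is just a current state plus a table mapping events to transitions.

```ruby
# A toy state machine, NOT the FiniteMachine gem -- it only
# illustrates the concepts the DSL describes: states and events
# that move between them.
class TinyMachine
  # transitions is a hash of { event => { from_state => to_state } }
  def initialize(initial, transitions)
    @state = initial
    @transitions = transitions
  end

  attr_reader :state

  # Fire an event; returns true if a transition happened.
  def trigger(event)
    to = @transitions.fetch(event, {})[@state]
    @state = to if to
    !to.nil?
  end
end

traffic = TinyMachine.new(:red, ready: { red: :yellow },
                                go:    { yellow: :green },
                                stop:  { green: :red })
traffic.trigger(:ready)
traffic.trigger(:go)
puts traffic.state # => green
```

FiniteMachine’s real DSL adds callbacks and richer transition definitions on top of this basic shape.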

One day I decided to refactor a bunch of code to make it more maintainable. All my unit and integration tests passed, so I proceeded with a minor release from version 0.10.2 to 0.11.0. A few days later, I received a GitHub issue that read:

FiniteMachine memory leak

“It appears there's a memory leak in the v0.11 series of the gem. When we bumped to it in our app, we started to experience severe memory issues in production.”

FiniteMachine benchmark script

Oops! The author of the issue provided a simple script that clearly demonstrated ballooning memory using Ruby’s GC.stat method. The code was holding on to resources that the garbage collector couldn’t clean up. So I devised a solution and released a patched version of the FiniteMachine gem.
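A sketch of the kind of script the issue author might have used (the `Leaky` class here is hypothetical, standing in for gem code that kept references the GC could never reclaim):

```ruby
# Hypothetical leaky code: a long-lived constant keeps every
# appended element reachable, so the GC can never free them.
class Leaky
  RETAINED = []

  def transition
    RETAINED << "state-#{RETAINED.size}" # appended, never removed
  end
end

leaky = Leaky.new
before = GC.stat(:heap_live_slots)
50_000.times { leaky.transition }
GC.start # even a full GC pass can't free objects that are still referenced
after = GC.stat(:heap_live_slots)

puts "live slots before: #{before}, after GC: #{after}"
```

A healthy version would show live slots returning to roughly the starting level after `GC.start`; here they keep climbing with every call.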

“we know how to test behavior, but how do we test performance?”

However, this event prompted me to think about a way to prevent such scenarios in the future. Unit and acceptance tests may help you deliver working software and guard it against regressing into a complete mess. What is not guaranteed is that the code will remain performant. Ideally, we would be able to capture such performance regressions in a test suite and stop ourselves from refactoring our way into slow code. Let’s see how we can change this!

I would like to show you a way to add performance testing as another tool in your development tool belt. In episodes 541 & 542, Chris Seaton discusses a balance between abstraction and performance based on studying the clamp method. Here it is as a refresher:

def clamp(num, min, max)
  [min, num, max].sort[1]
end

This method demonstrates a typical approach to Ruby code development: expressiveness first, before any performance considerations. Let’s imagine for a second that this method is part of a large codebase you’re currently working on that needs to be as fast as a cheetah. You have run a profiler and found that the clamp method is one of your performance bottlenecks. In a TDD spirit you want to start with a test. But what assertion do you write? I'm glad you've asked.

RSpec as a testing framework allows us to convert our test assertions into English-like sentences. This is the big selling point of RSpec! I've decided to combine this expressiveness with the idea of performance testing to create a suite of test assertions ready for you to use.

Performance Testing Angles

These assertions allow us to look at code performance from different angles such as:

  • Execution speed
  • Resource usage
  • Scalability

For example, let's capture the execution speed requirement by looking at how many iterations per second the clamp method performs. First, we need to configure RSpec to make all the performance matchers available in our test suite.

Let's open up the spec_helper.rb file.

We need to require the rspec-benchmark gem.

Then to make the matchers available in our tests, we need to include the RSpec::Benchmark::Matchers module in our RSpec configuration.

require "rspec-benchmark"

RSpec.configure do |config|
  config.include RSpec::Benchmark::Matchers

Now let's describe the performance characteristics of the clamp method and leave a placeholder for the number of iterations.

We use a block expectation to invoke the clamp method with some sample values.

Then we use the perform_at_least matcher to make an assertion about the number of iterations we expect this method to perform in a given period of time. All the performance matchers start with the word "perform".

Because we don’t know how fast the clamp method is, we pick a random value of 2 million iterations per second.

RSpec.describe "#clamp" do
  it "performs at least ??? iterations per second" do
    expect { clamp(70, 50, 60) }.to perform_at_least(2_000_000).ips
  end
end

And we run our test.

#clamp performs at least ??? iterations per second
Failure/Error: expect { clamp(70, 50, 60) }.to perform_at_least(2_000_000).ips
  expected block to perform at least 2000000 i/s, but performed only 1834441 (± 1%) i/s

Let's take a look at these results.

The failure provides us with the actual number of iterations per second, which turns out to be fewer than our arbitrarily-chosen expectation of two million.
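For intuition about what that i/s figure means, here is a rough hand-rolled version. This is a simplification, not rspec-benchmark’s actual measurement, which warms up and samples more carefully:

```ruby
def clamp(num, min, max)
  [min, num, max].sort[1]
end

# Roughly what an iterations-per-second measurement does: call the
# block as many times as possible inside a fixed time window, then
# divide the call count by the elapsed time.
def rough_ips(duration = 0.2)
  count = 0
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) - start < duration
    yield
    count += 1
  end
  (count / duration).round
end

puts rough_ips { clamp(70, 50, 60) }
```

The number printed will vary with your machine and Ruby version, which is exactly why a boundary value has to leave some headroom.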

The perform_at_least matcher removes guesswork from our estimation by displaying a descriptive error message with the actual performance when a boundary value fails. Choosing the right boundary value is important and demonstrates the first caveat of performance testing. Namely, that too strict values may set off false alarms, and too lax values won’t catch any regressions.

In this case, 2 million iterations is too high, but setting the boundary to the actual value as seen in the failure message would provide for a flaky test.

So we decide that the ‘good enough’ value is probably 1.5 million iterations per second.

RSpec.describe "#clamp" do
  it "performs at least 1.5M iterations per second" do
    expect { clamp(70, 50, 60) }.to perform_at_least(1_500_000).ips

Now that we have our test written, we can focus on getting the clamp method faster.

In this case, we will create a separate method called clamp_fast to help us highlight the performance difference.

def clamp_fast(num, min, max)
  [min, num, max].sort[1]
end

Improving performance usually involves a fundamental change: introducing a simpler data structure (say, a hash in place of complex data types) or a different algorithm (say, iteration in place of recursion).
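As a generic illustration of the “different algorithm” point (this example is mine, not from the episode), compare a naive recursive Fibonacci with an iterative one:

```ruby
# The naive recursion re-solves the same subproblems over and over,
# making exponentially many calls...
def fib_recursive(n)
  n < 2 ? n : fib_recursive(n - 1) + fib_recursive(n - 2)
end

# ...while the iterative version does linear work.
def fib_iterative(n)
  a, b = 0, 1
  n.times { a, b = b, a + b }
  a
end

puts fib_iterative(30) # => 832040
# fib_recursive(30) returns the same number, but via well over a
# million recursive calls.
```

Both return identical results, so a behavioral test can't tell them apart; only a performance assertion captures the difference.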

In the case of the clamp method, it’s enough to remove the spurious object allocations, which we can do by using the ternary operator directly.

def clamp_fast(num, min, max)
  num > max ? max : (num < min ? min : num)
end
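We can sanity-check the allocation claim directly with GC.stat. This quick check is my own addition, not part of the episode’s test suite:

```ruby
def clamp(num, min, max)
  [min, num, max].sort[1] # the array literal and sort each allocate an Array
end

def clamp_fast(num, min, max)
  num > max ? max : (num < min ? min : num) # no allocations at all
end

# Count how many objects are allocated while the block runs.
def allocated_during
  GC.disable # keep the GC from recycling slots mid-measurement
  before = GC.stat(:total_allocated_objects)
  yield
  after = GC.stat(:total_allocated_objects)
  GC.enable
  after - before
end

puts allocated_during { 1_000.times { clamp(70, 50, 60) } }      # thousands of objects
puts allocated_during { 1_000.times { clamp_fast(70, 50, 60) } } # close to zero
```

Fewer allocations means less work for the garbage collector, which is where most of the speedup comes from.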

Let’s see how many iterations clamp_fast can perform.

Let's make an optimistic prediction that we will get double the performance.

RSpec.describe "#clamp_fast" do
  it "performs at least 1.5M iterations per second" do
    expect { clamp_fast(70, 50, 60) }.to perform_at_least(4_000_000).ips

And run our tests.

  performs at least 4M iterations per second

Finished in 0.22309 seconds (files took 0.21747 seconds to load)
1 example, 0 failures

Hm, that succeeded!

Let's try four times the old performance.

RSpec.describe "#clamp_fast" do
  it "performs at least 1.5M iterations per second" do
    expect { clamp_fast(70, 50, 60) }.to perform_at_least(8_000_000).ips

And run our tests again.

#clamp_fast performs at least 8M iterations per second
Failure/Error: expect { clamp_fast(70, 50, 60) }.to perform_at_least(8_000_000).ips
  expected block to perform at least 8000000 i/s, but performed only 7605403 (± 0%) i/s

Ahh! Now we've found the real performance of our optimized version: around 7.6 million iterations per second. That’s so much faster!

Let's again use the failure message to round down to a reliably repeatable expectation.

it "performs at least 7M iterations per second" do
  expect { clamp_fast(70, 50, 60) }.to perform_at_least(7_000_000).ips

OK, we have put an absolute value on our performance expectations for the optimized version. But is that really what we care about? Or is what we're really trying to assert here that the new version is significantly faster than the old version? How could we express this?

If your head hurts doing maths, that’s fine: I have just the matchers for you.

We can express the relationship between the two methods using the perform_faster_than and perform_slower_than matchers.

expect { ... }.to perform_faster_than { ... }
expect { ... }.to perform_slower_than { ... }

If we wish to be specific about how much faster our new implementation is, we can express this using the once, twice, exactly, at_least and at_most modifiers.

expect { ... }.to perform_faster_than { ... }.once
expect { ... }.to perform_faster_than { ... }.twice
expect { ... }.to perform_faster_than { ... }.exactly(5).times
expect { ... }.to perform_faster_than { ... }.at_least(5).times
expect { ... }.to perform_faster_than { ... }.at_most(5).times

These matchers should be familiar to anyone who has used RSpec's built-in expectations.

So comparing the clamp implementations, we can state that the new one is at least three and a half times faster than the old one.

it "performs at least 3 and half times faster than the old one" do
  expect {
    clamp_fast(70, 50, 60)
  }.to perform_faster_than {
    clamp(70, 50, 60)

Great! We now have captured the new clamp implementation performance as a test.

Key Takeaways

I trust that my short example demonstrates a few key points:

  • the need to measure before optimizing
  • beautiful code doesn't have to be at odds with speed improvements
  • simpler code tends to be faster

I'm hoping that this episode will provoke you into thinking beyond just testing your code's behavior, and into ensuring that your code stays fast. And I hope that some of these ideas will prove fruitful for writing performant Ruby code.

Happy safe hacking!

Next episode: Performance Testing – Part 2