Unit 1, Lesson 1

Smaller Objects, Faster Code

It’s easy to write off slow performance when processing large quantities of data as simply par for the course. Lots of data takes a long time to churn through. But often there are some easy optimization wins waiting just below the surface. All it takes is understanding some basic principles of computer architecture, and the ability to apply those principles to Ruby code.

In today’s episode, guest chef Jack Thorne joins us to demonstrate how changing the *size* of objects created in a batch job can have a major impact on performance. Along the way, you’ll learn a little about memory architecture in modern CPUs. Enjoy!

Video transcript & code

Making Ruby go fast with small objects

Hi, Rubyists! I hope you like to make Ruby go fast. Before we get started: my name is Jack, and I program in Ruby full time. In my free time I work with Chi Hack Night, creating open-source software with open data. Open data is data provided by cities and states for public use. When working with large datasets, there are a few ways Ruby can slow down, and I want to talk about ways to speed it up. I want to share a framework I use to think about Ruby performance when I sit down to write programs. It's more of a methodology than a set of methods or patterns, but it works quite well for me.

Let's say we want to make a Sinatra app with this new open data: a website for the city of Chicago where people can look up the total kWh of energy usage. Luckily, we have a dataset to start with, provided by the City of Chicago.

computer display of energy usage

Now let's build a Sinatra app

Here we have a 'main' endpoint that can look up the sum of any column in the dataset, given a parameter.


require "csv"
require "sinatra"

data = CSV.table("./Energy_Usage_2010.csv")

get "/main" do
  data.reduce(0) { |acc, e_record|
    acc + (e_record[params[:key].to_sym] || 0)
  }.to_s
end
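As an aside, `CSV.table` is shorthand for `CSV.read` with `headers: true`, `header_converters: :symbol`, and `converters: :numeric`, which is why `e_record[:total_kwh]` comes back as an Integer (or nil for a blank cell). Here's a small sketch using a made-up file, not the real Chicago dataset:

```ruby
require "csv"
require "tempfile"

# A tiny stand-in CSV file (made-up data, not the real Chicago dataset).
file = Tempfile.new(["energy", ".csv"])
file.write("BUILDING TYPE,TOTAL KWH\nResidential,1200\nCommercial,\n")
file.flush

# CSV.table converts headers to lowercase symbols and parses numeric
# fields into Integers/Floats. Blank cells come back as nil.
data = CSV.table(file.path)
p data.headers      # => [:building_type, :total_kwh]
p data[:total_kwh]  # => [1200, nil]
```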

After starting the web server, let's make a request with curl, specifying the total_kwh key.


$ curl "http://localhost:4567/main?key=total_kwh"

looking at the Sinatra logs we can see


::1 - - [24/Aug/2018:01:44:55 -0500] "GET /main?key=total_kwh HTTP/1.1" 200 11 0.2628

This completed successfully and took about a quarter of a second to run.

Let's talk about what happened in that request.

When the server starts up, it reads the energy data file and stores it in a variable.

Then, on a request with a param of total_kwh, it goes through every record, adds up the total_kwh values into an integer, and returns it. Now, this is running inside a web framework, so there is overhead we aren't controlling for, but let's see what we can do.

Every time Ruby uses a variable or piece of data, it has to move it into the CPU. Moving data around your computer takes time, and the more time your program has to wait for data to be moved around, the slower it runs. However, when your computer moves data to the CPU, it places that data into caches along the way. We can use those caches to make our programs faster. Believe it or not, accessing RAM is orders of magnitude slower than accessing any cache in your CPU. Let's look at an animation showing the caches.

cache speeds compared

As you can see, there is a massive difference in speed between L1, L3, and your RAM. Anytime the CPU doesn't find the data it needs in a cache, we call that a cache miss, and cache misses are slow. A cache miss happens the first time you access data, or when a cache in the CPU fills up and has to evict something. We want to minimize how often the cache gets too full and evicts data, and to do that we have to use the cache more efficiently. I never try to get 100% cache hits; the point is to get 80% of the value for 20% of the effort. And because the speed differences are so extreme, a little bit can go a long way. One way to use the cache better is to load less data: the cache stays less full and doesn't need to evict as often. Let's take a look back at our example to see what changes we could make.
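To make that concrete, here's a synthetic sketch (made-up data, not the episode's dataset): summing a column buried inside wide per-row hashes drags every other field through the caches, while summing a thin array of just the integers touches far less memory. On most machines the thin version comes out measurably faster.

```ruby
require "benchmark"

# Synthetic stand-in for the dataset: 500,000 "wide" rows with several
# fields each, versus a "thin" array holding only the column we sum.
rows = Array.new(500_000) do |i|
  { address: "#{i} W Example St", community: "LOOP", total_kwh: i % 1000 }
end
thin = rows.map { |r| r[:total_kwh] }

wide_time = Benchmark.realtime { rows.sum { |r| r[:total_kwh] } }
thin_time = Benchmark.realtime { thin.sum }

puts format("wide: %.4fs  thin: %.4fs", wide_time, thin_time)
```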

Looking back at this example, we are pushing all the data in the CSV through the caches, evicting a lot of data in the process. This means a lot of the time is spent waiting on main memory rather than returning the very valuable energy data.


require "csv"
require "sinatra"

data = CSV.table("./Energy_Usage_2010.csv")

get "/main" do
  data.reduce(0) { |acc, e_record|
    acc + (e_record[params[:key].to_sym] || 0)
  }.to_s
end

One way to improve the performance of this program is to look at it from a data-flow perspective. To reduce the pressure on the CPU caches, we can transform the dataset we're working with ahead of time to make it more focused.

Let's add some code that transforms our dataset at load time into one that contains just the data we want to work with: in this case, the total kWh column. The result is an array of only the integers and nils we need.

Next, we can create another endpoint to test this. It takes a param, looks up that dataset, and performs the same calculation as before.


require "csv"
require "sinatra"

data = CSV.table("./Energy_Usage_2010.csv")

# Precompute a "thin" version of the dataset: just the column values we
# expect to sum, with none of the other per-record data.
mapped_data = {
  total_kwh: data[:total_kwh]
}

get "/main" do
  data.reduce(0) { |acc, e_record|
    acc + (e_record[params[:key].to_sym] || 0)
  }.to_s
end

get "/thin" do
  mapped_data[params[:key].to_sym].reduce(0) { |acc, e_record|
    acc + (e_record || 0)
  }.to_s
end
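One caveat with the /thin endpoint: mapped_data only contains the columns we precomputed, so a request for any other key would hand reduce a nil and blow up. A small guard makes that failure explicit; sketched here as a plain hypothetical helper with made-up data, outside Sinatra (in the route itself you might `halt 404` instead of returning nil).

```ruby
# Stand-in for the precomputed columns; only :total_kwh exists.
mapped_data = { total_kwh: [100, nil, 250] }

# Hypothetical helper mirroring the /thin route's body, plus a guard
# for keys we never precomputed.
def thin_sum(mapped_data, key)
  column = mapped_data[key.to_sym]
  return nil if column.nil? # unknown key: let the caller decide (e.g. halt 404)
  column.reduce(0) { |acc, kwh| acc + (kwh || 0) }
end

p thin_sum(mapped_data, "total_kwh") # => 350
p thin_sum(mapped_data, "bogus")     # => nil
```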

We can test this like before using curl to make requests to both the old and new endpoints.


$ curl "http://localhost:4567/main?key=total_kwh"
$ curl "http://localhost:4567/thin?key=total_kwh"

When we take a look at the logs, we can see the performance of the different endpoints side by side.


$ ruby web.rb
::1 - - [24/Aug/2018:01:45:36 -0500] "GET /main?key=total_kwh HTTP/1.1" 200 11 0.2652
::1 - - [24/Aug/2018:01:45:01 -0500] "GET /thin?key=total_kwh HTTP/1.1" 200 11 0.0073

That little change in the data flow improves performance by more than an order of magnitude. Using a profiler on my machine, I've measured the actual difference in CPU cache hits between these two examples. Whereas the first version has a cache hit rate in the high 20 percent range, the second version has a hit rate in the mid-70s. That's a pretty huge difference, and you can see the result of it in the shorter runtime of the second example.

There are many other ways to increase your use of the CPU cache, but today I just wanted to show you one of my favorite easy wins. As always with any performance improvement, please measure twice before you cut.
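In that spirit, Ruby's standard Benchmark library is enough for a first measuring pass. Benchmark.bmbm in particular runs a rehearsal before the measured run, so warm-up costs don't skew the comparison. A sketch with a made-up column of data, comparing two ways of summing it:

```ruby
require "benchmark"

# Made-up stand-in for a precomputed column of integers and nils.
thin = Array.new(300_000) { |i| i.even? ? i : nil }

# bmbm runs each block once as a rehearsal, then measures, so allocation
# and method-cache warm-up don't distort the numbers.
Benchmark.bmbm do |bm|
  bm.report("reduce with nil guard") do
    thin.reduce(0) { |acc, kwh| acc + (kwh || 0) }
  end
  bm.report("compact then sum") do
    thin.compact.sum
  end
end
```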

Personally, as a developer I get a sense of deep satisfaction from crafting code that feels like a tight, well-oiled machine. And I also know that faster code makes for happier users. Being conscious of the hardware my software runs on has changed the way I think about writing software, particularly about data size and how it flows through my programs. Hopefully, this gives you a framework to do the same. I think we can all make our programs more performant by improving our understanding of the underlying architecture.

Happy hacking!
