In Progress
Unit 1, Lesson 1
In Progress

TSort with Justin Weiss

Video transcript & code

Today we are joined by guest chef Justin Weiss. Justin is one of my favorite Ruby and Rails bloggers right now. He consistently turns out interesting and highly informative articles at justinweiss.com. He's also in the process of writing a book called "Practicing Rails", which promises to make the process of learning Ruby on Rails a little less daunting.

I asked Justin if he'd like to create an episode for RubyTapas, and he has come through with a video about a standard library I knew very little about. It's called "tsort". And I'll let Justin explain what it is, how to use it, and why you might want to. Enjoy!


There's a problem you'll start to see a lot, if you're looking for it.

You have a thing, that depends on another thing happening first. That other thing depends on still other things, and so on.

So you could think about it like this:

# A => [B, C]
# B => [C, D]
# C => []
# ...

A thing A depends on B and C, B depends on C and D. C doesn't have any dependencies, and so on.

In Ruby, you see this in a few tools you use every day. Rake has tasks which depend on other tasks. Gems depend on other gems. For example, Rails depends on ActiveSupport, which depends on the i18n gem, and so on.

Explaining dependencies one level deep makes it easy to think about. I mean, Rails shouldn't care that ActiveModel will eventually install the builder gem. Rails just cares about installing ActiveModel.

But for the person that needs to install these dependencies in the right order, keeping track of dependencies this way can be tough. Especially since you only ever want to process each dependency once, so you're not repeating work. This problem is called a topological sort. And with tsort.rb in the Ruby standard library, Ruby has built-in support for solving it.

So, let's look at an example. Here's a bunch of gem dependencies.

GEM_REPOSITORY = {
  :activemodel => [:activesupport, :builder],
  :activerecord => [:activemodel, :activesupport, :arel],
  :activesupport => [:i18n, :json, :minitest, :thread_safe, :tzinfo],
  :arel => [],
  :builder => [],
  :celluloid => [:timers],
  :connection_pool => [:minitest, :rake],
  :i18n => [],
  :json => [],
  :minitest => [],
  :rails => [:activesupport, :activemodel, :activerecord],
  :rake => [],
  :redis => [],
  :redis_namespace => [:redis],
  :sidekiq => [:celluloid, :connection_pool, :json, :redis, :redis_namespace],
  :thread_safe => [],
  :timers => [],
  :tzinfo => [:thread_safe],
}

So here, the hash key is the name of the gem, and the hash value is the list of gems that gem directly depends on.

So you can see that sidekiq depends on celluloid, connection_pool, json, and you can just keep going.

Total mess of dependencies, right? But let's say you want to install the rails and sidekiq gems. You're going to have to navigate this mess to figure out which gems to install in which order.

But we're not going to navigate this mess. We're going to let tsort do this for us. So we'll require it:

require 'tsort'

There are a few different ways to use TSort, but the easiest is to call the module method tsort on it, like this:

puts TSort.tsort()

When you call TSort like this, it has to know two things: Which top-level nodes should it start from? (In our case, these are the gems we want to install. Rails and Sidekiq.) And given a node, how does it find the dependencies of that node? (The gems that Rails depends on, for instance.)

These are the first and second parameters to the tsort method: a function that tells tsort which nodes to start with, and a function that, given a node, tells tsort what that node's dependencies are. So let's write them.

We'll start by telling TSort which gems to install.

def gems_to_install
  # ...
end

And because we're talking about callbacks here, we don't just return a list of nodes. We yield each gem we want to tell it to install. Which is easy enough to do here:

def gems_to_install
  yield :sidekiq
  yield :rails
end

OK. Next, we need to tell tsort how to install the dependencies for a specific gem. We just look that up in our big GEM_REPOSITORY hash.

def dependencies_for_gem(gem)
  GEM_REPOSITORY[gem]
end

But this won't actually work. There's one other tricky part. Just like the gems\to\install method, TSort wants each dependency yielded to it, it doesn't just want a list. So we'll just forward tsort on to our array's #each method:

def dependencies_for_gem(gem, &block)
  GEM_REPOSITORY[gem].each(&block)
end

So, we'll add the block up here, and pass it along to #each.

Finally, we pass our methods to tsort:

puts TSort.tsort(gems_to_install, dependencies_for_gem)

TSort actually takes things that implement a call method, like a lambda, so we'll wrap our method calls in method to turn them callable:

puts TSort.tsort(method(:gems_to_install), method(:dependencies_for_gem))

And let's run it. Cool, we have our list! And we can spot-check a few of these to make sure they're in dependency order.

So, minitest and rake are before both Rails and Sidekiq. Looks like it installs all the Sidekiq dependencies first, and then Rails. And some of these clearly go in the right order, like activesupport before activemodel before activerecord.

You can change which gem you want to install first:

def gems_to_install
  yield :rails
  yield :sidekiq
end

And watch your dependency order change again. Cool!

Rubygems actually does use tsort, but of course it's more complicated in the real world. You could also use tsort to write a task runner like rake, to create workers that each run bits of code and depend on each other, and all kinds of other things. And once again, it's right there in the Ruby standard library!

Responses