Unit 1, Lesson 1

Aphorism to Empiricism

There are certain aphorisms that almost all of us are familiar with as developers, like “don’t repeat yourself” and “separate responsibilities”. But, setting aside for the moment the debate about whether these slogans are always correct… how do we know for sure if we are applying them?

In today’s episode, guest chef Kerri Miller takes us on a whirlwind tour of three essential utilities for evaluating specific qualities of a Ruby codebase. You’ll learn how to map out the dependencies in a Rails application, how to identify high-risk areas, and how to measure code complexity. Enjoy!

Video transcript & code

APHORISM

There are some things we know about code that we capture in aphorisms - Separate Your Responsibilities... Reduce Complexity... Don’t Repeat Yourself. As a human, I certainly think I know what these look like in practice, but sometimes it isn’t as obvious as we think.

Luckily, there are a few tools out there that can help guide us, whether we’re writing new code or refactoring legacy software. They not only help us in the task at hand, but can deepen our understanding of the catchphrases themselves. Let’s take a look at three I use fairly often.

Let’s start with the big picture view, with a pair of tools that can help us examine a system as a whole.

First, Entity Relationship Diagrams, or ERDs.

ERD for rubygems/rubygems.org

This particular one represents the models found in the Rubygems.org website codebase. Each ActiveRecord model found in the application is represented here, with a line pointing towards its “has” relationships, and a list of its database fields.

You can think of it as a social graph, a bit like a map of the paths worn into neighborhood lawns by the kids who live in each house: each has their friends, who they spend the most time playing with. Models that have a number of lines radiating from them tend to be the popular kids, the ones that show up again and again in your code.

ERD highlighting User and Rubygems models

Here we see User and Rubygem having the most relationships, and it’s no coincidence, as they’re the core concepts in the application.

ERD highlighting shared relationships between User and Rubygems models

They also share relationships with the greatest number of models, yet they don’t have a direct relationship with each other. Conceptually, individual instances of each relate to the other through the intermediary models’ records. Each intermediary describes a different way they interact.

ERD highlighting models with exclusive relationships to the Rubygems model

Of course, in our little neighborhood drama, Rubygem is the star of the show. It not only relates to the most other models, but it also has its own gang that User doesn’t get to hang out with.

Version, Linkset, Dependency, and GemDownload are all isolated from the rest of the system, forming their own little cul de sac of code.
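
To make the “popular kids” idea concrete, here is a minimal plain-Ruby sketch that treats the diagram as a graph and counts each model's relationships. The four models around Rubygem come from the diagram described above. Ownership and Subscription are assumed names for the intermediary models, and the edge list is hand-written and simplified rather than read from a real schema.

```ruby
require "set"

# Hand-written, simplified edge list standing in for the lines on the diagram.
# Ownership and Subscription are assumed names for the intermediary models.
RELATIONSHIPS = [
  %w[User Ownership],
  %w[User Subscription],
  %w[Rubygem Ownership],
  %w[Rubygem Subscription],
  %w[Rubygem Version],
  %w[Rubygem Linkset],
  %w[Rubygem Dependency],
  %w[Rubygem GemDownload]
].freeze

# Build an undirected adjacency map: model name => set of related models.
def neighbors(edges)
  edges.each_with_object(Hash.new { |hash, key| hash[key] = Set.new }) do |(a, b), graph|
    graph[a] << b
    graph[b] << a
  end
end

GRAPH = neighbors(RELATIONSHIPS)

# The model with the most lines radiating from it.
def most_connected(graph)
  graph.max_by { |_model, related| related.size }.first
end

# Models that both a and b relate to.
def shared_models(graph, a, b)
  (graph[a] & graph[b]).to_a.sort
end

# Models that a relates to but b does not.
def exclusive_models(graph, a, b)
  (graph[a] - graph[b]).to_a.sort
end

puts "Most connected: #{most_connected(GRAPH)}"
puts "Shared: #{shared_models(GRAPH, 'User', 'Rubygem').join(', ')}"
puts "Only Rubygem: #{exclusive_models(GRAPH, 'Rubygem', 'User').join(', ')}"
```

The shared set corresponds to the intermediaries that connect the two core models, and the exclusive set is the cul de sac.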

ERD for a complex Rails application

In a more complex system, these cul de sacs can represent larger, more meta concepts around process and responsibilities. For example, in this app, there's a large grouping of models off to the left that all relate to a Customer model, and if we zoom in...

Zoom on 4 billing-related models

...we can see they’re all related to billing and invoicing. This would be a good place to encapsulate their interactions with the rest of the application, or even to hive them off into their own Billing service.

So we’ve seen how an entity relationship diagram can visualize relationships between domain models. Now let’s get a step closer to the code, and look at a tool that can help guide us towards refactor targets and areas of danger.

Turbulence title card

Research into large codebases has shown us repeatedly that there are two factors that, when combined, are correlated with a high incidence of bugs:

  1. The first is files that change frequently. We sometimes call these “high churn” files.
  2. The second is areas of code that are complex and hard to fathom.

Turbulence project homepage

The next tool we’re going to look at is called Turbulence, and it helps us visualize these high-churn, high-complexity hotspots.

Treemap output from turbulence for rubygems/rubygems.org

Turbulence is a gem that generates a treemap: a rectangle for each file, sized according to its rate of churn. Each rectangle’s color is based on the score from another tool, called Flog, which rates the conceptual complexity of a file. The files in red are the ones we want to have good test coverage on, and they will likely contain large pieces of code that could stand to be refactored and simplified.
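
As a rough illustration of the idea (not Turbulence's actual implementation), the sketch below counts churn from commit-log text and pairs it with a complexity score per file. It assumes the log text is the output of `git log --pretty=format: --name-only`, and multiplying churn by complexity is just one simple way to rank hotspots, chosen for this sketch. The sample log and the second file are invented for illustration; 158.6 is the flog total for the helper discussed later in this episode. It uses `tally` and `filter_map`, so it needs Ruby 2.7 or newer.

```ruby
# Count how often each file appears in a commit log. The text is assumed to be
# the output of `git log --pretty=format: --name-only`: one changed path per
# line, with blank lines between commits.
def churn_counts(log_text)
  log_text.each_line.map(&:strip).reject(&:empty?).tally
end

# Pair each file's churn with its complexity score and rank by the product.
# Multiplying is an assumption made for this sketch, not Turbulence's method.
def hotspots(churn, complexity)
  rows = churn.filter_map do |file, changes|
    score = complexity[file]
    next unless score

    { file: file, churn: changes, complexity: score, risk: changes * score }
  end
  rows.sort_by { |row| -row[:risk] }
end

# Invented sample log, for illustration only.
SAMPLE_LOG = <<~LOG
  app/helpers/rubygems_helper.rb
  app/models/example.rb

  app/helpers/rubygems_helper.rb
  app/helpers/rubygems_helper.rb
LOG

# Complexity scores keyed by path; app/models/example.rb is hypothetical.
SAMPLE_COMPLEXITY = {
  "app/helpers/rubygems_helper.rb" => 158.6,
  "app/models/example.rb" => 20.0
}.freeze

hotspots(churn_counts(SAMPLE_LOG), SAMPLE_COMPLEXITY).each do |row|
  puts format("%-32s churn=%d complexity=%.1f", row[:file], row[:churn], row[:complexity])
end
```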

Continuing closer to the code, let’s look at the actual Flog score for this application. Let’s start by looking at one of the more complex files.


➜  flog -m -a app/helpers/rubygems_helper.rb
   158.6: flog total
     9.3: flog/method average

    38.0: RubygemsHelper#link_to_github    app/helpers/rubygems_helper.rb:18-25
    23.2: RubygemsHelper#links_to_owners   app/helpers/rubygems_helper.rb:91-95
    18.0: RubygemsHelper#subscribe_link    app/helpers/rubygems_helper.rb:44-58
    12.9: RubygemsHelper#simple_markup     app/helpers/rubygems_helper.rb:34-40
     9.5: RubygemsHelper#unsubscribe_link  app/helpers/rubygems_helper.rb:61-68
     7.5: RubygemsHelper#latest_version_number app/helpers/rubygems_helper.rb:106-108
     7.2: RubygemsHelper#github_params     app/helpers/rubygems_helper.rb:111-112
     6.9: RubygemsHelper#link_to_directory app/helpers/rubygems_helper.rb:28-31
     6.9: RubygemsHelper#report_abuse_link app/helpers/rubygems_helper.rb:84-89
     5.1: RubygemsHelper#pluralized_licenses_header app/helpers/rubygems_helper.rb:2-3
     4.6: RubygemsHelper#formatted_licenses app/helpers/rubygems_helper.rb:6-10
     4.0: RubygemsHelper#show_all_versions_link? app/helpers/rubygems_helper.rb:102-104
     3.5: RubygemsHelper#link_to_page      app/helpers/rubygems_helper.rb:14-15
     3.4: RubygemsHelper#atom_link         app/helpers/rubygems_helper.rb:70-73
     3.4: RubygemsHelper#badge_link        app/helpers/rubygems_helper.rb:79-82
     2.2: RubygemsHelper#reverse_dependencies_link app/helpers/rubygems_helper.rb:75-76
     2.2: RubygemsHelper#nice_date_for     app/helpers/rubygems_helper.rb:98-99

Flog offers us an “opinionated” score - each method is assigned a score based on how difficult it is to understand or modify. The author has suggested 10 as a good threshold. The further a method goes above 10, the more likely it is to be in need of some simplification.
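
If you want to apply that threshold mechanically, one option is to filter the report text. The following is a minimal sketch that parses lines in the format shown above and keeps the methods scoring over 10. The regular expression is an assumption based only on this output, not on any documented flog format or API. It needs Ruby 2.7 or newer for `filter_map`.

```ruby
# A few lines copied from the flog report above.
FLOG_REPORT = <<~REPORT
   158.6: flog total
     9.3: flog/method average

    38.0: RubygemsHelper#link_to_github    app/helpers/rubygems_helper.rb:18-25
    23.2: RubygemsHelper#links_to_owners   app/helpers/rubygems_helper.rb:91-95
    18.0: RubygemsHelper#subscribe_link    app/helpers/rubygems_helper.rb:44-58
    12.9: RubygemsHelper#simple_markup     app/helpers/rubygems_helper.rb:34-40
     9.5: RubygemsHelper#unsubscribe_link  app/helpers/rubygems_helper.rb:61-68
REPORT

# Matches "score: Class#method  path:lines"; summary lines have no "#" and are skipped.
METHOD_LINE = /\A\s*(?<score>\d+\.\d+): (?<name>\S+#\S+)\s+(?<location>\S+)\s*\z/

# Turn report text into an array of { name:, score:, location: } hashes.
def parse_flog(report)
  report.each_line.filter_map do |line|
    match = METHOD_LINE.match(line)
    next unless match

    { name: match[:name], score: match[:score].to_f, location: match[:location] }
  end
end

# Methods scoring above the suggested threshold of 10.
def over_threshold(methods, threshold = 10.0)
  methods.select { |entry| entry[:score] > threshold }
end

over_threshold(parse_flog(FLOG_REPORT)).each do |entry|
  puts format("%5.1f  %s", entry[:score], entry[:name])
end
```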

Here we can see that even though there’s a fair amount of total complexity, the average method has a score of 9.3 — this class has a large number of simple methods. The most complex method is only 38, which isn’t the worst I’ve seen, but what makes it so complex? Flog can show us what went into coming up with that score:


➜  flog -m -d  app/helpers/rubygems_helper.rb
   158.6: flog total
     9.3: flog/method average

    38.0: RubygemsHelper#link_to_github    app/helpers/rubygems_helper.rb:18-25
    11.1:   linkset
     5.8:   URI
     5.6:   branch
     5.1:   home
     4.8:   code
     2.9:   nil?
     2.9:   host
     2.5:   !
     2.5:   ==

There are quite a few calls to a linkset method, which increases complexity. There’s also complexity around URI usage, and a bit around branch, which is how flog reports the cumulative effect of if/else and case/when structures. Maybe it could be cleaned up. Let’s take a peek.

Original code

It doesn’t look too horrible, but all those dot-chained methods are kind of visually repetitive,...


def link_to_github(rubygem)
  if !rubygem.linkset.code.nil? && URI(rubygem.linkset.code).host == "github.com"
    URI(rubygem.linkset.code)
  elsif !rubygem.linkset.home.nil? && URI(rubygem.linkset.home).host == "github.com"
    URI(rubygem.linkset.home)
  end
rescue URI::InvalidURIError
  nil
end

...and it can be difficult to keep straight which branch of code does what, since they look so visually similar. Let’s take a stab at cleanup, with the goal of improving the complexity score, ignoring any other considerations for the moment.

Here’s what I came up with.

Highlighting repetitive code

In the original, each branch performs the same set of operations on a different piece of data.

Highlighting same elements in refactored code

I’ve decided to optimize for the Don’t Repeat Yourself principle. Instead of having the same code written twice with the only difference being the data it acts upon,...

Highlighting location of operation in refactored code

...we change it so the data is an input to a block of the operations we want to perform.


def link_to_github(rubygem)
  result = [rubygem.linkset.code, rubygem.linkset.home].detect do |url|
    URI(url.to_s).host == "github.com"
  end

  URI(result.to_s)
rescue URI::InvalidURIError
  nil
end

This allows us to focus the eye on the operation we wish to perform - checking if the URI's host equals "github.com".

Highlighting location of data in refactored code

This keeps the data separate from the operation, resulting in a much less complicated bit of code, which makes it easier for the next developer not only to understand but also to change with confidence.
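
When refactoring purely for a score, it can be worth running both versions side by side on a few inputs. Here is a minimal sketch of that kind of check. FakeLinkset and FakeRubygem are hypothetical stand-ins for the real models, and the two methods are copies of the code above, renamed so they can coexist and using the rubygem argument throughout.

```ruby
require "uri"

# Hypothetical stand-ins, just enough to answer linkset.code and linkset.home.
FakeLinkset = Struct.new(:code, :home)
FakeRubygem = Struct.new(:linkset)

def sample_gem(code:, home:)
  FakeRubygem.new(FakeLinkset.new(code, home))
end

# Copy of the original method, using the rubygem argument throughout.
def link_to_github_original(rubygem)
  if !rubygem.linkset.code.nil? && URI(rubygem.linkset.code).host == "github.com"
    URI(rubygem.linkset.code)
  elsif !rubygem.linkset.home.nil? && URI(rubygem.linkset.home).host == "github.com"
    URI(rubygem.linkset.home)
  end
rescue URI::InvalidURIError
  nil
end

# Copy of the refactored method.
def link_to_github_refactored(rubygem)
  result = [rubygem.linkset.code, rubygem.linkset.home].detect do |url|
    URI(url.to_s).host == "github.com"
  end

  URI(result.to_s)
rescue URI::InvalidURIError
  nil
end

GITHUB_URL = "https://github.com/rubygems/rubygems.org"

CASES = {
  "code on GitHub" => sample_gem(code: GITHUB_URL, home: nil),
  "home on GitHub" => sample_gem(code: nil, home: GITHUB_URL),
  "no links"       => sample_gem(code: nil, home: nil)
}.freeze

CASES.each do |label, rubygem|
  before = link_to_github_original(rubygem)
  after  = link_to_github_refactored(rubygem)
  puts "#{label}: original=#{before.inspect} refactored=#{after.inspect}"
end
```

In this sketch the two versions agree whenever one of the links points at GitHub. With no matching link, the original returns nil while the refactored version returns an empty URI object, so a caller that checks for nil may need a guard.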

So what does flog tell us about the new code’s complexity?

Before: 38.0 -- After: 16.4

Much improved! We’ve reduced overall complexity in this method by about 57%, down to 43% of its original score - outstanding! Now, we can certainly talk about style, clarity, and performance, but it looks quite a bit better to me.
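
For reference, the arithmetic behind the drop from 38.0 to 16.4 can be checked in a couple of lines. The helper name is made up for this sketch; the file totals are the flog totals reported for this helper before and after the change.

```ruby
# Percentage by which a score dropped, rounded to one decimal place.
def percent_reduction(before, after)
  ((before - after) / before * 100).round(1)
end

puts percent_reduction(38.0, 16.4)   # method score: 56.8
puts percent_reduction(158.6, 136.9) # file total:   13.7
```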


Before:
   158.6: flog total
     9.3: flog/method average

    38.0: RubygemsHelper#link_to_github    app/helpers/rubygems_helper.rb:18-25
    23.2: RubygemsHelper#links_to_owners   app/helpers/rubygems_helper.rb:91-95
    18.0: RubygemsHelper#subscribe_link    app/helpers/rubygems_helper.rb:44-58
    12.9: RubygemsHelper#simple_markup     app/helpers/rubygems_helper.rb:34-40
     9.5: RubygemsHelper#unsubscribe_link  app/helpers/rubygems_helper.rb:61-68

After:
   136.9: flog total
     8.1: flog/method average

    23.2: RubygemsHelper#links_to_owners   app/helpers/rubygems_helper.rb:91-95
    18.0: RubygemsHelper#subscribe_link    app/helpers/rubygems_helper.rb:44-58
    16.4: RubygemsHelper#link_to_github    app/helpers/rubygems_helper.rb:18-25
    12.9: RubygemsHelper#simple_markup     app/helpers/rubygems_helper.rb:34-40
     9.5: RubygemsHelper#unsubscribe_link  app/helpers/rubygems_helper.rb:61-68
     7.5: RubygemsHelper#latest_version_number app/helpers/rubygems_helper.rb:106-108

 

I’ll often run this tool as a before/after on my pull requests, just to make sure I haven’t accidentally added excessive amounts of complexity; after all, complexity adds up the more you try to be clever!

PR for this change showing inclusion of flog scores

When the difference is relevant or interesting, I’ll paste the output from Flog into the pull request or even in an extended commit message, if I want to focus a reviewer’s attention on the decrease in complexity. Especially when working on a refactor project, it is valuable to be able to measure the actual improvement rather than rely only upon how you feel about the change.

Outro

Bulleted list of tools shown in episode

So we've looked at 3 different tools that help us see our code from new angles.

We looked at rails-erd, which generates a graph of our ActiveRecord models and how they relate to each other.

We used turbulence to chart the relationship between churn and complexity in our code, to reveal hot spots where bugs might crop up in the future.

And lastly we used flog to explore the complexity of our code in order to understand why it might be hard to change, and to guide us in simplifying it.

There are dozens more tools and metrics to apply to your codebase, but keep in mind that these tools have an opinion, and gaming how you write code to make the tools happy isn't always the best approach. Doing so, however, can teach you quite a bit about the concerns the tool is built to address.

It is best to treat all these tools as lenses, allowing you to explore different avenues for solutions, and increase your own personal understanding of WHY a given command, structure, or practice could be considered complicated or problematic. They can help you value the constructs the language does (or doesn't) provide you to shortcut otherwise complicated code.

