Aphorism to Empiricism
There are certain aphorisms that almost all of us are familiar with as developers, like “don’t repeat yourself” and “separate responsibilities”. But, setting aside for the moment the debate about whether these slogans are always correct… how do we know for sure that we are applying them?
In today’s episode, guest chef Kerri Miller takes us on a whirlwind tour of three essential utilities for evaluating specific qualities of a Ruby codebase. You’ll learn how to map out the dependencies in a Rails application, how to identify high-risk areas, and how to measure code complexity. Enjoy!
Video transcript & code
There are some things we know about code that we capture in aphorisms - Separate Your Responsibilities… ...Reduce Complexity… ...Don’t Repeat Yourself. As a human, I certainly think I know what these look like in practice, but sometimes it isn’t as obvious as we think.
Luckily, there’s a few tools out there that can help guide us, whether we’re in the process of writing new code, or refactoring legacy software. They not only help us in the task at hand, but can deepen our understanding of the catchphrases themselves. Let’s take a look at three I use fairly often.
Let’s start with the big picture view, with a pair of tools that can help us examine a system as a whole.
First, Entity Relationship Diagrams, or ERDs.
This particular one represents the models found in the Rubygems.org website codebase. Each ActiveRecord model found in the application is represented here, with a line pointing towards its “has” relationships, and a list of its database fields.
You can think of it as a social graph, a bit like a map of the paths worn into neighborhood lawns by the kids who live in each house — each has their friends, who they spend the most time playing with. Models that have a number of lines radiating from them tend to be the popular kids, the ones that show up again and again in your code.
Here we see User and Rubygem having the most relationships, and it’s no coincidence, as they’re the core concepts in the application.
They also each connect to the largest number of other models. Yet they don’t have a direct relationship with each other. Conceptually, individual instances of each relate to the other through the intermediary models’ records, and each intermediary describes a different way they interact.
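That “related through an intermediary” shape is the familiar has_many :through pattern. Here is a plain-Ruby sketch of the traversal, not the app’s actual ActiveRecord models; Ownership is a stand-in for whatever join record connects the two sides:

```ruby
# Plain-Ruby sketch of a has_many :through relationship.
# Ownership is a hypothetical join record connecting a User to a
# Rubygem; the real application defines these as ActiveRecord models.
Ownership = Struct.new(:user, :rubygem)

class User
  attr_reader :handle, :ownerships

  def initialize(handle)
    @handle = handle
    @ownerships = []
  end

  # The "through" traversal: User -> Ownership -> Rubygem
  def rubygems
    ownerships.map(&:rubygem)
  end
end

class Rubygem
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

alice = User.new("alice")
rake  = Rubygem.new("rake")
alice.ownerships << Ownership.new(alice, rake)

alice.rubygems.map(&:name) # => ["rake"]
```

Neither side holds a foreign key to the other; the join records are the only thing that ties them together, which is exactly what the ERD shows.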
Of course, in our little neighborhood drama, Rubygem is the star of the show. It not only relates to the most other models, but it also has its own gang that User doesn’t get to hang out with.
Version, Linkset, Dependency, and GemDownload are all isolated from the rest of the system, forming their own little cul de sac of code.
In a more complex system, these cul de sacs can represent larger, more meta concepts around process and responsibilities. For example, in this app, there's a large grouping of models off to the left that all relate to a Customer model, and if we zoom in...
...we can see they’re all related to billing and invoicing. This would be a good place to encapsulate their interactions with the rest of the application, or even hive the whole cluster off into its own Billing service.
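One lightweight first step toward that encapsulation is a facade module that owns every entry point into the billing cluster. The following is a hypothetical sketch; Customer, Invoice, and the Billing module are illustrative stand-ins, not this application's actual code:

```ruby
# Hypothetical sketch: funnel all billing interactions through one
# module, so the rest of the app stops reaching into these models
# directly. Customer and Invoice are bare stand-ins here.
Customer = Struct.new(:name)
Invoice  = Struct.new(:customer, :amount_cents)

module Billing
  module_function

  # The single public entry point the rest of the application uses;
  # the internals (Invoice, line items, etc.) stay behind this
  # namespace and can later move into a separate service wholesale.
  def invoice_for(customer, amount_cents)
    Invoice.new(customer, amount_cents)
  end
end

customer = Customer.new("ACME Corp")
invoice  = Billing.invoice_for(customer, 49_00)
invoice.amount_cents # => 4900
```

Once every caller goes through the facade, extracting the cul de sac into its own service is a matter of reimplementing one module rather than chasing call sites across the codebase.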
So we’ve seen how an entity relationship diagram can visualize relationships between domain models. Now let’s get a step closer to the code, and look at a tool that can help guide us towards refactor targets and areas of danger.
Research into large codebases has shown us repeatedly that there are two factors that, when combined, are correlated with a high incidence of bugs:
- The first is files that change frequently. We sometimes call these “high churn” files.
- The second is areas of code that are complex and hard to fathom.
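The intuition behind combining these two signals can be sketched in plain Ruby. The churn counts and complexity scores below are invented for illustration, not real measurements:

```ruby
# Toy hotspot ranking: multiply how often a file changes (churn)
# by how hard it is to understand (complexity). Files with a high
# product are the ones most likely to harbor bugs.
files = {
  "app/models/rubygem.rb"          => { churn: 42, complexity: 180.0 },
  "app/helpers/rubygems_helper.rb" => { churn: 12, complexity: 158.6 },
  "app/models/linkset.rb"          => { churn: 3,  complexity: 22.0 },
}

# Sort descending by the churn-times-complexity product.
hotspots = files.sort_by { |_path, f| -(f[:churn] * f[:complexity]) }

hotspots.first.first # => "app/models/rubygem.rb"
```

Files at the top of such a ranking are where tests and refactoring effort pay off the most.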
The next tool we’re going to look at is called Turbulence, and it helps us visualize these high-churn, high-complexity hotspots.
Turbulence is a gem that generates a graph with a rectangle for each file, sized according to its rate of churn. Each rectangle’s color is based on the score of another tool, called Flog, which offers a score based on the conceptual complexity of a file. The files in red are the ones we want to have good test coverage on, and they will likely have large pieces of code that could stand to be refactored and simplified.
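As a rough sketch of how a complexity score might map onto a heat color in such a treemap (the thresholds below are invented for illustration, not Turbulence’s actual values):

```ruby
# Illustrative score-to-color mapping in the spirit of Turbulence's
# treemap. The cutoffs are made up for this sketch; the gem's real
# coloring logic differs.
def heat_color(flog_score)
  case flog_score
  when 0...20  then :green   # simple enough to leave alone
  when 20...60 then :yellow  # keep an eye on it
  else              :red     # refactor candidate; test it well
  end
end

heat_color(9.3)  # => :green
heat_color(38.0) # => :yellow
heat_color(95.0) # => :red
```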
Continuing closer to the code, let’s look at the actual Flog score for this application. Let’s start by looking at one of the more complex files.
➜ flog -m -a app/helpers/rubygems_helper.rb
158.6: flog total
  9.3: flog/method average
 38.0: RubygemsHelper#link_to_github            app/helpers/rubygems_helper.rb:18-25
 23.2: RubygemsHelper#links_to_owners           app/helpers/rubygems_helper.rb:91-95
 18.0: RubygemsHelper#subscribe_link            app/helpers/rubygems_helper.rb:44-58
 12.9: RubygemsHelper#simple_markup             app/helpers/rubygems_helper.rb:34-40
  9.5: RubygemsHelper#unsubscribe_link          app/helpers/rubygems_helper.rb:61-68
  7.5: RubygemsHelper#latest_version_number     app/helpers/rubygems_helper.rb:106-108
  7.2: RubygemsHelper#github_params             app/helpers/rubygems_helper.rb:111-112
  6.9: RubygemsHelper#link_to_directory         app/helpers/rubygems_helper.rb:28-31
  6.9: RubygemsHelper#report_abuse_link         app/helpers/rubygems_helper.rb:84-89
  5.1: RubygemsHelper#pluralized_licenses_header app/helpers/rubygems_helper.rb:2-3
  4.6: RubygemsHelper#formatted_licenses        app/helpers/rubygems_helper.rb:6-10
  4.0: RubygemsHelper#show_all_versions_link?   app/helpers/rubygems_helper.rb:102-104
  3.5: RubygemsHelper#link_to_page              app/helpers/rubygems_helper.rb:14-15
  3.4: RubygemsHelper#atom_link                 app/helpers/rubygems_helper.rb:70-73
  3.4: RubygemsHelper#badge_link                app/helpers/rubygems_helper.rb:79-82
  2.2: RubygemsHelper#reverse_dependencies_link app/helpers/rubygems_helper.rb:75-76
  2.2: RubygemsHelper#nice_date_for             app/helpers/rubygems_helper.rb:98-99
Flog offers us an “opinionated” score - each method is assigned a score based on how difficult it is to understand or modify. The author has suggested 10 as a good threshold. The further a method goes above 10, the more likely it is to be in need of some simplification.
Here we can see that even though there’s a fair amount of total complexity, the average method has a score of 9.3 — this class has a large number of simple methods. The most complex method is only 38, which isn’t the worst I’ve seen, but what makes it so complex? Flog can show us what went into coming up with that score:
➜ flog -m -d app/helpers/rubygems_helper.rb
158.6: flog total
  9.3: flog/method average
 38.0: RubygemsHelper#link_to_github app/helpers/rubygems_helper.rb:18-25
 11.1:   linkset
  5.8:   URI
  5.6:   branch
  5.1:   home
  4.8:   code
  2.9:   nil?
  2.9:   host
  2.5:   !
  2.5:   ==
Quite a few calls to a linkset method, which increases complexity. There’s also complexity around URI usage, and a bit around branch, which is how flog reports the cumulative effect of if/else and case/when structures. Maybe it could be cleaned up; let’s take a peek.
It doesn’t look too horrible, but all those dot-chained methods are kind of visually repetitive,...
def link_to_github(rubygem)
  if !rubygem.linkset.code.nil? && URI(rubygem.linkset.code).host == "github.com"
    URI(rubygem.linkset.code)
  elsif !rubygem.linkset.home.nil? && URI(rubygem.linkset.home).host == "github.com"
    URI(rubygem.linkset.home)
  end
rescue URI::InvalidURIError
  nil
end
...and it can be difficult to keep straight which branch of code does what, since they look so visually similar. Let’s take a stab at cleanup, with the goal of improving the complexity score, ignoring any other considerations for the moment.
Here’s what I came up with.
In the original, we’re performing the same set of operations on two different pieces of data, one in each branch.
I’ve decided to optimize for the Don’t Repeat Yourself principle. Instead of having the same code written twice with the only difference being the data it acts upon,...
...we change it so the data is an input to a block of the operations we want to perform.
def link_to_github(rubygem)
  result = [rubygem.linkset.code, rubygem.linkset.home].detect do |linkset|
    URI(linkset.to_s).host == "github.com"
  end
  URI(result.to_s)
rescue URI::InvalidURIError
  nil
end
This allows us to focus the eye on the operation we wish to perform - checking whether the URI's host equals "github.com".
This keeps the data separate from the operation, resulting in a much less complicated bit of code, which makes it easier for the next developer not only to understand it, but also to confidently make changes to it.
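Since the goal was a score improvement with behavior unchanged, a quick self-contained check is reassuring. Linkset and Rubygem below are bare Struct stand-ins for the real models, just enough to exercise the refactored method:

```ruby
require "uri"

# Bare stand-ins for the real ActiveRecord objects; just enough
# structure to call link_to_github against.
Linkset = Struct.new(:code, :home)
Rubygem = Struct.new(:linkset)

# The refactored method from the episode.
def link_to_github(rubygem)
  result = [rubygem.linkset.code, rubygem.linkset.home].detect do |link|
    URI(link.to_s).host == "github.com"
  end
  URI(result.to_s)
rescue URI::InvalidURIError
  nil
end

# One gem links GitHub from its code URL, the other from its home URL.
gem_with_code = Rubygem.new(Linkset.new("https://github.com/ruby/rake", nil))
gem_with_home = Rubygem.new(Linkset.new(nil, "https://github.com/ruby/rake"))

link_to_github(gem_with_code).host # => "github.com"
link_to_github(gem_with_home).host # => "github.com"
```

Both branches of the original if/elsif are covered by the single detect, which is exactly why the branch-related complexity disappears from the flog breakdown.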
So what does flog tell us about the new code’s complexity?
Much improved! We’ve cut this method’s score from 38.0 down to 16.4, a reduction of nearly 57% - outstanding! Now, we can certainly talk about style, clarity, and performance, but it looks quite a bit better to me.
Before:
158.6: flog total
  9.3: flog/method average
 38.0: RubygemsHelper#link_to_github        app/helpers/rubygems_helper.rb:18-25
 23.2: RubygemsHelper#links_to_owners       app/helpers/rubygems_helper.rb:91-95
 18.0: RubygemsHelper#subscribe_link        app/helpers/rubygems_helper.rb:44-58
 12.9: RubygemsHelper#simple_markup         app/helpers/rubygems_helper.rb:34-40
  9.5: RubygemsHelper#unsubscribe_link      app/helpers/rubygems_helper.rb:61-68

After:
136.9: flog total
  8.1: flog/method average
 23.2: RubygemsHelper#links_to_owners       app/helpers/rubygems_helper.rb:91-95
 18.0: RubygemsHelper#subscribe_link        app/helpers/rubygems_helper.rb:44-58
 16.4: RubygemsHelper#link_to_github        app/helpers/rubygems_helper.rb:18-25
 12.9: RubygemsHelper#simple_markup         app/helpers/rubygems_helper.rb:34-40
  9.5: RubygemsHelper#unsubscribe_link      app/helpers/rubygems_helper.rb:61-68
  7.5: RubygemsHelper#latest_version_number app/helpers/rubygems_helper.rb:106-108
I’ll often run this tool as a before/after on my pull requests, just to make sure I haven’t accidentally added excessive amounts of complexity; after all, complexity adds up the more you try to be clever!
When the difference is relevant or interesting, I’ll paste the output from Flog into the pull request or even in an extended commit message, if I want to focus a reviewer’s attention on the decrease in complexity. Especially when working on a refactor project, it is valuable to be able to measure the actual improvement rather than rely only upon how you feel about the change.
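The percentages quoted in such a PR note are easy to derive from the two flog runs. Using the totals from this example:

```ruby
# Percentage drop in the file's total flog score, from the
# before/after runs shown earlier.
before_total = 158.6
after_total  = 136.9
total_drop = ((before_total - after_total) / before_total * 100).round(1)
total_drop # => 13.7

# And for the individual method we refactored.
before_method = 38.0
after_method  = 16.4
method_drop = ((before_method - after_method) / before_method * 100).round(1)
method_drop # => 56.8
```

A one-line summary like “link_to_github: 38.0 → 16.4 (-56.8%)” in the PR description gives reviewers the measurement without making them run the tool themselves.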
So we've looked at three different tools that help us see our code from new angles.
We looked at rails-erd, which generates a graph of our ActiveRecord models and how they relate to each other.
We used turbulence to chart the relationship between churn and complexity in our code, to reveal hot spots where bugs might crop up in the future.
And lastly we used flog to explore the complexity of our code in order to understand why it might be hard to change, and to guide us in simplifying it.
There are dozens more tools and metrics to apply to your codebase, but keep in mind that these tools have an opinion, and gaming how you write code to make the tools happy isn't always the best approach. Doing so, however, can teach you quite a bit about the concerns the tool is built to address.
It is best to treat all these tools as lenses that let you explore different avenues for solutions and deepen your own understanding of WHY a given command, structure, or practice could be considered complicated or problematic. They can also help you appreciate the constructs the language does (or doesn't) provide to shortcut otherwise complicated code.