In Progress
Unit 1, Lesson 1
In Progress

Referential Transparency

Video transcript & code

Data goes in; data comes out. It's a simple concept, and it's the basis for functional programming. If a method is a pure function of its parameters it's easy to reason about and easy to test.

This property of a method is known as "referential transparency". When we change the context that a referentially transparent method is called in, its behavior will never change. When chasing down the source of a change in program behavior, we can be confident that the variance must have been in such a function's inputs, and not in the method itself.

In the episode #268 we looked at a predicate method harvested from some real-world code. Here it is in its original form. It's a little hard to see right now, because Emacs forgot its glasses today. But that's OK, because this view gives an opportunity to practice something called the squint test.

The squint test is when we zoom out on some code, squint at it, and try to get a feel for it just from shape and color. In this particular instance, let's pay special attention to color. There's a lot of light blue in this method, which is the color my editor uses to highlight constants.

Even without being able to read the code, this causes my overall level of uneasiness to increase a little bit. Because instinctively, I know that this probably means that this method takes inputs from sources other than its parameters. It is not referentially transparent. That means if I am quickly scanning through code, trying to figure out why something is no longer working, this method has to be added to the list of possible culprits.

When we dial in the focus we can see that, indeed, this method has some implicit global dependencies. Specifically, it repeatedly references Time.current. This is a method added by activesupport, which for the purpose of today's video you can think of as identical to Time.now.

class Reminder
  attr_reader :reminder_at, :completed, :last_reminded_at

  def initialize(completed: false, reminder_at:, last_reminded_at: nil)
    @completed = completed
    @reminder_at = reminder_at
    @last_reminded_at = last_reminded_at
  end

  alias completed? completed

  def alert_due?
    [
     !completed?,
     reminder_at <= Time.current,
     last_reminded_at == nil || last_reminded_at <= 23.hours.ago,
     reminder_at.hour <= Time.current.hour,
     reminder_at.min <= Time.current.min,
    ].all?
  end

end

It's likely that you've heard from me or others about how having implicit global dependencies makes a method harder to test in isolation. Which is true. You've may have also heard that these difficulties in testing are just pointers to a deeper design problem. This is also true.

But today I want to show you a latent problem in this code that you may never have considered.

Every call to Time.current is a form of I/O. It is a request from Ruby-land to the computer for some information about the outside world: specifically, what time it is. And the thing about time is that it is perpetually marching on.

We might imagine that these system calls occur so close together that this will never make a difference. But in fact there is no guarantee of this. If the system is heavily loaded; if there are many threads vying for the scheduler's attention; or if Ruby decides it's time to run a garbage collection cycle to recover some memory; then one of these syscalls might occur significantly later than the first.

We can simulate this scenario in a test, by stubbing out Time.current to return first one value, then another. The first value, if it were to stay consistent, would cause the whole predicate to return true. But the second value, a minute later, causes later tests to fail, which in turn causes the whole predicate to return false instead.

require "rspec/autorun"

RSpec.describe Reminder do
  it "isn't susceptible to threading issues" do
    reminder = Reminder.new(reminder_at: Time.new(2014, 11, 10, 23, 59))
    allow(Time).to receive(:current).and_return(Time.new(2014, 11, 10, 23, 59),
                                                Time.new(2014, 11, 11, 0, 0))
    expect(reminder).to be_alert_due
  end
end

We can confirm this result when we run the test.

Code like this can lead to the most frustrating kind of bug: a misbehavior that only happens sporadically, at certain times of day, under certain system loads, and can never be consistently reproduced in a debugging scenario.

The solution is simple: we extract all of the calls to Time.current to a single local variable, which is set exactly once per call.

def alert_due?
  current_time = Time.current
  [
   !completed?,
   reminder_at <= current_time,
   last_reminded_at == nil || last_reminded_at <= 23.hours.ago,
   reminder_at.hour <= current_time.hour,
   reminder_at.min <= current_time.min,
  ].all?
end

Once we make this change, it seems silly not to take it one step further: instead of a local variable, we promote the variable to a parameter, with a default of Time.current.

def alert_due?(current_time = Time.current)
  [
   !completed?,
   reminder_at <= current_time,
   last_reminded_at == nil || last_reminded_at <= 23.hours.ago,
   reminder_at.hour <= current_time.hour,
   reminder_at.min <= current_time.min,
  ].all?
end

We've now brought the method closer to being free of implicit references to global state. But we are not quite there. Sometimes global I/O dependencies aren't so easy to spot. This is especially true when Rails extensions are in the picture. Can you spot what we missed?

Yep, it's this code here: 23.hours.ago. This is implicitly referencing the current time in order to determine what time it would have been 23 hours ago. We can fix this by instead using the current_time variable.

def alert_due?(current_time = Time.current)
  [
   !completed?,
   reminder_at <= current_time,
   last_reminded_at == nil || last_reminded_at <= current_time - 23.hours,
   reminder_at.hour <= current_time.hour,
   reminder_at.min <= current_time.min,
  ].all?
end

We've now made this method referentially transparent. Well, at least we have under a looser, object-oriented definition that also includes the object's own attributes among the inputs. Given a known instantiation of the object, the only variance can come from the current_time argument. Which we can control, if we want to.

The upshot: we've made this method easier to reason about, easier to test in isolation, and eliminated a devilishly sneaky defect, all by increasing the method's referential transparency.

Happy hacking!

Responses