In Progress
Unit 1, Lesson 1
In Progress

Break The Circle

Video transcript & code

In episode #380, we looked at a fragment of a class I had been working on. The class had a problem: a circular dependency that I had inadvertently introduced. As a consequence, under certain circumstances, it was getting into an infinite recursion and blowing the stack.

Here's the code.

The specific problem is this: in the orgmode_lisp_dir method, so long as the user explicitly sets a special environment variable, QUARTO_ORGMODE_VENDOR_LISP_DIR, the code works fine.

But when this variable is unset, the code falls back to deriving the lisp directory value from a vendor_dir method elsewhere in the class.

Unfortunately, vendor_dir has been changed to delegate back to orgmode_lisp_dir. So when the environment variable hasn't been set up, this code calls itself back and forth until it runs out of stack frames.

#<<deps>>
module Quarto
  class Orgmode < Plugin
    # ---- 74 lines snipped ----

    def version
      "8.0.7"
    end

    # --- 8 lines snipped ----

    def orgmode_lisp_dir
      ENV.fetch("QUARTO_ORGMODE_VENDOR_LISP_DIR") {
        "#{vendor_dir}/lisp"
      }
    end

    # ---- 15 lines snipped ----

    def vendor_dir
      Pathname(orgmode_lisp_dir).parent.to_s
    end

    # ---- 70 lines snipped
  end
end

My first thought was simply to revert the change that introduced this bug.

Here's the reverted version.

This code depends on an external, global vendor_dir in order to derive a plugin-specific vendor directory. There's no circular reference back to #orgmode_lisp_dir.

#<<original>>
module Quarto
  class Orgmode < Plugin
    # ...
   def vendor_dir
      (Pathname(main.vendor_dir) + "org-#{version}").to_s
    end
    # ...
  end
end
#<<all_tests>

Unfortunately, this version doesn't capture the semantics I want.

Let's talk about those semantics for a minute. The core of the problem we're dealing with is that there may be one of two different sources of truth for this code to derive paths from. If the user has done no special configuration, then the prime source of truth is the globally configured vendor directory. Both the plugin-specific vendor_dir and the orgmode_lisp_dir should be derived from this value.

But, if the user specifically configures a particular QUARTO_ORGMODE_VENDOR_LISP_DIR, then that value becomes the primary source of truth. Now, instead of the lisp directory being derived from the vendor directory, the reverse is true: the vendor directory should be extrapolated from the lisp directory.

We can talk about whether this is an ideal system design another time. The important thing right now is that this kind of behavior, where the source of truth changes depending on what the user has configured, is a requirement that comes up from time to time in application construction. So it's worth thinking about how to solve this problem cleanly.

In order to express the true intended semantics of the class, we need to introduce a conditional to the vendor_dir method. We need to check if the user has configured QUARTO_ORGMODE_VENDOR_LISP_DIR. If so, we need to treat it as the source of truth. Otherwise, we need to build our result on the global vendor directory.

#<<original>>
module Quarto
  class Orgmode < Plugin
    # ...
   def vendor_dir
     if ENV.key?("QUARTO_ORGMODE_VENDOR_LISP_DIR")
       Pathname(ENV["QUARTO_ORGMODE_VENDOR_LISP_DIR"]).parent.to_s
     else
       (Pathname(main.vendor_dir) + "org-#{version}").to_s
     end
    end
    # ...
  end
end
#<<all_tests>>

Let's look at the two methods we've been talking about. What do they now share in common?

It might not be immediately obvious, but both of these methods now share a common structure. They are both switching their behavior on the presence or absence of the same environment variable.

Let's make this sameness more obvious, by converting orgmode_lisp_dir from using the #fetch method to using an explicit if/else conditional.

#<<original>>
module Quarto
  class Orgmode < Plugin
    # ...
    def orgmode_lisp_dir
      if ENV.key?("QUARTO_ORGMODE_VENDOR_LISP_DIR")
        ENV["QUARTO_ORGMODE_VENDOR_LISP_DIR"]
      else
        "#{vendor_dir}/lisp"
      end
    end

    # ...

    def vendor_dir
      if ENV.key?("QUARTO_ORGMODE_VENDOR_LISP_DIR")
        Pathname(ENV["QUARTO_ORGMODE_VENDOR_LISP_DIR"]).parent.to_s
      else
        (Pathname(main.vendor_dir) + "org-#{version}").to_s
      end
    end
    # ...
  end
end
#<<all_tests>>

This version has identical semantics, but it's now a lot easier to see the similarities in structure between our two methods. This is a phenomena we've talked about in prior episodes: sometimes we need to "de-factor" code in order to better reveal code smells that we can then refactor away.

What's the code smell in this case? It's one we've seen before. We might say it is the object-oriented code smell to rule them all: repeating the same conditional code over and over again.

At this point we are on familiar ground. What do we do when the same conditional keeps being repeated? We come up with two different objects to represent the two branches of the conditional, and then ensure that the switch is only performed in one place.

But wait a second… coming up with two new objects, just for this pair of methods? Doesn't that seem a bit… heavyweight?

Don't worry. I didn't say we needed new classes. Just two different objects, to hold two different sets of answers. For this type of "data-holder", I think a simple Hash will do nicely.

We write a new method, called dirs.

Inside, we write our switch on the presence or absence of the environment variable.

If the variable is present, we assign it to a convenience variable.

Then we construct and return a hash, with values for the plugin vendor directory, and for the lisp directory, both derived from the user-configuration variable.

If it is absent, we derive the plugin specific vendor directory from the global vendor directory.

Then we construct and return a different hash. This one has the same two keys, but derives the values from the main vendor directory value instead.

#<<original>>
module Quarto
  class Orgmode < Plugin
    # ...
    def dirs
      if ENV.key?("QUARTO_ORGMODE_VENDOR_LISP_DIR")
        lisp_dir = ENV["QUARTO_ORGMODE_VENDOR_LISP_DIR"]
        {
          vendor: Pathname(lisp_dir).parent.to_s,
          lisp:   lisp_dir
        }
      else
        vendor_dir = (Pathname(main.vendor_dir) + "org-#{version}").to_s
        {
          vendor: vendor_dir,
          lisp:   "#{Pathname(main.vendor_dir)}/lisp"
        }
      end
    end
    # ...
  end
end
#<<all_tests>>

With this done, we rewrite the orgmode_lisp_dir method to call dirs and return the :lisp key.

And we rewrite the vendor_dir to return the :vendor key.

Remember how back in episode #380 we talked about the guideline of making sure that methods higher up in a class have dependencies on the methods lower down, and not vice-versa? Well, that's what we've now accomplished.

Instead of pointing at each other, both the orgmode_lisp_dir and the vendor_dir methods now depend downwards, to the dirs method.

#<<original>>
module Quarto
  class Orgmode < Plugin
    # ...
    def orgmode_lisp_dir
      dirs[:lisp]
    end

    # ...
    def vendor_dir
      dirs[:vendor]
    end

    def dirs
      if ENV.key?("QUARTO_ORGMODE_VENDOR_LISP_DIR")
        lisp_dir = ENV["QUARTO_ORGMODE_VENDOR_LISP_DIR"]
        {
          vendor: Pathname(lisp_dir).parent.to_s,
          lisp:   lisp_dir
        }
      else
        vendor_dir = (Pathname(main.vendor_dir) + "org-#{version}").to_s
        {
          vendor: vendor_dir,
          lisp:   "#{Pathname(main.vendor_dir)}/lisp"
        }
      end
    end
    # ...
  end
end
#<<all_tests>>

The circle of dependencies has now been broken. And the truth about our application's intended behavior is now more apparent in the code. The truth, that is, that setting one environment variable changes how the location of more than one directory will be determined.

I think that's plenty for today. Happy hacking!

Responses