
Catch And Throw

Have you ever encountered code that used exceptions for flow control? That’s where code raises an exception, not to signal an error, but in order to make an early escape from a deeply nested chain of method calls.

Triggering an exception for a non-exceptional case is a kludge. Fortunately, in Ruby it’s not a necessary kludge.

In this episode, you’ll watch a real-world refactoring session in which I replaced an exception used for flow control with a use of Ruby’s catch and throw feature.

Video transcript & code

Everyone loves a refactoring episode, right? Well, in this episode I'm going to show off a refactoring I recently performed on the Discourse codebase. But more importantly, I'm going to demonstrate the use of Ruby's catch and throw methods.

Here's a class from the Discourse codebase which is responsible for parsing out short excerpts from web pages. It's a Nokogiri SAX event handler class, a topic we covered in the previous episode.

class ExcerptParser < Nokogiri::XML::SAX::Document

  class DoneException < StandardError; end

  # ...

  def self.get_excerpt(html, length, options)
    me = self.new(length,options)
    parser = Nokogiri::HTML::SAX::Parser.new(me)
    begin
      parser.parse(html) unless html.nil?
    rescue DoneException
      # we are done
    end
    me.excerpt
  end

  # ...

  def characters(string, truncate = true, count_it = true, encode = true)
    return if @in_quote
    encode = encode ? lambda{|s| ERB::Util.html_escape(s)} : lambda {|s| s}
    if count_it && @current_length + string.length > @length
      length = [0, @length - @current_length - 1].max
      @excerpt << encode.call(string[0..length]) if truncate
      @excerpt << "…"
      @excerpt << "</a>" if @in_a
      raise DoneException.new
    end
    @excerpt << encode.call(string)
    @current_length += string.length if count_it
  end
end

I'm not going to go over this class in detail. The important parts are these:

  1. Parsing is kicked off in a class method called .get_excerpt.
  2. There is a method, #characters, which is called back by the parser whenever it comes across plain text in the parsed document. This method collects text until it has enough for an excerpt.

This class solves an interesting problem: in order to get a page excerpt, the ExcerptParser only needs to collect text from the HTML page until it reaches the desired excerpt length. Any parsing after that is wasted effort and time. But once we tell the parser to start parsing, it won't stop until it gets to the end of the document, feeding events to the ExcerptParser the whole time.

As a result, this class needs to be able to "break out" of the parser and signal to the code that started the parse that it has all it needs. It does this by raising an exception. The code that starts the parse then rescues this exception and discards it. The exception that is raised, DoneException, is defined especially for this task.
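To see the shape of this pattern without Nokogiri, here is a minimal sketch. ToyParser and ToyExcerpt are hypothetical stand-ins, not Discourse or Nokogiri classes: ToyParser feeds text chunks to a handler's characters method and, like the real SAX parser, has no way of being told to stop, while ToyExcerpt raises a dedicated exception once it has enough text. The real ExcerptParser also does HTML escaping and quote handling, which this sketch omits.

```ruby
# Hypothetical stand-in for a SAX parser: it calls handler.characters for
# every chunk and offers no way to stop early.
class ToyParser
  attr_reader :chunks_fed

  def initialize(handler)
    @handler = handler
    @chunks_fed = 0
  end

  def parse(chunks)
    chunks.each do |chunk|
      @chunks_fed += 1
      @handler.characters(chunk)
    end
  end
end

# Hypothetical handler that collects text until it reaches a length limit.
class ToyExcerpt
  class DoneException < StandardError; end

  attr_reader :excerpt

  def initialize(length)
    @length = length
    @excerpt = +""
  end

  def characters(string)
    remaining = @length - @excerpt.length
    if string.length > remaining
      @excerpt << string[0, remaining] << "…"
      raise DoneException # an exception used purely to escape the parser
    end
    @excerpt << string
  end

  # Returns the excerpt plus how many chunks the parser actually fed us.
  def self.get_excerpt(chunks, length)
    handler = new(length)
    parser = ToyParser.new(handler)
    begin
      parser.parse(chunks)
    rescue DoneException
      # we are done
    end
    [handler.excerpt, parser.chunks_fed]
  end
end

excerpt, fed = ToyExcerpt.get_excerpt(["Hello ", "brave ", "new ", "world"], 8)
excerpt # => "Hello br…"
fed     # => 2 (the last two chunks were never parsed)
```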

This technique, called "using exceptions for flow control", is clunky and verbose. It feels wrong to use a construct made for signaling errors to accomplish an early, but normal, termination like this. Nonetheless, in some programming languages it is a necessary evil. Thankfully, Ruby isn't one of those languages.

To refactor this code, we change the bulky begin/rescue/end clause to a call to catch. We give catch a symbol, :done, as an argument. Then we enclose the call to parser.parse in a block passed to catch.

catch will execute the code inside the block. If at any point code executed within this block "throws" the symbol :done, catch will do exactly what it sounds like: catch the symbol. Execution then proceeds from the end of the catch block.
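Here is a minimal, self-contained sketch of that behavior. The method names are hypothetical; the point is that the throw can happen several method calls below the catch and still land there. It also shows what catch returns: the block's value when the block finishes normally, and nil when a bare throw :done ends it early. (get_excerpt ignores this return value and reads me.excerpt instead.)

```ruby
# Hypothetical helpers: the throw happens two method calls below the catch.
def inner(n)
  throw :done if n > 2 # jump straight out to the matching catch(:done)
  n
end

def outer(values)
  values.map { |n| inner(n) }
end

def gather(values)
  catch(:done) do
    outer(values) # becomes catch's return value if nothing is thrown
  end
end

gather([1, 2])    # => [1, 2] (block finished normally)
gather([1, 2, 3]) # => nil    (throw :done was called with no value)
```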

Incidentally, there's nothing special about the choice of the symbol :done; we could have used any symbol. We pick :done because it matches the name of the exception that was used before.
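Whatever tag you pick, the throw and the catch have to agree on it. Ruby's documentation describes the match as being by object identity, which is one reason symbols make convenient tags. The sketch below, with made-up tag names, shows a throw skipping past a catch whose tag doesn't match, and what happens when nothing catches the tag at all (UncaughtThrowError exists in Ruby 2.2 and later).

```ruby
# Nested catches with different tags: a throw passes by any catch whose
# tag does not match and stops at the one that does.
log = []

catch(:outer) do
  catch(:inner) do
    log << :before_throw
    throw :outer         # not :inner, so the inner catch is skipped
    log << :never_reached
  end
  log << :after_inner    # also skipped: we left the outer block entirely
end
log << :after_outer

log # => [:before_throw, :after_outer]

# A throw with no matching catch anywhere up the stack is an error.
begin
  throw :nobody_is_listening
rescue UncaughtThrowError => e
  e.tag # => :nobody_is_listening
end
```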

We then move on to the characters method, where we change the raise to a throw and give it the symbol :done as its argument.

Finally, we eliminate the DoneException, which is no longer used.

class ExcerptParser < Nokogiri::XML::SAX::Document

  # ...

  def self.get_excerpt(html, length, options)
    me = self.new(length,options)
    parser = Nokogiri::HTML::SAX::Parser.new(me)
    catch(:done) do
      parser.parse(html) unless html.nil?
    end
    me.excerpt
  end

  # ...

  def characters(string, truncate = true, count_it = true, encode = true)
    return if @in_quote
    encode = encode ? lambda{|s| ERB::Util.html_escape(s)} : lambda {|s| s}
    if count_it && @current_length + string.length > @length
      length = [0, @length - @current_length - 1].max
      @excerpt << encode.call(string[0..length]) if truncate
      @excerpt << "…"
      @excerpt << "</a>" if @in_a
      throw :done
    end
    @excerpt << encode.call(string)
    @current_length += string.length if count_it
  end
end

This code operates very much as it did before. When the ExcerptParser finds it has enough text for an excerpt, it "throws" the :done symbol. This unwinds the call stack, just like an exception, until it finds a matching catch block.
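Like an exception, a throw runs any ensure clauses it passes through on the way up, so cleanup code between the catch and the throw still gets to run. Unlike an exception, it is not intercepted by rescue clauses along the way. The sketch below uses a hypothetical walk method as a stand-in for library code that sits between the two and happens to contain a broad rescue; because DoneException inherited from StandardError, a rescue StandardError in such code would have swallowed it.

```ruby
# Hypothetical stand-in for code sitting between the catch and the throw:
# it has both a broad rescue and an ensure clause.
def walk(events, log)
  events.each { |event| yield event }
rescue StandardError
  log << :rescued    # a rescue like this would swallow a DoneException
ensure
  log << :cleaned_up # ensure still runs as the throw passes through
end

log = []
seen = []
catch(:done) do
  walk([:a, :b, :c], log) do |event|
    seen << event
    throw :done if event == :b
  end
end

seen # => [:a, :b]
log  # => [:cleaned_up] (ensure ran, rescue did not)
```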

OK, so if this works just like an exception, what was the point of this change? Well, there are a few reasons to prefer this version.

  1. First and foremost, the code no longer suggests an error condition where there actually is none. I think this is more honest and less confusing.
  2. There is less code. In particular, there's no single-use exception class needed.
  3. The catch clause has two fewer lines, making it less obtrusive.
  4. catch(:done) puts the signal that is expected (:done) right at the top of the block. So reading through the code you can see right away that the code in this block might terminate early if it signals that it is :done. By contrast, with a begin/rescue/end block we have to skip down to the end of the code in question to see what errors might occur. This is fine for errors, but less fine for perfectly normal early returns.

One thing we haven't talked about yet is using throw with more than one argument. But we'll leave that for another episode. Happy hacking!
