Flip-Flop

Video transcript & code

In the previous episode we learned about the Range type. Ranges are Ruby's way of representing ranges of values, such as 0 to 10, or 'a' to 'z'.

0..10                           # => 0..10
'a'..'z'                        # => "a".."z"

We've already looked at a number of ways we can put ranges to work. Today we're going to dig into a more esoteric and non-obvious use for ranges.

First, some background. I write the scripts of RubyTapas episodes in an Emacs extension called Org-Mode, which has a specialized file format. Here's a simple example Org-Mode file, called example.org. As you can see, it contains body text interspersed with delimited code listings.

Let's say we want to extract just the code listings from this file. We want to keep the delimiters around the code blocks, but we don't want any body text. Here's one way we might go about it.

First, we create some regular expressions to match the code block delimiters, and assign them to constants for convenience. Then we set up a variable called in_code, initializing it to false. We'll use this variable to track whether we are currently within a code listing.

Then we start up a loop over the lines of the Org-Mode file. For each line, we first check to see if we are looking at the start of a block. If so, in_code is set to a truthy value: the match position returned by =~, which is 0 here, and 0 is truthy in Ruby. Otherwise it continues to hold whatever value it had previously.

Next we print out the current line if the in_code variable is truthy.

Finally, we check to see if the current line marks the end of a source listing. If so, we set in_code back to false.

The variable in_code effectively acts as a "flip-flop switch" in this program. It starts out in position one, false. It stays in that position until an event—finding a BEGIN_SRC delimiter—trips it over to position two. From then on it stays in the second position until an ending delimiter flips it back over to position one again.

When we execute this code, we see that it extracts the code blocks just as we intended.

BEGIN_PATT = /\A#\+BEGIN_SRC/
END_PATT   = /\A#\+END_SRC/

in_code = false

IO.foreach("example.org") do |line|  
  in_code = (BEGIN_PATT =~ line) || in_code
  puts line if in_code
  in_code = false if line =~ END_PATT
end
# >> #+BEGIN_SRC ruby
# >>   1 + 1                           # => 2  
# >> #+END_SRC
# >> #+BEGIN_SRC ruby
# >>   def hello
# >>     puts "hello, world"
# >>   end
# >> #+END_SRC
# >> #+BEGIN_SRC ruby
# >>   def fibonacci( n )
# >>     return n if ( 0..1 ).include? n
# >>     ( fibonacci( n - 1 ) + fibonacci( n - 2 ) )
# >>   end
# >> #+END_SRC
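
For readers without example.org at hand, here is a minimal sketch of the same flip-flop variable technique run against an in-memory list of lines. The sample lines and the helper name extract_with_variable are assumptions made up for illustration; they are not part of the original script.

```ruby
# Hypothetical stand-in for the contents of example.org.
SAMPLE_LINES = [
  "Body text...\n",
  "#+BEGIN_SRC ruby\n",
  "  1 + 1\n",
  "#+END_SRC\n",
  "More body text.\n",
  "#+BEGIN_SRC ruby\n",
  "  puts \"hello\"\n",
  "#+END_SRC\n",
  "Closing remarks.\n"
]

BEGIN_PATT = /\A#\+BEGIN_SRC/
END_PATT   = /\A#\+END_SRC/

# Hypothetical helper: returns the code-listing lines, delimiters included.
def extract_with_variable(lines)
  in_code = false   # the explicit flip-flop state
  kept = []
  lines.each do |line|
    in_code = (BEGIN_PATT =~ line) || in_code  # flip on at a start delimiter
    kept << line if in_code
    in_code = false if line =~ END_PATT        # flip off after an end delimiter
  end
  kept
end

puts extract_with_variable(SAMPLE_LINES)
```

Collecting the lines into an array instead of printing them directly makes the behavior easy to check: only the delimiters and the lines between them survive.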

Now let's look at a different way to accomplish the same task. We'll get rid of our in_code variable, as well as the lines that update it. Instead, we'll add an if modifier to the puts statement. After the if, we put a test of the beginning pattern against the current line, followed by the dot-dot range syntax, followed by a test of the ending pattern against the current line.

And that's it. When we run this code, we get the same result as before!

BEGIN_PATT = /\A#\+BEGIN_SRC/
END_PATT   = /\A#\+END_SRC/

IO.foreach("example.org") do |line|  
  puts line if BEGIN_PATT =~ line .. END_PATT =~ line
end
# >> #+BEGIN_SRC ruby
# >>   1 + 1                           # => 2  
# >> #+END_SRC
# >> #+BEGIN_SRC ruby
# >>   def hello
# >>     puts "hello, world"
# >>   end
# >> #+END_SRC
# >> #+BEGIN_SRC ruby
# >>   def fibonacci( n )
# >>     return n if ( 0..1 ).include? n
# >>     ( fibonacci( n - 1 ) + fibonacci( n - 2 ) )
# >>   end
# >> #+END_SRC
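
The same idea can be tried without a file on disk. The following is a small sketch, assuming a made-up helper named extract_with_flip_flop and invented sample lines; the parentheses around each pattern test are optional and are there only for readability.

```ruby
# Hypothetical helper: keeps every line from one matching start_patt
# through the next one matching end_patt, inclusive.
def extract_with_flip_flop(lines,
                           start_patt: /\A#\+BEGIN_SRC/,
                           end_patt: /\A#\+END_SRC/)
  kept = []
  lines.each do |line|
    # The double-dot in the if condition is what does the work here.
    kept << line if (start_patt =~ line)..(end_patt =~ line)
  end
  kept
end

# Invented sample input standing in for example.org.
sample = [
  "Body text...\n",
  "#+BEGIN_SRC ruby\n",
  "  1 + 1\n",
  "#+END_SRC\n",
  "More body text.\n"
]

puts extract_with_flip_flop(sample)
```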

So what is going on here? Is Ruby constructing a special Range object which has regular expression tests as its beginning and ending values?

We can test this theory by extracting the range into a variable of its own. When we try to execute this code, we get an error: "bad value for range". This shouldn't be too surprising when we look at the code. Ruby evaluates the two regex match operations before it constructs the range, so this is really a range whose beginning and ending values are each either an integer match position or nil. On the first line of the file neither pattern matches, so both values are nil, and the Ruby version used here refuses to construct a Range from them. (Newer Ruby versions with beginless and endless ranges accept nil endpoints, so the exact failure may differ there.)

BEGIN_PATT = /\A#\+BEGIN_SRC/
END_PATT   = /\A#\+END_SRC/

IO.foreach("example.org") do |line|  
  range = (BEGIN_PATT =~ line .. END_PATT =~ line)
  puts line if range
end
# ~> -:7:in `block in <main>': bad value for range (ArgumentError)
# ~>    from -:6:in `foreach'
# ~>    from -:6:in `<main>'
# >> Body text...
# >>
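
To see what the range is actually being built from, it helps to evaluate the two kinds of match on their own. Strictly speaking, =~ returns a match position or nil rather than true or false. This is a small sketch; the variable names and sample strings are chosen only for illustration.

```ruby
BEGIN_PATT = /\A#\+BEGIN_SRC/

# A line of body text does not match, so =~ returns nil.
body_result = (BEGIN_PATT =~ "Body text...\n")

# A delimiter line matches at the very start, so =~ returns index 0.
begin_result = (BEGIN_PATT =~ "#+BEGIN_SRC ruby\n")

p body_result   # => nil
p begin_result  # => 0
```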

So now we know what Ruby isn't doing. But this doesn't really help us understand what's actually going on.

In fact, Ruby isn't treating the double-dot syntax as a Range literal at all in this context. Ruby is a big language, and it has a lot of special parsing rules for special situations. This turns out to be one of those special situations.

When Ruby sees the double-dot used as the predicate in a conditional statement, it doesn't construct a Range. Instead, it treats the double-dot as a flip-flop operator. The resulting code then behaves exactly the same as our original version with the explicit flip-flop variable. In effect, Ruby writes that code for us, including an anonymous variable to track the current state of the flip-flop.
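
The hidden state is easier to observe with plain numbers than with file contents. Here is a minimal sketch, assuming a made-up helper called numbers_between; it turns "on" when it sees the first value, stays on, and turns "off" after it sees the last value, then can be tripped again later.

```ruby
# Hypothetical helper: collects the numbers from `first` through `last`,
# inclusive, each time that span occurs in the input.
def numbers_between(numbers, first, last)
  picked = []
  numbers.each do |n|
    # In an if condition, the double-dot is a flip-flop, not a Range.
    picked << n if (n == first)..(n == last)
  end
  picked
end

p numbers_between(1..10, 3, 6)                        # => [3, 4, 5, 6]
p numbers_between([1, 3, 4, 6, 7, 3, 5, 6, 9], 3, 6)  # => [3, 4, 6, 3, 5, 6]
```

Note that neither condition is true for 4 or 5 in the first call, yet both are picked. That is the anonymous state variable at work.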

What possible reason could Ruby have for this special case? Well, you're looking at it. Ruby inherits this feature from Perl, which in turn inherited it from AWK. AWK is a language dedicated to processing records—which usually means lines in a file—using pattern matching and commands. If you're curious, here's the equivalent AWK script. On the left is a range of two patterns, and on the right inside curly braces is the command to execute when that range is matched.

/^#\+BEGIN_SRC/, /^#\+END_SRC/ { print }

The flip-flop operator exists to make it possible to process lines in files as easily as we can in an AWK script. For this purpose it works very well. However, outside of the text-processing context it is of limited utility.

And that's all for today. Happy hacking!
