Multiline Strings

If there is one thing Ruby has no lack of, it is syntax for quoting strings. No matter what our string-quoting need, Ruby usually has a built-in way to accomplish it. Today we're going to focus on just one string-quoting scenario: quoting literal strings that extend across multiple lines.

Now, first things first: Ruby has no problem with us simply starting a string on one line and then continuing it on across multiple lines. We don't need any special syntax to support this.

jabberwocky = "'twas brillig and the slithy toves
did gyre and gimbal in the wabe
all mimsy were the borogroves
and the mome raths, outgrabe"
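To see what Ruby actually stores in such a literal, here is a minimal sketch (the variable name verse is only for illustration). The line break we typed between the quotes ends up in the string as a newline character.

```ruby
# A double-quoted literal may simply continue onto the next line.
verse = "'twas brillig and the slithy toves
did gyre and gimbal in the wabe"

# The line break inside the literal is stored as a "\n" character.
puts verse.include?("\n")   # prints true
puts verse.lines.length     # prints 2
```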

But where we need a multiline string, it often means we are quoting some real-world text, and real-world text tends to contain inconvenient artifacts such as single and double quotes. If we continue our recitation of the poem "Jabberwocky" in this code, we quickly run into problems: the introduction of a quotation causes the parser to think the string is over.

jabberwocky = "'twas brillig and the slithy toves
did gyre and gimbal in the wabe
all mimsy were the borogroves
and the mome raths, outgrabe

"Beware the Jabberwock, my son""

We can fix the issue by escaping the quotation marks with a backslash.

jabberwocky = "'twas brillig and the slithy toves
did gyre and gimbal in the wabe
all mimsy were the borogroves
and the mome raths, outgrabe

\"Beware the Jabberwock, my son\""

But this is a solution that requires constant vigilance, and one that's tedious to maintain. We'd rather quote the text in such a way that we don't need to worry about the odd quotation mark.

This is where percent-quoting enters the picture. Instead of a double-quote, we can begin our string with a percent sign, followed by a delimiter. We have a lot of leeway in choosing our delimiter. For instance, we can choose an open brace, and Ruby will automatically infer that the literal will be terminated by a close brace.

jabberwocky = %{'twas brillig and the slithy toves
did gyre and gimbal in the wabe
all mimsy were the borogroves
and the mome raths, outgrabe

"Beware the Jabberwock, my son"}

Besides matching open/close delimiters like braces, square brackets, and parentheses, we can also use symbols such as the pipe to both open and close the quote.

jabberwocky = %|'twas brillig and the slithy toves
did gyre and gimbal in the wabe
all mimsy were the borogroves
and the mome raths, outgrabe

"Beware the Jabberwock, my son"|

This grants us a lot of flexibility in quoting strings that might contain common delimiters. So long as we know a character that the string definitely does not and will never contain, we can quote it robustly without having to worry about escaping special characters.
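As a rough illustration of that flexibility, the sketch below quotes the same text with four different delimiters and checks that the results are identical. The variable names are made up for this example. It also shows one detail worth knowing about paired delimiters: balanced nested pairs inside the text do not end the literal early.

```ruby
# Each literal below quotes the same text with a different delimiter.
braces   = %{"Beware the Jabberwock," 'my son'}
brackets = %["Beware the Jabberwock," 'my son']
parens   = %("Beware the Jabberwock," 'my son')
pipes    = %|"Beware the Jabberwock," 'my son'|

# All four produce the same string, so only one unique value remains.
puts [braces, brackets, parens, pipes].uniq.length   # prints 1

# With paired delimiters, balanced nested pairs stay part of the string.
nested = %{jaws {that bite} and claws {that catch}}
puts nested   # prints jaws {that bite} and claws {that catch}
```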

But even the most obscure of characters have a way of showing up inside of quotes as a program evolves. Especially in cases where the string being quoted contains source code itself, it's not unusual for all manner of funny characters to turn up where we least expect them, suddenly causing our code to cease to parse correctly.

jabberwocky = %|this is a nice quote you have here
it would be a real shame if someone were to add a | to it|
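One way to watch this failure without breaking our own file is to hold the offending code in a string and ask Ruby to evaluate it. This is only a sketch: the names source and outcome are invented for the illustration, and the snippet being evaluated is a shortened stand-in for the example above.

```ruby
# Hypothetical snippet held as data so we can try to parse it safely.
# Single quotes keep the percent literal from being evaluated here.
source = 'quote = %|it would be a shame to add a | to it|'

begin
  eval(source)
  outcome = :parsed
rescue SyntaxError
  # SyntaxError is not a StandardError, so it must be rescued by name.
  outcome = :syntax_error
end

puts outcome   # prints syntax_error
```

The stray pipe closes the literal early, and the leftover words are no longer valid Ruby.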

And there's another issue with percent-quoted strings. For reasons of code readability, we often find it desirable to indent multiline strings using a block style, the same way we indent our code.

jabberwocky = %{
  'twas brillig and the slithy toves
  did gyre and gimbal in the wabe
  all mimsy were the borogroves
  and the mome raths, outgrabe

  "Beware the Jabberwock, my son"
}

jabberwocky
# => "\n  'twas brillig and the slithy toves\n  did gyre and gimbal in the wabe\n  all mimsy were the borogroves\n  and the mome raths, outgrabe\n\n  \"Beware the Jabberwock, my son\"\n"

Not only does this string now contain the spaces used to indent each line, it also contains leading and trailing newlines where we separated the delimiters from the quoted text.

If we want to avoid both these leading newlines and the possibility of stray characters sneaking in and messing up our quoting, we need to bring out the big guns: "here documents", or "heredocs". A heredoc begins with two left angle brackets, followed by a delimiter token. We'll use capitalized "EOF" here, which stands for "end of file" and is a common heredoc ending token. After the last line of the text to be quoted, we place the ending token alone on its own line.

jabberwocky = <<EOF
  'twas brillig and the slithy toves
  did gyre and gimbal in the wabe
  all mimsy were the borogroves
  and the mome raths, outgrabe

  "Beware the Jabberwock, my son"
EOF

jabberwocky
# => "  'twas brillig and the slithy toves\n  did gyre and gimbal in the wabe\n  all mimsy were the borogroves\n  and the mome raths, outgrabe\n\n  \"Beware the Jabberwock, my son\"\n"

We can see that the quoted text now no longer contains leading newlines. This is because a heredoc doesn't begin until the line after the syntax which introduces it.

In fact, it's misleading to even think of a heredoc as having beginning and ending delimiters. It would be more accurate to say that the heredoc syntax tells the Ruby parser: "starting on the next line, and until you find this token on a line by itself, stop parsing Ruby and instead just suck up text".
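A small sketch may make the "on a line by itself" part concrete. With the plain form of heredoc used so far, the token only ends the text when it is the entire line. Neither an EOF in the middle of a line nor an indented EOF counts. The variable name text is just for illustration.

```ruby
text = <<EOF
an EOF in the middle of a line is just text
  EOF
EOF

# Only the final, unindented EOF ended the heredoc.
p text   # prints "an EOF in the middle of a line is just text\n  EOF\n"
```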

One result of the fact that a heredoc doesn't begin until the following line is that we can actually chain calls onto the heredoc signifier in order to modify the resulting string. Here's an example of converting the string to uppercase before assigning it to the variable. In effect, the heredoc signifier acts as a placeholder for the string which is yet to be read in.

jabberwocky = <<EOF.upcase
  'twas brillig and the slithy toves
  did gyre and gimbal in the wabe
  all mimsy were the borogroves
  and the mome raths, outgrabe

  "Beware the Jabberwock, my son"
EOF

jabberwocky
# => "  'TWAS BRILLIG AND THE SLITHY TOVES\n  DID GYRE AND GIMBAL IN THE WABE\n  ALL MIMSY WERE THE BOROGROVES\n  AND THE MOME RATHS, OUTGRABE\n\n  \"BEWARE THE JABBERWOCK, MY SON\"\n"
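The same placeholder behavior means the rest of the introducing line is still ordinary Ruby, so a heredoc can be passed as one argument among several. Here is a minimal sketch; the shout method is a hypothetical helper defined only for this illustration.

```ruby
# Hypothetical helper, defined only for this illustration.
def shout(text, suffix)
  text.strip.upcase + suffix
end

# The heredoc body starts on the next line; the rest of this line is
# still ordinary Ruby, including the second argument and closing paren.
warning = shout(<<EOF, "!")
  beware the jabberwock, my son
EOF

puts warning   # prints BEWARE THE JABBERWOCK, MY SON!
```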

What makes heredocs so much more robust than percent quoting for arbitrary strings is the fact that we aren't limited to single-character delimiters to mark the boundaries of the quoted string. We've used the string EOF here, but we could just as easily use a longer token which is highly unlikely to show up in the quoted text.

jabberwocky = <<A_VERY_UNIQUE_STRING.upcase
  'twas brillig and the slithy toves
  did gyre and gimbal in the wabe
  all mimsy were the borogroves
  and the mome raths, outgrabe

  "Beware the Jabberwock, my son"
A_VERY_UNIQUE_STRING

jabberwocky
# => "  'TWAS BRILLIG AND THE SLITHY TOVES\n  DID GYRE AND GIMBAL IN THE WABE\n  ALL MIMSY WERE THE BOROGROVES\n  AND THE MOME RATHS, OUTGRABE\n\n  \"BEWARE THE JABBERWOCK, MY SON\"\n"

So far we've treated heredocs as if they contain purely static text, but this isn't entirely the case. By default, heredocs may contain standard Ruby interpolation syntax, and these segments will be evaluated in the usual way.

answer = <<EOF
one plus one is #{1 + 1}
EOF

answer                          # => "one plus one is 2\n"

There may well be situations in which we don't want interpolation to take place. In that case, we can put single quotes around the terminator token where the heredoc is introduced. This tells Ruby to treat the heredoc as a single-quoted string and simply preserve all of the content, including single quotes, as-is.

answer = <<'EOF'
one plus one is #{1 + 1}
EOF

answer                          # => "one plus one is \#{1 + 1}\n"
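Interpolation is not the only thing that changes. In the default mode, backslash escapes such as \t are processed just as in a double-quoted string, while the single-quoted form leaves them untouched. The sketch below compares the two side by side; the variable names are invented for the example.

```ruby
creature = "Jabberwock"

interpolated = <<EOF
Beware the #{creature}\tmy son
EOF

raw = <<'EOF'
Beware the #{creature}\tmy son
EOF

p interpolated   # prints "Beware the Jabberwock\tmy son\n"

puts interpolated.include?("\t")     # prints true: \t became a tab
puts raw.include?('#{creature}\t')   # prints true: kept verbatim
```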

It's not uncommon to find a heredoc inside a class, module, or method. In this context, heredoc parsing rules present a barrier to good code style. By default, the terminator token must appear on a line by itself, and it must be the only thing on that line: it can't even have leading whitespace! This prevents us from indenting the whole heredoc block as we might like.

In order to address this annoyance, heredocs have another optional mode. If we precede the terminator token with a dash, Ruby will relax its rules for finding the token. Instead of only looking for it at column zero, it will permit the terminator to appear with any amount of whitespace before it. This lets us align the terminator with the line that initiated the heredoc.

module Wonderland
  JABBERWOCKY = <<-'EOF'
  'twas brillig and the slithy toves
  EOF
end

If you've been watching carefully, you may have noticed one problem this doesn't address: the leading spaces we used to indent the heredoc also show up as literal spaces within the resulting string. Getting rid of indentation in heredoc strings is a topic we'll cover in an upcoming episode.
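To make both points concrete, here is a minimal sketch of the dash form inside a module. The module and constant names are hypothetical. The indented terminator is accepted, but the indentation of the text itself is kept in the string.

```ruby
module Looking
  # The dash lets the terminator be indented to match the code.
  GLASS = <<-EOF
    'twas brillig
  EOF
end

# The four leading spaces of the body are still part of the string.
p Looking::GLASS   # prints "    'twas brillig\n"
```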

For completeness' sake, there's one other feature of heredocs that's worth noting. We can construct multiple heredocs in a row. Here's an example to illustrate what I mean. We have a unit test that compares two strings. Each string is specified using a heredoc. Then, on the lines after the assertion, we write the contents of one heredoc after the other. Ruby will extract the first heredoc contents up to the first terminator. After that, it will start slurping in the second heredoc until it gets to that heredoc's terminator. We've used different terminators for each heredoc here, but we also could have used the same one for both. Ruby just parses them in order of appearance.

require "minitest/autorun"

class HereTest < ::Minitest::Test
  def test_heredoc
    assert_equal <<-EXPECTED, <<-ORIGINAL.upcase
    TWEEDLE-DEE
    EXPECTED
    tweedle-dee
    ORIGINAL
  end
end
# >> Run options: --seed 8502
# >>
# >> # Running:
# >>
# >> .
# >>
# >> Finished in 0.004864s, 205.5970 runs/s, 205.5970 assertions/s.
# >>
# >> 1 runs, 1 assertions, 0 failures, 0 errors, 0 skips

require "minitest/autorun"

class HereTest < ::Minitest::Test
  def test_heredoc
    assert_equal <<-EOF, <<-EOF.upcase
    TWEEDLE-DEE
    EOF
    tweedle-dee
    EOF
  end
end
# >> Run options: --seed 10057
# >>
# >> # Running:
# >>
# >> .
# >>
# >> Finished in 0.000808s, 1238.0313 runs/s, 1238.0313 assertions/s.
# >>
# >> 1 runs, 1 assertions, 0 failures, 0 errors, 0 skips

Like I said, I include this for completeness. I can't remember the last time I found a good use for this feature.

And with that, we come to the end of today's overview of multiline quoting techniques in Ruby. Happy hacking!