Word Wrap

Sooner or later, you'll be writing a command-line tool and find yourself needing to dump some long-form text to the console. Maybe you're displaying program documentation, or maybe it's just a message-of-the-day. Whatever the reason, if you have to dump text to the terminal, you have to think about word wrapping.

TEXT = <<END
Sing, O goddess, the anger of Achilles son of Peleus, that brought countless ills upon the Achaeans. Many a brave soul did it send hurrying down to Hades, and many a hero did it yield a prey to dogs and vultures, for so were the counsels of Jove fulfilled from the day on which the son of Atreus, king of men, and great Achilles, first fell out with one another.

And which of the gods was it that set them on to quarrel? It was the son of Jove and Leto; for he was angry with the king and sent a pestilence upon the host to plague the people, because the son of Atreus had dishonoured Chryses his priest. Now Chryses had come to the ships of the Achaeans to free his daughter, and had brought with him a great ransom: moreover he bore in his hand the sceptre of Apollo wreathed with a suppliant's wreath and he besought the Achaeans, but most of all the two sons of Atreus, who were their chiefs.

"Sons of Atreus," he cried, "and all other Achaeans, may the gods who dwell in Olympus grant you to sack the city of Priam, and to reach your homes in safety; but free my daughter, and accept a ransom for her, in reverence to Apollo, son of Jove."
END

Well, OK, maybe you don't have to think about it. You could always just dump the text as-is and rely on the terminal's built-in word-wrapping abilities. But those abilities are pretty much nonexistent. If I do this in my terminal, it's perfectly happy to cut words in the middle. And the wider the terminal, the harder long-form text is to read. There's a reason newspaper columns are narrow.

require "./text"
puts TEXT

Another option is to make sure there are hard newlines in the text we have to print. But this only works if the text is statically embedded in our program where we control it, and doesn't come from some external source.

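And even then, if we have text formatted to fit in 80 columns and dump it to a 60-column-wide terminal, the result is really ugly: the terminal breaks each long line at column 60 and leaves a short fragment dangling on the next line, before the following line starts.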

HARD_NEWLINE_TEXT = <<END
Sing, O goddess, the anger of Achilles son of Peleus, that brought
countless ills upon the Achaeans. Many a brave soul did it send
hurrying down to Hades, and many a hero did it yield a prey to dogs
and vultures, for so were the counsels of Jove fulfilled from the day
on which the son of Atreus, king of men, and great Achilles, first
fell out with one another.

And which of the gods was it that set them on to quarrel? It was the
son of Jove and Leto; for he was angry with the king and sent a
pestilence upon the host to plague the people, because the son of
Atreus had dishonoured Chryses his priest. Now Chryses had come to the
ships of the Achaeans to free his daughter, and had brought with him a
great ransom: moreover he bore in his hand the sceptre of Apollo
wreathed with a suppliant's wreath and he besought the Achaeans, but
most of all the two sons of Atreus, who were their chiefs.

"Sons of Atreus," he cried, "and all other Achaeans, may the gods who
dwell in Olympus grant you to sack the city of Priam, and to reach
your homes in safety; but free my daughter, and accept a ransom for
her, in reverence to Apollo, son of Jove."
END

require "./text"
puts HARD_NEWLINE_TEXT
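To see where the ugliness comes from, here's a rough simulation of a terminal's wrapping, under the simplifying assumption that the terminal just chops every line at a fixed column. The terminal_wrap helper is hypothetical, made up purely for illustration:

```ruby
# Hypothetical helper: a rough model of a terminal's built-in wrapping.
# Each line is chopped every +width+ characters, with no regard for where
# words begin or end.
def terminal_wrap(text, width)
  text.each_line.flat_map { |line|
    chunks = line.chomp.scan(/.{1,#{width}}/)
    chunks.empty? ? [""] : chunks # keep blank lines blank
  }.join("\n")
end

puts terminal_wrap("Sing, O goddess, the anger of Achilles", 10)
# Sing, O go
# ddess, the
#  anger of 
# Achilles
```

The word "goddess" gets split across two lines. And any input line that's longer than the width turns into one full-width line followed by a short leftover, which is exactly the jagged pattern we get when 80-column text lands in a 60-column terminal.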

Word-wrapping is such a common need that we might expect to find a method for it in the core string methods, or at least in the standard library. But surprisingly, this is an omission in the libraries which ship with Ruby.

One way to wrap text to an arbitrary width for console display is using a regular expression and the String#gsub method.

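# Greedily grab up to +columns+ characters (. doesn't match newlines) that are
# followed by whitespace or the end of the string, then replace that
# whitespace with a single newline.
def wrap(text, columns=80)
  text.gsub(/(.{1,#{columns}})(\s+|\Z)/, "\\1\n")
end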
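To get a feel for how that regex does its job: .{1,N} greedily grabs as many as N characters, then backtracks until what follows is whitespace or the end of the string. The replacement keeps the captured run and swaps the trailing whitespace for a newline. Here it is on a short string wrapped at 10 columns:

```ruby
def wrap(text, columns=80)
  text.gsub(/(.{1,#{columns}})(\s+|\Z)/, "\\1\n")
end

# "the quick " is exactly 10 characters, but the next character is "b",
# so the regex backs off to "the quick" followed by a space.
puts wrap("the quick brown fox", 10)
# the quick
# brown fox
```

Note that the result always ends with a newline, because the last chunk is matched against \Z and gets a newline appended too.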

Let's try this method out, first on text without embedded newlines.

require "./text.rb"
require "./regex_wrap.rb"

puts wrap(TEXT)

This works surprisingly well for a single line of code. But we can already see its limitations. For instance, the blank lines between paragraphs have disappeared.

And if our input already contains newlines and we need to re-wrap it to a different width, this approach falls apart completely.

require "./text.rb"
require "./regex_wrap.rb"

puts wrap(HARD_NEWLINE_TEXT, 60)
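Both problems come from the same place. Since . doesn't match a newline, an existing line break always ends a chunk; and since \s+ matches newlines, it swallows a blank line along with the newline in front of it. A couple of tiny inputs make this easy to see:

```ruby
def wrap(text, columns=80)
  text.gsub(/(.{1,#{columns}})(\s+|\Z)/, "\\1\n")
end

# The blank line between the two paragraphs vanishes.
p wrap("one two\n\nthree", 80)       # => "one two\nthree\n"

# "cccc dddd" would fit in 10 columns, but the existing newline
# leaves "cccc" stranded on a short line of its own.
p wrap("aaaa bbbb cccc\ndddd", 10)   # => "aaaa bbbb\ncccc\ndddd\n"
```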

We could try to improve on this gsub/regex approach, but it's already hard to understand, and anything we added would just make it more so. So let's move on to other solutions.

Remember earlier, when I said that this functionality wasn't available in the Ruby standard library? That's not entirely true. In fact, at least two different Ruby standard libraries have internal text-wrapping algorithms in order to support their features.

For instance, if we require the rubygems/text library, and include the Gem::Text module, we get access to a format_text helper method. Given some text and a number of columns, it will wrap the text at word boundaries.

require "./text"
require "rubygems/text"

include Gem::Text

puts format_text(TEXT, 80)

As you can see, this preserves the blank lines between paragraphs.

Sadly, when we run it on the text that already contains hard newlines, it can't reflow it. Each existing line is wrapped on its own, so we get the same jagged mix of long and short lines as before.

require "./text"
require "rubygems/text"

include Gem::Text

puts format_text(HARD_NEWLINE_TEXT, 60)

The other problem with using internal Rubygems methods is that since they aren't intended to be public APIs, we're depending on something that might change or be removed without warning. So that's not a very good idea.

However, it does give us a nice place to start if we want to get an idea of how we might implement word-wrapping on our own.

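# A version of Gem::Text#format_text from RubyGems (an internal API).
def format_text(text, wrap, indent=0)
  result = []
  work = text.dup

  # Peel off one line at a time until the remainder fits within +wrap+.
  while work.length > wrap do
    # Take up to +wrap+ characters ending just before a space or newline.
    # Because . doesn't match "\n", an existing newline always ends the
    # line, which is why this method can't reflow hard-wrapped text.
    if work =~ /^(.{0,#{wrap}})[ \n]/ then
      result << $1.rstrip
      work.slice!(0, $&.length)
    else
      # Otherwise, cut exactly +wrap+ characters, even mid-word.
      result << work.slice!(0, wrap)
    end
  end

  result << work if work.length.nonzero?
  # Join the lines and prefix each one with +indent+ spaces.
  result.join("\n").gsub(/^/, " " * indent)
end

# min3 lives in the same file but has nothing to do with wrapping;
# it's a helper for Gem::Text's Levenshtein distance calculation.
def min3 a, b, c # :nodoc:
  if a < b && a < c then
    a
  elsif b < c then
    b
  else
    c
  end
end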
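Taking that greedy loop as a starting point, here's a minimal sketch of a home-grown wrapper that handles both cases that tripped up the regex and format_text. It assumes paragraphs are separated by blank lines, and it treats single newlines inside a paragraph as ordinary spaces, so hard-wrapped text can be reflowed. The names rewrap and fill_paragraph are made up for this example:

```ruby
# Hypothetical helper: reflow +text+ to at most +columns+ characters per line.
# Paragraphs are separated by blank lines; any other whitespace, including
# single newlines, is collapsed so hard-wrapped input can be re-wrapped.
def rewrap(text, columns = 80)
  paragraphs = text.strip.split(/\n\s*\n/)
  paragraphs.map { |para| fill_paragraph(para, columns) }.join("\n\n")
end

# Greedily pack words onto each line. A word longer than +columns+
# is not broken; it simply gets a line of its own.
def fill_paragraph(paragraph, columns)
  lines = []
  current = nil
  paragraph.split.each do |word|
    if current.nil?
      current = word
    elsif current.length + 1 + word.length <= columns
      current = "#{current} #{word}"
    else
      lines << current
      current = word
    end
  end
  lines << current if current
  lines.join("\n")
end

puts rewrap("aaaa bbbb cccc\ndddd\n\nnext paragraph", 10)
# aaaa bbbb
# cccc dddd
#
# next
# paragraph
```

Collapsing whitespace is what makes reflowing possible, but it also means indentation and deliberate line breaks (poetry, code samples, quoted email) get flattened. Edge cases like those are where an off-the-shelf gem starts to earn its keep.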

Before we roll our own word-wrapping code, though, let's see if we can use something off the shelf. There are various Rubygems which contain word-wrapping functionality along with other text-munging features. But I always like it when I can find a lightweight gem which does just what I need and doesn't add a lot of extra code to a project.

For a small, single-purpose library that handles word wrapping and nothing else, we can turn to the peculiarly named "Lovely Rufus" gem by Piotr Szotkowski. After we install it, we can require it and use the LovelyRufus::TextWrapper.wrap method. This method takes a string and an optional width: keyword argument specifying the column to wrap at.

When we try it out on our unbroken text sample, we can see it does just fine.

require "./text"
require "lovely_rufus"

puts LovelyRufus::TextWrapper.wrap(TEXT, width: 80)

And when we try it again, this time telling it to re-wrap text that already contains hard newlines, it still has no trouble.

require "./text"
require "lovely_rufus"

puts LovelyRufus::TextWrapper.wrap(HARD_NEWLINE_TEXT, width: 60)

Lovely Rufus has some other nice features, such as avoiding one-letter words at the ends of lines, and correctly handling email quotes and code comments. But it's still a very small library, with just a few short code files and only one other gem dependency.

The actual call to the wrap method is a bit long and unwieldy, it's true. But in application code we'd want to encapsulate our choice of word-wrapping code inside a helper method anyway, so that shouldn't be an issue.

def wrap(text, columns)
  LovelyRufus::TextWrapper.wrap(text, width: columns)
end

So now we know how to format text for a given terminal width. Of course, that still leaves us needing a way to discover just how wide the current terminal is. But we'll talk about that in an upcoming episode. Happy hacking!
