Chomp

Ruby can be your multitool for all kinds of text-munging tasks. But to use it effectively, you need to understand how Ruby handles line endings and record separators.

Video transcript & code

So Ruby has this concept of an input record separator, which I've talked about in other videos.

require "stringio"
input = StringIO.new("Line the First\nLine the Second\nLine the third\n")
input.gets  # => "Line the First\n"
input.readlines
# => ["Line the Second\n", "Line the third\n"]

The short version is that Ruby's various methods for reading in lines of text are really about reading in records.

We can redefine what string delineates the end of a record at the individual method call level,

using the shorthand mnemonic global variable $/,

require "stringio"
input = StringIO.new("/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin")
input.gets(":")  # => "/usr/local/bin:"
$/ = ":"
input.readlines
# => ["/usr/sbin:", "/usr/bin:", "/sbin:", "/bin"]

or using the spelled-out English alias for the global variable.

require "stringio"
input = StringIO.new("/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin")
require "English"
$INPUT_RECORD_SEPARATOR = ":"
input.each_line do |line|
  puts line
end
# >> /usr/local/bin:
# >> /usr/sbin:
# >> /usr/bin:
# >> /sbin:
# >> /bin

One thing you might notice is that regardless of how the record separator is defined, these methods always include the separator at the end.

But oftentimes when we're reading in lines or records from a file, we just want the record, not the terminator. So today I want to talk about some methods for stripping off these separator characters.

We're going to start by talking about a method called .chop. Now, spoiler alert, in a minute I'm going to tell you not to use this method! But in order to understand why, it's easiest to show you how it works.

If we have a string with a linefeed on the end, chop will return a copy with the linefeed stripped off. Or "chopped" off, if you will.

"fnord\n".chop  # => "fnord"

There's also a bang version which modifies the string in-place.

str = "fnord\n"
str.chop!
str # => "fnord"

Chop will also strip off a Windows-style CRLF.

"fnord\r\n".chop  # => "fnord"

As well as a lone carriage return.

"fnord\r".chop  # => "fnord"

It doesn't just remove these two characters in any order, however. If for some strange reason we have an LFCR line ending, chop just takes off the trailing carriage return and leaves the linefeed.

"fnord\n\r".chop  # => "fnord\n"

OK, but that's not a case we're very likely to run into. Here's a bigger problem: if we give it a string with no separator at the end, it takes off the last character, no matter what it is!

"let's jump in the pool".chop  # => "let's jump in the poo"

And let's say we're working with some input with a special record separator, like the dashed line that separates YAML documents.

And we've set the input record separator variable appropriately.

Reading in records works correctly: gets returns a single record, plus its terminator characters.

But chop ignores the input record separator setting!

require "stringio"
input = StringIO.new(<<EOF)
record one
---
record two
---
EOF
$/ = "\n---\n"
str = input.gets  # => "record one\n---\n"
str.chop  # => "record one\n---"

And in case you're wondering, no, chop doesn't take a separator argument.

str.chop("\n---\n") # ~> ArgumentError: wrong number of arguments (given 1, expected 0)

OK, now let's take a look at chop's more capable sibling.

chomp, like chop, will strip off either a linefeed, a carriage return, or a CRLF sequence.

"ftaghn\n".chomp  # => "ftaghn"
"ftaghn\r".chomp  # => "ftaghn"
"ftaghn\r\n".chomp  # => "ftaghn"

And like chop, it has a mutating bang version.

str = "ftaghn\n"
str.chomp!
str # => "ftaghn"

Also like chop, it won't eat a whole LFCR sequence by default.

"ftaghn\n\r".chomp  # => "ftaghn\n"

But unlike chop, we can tell chomp to look for a custom separator!

"ftaghn\n\r".chomp("\n\r")  # => "ftaghn"
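The separator argument isn't limited to line endings. Here's a small sketch of that, where the helper name strip_trailing and the sample strings are made up for illustration. Note that only a single trailing occurrence is removed, and a string that doesn't end with the separator comes back unchanged.

```ruby
# Hypothetical helper (not part of Ruby): remove one trailing occurrence
# of `suffix`, if present. String#chomp with an argument only looks at
# the very end of the string.
def strip_trailing(str, suffix)
  str.chomp(suffix)
end

strip_trailing("/usr/local/bin:", ":")          # => "/usr/local/bin"
strip_trailing("record one\n---\n", "\n---\n")  # => "record one"
strip_trailing("/usr/local/bin", ":")           # => "/usr/local/bin"
strip_trailing("a::", ":")                      # => "a:"
```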

What happens when we chomp a string that has no terminator at the end? Unlike chop, chomp is smart and leaves the string alone.

"let's jump in the pool".chomp  # => "let's jump in the pool"
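One wrinkle worth knowing about in this situation: the bang version, chomp!, returns nil when it had nothing to remove, so it's risky to chain further calls onto its return value. A minimal sketch, with a made-up helper name:

```ruby
# Hypothetical helper: report whether a terminator was actually removed.
# chomp! modifies the string in place and returns nil if nothing changed.
def chomped_in_place?(str)
  !str.chomp!.nil?
end

with_newline = "let's jump in the pool\n"
without_newline = "let's jump in the pool"

chomped_in_place?(with_newline)     # => true
chomped_in_place?(without_newline)  # => false
with_newline                        # => "let's jump in the pool"
```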

And how about when we have a custom input record separator defined globally?

If you guessed that .chomp would respect this configuration, you guessed right. .chomp is in the category of Ruby input methods that are record-oriented.

require "stringio"
input = StringIO.new(<<EOF)
record one
---
record two
---
EOF
$/ = "\n---\n"
str = input.gets  # => "record one\n---\n"
str.chomp  # => "record one"

So between chop and chomp, you should always use chomp. In fact, the only reason I even demonstrated chop at all is because if you go looking for documentation on how to strip off line endings in Ruby, you're going to run across both of these methods, and wonder which one you should use. So I wanted to show you what the difference was.

By the way, a historical note: both of these methods are inherited from the Perl programming language. As I recall, back in the Perl era, it was sometimes expedient to use the more primitive chop method to eke out a tiny bit of extra speed in cases when you knew for certain that every incoming string would end in a newline. These days, with modern hardware, I suspect you're not going to find a case where this optimization makes a significant difference.

str.chop # old and busted
str.chomp # new hotness

So anyway, chomp is how we can manually strip terminator sequences from incoming strings. But we don't actually need to explicitly call chomp to benefit from it.

Here's a selection of record-reading methods. We've got gets, readline, readlines, and each_line, as well as the class-level readlines method.


When we execute, we can see that each one of these methods respects the input record separator, but also includes it in the returned or yielded strings.
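The on-screen listing for this step isn't reproduced in the transcript. As a rough stand-in, here's a sketch that calls each of those methods without the chomp option. It assumes a throwaway two-line sample file in place of jabberwocky.txt, and the names SAMPLE_FILE and read_every_way are made up for the example.

```ruby
require "tempfile"

# Assumption: a throwaway two-line file standing in for jabberwocky.txt.
SAMPLE_FILE = Tempfile.new("jabberwocky")
SAMPLE_FILE.write("Twas brillig, and the slithy toves\n" \
                  "      Did gyre and gimble in the wabe:\n")
SAMPLE_FILE.close

# Read the same file with each record-reading method, without chomp.
def read_every_way(path)
  yielded = []
  File.open(path) { |f| f.each_line { |line| yielded << line } }
  {
    gets:         File.open(path) { |f| f.gets },
    readline:     File.open(path) { |f| f.readline },
    readlines:    File.open(path) { |f| f.readlines },
    each_line:    yielded,
    io_readlines: IO.readlines(path),
  }
end

results = read_every_way(SAMPLE_FILE.path)
results[:gets]
# => "Twas brillig, and the slithy toves\n"
results[:readlines]
# => ["Twas brillig, and the slithy toves\n",
#     "      Did gyre and gimble in the wabe:\n"]
```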

But all of these methods can also take a chomp keyword argument. As we set it to true for each of these methods, we can see that their output changes to exclude the terminating linefeed on each returned string.

open("jabberwocky.txt").gets(chomp: true)  # => "Twas brillig, and the slithy toves"
open("jabberwocky.txt").readline(chomp: true)  # => "Twas brillig, and the slithy toves"
open("jabberwocky.txt").readlines(chomp: true).take(2)
# => ["Twas brillig, and the slithy toves",
#     "      Did gyre and gimble in the wabe:"]
lines = []
open("jabberwocky.txt").each_line(chomp: true) do |line|
  lines << line
end
lines.take(2)
# => ["Twas brillig, and the slithy toves",
#     "      Did gyre and gimble in the wabe:"]
IO.readlines("jabberwocky.txt", chomp: true).drop(5).take(2)
# => ["He took his vorpal sword in hand:",
#     "  Long time the manxome foe he sought --"]
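The same keyword can be combined with an explicit separator argument. The sketch below assumes that chomp: true drops whatever separator was passed, just as it drops the default linefeed. The temp file and the names PATH_FILE and records are invented for illustration.

```ruby
require "tempfile"

# Assumption: a throwaway file holding a colon-delimited search path.
PATH_FILE = Tempfile.new("path")
PATH_FILE.write("/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin")
PATH_FILE.close

# Split a file into records on `separator`, dropping the separator itself.
def records(path, separator)
  File.readlines(path, separator, chomp: true)
end

records(PATH_FILE.path, ":")
# => ["/usr/local/bin", "/usr/sbin", "/usr/bin", "/sbin", "/bin"]
```

Passing the separator per call like this also avoids touching the global $/ at all.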

There's one more place in Ruby that you'll find the chomp method.

Let's say that for some reason we have a file full of executable search paths in YAML list form:

$ cat path.yaml
- /usr/local/bundle/bin
- /usr/local/sbin
- /usr/local/bin
- /usr/sbin
- /usr/bin
- /sbin
- /bin

And we want to combine them together into a UNIX-style colon-delimited search PATH.

This is something we can do with a Ruby one-liner. We can use the -p flag to loop over each line of input and then print it. We can set the output record separator to a colon. And we can use a call to the global sub method to get rid of the leading dash and space before each directory.

$ ruby -pe 'BEGIN{$\=":"}; sub /^- /, ""' path.yaml 
/usr/local/bundle/bin
:/usr/local/sbin
:/usr/local/bin
:/usr/sbin
:/usr/bin
:/sbin
:/bin:

The output we see from this isn't quite what we want. Yes, we successfully output colons between entries, and yes we managed to strip off the YAML list syntax. But we've still got newlines after each directory. And that's because, as we saw earlier, just because Ruby reads one record per delimiter doesn't mean it removes that delimiter from the resulting string.

So, one way we could fix this is by calling chomp! on the current record before we do anything else.

$ ruby -pe 'BEGIN{$\=":"}; $_.chomp!; sub /^- /, ""' path.yaml 
/usr/local/bundle/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:

But Ruby provides a shortcut for this very specific case: just as there is a Kernel-global sub method that implicitly acts on the current record, there's also a global chomp method that acts on the current record.

$ ruby -pe 'BEGIN{$\=":"}; chomp; sub /^- /, ""' path.yaml 
/usr/local/bundle/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:
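If the one-liner feels dense, here's roughly the same logic spelled out as an ordinary script. This is a sketch rather than an exact translation of what -p does internally: join_search_path is a made-up name, and appending ":" to each record stands in for printing with the output record separator set to a colon.

```ruby
require "stringio"

# Rough script-form equivalent of:
#   ruby -pe 'BEGIN{$\=":"}; chomp; sub /^- /, ""' path.yaml
# Hypothetical helper; `io` is any object that responds to each_line.
def join_search_path(io)
  io.each_line.map do |line|
    record = line.chomp.sub(/^- /, "")  # drop the newline, then the YAML list marker
    record + ":"                        # stand-in for print with $\ set to ":"
  end.join
end

yaml = StringIO.new("- /usr/sbin\n- /usr/bin\n- /bin\n")
join_search_path(yaml)  # => "/usr/sbin:/usr/bin:/bin:"
```

In a real script we'd have the option of collecting the records and calling join(":") instead, which would avoid the trailing colon.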

Now in some cases, we don't even need to explicitly invoke this chomp method.

In a couple other videos about one-liners, I've mentioned the -l flag that puts Ruby in line-ending processing mode.

That mode tweaks a few different settings. But the one thing we care about right now is that it enables an auto-chomp mode for the input records.

$ ruby -lpe 'BEGIN{$\=":"}; sub /^- /, ""' path.yaml 
/usr/local/bundle/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:

Which means in this case that it gives us the same behavior as our explicit call to chomp.

So that's your introduction to record separator removal in Ruby. The three important things I hope you take away from this are:

  1. Just because Ruby stops reading input at a record terminator doesn't mean it drops that terminator from the string it returns.
  2. There are two very similar methods for removing record separators from the ends of strings: chop and chomp. Never use chop, always use chomp.
  3. If you're writing a one-liner in the shell and line endings are giving you trouble, try adding a -l.

That's it for now. Happy hacking!
