Tail Part 2: Do While
Video transcript & code
We've been working on replicating a subset of the UNIX tail(1) utility in Ruby. In the last episode we found out how to seek backwards from the end of a file.
file = open('/var/log/syslog')
file.seek(-512, IO::SEEK_END)
chunk = file.read
chunk.chars.count("\n")
We plan on using this ability to read successively earlier chunks from the end of a file until we find ten lines' worth of text. Let's see if we can make that happen.
First, we'll set up a variable to track how many newlines we've found so far. We assume that the file ends with a newline, so we'll actually need to find the 11th-to-last newline in the file in order to locate the beginning of the 10th-to-last line.
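To make the "11th-to-last newline" reasoning concrete, here's a small sketch that works on an in-memory string instead of a file. The sample text and the variable names are made up for illustration; they aren't part of the tail code we're building.

```ruby
# Hypothetical sample text: 12 lines, each terminated by a newline.
text = (1..12).map { |i| "line #{i}\n" }.join

# Step backwards through the string, one newline at a time, until we
# reach the 11th-to-last newline.
position = text.length
11.times { position = text.rindex("\n", position - 1) }

# Everything after that newline is the last ten lines.
last_ten = text[(position + 1)..-1]
puts last_ten.lines.first  # prints "line 3"
puts last_ten.lines.count  # prints 10
```

The final newline in the file only terminates the last line, which is why ten newlines aren't enough to find where the 10th-to-last line starts.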
As before, we'll use a chunk size of 512 bytes. If you're wondering about the significance of this number: first, it's a power of two; second, it's small enough that we're unlikely to find 10 newlines in the first chunk, which will force us to figure out the logic needed to keep working backwards until we have enough text.
We initialize a variable called next_chunk_offset to track how far back in the file to read the next chunk from. Next we open the target file, and seek backwards 512 bytes. We save the absolute offset of the file after the seek, and then read in a chunk of text. Remember, this will have the side effect of setting the file's offset back to the end of the file. Now that we have a text chunk, we can update the newline_count by counting the newlines in it.
Now we need to decide whether to fetch more chunks. What are the criteria for making this decision? First, if we happened to read from a zero-length file, the read would have returned nil instead of a string. That's also why we explicitly converted the chunk to a string before counting newlines. So we only want to continue if the chunk is non-nil.
Second, if we've reached the beginning of the file then we need to stop.
Third, once we've found 11 newlines our job is done, so we need to check for that as well.
Within the loop, we decrement the next_chunk_offset by 512 bytes. Then we go through a very familiar series of steps: seek backwards, save the start position, read a chunk, and add to the newline_count.
newline_count = 0
chunk_size = 512
next_chunk_offset = -chunk_size
file = open('/var/log/syslog')
file.seek(next_chunk_offset, IO::SEEK_END)
chunk_start_offset = file.tell
chunk = file.read(chunk_size)
newline_count += chunk.to_s.chars.count("\n")
while chunk && chunk_start_offset > 0 && newline_count <= 10
  next_chunk_offset -= chunk_size
  file.seek(next_chunk_offset, IO::SEEK_END)
  chunk_start_offset = file.tell
  chunk = file.read(chunk_size)
  newline_count += chunk.to_s.chars.count("\n")
end
newline_count # => 12
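One caveat about the "beginning of the file" check: as far as I know, on typical systems asking seek to go back past the start of a file raises Errno::EINVAL rather than quietly stopping at offset zero, so a file whose size isn't a multiple of the chunk size could trip over this. Here's a rough sketch of one way to guard against it by working out clamped, absolute chunk positions up front. The clamped_chunk_bounds helper and the 700-byte sample file are assumptions made for illustration, not something from the code above.

```ruby
require 'tempfile'

# Hypothetical helper (not from the episode): given the file size, the chunk
# size, and how many chunks we've already read, return the absolute offset
# and length of the next chunk, never reaching back past byte zero.
def clamped_chunk_bounds(file_size, chunk_size, chunks_read)
  chunk_end   = [file_size - chunks_read * chunk_size, 0].max
  chunk_start = [chunk_end - chunk_size, 0].max
  [chunk_start, chunk_end - chunk_start]
end

# A made-up 700-byte file: bigger than one chunk, smaller than two.
Tempfile.create('tail-demo') do |file|
  file.write("x" * 699 + "\n")
  file.flush

  offsets = []
  2.times do |chunks_read|
    start, length = clamped_chunk_bounds(file.size, 512, chunks_read)
    file.seek(start, IO::SEEK_SET)
    offsets << [file.tell, file.read(length).bytesize]
  end
  p offsets  # prints [[188, 512], [0, 188]]
end
```

The second chunk comes out shorter than 512 bytes because only 188 bytes remain before the start of the file.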
That gives us what we need to loop backwards until 10 lines' worth of text are found. We can see this when we look at the ending newline_count. But this code is clearly less than optimal. Let's see if we can do away with some of the duplication here.
The awkwardness of this code is brought about by the fact that we are effectively executing one iteration of the loop before the first test of the loop condition. There is a name for this: a do-while loop. Many languages have a special do-while syntax for just this situation. The loop starts with a do, and ends with a while.
But what about in Ruby? do...while is not valid Ruby syntax, but does it have an equivalent?
Let's first look at a Ruby syntax you may have used already. You might know that the while keyword isn't just used at the beginning of a loop block; it can also act as a statement modifier. Here's some code where a simple predicate method decrements a number every time it is called, then checks if it has reached zero. When we use this as a statement modifier, we can see that first the loop condition is checked, and then the loop body, on the left side of the while modifier, is executed.
@n = 3
def done?
  puts "Checking loop condition"
  @n -= 1
  @n == 0
end

puts "not done yet." while !done?
# >> Checking loop condition
# >> not done yet.
# >> Checking loop condition
# >> not done yet.
# >> Checking loop condition
Something interesting happens if we surround the loop body with a begin...end block. If you're not familiar with begin...end, it's a way to group several lines together into a single expression, allowing us to apply things like statement modifiers to the whole block as a unit. When we put a while modifier after a begin...end block, the order of the output reverses. We can see that now the block body is executed before the first time the loop condition is checked.
begin
  puts "not done yet."
end while !done?
# >> not done yet.
# >> Checking loop condition
# >> not done yet.
# >> Checking loop condition
# >> not done yet.
# >> Checking loop condition
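The difference between the two forms is easiest to see when the condition is false from the very start. Here's a minimal sketch; the counter variables and the keep_going flag are made up for illustration.

```ruby
keep_going = false

# Plain statement modifier: the condition is checked first, so the body
# never runs.
plain_runs = 0
plain_runs += 1 while keep_going

# begin...end with a while modifier: the body runs once before the
# condition is checked for the first time.
block_runs = 0
begin
  block_runs += 1
end while keep_going

puts plain_runs  # prints 0
puts block_runs  # prints 1
```

That guaranteed first pass is exactly the do-while behavior we're after.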
This special rule for begin...end blocks enables us to write the equivalent of a do...while loop. Let's rewrite our tail code to use this construct.
We move the while statement to the end of the block, and start the block with a begin instead. Then we get rid of the initial seek, the saving of the chunk start offset, the chunk read, and the updating of the newline count. We move the remaining next_chunk_offset calculation to the bottom of the loop. And that's it: we've eliminated the duplication in this code by transforming the loop into a do...while loop.
newline_count = 0
chunk_size = 512
next_chunk_offset = -chunk_size
file = open('/var/log/syslog')
begin
  file.seek(next_chunk_offset, IO::SEEK_END)
  chunk_start_offset = file.tell
  chunk = file.read(chunk_size)
  newline_count += chunk.to_s.chars.count("\n")
  next_chunk_offset -= chunk_size
end while chunk && chunk_start_offset > 0 && newline_count <= 10
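If you'd like to try this loop without pointing it at a real log file, here's a sketch that wraps it in a method and runs it against a temporary file. The count_newlines_from_end name, its parameters, and the sample data are all hypothetical. It also assumes the file is at least a couple of chunks long, since it does nothing to stop a seek from reaching back past the start of the file.

```ruby
require 'tempfile'

# Hypothetical wrapper around the do...while style loop shown above.
# Returns the newline count and the offset where the last chunk started.
def count_newlines_from_end(path, wanted = 11, chunk_size = 512)
  newline_count = 0
  next_chunk_offset = -chunk_size
  chunk_start_offset = nil
  File.open(path) do |file|
    begin
      file.seek(next_chunk_offset, IO::SEEK_END)
      chunk_start_offset = file.tell
      chunk = file.read(chunk_size)
      newline_count += chunk.to_s.chars.count("\n")
      next_chunk_offset -= chunk_size
    end while chunk && chunk_start_offset > 0 && newline_count < wanted
  end
  [newline_count, chunk_start_offset]
end

# Made-up sample data: 100 lines of exactly 64 bytes each (6400 bytes),
# so every 512-byte chunk holds exactly 8 newlines.
Tempfile.create('tail-demo') do |tmp|
  100.times { |i| tmp.write(format("%063d\n", i)) }
  tmp.flush
  p count_newlines_from_end(tmp.path)  # prints [16, 5376]
end
```

With 8 newlines per chunk, the first pass isn't enough to reach 11, so the loop goes around a second time and stops with 16 newlines counted, 1024 bytes back from the end of the file.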
We're now one step closer to implementing tail in Ruby. Next time we revisit this problem, we'll start searching for the beginnings of lines instead of just counting newline characters. Until then: happy hacking!