Tail Part 7: Cooperating Objects
Video transcript & code
We've spent a lot of time duplicating a little bit of the UNIX
tail(1) utility. It's about time we took a break from this series and covered some other topics. But before we do that, I'd like to address some lingering ugliness in the code.
There are two main processes collaborating in this program: There's the process of reading backwards through a file, one chunk at a time. In the last episode we encapsulated that logic in a class called
BackwardChunkedFileRead. This has simplified our main code considerably.
The other process is the one that takes these successively earlier chunks of file content and searches them backwards for newlines. This process is currently split between a loop in the main code, and a method called each_reverse_newline_index.
Today we'll see if we can encapsulate that second process, similar to how we encapsulated the backward-chunked file read. But this time we won't bother writing a whole new class for it. Instead, we'll use an Enumerator.

    class BackwardChunkedFileRead
      attr_reader :file, :chunk_size, :next_chunk_offset

      def initialize(file, chunk_size=512)
        @file              = file
        @chunk_size        = chunk_size
        @next_chunk_offset = -@chunk_size
        @chunk_start_offset = nil
      end

      def each_chunk
        while chunk = read_chunk
          yield(chunk)
        end
        ""
      end

      def read_chunk
        return nil if @chunk_start_offset == 0
        file.seek(next_chunk_offset, IO::SEEK_END)
        @chunk_start_offset = file.tell
        chunk = file.read(chunk_size)
        @next_chunk_offset -= chunk_size
        chunk
      end
    end

    def each_reverse_newline_index(chunk)
      while(nl_index = chunk.rindex("\n", (nl_index || chunk.size) - 1))
        yield(nl_index)
      end
      nl_index
    end

    newline_count = 0
    file = open('/var/log/syslog.1')
    start_text = BackwardChunkedFileRead.new(file).each_chunk do |chunk|
      nl_index = each_reverse_newline_index(chunk) do |index|
        newline_count += 1
        break index if newline_count > 10
      end
      break chunk[(nl_index+1)..-1] if newline_count > 10
    end
    print(start_text)
    IO.copy_stream(file, $stdout)

We start by renaming the method to
each_reverse_newline_index_with_chunk. We update the method signature to take a
chunk_source parameter, instead of a single
chunk. Then we add a new outer loop around the existing contents of the method, which simply iterates through the chunks made available by the chunk_source.
We change the yield to yield both the newline index and the
chunk it was found in, as the new method name indicates. We also get rid of the return value. We won't be relying on that anymore.
We now have a method that, given a source of text chunks, will iterate through all of them and yield every time a newline is found. Let's see how we can use this to simplify the main code.
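To get a feel for what the reworked method yields, here is a minimal sketch that runs it against in-memory data. ArrayChunkSource is a hypothetical stand-in for BackwardChunkedFileRead and is not part of the episode's code; the only assumption is that a chunk source responds to each_chunk and hands out chunks from the end of the text toward the beginning.

```ruby
# Hypothetical in-memory chunk source, assumed for illustration only.
# It yields pre-split chunks in the order given (latest chunk first),
# mimicking the interface of BackwardChunkedFileRead#each_chunk.
ArrayChunkSource = Struct.new(:chunks) do
  def each_chunk(&block)
    chunks.each(&block)
  end
end

# The method as described above: walk every chunk from the source and
# yield each newline position, scanning each chunk from its end.
def each_reverse_newline_index_with_chunk(chunk_source)
  chunk_source.each_chunk do |chunk|
    nl_index = nil
    while (nl_index = chunk.rindex("\n", (nl_index || chunk.size) - 1))
      yield(nl_index, chunk)
      break if nl_index == 0
    end
  end
end

source = ArrayChunkSource.new(["c\nd\n", "a\nb\n"])
each_reverse_newline_index_with_chunk(source) do |index, chunk|
  puts "newline at #{index} in #{chunk.inspect}"
end
```

Each newline is reported together with the chunk it lives in, so the caller never has to track which chunk is current.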
First, we get rid of the
newline_count variable. We'll still instantiate a
BackwardChunkedFileRead, but this time we assign it to a variable instead of using it immediately.
Next we assign a variable named
newlines by calling
to_enum. If you've watched episodes 59 and 60, then you'll remember this method. If you haven't seen those episodes, and you aren't familiar with
Enumerator, you might want to pause here and go back and review them before continuing.
In a nutshell,
to_enum takes a yielding method and turns it "inside-out", into an externally iterable object. In this case, the method we are converting to an Enumerator is
each_reverse_newline_index_with_chunk. We also pass an extra argument, the
read object. This will be the
chunk_source parameter that each_reverse_newline_index_with_chunk expects.
In creating the
BackwardChunkedFileRead we took a looping process and explicitly converted that process into a stateful object that proceeds forward each time
read_chunk is called. This time, we are using
to_enum to implicitly turn our method into an iterable object. The outcome, though, is similar: we end up with an object which represents a process, remembers its current place in the process, and can be told to proceed forwards as needed.
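If the "inside-out" idea is hard to picture, a tiny sketch may help. The countdown_from method below is a made-up example, not something from this series; it just shows a yielding method, plus an argument, being wrapped by to_enum and then stepped through from the outside.

```ruby
# A yielding method: it drives the loop and calls back for each value.
# countdown_from is a hypothetical example method, not from the episode.
def countdown_from(start)
  start.downto(1) { |n| yield(n) }
end

# to_enum wraps the method call (with its argument) in an Enumerator.
# Nothing runs until we ask for a value.
countdown = to_enum(:countdown_from, 3)

puts countdown.next # prints 3
puts countdown.next # prints 2
puts countdown.next # prints 1

# Once the underlying method returns, asking for more raises StopIteration.
begin
  countdown.next
rescue StopIteration
  puts "done"
end
```

The method body is suspended between calls to next, which is what lets the Enumerator remember its place.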
Now for the line all of this has been leading up to: we want to find the 11th-to-last newline in order to find the start of the 10th-to-last line. So we simply take our
Enumerator and tell it to proceed forwards 10 times. Then we tell it to proceed one more time, but this time we capture the result: a newline index and the chunk in which it was found.
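The skip-then-capture step can be tried on a small scale. In this sketch, sample_newlines and tail_text are hypothetical helper names invented for illustration, and the Enumerator is built over a single in-memory string instead of a file. It asks for the last 2 lines instead of the last 10, so it skips 2 newlines and captures the 3rd.

```ruby
# Hypothetical stand-in for the newlines Enumerator: yields
# (index, chunk) pairs for one in-memory string, last newline first.
def sample_newlines(text)
  Enumerator.new do |yielder|
    index = text.size
    while index > 0 && (index = text.rindex("\n", index - 1))
      yielder.yield(index, text)
    end
  end
end

# Discard lines_wanted newlines, then capture the next one and return
# everything after it. Raises StopIteration if there are too few newlines.
def tail_text(newlines, lines_wanted)
  lines_wanted.times { newlines.next }
  nl_index, chunk = newlines.next
  chunk[(nl_index + 1)..-1]
end

print tail_text(sample_newlines("one\ntwo\nthree\nfour\n"), 2)
```

Because the underlying method yields two values, next returns them as a two-element array, which the parallel assignment splits into nl_index and chunk.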
From here on out everything is familiar to anyone who has followed this miniseries from the beginning. We print out the current chunk, from the beginning of the located line forwards. Then we print the remainder of the file using IO.copy_stream.

    class BackwardChunkedFileRead
      attr_reader :file, :chunk_size, :next_chunk_offset

      def initialize(file, chunk_size=512)
        @file              = file
        @chunk_size        = chunk_size
        @next_chunk_offset = -@chunk_size
        @chunk_start_offset = nil
      end

      def each_chunk
        while chunk = read_chunk
          yield(chunk)
        end
        ""
      end

      def read_chunk
        return nil if @chunk_start_offset == 0
        file.seek(next_chunk_offset, IO::SEEK_END)
        @chunk_start_offset = file.tell
        chunk = file.read(chunk_size)
        @next_chunk_offset -= chunk_size
        chunk
      end
    end

    def each_reverse_newline_index_with_chunk(chunk_source)
      chunk_source.each_chunk do |chunk|
        while(nl_index = chunk.rindex("\n", (nl_index || chunk.size) - 1))
          yield(nl_index, chunk)
          break if nl_index == 0
        end
      end
    end

    file = open('/var/log/syslog.1')
    read = BackwardChunkedFileRead.new(file)
    newlines = to_enum(:each_reverse_newline_index_with_chunk, read)
    10.times do
      newlines.next
    end
    nl_index, chunk = newlines.next
    start_text = chunk[(nl_index+1)..-1]
    print(start_text)
    IO.copy_stream(file, $stdout)

At this point, I feel like we've arrived at code that tells a pretty coherent story. First we open a file. Then we start a backward reading operation on that file, and a collaborating operation to search backwards for newlines. We tell the combined process to find and discard the first 10 newlines. Then we grab the location of the 11th newline and print everything after it.
There is more we could do here. There are some edge cases we could address. And we could add other
tail(1) features like the ability to pass in the number of lines to dump, or to follow a file "live" as another process appends to it. I hope to tackle both of these in future episodes.
But I think this is a good place to leave this code for now. Happy hacking!