
Tail Part 7: Cooperating Objects

Video transcript & code

We've spent a lot of time duplicating a little bit of the UNIX tail(1) utility. It's about time we took a break from this series and covered some other topics. But before we do that, I'd like to address some lingering ugliness in the code.

There are two main processes collaborating in this program. There's the process of reading backwards through a file, one chunk at a time. In the last episode we encapsulated that logic in a class called BackwardChunkedFileRead, which has simplified our main code considerably.

The other process is the one that takes these successively earlier chunks of file content and searches them backwards for newlines. This process is currently split between a loop in the main code, and a method called each_reverse_newline_index.

Today we'll see if we can encapsulate that second process, similar to how we encapsulated the backward-chunked file read. But this time we won't bother writing a whole new class for it. Instead, we'll use an Enumerator.

class BackwardChunkedFileRead
  attr_reader :file, :chunk_size, :next_chunk_offset

  def initialize(file, chunk_size=512)
    @file               = file
    @chunk_size         = chunk_size
    @next_chunk_offset  = -@chunk_size
    @chunk_start_offset = nil
  end

  def each_chunk
    while chunk = read_chunk
      yield(chunk)
    end
    ""
  end

  def read_chunk
    return nil if @chunk_start_offset == 0
    file.seek(next_chunk_offset, IO::SEEK_END)
    @chunk_start_offset = file.tell
    chunk = file.read(chunk_size)
    @next_chunk_offset -= chunk_size
    chunk
  end
end

def each_reverse_newline_index(chunk)
  while(nl_index = chunk.rindex("\n", (nl_index || chunk.size) - 1))
    yield(nl_index)
    break if nl_index == 0     # guard: without this, rindex("\n", -1) would search from the end again
  end
  nl_index
end

newline_count     = 0
file = open('/var/log/syslog.1')
start_text = BackwardChunkedFileRead.new(file).each_chunk do |chunk|
  nl_index = each_reverse_newline_index(chunk) do |index|
    newline_count += 1
    break index if newline_count > 10
  end
  break chunk[(nl_index+1)..-1] if newline_count > 10
end
print(start_text)
IO.copy_stream(file, $stdout)

We start by renaming the method to each_reverse_newline_index_with_chunk. We update the method signature to take a chunk_source parameter instead of a single chunk. Then we add a new outer loop around the existing contents of the method, which simply iterates through the chunks made available by the chunk_source.

We change the yield to yield both the newline index and the chunk it was found in, as the new method name indicates. We also get rid of the return value. We won't be relying on that anymore.
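As a quick sanity check, the reworked method can be exercised in isolation against a hypothetical in-memory chunk source. FakeChunkSource here is invented purely for illustration; in the real program the chunk source is a BackwardChunkedFileRead.

```ruby
# FakeChunkSource is a hypothetical stand-in for BackwardChunkedFileRead,
# used only to exercise the reworked method in isolation.
class FakeChunkSource
  def each_chunk
    yield "line one\nline two\n"
  end
end

def each_reverse_newline_index_with_chunk(chunk_source)
  chunk_source.each_chunk do |chunk|
    while (nl_index = chunk.rindex("\n", (nl_index || chunk.size) - 1))
      yield(nl_index, chunk)
      break if nl_index == 0
    end
  end
end

each_reverse_newline_index_with_chunk(FakeChunkSource.new) do |index, chunk|
  puts index # newline positions, scanning backwards from the end of the chunk
end
# prints 17, then 8
```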

We now have a method that, given a source of text chunks, will iterate through all of them and yield every time a newline is found. Let's see how we can use this to simplify the main code.

First, we get rid of the newline_count variable. We'll still instantiate a BackwardChunkedFileRead, but this time we assign it to a variable instead of using it immediately.

Next we assign a variable named newlines by calling to_enum. If you've watched episodes 59 and 60, then you'll remember this method. If you haven't seen those episodes, and you aren't familiar with Enumerator, you might want to pause here and go back and review them before continuing.

In a nutshell, to_enum takes a yielding method and turns it "inside-out", into an externally iterable object. In this case, the method we are converting to an Enumerator is each_reverse_newline_index_with_chunk. We also pass an extra argument, the read object. This will be the chunk_source parameter that each_reverse_newline_index_with_chunk expects.
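To see the mechanics in miniature, here's a toy yielding method (each_doubled is invented purely for demonstration) converted the same way: to_enum takes the method name as a symbol, plus any extra arguments the method expects.

```ruby
# Hypothetical yielding method, defined only to demonstrate to_enum.
def each_doubled(numbers)
  numbers.each { |n| yield(n * 2) }
end

# to_enum captures the method name plus the extra argument it needs.
doubles = to_enum(:each_doubled, [1, 2, 3])
doubles.next # => 2
doubles.next # => 4
```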

In creating the BackwardChunkedFileRead we took a looping process and explicitly converted that process into a stateful object that proceeds forward each time read_chunk is called. This time, we are using to_enum to implicitly turn our method into an iterable object. The outcome, though, is similar: we end up with an object which represents a process, remembers its current place in that process, and can be told to proceed forward as needed.
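The parallel can be sketched with a toy countdown (both versions below are invented for illustration): once as a hand-rolled stateful object, and once as a yielding method turned "inside-out" by to_enum.

```ruby
# Explicit stateful object: the process advances each time next_value is called.
class Countdown
  def initialize(from)
    @current = from
  end

  def next_value
    return nil if @current < 0
    value = @current
    @current -= 1
    value
  end
end

# The same process as a yielding method, made externally iterable via to_enum.
def each_countdown(from)
  from.downto(0) { |n| yield n }
end

counter = Countdown.new(2)
counter.next_value # => 2
counter.next_value # => 1

enum = to_enum(:each_countdown, 2)
enum.next # => 2
enum.next # => 1
```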

Now for the line all of this has been leading up to: we want to find the 11th-to-last newline in order to find the start of the 10th-to-last line. So we simply take our newlines Enumerator and tell it to proceed forward 10 times. Then we tell it to proceed one more time, but this time we capture the result: a newline index and the chunk in which it was found.
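One detail makes the destructuring assignment possible: when the underlying method yields two values, Enumerator#next hands them back as a two-element array. A toy demonstration (each_pair_demo is invented for illustration):

```ruby
# Hypothetical method that yields two values at a time.
def each_pair_demo
  yield(1, :a)
  yield(2, :b)
end

pairs = to_enum(:each_pair_demo)
pairs.next # => [1, :a]
first, second = pairs.next
first  # => 2
second # => :b
```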

From here on out everything is familiar to anyone who has followed this miniseries from the beginning. We print out the current chunk, from the beginning of the located line forwards. Then we print the remainder of the file using IO.copy_stream.

class BackwardChunkedFileRead
  attr_reader :file, :chunk_size, :next_chunk_offset

  def initialize(file, chunk_size=512)
    @file               = file
    @chunk_size         = chunk_size
    @next_chunk_offset  = -@chunk_size
    @chunk_start_offset = nil
  end

  def each_chunk
    while chunk = read_chunk
      yield(chunk)
    end
    ""
  end

  def read_chunk
    return nil if @chunk_start_offset == 0
    file.seek(next_chunk_offset, IO::SEEK_END)
    @chunk_start_offset = file.tell
    chunk = file.read(chunk_size)
    @next_chunk_offset -= chunk_size
    chunk
  end
end

def each_reverse_newline_index_with_chunk(chunk_source)
  chunk_source.each_chunk do |chunk|
    while(nl_index = chunk.rindex("\n", (nl_index || chunk.size) - 1))
      yield(nl_index, chunk)
      break if nl_index == 0
    end
  end
end

file     = open('/var/log/syslog.1')
read     = BackwardChunkedFileRead.new(file)
newlines = to_enum(:each_reverse_newline_index_with_chunk, read)
10.times { newlines.next }
nl_index, chunk = newlines.next
start_text      = chunk[(nl_index+1)..-1]
print(start_text)
IO.copy_stream(file, $stdout)

At this point, I feel like we've arrived at code that tells a pretty coherent story. First we open a file. Then we start a backward reading operation on that file, and a collaborating operation to search backwards for newlines. We tell the combined process to find and discard the first 10 newlines. Then we grab the location of the 11th newline and print everything after it.

There is more we could do here. There are some edge cases we could address. And we could add other tail(1) features like the ability to pass in the number of lines to dump, or to follow a file "live" as another process appends to it. I hope to tackle both of these in future episodes.

But I think this is a good place to leave this code for now. Happy hacking!

