In Progress
Unit 1, Lesson 21
In Progress

Tail Part 1: Random Access

Video transcript & code

There's a good chance you've used the UNIX tail command at some point. On the off chance that you haven't here's a very quick demo. Execute the tail command with a filename as its argument, and tail will output the last ten lines in the file. This is helpful for seeing the most recent entries in a log file.

tail /var/log/syslog

I thought it might be fun to spend a few episodes replicating the basic tail functionality in Ruby. It turns out to be a trickier problem to implement tail efficiently than it might appear at first glance, and there are several useful concepts to be learned along the way.

First, let's set some ground rules. The most naive implementation might simply read the entire file into memory and then search backwards until it finds ten lines to show. This approach would consume an amount of RAM proportional to the size of the file, making it prohibitively inefficient for large files.

Another, similar implementation might iterate through the lines of the file from beginning to end, buffering the last ten lines as it went, and then dump those lines when it hit the end. While this version would have better memory characteristics, it would consume greater and greater I/O and CPU resources the larger the input file.

We're not going to write either of those versions. Part of the utility of tail is that it is very efficient. Like the real program, our imitation will attempt to do as little work as possible to find the last ten lines of the file.

This means that we can't simply open up the file and start reading from the beginning. Instead, we will harness the power of random file access.

Let's familiarize ourselves with Ruby's methods for random file access. I've created a very small example file that we'll read from. Here is the file in its entirety.

print File.read('small.txt')
Hello, RubyTapas!

I've also prepared a small method to visualize the current state of a File IO object as we interact with it. Don't worry too much about the implementation of this method for now.

def vis(file)
  output = ""
  offset = file.tell
  file.rewind
  content = file.read.chars
  indices = content.map.with_index{|_,i| i}
  [indices, content].each do |collection|
    collection.each_with_index do |element, i|
      print "%3s" % element.to_s.sub(/\n/, '\n') 
    end
    print "\n"
  end
  print "   " * offset
  print " ^^\n"
  file.seek(offset, IO::SEEK_SET)
  print output
end

Every open file has a file offset associated with it. Here we can see the content of the open example file, with the offsets of each byte shown. The carets show the current file offset, which right now is at zero because we just opened the file.

file = open('small.txt')
  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
  H  e  l  l  o  ,     R  u  b  y  T  a  p  a  s  ! \n
 ^^

Reading from the file advances the offset. Let's read seven bytes and then look at the state of the file.

file.read(7)
  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
  H  e  l  l  o  ,     R  u  b  y  T  a  p  a  s  ! \n
                      ^^

We can ask the file what its current offset is with the #tell method.

file.tell
7

We can directly adjust the file offset ourselves with the #seek method. Lets set the offset to byte 11:

file.seek(11)
  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
  H  e  l  l  o  ,     R  u  b  y  T  a  p  a  s  ! \n
                                  ^^

By default, #seek adjusts the offset relative to the beginning of the file. We can also tell it to move relative to the current position by passing a special constant as the second argument. Let's move 4 bytes backwards:

file.seek(-4, IO::SEEK_CUR)
  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
  H  e  l  l  o  ,     R  u  b  y  T  a  p  a  s  ! \n
                      ^^

Another option is to position the offset relative to the end of the file. An argument of zero relative to the end of the file puts the offset at the very end, as if we had finished reading the file.

file.seek(0, IO::SEEK_END)
file.tell
18
file.eof?
true

Negative arguments relative to the end of the file allow us to seek backwards a given number of bytes from the end of the file.

file.seek(-11, IO::SEEK_END)
  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
  H  e  l  l  o  ,     R  u  b  y  T  a  p  a  s  ! \n
                      ^^

We now have the tools we need to get started on our tail knock-off. While we can't read a file backwards from the end, we can seek backwards from the end, a chunk at a time. So for instance, we could seek 512 bytes backwards from the end of the system log file, and then count the newlines found in that chunk.

file = open('/var/log/syslog')
file.seek(-512, IO::SEEK_END)
chunk = file.read
chunk.chars.count("\n")
5

Looks like we netted 5 newlines in that chunk. If we kept moving backward a chunk at a time, we'd eventually find our 10 lines worth of output. But we'll leave that for a future episode. Happy hacking!

Responses