
Gradual Stiffening

Video transcript & code

RFC822 defines a textual message format that forms the basis for many internet protocols, including email and HTTP. You've seen it if you've ever looked at a raw email message: it consists of a header section where keys are separated from values by a colon. Then there is a double CR/LF, and the message body commences.
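
As a rough illustration of that layout, the sketch below takes a made-up two-header message and separates it at the first blank line. The message text and variable names are assumptions chosen for the example, not part of any real file.

```ruby
# A tiny hand-written message: two headers, a blank line, then the body.
message = "name: confident-ruby-001\r\nrole: source\r\n\r\nputs \"Hello, world\"\r\n"

# The first double CRLF separates the header section from the body.
# The limit of 2 keeps any later blank lines inside the body.
header_section, body = message.split("\r\n\r\n", 2)

# Each header occupies one CRLF-terminated line.
header_lines = header_section.split("\r\n")

puts header_lines.inspect
puts body.inspect
```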

Let's say we have some RFC822-format files, and we want to convert the metadata in their headers to JSON format.

filename: confident-ruby.monolith
number: 1
name: confident-ruby-001
role: source

puts "Hello, world"

We decide to use Ruby, but since this is just a one-off conversion of a few files, we decide to use Ruby straight from the command line.

ruby -n -a -rjson \
     -e 'BEGIN { $/="\r\n"; $;=/:\s*/; headers={} }' \
     -e 'break if $F.size < 2' \
     -e 'headers[$F[0]] = $F[1].chomp' \
     -e 'END { puts JSON.pretty_generate(headers) }' \
     < data.txt

We pass Ruby the -n flag, telling it to loop over the lines of input data, and the -a flag, telling it to automatically split each line into fields in a special array. We also tell it to require the json library.
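
If the flags feel opaque, it may help to see roughly what they stand for. The following is an approximation of the loop that -n and -a set up, written out by hand. The StringIO object and its sample text are stand-ins for standard input, and the rows array exists only so the result can be inspected.

```ruby
require 'stringio'

# Stand-in for standard input; the sample text is invented for illustration.
input = StringIO.new("a b\nc d\n")

rows = []
# -n wraps the program in a loop that reads one record at a time.
while (line = input.gets)
  # -a splits each record into fields (the real flag stores them in $F).
  fields = line.split
  rows << fields
end

puts rows.inspect
```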

Then we start writing code. The first line is a BEGIN block, which will be executed just once before the implicit loop starts. In it we customize some language defaults. We set the input record separator to be a carriage return followed by a linefeed, instead of just a linefeed. We set the input field separator to a colon followed by whitespace. This will split up headers into key and value fields. Then we initialize a variable called headers to be an empty Hash.

Next, we begin writing the body of the implicit loop. This code will be executed once for each line of input. If the size of the special $F array, which contains the automatically split-up fields from the current line, is less than two, it means we have probably run out of headers to process, and so we break out of the loop.

Otherwise, we continue. We take the first field in the fields array, which contains the header name, and make it a key in the headers variable. We take the second field, which contains the header value, and assign it as the value of that key, after first removing the trailing CRLF with #chomp.
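
To see why the field count works as an end-of-headers test, here is a small sketch that applies the same split and chomp to one header line and one blank line. It passes the separators explicitly rather than setting globals, and the variable names are assumptions made for the example.

```ruby
field_sep  = /:\s*/   # a colon followed by optional whitespace
record_sep = "\r\n"

header_line = "role: source\r\n"
blank_line  = "\r\n"

header_fields = header_line.split(field_sep)  # two fields: name and raw value
blank_fields  = blank_line.split(field_sep)   # one field: there is no colon to split on

name  = header_fields[0]
value = header_fields[1].chomp(record_sep)    # strip the trailing CRLF

puts header_fields.inspect
puts blank_fields.size
puts "#{name} => #{value}"
```

The blank line that ends the header section yields a single field, which is what the size check relies on.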

Finally, we tell Ruby what to do after the loop is finished, with an END block. In the block, we convert the collected headers hash into JSON and pretty-print it to standard out.

When we pipe the contents of one of our RFC822 files into this little program, we get nicely-formatted JSON out the other end.
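
For the sample file shown earlier, the result should look like the comment at the end of this sketch. The hash is written out by hand here to stand in for what the loop collects. Note that every value is a string, since nothing in the one-liner converts "1" to a number.

```ruby
require 'json'

# Hand-built stand-in for the hash the loop collects from the sample file.
headers = {
  "filename" => "confident-ruby.monolith",
  "number"   => "1",
  "name"     => "confident-ruby-001",
  "role"     => "source"
}

json = JSON.pretty_generate(headers)
puts json
# {
#   "filename": "confident-ruby.monolith",
#   "number": "1",
#   "name": "confident-ruby-001",
#   "role": "source"
# }
```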

If you've ever worked with Perl or Awk, some of this code may look familiar. If you haven't, it probably looks like the terminal just puked. This kind of write-only code is the definition of "quick and dirty". But it gets the job done.

In fact, it gets the job done so well that we realize we'd like to re-use it in the future. So we proceed to adapt it into a Ruby script that we can save to a file.

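require 'json'
# Set the record separator, the default split pattern, and the result hash.
$/="\r\n"; $;=/:\s*/; headers={}
IO.foreach(ARGV[0]) do
  fields = $_.split        # splits on $; because no pattern is given
  break if fields.size < 2 # a line with no colon marks the end of the headers
  headers[fields[0]] = fields[1].chomp
end
puts JSON.pretty_generate(headers)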

We only have to change a few things to make this a valid script. We convert the -r command-line option to a require statement. There's no more -n to supply an implicit loop, so we have to spell it out explicitly using IO.foreach. And there's no more auto-splitting of fields either, so we have to manually split out the current line into a fields variable. Note that we use the Perl-style $_ "default input" variable, which is automatically set by foreach.

This works well, and we proceed to make use of our new script.

One day we come back to the script, and spend a couple of minutes trying to figure out where that $_ variable came from before we remember. We decide to clarify the code by using an explicit block argument from IO.foreach instead of the implicit variable.

IO.foreach('rfc822demo') do |line|
  fields = line.split
  # ...
end

Sometime later we revisit the script again, and decide that those unintelligible variable assignments at the top of the file simply have to go. So we require the 'English' library, and replace them with more readable aliases. $/ becomes $INPUT_RECORD_SEPARATOR, and $; becomes $FIELD_SEPARATOR.

require 'json'
require 'English'
$INPUT_RECORD_SEPARATOR = "\r\n"
$FIELD_SEPARATOR        = /:\s*/
headers                 = {}
#  ...

Again we move on to other things for a while. When we next return to the script, we're considering incorporating it into a larger program. That means making it a better citizen by not changing global interpreter defaults. We remove the use of $INPUT_RECORD_SEPARATOR and $FIELD_SEPARATOR. In their place, we explicitly supply the record separator and field separator to the IO.foreach, String#split, and String#chomp methods.

# ...
record_sep = "\r\n"
IO.foreach('rfc822demo', record_sep) do |line|
  fields = line.split(/:\s*/)
  break if fields.size < 2
  headers[fields[0]] = fields[1].chomp(record_sep)
end
# ...

Finally, we begin to evolve our script into a form more suitable to be part of a library. We convert the script to a method called rfc822_to_json, which takes the input data as a parameter, and returns the formatted JSON.
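
The transcript doesn't include the listing for this step, so what follows is only a guess at its general shape. The method name comes from the episode. The assumption that the input arrives as a String, the optional record_sep parameter, and the sample data are all mine.

```ruby
require 'json'

# Hypothetical sketch: only the method name comes from the episode.
# Assumes the input data is a String holding the whole message.
def rfc822_to_json(data, record_sep = "\r\n")
  headers = {}
  data.each_line(record_sep) do |line|
    fields = line.split(/:\s*/)
    break if fields.size < 2   # blank line: end of the header section
    headers[fields[0]] = fields[1].chomp(record_sep)
  end
  JSON.pretty_generate(headers)
end

# Made-up sample input for illustration.
sample = "name: confident-ruby-001\r\nrole: source\r\n\r\nputs \"Hello, world\"\r\n"
puts rfc822_to_json(sample)
```

Because the method touches no globals and returns a string instead of printing, a larger program could call it without side effects. One limitation carried over from the script is that a header value containing a colon would be cut short. Passing a limit of 2 to split would be one possible next small step.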

Let's look back at what we've done. We went from an expedient line-noise command-line script, to a reasonably clean, robust and readable method. We did this in small, incremental steps. At no point were there any big jumps in complexity or ceremony.

One of the things I like about Ruby is that, more often than not, it's possible to evolve code in this gradual fashion. I asked a few of my programming heroes if there was a name for this quality. Kent Beck replied by calling it "Gradual Stiffening", a term he took from the book The Timeless Way of Building, by Christopher Alexander.

I've begun deliberately looking for this quality of Gradual Stiffening in tools, libraries, and program designs. I look to see not only if there is a "quick 'n dirty" mode and a "built to last" mode, but if there is a clear, incremental transition from one to the other. For any given state of a system, I've begun to ask: can I see what the next tiny step towards greater generality is? Or is the only apparent next step a complete transformation?

Hopefully I've given you some food for thought in this episode. If nothing else, now you know how to write Perl in Ruby! Happy hacking!
