
Tempfile


Let's say we're writing a self-published novel and distributing it as an ebook. We've got our book chapters in Markdown format:

# Chapter 1

It was a dark and stormy night...

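And we've decided to use the excellent Pandoc tool to convert from Markdown into Epub format. We run Pandoc with an output filename and a Markdown file, and it creates an Epub. Pandoc picks the output format from the extension of the -o filename. (The -S flag turns on "smart" typographic punctuation in Pandoc 1.x; Pandoc 2.0 removed it and enables smart punctuation for Markdown input by default, so drop -S on newer versions.)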

pandoc -S -o book.epub ch1.md
ls -l *.epub
unzip -l book.epub

One thing this book is missing is some metadata. For instance, if we take a look at the book title listed in the Epub file, we can see it's blank.

unzip -p book.epub content.opf | grep dc:title

To fix this, we write a snippet of metadata XML and save it to a file called metadata.xml.

<dc:title>Fifty Shades of Ruby</dc:title>
<dc:description>Or: Monkeypatch me, baby!</dc:description>

Then we recreate the book, this time with our metadata attached.

pandoc -S --epub-metadata=metadata.xml -o book.epub ch1.md

Now we can see the title has been set inside the Epub:

unzip -p book.epub content.opf | grep dc:title

Our novel is a huge success. In fact, it's so successful that we decide to write another one. But because we are good lazy hackers, we decide to automate our workflow this time. And because we hate XML, we decide to keep our metadata in a YAML file, meta.yaml, like this:

---
:title: A Study in Ruby
:description: "Or: the Duck of the Baskervilles"
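The leading colons make Psych, Ruby's YAML parser, load the keys as Symbols, which is why the script below looks values up with metadata[:title]. A quick check, parsing the same YAML from a heredoc instead of from meta.yaml so it runs on its own. The quotes around the description matter too: an unquoted YAML value may not contain a colon followed by a space.

```ruby
require 'yaml'

# Same content as meta.yaml, inlined so the example is self-contained.
yaml = <<~EOS
  ---
  :title: A Study in Ruby
  :description: "Or: the Duck of the Baskervilles"
EOS

metadata = YAML.load(yaml)
p metadata.keys        # => [:title, :description]
puts metadata[:title]  # => A Study in Ruby
```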

We begin writing a Ruby script to automate the publishing process. It first loads the metadata into some local variables. Then it creates an ERB template for the metadata XML and expands the template with the current binding, which gives the template access to those local variables.

require 'yaml'
require 'erb'

metadata    = YAML.load_file('meta.yaml')
title       = metadata[:title]
description = metadata[:description]

template = ERB.new(<<EOF)
  <dc:title><%= title %></dc:title>
  <dc:description><%= description %></dc:description>
EOF
epub_meta = template.result(binding)
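One thing this template does not do is escape its values, so a title containing & or < would produce invalid XML. A hedged sketch of one way to guard against that, using ERB::Util.h from Ruby's standard library; the title and description here are made-up values chosen to contain XML special characters.

```ruby
require 'erb'

# Made-up metadata values containing XML special characters.
title       = 'Pride & Prejudice & Ruby'
description = 'Or: <Monkeypatch> me'

# ERB::Util.h (an alias of html_escape) turns &, <, >, " and ' into
# entities, so the values cannot break the surrounding XML.
template = ERB.new(<<~EOF)
  <dc:title><%= ERB::Util.h(title) %></dc:title>
  <dc:description><%= ERB::Util.h(description) %></dc:description>
EOF
epub_meta = template.result(binding)
puts epub_meta
# <dc:title>Pride &amp; Prejudice &amp; Ruby</dc:title>
# <dc:description>Or: &lt;Monkeypatch&gt; me</dc:description>
```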

The question now is how to get this metadata XML from our script into the command-line invocation of Pandoc. To accomplish this, we turn to the Tempfile library. We call Tempfile.open with a file basename of 'epub_metadata' and pass it a block, which receives an open Tempfile object as its parameter.

We write the metadata XML to this tempfile, then close it. Closing flushes Ruby's buffered output to disk, so Pandoc will see the complete file; it does not delete the file. We then construct the argument list for Pandoc, interpolating the tempfile's path with its #path method, and execute the complete Pandoc command.

require 'yaml'
require 'erb'
require 'tempfile'

metadata    = YAML.load_file('meta.yaml')
title       = metadata[:title]
description = metadata[:description]

template = ERB.new(<<EOF)
  <dc:title><%= title %></dc:title>
  <dc:description><%= description %></dc:description>
EOF
epub_meta = template.result(binding)

Tempfile.open('epub_metadata') do |meta|
  meta.write(epub_meta)
  meta.close
  args = %W[-S --epub-metadata=#{meta.path} -o book.epub ch1.md]
  system('pandoc', *args)
end

This is all we need to do to get our expanded metadata into the final epub file.
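The close call is doing real work: Ruby buffers writes, so until the tempfile is closed (or flushed) another reader of the path may see an empty or partial file. In this minimal sketch, File.read stands in for Pandoc opening the path. It also relies on two documented behaviours: Tempfile.open returns the value of its block, and a closed Tempfile keeps its file on disk until it is unlinked.

```ruby
require 'tempfile'

xml = Tempfile.open('epub_metadata') do |meta|
  meta.write('<dc:title>A Study in Ruby</dc:title>')
  # Closing flushes Ruby's write buffer to disk. The file itself is not
  # deleted, so meta.path still names a readable file.
  meta.close
  # Stand-in for an external program such as pandoc reading the path.
  File.read(meta.path)
end
puts xml  # => <dc:title>A Study in Ruby</dc:title>
```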

Let's insert some debug output into this process, so as to understand what Tempfile is doing a little better.

require 'yaml'
require 'erb'
require 'tempfile'

metadata    = YAML.load_file('meta.yaml')
title       = metadata[:title]
description = metadata[:description]

template = ERB.new(<<EOF)
  <dc:title><%= title %></dc:title>
  <dc:description><%= description %></dc:description>
EOF
epub_meta = template.result(binding)

Tempfile.open('epub_metadata') do |meta|
  puts "Tempfile: #{meta.path}"
  meta.write(epub_meta)
  meta.close
  args = %W[-S --epub-metadata=#{meta.path} -o book.epub ch1.md]
  system('pandoc', *args)
end
Tempfile: /tmp/epub_metadata20121114-6385-uzqj24

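When we run this, we can see that Tempfile takes the basename we gave it, appends the date, the process ID, and a random string to make the name unique, and creates the file in the system temp directory, /tmp here. The name is different on every run. When we go to the shell afterwards, we can see that the file no longer exists: Tempfile arranges for the file to be deleted when the Tempfile object is garbage-collected, or at the latest when the Ruby process exits.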
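The naming scheme and the cleanup can be observed directly. This sketch also uses two things the episode doesn't: a [prefix, suffix] array, which keeps a file extension at the end of the generated name, and Tempfile#close!, which closes and deletes the file immediately instead of leaving cleanup to garbage collection or process exit.

```ruby
require 'tempfile'
require 'tmpdir'

# The default location: usually /tmp on Linux, unless an environment
# variable such as TMPDIR points somewhere else.
puts Dir.tmpdir

# A [prefix, suffix] pair keeps an extension at the end of the name.
meta = Tempfile.new(['epub_metadata', '.xml'])
path = meta.path
# Prefix, then date, process ID and a random part, then ".xml".
puts File.basename(path)

# close! closes the file and unlinks it right away.
meta.close!
puts File.exist?(path)  # => false
```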

petronius% ls -l /tmp/epub_metadata20121022-24501-ovmq8n
ls: cannot access /tmp/epub_metadata20121022-24501-ovmq8n: No such file or directory

One day we decide to hook our script up to our web-based storefront. We no longer want a local .epub file; we just need the Epub data long enough to feed it into an HTTP PUT or POST request to the storefront API. Once again, we call on Tempfile.

We create a method called #publish_book, which takes a string containing the raw Epub data. Then we add a new Tempfile.open block to our program, assigning its Tempfile object to a variable called epub. We close this file immediately: we only need it for its unique path, since Pandoc, not Ruby, will write to it. We replace the hardcoded book.epub with the path of the epub Tempfile. Then, once the Pandoc command has completed, we re-open the file with Tempfile's #open method, which reopens it in read-write ("r+") mode, read the Epub data into a String, and pass it to the #publish_book method.

require 'yaml'
require 'erb'
require 'tempfile'

def publish_book(book_data)
  puts "Publishing #{book_data.bytesize} bytes..."
  #...
end

metadata    = YAML.load_file('meta.yaml')
title       = metadata[:title]
description = metadata[:description]

template = ERB.new(<<EOF)
  <dc:title><%= title %></dc:title>
  <dc:description><%= description %></dc:description>
EOF
epub_meta = template.result(binding)

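# The '.epub' suffix matters: pandoc infers the output format from the
# extension of the -o file name, and won't treat an extension-less name as Epub.
Tempfile.open(['epub', '.epub']) do |epub|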
  epub.close
  Tempfile.open('epub_metadata') do |meta|
    meta.write(epub_meta)
    meta.close
    args = %W[-S --epub-metadata=#{meta.path} -o #{epub.path} ch1.md]
    system('pandoc', *args)
  end
  epub.open
  epub_data = epub.read
  publish_book(epub_data)
end
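If you would rather not rely on garbage collection or process exit to remove the Epub file, Tempfile.create with a block (Ruby 2.1 and later) deletes its file as soon as the block returns. A minimal sketch of that variation; fake_converter is a hypothetical stand-in for Pandoc (a child Ruby process that writes a few bytes to the output path), so the example runs without Pandoc installed.

```ruby
require 'tempfile'
require 'rbconfig'

# Hypothetical stand-in for pandoc: a child Ruby process that writes
# some bytes to the output path it receives as its first argument.
def fake_converter(output_path)
  ok = system(RbConfig.ruby, '-e', 'File.binwrite(ARGV[0], "fake epub data")', output_path)
  raise 'converter failed' unless ok
end

epub_data = Tempfile.create(['book', '.epub']) do |epub|
  epub.close                  # the child process, not Ruby, writes the file
  fake_converter(epub.path)
  File.binread(epub.path)     # read the bytes back as binary (ASCII-8BIT)
end                           # Tempfile.create deletes the file here

puts "Read #{epub_data.bytesize} bytes"
```

The trade-off is that Tempfile.create yields a plain File rather than a Tempfile, so there is no #open method for re-opening; File.binread on the path does that job here, and reading in binary mode suits Epub data, which is a zip archive.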

In conclusion: the Tempfile library takes the pain out of working with temporary files. It takes care of locating the system temp directory, generating files with unique names, and ensuring the files aren't left lying around when the program finishes.

In addition, by making it easy to write to, read from, close, and re-open the temp files it creates, Tempfile is a great way to communicate with command-line utilities that expect files as their inputs and outputs.

OK, that's plenty for today. Happy hacking!
