Tempfile
Video transcript & code
Let's say we're writing a self-published novel and distributing it as an ebook. We've got our book chapters in Markdown format:
# Chapter 1
It was a dark and stormy night...
And we've decided to use the excellent Pandoc tool to convert from Markdown into Epub format. We can execute Pandoc with an output filename and a markdown file and see that it creates an Epub.
pandoc -S -o book.epub ch1.md
ls -l *.epub
unzip -l book.epub
One thing this book is missing is some metadata. For instance, if we take a look at the book title listed in the Epub file, we can see it's blank.
unzip -p book.epub content.opf | grep dc:title
To fix this, we write a snippet of metadata XML and save it to a file called metadata.xml.
<dc:title>Fifty Shades of Ruby</dc:title>
<dc:description>Or: Monkeypatch me, baby!</dc:description>
Then we recreate the book, this time with our metadata attached.
pandoc -S --epub-metadata=metadata.xml -o book.epub ch1.md
Now we can see the title has been set inside the Epub:
unzip -p book.epub content.opf | grep dc:title
Our novel is a huge success. In fact, it's so successful that we decide to write another one. But because we are good lazy hackers, we decide to automate our workflow this time. And because we hate XML, we decide that we want to keep our metadata in a YAML file, like this:
---
:title: A Study in Ruby
:description: "Or: the Duck of the Baskervilles"
We begin to write a Ruby script to automate the publishing process. It first loads the metadata into some local variables. Then it creates an ERB template for the metadata XML, and expands the template using the local variable binding.
require 'yaml'
require 'erb'
metadata = YAML.load_file('meta.yaml')
title = metadata[:title]
description = metadata[:description]
template = ERB.new(<<EOF)
<dc:title><%= title %></dc:title>
<dc:description><%= description %></dc:description>
EOF
epub_meta = template.result(binding)
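One thing to watch for when generating XML this way: a title containing a character such as & or < would produce malformed metadata. Here is a minimal sketch of one way to guard against that, assuming it is acceptable to escape the values with ERB::Util.html_escape from the standard library. The title and description are made up for illustration, and the YAML is inlined as a string rather than read from meta.yaml.

```ruby
require 'yaml'
require 'erb'

# Hypothetical metadata, inlined here instead of being read from meta.yaml.
yaml_text = <<~YAML
  ---
  :title: "Ruby & Prejudice"
  :description: "Or: <strong> typing"
YAML

# safe_load only accepts the :symbol keys when Symbol is explicitly permitted.
metadata    = YAML.safe_load(yaml_text, permitted_classes: [Symbol])
title       = metadata[:title]
description = metadata[:description]

# html_escape turns &, <, > and quotes into entities, keeping the XML well-formed.
template = ERB.new(<<~XML)
  <dc:title><%= ERB::Util.html_escape(title) %></dc:title>
  <dc:description><%= ERB::Util.html_escape(description) %></dc:description>
XML

epub_meta = template.result(binding)
puts epub_meta
```

Escaping inside the template keeps the YAML file readable: we can type the title as we want it to appear and let the script worry about entities.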
The question now is how to get this metadata XML from our script into the command-line invocation of Pandoc. To accomplish this, we turn to the Tempfile library. We call Tempfile.open with a file basename of 'epub_metadata'. We pass it a block. The block will receive as a parameter an open Tempfile object.
We write out the metadata XML to this tempfile, then close it, so that everything we wrote is flushed out before Pandoc reads the file. We then construct the argument list for Pandoc, interpolating in the path of the tempfile using its #path method. Then we execute the complete Pandoc command.
require 'yaml'
require 'erb'
require 'tempfile'
metadata = YAML.load_file('meta.yaml')
title = metadata[:title]
description = metadata[:description]
template = ERB.new(<<EOF)
<dc:title><%= title %></dc:title>
<dc:description><%= description %></dc:description>
EOF
epub_meta = template.result(binding)
Tempfile.open('epub_metadata') do |meta|
  meta.write(epub_meta)
  meta.close
  args = %W[-S --epub-metadata=#{meta.path} -o book.epub ch1.md]
  system('pandoc', *args)
end
This is all we need to do to get our expanded metadata into the final epub file.
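The script as written doesn't check whether the external command actually succeeded. Here is a minimal sketch of the same write, close, and hand-off pattern with an exit-status check added. To keep it runnable without Pandoc installed, it assumes a stand-in command: the current Ruby interpreter (located via RbConfig.ruby) printing the file back to us. The run_tool helper is hypothetical and not part of Tempfile.

```ruby
require 'tempfile'
require 'open3'
require 'rbconfig'

# Hypothetical helper: run a command, return its stdout, raise if it fails.
def run_tool(*cmd)
  output, status = Open3.capture2(*cmd)
  raise "#{cmd.first} failed with #{status.exitstatus}" unless status.success?
  output
end

echoed = nil
Tempfile.open('epub_metadata') do |meta|
  meta.write("<dc:title>A Study in Ruby</dc:title>\n")
  meta.close # flush buffered data so the child process sees the whole file

  # Stand-in for Pandoc: a child Ruby process that prints the file at ARGV[0].
  echoed = run_tool(RbConfig.ruby, '-e', 'print File.read(ARGV[0])', meta.path)
end

puts echoed
```

With plain system, a similar check is possible by testing its return value, which is true only when the command exits successfully.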
Let's insert some debug output into this process, so as to understand what Tempfile is doing a little better.
require 'yaml'
require 'erb'
require 'tempfile'
metadata = YAML.load_file('meta.yaml')
title = metadata[:title]
description = metadata[:description]
template = ERB.new(<<EOF)
<dc:title><%= title %></dc:title>
<dc:description><%= description %></dc:description>
EOF
epub_meta = template.result(binding)
Tempfile.open('epub_metadata') do |meta|
  puts "Tempfile: #{meta.path}"
  meta.write(epub_meta)
  meta.close
  args = %W[-S --epub-metadata=#{meta.path} -o book.epub ch1.md]
  system('pandoc', *args)
end
Tempfile: /tmp/epub_metadata20121114-6385-uzqj24
When we run this, we can see that Tempfile is taking the basename we gave it, appending some extra strings to make it unique, and creating the file in the system /tmp dir. When we go to the shell, we can see that the file no longer exists: Ruby arranges for the file to be removed when the process ends.
petronius% ls -l /tmp/epub_metadata20121022-24501-ovmq8n
ls: cannot access /tmp/epub_metadata20121022-24501-ovmq8n: No such file or directory
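The naming and cleanup behavior can also be inspected from inside Ruby. This is a small sketch, assuming we would like the file to carry an .xml extension (passing a two-element array as the basename supplies a prefix and a suffix) and that we want to delete it explicitly with #close! rather than wait for Ruby to do it.

```ruby
require 'tempfile'
require 'tmpdir'

# A [prefix, suffix] pair places the unique part between the two strings.
meta = Tempfile.new(['epub_metadata', '.xml'])
path = meta.path # remember the path; it is not available after the file is deleted

name    = File.basename(path)
existed = File.exist?(path)

meta.close! # close the file and delete it right away

still_exists = File.exist?(path)

puts "System temp dir:  #{Dir.tmpdir}"
puts "Tempfile name:    #{name}"
puts "Existed at first: #{existed}"
puts "Exists afterward: #{still_exists}"
```

Deleting the file explicitly is handy in long-running programs, where waiting for the process to end could leave files around for a long time.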
One day we decide to hook our script up to our web-based storefront. We no longer want a local .epub file; we just need the Epub data long enough to feed it into an HTTP PUT or POST request to the storefront API. Once again, we call on Tempfile.
We create a method called #publish_book, which will take a string containing the raw Epub data. Then we add a new Tempfile.open block to our program. This block assigns its Tempfile object to a variable called epub. We immediately close this file, since all we need from it for now is its path. We replace the hardcoded book.epub with the path of the epub Tempfile. Then, once the Pandoc command has completed, we re-open the file for reading using the #open method. We read the Epub data into a String, and then pass it to the #publish_book method.
require 'yaml'
require 'erb'
require 'tempfile'
def publish_book(book_data)
  puts "Publishing #{book_data.bytesize} bytes..."
  #...
end
metadata = YAML.load_file('meta.yaml')
title = metadata[:title]
description = metadata[:description]
template = ERB.new(<<EOF)
<dc:title><%= title %></dc:title>
<dc:description><%= description %></dc:description>
EOF
epub_meta = template.result(binding)
Tempfile.open('epub') do |epub|
  epub.close
  Tempfile.open('epub_metadata') do |meta|
    meta.write(epub_meta)
    meta.close
    args = %W[-S --epub-metadata=#{meta.path} -o #{epub.path} ch1.md]
    system('pandoc', *args)
  end
  epub.open
  epub_data = epub.read
  publish_book(epub_data)
end
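The round trip in this last version (reserve a path, let an external tool write to it, then read the result back) can be tried without Pandoc. This is a sketch under the assumption that a child Ruby process stands in for Pandoc and writes some made-up bytes to the path it is given. It reads the result with File.binread, a variation on the #open and #read pair that makes it explicit we are dealing with binary data.

```ruby
require 'tempfile'
require 'rbconfig'

# Stand-in for Pandoc: a child Ruby process that writes made-up bytes,
# including non-text ones, to the path it receives in ARGV[0].
writer = 'File.binwrite(ARGV[0], "PK\x03\x04fake-epub".b)'

epub_data = nil
Tempfile.open('epub') do |out|
  out.close # we only need the path; the child process does the writing

  ok = system(RbConfig.ruby, '-e', writer, out.path)
  raise 'writer failed' unless ok

  # Read the bytes back without any newline or encoding conversion.
  epub_data = File.binread(out.path)
end

puts "Read #{epub_data.bytesize} bytes"
```

On Unix-like systems the plain #read in the script behaves the same way; the binary read mainly matters on platforms that translate line endings in text mode.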
In conclusion: the Tempfile library takes the pain out of working with temporary files. It takes care of locating the system temp directory, generating files with unique names, and ensuring the files aren't left lying around when the program finishes.
In addition, by making it easy to write to, read from, close, and re-open the temp files it creates, Tempfile is a great way to communicate with command-line utilities which expect to work with files as their inputs and outputs.
OK, that's plenty for today. Happy hacking!