Rake Regex Rule

Video transcript & code

Back in episodes #129 and #131 we introduced the concept of rules in the Rake tool. Here's an example of a simple rule.

It tells Rake how to create HTML files from markdown files. Unlike a task for a concrete filename, this rule will work for any file ending in .html.

rule ".html" => ".md" do |t|
  sh "pandoc -o #{t.name} #{t.source}"
end

Let's quickly review what this means. We'll ask Rake to build us a file named hello.html.

Rake doesn't skip a beat: it immediately finds a file named hello.md, and uses the command we supplied to build it into the target file we requested.

$ rake hello.html
pandoc -o hello.html hello.md

This rule takes advantage of a shortcut that Rake provides. If our target and our source are both strings starting with a dot, Rake assumes that we are saying "given any .html target, look for a source file with the same basename, but ending in .md instead of .html."
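
To make the shortcut concrete, here is a rough sketch of the lookup it stands for, written in plain Ruby. It assumes only that the rake gem is installed, since requiring it adds the #ext helper to String. The variable names are ours, and Rake's internal matching may differ in detail.

```ruby
require "rake" # adds String#ext, which swaps a filename's extension

target = "hello.html"

# A string target such as ".html" is treated as "any name ending in .html".
matches_rule = target.end_with?(".html")

# The source is the same basename with the dependency's extension instead.
source = target.ext(".md")

puts "#{target} matches: #{matches_rule}, source: #{source}"
```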

This shortcut covers a particularly common kind of rule. But it's not the only kind of rule we might want to set up.

For instance, consider a slightly more complex scenario. We have a directory full of subdirectories, each of them named for a user. Each directory already contains a user-info.yaml file.

$ tree
.
├── crowtrobot
│   └── user-info.yaml
├── rowsdower
│   └── user-info.yaml
└── tomservo
    └── user-info.yaml

3 directories, 3 files

We want to write a Rake rule which, for a given user, will perform some kind of import action, and then create a timestamp file to mark that user import as complete.

As a simple front-end for our rule, we first create a task that accepts a username as an argument.

The role of this task is to take the username argument, and then invoke a rule for generating the timestamp which marks the task as done.

require "yaml"

task :import, :username do |t, args|
  Rake::Task["#{args[:username]}/import.timestamp"].invoke
end
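
For anyone who hasn't used task arguments before, the following is a small sketch of how the bracketed value on the command line reaches the block. The :greet task and its :name argument are hypothetical names made up for this illustration. Invoking the task from Ruby with invoke("tomservo") should behave like running rake greet[tomservo].

```ruby
require "rake"

# Rake already does this for the top-level object; it is spelled out here
# so the sketch also works when pasted somewhere other than a Rakefile.
extend Rake::DSL

greetings = []

# Hypothetical task: the array after the task name declares its arguments.
task :greet, [:name] do |t, args|
  greetings << "Hello, #{args[:name]}!"
end

# Equivalent in spirit to `rake greet[tomservo]`.
Rake::Task[:greet].invoke("tomservo")

puts greetings
```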

Let's just pause here for a second and try this task out as-is.

$ rake import[tomservo]
rake aborted!
Don't know how to build task 'tomservo/import.timestamp' (see --tasks)
/home/avdi/Dropbox/rubytapas-shared/working-episodes/424-rake-regex-rule/example2/Rakefile:2:in `block in <top (required)>'
Tasks: TOP => import
(See full trace by running task with --trace)

Rake tells us that in order to import the user tomservo, it needs to build a file named tomservo/import.timestamp.

But it doesn't know how to do this. So now let's teach it how.

Back in our Rakefile, we introduce a new rule.

A simple file-extension rule like we saw before just isn't going to cut it this time around. We need to be able to match an import.timestamp file in any of the username subdirectories. We need some kind of more advanced pattern matching this time. Like, say, a regular expression!

Well guess what, we're in luck: Rake rule targets can be specified in the form of a regular expression!

We write a regular expression with a wildcard for the subdirectory, followed by the filename import.timestamp.

We've used %r regex quoting with double quotes as our delimiter, in order to avoid leaning toothpick syndrome in our regular expression literal. You might recall that we talked about leaning toothpicks in episode #408.
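
A quick way to see what this pattern accepts is to try it against a few strings in plain Ruby. The sketch below uses the same regular expression; the sample paths other than tomservo/import.timestamp are made up for illustration. It also shows that, as written, the dot is a wildcard and the pattern is not anchored, so it is a little more permissive than it looks. The stricter variant at the end is our own suggestion, not something from the episode.

```ruby
pattern = %r"\w+/import.timestamp"

pattern.match?("tomservo/import.timestamp")  # => true
pattern.match?("import.timestamp")           # => false (no directory part)
pattern.match?("tomservo/importXtimestamp")  # => true (the unescaped dot matches any character)

# A stricter variant (an assumption, not from the episode): escape the dot
# and anchor both ends of the string.
strict = %r"\A\w+/import\.timestamp\z"

strict.match?("tomservo/import.timestamp")   # => true
strict.match?("tomservo/importXtimestamp")   # => false
```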

Now we've specified the intended output of this rule. But what about the input? Remember, each user directory has a user-info.yaml file that needs to be used as part of the import process.

How do we tell this rule: "for a given import.timestamp file to be created, look for a file in the same directory named user-info.yaml?"

rule %r"\w+/import.timestamp"

Well, if you recall, back in episode #132 we learned that Rake provides a tool for spelling out these kinds of filename transformations. It's called #pathmap, and it is added by Rake to all strings.

Let's take one of our potential target strings, tomservo/import.timestamp.

How do we transform this into the matching user info file path?

We can send it #pathmap, and provide a pattern starting with %d. This is the code for just the directory part of the original pathname.

To this, we add a slash and the user-info.yaml filename.

The result is a concrete path for our source file.

require "rake"

"tomservo/import.timestamp".pathmap("%d/user-info.yaml")
# => "tomservo/user-info.yaml"
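
For reference, here are a few other pathmap codes applied to the same string. These are standard codes from Rake's pathmap; only the choice of examples is ours.

```ruby
require "rake" # adds String#pathmap

path = "tomservo/import.timestamp"

path.pathmap("%d") # => "tomservo"          (directory part)
path.pathmap("%f") # => "import.timestamp"  (filename with extension)
path.pathmap("%n") # => "import"            (filename without extension)
path.pathmap("%x") # => ".timestamp"        (extension only)
path.pathmap("%X") # => "tomservo/import"   (everything except the extension)
```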

Now, back to our rule.

Wouldn't it be cool if we could just use this pathmap language to specify our rule's dependency? Well guess what: we can!

It really is as simple as supplying the same pathmap template we just came up with as the dependency.

Let's add some placeholder code for the actual import process.

At the end, we touch the target timestamp file in order to mark the import as complete.

rule %r"\w+/import.timestamp" => "%d/user-info.yaml" do |t|
  info = YAML.load(File.read(t.source))
  puts "Importing user #{t.source.pathmap('%d')}"
  # ...
  touch t.name
end
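
To try the rule without touching real data, something like the following self-contained script can stand in for the Rakefile. It is a sketch under a few assumptions: it builds a throwaway directory with Dir.mktmpdir, the YAML contents (a name key) are invented for the example, and it records imported names in an array instead of printing them.

```ruby
require "rake"
require "yaml"
require "tmpdir"
require "fileutils"

extend Rake::DSL # already true at the top level; explicit for portability

imported = []
stamp_exists = false

Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    # Hypothetical fixture: one user directory with an invented YAML payload.
    FileUtils.mkdir_p("tomservo")
    File.write("tomservo/user-info.yaml", { "name" => "Tom Servo" }.to_yaml)

    # Same target pattern and pathmap dependency as the rule above.
    rule %r"\w+/import.timestamp" => "%d/user-info.yaml" do |t|
      info = YAML.load(File.read(t.source))
      imported << info["name"]
      FileUtils.touch(t.name)
    end

    # Asking for the timestamp file makes Rake find and run the matching rule.
    Rake::Task["tomservo/import.timestamp"].invoke
    stamp_exists = File.exist?("tomservo/import.timestamp")
  end
end

puts "Imported: #{imported.inspect}, timestamp created: #{stamp_exists}"
```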

Let's give our new rule a test drive. Once again we invoke our frontend task with a specific username.

This time, Rake tells us that it is importing the user we specified, then touches the timestamp file to mark the task as complete.

It looks like this worked! Let's import the rest of our users.

$ rake import[tomservo]
Importing user tomservo
touch tomservo/import.timestamp
$ rake import[crowtrobot]
Importing user crowtrobot
touch crowtrobot/import.timestamp
$ rake import[rowsdower]
Importing user rowsdower
touch rowsdower/import.timestamp

If we take a look at the file tree again, we can see that each subdirectory now has its timestamp file.

$ tree
.
├── crowtrobot
│   ├── import.timestamp
│   └── user-info.yaml
├── Rakefile
├── rowsdower
│   ├── import.timestamp
│   └── user-info.yaml
└── tomservo
    ├── import.timestamp
    └── user-info.yaml

3 directories, 7 files

And if we try to run one of the imports again, nothing happens, because Rake's dependency tracking determines that the timestamp file is newer than the source file.

$ rake import[tomservo]
$
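
The check behind this behavior can be approximated in a few lines of plain Ruby. The needs_import? helper below is hypothetical and only mirrors the idea (rebuild when the target is missing or older than its source); Rake's own implementation handles more cases.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper, not part of Rake: true when the target should be rebuilt.
def needs_import?(target, source)
  !File.exist?(target) || File.mtime(source) > File.mtime(target)
end

results = []

Dir.mktmpdir do |dir|
  source = File.join(dir, "user-info.yaml")
  target = File.join(dir, "import.timestamp")
  now = Time.now

  File.write(source, "name: Tom Servo\n")
  File.utime(now - 60, now - 60, source)    # source last changed a minute ago
  results << needs_import?(target, source)  # no timestamp file yet

  FileUtils.touch(target)                   # the import ran just now
  results << needs_import?(target, source)  # timestamp is newer than the source

  File.utime(now + 60, now + 60, source)    # source edited after the import
  results << needs_import?(target, source)
end

p results # => [true, false, true]
```

By the same logic, editing or touching tomservo/user-info.yaml should cause the next rake import[tomservo] to run the import again.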

Back to our rule…

rule %r"\w+/import.timestamp" => "%d/user-info.yaml" do |t|
  info = YAML.load(File.read(t.source))
  puts "Importing user #{t.source.pathmap('%d')}"
  # ...
  touch t.name
end

Let's review what we've learned. We knew that Rake rules could map one file extension to another. What we've now seen is that this is just a special case of a much more powerful facility. Rake rules enable us to concisely state dependencies between files with all kinds of naming rules and relationships. That's a pretty powerful tool to have at our disposal. Happy hacking!
