Rake Introspection

The power of a build tool such as Rake is that you tell it where you want to go, and it figures out the least possible work needed to get there. But when a build tool doesn’t do what you expected, it can be challenging to understand why. In this episode, you’ll learn about several techniques for peering into Rake’s thought process.

Video transcript & code

Today we're going to be talking about Rake. Rake is the standard project automation tool for Ruby and Rails-based projects, so I think it's useful to know how to use it to its full potential. In this episode, I want to look at some of Rake's features for introspection.

Say we've got a Rakefile that looks like this.

There's a top-level default task which has a prerequisite called "site".

The site target depends on an index.html file.

And that file is built from a bunch of markdown in a manuscript directory...

Using a pandoc command.

task "default" => ["site"]

task "site" => "build/site/index.html"

file "build/site/index.html" => FileList["manuscript/*.md"] do
  mkpath "build/site"
  sh "pandoc --resource-path=manuscript -o build/site/index.html manuscript/*.md"
end

That's the human view of this file. But how does Rake see it? What does it do with this information when it runs?

Most of the time, you can get away with not caring. But when Rake does something surprising, it's useful to know how to read its mind.

If you've used Rake at all, you're probably familiar with the --tasks flag.

or -T for short.

$ rake --tasks
$ 

In this case, the output is blank, because none of the tasks in this rakefile are documented yet.
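As an aside, a task becomes "documented" once a desc line precedes it. Here is a minimal sketch of that idea as a standalone Ruby script rather than a Rakefile. The description text is made up for illustration, and the record_task_metadata line reflects my understanding that Rake only keeps descriptions when metadata recording is on (the rake command does this itself when listing tasks).

```ruby
require "rake"

# Assumption: in a standalone script, descriptions are only kept when
# metadata recording is switched on, so we enable it by hand here.
Rake::TaskManager.record_task_metadata = true

desc "Build the HTML site from the manuscript" # hypothetical description
task "site"

task "default" => ["site"] # no desc, so it stays hidden from plain --tasks

# Tasks with a recorded comment are the ones a plain --tasks would list.
documented = Rake::Task.tasks.select(&:comment).map(&:name)
puts documented.inspect
```

In a real Rakefile only the desc line is needed; the rest of the script is there to make the effect visible.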

Adding the --all flag shows us what we were missing.

That's -A for short.

$ rake --tasks --all
rake build/site/index.html  # 
rake course                 # 
rake default                # 
rake offline                # 
rake site                   # 

The --tasks/-T flag can optionally filter based on a regular expression as well.

For instance, we can search for tasks having to do with HTML.

$ rake -T html -A     
rake build/site/index.html  #

So that covers listing entry points. But how does Rake see the connections between these tasks?

For that, we run rake --prereqs

(rake -P for short).

$ rake --prereqs
rake build/site/index.html
    manuscript/chapter01.md
    manuscript/chapter02.md
    manuscript/chapter03.md
    manuscript/chapter04.md
    manuscript/chapter05.md
    manuscript/chapter06.md
    manuscript/chapter07.md
    manuscript/chapter08.md
    manuscript/chapter09.md
    manuscript/chapter10.md
    manuscript/chapter11.md
    manuscript/chapter12.md
    manuscript/chapter13.md
    manuscript/chapter14.md
    manuscript/chapter15.md
rake course
rake default
    site
    offline
    course
rake offline
rake site
    build/site/index.html

This command produces a "map" of our project's dependencies. It's not in the same order as the tasks are found in the Rakefile, but with a bit of inspection we can trace the dependencies from one target to the next.
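If you ever want the same map from inside Ruby, the information is available on the task objects. The following is a rough sketch, not how the --prereqs flag is implemented. The task names mirror the output above, and the two chapter files are listed by hand purely for illustration.

```ruby
require "rake"

# Task names taken from the --prereqs output; the action bodies are omitted
# because only the dependency structure matters here.
task "default" => ["site", "offline", "course"]
task "site" => "build/site/index.html"
file "build/site/index.html" => ["manuscript/chapter01.md", "manuscript/chapter02.md"]
task "offline"
task "course"

# Rake::Task.tasks returns every defined task, sorted by name.
Rake::Task.tasks.each do |t|
  puts "rake #{t.name}"
  t.prerequisites.each { |pre| puts "    #{pre}" }
end
```

Because the list is sorted by name, it comes out in the same alphabetical order as the command-line output, not in Rakefile order.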

An interesting point to note here is how the Markdown files are listed individually.

Back in the Rakefile they are specified with a shell globbing pattern in a FileList.

file "build/site/index.html" => FileList["manuscript/*.md"] do
  # ...

But in the --prereqs output, Rake shows the expanded list of files that build/site/index.html depends on.

rake build/site/index.html
    manuscript/chapter01.md
    manuscript/chapter02.md
    manuscript/chapter03.md
    ...
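To see that expansion in isolation, here is a small sketch that builds a throwaway manuscript directory and asks a FileList what it matched. The file names, including the notes.txt decoy, are invented for the demonstration.

```ruby
require "rake"
require "tmpdir"

expanded = nil

Dir.mktmpdir do |dir|
  manuscript = File.join(dir, "manuscript")
  Dir.mkdir(manuscript)

  # Three Markdown files plus one file the pattern should not match.
  %w[chapter01.md chapter02.md chapter03.md notes.txt].each do |name|
    File.write(File.join(manuscript, name), "")
  end

  # The glob is resolved against the filesystem when the list is read.
  sources = Rake::FileList[File.join(manuscript, "*.md")]
  expanded = sources.to_a.map { |path| File.basename(path) }
end

puts expanded.inspect
```

This is why the prerequisite list reflects whatever files exist in the directory at the moment Rake loads the Rakefile.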

So that's the map of dependencies that Rake is using. But how does it navigate its way through this map when we invoke rake?

To see this, let's touch one of the manuscript files to make it appear to be recently modified,

and then run rake with the --trace flag (rake -t for short).

$ touch manuscript/chapter03.md 
$ rake --trace
** Invoke default (first_time)
** Invoke site (first_time)
** Invoke build/site/index.html (first_time)
** Invoke manuscript/chapter01.md (first_time, not_needed)
** Invoke manuscript/chapter02.md (first_time, not_needed)
** Invoke manuscript/chapter03.md (first_time, not_needed)
** Invoke manuscript/chapter04.md (first_time, not_needed)
** Invoke manuscript/chapter05.md (first_time, not_needed)
** Invoke manuscript/chapter06.md (first_time, not_needed)
** Invoke manuscript/chapter07.md (first_time, not_needed)
** Invoke manuscript/chapter08.md (first_time, not_needed)
** Invoke manuscript/chapter09.md (first_time, not_needed)
** Invoke manuscript/chapter10.md (first_time, not_needed)
** Invoke manuscript/chapter11.md (first_time, not_needed)
** Invoke manuscript/chapter12.md (first_time, not_needed)
** Invoke manuscript/chapter13.md (first_time, not_needed)
** Invoke manuscript/chapter14.md (first_time, not_needed)
** Invoke manuscript/chapter15.md (first_time, not_needed)
** Execute build/site/index.html
mkdir -p build/site
pandoc --resource-path=manuscript -o build/site/index.html manuscript/*.md
** Execute site
** Invoke offline (first_time)
** Execute offline
** Invoke course (first_time)
** Execute course
** Execute default

Let's take this a piece at a time.

Rake starts off at the default target.

** Invoke default (first_time)

The first_time flag indicates that this is the first time Rake has invoked this task (that is, attempted to build this target) during this run.

Since multiple targets may depend on the same prerequisite, it is possible that Rake's dependency tracing may try to invoke the same task more than once. Rake keeps track of whether a task has been invoked already so it can "prune" repeated invocations of the same task.
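Here is a minimal sketch of that pruning. The "shared" task is a hypothetical prerequisite that two other tasks both depend on; a counter records how often each action actually runs.

```ruby
require "rake"

runs = Hash.new(0)

# "shared" is a made-up prerequisite reachable by two different paths.
task "shared" do
  runs["shared"] += 1
end

task "site" => "shared" do
  runs["site"] += 1
end

task "offline" => "shared" do
  runs["offline"] += 1
end

task "default" => ["site", "offline"]

Rake::Task["default"].invoke

puts "shared ran #{runs["shared"]} time(s)"
```

Even though both site and offline ask for shared, its action runs only once during the invocation of default.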

Rake next finds that default depends on site,

which depends on build/site/index.html.

** Invoke site (first_time)
** Invoke build/site/index.html (first_time)

The index.html in turn depends on a directory full of Markdown files.

** Invoke manuscript/chapter01.md (first_time, not_needed)
** Invoke manuscript/chapter02.md (first_time, not_needed)
** Invoke manuscript/chapter03.md (first_time, not_needed)
...

In the Rakefile these files are only listed as dependencies, not as targets. As such, they are "pure sources"; they don't need to be built from other files.

Rake flags these as not_needed because they are not targets to be built.

They are terminal nodes in the dependency tree.
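A quick way to convince yourself of this is to ask Rake about a file that no Rakefile ever mentions as a target. In this sketch the file name is made up, and the lookup relies on Rake synthesizing a file task for any path that exists on disk.

```ruby
require "rake"
require "tmpdir"

source_class = nil
source_needed = nil

Dir.mktmpdir do |dir|
  source = File.join(dir, "chapter01.md")
  File.write(source, "# Chapter 1\n")

  # No task was declared for this path. Because the file exists,
  # looking it up yields a file task with no prerequisites and no actions.
  source_task = Rake::Task[source]

  source_class = source_task.class
  source_needed = source_task.needed?
end

puts "class: #{source_class}, needed: #{source_needed ? "yes" : "no"}"
```

The file exists and has nothing it could be older than, so the task reports that no work is needed, which is what the not_needed flag in the trace reflects.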

Unfortunately, the --trace output does not flag the updated file in this list. However, having traced all the dependencies of index.html and determined that it is out of date,

Rake executes the action associated with the build/site/index.html file target.

** Execute build/site/index.html
mkdir -p build/site
pandoc --resource-path=manuscript -o build/site/index.html manuscript/*.md

Take note of the distinction between invoke and execute in this output.

** Invoke build/site/index.html (first_time)
...
** Execute build/site/index.html

When Rake invokes a task, it is determining whether it needs to build that target at all. When it executes a task, it has decided that yes, it needs to do some work, and so it is executing any actions associated with the target.
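The difference is easy to observe with a single file task and a counter. This is a sketch under a few assumptions: the file names are invented, File.utime stands in for the touch command, and reenable is used only so the same task can be invoked several times within one script.

```ruby
require "rake"
require "tmpdir"

builds = 0
build_counts = []

Dir.mktmpdir do |dir|
  source = File.join(dir, "chapter01.md")
  target = File.join(dir, "index.html")
  File.write(source, "# Chapter 1\n")

  html = file target => [source] do |t|
    File.write(t.name, "<h1>Chapter 1</h1>\n")
    builds += 1
  end

  # Target is missing: the task is invoked AND executed.
  html.invoke
  build_counts << builds

  # Source is clearly older than the target: invoked, but not executed.
  File.utime(Time.now - 60, Time.now - 60, source)
  html.reenable
  html.invoke
  build_counts << builds

  # Source is now newer than the target: invoked and executed again.
  File.utime(Time.now + 60, Time.now + 60, source)
  html.reenable
  html.invoke
  build_counts << builds
end

puts build_counts.inspect
```

All three calls are invocations, but the action only runs in the first and third, when the target is missing or out of date.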

With the prerequisite updated, Rake executes the site task, but it has no associated actions at present.

** Execute site

Similarly, the top-level default task has no actions.

** Execute default

So Rake's --trace gives you a window into what it is thinking as it is working its way through build dependencies.

But what if we just want to know how Rake will trace through the dependencies, without it actually running any shell commands or building any files?

For that circumstance, there is --dry-run mode,

or rake -n for short.

$ touch manuscript/chapter01.md 
$ rake --dry-run
** Invoke default (first_time)
** Invoke site (first_time)
** Invoke build/site/index.html (first_time)
** Invoke manuscript/chapter01.md (first_time, not_needed)
** Invoke manuscript/chapter02.md (first_time, not_needed)
** Invoke manuscript/chapter03.md (first_time, not_needed)
** Invoke manuscript/chapter04.md (first_time, not_needed)
** Invoke manuscript/chapter05.md (first_time, not_needed)
** Invoke manuscript/chapter06.md (first_time, not_needed)
** Invoke manuscript/chapter07.md (first_time, not_needed)
** Invoke manuscript/chapter08.md (first_time, not_needed)
** Invoke manuscript/chapter09.md (first_time, not_needed)
** Invoke manuscript/chapter10.md (first_time, not_needed)
** Invoke manuscript/chapter11.md (first_time, not_needed)
** Invoke manuscript/chapter12.md (first_time, not_needed)
** Invoke manuscript/chapter13.md (first_time, not_needed)
** Invoke manuscript/chapter14.md (first_time, not_needed)
** Invoke manuscript/chapter15.md (first_time, not_needed)
** Execute (dry run) build/site/index.html
** Execute (dry run) site
** Invoke offline (first_time)
** Execute (dry run) offline
** Invoke course (first_time)
** Execute (dry run) course
** Execute (dry run) default

This functions similarly to --trace, but without actually running any actions.
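The same behavior can be reproduced in a script by flipping the option that, as far as I can tell, the flag sets on the Rake application. Treat the direct use of options.dryrun as an assumption about Rake internals rather than a documented interface; the task names are reused from the Rakefile above.

```ruby
require "rake"

executed = []

task "site" do
  executed << "site"
end

task "default" => ["site"] do
  executed << "default"
end

# Assumption: this is the switch --dry-run turns on behind the scenes.
Rake.application.options.dryrun = true
Rake::Task["default"].invoke
executed_in_dry_run = executed.dup

# Switch it off again and repeat; now the actions really run.
Rake.application.options.dryrun = false
Rake::Task.tasks.each(&:reenable)
Rake::Task["default"].invoke

puts "dry run executed: #{executed_in_dry_run.length}, real run executed: #{executed.length}"
```

During the dry run Rake still walks the whole dependency chain, but every action block is skipped.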

Now if you come from a background of using GNU Make, this may not be the behavior you expected from a --dry-run flag. It certainly wasn't what I expected. I'm used to a --dry-run flag showing the shell commands a build tool would execute, without actually executing them.

I did some poking around, and found to my surprise that while Rake is written to enable this type of dry run, there is no command-line flag actually exposing the functionality. At least not as of version 12.3.1, the one I'm demonstrating here.

Fortunately Rake is flexible.

Before we proceed, let's blow away the build directory, so that we have an easy way to check whether Rake changes anything in our project.

$ rm -rf build

With that done... we can use the -E flag to unlock Rake's hidden dry run abilities.

$ rake -E 'Rake.nowrite(true)' 

This flag can also be written out as --execute-continue, but ugh, who has the time.

$ rake --execute-continue 'Rake.nowrite(true)' 

The -E flag tells Rake to evaluate some arbitrary Ruby code before continuing with normal Rake operation. The code Rake.nowrite(true) switches on a global "no write" mode for Rake commands like mkpath and sh. "No write" here means "don't make changes to the filesystem".

When we run this, we see output that makes it look like Rake is executing commands to create a build/site directory and generate index.html.

$ rake -E 'Rake.nowrite(true)' 
mkdir -p build/site
pandoc --resource-path=manuscript -o build/site/index.html manuscript/*.md

But afterwards we can verify that the directory was never actually created.

$ ls build
ls: cannot access 'build': No such file or directory

By enabling the "no write" flag, we got Rake to do a kind of "rehearsal" or "make-believe" run, without actually changing anything.
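Here is the same rehearsal as a self-contained sketch, run against a temporary directory instead of the real project. The marker file and the touch command are stand-ins for the pandoc step, chosen so the script does not depend on pandoc being installed.

```ruby
require "rake"
require "tmpdir"

site_dir_created = nil
marker_created = nil

Dir.mktmpdir do |dir|
  site_dir = File.join(dir, "build", "site")
  marker = File.join(dir, "marker.txt") # hypothetical stand-in for index.html

  task "site" do
    mkpath site_dir     # Rake helper: honors the nowrite flag
    sh "touch", marker  # Rake helper: echoed, but not run, under nowrite
  end

  Rake.nowrite(true)
  Rake::Task["site"].invoke

  site_dir_created = File.directory?(site_dir)
  marker_created = File.exist?(marker)
end

puts "directory created: #{site_dir_created}, marker created: #{marker_created}"
```

The commands are echoed as if they had run, yet neither the directory nor the marker file exists afterwards.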

A word of warning about this: it only works for Rake helpers that support the nowrite mode. If your rake tasks contain actions that don't make use of these helpers or check for the nowrite flag themselves, this mode will have no effect, and Rake will perform the actions for real.
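To make that warning concrete, here is a sketch with two hypothetical tasks. One writes a file with plain Ruby and ignores the mode entirely; the other checks the flag itself before touching the filesystem. Reading Rake::FileUtilsExt.nowrite_flag is one way to do that check, not necessarily the only one.

```ruby
require "rake"
require "tmpdir"

results = nil

Dir.mktmpdir do |dir|
  plain_file = File.join(dir, "unguarded.txt")
  guarded_file = File.join(dir, "guarded.txt")

  # Plain Ruby file I/O knows nothing about Rake's nowrite mode.
  task "unguarded" do
    File.write(plain_file, "written\n")
  end

  # This action checks the flag itself and only describes what it would do.
  task "guarded" do
    if Rake::FileUtilsExt.nowrite_flag
      puts "(nowrite) would write #{guarded_file}"
    else
      File.write(guarded_file, "written\n")
    end
  end

  Rake.nowrite(true)
  Rake::Task["unguarded"].invoke
  Rake::Task["guarded"].invoke

  results = {
    unguarded: File.exist?(plain_file),
    guarded: File.exist?(guarded_file)
  }
end

puts "unguarded wrote a file: #{results[:unguarded]}, guarded wrote a file: #{results[:guarded]}"
```

The unguarded task writes its file for real despite the rehearsal mode, which is exactly the trap described above.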

In today's episode we've seen some lesser-used Rake functionality. We've seen how to list undocumented tasks, how to reveal Rake's dependency map, how to understand Rake trace output, and how to ask Rake what it would do without actually doing it. I hope these tools help you feel more confident when using Rake to automate your projects. Happy hacking!