String Templates

In the last episode we saw how Ruby adds to C-style format strings by supporting a syntax where values from an argument hash can be interpolated by name. Today, we'll explore a practical application of this feature.

printf "my %{vehicle} is full of %{animal}", vehicle: "hovercraft", animal: "eels"
# >> my hovercraft is full of eels

Say we are in charge of some kind of distributed logging service. Every message that it logs has various bits of metadata added, like the time, the hostname, the process ID, and the severity.

require "socket"

class MyLogger
  def error(message)
    time     = Time.now
    host     = Socket.gethostname
    pid      = $$
    severity = "ERROR"
    puts "#{severity} #{time} #{host} #{pid} #{message}"
  end
end

logger = MyLogger.new
logger.error "Bogosity increasing!"
# >> ERROR 2014-03-18 11:28:24 -0400 hazel 18894 Bogosity increasing!

Now, let's say we want to make the log format configurable. One simple way to do this would be to expand the logger class to accept a formatter lambda as an attribute. Then if we wanted to configure an ultra-simplified logging format that includes just the first letter of the severity and the message, we could do it using a custom formatter.

require "socket"

class MyLogger
  attr_accessor :formatter

  def initialize
    @formatter = ->(data) {
      "#{data[:severity]} #{data[:time]} #{data[:host]} #{data[:pid]} #{data[:message]}"
    }
  end

  def error(message)
    data = {
      message:  message,
      time:     Time.now,
      host:     Socket.gethostname,
      pid:      $$,
      severity: "ERROR"
    }
    puts formatter.call(data)
  end
end

logger = MyLogger.new
logger.formatter = ->(data) { "#{data[:severity][0]} #{data[:message]}" }
logger.error "Bogosity increasing!"
# >> E Bogosity increasing!

Unfortunately, making the formatter a lambda means that we can only configure this class in code. What if we wanted to make the format configuration accessible from a form on a web page?

Another thought we might have is to make the format configurable via an ordinary Ruby string, including interpolation codes.

Inside the logger class, the string is instance-evaled in order to fill in data for the current log message.

require "socket"

class MyLogger
  attr_accessor :format

  def initialize
    @format = '#{data[:severity]} #{data[:time]} #{data[:host]} #{data[:pid]} #{data[:message]}'
  end

  def error(message)
    data = { # !> assigned but unused variable - data
      message:  message,
      time:     Time.now,
      host:     Socket.gethostname,
      pid:      $$,
      severity: "ERROR"
    }
    puts instance_eval("\"#{format}\"")
  end
end

logger = MyLogger.new
logger.format = '#{data[:severity][0]} #{data[:message]}'
logger.error "Klingons off the starboard bow!"
# >> E Klingons off the starboard bow!

In this version, we configure the logging format by passing in a single-quoted string that includes Ruby interpolation codes. This works, but it exposes a gigantic security hole if we ever make this configurable via a web page. By using instance_eval to evaluate the format string, we've opened the door to allowing arbitrary code to be executed. This is no good.
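
To make the risk concrete, here is a minimal sketch of what a format string from an untrusted source could do. The `UnsafeLogger` class and its `render` method are hypothetical stand-ins for the logger above, trimmed down so the result can be inspected, and the injected expression is deliberately harmless.

```ruby
# Hypothetical, trimmed-down stand-in for the instance_eval-based logger above.
class UnsafeLogger
  attr_accessor :format

  # Returns the rendered line instead of printing it, to keep the demo small.
  def render(message)
    data = { message: message, severity: "ERROR" }
    instance_eval("\"#{format}\"")
  end
end

logger = UnsafeLogger.new
# Whoever controls the format string decides which Ruby expressions get run.
logger.format = '#{data[:severity]} #{6 * 7} #{data[:message]}'
line = logger.render("Bogosity increasing!")
puts line
# >> ERROR 42 Bogosity increasing!
```

The arithmetic is only a placeholder. Any expression could sit in its place, including one that reads files or shells out.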

Another possibility would be to use a templating framework like ERB. But that feels like overkill for formatting log strings, and anyway, it doesn't fix the security hole: an ERB template can embed arbitrary Ruby code, just like an interpolated string.
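
As a quick illustration, here is a sketch of an ERB-based format, assuming the template text arrives from outside our code. The template and the `data` hash are made up for this example.

```ruby
require "erb"

data = { severity: "ERROR", message: "Bogosity increasing!" }

# Anything inside <%= %> is evaluated as Ruby when the template is rendered.
template = ERB.new("<%= data[:severity] %> <%= 6 * 7 %> <%= data[:message] %>")
erb_line = template.result_with_hash(data: data)
puts erb_line
# >> ERROR 42 Bogosity increasing!
```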

As an alternative to all of the solutions we've proposed so far, let's apply what we've learned about string formats. Instead of a string containing Ruby interpolation expressions, let's make the format a string that contains formatting codes. Then in the log printing code, we'll use printf with the format string. We'll pass in a hash of log message data as the value arguments for the formatting process.

Now if we want to customize the logging format, we can pass in a string containing formatting codes that refer by name to the log data that we care about. This gives our clients full control over the logging format. But by using string formats, we haven't opened up our code to injection attacks. The log format can only be used to adjust how log data is printed, not to execute arbitrary code.

require "socket"

class MyLogger
  attr_accessor :format

  def initialize
    @format = '%<severity>s %<time>s %<host>s %<pid>s %<message>s'
  end

  def error(message)
    data = {
      message:  message,
      time:     Time.now,
      host:     Socket.gethostname,
      pid:      $$,
      severity: "ERROR"
    }
    printf "#{format}\n", data  # unlike puts, printf adds no newline of its own
  end
end

logger = MyLogger.new
logger.format = "%<severity>.1s %{message}"
logger.error "Klingons off the starboard bow!"
# >> E Klingons off the starboard bow!
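
We can check the safety claim with a small sketch. It calls `Kernel#format` directly with a made-up `data` hash rather than going through the logger class. Ruby interpolation syntax inside the format string comes out as literal text, and a reference to a key that isn't in the hash raises a `KeyError` instead of reaching into anything else.

```ruby
data = { severity: "ERROR", message: "Klingons off the starboard bow!" }

# Single-quoted, so Ruby itself does not interpolate; format sees the raw text.
hostile = '%<severity>.1s #{6 * 7} %{message}'
safe_line = format(hostile, data)
puts safe_line
# >> E #{6 * 7} Klingons off the starboard bow!

# Only keys present in the hash can be referenced.
begin
  format("%{password}", data)
  unknown_key = :no_error
rescue KeyError
  unknown_key = :key_error
end
puts unknown_key
# >> key_error
```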

Note that we use the curly-brace shortcut notation for the message, since we don't need to specify any special formatting for it.
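
Here is a short side-by-side sketch of the two notations, using an illustrative `data` hash. The curly-brace form drops the value in as is, while the angle-bracket form accepts the usual flags, width, precision, and type after the name.

```ruby
data = { severity: "ERROR", pid: 18894, message: "Bogosity increasing!" }

# %{name}: plain substitution, no formatting options.
plain = format("%{severity} %{message}", data)
puts plain
# >> ERROR Bogosity increasing!

# %<name>: left-justify to 7 columns, zero-pad to 8 digits, truncate to 8 characters.
padded = format("%<severity>-7s|%<pid>08d|%<message>.8s", data)
puts padded
# >> ERROR  |00018894|Bogosity
```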

This technique is a handy one for any situation in which we want a lightweight way to customize how text is formatted, without resorting to a full-fledged templating language. We can store these custom format strings in a database or a configuration file and then apply them at runtime without allowing arbitrary code execution.
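
As a sketch of that last idea, here is one way a format string might be loaded from YAML configuration and vetted before use. The `log_format` key, the inline configuration text, and the `valid_log_format?` helper are all assumptions made for this example.

```ruby
require "yaml"

# Hypothetical configuration; in practice this might be a file or a database row.
config_text = <<~YAML
  log_format: "%<severity>.1s %<pid>d %{message}"
YAML

log_format = YAML.safe_load(config_text).fetch("log_format")

# A sample record with every key the logger supplies.
sample = { message: "sample", time: Time.at(0), host: "localhost", pid: 1, severity: "ERROR" }

# Hypothetical helper: true if the candidate renders against the sample without raising.
def valid_log_format?(candidate, sample)
  format(candidate, sample)
  true
rescue KeyError, ArgumentError, TypeError
  false
end

puts valid_log_format?(log_format, sample)       # >> true
puts valid_log_format?("%{no_such_key}", sample) # >> false

line = format(log_format, sample.merge(message: "Bogosity increasing!", pid: 18894))
puts line
# >> E 18894 Bogosity increasing!
```

Rendering against a sample record when the configuration is saved means a typo in a key name gets reported right away, rather than when the first message is logged.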

And that's enough for now. Happy hacking!