
Scanf

Video transcript & code

Recently we found out that thanks to the wonders of Kickstarter, our favorite childhood video game is being re-made in 3-D virtual reality IMAX smell-O-vision.

wumpus_countdown = "08 months 17 days 09 hours 16 minutes 03 seconds"

We're so excited that we've started writing a script to poll the announcements for the game.

In order to parse out the countdown, we have a regular expression with a bunch of capture groups matching decimal numbers.

We match the countdown against the pattern.

Then we take each captured number, convert it to an integer, and assign it to a variable.

wumpus_countdown = "08 months 17 days 09 hours 16 minutes 03 seconds"

pattern = /(\d+) months (\d+) days (\d+) hours (\d+) minutes (\d+) seconds/
matches = pattern.match(wumpus_countdown)
# => #<MatchData
#     "08 months 17 days 09 hours 16 minutes 03 seconds"
#     1:"08"
#     2:"17"
#     3:"09"
#     4:"16"
#     5:"03">
month_count  = matches[1].to_i  # => 8
day_count    = matches[2].to_i  # => 17
hour_count   = matches[3].to_i  # => 9
minute_count = matches[4].to_i  # => 16
second_count = matches[5].to_i  # => 3

There are two separate steps here: matching the decimal numbers using a regular expression, and then converting the resulting match strings to integers.
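Even when the regex version is tightened up, the two steps stay distinct. The following is a minimal sketch using only core Ruby; the names captured_strings and counts are introduced here for illustration and are not part of the episode's code.

```ruby
wumpus_countdown = "08 months 17 days 09 hours 16 minutes 03 seconds"
pattern = /(\d+) months (\d+) days (\d+) hours (\d+) minutes (\d+) seconds/

# Step 1: match the text. MatchData#captures returns the groups as strings.
captured_strings = pattern.match(wumpus_countdown).captures

# Step 2: convert each captured string to an integer.
counts = captured_strings.map(&:to_i)

month_count, day_count, hour_count, minute_count, second_count = counts
```

The code is shorter, but matching and conversion are still two separate operations.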

What if we could perform both of these steps at once?

Well, we can, using a Ruby standard library called scanf.

If you've used printf, or the Ruby string formatting operator we talked about in episodes #194 and #195, you can think of scanf as simply printf in reverse.

To see what I mean, let's start by rewriting our pattern. This time, it'll be a string, not a regex. And we'll use string-format-style specifiers to indicate the variable parts of the pattern.

We've used the %d specifier to indicate that the fields we are interested in capturing are all decimal integers.

Now let's take our countdown string and send it the scanf message along with this new pattern.

The result is an array of numbers. Not an array of strings representing numbers like we got from our regular expression version. But actual integer values.

We can now use destructuring assignment to assign all of our count variables at once.

Let's go ahead and dump all of these variables, just to show that we have indeed captured the data we wanted.

As we've seen, scanf uses a very similar format pattern to printf and the string format operator. A fun and potentially useful implication of this fact is that we can use the same pattern both to parse text and to reconstruct it.

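For instance, if we apply the string format operator to our pattern and supply all of the variables we just assigned, we get our countdown back. The only difference from the original is that the leading zeros are gone, because a plain %d does not pad its output.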

wumpus_countdown = "08 months 17 days 09 hours 16 minutes 03 seconds"

require "scanf"
pattern = "%d months %d days %d hours %d minutes %d seconds"
wumpus_countdown.scanf(pattern)
# => [8, 17, 9, 16, 3]

month_count, day_count, hour_count, minute_count, second_count =
  wumpus_countdown.scanf(pattern)

month_count                     # => 8
day_count                       # => 17
hour_count                      # => 9
minute_count                    # => 16
second_count                    # => 3

pattern % [month_count, day_count, hour_count, minute_count, second_count]
# => "8 months 17 days 9 hours 16 minutes 3 seconds"
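If the leading zeros matter, one option is a separate output-only format string. This is a sketch under the assumption that the fields should always be two digits wide; padded_pattern is a hypothetical name, and it is used here only with the string format operator, not with scanf.

```ruby
# Hypothetical output-only pattern: %02d zero-pads each field to two digits.
padded_pattern = "%02d months %02d days %02d hours %02d minutes %02d seconds"

# The integer values parsed from the countdown earlier.
counts = [8, 17, 9, 16, 3]

restored = padded_pattern % counts
```

With these inputs, restored matches the original countdown string character for character.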

scanf has slightly different behavior when called with a block. To demonstrate, let's say we have a scoresheet from a game of some kind. Each line has the name of a player, and their score for one round. Notably, the scores can be either positive or negative.

Let's set up a pattern for these scoring lines. It will consist of %s, specifying a string with no whitespace, followed by %d, specifying a decimal number.

Then we'll send the scanf message to the scoresheet, with our pattern as the argument.

But this time, we'll supply a block.

Let's capture the block arguments in a single variable for now.

Then, just to get an idea of how scanf works with a block, let's dump the argument to see what it is.

When we run this, we see that the block is executed once for each score line in the scoresheet.

You might have noticed that the layout of the scoresheet is rather messy. There are extra spaces here and there, as well as some random blank lines. But none of that messiness is reflected in the output.

The reason is that scanf normally ignores whitespace before patterns it is trying to match. In addition, when it encounters a space in the pattern, it considers that to be a match against any amount of whitespace in the input text. So this single space is treated as a match for multiple spaces, tabs, or even newlines.

require "scanf"

scores = <<EOF
kashti +2
ebba  +3
ylva +4

avdi     +1
kashti +1
ebba  -1
ylva +2
avdi   -3
kashti -2

ebba   +3
ylva -1
avdi +2
EOF

pattern = "%s %d"
scores.scanf(pattern) do |arg|
  p arg
end

# >> ["kashti", 2]
# >> ["ebba", 3]
# >> ["ylva", 4]
# >> ["avdi", 1]
# >> ["kashti", 1]
# >> ["ebba", -1]
# >> ["ylva", 2]
# >> ["avdi", -3]
# >> ["kashti", -2]
# >> ["ebba", 3]
# >> ["ylva", -1]
# >> ["avdi", 2]
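For comparison, here is a rough core-Ruby equivalent of the "%s %d" pattern, which may be handy where the scanf library is not installed (it was dropped from the standard distribution in Ruby 2.7 and lives on as a gem). This is only a sketch: line_pattern and pairs are names made up for this example, the regex is an approximation of what scanf does rather than its actual implementation, and the scoresheet is a shortened sample.

```ruby
# A shortened, equally messy sample of the scoresheet.
scores = <<EOF
kashti +2
ebba  +3

avdi     +1
ebba  -1
EOF

# \S+ plays the role of %s, \s+ the role of the single space in the
# pattern, and [+-]?\d+ approximates %d (still a string at this point).
line_pattern = /(\S+)\s+([+-]?\d+)/

# The conversion to an integer is a separate, explicit step here.
pairs = scores.scan(line_pattern).map { |name, points| [name, points.to_i] }

pairs.each { |pair| p pair }
```

As with the first regex version, the matching and the integer conversion have to be spelled out separately, which is the work scanf does for us.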

Now that we understand how scanf deals with a block, let's use this block for something more useful.

We'll introduce a tally hash, in which missing values will default to zero.

We'll give the block two arguments: a player name, and a number of points.

Inside the block, we'll add the points to the current tally for the given player name.

When we run this and examine the resulting hash, we can see that the scores were successfully added up.

This all happened in an impressively brief amount of code.

tally = Hash.new(0)
scores.scanf(pattern) do |name, points|
  tally[name] += points
end
tally
# => {"kashti"=>1, "ebba"=>5, "ylva"=>5, "avdi"=>0}

There's a lot more to the scanf library. In this episode we've only looked at the string and decimal conversion types, but scanf has a full complement of specifiers for floating-point numbers, octal and hexadecimal, and more. Check out the module documentation for the full run-down.

When we want to both match an input pattern and, at the same time, extract numeric values from that pattern, scanf can shrink down the amount of code we have to write. It's not a tool you'll use every day, but it's a good one to know exists. Happy hacking!
