Unit 1, Lesson 21

Lenient Conversions

Video transcript & code

In the previous episode we breezed by this bit of code fairly quickly. In it, we tell a string to convert itself to an integer. It dutifully does so, returning zero despite the fact that the string bears no earthly resemblance to a number. We contrasted this with the Integer() conversion function, which raises an error when given the same string.

"ham sandwich".to_i             # => 0
Integer("ham sandwich")         # => 
# ~> -:2:in `Integer': invalid value for Integer(): "ham sandwich" (ArgumentError)
# ~>    from -:2:in `<main>'

We can observe the same difference between the Float() conversion function and the #to_f conversion method.

"veggie melt".to_f              # => 0.0
Float("veggie melt")            # => 
# ~> -:2:in `Float': invalid value for Float(): "veggie melt" (ArgumentError)
# ~>    from -:2:in `<main>'

If you did a double take the first time you saw code like this in Ruby, I can't blame you. We know from what we discussed in that episode that explicit conversion methods like #to_i are conventionally lenient, but this seems like it is taking lenience to absurd levels.

Today, however, I'd like to make a case for this head-scratch-inducing behavior.

First off, #to_i on a string will parse as much of the string as looks like an integer, and ignore the rest. It will even ignore leading whitespace. The same is true for #to_f. Here's an example of each.

"   187lbs".to_i                    # => 187
"   13.5oz".to_f                    # => 13.5

This is great for messy datasets that were originally intended for humans. For instance, data scraped from a table on a web page.
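
A few more cases help pin down where the parsing stops. This is just a small sketch, not an exhaustive specification, and the sample strings are invented for illustration.

```ruby
# Sample inputs are made up for illustration; they are not from the report data.
"42 apples".to_i      # => 42      (trailing text is ignored)
"-7 degrees".to_i     # => -7      (a leading sign is honored)
"3.7 liters".to_i     # => 3       (parsing stops at the decimal point)
"apples 42".to_i      # => 0       (no leading number, so zero)
"1e3 volts".to_f      # => 1000.0  (exponent notation is understood)
"".to_f               # => 0.0     (an empty string is zero too)
```

The rule of thumb seems to be: the number has to come first. Anything after it is dropped, and anything before it (other than whitespace) means we get zero.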

These kinds of messy real-world data sets often have bigger problems than whitespace and extra junk at the end, however.

Consider this data.

REPORT = [
  ["Date",      "Product",            "Quantity", "Unit Price", "Total"],
  ["4/30/2014", "Polka-dot paint",    "1",        "5.99",        "5.99"],
  ["4/30/2015", "Spool of Shoreline", "3",        "3.50",       "10.50"],
  ["May",       nil,                  nil,        nil,          nil    ],
  ["Date",      "Product",            "Quantity", "Unit Price", "Total"],
  ["5/1/2015",  "Smoke Shifter",      "1",        "8.99",        "8.99"],
]

This looks like sales data we might see scraped from an online product sales report. Let's say we want to quickly tally up the totals in this data. We perform a reduction on the rows, converting the last column in each row to a floating point number, and adding that to a running total.

require "./data"

sum = REPORT.reduce(0){|sum, row|
  sum + Float(row.last)
}
# ~> -:4:in `Float': invalid value for Float(): "Total" (ArgumentError)
# ~>    from -:4:in `block in <main>'
# ~>    from -:3:in `each'
# ~>    from -:3:in `reduce'
# ~>    from -:3:in `<main>'

Unfortunately, this blows up on the very first record. That row turns out to be a header row, and the Float() conversion function can't convert the string "Total" into a floating point number.

So we update our code to skip the first row. But it still fails. This time, Float() is being asked to convert a nil.

It turns out this is because the data contains a row with nothing but the month name at the transition from one month to the next. All the other fields on that row are nil. And even if we somehow filtered out nils, the very next row would cause a failure because it contains a repeat of the headers.

require "./data"

sum = REPORT[1..-1].reduce(0){|sum, row| # !> assigned but unused variable - sum
  sum + Float(row.last)
}
# ~> -:4:in `Float': can't convert nil into Float (TypeError)
# ~>    from -:4:in `block in <main>'
# ~>    from -:3:in `each'
# ~>    from -:3:in `reduce'
# ~>    from -:3:in `<main>'

All we want here is a quick tally. Trying to filter out all the possible bad data is a waste of our time.

Let's change from using a strict conversion function to a lenient conversion method. Remember, this method will just return 0.0 if it can't convert the receiver to a Float.

require "./data"

sum = REPORT.reduce(0){|sum, row|
  sum + row.last.to_f
}
sum                             # => 25.480000000000004

This time, we get our total right off the bat.

As it turns out, in this context it is harmless for unusable data to be converted to zero. And it's a lot easier to do that than to try and filter out all the bad data. This data set consists of just a few lines, but in a real-world equivalent it might well consist of thousands of rows, any of which might hide idiosyncrasies.
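
To get a feel for the extra work the strict route would need, here is a rough sketch that keeps Float() but rescues its failures. The helper names strict_total and lenient_total are hypothetical, and the rows are copied inline (as ROWS) only so the sketch runs on its own.

```ruby
# Inline copy of the report rows so this sketch is self-contained.
ROWS = [
  ["Date",      "Product",            "Quantity", "Unit Price", "Total"],
  ["4/30/2014", "Polka-dot paint",    "1",        "5.99",        "5.99"],
  ["4/30/2015", "Spool of Shoreline", "3",        "3.50",       "10.50"],
  ["May",       nil,                  nil,        nil,          nil    ],
  ["Date",      "Product",            "Quantity", "Unit Price", "Total"],
  ["5/1/2015",  "Smoke Shifter",      "1",        "8.99",        "8.99"],
]

# Hypothetical helper: strict conversion, skipping any row that fails it.
def strict_total(rows)
  rows.reduce(0.0) do |sum, row|
    begin
      sum + Float(row.last)
    rescue ArgumentError, TypeError
      # Header rows raise ArgumentError; nil raises TypeError.
      sum
    end
  end
end

# Hypothetical helper: the lenient version from the episode.
def lenient_total(rows)
  rows.reduce(0.0) { |sum, row| sum + row.last.to_f }
end

strict_total(ROWS) == lenient_total(ROWS)   # => true
```

For this data the two agree, so the rescue clauses buy us nothing. They would differ on a partially numeric value such as "8.99 USD": #to_f would count it as 8.99, while Float() would reject it and the row would be skipped.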

This is an example of the "Benign Object" pattern. The value of zero in this context is a kind of "benign placeholder". It might not be strictly true to say that those rows contained zero values. But it's not hurting anything either.

So what does this example tell us about choosing between strict conversion functions, and lenient explicit conversion methods?

In Ruby, sending an explicit conversion method such as #to_f is an idiomatic way to say "please try your best to give me a Float, but don't worry about it if you can't". Using a conversion function, by contrast, is an idiomatic way to say "this value must be recognizable as a floating point number, otherwise there is something wrong and we can't continue".

It's important that we are aware of what statement we are implicitly making when we use these different types of conversion. We've just seen how useful a lenient conversion method can be in some situations. But I've often come across the use of methods like #to_f and #to_i in situations where it would have been better to either use a conversion function, or to leave off the conversion entirely and rely on higher-level code to filter incoming values.

For instance, consider this class representing a drone. We can tell it to change its altitude by some increment in feet. The increment is converted to an integer before being used.

Drone = Struct.new(:altitude) do
  def change_altitude_by(feet)
    self.altitude = altitude + feet.to_i
  end
end

d = Drone.new(2000)

d.change_altitude_by(1000)
d.altitude                      # => 3000
d.change_altitude_by("OMG watch out for that tree")
d.altitude                      # => 3000
d.change_altitude_by(nil)
d.altitude                      # => 3000

We can pass in an integer, and see the new altitude reflected. But if we pass in a string, it silently fails to make any change. Likewise if a nil somehow finds its way into our system, it silently does nothing.

Chances are this is not what we want. We'd like to know if a value that the drone doesn't know what to do with gets into the system. We don't want to be in a situation where we believe that messages are having results, when in fact they are accomplishing nothing.

In this case, a strict conversion would be a better choice. But even going without a conversion of any kind would be preferable to the lenient explicit conversion. The conversion method as it stands now is effectively hiding bugs.

When we remove the conversion, the bad data raises a coercion error rather than being silently accepted.

Drone = Struct.new(:altitude) do
  def change_altitude_by(feet)
    self.altitude = altitude + feet
  end
end

d = Drone.new(2000)

d.change_altitude_by(1000)
d.altitude                      # => 3000
d.change_altitude_by("OMG watch out for that tree")
# ~> -:3:in `+': String can't be coerced into Fixnum (TypeError)
# ~>    from -:3:in `change_altitude_by'
# ~>    from -:11:in `<main>'
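
For comparison, here is a sketch of the other option mentioned above: keeping a conversion, but making it strict. StrictDrone is a hypothetical name, chosen only so it doesn't clash with the Drone struct.

```ruby
# StrictDrone is a hypothetical name used for this sketch.
StrictDrone = Struct.new(:altitude) do
  def change_altitude_by(feet)
    # Integer() raises on bad input instead of quietly substituting zero.
    self.altitude = altitude + Integer(feet)
  end
end

d = StrictDrone.new(2000)

d.change_altitude_by(1000)
d.altitude                      # => 3000
d.change_altitude_by("500")     # a numeric string is still accepted
d.altitude                      # => 3500

begin
  d.change_altitude_by("OMG watch out for that tree")
rescue ArgumentError => e
  e.class                       # => ArgumentError
end

begin
  d.change_altitude_by(nil)
rescue TypeError => e
  e.class                       # => TypeError
end

d.altitude                      # => 3500
```

Unlike the version with no conversion at all, this one still accepts a string like "500". Whether that is a feature or a liability depends on where the values are coming from.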

To sum up: Ruby has both lenient and strict ways to turn arbitrary values into numbers. The lenient conversion methods are particularly useful when we are dealing with messy data and a few extra zeroes won't hurt anything. The strict conversion functions, on the other hand, are a great way to "fail fast" when bad data finds its way into a critical method. We should be aware of the differences between these types of conversion, and choose one, or the other—or neither!—accordingly.

Happy hacking!
