Parsing Time

Here’s a problem you might have run into at some point: you are consuming data from an external service, maybe a website or a CSV dump. The textual and numeric fields are easy enough to import. But the dates and times give you trouble, because they are in an unusual or ambiguous format.

Parsing dates and times is a perennial headache in consuming data, because there is a seemingly limitless variety of representations.

In this episode, you’ll learn about Ruby’s facilities for parsing date and time values, both automatically and, when that breaks down, with guidance from the programmer. You’ll become familiar with some under-documented standard library APIs. And you’ll get to know a tool you can turn to when Ruby’s built-in date and time parsing abilities simply aren’t sufficient.

Video transcript & code

I ran into an interesting problem the other day, and I thought I'd share.

Let's say we're parsing dates and times as found on a website. Often, we can simply require the time library and use Time.parse to automatically recognize the time format and return a Time object.

require 'time'

Time.parse('Apr 28, 2013 9:00am') 
# => 2013-04-28 09:00:00 -0400

But this isn't always the case. Sometimes we come across a time format which baffles Time.parse.

require 'time'

Time.parse('4/28/2013 9:00')    # => 
# ~> /home/avdi/.rvm/rubies/ruby-1.9.3-p327/lib/ruby/1.9.1/time.rb:202:in `local': argument out of range (ArgumentError)
# ~>    from /home/avdi/.rvm/rubies/ruby-1.9.3-p327/lib/ruby/1.9.1/time.rb:202:in `make_time'
# ~>    from /home/avdi/.rvm/rubies/ruby-1.9.3-p327/lib/ruby/1.9.1/time.rb:271:in `parse'
# ~>    from -:3:in `<main>'
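When importing a batch of strings, it can help to find out which ones Time.parse rejects before deciding on an explicit format. The following is only a sketch: parse_time_or_nil is a hypothetical helper name, not part of Ruby, and it assumes that ArgumentError is the failure we care about.

```ruby
require 'time'

# Hypothetical helper (not part of Ruby's standard library):
# returns a Time, or nil when Time.parse raises ArgumentError.
def parse_time_or_nil(string)
  Time.parse(string)
rescue ArgumentError
  nil
end

inputs = ['Apr 28, 2013 9:00am', '4/28/2013 9:00']

# Collect the strings that Time.parse could not handle on its own.
unparsed = inputs.reject { |string| parse_time_or_nil(string) }
unparsed # => ["4/28/2013 9:00"]
```

Anything left in unparsed is a candidate for the explicit-format approach.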

In cases like these we need to tell Ruby explicitly how to interpret the time string. We can do this by requiring the date library and using DateTime.strptime. DateTime.strptime normally takes two arguments: a time string to parse and a format specification. If parsing succeeds, it returns a DateTime object representing the parsed date and time.

strptime is named for the UNIX strptime(3) call. Confusingly, though, Ruby does not use the underlying system strptime(3) call to implement this method. Instead it uses its own built-in implementation. The benefit of this approach is that strptime will always work the same no matter what underlying platform Ruby is running on. But the disadvantage is that if you try to use the POSIX documentation of strptime(3) as a reference, you will quickly become frustrated because the Ruby formatting codes are different.

Even more annoyingly, Ruby's strptime has almost no documentation. Fortunately, however, its formatting codes are pretty much the same as the codes for Time#strftime, the instance method used to convert times into strings. For instance, here's a strftime format which prints US-style dates and times.

require 'date'

print_format = '%-m/%-d/%Y %k:%M'
Time.new(2013, 4, 28, 9, 0).strftime(print_format)
# => "4/28/2013  9:00"
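To make the padding behavior concrete, here is a small sketch that formats the same Time three ways. The variable names are just for illustration. The "-" flag suppresses padding, %H zero-pads the hour, and %k pads it with a blank, which is where the double space in the output above comes from.

```ruby
time = Time.new(2013, 4, 28, 9, 0)

# Default codes: month, day, and hour are zero-padded.
zero_padded = time.strftime('%m/%d/%Y %H:%M')
zero_padded # => "04/28/2013 09:00"

# %k pads the hour with a blank instead of a zero.
blank_padded = time.strftime('%-m/%-d/%Y %k:%M')
blank_padded # => "4/28/2013  9:00"

# The "-" flag removes padding altogether.
unpadded = time.strftime('%-m/%-d/%Y %-H:%M')
unpadded # => "4/28/2013 9:00"
```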

In order to convert it to a valid strptime format to parse the same style, we only need to remove the padding modifiers (the "-" flags) from the format. Then to get a Time from the resulting DateTime, we simply send it the #to_time message.

require 'date'

format = '%m/%d/%Y %k:%M'
DateTime.strptime('4/28/2013 9:00', format).to_time
# => 2013-04-28 05:00:00 -0400

There's a problem here though. DateTime.strptime, given no explicit time zone to parse, assumed the time was UTC. But in fact, I happen to know this string was implicitly representing a time in my local time zone of EDT. Because DateTime.strptime doesn't know this, the resulting Time object shows 5 o'clock instead of 9 o'clock.
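We can confirm that UTC assumption by inspecting the DateTime before converting it. The sketch below also shows what happens when the string does carry an offset. The " -0400" suffix and the extra %z code are assumptions added purely for illustration, since the original data had no zone information.

```ruby
require 'date'

format = '%m/%d/%Y %k:%M'

# No zone in the string: the parsed hour is 9, but the offset is UTC.
naive = DateTime.strptime('4/28/2013 9:00', format)
naive.hour # => 9
naive.zone # => "+00:00"

# Assumed variant of the input with an explicit offset, parsed via %z.
zoned = DateTime.strptime('4/28/2013 9:00 -0400', format + ' %z')
zoned.hour # => 9
zoned.zone # => "-04:00"
```

If the source data included an offset, %z would be enough. Ours doesn't, so we need another way to say "this is local time."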

As it turns out, DateTime.strptime has an obscure lower-level sibling method called _strptime. This awkwardly-named method does not generate a DateTime object. Instead, it returns a Hash of parsed date and time components. When we call it using our time string and format, we can see these components include the year, the month, the day of the month, the hour, and the minute.

require 'date'

format = '%m/%d/%Y %k:%M'
time_parts = DateTime._strptime('4/28/2013 9:00', format)
# => {:mon=>4, :mday=>28, :year=>2013, :hour=>9, :min=>0}
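One detail worth knowing before building on _strptime: when the string doesn't match the format at all, it returns nil rather than raising. Here is a minimal sketch of both outcomes. The mismatched input string is made up for illustration.

```ruby
require 'date'

format = '%m/%d/%Y %k:%M'

# A matching string yields a Hash of components.
matched = DateTime._strptime('4/28/2013 9:00', format)
matched[:hour] # => 9
matched[:min]  # => 0

# A string that does not fit the format yields nil, not an exception.
mismatched = DateTime._strptime('sometime next week', format)
mismatched # => nil
```

So any code that indexes into the result should check for nil first.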

Knowing this, we can convert our time string to a locally-zoned Time object in two steps. First, we use _strptime to break the time string up into parts. Then we feed the parts into Time.local, which returns a new Time object with the appropriate local time zone offset.

require 'date'

format = '%m/%d/%Y %k:%M'
time_parts = DateTime._strptime('4/28/2013 9:00', format)
time = Time.local(
  time_parts[:year],
  time_parts[:mon],
  time_parts[:mday],
  time_parts[:hour],
  time_parts[:min])
time # => 2013-04-28 09:00:00 -0400
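If we need this in more than one place, the two steps can be wrapped in a small method. This is only a sketch: parse_local_time is a hypothetical name, and it assumes the format supplies at least a year. As a cross-check, the time library also provides Time.strptime, which should produce the same local Time for a string with no zone in it.

```ruby
require 'date'
require 'time'

# Hypothetical helper (not part of Ruby): parse a string with an explicit
# format and interpret the result in the local time zone.
def parse_local_time(string, format)
  parts = DateTime._strptime(string, format)
  # _strptime returns nil when the string does not match the format.
  unless parts
    raise ArgumentError, "cannot parse #{string.inspect} with #{format.inspect}"
  end

  # Components missing from the Hash come back as nil, which Time.local
  # treats as "use the default" for everything after the year.
  Time.local(*parts.values_at(:year, :mon, :mday, :hour, :min))
end

format = '%m/%d/%Y %k:%M'
time = parse_local_time('4/28/2013 9:00', format)
time.hour # => 9

# Cross-check against Time.strptime from the time library.
same_as_strptime = (time == Time.strptime('4/28/2013 9:00', format))
same_as_strptime # => true
```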

Is this process ideal? No, I'd say not. In my humble opinion Ruby's date and time libraries are badly in need of a refresh. But for now, this works.

The Ruby community has also stepped up to fill in the gaps in Ruby's date and time libraries. One useful gem is called chronic. It uses heuristics to parse a vast number of different date and time formats, so that in most cases we can simply call Chronic.parse with a time or date string and it will figure out what it means and give us a Time object back.

require 'chronic'

time = Chronic.parse('4/28/2013 9:00')
time # => 2013-04-28 09:00:00 -0400

That's all for today. Happy hacking!
