Oga

Video transcript & code

If you have used Ruby to work with HTML or XML documents in the last few years, you have likely used Nokogiri. Nokogiri is the de facto standard Ruby binding to the libxml2 library. It's standards-compliant, featureful, and blazingly fast.

It is also bug-prone, has issues with multi-threaded code, and is a hassle to build on some platforms.

Just like the last episode, on the http gem, today's episode introduces an alternative Ruby library for a common task.

Today, that library is Oga.

require "oga"

And just like in that episode, I'm not going to be giving you a complete tutorial. Instead, I'll just be showing what makes this gem notable.

Let's say we want to pull the description of this episode out of the script source.

First, we tell Oga to parse the script.

This gives us a document object.

Then we query the document for the first meta tag with the name attribute of description, using CSS notation.

Finally, we take the value of the "content" attribute.

require "oga"

doc         = Oga.parse_html(File.read("429-oga.html"))
doc.class                       # => Oga::XML::Document

desc_elt    = doc.at_css("meta[name='description']")
# => Element(name: "meta" attributes: [Attribute(name: "name" value: "description"), Attribute(name: "content" value: "Today we learn about a lightweight alternative to Nokogiri for XML parsing.")])

script_desc = desc_elt.get("content")
# => "Today we learn about a lightweight alternative to Nokogiri for XML parsing."

This yields our episode description.

Now, if you're a veteran of using Nokogiri, you're probably thinking that this isn't very remarkable at all. In fact, it's nearly identical to using Nokogiri for the same task.

It turns out that the advantages of Oga aren't very amenable to being shown in a screencast, because Oga isn't notable for offering a new XML API, exceptional speed, or support for novel features. What Oga brings to the table is that it has no native library dependencies.

Unlike Nokogiri, Oga is implemented almost entirely in Ruby. It does have a tiny natively-compiled extension for speed. But that extension is built using the Ragel state-machine language, which means that it can be compiled to either C or Java, for compatibility with JRuby. There are no external libxml bindings to contend with.

The bottom line is that Oga is boring, in the very best of ways. It gives us a new way to work with HTML and XML documents which is largely compatible with the way we already do it in Nokogiri. But it is lighter on dependencies, and is less prone to multithreading issues and other types of bugs that tend to crop up at the border of Ruby and native binary libraries. And that makes it a library worth keeping in our toolbox.

Happy hacking!
