In Progress
Unit 1, Lesson 21
In Progress


Video transcript & code

The music-streaming service I've used for years is shutting down, so I've been pulling some of my data off of it. Part of this data included a "play later" list that I'd built up. I couldn't find an easy way to export the list in a simple format, so I just used a browser inspector to capture the raw HTML of the list.

Now I need to dig into this data dump and pull out a structured list of albums. Let's start our data safari with an exploratory script. We'll require the nokogiri library in order to parse the HTML.

Now, before we go any further, let's take a moment to set up our environment. We'll fire up a terminal, and use the rerun utility to run our script repeatedly, every time it changes.

We learned about rerun in episode #320.

Over in our script, we'll require the awesome_print gem, which we met in the last episode. Then we'll read our HTML data into a Nokogiri document, and dump the document to standard out using awesome_print.

When we save the file, we see the HTML data stream out in our terminal.

Being able to see feedback when we save the file is a pretty decent feedback loop. But let's go one better. I've written a tiny bit of customization for my editor. It defines an optional mode in which the file will be automatically saved every time I make a change, so long as the change doesn't render the code invalid.

Let's enable this mode for this file.

Now let's start digging into the data. We can see in our output that the HTML elements include some divs with a class of "albumtitle".

Let's narrow our results down to just these elements.

That definitely looks promising. Except for one thing: there's no information about the album artists in this output. It looks like we may have narrowed the data too far.

Let's map over the result elements and jump up to their parent elements.

Hmmm, that still doesn't look like it has the information we need. Let's try going up one more level.

OK, now we can see that the results include elements with artist_title classes.

Let's now map one more time, in order to extract some cleaner result data. Inside this map, we'll create hashes.

The first hash key will be album. For the value, we'll start at the current element.

Then we'll search back down to the album link element.

All we want is the text of the element.

Now we move on to the artist key. Again, we start with the current element.

We descend down to the artist_title link.

And once again, we strip away everything but the text content.

At this point, we have some nicely cleaned-up data that's ready for export. We could easily throw this into a CSV or JSON file, or into a database.

Did you notice how at every step of the way, we were able to see the results of our code more or less immediately? When we were kids, this was how pretty much everything we played with worked. Whether we were drawing with watercolors, or building towers out of blocks, we made a change and instantly saw the outcome. This feedback helped us understand effects of our changes, and gave us ideas for future changes.

As we grew older, we learned to work at further and further removes from the concrete outcomes of our efforts. And in fact, a lot of the tools and methodologies in software seem to deliberately push us away from this kind of direct interaction. We learn to think in terms of abstractions, occasionally running some checks to see if the abstractions achieve the intended results. Even when we're doing test-driven development, a practice that's intended to give us a tight feedback loop, we have to compare one value to another and see it fail before we get a chance to see the actual results we're generating.

Compare that to how we explored our playlist HTML. Between our editor customization and our use of rerun, we were able to see the effect of every experiment, instantly. This quality of immediacy is a powerful multiplier. Our human brains are optimized for reasoning through problems with the help of malleable artifacts. Whether the artifact is a stick we use to scribble in the dust, an abacus, or a whiteboard and a marker, we think more effectively when we can express our thoughts by directly manipulating objects, and seeing the results of those manipulations reflected immediately back to us.

As we work with software, it's worthwhile to look for ways to bring this sense of immediacy back to our practice. Often, with just a little creative application of existing tools, we can greatly improve our level of direct feedback. Working this way can be extraordinarily satisfying. It can encourage experimentation. And we avoid going down blind alleys, because we never go very long without seeing the concrete results of our actions.

And that's all for today. Happy hacking!