Differentiation

Let's get something out of the way right up front: there won't be any code in today's episode. Today is all about design and object modeling.

Imagine it's the year 2115, and you're a member of the first generation of hackers born and raised on Mars. Recently the Mars colony has set out to build the first arboretum on the Red Planet. A botanist from Earth has arrived, along with a shipment of trees and other plants. You've been assigned to help with the project by writing whatever kinds of management software are needed. But your involvement is complicated by the fact that the only trees you've ever seen were on videos from Earth.

However, you're eager to learn everything you can about this new problem domain. So on your first day on the job, you pick up some recently pruned leaves and ask the botanist, Dr. Larry, "what kind of tree are these from?"

Pin Oak leaves

Larry is busy taking a soil sample, and he distractedly says "those are Oak leaves". You're excited, because now you have your first term for an object in this new domain! An "Oak Tree".

You decide not to bother Larry any more, and instead take the leaves back to the lab. There, you do a little research, and catalog some of the characteristics of this "oak tree" leaf. Using the terminology you find in a botanical text, you determine that this kind of leaf:

  • Is broad and flat.
  • Is simple, which means there is just one leaf per stem instead of bunches of leaves.
  • Has pointed lobes.

You're off to a great start! Now you go off in search of more leaves. Soon, you find this specimen. You note that it has many of the same characteristics as the first set of leaves you found. So you take it to the botanist, and say: "Dr. Larry, these are more Oak leaves, right?" But Larry laughs, and says "no, no, those are Maple leaves."

Maple leaves

You're confused, and you say "I don't get it, what's different about them? Both are broad, flat, simple and they both have pointed lobes!"

He patiently replies: "see here: the stems are alternate on the Oak twig, but they are opposite on the Maple twig. And the edges are toothed on the Maple leaf, whereas on the Oak they are not toothed, but they are sharply spiked at the tips."

As Larry explains, you start to see just how different the two kinds of leaves really are. You feel like you're starting to get the hang of this tree business! To show off your newly acquired plant-recognition skills, you reach down and pick up another recently trimmed twig. You show it to Larry, and say: "I think I've just found a third kind of tree! This one is very different from either the Oak or the Maple!"

But Larry shakes his head, and says "that's an Oak leaf".

"But it looks nothing like the other Oak specimen!" you protest. Larry explains: "what you've got there is a cutting from a Chestnut Oak".

Chestnut Oak leaves

With your mind whirling, you return to the lab once more to try to make sense of what you've learned. You sketch out a rough class diagram, with Maple by itself, and "Chestnut Oak" inheriting from "Oak". The definitions for Chestnut Oak override the values of several of the "Oak" definitions. Instead of spiky lobes, it has none. Instead of smooth edges, it has blunt teeth.
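The episode itself stays at the whiteboard, but as a rough illustration, that first-draft diagram might look something like this in Ruby. The method names (`lobes`, `edges`) and the symbol values are assumptions made up for this sketch; only the class relationships come from the story.

```ruby
# First-draft model: Maple stands by itself, and ChestnutOak
# inherits from Oak, overriding the definitions that differ.
# Method names and symbol values are illustrative assumptions.
class Maple
  def lobes
    :pointed
  end

  def edges
    :toothed
  end
end

class Oak
  def lobes
    :pointed # spiky at the tips
  end

  def edges
    :smooth
  end
end

class ChestnutOak < Oak
  def lobes
    :none
  end

  def edges
    :blunt_toothed
  end
end
```

Notice that ChestnutOak overrides everything this sketch puts in Oak, which is already a hint that the inheritance relationship may not be pulling its weight.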

You take this diagram and show it to Larry, to see if it's right. He laughs again, and says "No, no. The first cutting you picked up was from a Pin Oak. It is a separate variety of Oak from a Chestnut Oak. The Chestnut Oak is not a sub-variety of the Pin Oak!"

With this new information, you think about how to re-draw your diagram. You think about extracting out the parts that are the same between Pin Oak and Chestnut Oak, and putting those in an "Oak" superclass. But you've learned something from all of your incorrect guesses: you've learned that you really have no idea which properties are consistent within a given tree genus, and which attributes might differ. So to play it safe, you make "Oak" an empty superclass, and put all the particulars in the individual subtypes of Oak.

Then you realize that you're working in Ruby, not Java, and the empty superclass serves no purpose. So you erase it.
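Purely as an illustration of why the empty superclass can go, here is a minimal sketch. Ruby cares about what messages an object responds to, not what it inherits from, so two unrelated classes can be used interchangeably. The `describe_leaf` helper and the method names are hypothetical, invented for this example.

```ruby
# No shared superclass: each specific kind of tree stands alone.
# Method names and symbol values are illustrative assumptions.
class PinOak
  def lobes
    :pointed
  end

  def edges
    :smooth
  end
end

class ChestnutOak
  def lobes
    :none
  end

  def edges
    :blunt_toothed
  end
end

# Hypothetical helper: accepts any object that responds to
# #lobes and #edges, regardless of its ancestry.
def describe_leaf(tree)
  "#{tree.class.name}: lobes=#{tree.lobes}, edges=#{tree.edges}"
end

puts describe_leaf(PinOak.new)      # prints "PinOak: lobes=pointed, edges=smooth"
puts describe_leaf(ChestnutOak.new) # prints "ChestnutOak: lobes=none, edges=blunt_toothed"
```

In a statically typed language like Java, an "Oak" supertype might be needed just to pass both kinds of object to the same method. Here nothing requires it.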

It's been an enlightening first day on the job, that's for sure. You've learned a bunch of new tree-identification terms. But more importantly, you've learned some valuable lessons about modeling a problem domain.

Because we spend so much time in object-oriented programming thinking about generalization and abstraction, it's often tempting when approaching a new domain to start with the most general cases and then refine down to the specifics. But the trouble with this approach is that life rarely presents us with generalities first. Whether we are looking at tree leaves, hospital admittance forms, or types of bank trade, what we normally have to work from are specific examples, not abstract definitions.

Complicating this even further is the fact that very often, we are given specific examples under general terms. For instance, Dr. Larry first identified the leaves simply as "Oak", instead of clarifying that they were an example of a particular kind of oak.

So what lessons can we take from this?

First, when diving into a new problem domain, we need to embrace specific, concrete examples of domain artifacts. After all, that's all we're likely to find. We should collect as many as we can get our hands on.

Then, we should find a domain expert, and show them the examples. We should then ask them a very important question: "Is this the same as that?"

The best answer we can get is "no". When a domain expert tells us that one example is different from another, we get to ask the most important question of all: "why not?" Or, phrased a different way, "what makes this different from that?"

It is this key question that enables us to begin to discover the distinctions between domain concepts. It enables us to understand which differences are coincidental and unimportant, and which differences are fundamental. And in this process of differentiation, we often discover the all-important names for discrete concepts—names like Pin Oak, and Chestnut Oak.

It's essential to realize that in the process of differentiation, even if all we have are general terms, we aren't actually breaking a general case into specifics. Rather, we are breaking one specific case into two specific cases. Differentiation does not imply hierarchy.

Which brings us to our second lesson: we shouldn't cling too tightly to the first names we discover for concepts. Very often, those initial names turn out to be over-broad for the specific examples we attach them to. We should seek diligently to narrow the names for concepts we identify. For instance, the term "Oak" is incomplete if it is applied to a class that encodes details that are specific to Chestnut Oaks.

I'm always a little suspicious of short, over-broad class names used to instantiate concrete objects in applications. For example, if a BankWithdrawal includes a field for an ATM Location, it's probably more accurately termed an ATMBankWithdrawal. If a User object isn't allowed to access the admin section of a site, it's not just a User; it's an UnprivilegedUser, or maybe just OrdinaryUser.
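As a small sketch of what that narrowing might look like, assume a withdrawal with just two fields and a user with a single permission check. The `amount` field, the `admin_access?` method, and the sample values are all hypothetical, chosen only to make the example concrete.

```ruby
# The atm_location field shows this is not a withdrawal in general,
# so the name says which specific kind of withdrawal it is.
# (Field names are illustrative assumptions.)
ATMBankWithdrawal = Struct.new(:amount, :atm_location)

# A user who cannot reach the admin section, named for what it is.
class OrdinaryUser
  def admin_access?
    false
  end
end

# Placeholder values, made up for this example.
withdrawal = ATMBankWithdrawal.new(40, "Main Street ATM")
puts withdrawal.atm_location
puts OrdinaryUser.new.admin_access?
```

The narrower names leave room for a different kind of withdrawal or user to show up later without having to redefine what the existing classes mean.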

The third and final lesson we can take from today's example is that the class hierarchies which appear to emerge from this process of differentiation at the very beginning often turn out to be wrong, or at best incomplete. Encoding these flawed assumptions about the structure of the domain into our code can set us up for a lot of confusion and re-work down the road.

As a result, it's better to avoid hierarchical structures at first, even if that means a little duplication. We should let each type of unique, highly specific object we identify stand alone at first. Eventually, as our understanding of the domain stabilizes, and as we discover more areas of commonality, we can start to extract out more general cases.
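Here is one way that later extraction might go, as a sketch under assumptions. Early in the story, the Pin Oak and Maple leaves were both found to be broad, flat, and simple. Once we trust that observation, we could pull it into a shared module. The module name `BroadSimpleLeaf` and all method names are hypothetical.

```ruby
# Commonality extracted only after it has been observed in
# specific examples. Names here are illustrative assumptions.
module BroadSimpleLeaf
  def leaf_shape
    :broad_and_flat
  end

  def leaf_structure
    :simple # one leaf per stem
  end
end

class PinOak
  include BroadSimpleLeaf

  def edges
    :smooth
  end

  def stem_arrangement
    :alternate
  end
end

class Maple
  include BroadSimpleLeaf

  def edges
    :toothed
  end

  def stem_arrangement
    :opposite
  end
end

puts PinOak.new.leaf_shape    # prints "broad_and_flat"
puts Maple.new.leaf_structure # prints "simple"
```

The shared traits here cut across what we might have guessed the hierarchy to be: a Pin Oak and a Maple share them, even though they are different kinds of tree. A module lets us capture that without committing to a class tree.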

Hopefully, these guidelines will help you to build a viable and robust model of the next problem domain you encounter. Happy hacking!
