In Progress
Unit 1, Lesson 1
In Progress

Hash Equality

Video transcript & code

Let's recap some recent episodes. We've built a class, called `Feet`, that represents a measurement in feet. We have made it immutable, like Ruby's core numeric types.

``````class Feet

def initialize(magnitude)
@magnitude = magnitude.to_f
freeze
end

def to_s
"#{@magnitude} feet"
end

def +(other)
Feet.new(@magnitude + other.magnitude)
end

def ==(other)
other.is_a?(Feet) && magnitude == other.magnitude
end
end
``````

We have also ensured that it determines equivalence in terms of state, rather than in terms of object identity. Let's just review that point briefly.

Given an instance which represents a quantity of 1, another which represents a quantity of 2, and another, separate object that also represents a quantity of 1, we can observe the following:

• An `Feet` object is identity-equal only to itself. Even objects with the same magnitude have different identities and so are not identity-equal.
• A `Feet` object is equivalent to itself. It is not equivalent to objects with another value, nor is it equivalent to objects of an unrelated class. However, it is equivalent to other `Feet` objects with the same magnitude.
• Case-equality for `Feet` objects obeys the same rules as value-equality.
• As a side effect of the equivalence behavior, it is possible to check for the presence of a given quantity of feet in an array by value, rather than by identity.
``````require "./feet"

one = Feet.new(1)
two = Feet.new(2)
other_one = Feet.new(1)

one.equal?(one)                 # => true
one.equal?(two)                 # => false
one.equal?(other_one)           # => false

one == one                      # => true
one == two                      # => false
one == other_one                # => true

one === one                      # => true
one === two                      # => false
one === other_one                # => true

[Feet.new(1)].include?(Feet.new(1)) # => true
``````

Now let's take a look at how `Feet` objects behave when used as keys in a Hash. We'll construct a simple hash with one value keyed on our one-foot object, and another keyed on the two-feet object.

Given what we know so far, we might expect that we could select the first hash value by using another 1-foot value as key. But surprisingly, we get `nil`. When we explicitly ask the hash if it has a one-foot quantity as a key, it says no.

This peculiar behavior in the context of a hash has consequences that extend beyond hashes. If we require the `set` library and instantiate a `Set` containing objects `one` and `two`, then ask the set if it contains a 1-foot quantity, it claims it does not.

``````require "./feet"

one = Feet.new(1)
two = Feet.new(2)

h = { one => "one", two => "two" }
h[Feet.new(1)]                  # => nil
h.key?(Feet.new(1))             # => false

require "set"
s = Set.new([one, two])
s                               # => #<Set: {#<Feet:0x00000002435da8 @magnitude=1.0>, #<Feet:0x00000002435d80 @magnitude=2.0>}>
s.include?(Feet.new(1))         # => false
``````

In Episode 203, we explored how Ruby Hashes use an object's hash value to speed up retrieval. This gives us a clue about where to start looking for the source of this anomalous behavior. We surmise that we should examine the hash values of our `Feet` objects in order to see why hashes aren't finding them.

``````require "./feet"

one = Feet.new(1).hash          # => -3294382760781416056
two = Feet.new(2).hash          # => 24453523860432852
other_one = Feet.new(1).hash    # => 1047400875287194416
``````

When we do this, we see that the hash value is different for two `Feet` objects with the same magnitude. This is because just like with equivalence, Ruby bases hash values on an object's identity, rather than on its state.

Fortunately this is easy to fix. We can define our own `#hash` method. This method constructs an `Array`. The first element of the array is the magnitude. The second element is the object's class. We then return the hash value of this array.

Here we are taking advantage of a useful feature of Ruby's arrays: they construct their hash values by using an algorithm to combine together the hash values of each element. By using the hash value of the magnitude, we ensure that hash values of two `Feet` objects with the same magnitude will always be the same within a given run of Ruby. By mixing in the hash value of the class as well, we also ensure that the hash value is likely to be different from the hash values of other objects, such as raw numbers that happen to have the same value as the magnitude.

Because of the existence of this array-hashing shortcut, it is very rare to see ruby code that manually generates a hash value from scratch. It is usually sufficient to put all of the attributes which make an object unique into an array, and then take the hash value of the array.

``````class Feet

def initialize(magnitude)
@magnitude = magnitude.to_f
freeze
end

def to_s
"#{@magnitude} feet"
end

def +(other)
Feet.new(@magnitude + other.magnitude)
end

def ==(other)
other.is_a?(Feet) && magnitude == other.magnitude
end

def hash
[magnitude, Feet].hash
end
end
``````

When we experiment with this new implementation of `#hash`, we can see that now two different `Feet` objects with the same magnitude have the same hash value, whereas an object with a different magnitude has a different hash value.

``````require "./feet2"

Feet.new(1).hash                # => 115982590576536047
Feet.new(1).hash                # => 115982590576536047
Feet.new(2).hash                # => 2182538387798779408
``````

You might think that we are done now. Unfortunately, things aren't quite this simple. Because once Ruby finds the right bucket to look in based on hash value, it still needs to compare the keys it finds there for equality with the search key. And here's the big surprise: Ruby doesn't use the double-equals equivalence operator to do this. It uses a different method which is specifically intended for hash equality, spelled `eql?`.

Let's add `#eql?` to our listing of equality comparisons. We can see that unlike equivalence, the hash-equality operator sees two different `Feet` objects with the same magnitude as being unequal.

``````require "./feet2"

one = Feet.new(1)
two = Feet.new(2)
other_one = Feet.new(1)

one.equal?(one)                 # => true
one.equal?(two)                 # => false
one.equal?(other_one)           # => false

one == one                      # => true
one == two                      # => false
one == other_one                # => true

one === one                      # => true
one === two                      # => false
one === other_one                # => true

one.eql?(one)                   # => true
one.eql?(two)                   # => false
one.eql?(other_one)             # => false
``````

Why is this? You can probably guess the answer by now. By default, just like the default equivalence operator and `#hash` method, Ruby objects use object identity alone to determine hash-equality.

This is not what we want. We want hash-equality to be determined the exact same way as equivalence. The conventional way to do this is to simply alias the equality and hash-equivalence methods.

``````class Feet

def initialize(magnitude)
@magnitude = magnitude.to_f
freeze
end

def to_s
"#{@magnitude} feet"
end

def +(other)
Feet.new(@magnitude + other.magnitude)
end

def ==(other)
other.is_a?(Feet) && magnitude == other.magnitude
end

def hash
[magnitude, Feet].hash
end

alias_method :eql?, :==
end
``````

After making this change, we recalculate our table of comparisons.

``````require "./feet3"

one = Feet.new(1)
two = Feet.new(2)
other_one = Feet.new(1)

one.equal?(one)                 # => true
one.equal?(two)                 # => false
one.equal?(other_one)           # => false

one == one                      # => true
one == two                      # => false
one == other_one                # => true

one === one                      # => true
one === two                      # => false
one === other_one                # => true

one.eql?(one)                   # => true
one.eql?(two)                   # => false
one.eql?(other_one)             # => true
``````

This time, we get the answer we want from the hash-equality operator.

And now, finally, we are able to use `Feet` quantities as keys in Hashes and as entries in sets, and get the behavior that we expect when looking up entries by value.

``````require "./feet3"

one = Feet.new(1)
two = Feet.new(2)

h = { one => "one", two => "two" }
h[Feet.new(1)]                  # => "one"
h.key?(Feet.new(1))             # => true

require "set"
s = Set.new([one, two])
s                               # => #<Set: {#<Feet:0x00000001f2d900 @magnitude=1.0>, #<Feet:0x00000001f2d8d8 @magnitude=2.0>}>
s.include?(Feet.new(1))         # => true
``````

At this point, you are probably wondering, "why on earth would Ruby use a separate equality method just for hash keys?" Here's the explanation.

For the most part, Ruby keeps different types quite distinct from each other. There is very little automatic conversion of types the way there is in many other languages.

But Ruby occasionally fudges on this principle. Here's one example: integers and floating-point numbers are two different types of object in Ruby. However, we can compare the integer 0 and the floating point 0.0 and Ruby will report that they are equivalent.

``````0.class                         # => Fixnum
0.0.class                       # => Float

0 == 0.0                        # => true
``````

This breakdown of Ruby's normally strict type distinctions presumably exists because for most programmers it's probably less surprising than the alternative.

This may be fine for explicit comparisons. But as objects of different classes, `0` and `0.0` still have different hash values. So it would be inconsistent for them to be regarded as the same value when used as hash keys.

``````0.hash                          # => 1379976726146794434
0.0.hash                        # => 3968175065238346606
``````

We can put two different entries in a hash, one under key `0` and the other under key `0.0`, and they will be held as distinct by the hash.

``````{ 0 => "int", 0.0 => "float" }
# => {0=>"int", 0.0=>"float"}
``````

This is a case where equivalency and hash equality mean two different things. And thus, Ruby has two different methods for these occasionally differing types of equality.

All of these types of equality may seem bewildering. Fortunately, most of the time we can follow some simple rules of thumb.

First of all, for many objects, like the `User` example we saw earlier, identity is important. In these cases, Ruby's default equality semantics are usually fine.

On the other hand, occasionally we come across objects where we don't care about their identity. All that matters, as far as equivalence is concerned, is their state, otherwise known as their value. There is a pattern name for objects like this: Value Objects.

`Feet` is an example of a Value Object class. In order to make value objects behave in predictable ways, we need to follow a few simple rules.

First, we need to define an equivalency operator that uses the object's state, rather than its identity. This operator should first check the class of the other object before checking its value.

Second, we need to define a custom `#hash` method which uses the object's state to generate a `hash` value.

Third, we need to alias the hash equality operator to the equivalency operator.

So long as the objects are immutable, something we covered in a previous episode, following these three rules should be sufficient to give us value objects that behave in unsurprising ways.

Happy hacking!