In Progress
Unit 1, Lesson 21
In Progress

Hash Equality

Video transcript & code

Let's recap some recent episodes. We've built a class, called Feet, that represents a measurement in feet. We have made it immutable, like Ruby's core numeric types.

class Feet
  attr_reader :magnitude

  def initialize(magnitude)
    @magnitude = magnitude.to_f
    freeze
  end

  def to_s
    "#{@magnitude} feet"
  end

  def +(other)
    Feet.new(@magnitude + other.magnitude)
  end

  def ==(other)
    other.is_a?(Feet) && magnitude == other.magnitude
  end
end

We have also ensured that it determines equivalence in terms of state, rather than in terms of object identity. Let's just review that point briefly.

Given an instance which represents a quantity of 1, another which represents a quantity of 2, and another, separate object that also represents a quantity of 1, we can observe the following:

  • An Feet object is identity-equal only to itself. Even objects with the same magnitude have different identities and so are not identity-equal.
  • A Feet object is equivalent to itself. It is not equivalent to objects with another value, nor is it equivalent to objects of an unrelated class. However, it is equivalent to other Feet objects with the same magnitude.
  • Case-equality for Feet objects obeys the same rules as value-equality.
  • As a side effect of the equivalence behavior, it is possible to check for the presence of a given quantity of feet in an array by value, rather than by identity.
require "./feet"

one = Feet.new(1)
two = Feet.new(2)
other_one = Feet.new(1)

one.equal?(one)                 # => true
one.equal?(two)                 # => false
one.equal?(other_one)           # => false  

one == one                      # => true
one == two                      # => false
one == other_one                # => true

one === one                      # => true
one === two                      # => false
one === other_one                # => true

[Feet.new(1)].include?(Feet.new(1)) # => true

Now let's take a look at how Feet objects behave when used as keys in a Hash. We'll construct a simple hash with one value keyed on our one-foot object, and another keyed on the two-feet object.

Given what we know so far, we might expect that we could select the first hash value by using another 1-foot value as key. But surprisingly, we get nil. When we explicitly ask the hash if it has a one-foot quantity as a key, it says no.

This peculiar behavior in the context of a hash has consequences that extend beyond hashes. If we require the set library and instantiate a Set containing objects one and two, then ask the set if it contains a 1-foot quantity, it claims it does not.

require "./feet"

one = Feet.new(1)
two = Feet.new(2)

h = { one => "one", two => "two" }
h[Feet.new(1)]                  # => nil
h.key?(Feet.new(1))             # => false

require "set"
s = Set.new([one, two])
s                               # => #<Set: {#<Feet:0x00000002435da8 @magnitude=1.0>, #<Feet:0x00000002435d80 @magnitude=2.0>}>
s.include?(Feet.new(1))         # => false

In Episode 203, we explored how Ruby Hashes use an object's hash value to speed up retrieval. This gives us a clue about where to start looking for the source of this anomalous behavior. We surmise that we should examine the hash values of our Feet objects in order to see why hashes aren't finding them.

require "./feet"

one = Feet.new(1).hash          # => -3294382760781416056
two = Feet.new(2).hash          # => 24453523860432852 
other_one = Feet.new(1).hash    # => 1047400875287194416

When we do this, we see that the hash value is different for two Feet objects with the same magnitude. This is because just like with equivalence, Ruby bases hash values on an object's identity, rather than on its state.

Fortunately this is easy to fix. We can define our own #hash method. This method constructs an Array. The first element of the array is the magnitude. The second element is the object's class. We then return the hash value of this array.

Here we are taking advantage of a useful feature of Ruby's arrays: they construct their hash values by using an algorithm to combine together the hash values of each element. By using the hash value of the magnitude, we ensure that hash values of two Feet objects with the same magnitude will always be the same within a given run of Ruby. By mixing in the hash value of the class as well, we also ensure that the hash value is likely to be different from the hash values of other objects, such as raw numbers that happen to have the same value as the magnitude.

Because of the existence of this array-hashing shortcut, it is very rare to see ruby code that manually generates a hash value from scratch. It is usually sufficient to put all of the attributes which make an object unique into an array, and then take the hash value of the array.

class Feet
  attr_reader :magnitude

  def initialize(magnitude)
    @magnitude = magnitude.to_f
    freeze
  end

  def to_s
    "#{@magnitude} feet"
  end

  def +(other)
    Feet.new(@magnitude + other.magnitude)
  end

  def ==(other)
    other.is_a?(Feet) && magnitude == other.magnitude
  end

  def hash
    [magnitude, Feet].hash
  end
end

When we experiment with this new implementation of #hash, we can see that now two different Feet objects with the same magnitude have the same hash value, whereas an object with a different magnitude has a different hash value.

require "./feet2"

Feet.new(1).hash                # => 115982590576536047
Feet.new(1).hash                # => 115982590576536047
Feet.new(2).hash                # => 2182538387798779408

You might think that we are done now. Unfortunately, things aren't quite this simple. Because once Ruby finds the right bucket to look in based on hash value, it still needs to compare the keys it finds there for equality with the search key. And here's the big surprise: Ruby doesn't use the double-equals equivalence operator to do this. It uses a different method which is specifically intended for hash equality, spelled eql?.

Let's add #eql? to our listing of equality comparisons. We can see that unlike equivalence, the hash-equality operator sees two different Feet objects with the same magnitude as being unequal.

require "./feet2"

one = Feet.new(1)
two = Feet.new(2)
other_one = Feet.new(1)

one.equal?(one)                 # => true
one.equal?(two)                 # => false
one.equal?(other_one)           # => false

one == one                      # => true
one == two                      # => false
one == other_one                # => true

one === one                      # => true
one === two                      # => false
one === other_one                # => true

one.eql?(one)                   # => true
one.eql?(two)                   # => false
one.eql?(other_one)             # => false

Why is this? You can probably guess the answer by now. By default, just like the default equivalence operator and #hash method, Ruby objects use object identity alone to determine hash-equality.

This is not what we want. We want hash-equality to be determined the exact same way as equivalence. The conventional way to do this is to simply alias the equality and hash-equivalence methods.

class Feet
  attr_reader :magnitude

  def initialize(magnitude)
    @magnitude = magnitude.to_f
    freeze
  end

  def to_s
    "#{@magnitude} feet"
  end

  def +(other)
    Feet.new(@magnitude + other.magnitude)
  end

  def ==(other)
    other.is_a?(Feet) && magnitude == other.magnitude
  end

  def hash
    [magnitude, Feet].hash
  end

  alias_method :eql?, :==
end

After making this change, we recalculate our table of comparisons.

require "./feet3"

one = Feet.new(1)
two = Feet.new(2)
other_one = Feet.new(1)

one.equal?(one)                 # => true
one.equal?(two)                 # => false
one.equal?(other_one)           # => false

one == one                      # => true
one == two                      # => false
one == other_one                # => true

one === one                      # => true
one === two                      # => false
one === other_one                # => true

one.eql?(one)                   # => true
one.eql?(two)                   # => false
one.eql?(other_one)             # => true

This time, we get the answer we want from the hash-equality operator.

And now, finally, we are able to use Feet quantities as keys in Hashes and as entries in sets, and get the behavior that we expect when looking up entries by value.

require "./feet3"

one = Feet.new(1)
two = Feet.new(2)

h = { one => "one", two => "two" }
h[Feet.new(1)]                  # => "one"
h.key?(Feet.new(1))             # => true

require "set"
s = Set.new([one, two])
s                               # => #<Set: {#<Feet:0x00000001f2d900 @magnitude=1.0>, #<Feet:0x00000001f2d8d8 @magnitude=2.0>}>
s.include?(Feet.new(1))         # => true

At this point, you are probably wondering, "why on earth would Ruby use a separate equality method just for hash keys?" Here's the explanation.

For the most part, Ruby keeps different types quite distinct from each other. There is very little automatic conversion of types the way there is in many other languages.

But Ruby occasionally fudges on this principle. Here's one example: integers and floating-point numbers are two different types of object in Ruby. However, we can compare the integer 0 and the floating point 0.0 and Ruby will report that they are equivalent.

0.class                         # => Fixnum
0.0.class                       # => Float

0 == 0.0                        # => true

This breakdown of Ruby's normally strict type distinctions presumably exists because for most programmers it's probably less surprising than the alternative.

This may be fine for explicit comparisons. But as objects of different classes, 0 and 0.0 still have different hash values. So it would be inconsistent for them to be regarded as the same value when used as hash keys.

0.hash                          # => 1379976726146794434
0.0.hash                        # => 3968175065238346606

We can put two different entries in a hash, one under key 0 and the other under key 0.0, and they will be held as distinct by the hash.

{ 0 => "int", 0.0 => "float" }
# => {0=>"int", 0.0=>"float"}

This is a case where equivalency and hash equality mean two different things. And thus, Ruby has two different methods for these occasionally differing types of equality.

All of these types of equality may seem bewildering. Fortunately, most of the time we can follow some simple rules of thumb.

First of all, for many objects, like the User example we saw earlier, identity is important. In these cases, Ruby's default equality semantics are usually fine.

On the other hand, occasionally we come across objects where we don't care about their identity. All that matters, as far as equivalence is concerned, is their state, otherwise known as their value. There is a pattern name for objects like this: Value Objects.

Feet is an example of a Value Object class. In order to make value objects behave in predictable ways, we need to follow a few simple rules.

First, we need to define an equivalency operator that uses the object's state, rather than its identity. This operator should first check the class of the other object before checking its value.

Second, we need to define a custom #hash method which uses the object's state to generate a hash value.

Third, we need to alias the hash equality operator to the equivalency operator.

So long as the objects are immutable, something we covered in a previous episode, following these three rules should be sufficient to give us value objects that behave in unsurprising ways.

Happy hacking!

Responses