Hash Equality
Video transcript & code
Let's recap some recent episodes. We've built a class, called Feet
, that represents a measurement in feet. We have made it immutable, like Ruby's core numeric types.
class Feet
attr_reader :magnitude
def initialize(magnitude)
@magnitude = magnitude.to_f
freeze
end
def to_s
"#{@magnitude} feet"
end
def +(other)
Feet.new(@magnitude + other.magnitude)
end
def ==(other)
other.is_a?(Feet) && magnitude == other.magnitude
end
end
We have also ensured that it determines equivalence in terms of state, rather than in terms of object identity. Let's just review that point briefly.
Given an instance which represents a quantity of 1, another which represents a quantity of 2, and another, separate object that also represents a quantity of 1, we can observe the following:
- An
Feet
object is identity-equal only to itself. Even objects with the same magnitude have different identities and so are not identity-equal. - A
Feet
object is equivalent to itself. It is not equivalent to objects with another value, nor is it equivalent to objects of an unrelated class. However, it is equivalent to otherFeet
objects with the same magnitude. - Case-equality for
Feet
objects obeys the same rules as value-equality. - As a side effect of the equivalence behavior, it is possible to check for the presence of a given quantity of feet in an array by value, rather than by identity.
require "./feet"
one = Feet.new(1)
two = Feet.new(2)
other_one = Feet.new(1)
one.equal?(one) # => true
one.equal?(two) # => false
one.equal?(other_one) # => false
one == one # => true
one == two # => false
one == other_one # => true
one === one # => true
one === two # => false
one === other_one # => true
[Feet.new(1)].include?(Feet.new(1)) # => true
Now let's take a look at how Feet
objects behave when used as keys in a Hash. We'll construct a simple hash with one value keyed on our one-foot object, and another keyed on the two-feet object.
Given what we know so far, we might expect that we could select the first hash value by using another 1-foot value as key. But surprisingly, we get nil
. When we explicitly ask the hash if it has a one-foot quantity as a key, it says no.
This peculiar behavior in the context of a hash has consequences that extend beyond hashes. If we require the set
library and instantiate a Set
containing objects one
and two
, then ask the set if it contains a 1-foot quantity, it claims it does not.
require "./feet"
one = Feet.new(1)
two = Feet.new(2)
h = { one => "one", two => "two" }
h[Feet.new(1)] # => nil
h.key?(Feet.new(1)) # => false
require "set"
s = Set.new([one, two])
s # => #<Set: {#<Feet:0x00000002435da8 @magnitude=1.0>, #<Feet:0x00000002435d80 @magnitude=2.0>}>
s.include?(Feet.new(1)) # => false
In Episode 203, we explored how Ruby Hashes use an object's hash value to speed up retrieval. This gives us a clue about where to start looking for the source of this anomalous behavior. We surmise that we should examine the hash values of our Feet
objects in order to see why hashes aren't finding them.
require "./feet"
one = Feet.new(1).hash # => -3294382760781416056
two = Feet.new(2).hash # => 24453523860432852
other_one = Feet.new(1).hash # => 1047400875287194416
When we do this, we see that the hash value is different for two Feet
objects with the same magnitude. This is because just like with equivalence, Ruby bases hash values on an object's identity, rather than on its state.
Fortunately this is easy to fix. We can define our own #hash
method. This method constructs an Array
. The first element of the array is the magnitude. The second element is the object's class. We then return the hash value of this array.
Here we are taking advantage of a useful feature of Ruby's arrays: they construct their hash values by using an algorithm to combine together the hash values of each element. By using the hash value of the magnitude, we ensure that hash values of two Feet
objects with the same magnitude will always be the same within a given run of Ruby. By mixing in the hash value of the class as well, we also ensure that the hash value is likely to be different from the hash values of other objects, such as raw numbers that happen to have the same value as the magnitude.
Because of the existence of this array-hashing shortcut, it is very rare to see ruby code that manually generates a hash value from scratch. It is usually sufficient to put all of the attributes which make an object unique into an array, and then take the hash value of the array.
class Feet
attr_reader :magnitude
def initialize(magnitude)
@magnitude = magnitude.to_f
freeze
end
def to_s
"#{@magnitude} feet"
end
def +(other)
Feet.new(@magnitude + other.magnitude)
end
def ==(other)
other.is_a?(Feet) && magnitude == other.magnitude
end
def hash
[magnitude, Feet].hash
end
end
When we experiment with this new implementation of #hash
, we can see that now two different Feet
objects with the same magnitude have the same hash value, whereas an object with a different magnitude has a different hash value.
require "./feet2"
Feet.new(1).hash # => 115982590576536047
Feet.new(1).hash # => 115982590576536047
Feet.new(2).hash # => 2182538387798779408
You might think that we are done now. Unfortunately, things aren't quite this simple. Because once Ruby finds the right bucket to look in based on hash value, it still needs to compare the keys it finds there for equality with the search key. And here's the big surprise: Ruby doesn't use the double-equals equivalence operator to do this. It uses a different method which is specifically intended for hash equality, spelled eql?
.
Let's add #eql?
to our listing of equality comparisons. We can see that unlike equivalence, the hash-equality operator sees two different Feet
objects with the same magnitude as being unequal.
require "./feet2"
one = Feet.new(1)
two = Feet.new(2)
other_one = Feet.new(1)
one.equal?(one) # => true
one.equal?(two) # => false
one.equal?(other_one) # => false
one == one # => true
one == two # => false
one == other_one # => true
one === one # => true
one === two # => false
one === other_one # => true
one.eql?(one) # => true
one.eql?(two) # => false
one.eql?(other_one) # => false
Why is this? You can probably guess the answer by now. By default, just like the default equivalence operator and #hash
method, Ruby objects use object identity alone to determine hash-equality.
This is not what we want. We want hash-equality to be determined the exact same way as equivalence. The conventional way to do this is to simply alias the equality and hash-equivalence methods.
class Feet
attr_reader :magnitude
def initialize(magnitude)
@magnitude = magnitude.to_f
freeze
end
def to_s
"#{@magnitude} feet"
end
def +(other)
Feet.new(@magnitude + other.magnitude)
end
def ==(other)
other.is_a?(Feet) && magnitude == other.magnitude
end
def hash
[magnitude, Feet].hash
end
alias_method :eql?, :==
end
After making this change, we recalculate our table of comparisons.
require "./feet3"
one = Feet.new(1)
two = Feet.new(2)
other_one = Feet.new(1)
one.equal?(one) # => true
one.equal?(two) # => false
one.equal?(other_one) # => false
one == one # => true
one == two # => false
one == other_one # => true
one === one # => true
one === two # => false
one === other_one # => true
one.eql?(one) # => true
one.eql?(two) # => false
one.eql?(other_one) # => true
This time, we get the answer we want from the hash-equality operator.
And now, finally, we are able to use Feet
quantities as keys in Hashes and as entries in sets, and get the behavior that we expect when looking up entries by value.
require "./feet3"
one = Feet.new(1)
two = Feet.new(2)
h = { one => "one", two => "two" }
h[Feet.new(1)] # => "one"
h.key?(Feet.new(1)) # => true
require "set"
s = Set.new([one, two])
s # => #<Set: {#<Feet:0x00000001f2d900 @magnitude=1.0>, #<Feet:0x00000001f2d8d8 @magnitude=2.0>}>
s.include?(Feet.new(1)) # => true
At this point, you are probably wondering, "why on earth would Ruby use a separate equality method just for hash keys?" Here's the explanation.
For the most part, Ruby keeps different types quite distinct from each other. There is very little automatic conversion of types the way there is in many other languages.
But Ruby occasionally fudges on this principle. Here's one example: integers and floating-point numbers are two different types of object in Ruby. However, we can compare the integer 0 and the floating point 0.0 and Ruby will report that they are equivalent.
0.class # => Fixnum
0.0.class # => Float
0 == 0.0 # => true
This breakdown of Ruby's normally strict type distinctions presumably exists because for most programmers it's probably less surprising than the alternative.
This may be fine for explicit comparisons. But as objects of different classes, 0
and 0.0
still have different hash values. So it would be inconsistent for them to be regarded as the same value when used as hash keys.
0.hash # => 1379976726146794434
0.0.hash # => 3968175065238346606
We can put two different entries in a hash, one under key 0
and the other under key 0.0
, and they will be held as distinct by the hash.
{ 0 => "int", 0.0 => "float" }
# => {0=>"int", 0.0=>"float"}
This is a case where equivalency and hash equality mean two different things. And thus, Ruby has two different methods for these occasionally differing types of equality.
All of these types of equality may seem bewildering. Fortunately, most of the time we can follow some simple rules of thumb.
First of all, for many objects, like the User
example we saw earlier, identity is important. In these cases, Ruby's default equality semantics are usually fine.
On the other hand, occasionally we come across objects where we don't care about their identity. All that matters, as far as equivalence is concerned, is their state, otherwise known as their value. There is a pattern name for objects like this: Value Objects.
Feet
is an example of a Value Object class. In order to make value objects behave in predictable ways, we need to follow a few simple rules.
First, we need to define an equivalency operator that uses the object's state, rather than its identity. This operator should first check the class of the other object before checking its value.
Second, we need to define a custom #hash
method which uses the object's state to generate a hash
value.
Third, we need to alias the hash equality operator to the equivalency operator.
So long as the objects are immutable, something we covered in a previous episode, following these three rules should be sufficient to give us value objects that behave in unsurprising ways.
Responses