Itself
Video transcript & code
If you are lucky enough to be using Ruby 2.2, there are several new features available to you. Among these is a new method on objects called "itself".
At first glance, #itself may seem like the least useful method ever added to Ruby. If you send the #itself message to a Ruby object, it returns the same object. That's it. No really, that's all it does.
The method is defined on Kernel, so it is available on pretty much any object other than BasicObject.
123.itself # => 123
"foo".itself # => "foo"
o = Object.new # => #<Object:0x007f4fe7a5ba98>
o.itself # => #<Object:0x007f4fe7a5ba98>
BasicObject.new.itself # => NoMethodError: undefined method `itself' for #<BasicObject:0x007f4fe7a5b6b0>
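One way to confirm that #itself hands back the very same object, rather than an equal copy, is to compare identities with #equal?. The following is a small sketch; the variable names are made up for illustration.

```ruby
# Requires Ruby 2.2 or newer, where #itself is available.
str = "foo"

# #equal? compares object identity, not just value.
same_object  = str.itself.equal?(str)  # true: no copy is made
copy_differs = str.dup.equal?(str)     # false: #dup creates a new object

# BasicObject does not include Kernel, so it does not respond to #itself.
basic_object_error =
  begin
    BasicObject.new.itself
    nil
  rescue NoMethodError => e
    e.class
  end

puts same_object         # prints "true"
puts copy_differs        # prints "false"
puts basic_object_error  # prints "NoMethodError"
```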
If you think this functionality sounds a bit less than earthshaking, you could easily be forgiven. However, this method isn't as useless as it might first appear.
Here's a convenience method called #top. The job of #top is to take a collection of objects, sort it, and return a list of the top 10 objects. For flexibility, it takes several keyword arguments. The first, on:, controls which attribute of the objects will be sorted on. The second, by:, controls the predicate that will be used to order the sort. It defaults to the less-than operator. Finally, the as: keyword determines which aspect of each object will be returned in the final list.
def top(things, on:, by: :<, as: nil)
  things.sort{|x, y|
    x_key = x.public_send(on)
    y_key = y.public_send(on)
    if x_key.public_send(by, y_key)
      -1
    elsif y_key.public_send(by, x_key)
      1
    else
      0
    end
  }.last(10).map{|o| as ? o.public_send(as) : o}
end
This will all be clearer if we try it out. Let's get a list of files. Then let's use #top to find the top 10 files in terms of file size.
We can flip this around and find the smallest files by specifying the greater-than relationship instead of less-than.
Or, we can sort by a different file attribute entirely, such as the last time the file was changed.
So far, the #top method has been returning arrays of Pathname objects. We can have it return strings instead, by specifying as: :to_s to set the attribute which should represent the objects in the output.
require "./top"
require "pathname"
files = Pathname.glob("/home/avdi/Dropbox/rubytapas/**/*.mp4")
top files, on: :size
# => [#<Pathname:/home/avdi/Dropbox/rubytapas/120-outside-in/screencast-20130630-2000.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/footage/screen-capture-2015-01-14_16.28.41.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/116-extract-command-object/screencast-20130612-1706.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/278-lazy/media/screen-capture-2015-01-14_15.25.00.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/113-p/screencast-20130604-1113.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/footage/screen-capture-2015-01-17_04.49.32.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/279-audited-predicate/media/footage/screen-capture-2015-01-17_04.49.32.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/footage/screen-capture-2015-01-08_15.46.15.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/276-fattr/media/screen-capture-2015-01-08_15.46.15.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/095-gem-love-6/screencast-20130409-1751.mp4>]
top files, on: :size, by: :>
# => [#<Pathname:/home/avdi/Dropbox/rubytapas/196-string-templates/screencast-20140319-1604.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/189-assisted-refactoring/slides.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/203-hash-table/slides.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/205-comparable/slides.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/218-spaceship-revisited/screencast-20140604-1811.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/235-load/screencast-20140807-1701.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/233-flip-flop/screencast-20140801-2156_5951.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/RubyTapas Slides.potx.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/235-load/screencast-vlc.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/127-parallel-fib/screencast-20130716-1658.mp4>]
top files, on: :ctime
# => [#<Pathname:/home/avdi/Dropbox/rubytapas/278-lazy/media/screen-capture-2015-01-14_15.25.00.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/footage/screen-capture-2015-01-14_16.27.55.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/footage/screen-capture-2015-01-14_16.28.41.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/278-lazy/Presentation1.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/278-lazy/278-lazy.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/footage/screen-capture-2015-01-16_18.38.06.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/279-audited-predicate/media/footage/screen-capture-2015-01-16_18.38.06.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/footage/screen-capture-2015-01-17_04.49.32.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/279-audited-predicate/media/footage/screen-capture-2015-01-17_04.49.32.mp4>,
# #<Pathname:/home/avdi/Dropbox/rubytapas/279-audited-predicate/279-audited-predicate.mp4>]
top files, on: :size, as: :to_s
# => ["/home/avdi/Dropbox/rubytapas/120-outside-in/screencast-20130630-2000.mp4",
# "/home/avdi/Dropbox/rubytapas/footage/screen-capture-2015-01-14_16.28.41.mp4",
# "/home/avdi/Dropbox/rubytapas/116-extract-command-object/screencast-20130612-1706.mp4",
# "/home/avdi/Dropbox/rubytapas/278-lazy/media/screen-capture-2015-01-14_15.25.00.mp4",
# "/home/avdi/Dropbox/rubytapas/113-p/screencast-20130604-1113.mp4",
# "/home/avdi/Dropbox/rubytapas/footage/screen-capture-2015-01-17_04.49.32.mp4",
# "/home/avdi/Dropbox/rubytapas/279-audited-predicate/media/footage/screen-capture-2015-01-17_04.49.32.mp4",
# "/home/avdi/Dropbox/rubytapas/footage/screen-capture-2015-01-08_15.46.15.mp4",
# "/home/avdi/Dropbox/rubytapas/276-fattr/media/screen-capture-2015-01-08_15.46.15.mp4",
# "/home/avdi/Dropbox/rubytapas/095-gem-love-6/screencast-20130409-1751.mp4"]
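The listings above depend on files on the author's machine. To experiment without them, here is a minimal self-contained sketch that repeats the #top definition and applies it to a few made-up records. The Episode struct, its bytes attribute, and the sample data are hypothetical and exist only for illustration.

```ruby
# Same definition of #top as shown earlier.
def top(things, on:, by: :<, as: nil)
  things.sort{|x, y|
    x_key = x.public_send(on)
    y_key = y.public_send(on)
    if x_key.public_send(by, y_key)
      -1
    elsif y_key.public_send(by, x_key)
      1
    else
      0
    end
  }.last(10).map{|o| as ? o.public_send(as) : o}
end

# Hypothetical stand-in for the Pathname objects used above.
Episode = Struct.new(:name, :bytes)

episodes = [
  Episode.new("lazy", 300),
  Episode.new("fattr", 100),
  Episode.new("load", 200),
]

# Default predicate (<): ascending order, so the largest item comes last.
largest_last = top(episodes, on: :bytes, as: :name)

# Flipped predicate (>): descending order, so the smallest item comes last.
smallest_last = top(episodes, on: :bytes, by: :>, as: :name)

p largest_last   # => ["fattr", "load", "lazy"]
p smallest_last  # => ["lazy", "load", "fattr"]
```

Because the sample has fewer than ten items, last(10) returns all of them. Note that the result is ordered so that the "top" item is the final element.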
Now let's try using #top on a different kind of collection. We'll read in the system dictionary, and find the top 10 longest words by sorting on the size attribute.
require "./top"
words = IO.readlines("/usr/share/dict/words")
top words, on: :size
# => ["uncharacteristically\n",
# "counterintelligence's\n",
# "electroencephalograms\n",
# "electroencephalograph\n",
# "Andrianampoinimerina's\n",
# "counterrevolutionaries\n",
# "counterrevolutionary's\n",
# "electroencephalogram's\n",
# "electroencephalographs\n",
# "electroencephalograph's\n"]
Next we decide we want to see if we can use this same method to find the top 10 words when sorted lexicographically. This means that instead of being applied to some attribute of the strings, the less-than comparison needs to be applied to the strings themselves.
This presents us with a bit of a conundrum. If we look back at the source of #top, we can see that it always expects an on: argument. And this argument should be a message which will be sent to each element in order to discover the sort key attribute.
This is the "pluggable selector" pattern, which we talked about way back in Episode #19.
Suddenly, our #top method doesn't feel so flexible. But fortunately, we have a solution. For the on: keyword, we can pass in the :itself message. When sent to the items in the collection, this message will result in getting the same item back, and our lexicographic sort will proceed successfully.
require "./top"
words = IO.readlines("/usr/share/dict/words")
top words, on: :itself
# => ["élan's\n",
# "émigré\n",
# "émigré's\n",
# "émigrés\n",
# "épée\n",
# "épée's\n",
# "épées\n",
# "étude\n",
# "étude's\n",
# "études\n"]
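To see what the selector actually contributes, it can help to look at the sort keys on their own. The sketch below uses a hypothetical helper, #keys_for, which is not part of the episode's code; it simply sends the selector message to each item, the same way #top does internally.

```ruby
# Hypothetical helper: returns the sort key each item would produce
# for a given selector message.
def keys_for(things, on:)
  things.map { |thing| thing.public_send(on) }
end

words = %w[pear apple fig]

size_keys   = keys_for(words, on: :size)    # keys are the word lengths
itself_keys = keys_for(words, on: :itself)  # keys are the words themselves

p size_keys    # => [4, 5, 3]
p itself_keys  # => ["pear", "apple", "fig"]
```

With :itself as the selector, each item serves as its own sort key, so the comparison is applied directly to the strings.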
As we look at this, we realize that :itself would make for a pretty good default value for the on: argument. So we go ahead and update the #top method.
def top(things, on: :itself, by: :<, as: nil)
  things.sort{|x, y|
    x_key = x.public_send(on)
    y_key = y.public_send(on)
    if x_key.public_send(by, y_key)
      -1
    elsif y_key.public_send(by, x_key)
      1
    else
      0
    end
  }.last(10).map{|o| as ? o.public_send(as) : o}
end
Now we can find the top 10 words lexicographically without any arguments at all.
require "./top2"
words = IO.readlines("/usr/share/dict/words")
top words
# => ["élan's\n",
# "émigré\n",
# "émigré's\n",
# "émigrés\n",
# "épée\n",
# "épée's\n",
# "épées\n",
# "étude\n",
# "étude's\n",
# "études\n"]
Having made this change, we next realize that we can use the same technique to simplify another part of the #top code. Presently, the as: parameter defaults to nil. The code then has to test the parameter to see if something other than nil has been supplied by the caller. If so, it is used as a transform on the returned items. Otherwise, the original items are used.
as ? o.public_send(as) : o
By making :itself the default for the as: parameter, we can get rid of this conditional, and just apply the as: value as a transformer in every case. If no special value is given, the default value of :itself will result in the original item being returned.
def top(things, on: :itself, by: :<, as: :itself)
  things.sort{|x, y|
    x_key = x.public_send(on)
    y_key = y.public_send(on)
    if x_key.public_send(by, y_key)
      -1
    elsif y_key.public_send(by, x_key)
      1
    else
      0
    end
  }.last(10).map(&as)
end
If we run it again, our lexicographic sort still works. By using the #itself message, we have preserved the original semantics, but with simpler code.
require "./top3"
words = IO.readlines("/usr/share/dict/words")
top words
# => ["élan's\n",
# "émigré\n",
# "émigré's\n",
# "émigrés\n",
# "épée\n",
# "épée's\n",
# "épées\n",
# "étude\n",
# "étude's\n",
# "études\n"]
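The final version relies on map(&as), which uses Ruby's Symbol#to_proc conversion: the symbol is turned into a block that sends that message to each element. A small sketch with made-up data shows why :itself works as a do-nothing default in that position.

```ruby
words = %w[pear apple fig]

as = :itself
unchanged = words.map(&as)    # each element is sent #itself

as = :upcase
transformed = words.map(&as)  # each element is sent #upcase

p unchanged    # => ["pear", "apple", "fig"]
p transformed  # => ["PEAR", "APPLE", "FIG"]
```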
If you have any familiarity with functional programming, you may recognize the pattern we've used today. In functional languages, #itself is usually known as the identity function: a function that returns its argument unaltered. It's a common feature in generic code, where it is often used as an argument to functions which take other functions as transformers for data.
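One common Ruby use of the identity function in this role is grouping equal values with Enumerable#group_by. The sketch below is an illustration with made-up data, not code from the episode.

```ruby
votes = %w[ruby elixir ruby go ruby go]

# group_by expects a block that computes a grouping key; passing #itself
# groups each element under its own value.
grouped = votes.group_by(&:itself)

# Turn each group into a count of how many times the value appeared.
counts = grouped.map { |language, entries| [language, entries.size] }.to_h

puts counts["ruby"]    # prints 3
puts counts["go"]      # prints 2
puts counts["elixir"]  # prints 1
```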
Of course, there is nothing special about the implementation of #itself. If it were missing from the language, we could easily supply it ourselves. The version below is probably less efficient than the native one, since it is implemented in Ruby instead of in raw C code, but otherwise it is identical.
class Object
  def itself
    self
  end
end
But the inclusion of the #itself method in core Ruby means that if we are targeting Ruby versions 2.2 or newer, we can write code with the assumption that #itself is available. And this means we can use it to simplify methods anywhere that selectors are pluggable.
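If the same code also has to run on older Rubies, one option is to define the method only when it is missing. This is a sketch of that idea rather than something shown in the episode; on Ruby 2.2 or newer the guard is false and the built-in version is left alone.

```ruby
# Define #itself only when the running Ruby does not already provide it.
unless Object.method_defined?(:itself)
  class Object
    def itself
      self
    end
  end
end

value = [1, 2, 3]
same = value.itself.equal?(value)

puts same  # prints "true" whether the built-in or the fallback is used
```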