Array Set Operations

Video transcript & code

The pantry is looking a little bare, and it's time to do some shopping. My wife and I have both noticed, and made our own shopping lists.

avdi_list = %W[milk granola cookies apples]
stacey_list = %W[milk bananas eggs apples]

Before one of us goes to the grocery store, we need to merge the two lists into one. We could do this using the array addition operator, but when we do that we end up with duplicates in the list. To get a list with only unique items, we need to follow the addition with a call to #uniq or, as in this example, its in-place variant #uniq!.

avdi_list = %W[milk granola cookies apples]
stacey_list = %W[milk bananas eggs apples]
shopping_list = avdi_list + stacey_list
shopping_list
# => ["milk",
#     "granola",
#     "cookies",
#     "apples",
#     "milk",
#     "bananas",
#     "eggs",
#     "apples"]
shopping_list.uniq!
shopping_list
# => ["milk", "granola", "cookies", "apples", "bananas", "eggs"]

(As an aside, this is a case where I really think it would have been OK to break with UNIX tradition and keep those last two letters in the word #uniq. Oh well…)

shopping_list.unique! # wishful thinking: Array has no #unique! method, so this raises NoMethodError

I don't like the two-step nature of this array merge. If we have to remember the two steps every time we join two shopping lists together, that's just tempting fate. Sooner or later, we'll remember the addition but forget the call to #uniq! and introduce a bug. And anyway, this operation feels like it should be a single semantic whole.

Fortunately, Ruby agrees with this assessment. Instead of the + operator, we can use the | operator to get the union of two arrays.

avdi_list | stacey_list
# => ["milk", "granola", "cookies", "apples", "bananas", "eggs"]

As we can see, the result is a combination of both lists, with duplicates removed.

The union operator is an example of one of Ruby's Array set operations. A set, in computer science, is an unordered collection of unique items. Ruby Arrays are not sets, because they permit duplicates, but Ruby provides some methods that allow us to treat them like sets.
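To make the "treat them like sets" idea concrete, here is a minimal sketch of merging any number of lists with the | operator. The merge_lists helper and the sample items are made up for illustration and are not part of the episode. Notice that duplicates disappear even when they sit inside a single list, and that items keep the order in which they were first seen.

```ruby
# Hypothetical helper (not from the episode): fold any number of lists
# together with the union operator.
def merge_lists(*lists)
  lists.reduce([]) { |merged, list| merged | list }
end

merge_lists(%w[milk eggs milk], %w[tea eggs], %w[milk jam])
# => ["milk", "eggs", "tea", "jam"]

# For two arrays, union gives the same result as addition followed by uniq.
a = %w[milk eggs milk]
b = %w[tea eggs]
(a | b) == (a + b).uniq  # => true
```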

Here are some more examples. We can see just the items that are common to both shopping lists using the &, or set intersection operator.

avdi_list & stacey_list
# => ["milk", "apples"]
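Two details of & are easy to miss, so here is a small sketch with made-up pantry and recipe lists (assumptions for illustration only). The result contains no duplicates, and its order follows the array on the left-hand side.

```ruby
pantry = %w[apples milk milk rice]   # illustrative data, not from the episode
recipe = %w[milk flour apples]

pantry & recipe            # => ["apples", "milk"]
recipe & pantry            # => ["milk", "apples"]

# A quick way to ask whether two lists share anything at all.
(pantry & recipe).empty?   # => false
```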

What if we want to see just the difference between the two lists? We can do that with the - operator, which performs the set difference operation (also known as the relative complement):

avdi_list = %W[milk granola cookies apples]
stacey_list = %W[milk bananas eggs apples]
avdi_list - stacey_list # => ["granola", "cookies"]
stacey_list - avdi_list # => ["bananas", "eggs"]
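Since - only tells us what one side has that the other lacks, we sometimes want both directions at once. The following is a sketch, assuming a helper we'll call symmetric_difference (a name invented here, not something the episode defines). It also shows that, unlike | and &, the - operator leaves duplicates in the left-hand array alone.

```ruby
# Hypothetical helper: items that appear in exactly one of the two lists.
def symmetric_difference(a, b)
  (a - b) | (b - a)
end

avdi_list = %w[milk granola cookies apples]
stacey_list = %w[milk bananas eggs apples]

symmetric_difference(avdi_list, stacey_list)
# => ["granola", "cookies", "bananas", "eggs"]

# Difference removes matching items but does not de-duplicate what remains.
%w[eggs eggs milk eggs] - %w[milk]
# => ["eggs", "eggs", "eggs"]
```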

One of my favorite uses of this operator is discovering interface subsets for Ruby classes. Let's say we want to know what methods Ruby's HTTP URI class responds to. By default, when we send it the #instance_methods message, we get a list of all of its methods including various core methods inherited from Object. But let's say we want to see just the URI-specific methods. We can pass false to #instance_methods, but then we get a surprise: just one method. That doesn't seem right.

The reason is that the HTTP URI class inherits most of its methods from URI::Generic. How can we see every method an HTTP URI supports except the ones it shares with all other objects?

The answer is to use the Array set complement operator to "subtract" all Object methods from the complete list of URI methods.

require 'uri'

URI::HTTP.instance_methods
# => [:request_uri,
#     :default_port,
#     :scheme,
#     :host,
#     :port,
#     :registry,
#     :path,
#     :query,
#     :opaque,
#     :fragment,
#     :parser,
#     :component,
#     :set_scheme,
#     :scheme=,
#     :userinfo=,
#     :user=,
#     :password=,
#     :set_userinfo,
#     :set_user,
#     :set_password,
#     :userinfo,
#     :user,
#     :password,
#     :set_host,
#     :host=,
#     :hostname,
#     :hostname=,
#     :set_port,
#     :port=,
#     :set_registry,
#     :registry=,
#     :set_path,
#     :path=,
#     :set_query,
#     :query=,
#     :set_opaque,
#     :opaque=,
#     :set_fragment,
#     :fragment=,
#     :hierarchical?,
#     :absolute?,
#     :absolute,
#     :relative?,
#     :merge!,
#     :merge,
#     :+,
#     :route_from,
#     :-,
#     :route_to,
#     :normalize,
#     :normalize!,
#     :to_s,
#     :==,
#     :hash,
#     :eql?,
#     :component_ary,
#     :select,
#     :inspect,
#     :coerce,
#     :find_proxy,
#     :pretty_print,
#     :pretty_print_cycle,
#     :pretty_print_instance_variables,
#     :pretty_print_inspect,
#     :nil?,
#     :===,
#     :=~,
#     :!~,
#     :<=>,
#     :class,
#     :singleton_class,
#     :clone,
#     :dup,
#     :taint,
#     :tainted?,
#     :untaint,
#     :untrust,
#     :untrusted?,
#     :trust,
#     :freeze,
#     :frozen?,
#     :methods,
#     :singleton_methods,
#     :protected_methods,
#     :private_methods,
#     :public_methods,
#     :instance_variables,
#     :instance_variable_get,
#     :instance_variable_set,
#     :instance_variable_defined?,
#     :remove_instance_variable,
#     :instance_of?,
#     :kind_of?,
#     :is_a?,
#     :tap,
#     :send,
#     :public_send,
#     :respond_to?,
#     :extend,
#     :display,
#     :method,
#     :public_method,
#     :define_singleton_method,
#     :object_id,
#     :to_enum,
#     :enum_for,
#     :pretty_inspect,
#     :equal?,
#     :!,
#     :!=,
#     :instance_eval,
#     :instance_exec,
#     :__send__,
#     :__id__]
URI::HTTP.instance_methods(false)
# => [:request_uri]
URI::Generic.instance_methods(false)
# => [:default_port,
#     :scheme,
#     :host,
#     :port,
#     :registry,
#     :path,
#     :query,
#     :opaque,
#     :fragment,
#     :parser,
#     :component,
#     :set_scheme,
#     :scheme=,
#     :userinfo=,
#     :user=,
#     :password=,
#     :set_userinfo,
#     :set_user,
#     :set_password,
#     :userinfo,
#     :user,
#     :password,
#     :set_host,
#     :host=,
#     :hostname,
#     :hostname=,
#     :set_port,
#     :port=,
#     :set_registry,
#     :registry=,
#     :set_path,
#     :path=,
#     :set_query,
#     :query=,
#     :set_opaque,
#     :opaque=,
#     :set_fragment,
#     :fragment=,
#     :hierarchical?,
#     :absolute?,
#     :absolute,
#     :relative?,
#     :merge!,
#     :merge,
#     :+,
#     :route_from,
#     :-,
#     :route_to,
#     :normalize,
#     :normalize!,
#     :to_s,
#     :==,
#     :hash,
#     :eql?,
#     :component_ary,
#     :select,
#     :inspect,
#     :coerce,
#     :find_proxy]
URI::HTTP.instance_methods - Object.instance_methods
# => [:request_uri,
#     :default_port,
#     :scheme,
#     :host,
#     :port,
#     :registry,
#     :path,
#     :query,
#     :opaque,
#     :fragment,
#     :parser,
#     :component,
#     :set_scheme,
#     :scheme=,
#     :userinfo=,
#     :user=,
#     :password=,
#     :set_userinfo,
#     :set_user,
#     :set_password,
#     :userinfo,
#     :user,
#     :password,
#     :set_host,
#     :host=,
#     :hostname,
#     :hostname=,
#     :set_port,
#     :port=,
#     :set_registry,
#     :registry=,
#     :set_path,
#     :path=,
#     :set_query,
#     :query=,
#     :set_opaque,
#     :opaque=,
#     :set_fragment,
#     :fragment=,
#     :hierarchical?,
#     :absolute?,
#     :absolute,
#     :relative?,
#     :merge!,
#     :merge,
#     :+,
#     :route_from,
#     :-,
#     :route_to,
#     :normalize,
#     :normalize!,
#     :component_ary,
#     :select,
#     :coerce,
#     :find_proxy]

I find this particularly useful in certain metaprogramming scenarios, which we can talk about another time.
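The same subtraction works for any class hierarchy. Here is a self-contained sketch using two toy classes and a helper named own_interface; all three names are invented for this example. The result is sorted only so that the output doesn't depend on the order in which Ruby reports methods.

```ruby
# Hypothetical helper: the public instance methods of klass, minus
# everything it shares with every plain Object.
def own_interface(klass)
  klass.instance_methods - Object.instance_methods
end

class Animal   # illustrative classes, not from the episode
  def eat; end
end

class Dog < Animal
  def bark; end
end

Dog.instance_methods(false)   # => [:bark]
own_interface(Dog).sort       # => [:bark, :eat]
```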

It's worth noting that just like boolean operators, these operations can be combined with an equals sign in order to update variables with the result. For instance, ordinarily when adding a new item to an Array we'd use the shovel operator.

avdi_list = %W[milk granola cookies apples]
avdi_list << "granola"
avdi_list << "ice cream"
avdi_list
# => ["milk", "granola", "cookies", "apples", "granola", "ice cream"]

But this breaks our expectation of uniqueness. A set-friendly alternative is to use the |= operator, and enclose the new item in an array.

avdi_list = %W[milk granola cookies apples]
avdi_list |= ["granola"]
avdi_list |= ["ice cream"]
avdi_list
# => ["milk", "granola", "cookies", "apples", "ice cream"]

This preserves uniqueness. However, be aware that like all "equals" versions of operators, this is in fact replacing the variable with a new array, not updating the array in-place. We can see this if we assign the original array to a second variable before performing the updates. The array referenced by the original_list variable remains unchanged.

avdi_list = %W[milk granola cookies apples]
original_list = avdi_list
avdi_list |= ["granola"]
avdi_list |= ["ice cream"]
avdi_list
# => ["milk", "granola", "cookies", "apples", "ice cream"]
original_list
# => ["milk", "granola", "cookies", "apples"]
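If we do want every variable that references the list to see the update, we need to mutate the array itself rather than reassign. One possible approach, sketched here with a helper we'll call add_unique! (an invented name, not a built-in method), is to guard the shovel operator with an #include? check.

```ruby
# Hypothetical helper: append item only if it is absent, mutating list itself.
def add_unique!(list, item)
  list << item unless list.include?(item)
  list
end

avdi_list = %w[milk granola cookies apples]
original_list = avdi_list

add_unique!(avdi_list, "granola")
add_unique!(avdi_list, "ice cream")

original_list
# => ["milk", "granola", "cookies", "apples", "ice cream"]
avdi_list.equal?(original_list)  # => true, both names point at one array
```

The trade-off is that #include? scans the array on every call, so this is best kept for short lists.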

These operations let us apply basic set logic to Ruby arrays. But for our programs to behave with proper set semantics, we still have to remember to use the set operations consistently. And when dealing with very large sets, we may find these operations to be less than efficient. For more advanced set capabilities, Ruby also has the set standard library.
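As a brief preview only, here is a minimal sketch of the same shopping list held in a Set. The variable names are chosen for this example. With a Set, uniqueness is enforced by the collection itself, and additions happen in place.

```ruby
require "set"

shopping = Set.new(%w[milk granola cookies apples])
shopping << "granola"      # already present, so nothing changes
shopping << "ice cream"    # new item, added in place

shopping.size              # => 5
shopping.include?("milk")  # => true

stacey = Set.new(%w[milk bananas eggs apples])
(shopping & stacey).to_a.sort  # => ["apples", "milk"]
```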

But we'll talk about that in another episode. Happy hacking!