Deep Dup

Copying objects can be surprisingly prone to error in Ruby. With the language only providing shallow copying semantics out of the box, copied objects can often share more state than you intended them to. In this episode, you’ll learn a few techniques for reliably making deep copies of complex object trees.

Video transcript & code

In a recent episode we found a way to easily freeze a whole tree of objects, using the ice_nine gem. This helped us ensure that our master shopping list could only be used as a template, and could not be accidentally modified in-place.

We discovered a problem, however, when we tried to duplicate the master list and then add an entry to the copy's items array.

Because Ruby's dup method only performs a shallow copy, the copy's items attribute still referred to the original, frozen items array, and the append failed.

require "ice_nine"
ShoppingList = Struct.new(:name, :items)

MASTER_LIST = ShoppingList.new("Master List", [
                                 "Bread",
                                 "Milk",
                                 "Beer"])
IceNine.deep_freeze(MASTER_LIST)
today_list = MASTER_LIST.dup

today_list.items << "Fireworks" # ~> RuntimeError: can't modify frozen Array

# ~> RuntimeError
# ~> can't modify frozen Array
# ~>
# ~> xmptmp-in909276Q.rb:12:in `<main>'
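
To make the sharing visible without any freezing involved, here is a minimal sketch using only core Ruby. The local variable names are just for illustration. It shows that dup gives us a new struct whose members still point at the very same objects as the original.

```ruby
ShoppingList = Struct.new(:name, :items)

master = ShoppingList.new("Master List", ["Bread", "Milk", "Beer"])
copy = master.dup

# dup creates a new struct, but its members still refer to the same objects.
copy.equal?(master)             # => false
copy.items.equal?(master.items) # => true

# Appending through the copy therefore changes the original as well.
copy.items << "Fireworks"
master.items # => ["Bread", "Milk", "Beer", "Fireworks"]
```

With an unfrozen original, the shallow copy silently leaks changes back into the master list. With a frozen one, we get the error shown above instead.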

In that episode, we addressed this by adding an initialize_copy method to our shopping list class. As we saw in Episode #486, Ruby invokes this method anytime an object is duplicated or cloned.

Inside the method, we explicitly made duplicates of each of the shopping list attributes.

With this change in place, we were then able to modify the nested items array of the shopping list copy.

require "ice_nine"
ShoppingList = Struct.new(:name, :items) do
  def initialize_copy(original)
    self.name = original.name.dup
    self.items = original.items.dup
  end
end

MASTER_LIST = ShoppingList.new("Master List", [
                                 "Bread",
                                 "Milk",
                                 "Beer"])
IceNine.deep_freeze(MASTER_LIST)
today_list = MASTER_LIST.dup

today_list.name = "July 4th Shopping"
today_list.items << "Fireworks"

today_list
# => #<struct ShoppingList
#     name="July 4th Shopping",
#     items=["Bread", "Milk", "Beer", "Fireworks"]>
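
One caveat worth noting: items.dup copies the array, but the strings inside it are still the original frozen ones. If we also wanted to edit an existing entry in place, we would have to go one level deeper. The following is a sketch of that idea, not the code from the earlier episode. It freezes everything by hand so that it runs without any gems.

```ruby
ShoppingList = Struct.new(:name, :items) do
  def initialize_copy(original)
    self.name = original.name.dup
    # Copy the array and each string in it, one level deeper than Array#dup.
    self.items = original.items.map(&:dup)
  end
end

master = ShoppingList.new("Master List", ["Bread", "Milk", "Beer"])
# Freeze the whole tree by hand (standing in for IceNine.deep_freeze).
master.name.freeze
master.items.each(&:freeze)
master.items.freeze
master.freeze

today_list = master.dup
today_list.items.first << " (whole wheat)"

today_list.items.first # => "Bread (whole wheat)"
master.items.first     # => "Bread"
```

Every extra level of nesting needs another line like this, which is part of why hand-written initialize_copy methods get tedious.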

This seems like a lot of effort to go to just to create a deep copy of an object. Surely there's an easier way.

Well, if you're working in a Rails application and ActiveSupport is available, you can try using the deep_dup method that it adds to objects.

However, in my experience this method is not always reliable. In fact, it doesn't work for this example. It fails to deeply duplicate the attributes of a Ruby Struct: deep_dup recurses into arrays and hashes, but for other kinds of object it falls back to a plain dup.

require "ice_nine"
require "active_support/core_ext/object"

ShoppingList = Struct.new(:name, :items)

MASTER_LIST = ShoppingList.new("Master List", [
                                 "Bread",
                                 "Milk",
                                 "Beer"])

IceNine.deep_freeze(MASTER_LIST)
today_list = MASTER_LIST.deep_dup

today_list.name = "July 4 Shopping"
today_list.items << "Fireworks" # ~> RuntimeError: can't modify frozen Array

MASTER_LIST.items
# =>

# ~> RuntimeError
# ~> can't modify frozen Array
# ~>
# ~> xmptmp-in9092igX.rb:15:in `<main>'

There is another trick for creating deep copies of objects that doesn't require any extra libraries at all.

We can use Ruby's Marshal module to dump a serialized version of the master list. Normally, we'd write the resulting data to disk or to a database field. But in this case, we immediately reconstitute the serialized data using Marshal.load.

The net result of this dump-and-load procedure is a deeply copied object graph. We can see this when we execute the code, and compare the items in the copied list to the items in the master list.

require "ice_nine"

ShoppingList = Struct.new(:name, :items)

MASTER_LIST = ShoppingList.new("Master List", [
                                 "Bread",
                                 "Milk",
                                 "Beer"])

IceNine.deep_freeze(MASTER_LIST)
today_list = Marshal.load(Marshal.dump(MASTER_LIST))

today_list.name = "July 4 Shopping"
today_list.items << "Fireworks"

today_list.items
# => ["Bread", "Milk", "Beer", "Fireworks"]

MASTER_LIST.items
# => ["Bread", "Milk", "Beer"]
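
If we find ourselves doing this often, the round trip can be wrapped in a small helper. This is a minimal sketch; deep_copy is a name made up for this example, not something Ruby or any gem provides. It also checks two things the output above only implies: nested objects are distinct, and the copy does not inherit the frozen state.

```ruby
# Hypothetical helper name; not part of Ruby or any gem.
def deep_copy(object)
  Marshal.load(Marshal.dump(object))
end

ShoppingList = Struct.new(:name, :items)

master = ShoppingList.new("Master List", ["Bread", "Milk", "Beer"])
# Freeze by hand so the sketch needs no gems.
master.items.each(&:freeze)
master.items.freeze
master.freeze

copy = deep_copy(master)

copy.items.equal?(master.items)             # => false
copy.items.first.equal?(master.items.first) # => false
copy.frozen?                                # => false
copy.items.frozen?                          # => false
```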

For many simple kinds of objects, this technique works well, although you may see performance issues with it at scale.

But there are some objects that Marshal is unable to dump and load.

As a somewhat contrived example, imagine that we were to add a lambda to the master shopping list.

When we execute the code this time, we get an error. That's because Ruby doesn't know how to serialize a Proc object, which is what the lambda syntax creates.

require "ice_nine"

ShoppingList = Struct.new(:name, :items)

MASTER_LIST = ShoppingList.new("Master List", [
                                 "Bread",
                                 "Milk",
                                 "Beer",
                                 ->{["Pie", "Cake", "Cookies"].sample(1)}])

IceNine.deep_freeze(MASTER_LIST)
today_list = Marshal.load(Marshal.dump(MASTER_LIST)) # ~> TypeError: no _dump_data is defined for class Proc

today_list.name = "July 4 Shopping"
today_list.items << "Fireworks"

MASTER_LIST.items
# =>

# ~> TypeError
# ~> no _dump_data is defined for class Proc
# ~>
# ~> xmptmp-in27080eMR.rb:12:in `dump'
# ~> xmptmp-in27080eMR.rb:12:in `<main>'
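
If we want to use the Marshal trick but can't be sure what a given object graph contains, one option is to rescue the TypeError. This is just a sketch; try_marshal_copy is a made-up name, and returning nil on failure is an arbitrary choice for illustration. It would be ambiguous if nil were itself a valid input.

```ruby
# Hypothetical helper: returns a deep copy made via Marshal, or nil when the
# object graph contains something Marshal cannot serialize (such as a Proc).
def try_marshal_copy(object)
  Marshal.load(Marshal.dump(object))
rescue TypeError
  nil
end

try_marshal_copy(["Bread", "Milk"])       # => ["Bread", "Milk"]
try_marshal_copy(["Bread", -> { "Pie" }]) # => nil
```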

For a more comprehensive deep duplication solution, we can turn to the duplicate gem, by Adam Luzsi.

We use the duplicate method on the Duplicate module, passing in the object to be copied.

This method happily copies our shopping list, without choking on the Proc object. The resulting output shows that both the master and the duplicate lists contain the lambda, but only the duplicate includes our added item.

require "ice_nine"
require "duplicate"

ShoppingList = Struct.new(:name, :items)

MASTER_LIST = ShoppingList.new("Master List", [
                                 "Bread",
                                 "Milk",
                                 "Beer",
                                 ->{["Pie", "Cake", "Cookies"].sample(1)}])

IceNine.deep_freeze(MASTER_LIST)
today_list = Duplicate.duplicate(MASTER_LIST)

today_list.name = "July 4 Shopping"
today_list.items << "Fireworks"

today_list.items
# => ["Bread",
#     "Milk",
#     "Beer",
#     #<Proc:0x00000002590d38@xmptmp-in9092XRr.rb:10 (lambda)>,
#     "Fireworks"]

MASTER_LIST.items
# => ["Bread",
#     "Milk",
#     "Beer",
#     #<Proc:0x00000002592458@xmptmp-in9092XRr.rb:10 (lambda)>]

The duplicate gem includes smart handling for non-copyable objects in Ruby, and as a result it performs deep copies of pretty much any object we throw at it.
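
To give a feel for what handling non-copyable objects can mean, here is a rough sketch of a hand-rolled recursive copier. To be clear, this is not how the duplicate gem is implemented. naive_deep_copy is a made-up name, and the sketch makes simplifying assumptions: no cyclic references, no keyword_init structs, and anything it doesn't recognize (including procs) is shared rather than copied.

```ruby
# Hypothetical helper, NOT the duplicate gem's implementation.
def naive_deep_copy(object)
  case object
  when Array
    object.map { |element| naive_deep_copy(element) }
  when Hash
    object.each_with_object({}) do |(key, value), copy|
      copy[naive_deep_copy(key)] = naive_deep_copy(value)
    end
  when Struct
    # Assumes a positional (non-keyword_init) struct.
    object.class.new(*object.to_a.map { |value| naive_deep_copy(value) })
  when String
    object.dup
  else
    # Procs, numbers, symbols, nil, and anything unrecognized are shared.
    object
  end
end

ShoppingList = Struct.new(:name, :items)

dessert = -> { ["Pie", "Cake", "Cookies"].sample }
master = ShoppingList.new("Master List", ["Bread", "Milk", dessert])
master.items.freeze
master.freeze

today_list = naive_deep_copy(master)
today_list.items << "Fireworks"

today_list.items.size               # => 4
master.items.size                   # => 3
today_list.items[2].equal?(dessert) # => true
```

A real library has many more cases to cover, which is a good argument for reaching for one instead of growing a helper like this.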

Now, I'll be honest: I only found out about this gem as a result of the research for this episode, so it's not one that I've been using for a long time. However, it's the best off-the-shelf solution for deep copying in Ruby that I've been able to locate.

Happy hacking!
