Unit 1, Lesson 21

Initialize Copy

Ruby provides built-in methods for copying objects, but they have some limitations that can catch you by surprise. In this episode you’ll learn how to customize the way Ruby clones and duplicates your objects, including some practical examples of why you might need to.

Video transcript & code

Whatever text editor you prefer to use for coding, it probably has a concept of a "buffer". A buffer is a holding area for text, usually but not always connected to a particular file name.

Here's an extremely rudimentary buffer class that we might create if we were writing an editor of our own.

This class can only manipulate the text it contains in one way: it can append new text to its existing contents.

This class also automatically saves a backup of the current buffer state in a temp file before appending new text.

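require "tempfile"

class Buffer
  attr_reader :contents

  def initialize
    @contents = ""
  end

  # Back up the current contents, then append the new text.
  def append(text)
    save_backup
    @contents << text
  end

  # Write the current contents to the start of the backup file and flush it to disk.
  def save_backup
    backup_file.rewind
    backup_file.write(@contents)
    backup_file.fsync
  end

  # Lazily create the backup temp file in the current directory on first use.
  def backup_file
    @backup_file ||= Tempfile.new(["backup", ".txt"], ".")
  end
end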

Let's put this class through its paces.

We'll append some initial text.

Then we'll check the current contents.

Then we'll append more text.

We can see that the content has changed.

And that the backup file contains the previous version of the buffer.

require "./buffer"

b = Buffer.new
b.append("Hello, ")
b.contents
# => "Hello, "
b.append("World")
b.contents
# => "Hello, World"
File.read(b.backup_file.path)
# => "Hello, "

One feature that is often useful in an editor is the ability to duplicate an existing buffer.

Ruby objects have this ability built in. In fact, they have two methods for creating copies of themselves: dup and clone. But as we saw in Episode #484, we always prefer the dup method unless we have a specific reason to use clone instead.

We now have a second buffer whose contents mirror those of the first.

Let's make two different edits: one to the original buffer and one to its duplicate.

And now, let's check their contents.

The result is probably not what we intended. We can see that both appends appear to have been applied to both buffers.

What has really happened here is that while we have two different buffer objects, they both reference the same string for their contents.

require "./buffer"

b = Buffer.new
b.append("Hello, ")
b2 = b.dup
b2.contents
# => "Hello, "

b.append("World")
b2.append("Lunch")

b.contents
# => "Hello, WorldLunch"
b2.contents
# => "Hello, WorldLunch"

b.contents.object_id            # => 25714040
b2.contents.object_id           # => 25714040

This is because when Ruby makes a copy of an object, it makes a shallow copy. It creates a new object, with instance variables that are exactly the same as the original object's instance variables. Since in Ruby variables hold references to objects rather than the objects themselves, the variables in the new duplicate object point to the same objects to which the original instance variables pointed.
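To see shallow copying in isolation, here is a minimal sketch using a hypothetical `Box` class that is not part of the episode's code. `equal?` reports whether two references point at the very same object, which makes the shared instance variable easy to spot.

```ruby
# Hypothetical class, used only to illustrate shallow copying.
class Box
  attr_reader :item

  def initialize(item)
    @item = item
  end
end

original = Box.new(+"apple")  # unary + guarantees an unfrozen string
copy = original.dup

copy.equal?(original)            # => false: two distinct Box objects
copy.item.equal?(original.item)  # => true: both @item variables reference one String

copy.item << " pie"              # mutate the string through the copy...
original.item                    # => "apple pie" (...and the original sees the change)
```

The two `Box` objects are distinct, but their `@item` variables reference one string, so mutating it through either object changes what both see.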

This is clearly not ideal behavior for a buffer class. We want duplicate buffers to contain a copy of the original contents. Fortunately, Ruby gives us a tool to customize the way that objects are copied.

Just as we can define an initializer for our classes, we can also define an initialize_copy method.

This method receives a reference to the original object from which the copy is being made.

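Then, inside the method, we add the code we want to run whenever an object of this class is copied. By the time initialize_copy is called, Ruby has already copied the original's instance variables into the new object, so @contents still refers to the very same string as the original's. We replace it with a duplicate of that string, so the new buffer gets contents of its own.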

When we run our experiment code again, we see that this time it works the way we expected.

After new text is appended, the original and duplicate buffers have differing contents.

require "./buffer"

class Buffer
  # ...
  def initialize_copy(original)
    @contents = @contents.dup
  end
  # ...
end

b = Buffer.new
b.append("Hello, ")
b2 = b.dup
b2.contents
# => "Hello, "

b.append("World")
b2.append("Lunch")

b.contents
# => "Hello, World"
b2.contents
# => "Hello, Lunch"
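Why does `@contents.dup` work, rather than something like `original.contents.dup`? As a sketch of the ordering, assuming a hypothetical `Note` class, the hook below records whether `@text` still pointed at the original's string at the moment `initialize_copy` started running.

```ruby
# Hypothetical class, used only to show when instance variables are copied.
class Note
  attr_reader :text, :shared_on_entry

  def initialize(text)
    @text = text
  end

  def initialize_copy(original)
    # Ruby has already copied @text, so it still points at the original's String.
    @shared_on_entry = @text.equal?(original.text)
    # Give the copy a String of its own.
    @text = @text.dup
  end
end

note = Note.new(+"draft")
copy = note.dup

copy.shared_on_entry          # => true
copy.text.equal?(note.text)   # => false
copy.text << " v2"
note.text                     # => "draft"
```

Both spellings would produce the same result here; `@contents.dup` simply relies on the instance variables having been copied before the hook runs.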

However, we're not quite done sorting out bugs related to buffer duplication.

After we give our two buffers divergent contents, let's add one more piece of text to the original buffer.

Remember that these buffer objects save their previous value in a temporary file before each append. So, what do you think the backup file for the original buffer should contain right now?

Remember that the original buffer contained the text "Hello, World" before we added some extra text onto the end. So in theory the backup should just contain the text "Hello, World".

And that's exactly what it does contain.

Meanwhile, the last change we made to the duplicate buffer was to add the word "Lunch". So its backed-up value should be the text "Hello, ".

Instead, what we find is that it now has the exact same backup value as the original buffer.

This is because, just like with the contents of the buffer, the same temp file object is now shared between the two buffers.

This is a serious bug. We now have a copied buffer overwriting the backup file for its parent buffer.

require "./buffer"

class Buffer
  # ...
  def initialize_copy(original)
    @contents = @contents.dup
  end
  # ...
end

b = Buffer.new
b.append("Hello, ")
b2 = b.dup
b2.contents
# => "Hello, "

b.append("World")
b2.append("Lunch")
b.append(". What's up?")

File.read(b.backup_file.path)
# => "Hello, World"
File.read(b2.backup_file.path)
# => "Hello, World"

b.backup_file
# => #<File:./backup20170517-17888-kefb5q.txt>
b2.backup_file
# => #<File:./backup20170517-17888-kefb5q.txt>

Let's take a look at where the backup_file object comes from.

This attribute is a little different from the contents attribute: it is lazily initialized the first time it is referenced.

It doesn't make sense to handle this attribute the same way we handled the contents attribute, because we don't want a copy of the original's temp file object; we want a separate temp file. So instead of duplicating the @backup_file instance variable, we clear it by assigning it a value of nil. That way, the next time the backup_file method is invoked on the copied object, it will be reinitialized with a fresh Tempfile object attached to a new, unique temporary file.
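The reset-to-nil technique works for any lazily initialized attribute. Here is a minimal sketch with a hypothetical `Report` class, where a plain `Object.new` stands in for the Tempfile so the example touches no files.

```ruby
# Hypothetical class; Object.new stands in for Buffer's Tempfile.
class Report
  # Lazily build the resource on first use, as Buffer#backup_file does.
  def resource
    @resource ||= Object.new
  end

  def initialize_copy(original)
    # Drop the copied reference so the copy lazily builds its own resource.
    @resource = nil
  end
end

report = Report.new
first = report.resource
copy = report.dup

copy.resource.equal?(first)    # => false: the copy built a new resource
report.resource.equal?(first)  # => true: the original keeps its own
```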

Let's re-execute our example code.

This time we can see that the backup files for the two buffers have different contents.

And so are the temporary file names.

require "./buffer"

class Buffer
  # ...
  def initialize_copy(original)
    @contents = @contents.dup
    @backup_file = nil
  end
  # ...
end

b = Buffer.new
b.append("Hello, ")
b2 = b.dup
b2.contents
# => "Hello, "

b.append("World")
b2.append("Lunch")
b.append(". What's up?")

File.read(b.backup_file.path)
# => "Hello, World"
File.read(b2.backup_file.path)
# => "Hello, "

b.backup_file
# => #<File:./backup20170517-2460-1ap5cvw.txt>
b2.backup_file
# => #<File:./backup20170517-2460-hdcdn8.txt>

Now if you recall Episode #484, you might be wondering if this initialize_copy method applies only to objects copied with dup, and not objects copied with clone. Well, let's find out.

We'll change the dup to clone, and then re-execute.

b2 = b.clone

The results are unchanged. Everything still works correctly. This shows that initialize_copy is invoked regardless of whether we use dup or clone to copy an object.

And now we can see why the method is called initialize_copy instead of initialize_dup or initialize_clone.

It's because this special Ruby callback method is invoked for both dup and clone operations.
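Here is a small sketch that checks this directly, using a hypothetical `Tracked` class whose `initialize_copy` simply records the object it was copied from.

```ruby
# Hypothetical class that remembers its source whenever it is copied.
class Tracked
  attr_reader :copied_from

  def initialize_copy(original)
    @copied_from = original
  end
end

orig       = Tracked.new
dup_copy   = orig.dup
clone_copy = orig.clone

dup_copy.copied_from.equal?(orig)    # => true: dup ran initialize_copy
clone_copy.copied_from.equal?(orig)  # => true: clone ran it too
orig.copied_from                     # => nil: the original was never copied into
```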

But what if we want our copy customization to apply only to dup operations? What if we want to preserve Ruby's original, default behavior for clone?

In that case, instead of using initialize_copy, we can use initialize_dup.

As the name suggests, this callback only applies to dup.

Let's test this. After setting up an initial buffer, we'll create a duplicate of it and a clone of it.

If we append some more text to the duplicate, then compare its contents to the original, we can see that they differ.

But when we append text to the clone and then compare its contents to the original, we can see that the clone and the original share a contents string. Our copy customization was not applied.

require "./buffer"

class Buffer
  # ...
  def initialize_dup(original)
    @contents = @contents.dup
    @backup_file = nil
  end
  # ...
end

b = Buffer.new
b.append("Hello, ")
b_dup = b.dup
b_clone = b.clone

b_dup.append("World")
b.contents                      # => "Hello, "
b_dup.contents                  # => "Hello, World"
b_clone.append("darkness my old friend")
b.contents                      # => "Hello, darkness my old friend"
b_clone.contents                # => "Hello, darkness my old friend"

As you might expect, there is also a matching initialize_clone method we can define if we want to customize only the way objects are cloned.

class Buffer
  # ...
  def initialize_dup(original)
    @contents = @contents.dup
    @backup_file = nil
  end

  def initialize_clone(original)
    # ...
  end
  # ...
end
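As a sketch of what an `initialize_clone`-only customization looks like, here is a hypothetical `Tally` class (not from the episode) that gives clones their own array while leaving `dup` with Ruby's default shallow copy.

```ruby
# Hypothetical class: only clones get a private copy of @counts.
class Tally
  attr_reader :counts

  def initialize
    @counts = []
  end

  def initialize_clone(original)
    super                  # keep Ruby's default clone behavior
    @counts = @counts.dup  # then give the clone its own array
  end
end

t = Tally.new
t.counts << 1
cloned = t.clone
duped  = t.dup

cloned.counts.equal?(t.counts)  # => false: the clone has its own array
duped.counts.equal?(t.counts)   # => true: dup still shares it
```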

One thing to bear in mind when defining either of these callback methods is that we should always call super in them.

class Buffer
  # ...
  def initialize_dup(original)
    super
    @contents = @contents.dup
    @backup_file = nil
  end

  def initialize_clone(original)
    super
    # ...
  end
  # ...
end

Why? Because Ruby's default implementations of these methods are what invoke initialize_copy. The real implementation is written in C, but it behaves roughly like this:

class Object
  def initialize_dup(original)
    initialize_copy(original)
  end

  def initialize_clone(original)
    initialize_copy(original)
  end
end

So unless we are using initialize_clone or initialize_dup specifically to prevent initialize_copy from being invoked, we need to remember to call super.
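To see what forgetting `super` costs, here is a minimal sketch with two hypothetical classes that record which hooks actually ran during a `dup`.

```ruby
# Hypothetical class whose initialize_dup calls super.
class WithSuper
  attr_reader :hooks

  def initialize_dup(original)
    @hooks = [:initialize_dup]
    super  # the default initialize_dup goes on to call initialize_copy
  end

  def initialize_copy(original)
    @hooks << :initialize_copy
  end
end

# Hypothetical class whose initialize_dup forgets to call super.
class WithoutSuper
  attr_reader :hooks

  def initialize_dup(original)
    @hooks = [:initialize_dup]
    # no super here, so initialize_copy below never runs on dup
  end

  def initialize_copy(original)
    @hooks << :initialize_copy
  end
end

WithSuper.new.dup.hooks     # => [:initialize_dup, :initialize_copy]
WithoutSuper.new.dup.hooks  # => [:initialize_dup]
```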

And now you know everything you need to know about customizing how Ruby copies objects. Happy hacking!

 


 

