PStore

Ah, the TODO list. Everyone writes a TODO list app sooner or later. Here are the beginnings of another one. The top-level class is called Lister. It has two nested Struct-based classes, List and Task. A List may have zero or more Tasks.

class Lister
  List = Struct.new(:name, :tasks) do
    def initialize(name)
      super(name, [])
    end

    def add_task(name)
      task = Task.new(name)
      yield task if block_given?
      tasks << task
      task
    end
  end

  Task = Struct.new(:name, :status) do
    def initialize(name)
      super(name, :todo)
    end
  end

  attr_reader :lists

  def initialize
    @lists = []
  end

  def add_list(name)
    list = List.new(name)
    yield list if block_given?
    lists << list
    list
  end
end

We can create a TODO list by first instantiating a Lister object, then adding a list. Then we populate the list with tasks.

require "./lister"

lister = Lister.new
lister.add_list("ship chores") do |list|
  list.add_task("Early rise (3:00PM)")
  list.add_task("Waffles vindaloo with Kryten")
  list.add_task("Call Rimmer a smeghead")
end

This is all fine and good, but it's not very helpful if we can't make these tasks persistent somehow. We need a way to serialize this tree of objects and then read it back in later on. Since this is just a private TODO list we'll be keeping on our local machine, a database is overkill. We just need something that will write to a file.

You might be thinking about acronyms like XML, JSON, or YAML right now. Maybe also about how to teach these objects to serialize themselves to one of these formats, and then load themselves back in from a file.

But hold up for a second. Ruby is all about making the easy things really, really easy. And for this common scenario of saving a set of objects to a file and then reloading them later, there is a way that is very easy indeed.

We require the pstore standard library. We create a new PStore object, supplying the name of the file we want to save our TODO lists in. Then we begin a PStore transaction. Within the transaction, the block receives the PStore object. We use this object as if it were a Hash, assigning our lister object to a key of the same name.

require "./lister"

lister = Lister.new
lister.add_list("ship chores") do |list|
  list.add_task("Early rise (3:00PM)")
  list.add_task("Waffles vindaloo with Kryten")
  list.add_task("Call Rimmer a smeghead")
end

require "pstore"
store = PStore.new("todo.pstore")

store.transaction do |s|
  s["lister"] = lister
end

Later, we instantiate another PStore object. We start a new transaction, and within it we return the value associated with the lister key. When we inspect this value, we see our TODO list is all there. The top-level Lister object, the List, and the Tasks have all been restored exactly as they were when we saved them.

require "./seed.rb"

store = PStore.new("todo.pstore")
lister = store.transaction do |s|
  s["lister"]
end

lister
# => #<Lister:0x0000000109eb00
#     @lists=
#      [#<struct Lister::List
#        name="ship chores",
#        tasks=
#         [#<struct Lister::Task name="Early rise (3:00PM)", status=:todo>,
#          #<struct Lister::Task
#           name="Waffles vindaloo with Kryten",
#           status=:todo>,
#          #<struct Lister::Task name="Call Rimmer a smeghead", status=:todo>,
#          #<struct Lister::Task name="Golf with the Cat", status=:todo>,
#          #<struct Lister::Task name="Play guitar", status=:todo>]>]>
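To see what "restored exactly as they were" means in practice, here is a minimal self-contained sketch. It uses a hypothetical top-level Task struct standing in for Lister::Task, and a throwaway temporary directory instead of todo.pstore. The restored object compares equal to the original, but it is a brand-new object, because each transaction re-reads the file.

```ruby
require "pstore"
require "tmpdir"
require "fileutils"

# Hypothetical stand-in for Lister::Task, so the sketch runs on its own.
Task = Struct.new(:name, :status)

dir      = Dir.mktmpdir
store    = PStore.new(File.join(dir, "todo.pstore"))
original = Task.new("Call Rimmer a smeghead", :todo)

store.transaction { |s| s["task"] = original }

# #transaction returns whatever its block returns; each transaction
# re-reads the file, so this builds a fresh object graph.
restored = store.transaction(true) { |s| s["task"] }

p restored == original      # => true  (Struct#== compares member values)
p restored.equal?(original) # => false (a different object in memory)

FileUtils.remove_entry(dir)
```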

What PStore has done here is a deep serialization of our object tree. Under the hood it uses Ruby's built-in Marshal library, which can serialize almost any Ruby object to a form suitable for storage in a file; procs and IO objects such as open files are among the few exceptions. It started with the lister object and followed references to other objects until it had the complete graph of objects to be stored.
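PStore does its serializing through Ruby's built-in Marshal module, so Marshal's rules are PStore's rules. This sketch calls Marshal directly on a small graph of hypothetical structs (standing in for Lister::List and Lister::Task) to show two of those rules: an object reachable from two places comes back as one shared object, and some objects, such as procs, can't be serialized at all.

```ruby
# Hypothetical structs standing in for Lister::List and Lister::Task.
List = Struct.new(:name, :tasks)
Task = Struct.new(:name, :status)

# One Task object reachable from two different lists.
shared = Task.new("Golf with the Cat", :todo)
graph  = [List.new("ship chores", [shared]), List.new("weekend", [shared])]

copy = Marshal.load(Marshal.dump(graph))

# The link is preserved: both copied lists point at the same copied Task.
same_task = copy[0].tasks.first.equal?(copy[1].tasks.first)
p same_task # => true

# A proc anywhere in the graph makes the whole dump fail.
dump_error =
  begin
    Marshal.dump(List.new("doomed", [proc { :smeghead }]))
    nil
  rescue TypeError => e
    e
  end
p dump_error.class # => TypeError
```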

As you've seen, in order to persist this object tree, we had to assign it to a key. In PStore, these top-level keys are known as "roots". We can ask a PStore to list all the roots it knows about, check whether a particular root exists, and delete roots.

require "./seed.rb"

store = PStore.new("todo.pstore")
store.transaction do |s|
  s.roots                       # => ["lister"]
  s.root?("lister")             # => true
  s.root?("kochanski")          # => false
  s.delete("lister")            # => #<Lister:0x0000000191b8f0 @lists=[#<struct Lister::List name="ship chores", tasks=[#<struct Lister::Task name="Early rise (3:00PM)", status=:todo>, #<struct Lister::Task name="Waffles vindaloo with Kryten", status=:todo>, #<struct Lister::Task name="Call Rimmer a smeghead", status=:todo>, #<struct Lister::Task name="Golf with the Cat", status=:todo>, #<struct Lister::Task name="Play guitar", status=:todo>]>]>
  s.roots                       # => []
end
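Two more root-related behaviors are worth knowing. In this minimal sketch against a temporary store (the file name and root values are made up), reading a missing root with #[] quietly returns nil, while PStore#fetch accepts a default value, or raises PStore::Error when the root is absent and no default is given.

```ruby
require "pstore"
require "tmpdir"
require "fileutils"

dir   = Dir.mktmpdir
store = PStore.new(File.join(dir, "roots.pstore"))
store.transaction { |s| s["lister"] = ["ship chores"] }

missing, fallback, fetch_error = store.transaction(true) do |s|
  error =
    begin
      s.fetch("kochanski") # no default given
      nil
    rescue PStore::Error => e
      e
    end
  # [] returns nil for an unknown root; fetch can supply a default instead.
  [s["kochanski"], s.fetch("kochanski", :nobody), error]
end

p missing           # => nil
p fallback          # => :nobody
p fetch_error.class # => PStore::Error

FileUtils.remove_entry(dir)
```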

You've probably noticed that we use transactions a lot with PStore. We can only read from or write to a PStore in the context of a transaction. At the beginning of a transaction, all of the roots are read in from the storage file. We can then make changes to the objects stored in those roots. When the transaction ends, the roots are serialized again, using their new state, and written back to the file. We can see that if we start a new transaction, the change we made in the last transaction has been preserved.

Sometimes we know that we don't need, or want, to save any changes. We just want to read stored information, not update it. For these cases, we can pass a read_only flag to the #transaction method. If we change an object in place within a read-only transaction, the change will not be saved in the PStore. (Trying to assign or delete a root inside a read-only transaction raises a PStore::Error instead.)

require "./seed.rb"

store = PStore.new("todo.pstore")

store.transaction do |s|
  s["lister"].lists.first.add_task("Golf with the Cat")
end

store.transaction do |s|
  s["lister"].lists.first.tasks.last.name
  # => "Golf with the Cat"
end

read_only = true
store.transaction(read_only) do |s|
  s["lister"].lists.first.add_task("Play guitar")
end

store.transaction do |s|
  s["lister"].lists.first.tasks.last.name
  # => "Golf with the Cat"
end
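A transaction doesn't have to run to completion for its changes to be thrown away. This sketch, using a throwaway store with made-up contents, shows three ways a change fails to be saved: calling PStore#abort, letting an exception escape the block, and assigning a root in a read-only transaction, which raises PStore::Error.

```ruby
require "pstore"
require "tmpdir"
require "fileutils"

dir   = Dir.mktmpdir
store = PStore.new(File.join(dir, "tx.pstore"))
store.transaction { |s| s["tasks"] = ["Golf with the Cat"] }

# 1. PStore#abort ends the transaction immediately and discards changes.
store.transaction do |s|
  s["tasks"] << "Play guitar"
  s.abort
end

# 2. An exception escaping the block also skips the write.
begin
  store.transaction do |s|
    s["tasks"] << "Play guitar"
    raise "smoke me a kipper"
  end
rescue RuntimeError => e
  puts "Transaction abandoned: #{e.message}"
end

# 3. Assigning a root in a read-only transaction raises PStore::Error.
read_only_error =
  begin
    store.transaction(true) { |s| s["tasks"] = [] }
    nil
  rescue PStore::Error => e
    e
  end

tasks = store.transaction(true) { |s| s["tasks"] }
p tasks                 # => ["Golf with the Cat"]
p read_only_error.class # => PStore::Error

FileUtils.remove_entry(dir)
```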

The principal use of read-only transactions is as an optimization. PStore allows only one writing transaction at a time, but any number of read-only transactions can run at once.

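This is true not only of threads in the same process, but of separate processes as well. PStore uses operating-system file-locking facilities (File#flock) to ensure that only one process writes to a PStore file at a time: a read-only transaction takes a shared lock, and a read-write transaction takes an exclusive lock. One caveat for threads: each PStore object also guards its own transactions with an internal mutex, so threads that want to read concurrently should each use their own PStore instance.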
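The shared-versus-exclusive behavior comes from the operating system's advisory file locks. This sketch exercises File#flock directly rather than going through PStore, and assumes a Linux or macOS system where each separately opened File handle holds its own lock. File::LOCK_SH and File::LOCK_EX request shared and exclusive locks; adding File::LOCK_NB makes a refused lock return false instead of waiting.

```ruby
require "tmpdir"
require "fileutils"

dir  = Dir.mktmpdir
path = File.join(dir, "write_read.pstore")
File.write(path, "")

# Three independent handles on one file: two "readers" and a "writer".
reader_a = File.open(path, "rb")
reader_b = File.open(path, "rb")
writer   = File.open(path, "r+b")

# Shared locks coexist with each other...
shared_a = reader_a.flock(File::LOCK_SH | File::LOCK_NB) # => 0
shared_b = reader_b.flock(File::LOCK_SH | File::LOCK_NB) # => 0

# ...but an exclusive lock is refused while any shared lock is held.
blocked = writer.flock(File::LOCK_EX | File::LOCK_NB)    # => false

# Once the readers release their locks, the writer gets in.
reader_a.flock(File::LOCK_UN)
reader_b.flock(File::LOCK_UN)
granted = writer.flock(File::LOCK_EX | File::LOCK_NB)    # => 0

[reader_a, reader_b, writer].each(&:close)
FileUtils.remove_entry(dir)
```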

We can see that with this simple script. Based on the command-line argument provided, it will either start a read-only transaction, or a read-write transaction. Inside the transaction, it waits 5 seconds before finishing.

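require "pstore"

# The first argument selects the mode: "read" (the default) or "write".
op        = ARGV[0] || "read"
read_only = op == "read"

store = PStore.new("write_read.pstore")
puts "PID #{$$} Before #{op}\n"
store.transaction(read_only) do |s|
  puts "PID #{$$} Inside #{op}\n"
  sleep 5 # hold the transaction, and its file lock, open for 5 seconds
end
puts "PID #{$$} Done #{op}\n"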

Here's a little helper shell script for testing purposes. It starts two copies of the write_read script, and waits for both to finish.

ruby write_read.rb $1 &
ruby write_read.rb $1 &
wait

When we run this script in read-only mode, we can see that both processes read simultaneously, and finish at roughly the same time.

$ sh write_read.sh read
PID 6781 Before read
PID 6780 Before read
PID 6781 Inside read
PID 6780 Inside read
PID 6781 Done read
PID 6780 Done read

But when we run it in read-write mode, we see that the first process gets exclusive access while it is still running. The second process only gets inside the transaction once the first is finished.

$ sh write_read.sh write
PID 6883 Before write
PID 6883 Inside write
PID 6884 Before write
PID 6883 Done write
PID 6884 Inside write
PID 6884 Done write
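The same experiment can be run from a single Ruby script by forking the two processes and timing them. This is a rough sketch: it assumes a platform with fork (so not Windows), uses a half-second hold instead of five seconds, and its exact timings will vary by machine. One detail it has to handle: a read-only transaction on a file that doesn't exist yet takes no lock at all, so the file is created first.

```ruby
require "pstore"
require "tmpdir"
require "fileutils"

# Runs two child processes that each hold a transaction open for `hold`
# seconds, and returns the wall-clock time until both have finished.
def time_two_processes(path, read_only, hold: 0.5)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  pids = Array.new(2) do
    fork do
      PStore.new(path).transaction(read_only) { sleep hold }
      exit!(0) # skip any at_exit handlers inherited from the parent
    end
  end
  pids.each { |pid| Process.wait(pid) }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

dir  = Dir.mktmpdir
path = File.join(dir, "write_read.pstore")
# Create the file so that read-only transactions really take a shared lock.
PStore.new(path).transaction { |s| s["ready"] = true }

read_time  = time_two_processes(path, true)
write_time = time_two_processes(path, false)

printf("two readers: %.2fs\n", read_time)  # roughly one hold period
printf("two writers: %.2fs\n", write_time) # roughly two hold periods

FileUtils.remove_entry(dir)
```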

For all that it is easy to use, PStore is not a universal persistence solution. Every transaction reads the entire contents of the pstore file, and every read-write transaction that changes anything writes the entire contents back out again. This means that for large quantities of data, PStore can be very inefficient. And since it locks at the granularity of the entire file, it doesn't handle contention between numerous processes very efficiently either.
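One way to see why large stores get slow: as an implementation detail of the standard library (not a documented file format), the pstore file is a single Marshal dump of a Hash holding every root. This sketch, using made-up roots in a temporary file, loads that Hash straight from disk, then appends to one small root and checks that the file changed size, because the whole Hash is dumped and written again.

```ruby
require "pstore"
require "tmpdir"
require "fileutils"

dir   = Dir.mktmpdir
path  = File.join(dir, "todo.pstore")
store = PStore.new(path)

store.transaction do |s|
  s["lister"]    = ["Early rise (3:00PM)", "Waffles vindaloo with Kryten"]
  s["kochanski"] = ["Ignore Lister"]
end

# Implementation detail, not a documented format: the whole file is one
# Marshal dump of a Hash mapping each root name to its value.
table = Marshal.load(File.binread(path))
p table.keys # => ["lister", "kochanski"]

# Appending to one small root means dumping and writing the whole Hash again.
size_before = File.size(path)
store.transaction { |s| s["kochanski"] << "Fix the skutters" }
grew = File.size(path) > size_before
p grew # => true

FileUtils.remove_entry(dir)
```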

On the other hand, it's dead-simple to use, we don't have to think about how to map our objects into a database schema, and it prevents accidental corruption caused by two processes accessing the same file at once. For persisting the state of simple command-line utilities, or for prototyping applications, PStore is very nearly perfect. Happy hacking!
