In Progress
Unit 1, Lesson 21
In Progress

Crystal

Video transcript & code

Ruby is a dynamic language, which gives it a great deal of flexibility and power. But this dynamism means that Ruby code tends to run more slowly than code in statically-compiled languages. It also means that many errors in coding will only be detected when the code is actually executed. This is one reason the Ruby community emphasizes having lots of automated tests.

Here's a Ruby program which takes a word on the command line, and looks for anagrams of that word using the system dictionary. The details of how this code works aren't important. What's important to understand is that it first builds a lookup table where words have been sorted out by the letters they contain. Then it looks up the given word in that table.

table = Hash.new { |h,k|
  h[k] = []
}
IO.foreach("/usr/share/dict/words") do |line|
  word = line.chomp
  key  = word.downcase.chars.sort.join
  table[key] << word
end

word     = ARGV[0].downcase
anagrams = table[word.chars.sort.join]
anagrams.map!(&:downcase)
anagrams.delete(word)
if anagrams.any?
  puts "Anagrams for '#{word}': #{anagrams.join(", ")}"
else
  puts "Sorry, no anagrams for '#{word}'"
end

Building the lookup table is a fairly processor-intensive task. To run the whole program, including building the table and then looking up a word, takes more than half a second on this machine.

This may not seem like too terribly long a time. But if we're going to take anagram-finding webscale and win a series-A funding round for anagrams-as-a-service, we need all the performance we can get.

One tried-and-true strategy for speeding up CPU-intensive code is to simply re-write it in a statically-typed, compiled language. Examples of fast, static languages we might consider include C, C++, Java, Scala, Haskell, Go, or Rust. Of course, to do that we have to first know one of those languages. And then we have to switch gears and re-write everything using the alternative language's syntax and semantics.

Still, no pain no gain. Let's go ahead and re-write this program from beginning to end in a statically-typed, compiled programming language.

table = Hash(String,Array(String)).new { |h,k|
  h[k] = [] of String
}
File.each_line("/usr/share/dict/words") do |line|
  word = line.chomp
  key  = word.downcase.chars.sort.join
  table[key] << word
end

word     = ARGV[0].downcase
anagrams = table[word.chars.sort.join]
anagrams.map!(&.downcase)
anagrams.delete(word)
if anagrams.any?
  puts "Anagrams for '#{word}': #{anagrams.join(", ")}"
else
  puts "Sorry, no anagrams for '#{word}'"
end

OK, there, we're done. That didn't seem so bad, honestly!

Believe it or not, this is not a prank episode. This new version of the code is a valid program in a language called Crystal. Crystal is a compiled, statically-typed programming language which has been designed to look and feel almost exactly like programming in Ruby.

Let's go ahead and compile this code to an executable, and then see how fast it is.

$ crystal build --release anagrams.cr
$ time ./anagrams avdi
Anagrams for 'avdi': avid, diva

real    0m0.230s
user    0m0.305s
sys     0m0.017s

That's a pretty significant speed-up!

Let's quickly go over the changes we had to make.

First, in order to set up the word lookup table, we had to add some type annotation to the Hash and to the code that fills new hash entries with default empty arrays.

Variable and argument types in Crystal are not optional, but they are inferred. This means that in many, even most cases, Crystal is able to deduce an appropriate type or union of types for variables and methods. So most of the time, we can happily write code as if we are using Ruby's un-typed variables and the Crystal compiler works out what the types ought to be.

However, in some cases the compiler needs some help. In this case, we are setting up some empty data structures. Crystal needs to know what kinds of data will be going into those structures, but it can't deduce that information when it has no example values to guess from. So in this case, we have to supply some explicit type annotations.

Next up, we had to replace IO.foreach with File.each_line. This is just a difference in the core libraries between Ruby and Crystal.

Finally, instead of using Ruby's symbol-to-proc to treat a method name as a block, we use special Crystal syntax for the same operation. Personally, I kinda prefer Crystal's way to Ruby's way.

And that's it. Everything else stayed the same. But by inferring rigid data type information for every method call, Crystal was able to generate native machine code that was significantly better optimized than Ruby's dynamic execution.

Of course, speed isn't the only consideration when choosing a statically-compiled language. Like Google's Go language, Crystal can produce redistributable executables. It's very easy to link directly to C libraries in Crystal. And as with other statically-typed languages, Crystal can warn us about some programming errors before we ever run our program.

For instance, let's say we accidentally forgot to rejoin our key characters into a string after sorting them. In Ruby, we wouldn't find out about this problem until we saw the programming mysteriously failing to find any anagrams because it was comparing a string key to array-of-character keys.

table = Hash(String,Array(String)).new { |h,k|
  h[k] = [] of String
}
File.each_line("/usr/share/dict/words") do |line|
  word = line.chomp
  key  = word.downcase.chars.sort.join
  table[key] << word
end

word     = ARGV[0].downcase
anagrams = table[word.chars.sort]
anagrams.map!(&.downcase)
anagrams.delete(word)
if anagrams.any?
  puts "Anagrams for '#{word}': #{anagrams.join(", ")}"
else
  puts "Sorry, no anagrams for '#{word}'"
end

But in Crystal, as soon as we try to build the program, it fails with a type error. It might take us a little sleuthing to figure out exactly what the Crystal compiler is trying to tell us, but it has prevented a bug before the code was ever executed.

$ crystal build anagrams.cr
Error in ./anagrams.cr:11: instantiating 'Hash(String, Array(String))#[](Array(Char))'

anagrams = table[word.chars.sort]
                ^

in /opt/crystal/src/hash.cr:71: instantiating 'fetch(Array(Char))'

    fetch(key)
    ^~~~~

in /opt/crystal/src/hash.cr:83: instantiating 'fetch(Array(Char))'

    fetch(key) do
    ^~~~~

in /opt/crystal/src/hash.cr:83: instantiating 'fetch(Array(Char))'

    fetch(key) do
    ^~~~~

in /opt/crystal/src/hash.cr:85: no overload matches '(Hash(String, Array(String)), String
-> Array(String))#call' with types Hash(String, Array(String)), Array(Char)
Overloads are:
 - (Hash(String, Array(String)), String -> Array(String))#call(arg0 : Hash(String, Array(S
tring)), arg1 : String)

        block.call(self, key)
              ^~~~

Crystal is a very young language. It is still at an alpha stage of development, which means that it's still very much in flux. But from what I've seen, I think Crystal is a language worth keeping your eye on.

Whether for performance, safety, or redistributability, it sometimes becomes necessary to re-write Ruby code in a static language. As Crystal stabilizes it may become a viable candidate language for such re-writes. One with a much lower barrier to entry than any of the other options out there.

So download it, and give it a shot when you get a chance. Happy hacking!

Responses