Crystal
Video transcript & code
Ruby is a dynamic language, which gives it a great deal of flexibility and power. But this dynamism means that Ruby code tends to run more slowly than code in statically-compiled languages. It also means that many errors in coding will only be detected when the code is actually executed. This is one reason the Ruby community emphasizes having lots of automated tests.
Here's a Ruby program which takes a word on the command line, and looks for anagrams of that word using the system dictionary. The details of how this code works aren't important. What's important to understand is that it first builds a lookup table where words have been sorted out by the letters they contain. Then it looks up the given word in that table.
table = Hash.new { |h,k|
h[k] = []
}
IO.foreach("/usr/share/dict/words") do |line|
word = line.chomp
key = word.downcase.chars.sort.join
table[key] << word
end
word = ARGV[0].downcase
anagrams = table[word.chars.sort.join]
anagrams.map!(&:downcase)
anagrams.delete(word)
if anagrams.any?
puts "Anagrams for '#{word}': #{anagrams.join(", ")}"
else
puts "Sorry, no anagrams for '#{word}'"
end
Building the lookup table is a fairly processor-intensive task. To run the whole program, including building the table and then looking up a word, takes more than half a second on this machine.
This may not seem like too terribly long a time. But if we're going to take anagram-finding webscale and win a series-A funding round for anagrams-as-a-service, we need all the performance we can get.
One tried-and-true strategy for speeding up CPU-intensive code is to simply re-write it in a statically-typed, compiled language. Examples of fast, static languages we might consider include C, C++, Java, Scala, Haskell, Go, or Rust. Of course, to do that we have to first know one of those languages. And then we have to switch gears and re-write everything using the alternative language's syntax and semantics.
Still, no pain no gain. Let's go ahead and re-write this program from beginning to end in a statically-typed, compiled programming language.
table = Hash(String,Array(String)).new { |h,k|
h[k] = [] of String
}
File.each_line("/usr/share/dict/words") do |line|
word = line.chomp
key = word.downcase.chars.sort.join
table[key] << word
end
word = ARGV[0].downcase
anagrams = table[word.chars.sort.join]
anagrams.map!(&.downcase)
anagrams.delete(word)
if anagrams.any?
puts "Anagrams for '#{word}': #{anagrams.join(", ")}"
else
puts "Sorry, no anagrams for '#{word}'"
end
OK, there, we're done. That didn't seem so bad, honestly!
Believe it or not, this is not a prank episode. This new version of the code is a valid program in a language called Crystal. Crystal is a compiled, statically-typed programming language which has been designed to look and feel almost exactly like programming in Ruby.
Let's go ahead and compile this code to an executable, and then see how fast it is.
$ crystal build --release anagrams.cr $ time ./anagrams avdi Anagrams for 'avdi': avid, diva real 0m0.230s user 0m0.305s sys 0m0.017s
That's a pretty significant speed-up!
Let's quickly go over the changes we had to make.
First, in order to set up the word lookup table, we had to add some type annotation to the Hash and to the code that fills new hash entries with default empty arrays.
Variable and argument types in Crystal are not optional, but they are inferred. This means that in many, even most cases, Crystal is able to deduce an appropriate type or union of types for variables and methods. So most of the time, we can happily write code as if we are using Ruby's un-typed variables and the Crystal compiler works out what the types ought to be.
However, in some cases the compiler needs some help. In this case, we are setting up some empty data structures. Crystal needs to know what kinds of data will be going into those structures, but it can't deduce that information when it has no example values to guess from. So in this case, we have to supply some explicit type annotations.
Next up, we had to replace IO.foreach
with File.each_line
. This is just a difference in the core libraries between Ruby and Crystal.
Finally, instead of using Ruby's symbol-to-proc to treat a method name as a block, we use special Crystal syntax for the same operation. Personally, I kinda prefer Crystal's way to Ruby's way.
And that's it. Everything else stayed the same. But by inferring rigid data type information for every method call, Crystal was able to generate native machine code that was significantly better optimized than Ruby's dynamic execution.
Of course, speed isn't the only consideration when choosing a statically-compiled language. Like Google's Go language, Crystal can produce redistributable executables. It's very easy to link directly to C libraries in Crystal. And as with other statically-typed languages, Crystal can warn us about some programming errors before we ever run our program.
For instance, let's say we accidentally forgot to rejoin our key characters into a string after sorting them. In Ruby, we wouldn't find out about this problem until we saw the programming mysteriously failing to find any anagrams because it was comparing a string key to array-of-character keys.
table = Hash(String,Array(String)).new { |h,k|
h[k] = [] of String
}
File.each_line("/usr/share/dict/words") do |line|
word = line.chomp
key = word.downcase.chars.sort.join
table[key] << word
end
word = ARGV[0].downcase
anagrams = table[word.chars.sort]
anagrams.map!(&.downcase)
anagrams.delete(word)
if anagrams.any?
puts "Anagrams for '#{word}': #{anagrams.join(", ")}"
else
puts "Sorry, no anagrams for '#{word}'"
end
But in Crystal, as soon as we try to build the program, it fails with a type error. It might take us a little sleuthing to figure out exactly what the Crystal compiler is trying to tell us, but it has prevented a bug before the code was ever executed.
$ crystal build anagrams.cr Error in ./anagrams.cr:11: instantiating 'Hash(String, Array(String))#[](Array(Char))' anagrams = table[word.chars.sort] ^ in /opt/crystal/src/hash.cr:71: instantiating 'fetch(Array(Char))' fetch(key) ^~~~~ in /opt/crystal/src/hash.cr:83: instantiating 'fetch(Array(Char))' fetch(key) do ^~~~~ in /opt/crystal/src/hash.cr:83: instantiating 'fetch(Array(Char))' fetch(key) do ^~~~~ in /opt/crystal/src/hash.cr:85: no overload matches '(Hash(String, Array(String)), String -> Array(String))#call' with types Hash(String, Array(String)), Array(Char) Overloads are: - (Hash(String, Array(String)), String -> Array(String))#call(arg0 : Hash(String, Array(S tring)), arg1 : String) block.call(self, key) ^~~~
Crystal is a very young language. It is still at an alpha stage of development, which means that it's still very much in flux. But from what I've seen, I think Crystal is a language worth keeping your eye on.
Whether for performance, safety, or redistributability, it sometimes becomes necessary to re-write Ruby code in a static language. As Crystal stabilizes it may become a viable candidate language for such re-writes. One with a much lower barrier to entry than any of the other options out there.
So download it, and give it a shot when you get a chance. Happy hacking!
Responses