6 Optimization Tips for Ruby MRI

Every language has its idiosyncrasies, and Ruby is no different. Below is a collection of handy optimizations to keep in mind the next time you're working on a Ruby project. The majority of them appear to fall into the syntactic sugar category, but it's important to understand the motivation behind each, as there is usually a concrete reason for the performance difference. I should also say that this is primarily for Ruby 1.8, and you should benchmark these yourself if you're already running on the dev branch of Ruby 1.9.

Help the Ruby interpreter

This is no secret: the Ruby interpreter is slow. Advances on the JRuby and Maglev fronts are certainly showing some very appealing performance increases, but even there, a deeper understanding of the Ruby internals is bound to help you write better and faster code. The Ruby Wikipedia entry is a great place to start learning about some of the intricacies of Ruby. For a deeper look, Dr. Stefan Kaes's 'Writing Efficient Ruby Code' and David A. Black's 'Ruby for Rails' are must-haves for any Ruby developer.

1. Minimize searches in the abstract syntax tree

Ruby has the wonderful property of being highly dynamic, which in turn allows us to create all kinds of spectacular meta-programming scenarios. However, this comes at the price of minimal runtime and compile-time optimization (JRuby and some other VMs are changing this). Unlike most other languages, Ruby's MRI does not generate bytecode (Ruby 1.9 will change this), but relies solely on searching through the Abstract Syntax Tree (AST) for virtually every method call, variable, and so on. Sounds expensive? It is. Hence, the next time you're saving yourself three lines of code by writing a meta function, consider inlining the logic directly.
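To make the trade-off concrete, here is a minimal sketch that compares a small "meta" helper (dynamic dispatch through send) with the same logic written inline. The Widget class and its methods are hypothetical and exist only for this illustration; the size of the gap, if any, depends on your Ruby version, so treat the numbers as something to measure rather than assume.

```ruby
require 'benchmark'

# Hypothetical class, used only for illustration.
class Widget
  attr_reader :name, :price

  def initialize(name, price)
    @name = name
    @price = price
  end

  # "Meta" style: one generic helper that dispatches dynamically via send.
  def fetch(field)
    send(field)
  end

  def meta_label
    "#{fetch(:name)}: #{fetch(:price)}"
  end

  # Inlined style: the same logic written out directly.
  def label
    "#{@name}: #{@price}"
  end
end

widget = Widget.new('gear', 5)
n = 100_000

Benchmark.bm(8) do |x|
  x.report('meta')   { n.times { widget.meta_label } }
  x.report('inline') { n.times { widget.label } }
end
```

Both methods build the same string; only the route taken to get there differs.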

2. Optimize for Ruby cache, avoid expensive lookups

To optimize against expensive searches through the AST, Ruby keeps a cache of local variables and method signatures. Hence, comparatively speaking, local variables are cheap, so use them: var = "local variable, which is cached by Ruby, and requires a single lookup", whereas self.var = "a method call, which requires walking the AST, and results in multiple lookups".
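One rough way to check this on your own interpreter is to run the same loop twice, once through an accessor on self and once through a local variable. The Counter class below is a hypothetical example, not code from any library, and the relative cost will vary by Ruby version.

```ruby
require 'benchmark'

# Hypothetical class, used only for illustration.
class Counter
  attr_accessor :total

  def initialize
    @total = 0
  end

  # Calls the total and total= accessor methods on every iteration.
  def sum_with_accessor(n)
    self.total = 0
    n.times { |i| self.total = total + i }
    total
  end

  # Accumulates in a local variable and stores the result once at the end.
  def sum_with_local(n)
    sum = 0
    n.times { |i| sum += i }
    @total = sum
  end
end

counter = Counter.new
n = 100_000

Benchmark.bm(9) do |x|
  x.report('accessor') { counter.sum_with_accessor(n) }
  x.report('local')    { counter.sum_with_local(n) }
end
```

Both methods compute the same sum, so any timing difference comes from how the running total is read and written.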

3. Interpolation over Concatenation

This came as a surprise to me the first time I stumbled across it: interpolated strings are faster than straight concatenation! Chris Blackburn recently wrote a great blog post about it; make sure to read it.

puts "This string embeds #{var1} and #{var2} through interpolation"  # faster
puts "This string concatenates " << var1 << " and " << var2  # slower
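A quick sketch for measuring the two approaches yourself. The values of var1 and var2 are placeholders, and this version concatenates with + rather than << so that no string literal is mutated in place, which is an assumption made to keep the example safe on newer Rubies.

```ruby
require 'benchmark'

# Placeholder values, chosen only for this illustration.
var1 = 'foo'
var2 = 'bar'
n = 100_000

# Both expressions produce the same string.
interpolated = "This string embeds #{var1} and #{var2}"
concatenated = 'This string embeds ' + var1 + ' and ' + var2

Benchmark.bm(14) do |x|
  x.report('interpolation') { n.times { "This string embeds #{var1} and #{var2}" } }
  x.report('concatenation') { n.times { 'This string embeds ' + var1 + ' and ' + var2 } }
end
```

Interpolation also calls to_s on each value for you, whereas concatenation raises a TypeError if var1 or var2 is not already a String.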

4. When possible, use destructive operations!

Many Ruby operations have a destructive equivalent (ex: gsub vs gsub!), which unfortunately is not used very often, largely due to somewhat erratic behavior: sometimes these operations return nil, sometimes they don't. However, non-destructive operations (ex: gsub) duplicate your object and hence incur an expensive copy operation every time. Learn the non-nil, destructive operations and use them religiously.

hash = {}
hash = hash.merge({1 => 2}) # duplicates the original hash
hash.merge!({1 => 2}) # equivalent to previous line, and faster

str = "string to gsub"
str = str.gsub(/to/, 'copy') # duplicates the string and reassigns it
str.gsub!(/to/, 'copy') # same effect, but no object duplication
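The nil behavior mentioned above is easy to see in a short sketch. The variable names are illustrative only, and the strings are dup'ed so the example also works when string literals are frozen.

```ruby
str = 'string to gsub'.dup

first  = str.gsub!(/to/, 'copy')  # a substitution happened: returns str itself
second = str.gsub!(/to/, 'copy')  # nothing left to replace: returns nil

# merge! returns the receiver even when nothing new is added.
hash = {}
merged = hash.merge!(1 => 2)

# Because gsub! may return nil, chaining on its result is risky:
#   title.gsub!(/x/, 'y').upcase!   # NoMethodError when nothing matches
# Calling the destructive methods one per line avoids the problem.
title = 'ruby tips'.dup
title.gsub!(/x/, 'y')
title.upcase!
```

In short, use the return value of a bang method only when you know it cannot be nil; otherwise rely on the mutated receiver.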

5. Symbol.to_proc fan? Use blocks!

If you're a Rails developer, you've probably used Symbol.to_proc. Well, you're likely incurring an order of magnitude speed decrease when you do! Next time, use a block:

@widget_ids = @widgets.map(&:id) # Symbol.to_proc method, order of magnitude slower...
@widget_ids = @widgets.inject([]) {|ids, w| ids.push(w.id)} # same effect, not as pretty, but much faster
@widget_ids = @widgets.collect {|w| w.id } # faster, and simpler than inject
@widget_ids = @widgets.map {|w| w.id } # yet another (faster) way to tackle the problem
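Since the size of this gap depends heavily on the interpreter and on where Symbol#to_proc comes from (a library implementation versus a built-in one), it is worth measuring on your own setup. The sketch below uses a hypothetical Item struct as a stand-in for the widgets above.

```ruby
require 'benchmark'

# Hypothetical stand-in for the widget objects.
Item = Struct.new(:id)
items = (1..1_000).map { |i| Item.new(i) }

# Both forms return the same array of ids.
with_symbol = items.map(&:id)
with_block  = items.map { |item| item.id }

n = 1_000
Benchmark.bm(8) do |x|
  x.report('&:id')  { n.times { items.map(&:id) } }
  x.report('block') { n.times { items.map { |item| item.id } } }
end
```

If the two timings come out close on your Ruby, the prettier form costs you nothing; if not, the explicit block is the safer default in hot paths.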

6. Benchmark everything!

Create a macro, stash it away in an easy-to-access area, or post the code for the benchmark skeleton on your wall. Anytime I have a question about Ruby performance, the answer is always less than 30 seconds away:

require 'benchmark'

n = 100000
Benchmark.bm do |x|
   x.report('copy') { n.times do ; h = {}; h = h.merge({1 => 2}); end }
   x.report('no copy') { n.times do ; h = {}; h.merge!({1 => 2}); end }
end

#          user        system      total        real
# copy     0.460000   0.180000   0.640000 (  0.640692)
# no copy  0.340000   0.120000   0.460000 (  0.463339)
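If you would rather keep the skeleton as a reusable helper, one possible shape is sketched below. The compare method and its arguments are hypothetical names invented for this example; it uses Benchmark.bmbm from the standard library, which runs a rehearsal pass before the measured pass to reduce warm-up and garbage collection noise.

```ruby
require 'benchmark'

# Hypothetical helper: runs each labelled lambda n times and prints a report.
# Returns the array of Benchmark::Tms results from the measured pass.
def compare(n, cases)
  width = cases.keys.map { |label| label.length }.max
  Benchmark.bmbm(width) do |x|
    cases.each do |label, work|
      x.report(label) { n.times { work.call } }
    end
  end
end

results = compare(10_000,
  'copy'    => lambda { h = {}; h = h.merge(1 => 2) },
  'no copy' => lambda { h = {}; h.merge!(1 => 2) }
)
```

Passing lambdas adds a small constant call overhead to every case, so the helper is best for comparing alternatives against each other rather than for absolute timings.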

Your favorite optimization tip missing from the list? Leave a comment, let everyone know!

Ilya Grigorik is a web ecosystem engineer, author of High Performance Browser Networking (O'Reilly), and Principal Engineer at Shopify.