Distributed Ruby Workers on EC2

Whether you're writing a Rails application or a straight Ruby library, every once in a while you will have to run a long computational task. In such cases, the relatively short HTTP request cycle and usability constraints create a number of problems for the developer: you can't keep the request (or the user) waiting until the job is done. However, as usual, Ruby offers us a solution: dRuby (Distributed Ruby, or DRb). Simply put, DRb allows you to interact with remote objects over TCP as if they were located right on your system. Hence, to avoid locking up our server, we will simply get another computer to perform the time-consuming task for us. Of course, now you're asking: where am I going to get another computer? Well, how about Amazon's EC2:

Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers.

But enough talking, let's get down to the code.
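Before bringing in BackgrounDRb, here is a minimal sketch of plain DRb from Ruby's standard library: a server publishes an object under a druby:// URI, and a client calls it through a DRbObject proxy. The Echo class is purely illustrative. Both halves run in one process on localhost only so the example is self-contained; DRb may shortcut calls to a URI it recognizes as its own, but across two machines only the URI would change.

```ruby
require 'drb'

# A plain Ruby object we want to expose over the network (illustrative).
class Echo
  def echo(string)
    "Echo back: #{string}"
  end
end

# Server side: publish the object. Port 0 asks the OS for a free port;
# DRb.uri then reports the actual URI, including that port.
DRb.start_service('druby://localhost:0', Echo.new)
uri = DRb.uri

# Client side: DRbObject is a proxy that forwards method calls to the URI.
remote = DRbObject.new_with_uri(uri)
reply = remote.echo('Hey DRb!')
puts reply # prints "Echo back: Hey DRb!"

DRb.stop_service
```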

BackgrounDRb and getting started on EC2

Instead of building our DRb solution from scratch, we will make use of Ezra's BackgrounDRb plugin. For a great explanation of it by the author himself, head to Introduction to BackgrounDRb. To speed things up, I will assume that you have your EC2 server running and that you are sitting in front of its console (via ssh). If you need help with these steps, check Amazon's Getting Started guide.

Setting up DRb on EC2

Depending on the image you booted, your server's hostname may or may not be set to its public hostname, and we need to fix this for a reason you will see shortly. A quick bash script will do the job:

#!/bin/bash
# Retrieve and store our public IP and hostname from the EC2 metadata service
wget -q -O /tmp/public-ip http://169.254.169.254/latest/meta-data/public-ipv4
wget -q -O /tmp/public-hostname http://169.254.169.254/latest/meta-data/public-hostname
# Apply the hostname now and record it in /etc/hostname (requires root)
hostname -F /tmp/public-hostname
echo $(hostname) > /etc/hostname
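If you would rather do the same lookup from Ruby, a minimal sketch with the standard library's Net::HTTP follows. It queries the same metadata URLs as the bash script; the fetch_metadata helper and its injectable fetcher are hypothetical, added only so the logic can be tried off EC2. It assumes the instance answers plain (IMDSv1-style) requests; instances configured to require IMDSv2 need a session token first, which this sketch does not handle.

```ruby
require 'net/http'
require 'uri'

# Base URL of the EC2 instance metadata service (same as the bash script).
METADATA_BASE = URI('http://169.254.169.254/latest/meta-data/')

# Fetch a single metadata key such as "public-hostname" and strip whitespace.
# The fetcher is injectable (hypothetical design choice); by default it
# performs a plain HTTP GET with Net::HTTP.get.
def fetch_metadata(key, fetcher: ->(uri) { Net::HTTP.get(uri) })
  fetcher.call(URI.join(METADATA_BASE.to_s, key)).strip
end

# On an instance you might then do (as root):
#   name = fetch_metadata('public-hostname')
#   system('hostname', name)
#   File.write('/etc/hostname', "#{name}\n")
```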

You can also run the bash commands by hand in your console. You don't really need the public IP, but I kept it for debugging purposes. After performing these steps, running 'hostname' on the command line should return your full ec2.xxxx.com handle. Now we're ready to install BackgrounDRb:

svn checkout http://svn.devjavu.com/backgroundrb/trunk backgroundrb
mkdir backgroundrb/workers

Next, we will create a dummy echo worker which will simply prefix our request string with "Echo back: " and store it as the final result. With a little imagination, you can extend this example to convert video files, create PDFs, and the list goes on!

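class EchoWorker < BackgrounDRb::Worker::Base
  # Receives the :args value the client passes to new_worker
  def do_work(string)
    logger.info("\tTest query: #{string}")
    # Store the reply so the client can fetch it later by job key
    results[:status] = "Echo back: #{string}"
    # The job is finished; remove this worker
    self.delete
  end
end

# Register the worker class with BackgrounDRb
EchoWorker.register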

Save the worker in your 'workers' directory (for example, as workers/echo_worker.rb) and we are almost done on the server side. To simplify the command line, we will also create a configuration file, server.conf, for our DRb server:

---
:port: 2000
:protocol: druby
:worker_dir: workers
:pool_size: 10
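The configuration file is plain YAML whose keys are Ruby symbols. As a rough illustration, the sketch below loads it with Ruby's YAML library and combines the protocol and port with a hostname into the URI a client would dial; the drb_uri helper is hypothetical and not part of BackgrounDRb. Newer Psych versions only load symbols when Symbol is explicitly permitted, hence the safe_load call.

```ruby
require 'yaml'

# The server.conf from above, inlined so the sketch is self-contained.
CONFIG_TEXT = <<~YAML
  ---
  :port: 2000
  :protocol: druby
  :worker_dir: workers
  :pool_size: 10
YAML

# Keys such as ":port" load as symbols (:port), so Symbol must be permitted.
config = YAML.safe_load(CONFIG_TEXT, permitted_classes: [Symbol])

# Hypothetical helper: build the URI a client would pass to DRbObject.new.
def drb_uri(config, host)
  "#{config[:protocol]}://#{host}:#{config[:port]}"
end

puts drb_uri(config, 'ec2-xx-xx-xx-xx.z-1.compute-1.amazonaws.com')
# prints druby://ec2-xx-xx-xx-xx.z-1.compute-1.amazonaws.com:2000
```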

We will run our server on port 2000 over the druby (TCP) protocol, and allow at most 10 workers. Now we're ready to launch the server. However, there is a small caveat you need to be aware of: internally, all EC2 servers are addressed via private, NAT'ed IP addresses. The servers are unaware of their public IPs and hence won't bind to them, but specifying their public hostname will do the trick, since from inside EC2 the public hostname resolves to the instance's private address:

script/backgroundrb start -- -c server.conf -h $(hostname)

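Voila, you have turned an EC2 box into a DRb server. That said, do check your log directory for any error messages! Also make sure your instance's security group allows inbound TCP connections on port 2000, or clients won't be able to reach the server.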
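Why does the hostname passed with -h matter to clients? When a DRb server hands back an object that is not copied to the client, the client receives a proxy carrying the URI the server was started with, and later calls on that proxy go to that URI. The sketch below shows this with plain DRb rather than BackgrounDRb, under stated assumptions: Job and Front are hypothetical classes, 'localhost' stands in for $(hostname), and the server runs in a forked child process (so a POSIX system is assumed) so that the client really talks to it over TCP.

```ruby
require 'drb'

# Including DRbUndumped means DRb never copies a Job to the client;
# the client gets a DRbObject proxy pointing back at the server instead.
class Job
  include DRbUndumped

  def status
    'done'
  end
end

# Front object published by the server.
class Front
  def initialize
    @jobs = [] # keep references so exported jobs aren't garbage-collected
  end

  def new_job
    job = Job.new
    @jobs << job
    job
  end
end

reader, writer = IO.pipe

# Server in a child process; 'localhost' stands in for the public EC2
# hostname passed with -h $(hostname).
server_pid = fork do
  reader.close
  DRb.start_service('druby://localhost:0', Front.new)
  writer.puts(DRb.uri) # report the URI (with the OS-chosen port) to the parent
  writer.close
  DRb.thread.join
end
writer.close
server_uri = reader.gets.chomp
reader.close

# Client side: new_job returns a proxy, not a copy of the Job.
job = DRbObject.new_with_uri(server_uri).new_job
advertised = job.__drburi # URI embedded in the proxy by the server
status = job.status       # this call is sent to the advertised URI

Process.kill(:TERM, server_pid)
Process.wait(server_pid)

puts advertised == server_uri # prints "true"
puts status                   # prints "done"
```

If the server had advertised an address the client cannot reach, the first call would still succeed but calls on returned proxies would fail, which is why binding to the public hostname is the safer choice.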

Connecting with a client

Connecting to our server is much simpler; in fact, we don't need any additional libraries. DRb is bundled with Ruby, so our client is a single file:

require 'drb'

# Proxy for the BackgrounDRb server running on our EC2 instance
remote_worker = DRbObject.new(nil, 'druby://ec2-xx-xx-xx-xx.z-1.compute-1.amazonaws.com:2000')

# Start an EchoWorker with our string; the server replies with a job key
key = remote_worker.new_worker(:class => :echo_worker, :args => 'Hey EC2!')
puts "Job key: #{key}"

# Fetch the :status result stored by the worker
# Circumventing a bug: http://backgroundrb.devjavu.com/ticket/50
result = remote_worker.worker(:backgroundrb_results).get_result(key, :status)
puts "\t Reply :: #{result}"

puts "Done"

Our standalone client creates a remote worker, stores the job key it gets back, and then asks the server for the result - simple as that. Rails developers have it even easier: install the same plugin in your application and follow the configuration instructions in BackgrounDRb's RDoc. Once it is configured, you will have access to a 'MiddleMan' object in your application to both create workers and retrieve their results! Executing our standalone client produces:

Job key: 21d52258feec520d3f911b323e1e8b20
	 Reply :: Echo back: Hey EC2!
Done
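The client asks for the result right after creating the worker, which presumably works here because the echo job finishes almost instantly. For a job that genuinely takes a while, the result may not be ready yet, so a client would poll with the job key until it appears. The sketch below shows that job-key-and-poll pattern in plain DRb; JobServer, submit, and result are hypothetical names, not BackgrounDRb's API, and the 0.1-second sleep stands in for real work.

```ruby
require 'drb'
require 'securerandom'

# Hypothetical job server illustrating the job-key pattern;
# this is NOT BackgrounDRb's API, just the same idea in plain DRb.
class JobServer
  def initialize
    @results = {}
    @lock = Mutex.new
  end

  # Start the work in a background thread and return a key immediately.
  def submit(string)
    key = SecureRandom.hex(16)
    Thread.new do
      sleep 0.1 # stand-in for a slow task
      @lock.synchronize { @results[key] = "Echo back: #{string}" }
    end
    key
  end

  # Returns nil until the job identified by key has finished.
  def result(key)
    @lock.synchronize { @results[key] }
  end
end

# Server and client share one process here only to stay self-contained.
DRb.start_service('druby://localhost:0', JobServer.new)
server = DRbObject.new_with_uri(DRb.uri)

key = server.submit('Hey EC2!')
puts "Job key: #{key}"

# Poll for the result, giving up after roughly five seconds.
deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + 5
reply = nil
until reply || Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
  reply = server.result(key)
  sleep 0.05 unless reply
end
puts "Reply :: #{reply || 'timed out'}"

DRb.stop_service
```

Polling keeps the client simple and works through NAT, since the server never has to call back into the client.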

Voila, now you can take advantage of on-demand, scalable EC2 infrastructure in your Ruby/Rails application!

Ilya Grigorik is a web ecosystem engineer, author of High Performance Browser Networking (O'Reilly), and Principal Engineer at Shopify.