Ruby Proxies for Scale and Monitoring

Lift the curtain behind any modern web application and you will find at least a few proxy servers orchestrating the show. Caching proxies such as Varnish and Squid help us take the load off our application servers; reverse proxies such as HAProxy and Nginx help us partition and distribute the workload to multiple workers, all without revealing the underlying architecture to the user. In the Ruby world, Rack middleware and Rails Metal are sister concepts: both allow the programmer to inject functionality in the pre- or post-processing step of the HTTP request.

However, nobody said that we should limit ourselves to HTTP, or that the proxy server has to be transparent to the user! After all, there are plenty of other use cases for a proxy in our infrastructure: intercepting data, validating requests, benchmarking, logging, etc. In fact, a proxy server can be a powerful Swiss Army knife in the right hands. Want to intercept SMTP traffic to detect spam? Maybe encrypt or decrypt a data stream on the fly? It’s all surprisingly simple with Ruby.

Proxy language: Transparent, Intercepting, Cut-through ...

Proxy servers can be placed at numerous points between the user and the destination; they can be chained, and they can even alter the data at will. A transparent proxy will not modify the request or response and is commonly used for load balancing, authentication, or validation. On the other hand, an intercepting proxy is often used to modify the request or response to provide some added service to the user or the architect: transforming data on the fly, encryption, extending a protocol, etc. Needless to say, intercepting proxies are a wonderful tool!
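
To make the distinction concrete, here is a minimal sketch built on nothing but Ruby's standard socket library (it is not EM-Proxy code). The `proxy_once` helper and its block-based hook are hypothetical names chosen for illustration: called without a block, the response passes through untouched (transparent); called with a block, the response is rewritten on its way back to the client (intercepting). It assumes a single connection and a request that fits in one read.

```ruby
require "socket"

# Hypothetical helper, for illustration only: accept one client on `listener`,
# forward its request to the upstream server, and relay the reply back.
# Assumes the whole request arrives in a single read of up to 4096 bytes.
def proxy_once(listener, upstream_host, upstream_port)
  client = listener.accept
  request = client.readpartial(4096)

  upstream = TCPSocket.new(upstream_host, upstream_port)
  upstream.write(request)
  upstream.close_write            # signal end of request to the upstream
  response = upstream.read        # read until the upstream closes

  # Intercepting mode: let the caller rewrite the response.
  # Transparent mode (no block): forward the bytes as they are.
  response = yield(response) if block_given?

  client.write(response)
ensure
  upstream&.close
  client&.close
end
```

Something like `proxy_once(TCPServer.new("127.0.0.1", 9000), "127.0.0.1", 9001)` would relay one request transparently, while passing a block such as `{ |body| body.gsub("staging", "production") }` would turn the same code into an intercepting proxy. The ports and the substitution are placeholders.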

Ruby Proxy: Duplex Benchmarking

With minimal overhead, a proxy server can allow us to duplicate a request to multiple servers, for example to production and staging. Instead of the "record and replay headache" we can equip ourselves with a real-time performance debugging and monitoring tool: a request gets forked, but only the production response is forwarded back to the client. From there, we can analyze the response time in real time, compare the response bodies, or even alter the data at will:

options = {
  :proxy => {:port => 9000, :host => "10.2.1.0"},
  :production => {:port => 9000, :host => "10.2.1.1"},
  :benchmark => {:port => 9000, :host => "10.2.1.2"}
}

EventMachine.run do
  EventMachine::ProxyServer.new(options).start
end
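
Stripped of the networking, the duplex idea fits in a few lines. The sketch below is not how EM-Proxy is implemented; it models each backend as a plain callable (request in, response out) so that it runs without a network, and `duplex` and `timed_call` are hypothetical names. Both backends receive the same request, both are timed, and only the production body is handed back.

```ruby
# Hypothetical helper: call a backend and record how long it took.
def timed_call(backend, request)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  body = backend.call(request)
  { body: body, seconds: Process.clock_gettime(Process::CLOCK_MONOTONIC) - started }
end

# Fork one request to two backends (modeled here as callables, an assumption
# made for the sake of a runnable example). Only the production body is
# returned as the response; the benchmark result is kept for comparison.
def duplex(request, production:, benchmark:)
  prod_thread  = Thread.new { timed_call(production, request) }
  bench_thread = Thread.new { timed_call(benchmark, request) }

  prod  = prod_thread.value
  bench = bench_thread.value

  {
    response: prod[:body],
    production_seconds: prod[:seconds],
    benchmark_seconds: bench[:seconds],
    bodies_match: prod[:body] == bench[:body]
  }
end
```

For simplicity this version waits for both backends before returning. A real duplex proxy would forward the production response as soon as it arrives and collect the benchmark result in the background, so a slow staging server never delays the client.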

em-proxy - Ruby EventMachine Duplex Proxy

EM-Proxy is a barebones proxy implemented with Ruby EventMachine, which uses the reactor pattern for handling network connections. In the simplest proxy implementation, the performance overhead is roughly 3-5% of added latency - a very low cost for the added functionality. Best of all, it is only ~300 lines of Ruby start to finish, and easily extensible. Take a look through the source: it's a powerful and wonderful hammer!
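
For readers new to the reactor pattern, the toy below shows the core loop: a single thread waits on many IO objects at once and dispatches to a handler whenever one becomes readable. It is a bare-bones illustration built on `IO.select` from the standard library, not EventMachine's implementation, and `TinyReactor` is a made-up name.

```ruby
# A toy reactor, for illustration only: one loop, many IO objects, and one
# callback per IO that fires whenever data is ready to be read.
class TinyReactor
  def initialize
    @handlers = {}
  end

  # Register an IO object together with the block that receives its data.
  def watch(io, &on_data)
    @handlers[io] = on_data
  end

  # Run until every watched IO has reached end-of-file.
  def run
    until @handlers.empty?
      readable, = IO.select(@handlers.keys)
      readable.each do |io|
        begin
          @handlers[io].call(io.read_nonblock(4096))
        rescue IO::WaitReadable
          next                      # spurious wakeup: nothing to read yet
        rescue EOFError
          @handlers.delete(io)      # peer closed: stop watching this IO
          io.close
        end
      end
    end
  end
end
```

Because no handler ever blocks waiting for a slow peer, one process can juggle many connections, which is what keeps the overhead of an evented proxy low.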

Updated slides from RailsConf 2009 and updated code for EM-Proxy on GitHub.


Ilya Grigorik

Ilya Grigorik is a web performance engineer and developer advocate at Google, where his focus is on making the web fast and driving adoption of performance best practices at Google and beyond.