Nginx & Comet: Low Latency Server Push

Server push is the most efficient, lowest-latency way to exchange data. If both the publisher and the receiver are publicly visible, then a protocol such as PubSubHubbub or a simpler Webhook will do the job. However, if the receiver is hidden behind a firewall or a NAT, or is a web browser, which is designed to generate outbound requests rather than handle incoming traffic, then the implementation gets harder. If you are adventurous, you could set up a ReverseHTTP server. If you are patient, you could wait for the WebSocket API in HTML5. And if you need an immediate solution, you could compromise: instead of a fully asynchronous push model, you could use Comet, also known as Reverse Ajax, HTTP Server Push, or HTTP Streaming.

Coined by Alex Russell in early 2006, Comet is an umbrella term for technologies which take advantage of persistent connections initiated by the client and either kept open until data is available (long polling), or kept open indefinitely while data is pushed to the client in chunks (streaming). The immediate advantage of both techniques is that the client and server can communicate with minimal latency. For this reason, Comet is widely deployed in chat applications (Facebook, Google, Meebo, etc.), and is also commonly used as a firehose delivery mechanism.

Converting Nginx into a Long Polling Comet Server

A major barrier to Comet adoption is the implicit requirement for specialized, event-driven web servers capable of efficiently handling large numbers of long-polling connections. FriendFeed's Tornado server is a good example of an app-level server that meets the criteria. However, thanks to Leo Ponomarev's efforts, you can now also turn your Nginx server into a fully functional Comet server with the nginx_http_push_module plugin.

Instead of using a custom framework, Leo's plugin exposes two endpoints on your Nginx server: one for the subscribers, and one for the publisher. The clients open long-polling connections to a channel on the Nginx server and start waiting for data. Meanwhile, the publisher simply POSTs the data to Nginx, and the plugin does all the heavy lifting for you by distributing the data to the waiting clients. This means that the publisher never actually serves the data directly: it is simply an event generator! It is hard to make it any simpler than that.
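To make the flow concrete, here is a minimal in-memory sketch of the same idea in plain Ruby. It is only a toy model, not the plugin's implementation: the ToyPushHub class and its methods are hypothetical names invented for illustration, and callbacks stand in for open long-polling requests.

```ruby
# Toy in-memory model of the publish/subscribe flow described above.
# NOT the plugin's implementation; all names here are hypothetical.
class ToyPushHub
  def initialize
    # channel id => callbacks standing in for parked long-poll requests
    @waiting = Hash.new { |hash, key| hash[key] = [] }
  end

  # A subscriber "opens a long-polling connection": it parks a callback
  # on the channel and waits for data.
  def subscribe(channel, &on_message)
    @waiting[channel] << on_message
  end

  # The publisher "POSTs" a message: every parked subscriber receives it
  # and its request is completed, so it has to subscribe again afterwards.
  # Returns the number of subscribers that received the message.
  def publish(channel, message)
    listeners = @waiting.delete(channel) || []
    listeners.each { |listener| listener.call(message) }
    listeners.size
  end
end

hub = ToyPushHub.new
received = []
hub.subscribe("pub") { |msg| received << "A got #{msg}" }
hub.subscribe("pub") { |msg| received << "B got #{msg}" }

delivered = hub.publish("pub", "hello")
puts "delivered to #{delivered} subscribers"
puts received
```

Note that the publisher only hands a message to the hub and never talks to a subscriber itself, which is the "event generator" role described above.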

Best of all, it only gets better from here. Both the client and the publisher can create arbitrary channels, and the plugin is also capable of message queuing, which means that the Nginx server will store intermediate messages if the client is offline. Queued messages can be expired based on time, size of the waiting stack, or through a memory limit.
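The queueing behavior can be pictured with a small Ruby sketch. This is an illustration under stated assumptions, not the plugin's actual data structure: ToyMessageBuffer and its parameters are hypothetical, and it models only two of the expiry rules mentioned above (message age and queue length).

```ruby
# Toy model of a per-channel message queue with age and length limits.
# Hypothetical names; the real plugin manages its own storage inside Nginx.
class ToyMessageBuffer
  Entry = Struct.new(:body, :stored_at)

  def initialize(max_length:, ttl_seconds:)
    @max_length = max_length
    @ttl_seconds = ttl_seconds
    @entries = []
  end

  # Store a message, then drop the oldest entries beyond the length limit.
  def push(body, now: Time.now)
    @entries << Entry.new(body, now)
    @entries.shift while @entries.size > @max_length
    self
  end

  # Return the bodies still alive at `now`, discarding expired entries.
  def messages(now: Time.now)
    @entries.reject! { |entry| now - entry.stored_at > @ttl_seconds }
    @entries.map(&:body)
  end
end

start = Time.at(0)
buffer = ToyMessageBuffer.new(max_length: 2, ttl_seconds: 60)
buffer.push("m1", now: start)
buffer.push("m2", now: start + 30)
buffer.push("m3", now: start + 50)   # "m1" is evicted by the length limit

p buffer.messages(now: start + 70)   # => ["m2", "m3"]
p buffer.messages(now: start + 100)  # => ["m3"] ("m2" is older than 60s)
```

The timestamps are passed in explicitly so that the example is deterministic; a client that was offline between pushes would simply find the surviving messages waiting when it reconnects.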

Configuring Nginx & Ruby Demo

To get started, you will have to build Nginx from source. Unpack the source tree, grab the plugin repo from GitHub, and then build the server with the push module (./configure --add-module=/path/to/plugin && make && make install). Next, consult the readme and the protocol files to learn about all the available options. A simple multi-client broadcast configuration looks like the following:

# internal publish endpoint (keep it private / protected)
location /publish {
  set $push_channel_id $arg_id;      #/?id=239aff3 or somesuch
  push_publisher;

  push_store_messages on;            # enable message queueing
  push_message_timeout 2h;           # expire buffered messages after 2 hours
  push_max_message_buffer_length 10; # store 10 messages
  push_min_message_recipients 0;     # minimum recipients before purge
}

# public long-polling endpoint
location /activity {
  push_subscriber;

  # how multiple listener requests to the same channel id are handled
  # - last: only the most recent listener request is kept, 409 for others.
  # - first: only the oldest listener request is kept, 409 for others.
  # - broadcast: any number of listener requests may be long-polling.
  push_subscriber_concurrency broadcast;
  set $push_channel_id $arg_id;
  default_type  text/plain;
}

Once you have the Nginx server up and running, we can set up a simple broadcast scenario with a single publisher and several subscribers to test-drive our new Comet server:

require 'rubygems'
require 'em-http'

def subscribe(opts)
  # pass only the request headers to em-http; opts itself is left intact
  # so that opts[:channel] is still available below
  request_opts = {:head => opts[:head]}

  listener = EventMachine::HttpRequest.new('http://127.0.0.1/activity?id=' + opts[:channel]).get request_opts
  listener.callback do
    # print received message, re-subscribe to channel with
    # the last-modified header to avoid duplicate messages
    puts "Listener received: " + listener.response + "\n"

    modified = listener.response_header['LAST_MODIFIED']
    subscribe({:channel => opts[:channel], :head => {'If-Modified-Since' => modified}})
  end
end

EventMachine.run do
  channel = "pub"

  # Publish new message every 5 seconds
  EM.add_periodic_timer(5) do
    time = Time.now
    body = {:body => "Hello @ #{time}"}

    publisher = EventMachine::HttpRequest.new('http://127.0.0.1/publish?id='+channel).post body
    publisher.callback do
      puts "Published message @ #{time}"
      puts "Response code: " + publisher.response_header.status.to_s
      puts "Headers: " + publisher.response_header.inspect
      puts "Body: \n" + publisher.response
      puts "\n"
    end
  end

  # open two listeners (aka broadcast/pubsub distribution)
  subscribe(:channel => channel)
  subscribe(:channel => channel)
end
nginx-push.zip - Full Nginx Config + Ruby client

In the script above, a publisher emits a new event to our Nginx server every five seconds, and Nginx in turn pushes the data to two subscribers which have long-polling connections open and are waiting for data. Once the message is sent to each subscriber, Nginx closes their connections, and the clients immediately re-establish them to wait for the next available message. The end result: a real-time message push between the publisher and the clients via Nginx!
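The re-subscribe step relies on the client echoing the Last-Modified value of the previous response back as If-Modified-Since. The following sketch models that cursor in plain Ruby. It is a simplified illustration, not the plugin's logic: the Message struct and next_message helper are hypothetical, and timestamps are plain integers rather than HTTP dates.

```ruby
# Toy model of how an If-Modified-Since cursor avoids duplicate messages.
# Hypothetical names; timestamps are plain integers for readability.
Message = Struct.new(:body, :last_modified)

# Return the oldest queued message newer than the client's cursor,
# or nil when there is nothing new and the client would keep waiting.
def next_message(queue, if_modified_since)
  queue.find do |message|
    if_modified_since.nil? || message.last_modified > if_modified_since
  end
end

queue = [Message.new("first", 100), Message.new("second", 105)]

cursor = nil   # the first request carries no If-Modified-Since header
seen = []
while (message = next_message(queue, cursor))
  seen << message.body
  cursor = message.last_modified  # echoed back on the next request
end

p seen  # => ["first", "second"]
```

In the demo script, the modified variable plays the role of cursor: each new request tells the server which message the client saw last, so a reconnecting client neither receives a message twice nor skips one that was queued in the meantime.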

Long Polling, Streaming, and Comet in Production

Leo's module is still very young and under active development, but it is definitely one to keep an eye on. The upcoming release is focused on bug fixes, but looking ahead there are also plans to add a streaming protocol: instead of closing the connection after every message (as long polling does), Nginx would keep it open and stream the incoming events to the clients as chunks of data in real time. Having such an option would make it ridiculously easy to deploy your own firehose APIs (e.g., Twitter streaming).

Last but not least, don't forget about the growing number of other available modules for Nginx, or if you are so inclined, get a head start on building your own by reading Evan Miller's great guide on the subject.


Ilya Grigorik

Ilya Grigorik is a web performance engineer and developer advocate at Google, where his focus is on making the web fast and driving adoption of performance best practices at Google and beyond.