Driving Google Chrome via WebSocket API

Instrumenting a browser is not for the faint of heart: half a dozen different APIs, different IPC mechanisms, and different capabilities from each vendor. Projects like WebDriver try to abstract this complexity for us, and you can also find dozens of other "headless" drivers leveraging WebKit or similar engines. There is now even a W3C WebDriver spec in the works.

Instrumenting Google Chrome

However, while creating a generic solution is a hard task, it turns out that instrumenting Chrome is a breeze, as I recently discovered while investigating some network latency questions. As of version 18, Chrome supports v1.0 of the Remote Debugging Protocol, which exposes the full capabilities of the browser via a regular WebSocket!

$> "/Applications/Path To/Google Chrome" --remote-debugging-port=9222 # OSX
$> curl localhost:9222/json
[ {
   "devtoolsFrontendUrl": "/devtools/devtools.html?host=localhost:9222&page=1",
   "faviconUrl": "",
   "thumbnailUrl": "/thumb/chrome://newtab/",
   "title": "New Tab",
   "url": "chrome://newtab/",
   "webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/1"
} ]

First, we enable remote debugging on Chrome (off by default). From there, Chrome exposes an HTTP handler, which allows us to inspect all of the open tabs. Each tab is an isolated process and hence gets its own WebSocket endpoint, whose URL is given by the webSocketDebuggerUrl key. With that, let's put it all together:

require 'em-http'
require 'faye/websocket'
require 'json'

EM.run do
  # Chrome runs an HTTP handler listing available tabs
  conn = EM::HttpRequest.new('http://localhost:9222/json').get
  conn.callback do
    resp = JSON.parse(conn.response)
    puts "#{resp.size} available tabs, Chrome response: \n#{resp}"

    # connect to first tab via the WS debug URL
    ws = Faye::WebSocket::Client.new(resp.first['webSocketDebuggerUrl'])
    ws.onopen = lambda do |event|
      # once connected, enable network tracking
      ws.send JSON.dump({id: 1, method: 'Network.enable'})

      # tell Chrome to navigate to twitter.com and look for "chrome" tweets
      ws.send JSON.dump({
        id: 2,
        method: 'Page.navigate',
        params: {url: 'http://twitter.com/#!/search/chrome?q=chrome&' + rand(100).to_s}
      })
    end

    ws.onmessage = lambda do |event|
      # print event notifications from Chrome to the console
      p [:new_message, JSON.parse(event.data)]
    end
  end
end

In this example we tell Chrome to enable network tracking and notifications, and then tell it to perform a Twitter search. With that, Chrome will forward us dozens of network notifications: the initial page fetch, notifications for each subresource, XHRs, and so on (e.g., the Network.responseReceived event). In fact, if you leave the page running, you will also see the long-poll events firing to fetch the latest tweets. Tons of information, all at your disposal.
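Since every notification arrives as a JSON message on the same socket, it can help to separate replies to our own commands (which echo the id we sent) from events (which carry a method name). The following is a minimal sketch of that idea, not part of the original example: classify_message and response_summary are hypothetical helper names, and the sketch assumes that a Network.responseReceived event carries a response object with url and status fields.

```ruby
require 'json'

# Hypothetical helper: sorts a raw protocol message into a reply or an event.
# Replies echo the "id" of the command we sent; events carry a "method" name.
def classify_message(raw)
  msg = JSON.parse(raw)
  if msg.key?('id')
    [:reply, msg['id'], msg['result'] || msg['error']]
  elsif msg.key?('method')
    [:event, msg['method'], msg['params'] || {}]
  else
    [:unknown, nil, msg]
  end
end

# Hypothetical helper: pulls the URL and status out of a
# Network.responseReceived event, and returns nil for anything else.
def response_summary(raw)
  kind, name, params = classify_message(raw)
  return nil unless kind == :event && name == 'Network.responseReceived'

  response = params.fetch('response', {})
  { url: response['url'], status: response['status'] }
end
```

Inside the ws.onmessage handler, one could call response_summary(event.data) and print only the non-nil results, which cuts the firehose down to one line per fetched resource.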

Remote Debugging (and more) with Chrome

The example above illustrates a very simple interaction with the Network API, but the protocol exposes much more. You can drive the JS debugger, control the V8 VM, modify and inspect the DOM, and track Timeline events, amongst half a dozen other capabilities. Finally, while driving a desktop browser is cool, driving a browser on your phone is even nicer: Chrome for Android provides all the same capabilities.
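Commands for these other domains follow the same shape as the Network.enable and Page.navigate calls used earlier: an id, a method, and optional params. As a rough sketch, a small builder can hand out unique ids so that replies can be matched back to the command that triggered them. CommandBuilder is a hypothetical name introduced here for illustration; Runtime.evaluate (with an expression parameter) and DOM.getDocument are methods from the protocol's Runtime and DOM domains.

```ruby
require 'json'

# Hypothetical helper: builds protocol commands with unique, increasing ids
# so that each reply can be matched to the command that produced it.
class CommandBuilder
  def initialize
    @next_id = 0
  end

  # Returns a JSON string ready to pass to ws.send.
  def build(method, params = nil)
    @next_id += 1
    command = { id: @next_id, method: method }
    command[:params] = params if params
    JSON.dump(command)
  end
end

commands = CommandBuilder.new
# Ask the page's JS runtime to evaluate an expression.
evaluate = commands.build('Runtime.evaluate', { expression: 'document.title' })
# Ask for the root DOM node of the current page.
get_document = commands.build('DOM.getDocument')
```

Each resulting string can be sent with ws.send once the socket is open, exactly like the hand-written commands in the example above.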

Ilya Grigorik is a web ecosystem engineer, author of High Performance Browser Networking (O'Reilly), and Principal Engineer at Shopify.