Visual Database Explorer in Ruby

Last night I wrote a quick database explorer for an HCI project at U of Waterloo (screenshot above). I had a database of lead stories for BBC, Yahoo News and NY Times over a period spanning Sept. 2004 to Aug. 2005 and I wanted to juxtapose and visualize different queries. The concept is simple, given two words (ex: war / peace), I wanted to see some aggregate calculations over the available document corpus.

At first, I wanted to write about a great Ruby graphing library (Scruffy) I found, but later realized that I could nail three different concepts at once: writing a simple threaded web-server, interfacing with a database, and generating live graphs (SVG/XML). It's a nice trifecta and I learned a few nice tricks along the way, hopefully you'll pick up something as well. Let's get to it!

Building a threaded web-server in Ruby

This is a neat one, we're going to build a light-weight threaded web server! Why threaded? Well, primarily for reliability reasons in this case - if a thread crashes, our server is still operational because our parent thread is still listening, and in a development environment that could very helpful. (My code is not perfect,in fact, I know it will crash on some edge cases!)

require 'socket'

# HTML Formatting - Note: keep the header for XHTML/SVG support.
htmlHeader = <<-'EOS'
<?xml version="1.0" encoding="UTF-8"?>
<?DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd" type=""?>
<html xmlns='http://www.w3.org/1999/xhtml'>
<body>
<center>
EOS

# Open connection on port 9090
server = TCPServer.new(9090)
while true
    # Launch a new thread on every connection // accept stays blocked untill new connection arrives
    Thread.start(server.accept) do |session|
        request = session.gets

        # filter all non '/term1/term2' requests (ex: /favicon.ico)
        if not request.chomp.split[1].to_s =~ /^(\/(.*)\/(.*))/
            p "#{Time.now.strftime('[Request at %I:%M:%S]')} Malformed request: #{request}"
            session.close
            next
        end

    # We need to send the HTTP header to let the client know we're alive
    #   - We will be sending XML/SVG graphs later, hence application/xhtml+xml header
    session.print "HTTP/1.1 200/OK\rContent-type: application/xhtml+xml\r\n\r\n"
    session.print htmlHeader

    # Extract the request words from the request string
    # Ex:  "GET /war/peace HTTP/1.1\r\n"
    #      1) split by ' ' (space) and take the second item in array [1]
    #      2) split request string by '/' char, and take the 1st and 2nd item in array as terms
    terms = request.split[1].split("/")[1,2]

    # Log request to console
    p "#{Time.now.strftime('[Request at %I:%M:%S]')} #{terms.join(' ')}"

    # Send some data to client and close the connection
    session.print "Hello, I'm a threaded ruby tcp server! You asked for: #{terms.join(' ')}"
    session.print "</center></body>"
    session.print "</html>"
    session.close
  end
end

As you can see, I made the server send back XHTML headers, you'll see why soon (hint: we're going to be sending SVG graphs back to the client, which are pure XML).

Accessing the SQL database from Ruby

DBI is a great Ruby gem for accessing databases of virtually any flavor. Add the require statement, and you're ready to go. I'll let the code speak for itself:

require 'dbi'

begin
    # Connect to the SQL Database
    dbh = DBI.connect("DBI:Mysql:dbname:localhost", "root", "password")
    row = dbh.select_one("SELECT VERSION()")
    puts "Server version: " + row[0]

    # EXAMPLES: accessing the database:

    # Execute a query and iterate over the rows
    sth = dbh.execute("SELECT * FROM nytimes_news ORDER BY date ASC")
    while row = sth.fetch_hash do
        puts row["headline"]
    end

    # Alternatively, retrieve all records at once
    result = dbh.select_all("SELECT * FROM nytimes_news ORDER BY date ASC")
    result.each { |record| puts record[1] }

    # Our aggregate query code (will be used in next step)
        query =  "select count(*) as occurence
                 from nytimes_news headline like '%#{q}%' or story like '%#{q}%'
                 group by DATE_FORMAT(date, '%c,%y') order by DATE_FORMAT(date, '%Y %c')"

        result = dbh.select_all(query)

    # Graph code will go below...
    # .. On to step 3

    # Just in case, catch any database exceptions and print the SQL error
    #  - Helpful in debugging bad SQL queries
    rescue DBI::DatabaseError => e
        puts "An error occurred"
        puts "Error code: #{e.err}"
        puts "Error message: #{e.errstr}"
    ensure
        dbh.disconnect if dbh
end

Note that I've added some exception handling, which is helpful in debugging bad SQL queries. Though, of course, I would suggest creating and debugging your queries in an SQL console/client first.

Graphing data with Scruffy

Scruffy is an awesome little tool by Brasten Sager for generating SVG graphs on the fly. And to top it off, it also supports direct output to any image format supported by RMagick/Batik (PNG/JPG/etc.) You can see some great examples of it at work on his development blog. So let's get to it, in step two we created an aggregate query which returned an array of numbers (representing number of stories per month), now we're going to pass our array to Scruffy and let it handle the rest:

require 'scruffy'

# Define our new graph, give it a theme (note: Keynote is default, so :theme => ... is redundant here
graph = Scruffy::Graph.new(:theme => Scruffy::Themes::Keynote.new)

# Tell scruffy how we want the values to be formatted (number, currency, etc.)
graph.value_formatter = Scruffy::Formatters::Number.new(:precision => 0)

# Give our graph a title
graph.title = "#{terms[0].capitalize} vs #{terms[1].capitalize}"

# X-Axis labels... In my case, the data spanned Sept 2004 to May 2005.
graph.point_markers = ['Sept', 'Oct', 'Nov', 'Dec', 'Jan', 'Feb', 'Mar', 'Apr', 'May']

# Plot the actual data. Note: 'result' is the array we retrieved from the aggregate query in step 2
graph.add :area, "#{terms[0].capitalize}", result

# That's it, call render and it will return the SVG (XML) representation of our graph
svg = graph.render(:width => 800)

Well, that was simple, wasn't it! Check out the documentation for Scruffy to learn about a myriad of other cool features. Now we've got all the components: a threaded web-server, we can run SQL queries, and we can also graph the data - let's pull it all together in a single application.

graph-frequencies.rb - Full application

Take a look at my code, it's the exact copy of the application I'm using to create dynamic graphs you saw at the begining of the post. Note that most of the code is actually SQL processing. I had to handle the case where no story appeared during certain months, hence the 'map' array and checking for indexes. Otherwise, its just all three sections discussed above combined in a single application. However, before you run away to build your own data explorer, there is one little catch you need to be aware of. By default, SVG code produced by Scruffy will look like:

?DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd" type="?>... svg code here

DOCTYPE is only supposed to be sent once and it is supposed to be the first thing we send back to the client. We met the second requirement (check the htmlHeader in step 1), but we don't want to resend the DOCTYPE within the graphs generated by Scruffy. Here is a quick hack to remove the first two lines:

svg = graph.render(:width => 800).split("\n")session.print svg[2, svg.size].join("\n")

sWe split the SVG code on the newline character and effectively discard the first two lines (and put the code back together with join) when we send the code back to the client. Now we are all done, point your browser to your homemade data visualization tool and explore the data! Here is a few more interesting screenshots from my dataset:


Ilya Grigorik

Ilya Grigorik is a web performance engineer and developer advocate at Google, where his focus is on making the web fast and driving adoption of performance best practices at Google and beyond.