Google / Yahoo Sitemaps in Rails

There are three strategies you could adopt with regard to Google sitemaps: ignore them, use a third-party tool, or create them on the fly with a few lines of code in RoR. Assuming the first option is out of the question, when should you use the other two? Well, that depends on your site! If it is relatively static, say a company website that lists your products (which rarely change), then you should save your time and use a free tool (GSiteCrawler for Windows users) to generate a one-off sitemap. However, if you're running a Rails site, you should generate your sitemap on request (sure, you can cache it): since your site is database driven, you want to make sure the spiders see the latest content.

Luckily, Rails has built-in XML support, which makes the entire process exceptionally easy for us. First, we need to decide what will go into our sitemap. As an example, let's assume our site allows users to create and store Widgets, which can be organized in lists, and that we also have a development blog. So let's prepare our queries:

  def sitemap
      @widgets = (Widget.count(:all, :select => 'id', :conditions => "approved = 1").to_f / 10).ceil
      @lists = List.find(:all, :select => 'lists.id, lists.title, lists.updated_at, users.login', :conditions => "bookmarks_count > 0", :include => 'user')
      @users = User.find(:all, :select => 'id, login, updated_at', :conditions => "activated = 1")
      @posts = Post.find(:all, :select => 'id, title, created_at')
      render :action => 'google_sitemap'
  end
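
The first line of the action divides the widget count by 10 and rounds up, which is plain ceiling division. Pulled out of Rails, it can be sketched as below; page_count and per_page are names invented here for illustration and are not part of the controller.

```ruby
# Hypothetical helper (not in the original controller): number of listing
# pages needed to show `total` items at `per_page` items per page.
def page_count(total, per_page = 10)
  # to_f forces float division; ceil rounds any partial page up.
  (total.to_f / per_page).ceil
end

puts page_count(72)  # 8
puts page_count(70)  # 7
puts page_count(0)   # 0
```

With zero approved widgets the count is 0, so a loop such as 1.upto(0) runs no iterations and no page entries are emitted.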

We defined a controller action 'sitemap', but here is the catch: our Widgets don't have a page of their own; instead, we list them in groups of 10 on our site. No worries, the first line counts the widgets and translates that into the number of pages we need (ceil rounds the float up, ex: 72 widgets = 7.2 pages, ceil(7.2) = 8). Next, we retrieve the lists, users and blog posts we wish to appear in the sitemap. Now we just need to format the XML:

xml.instruct!

xml.urlset "xmlns" => "http://www.google.com/schemas/sitemap/0.84" do
  xml.url do
    xml.loc         "http://www.yourdomain.com/"
    xml.lastmod     w3c_date(Time.now)
    xml.changefreq  "always"
  end

  1.upto(@widgets) do |page|
    xml.url do
      xml.loc         url_for(:only_path => false, :controller => 'widgets', :action => 'list', :page => page)
      xml.lastmod     w3c_date(Time.now)
      xml.changefreq  "daily"
      xml.priority    0.9
    end
  end

  @lists.each do |list|
    xml.url do
      xml.loc         url_for(:only_path => false, :controller => 'lists', :action => 'view', :id => list)
      xml.lastmod     w3c_date(list.updated_at)
      xml.changefreq  "weekly"
      xml.priority    0.8
    end
  end

  @users.each do |user|
    xml.url do
      xml.loc         url_for(:only_path => false, :controller => 'profiles', :action => 'show', :username => user.login)
      xml.lastmod     w3c_date(user.updated_at)
      xml.changefreq  "weekly"
      xml.priority    0.7
    end
  end

  @posts.each do |post|
    xml.url do
      xml.loc         url_for(:only_path => false, :controller => 'posts', :action => 'show', :id => post)
      xml.lastmod     w3c_date(post.created_at)
      xml.changefreq  "weekly"
      xml.priority    0.6
    end
  end

end

Easy, huh! We simply iterate over all of our collections and provide the appropriate links. Several things to note: :only_path => false forces Rails to produce an absolute URL to your application, not a relative path (http://www...com/lists/view/id instead of /lists/view/id), which is a requirement for sitemap files. You can also adjust your priority either by hand or with a compute function (ex: based on a 'rating' of a widget). One more catch: the w3c_date function is not a standard primitive in Rails, so here is my definition of it (from a helper file):

   def w3c_date(date)
      date.utc.strftime("%Y-%m-%dT%H:%M:%S+00:00")
   end
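
To see what the helper emits, here is a small sketch that runs outside Rails on plain Time objects. The sample timestamps are invented. It also swaps utc for getutc, which is my substitution rather than the original code: on a plain Ruby Time, utc converts the receiver in place, while getutc returns a new object.

```ruby
# Same format string as the helper above; getutc is a swapped-in,
# non-mutating alternative to utc (an assumption, not the author's code).
def w3c_date(date)
  date.getutc.strftime("%Y-%m-%dT%H:%M:%S+00:00")
end

utc_time   = Time.utc(2007, 1, 15, 12, 30, 0)
local_time = Time.new(2007, 1, 15, 7, 30, 0, "-05:00")  # same instant, at UTC-5

puts w3c_date(utc_time)    # 2007-01-15T12:30:00+00:00
puts w3c_date(local_time)  # 2007-01-15T12:30:00+00:00
```

Both calls print the same string because the time is converted to UTC before formatting, which is what makes the hard-coded +00:00 suffix correct.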

Almost done, now we just need to add a route to our sitemap in config/routes.rb:

 map.connect "sitemap.xml", :controller => "xml", :action => "sitemap"

Point your browser to yourdomain.com/sitemap.xml and voila! A dynamic Google/Yahoo sitemap generator in 57 lines of code (46 of them XML formatting). Now just head over to Google and register your newfound SEO goodness. You'll get access to all kinds of interesting stats in your sitemaps account, so it's definitely worth the trouble!
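
Since sitemap entries must be absolute URLs, a quick check on the generated file can catch a forgotten :only_path => false. The following is a rough sketch, not a validator: relative_locs is a hypothetical helper, the sample XML is made up, and it pulls the loc values out with a regular expression instead of a real XML parser.

```ruby
require 'uri'

# Hypothetical helper: returns every <loc> value that is not an absolute
# http(s) URL. Regex extraction is a shortcut for this sketch only.
def relative_locs(xml)
  xml.scan(%r{<loc>(.*?)</loc>}m).flatten.map(&:strip).reject do |loc|
    begin
      uri = URI.parse(loc)
      # URI::HTTPS is a subclass of URI::HTTP, so both schemes pass.
      uri.is_a?(URI::HTTP) && !uri.host.nil?
    rescue URI::InvalidURIError
      false  # unparseable values are reported as well
    end
  end
end

sample = <<XML
<urlset>
  <url><loc>http://www.yourdomain.com/lists/view/1</loc></url>
  <url><loc>/lists/view/2</loc></url>
</urlset>
XML

p relative_locs(sample)  # ["/lists/view/2"]
```

An empty array means every entry in the file is an absolute URL.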

Ilya Grigorik is a web ecosystem engineer, author of High Performance Browser Networking (O'Reilly), and Principal Engineer at Shopify.