Google / Yahoo Sitemaps in Rails
There are three strategies you could adopt with regard to Google sitemaps: ignore them, use a third-party tool, or create them on the fly with a few lines of code in RoR. Assuming that the first option is out of the question, when should you use the other two? Well, that depends on your site! If it is relatively static, say a company website listing products that rarely change, then save your time and use a free tool (GSiteCrawler for Windows users) to generate a one-off sitemap. However, if you’re running a database-driven Rails site, you should generate your sitemap on request (sure, you can cache it) so that the spiders always see the latest content.
Luckily, Rails has built-in XML support, which makes the entire process exceptionally easy for us. First, we need to decide what will go into our sitemap. As an example, let’s assume our site allows users to create and store Widgets, which can be organized in lists, and that we also have a development blog. So let’s prepare our queries:
    def sitemap
      # Widgets are listed 10 per page, so we store the number of pages, not the records
      @widgets = (Widget.count(:all, :select => 'id', :conditions => "approved = 1").to_f / 10).ceil
      @lists = List.find(:all, :select => 'lists.id, lists.title, lists.updated_at, users.login', :conditions => "bookmarks_count > 0", :include => 'user')
      # SQL compares with a single '=', not '=='
      @users = User.find(:all, :select => 'id, login, updated_at', :conditions => "activated = 1")
      @posts = Post.find(:all, :select => 'id, title, created_at')

      render :action => 'google_sitemap'
    end
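The page-count arithmetic on the first line of the action is easy to try outside Rails. Here is a minimal sketch; page_count is a name I made up for illustration, and the default page size of 10 simply mirrors the listing size used on this example site:

```ruby
# Hypothetical helper: how many list pages are needed to show `count` items.
def page_count(count, per_page = 10)
  (count.to_f / per_page).ceil
end

page_count(72)  # => 8 (7.2 rounded up)
page_count(70)  # => 7 (an exact multiple needs no extra page)
page_count(0)   # => 0 (1.upto(0) then yields no pages at all)
```

The to_f matters: with plain integer division, 72 / 10 would already be 7 before ceil ever got a chance to round up.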
We defined a controller action ‘sitemap’, but here is the catch: our Widgets don’t have a page of their own; instead, we list them in groups of 10 on our site. No worries: the first line counts the approved widgets and translates that into the number of pages we need (ceil rounds the float up, e.g. 72 widgets = 7.2 pages, ceil(7.2) = 8). Next, we retrieve the lists, users and blog posts we wish to appear in the sitemap. Now we just need to format the XML:
    xml.instruct!
    xml.urlset "xmlns" => "http://www.google.com/schemas/sitemap/0.84" do
      xml.url do
        xml.loc "http://www.yourdomain.com/"
        xml.lastmod w3c_date(Time.now)
        xml.changefreq "always"
      end

      1.upto(@widgets) do |page|
        xml.url do
          xml.loc url_for(:only_path => false, :controller => 'widgets', :action => 'list', :page => page)
          xml.lastmod w3c_date(Time.now)
          xml.changefreq "daily"
          xml.priority 0.9
        end
      end

      @lists.each do |list|
        xml.url do
          xml.loc url_for(:only_path => false, :controller => 'lists', :action => 'view', :id => list)
          xml.lastmod w3c_date(list.updated_at)
          xml.changefreq "weekly"
          xml.priority 0.8
        end
      end

      @users.each do |user|
        xml.url do
          xml.loc url_for(:only_path => false, :controller => 'profiles', :action => 'show', :username => user.login)
          xml.lastmod w3c_date(user.updated_at)
          xml.changefreq "weekly"
          xml.priority 0.7
        end
      end

      @posts.each do |post|
        xml.url do
          xml.loc url_for(:only_path => false, :controller => 'posts', :action => 'show', :id => post)
          xml.lastmod w3c_date(post.created_at)
          xml.changefreq "weekly"
          xml.priority 0.6
        end
      end
    end
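The priorities in the template are fixed per section. If you would rather derive them from your data, something along these lines might do. This is only a sketch: priority_for, the rating argument and the 0 to 5 scale are all assumptions of mine, not part of the site described here. The one firm constraint is that a sitemap priority must fall between 0.0 and 1.0.

```ruby
# Hypothetical helper: map a rating on a 0..max_rating scale onto a
# sitemap priority, which has to stay within 0.0..1.0.
def priority_for(rating, max_rating = 5)
  ratio = rating.to_f / max_rating
  ratio.clamp(0.0, 1.0).round(1)  # out-of-range ratings are pinned to the limits
end

priority_for(4)   # => 0.8
priority_for(5)   # => 1.0
priority_for(7)   # => 1.0 (clamped)
priority_for(-1)  # => 0.0 (clamped)
```

In a template this would replace a literal, e.g. xml.priority priority_for(widget.rating), assuming your model has such a column.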
Easy, huh! We simply iterate over all of our collections and provide the appropriate links. Several things to note:
:only_path => false forces Rails to produce an ‘absolute’ URL to your application rather than a relative path (http://www…com/lists/view/id instead of /lists/view/id); this is a requirement for sitemap files. You can also adjust each priority, either by hand or with a compute function (e.g. based on the ‘rating’ of a widget). One more catch: the w3c_date function is not a standard primitive in Rails. Here is my definition of it (from a helper file):
    def w3c_date(date)
      date.utc.strftime("%Y-%m-%dT%H:%M:%S+00:00")
    end
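To see what the helper produces, you can run it in plain Ruby. The timestamps below are made up for illustration. As a point of comparison, I believe Ruby's own Time#xmlschema (from the standard 'time' library) gives an equivalent timestamp with a Z suffix for UTC, so it may be a reasonable substitute.

```ruby
require 'time'  # provides Time#xmlschema

def w3c_date(date)
  date.utc.strftime("%Y-%m-%dT%H:%M:%S+00:00")
end

utc_stamp = Time.utc(2007, 1, 15, 8, 30, 0)
w3c_date(utc_stamp)    # => "2007-01-15T08:30:00+00:00"
utc_stamp.xmlschema    # => "2007-01-15T08:30:00Z"

# A time in another zone is converted to UTC before formatting.
local_stamp = Time.new(2007, 1, 15, 10, 30, 0, "+02:00")
w3c_date(local_stamp)  # => "2007-01-15T08:30:00+00:00"
```

One thing to be aware of: Time#utc converts the receiver in place, so the object you pass in is switched to UTC as a side effect. If that bothers you, getutc returns a new Time instead.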
Almost done! Now we just need to add a route to our sitemap. In your routes file:
map.connect "sitemap.xml", :controller => "xml", :action => "sitemap"
Point your browser to yourdomain.com/sitemap.xml and voila! A dynamic Google/Yahoo sitemap generator in 57 lines of code (46 of them XML formatting). Now just head over to Google and register your newly found SEO goodness. You’ll get access to all kinds of interesting stats in your sitemaps account, so it’s definitely worth the trouble!
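Before you register the sitemap, it may be worth checking that every loc really is an absolute URL, since that is the requirement that is easiest to break. Here is a rough sketch in plain Ruby. relative_locs is a hypothetical helper of mine, the sample XML is made up, and the regex scan is a shortcut that assumes simple, builder-style output rather than a proper XML parse.

```ruby
require 'uri'

# Hypothetical check: return every <loc> value that is NOT an absolute URL.
def relative_locs(sitemap_xml)
  locs = sitemap_xml.scan(%r{<loc>(.*?)</loc>}m).flatten
  locs.reject do |loc|
    begin
      URI.parse(loc.strip).absolute?  # true only when a scheme such as http is present
    rescue URI::InvalidURIError
      false                           # treat unparsable values as offenders too
    end
  end
end

sample = <<~XML
  <urlset>
    <url><loc>http://www.yourdomain.com/lists/view/1</loc></url>
    <url><loc>/lists/view/2</loc></url>
  </urlset>
XML

relative_locs(sample)  # => ["/lists/view/2"]
```

An empty array means every link carries its scheme and host, which is what :only_path => false is there to guarantee.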