Sitemap Generator for Mephisto

January 14th, 2007

On the SEO front, there are myriad techniques for telling search engines about your site's content. One of them is to generate a "sitemap". From Sitemaps.org, the new neutral third party managing the format that Google, Yahoo, and MSN have agreed to adopt:

Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.

Check out an example here: http://40withegg.com/sitemap.

Anywho, I'll let you do your own research at Sitemaps.org and Google Webmaster Tools. I wrote a sitemap generator for Mephisto blogs such as this one. The code is easy to adapt to any Rails site: just remove the references to Mephisto-specific constructs (such as @articles and @site).
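As a rough, framework-free sketch of the kind of XML such a generator emits (not the Mephisto code itself), here is plain Ruby that builds a sitemaps.org <urlset> from a list of entries. SitemapEntry, xml_escape, and build_sitemap are hypothetical names; in a Rails or Mephisto app the entries would come from something like @articles, and a Builder template would be the more likely tool.

```ruby
# Hypothetical entry type standing in for an article (or any Rails model).
# Only loc is required by the sitemap protocol; the other fields are optional.
SitemapEntry = Struct.new(:loc, :lastmod, :changefreq, :priority)

# Escape the five XML special characters; sitemap URLs must be entity-escaped.
def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
                            '"' => '&quot;', "'" => '&apos;')
end

# Build a sitemaps.org 0.9 <urlset> document, skipping optional fields that are nil.
def build_sitemap(entries)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  entries.each do |e|
    lines << '  <url>'
    lines << "    <loc>#{xml_escape(e.loc)}</loc>"
    # W3C Datetime with a numeric offset, e.g. 2007-01-06T22:43:31-08:00
    lines << "    <lastmod>#{e.lastmod.strftime('%Y-%m-%dT%H:%M:%S%:z')}</lastmod>" if e.lastmod
    lines << "    <changefreq>#{e.changefreq}</changefreq>" if e.changefreq
    lines << "    <priority>#{format('%.1f', e.priority)}</priority>" if e.priority
    lines << '  </url>'
  end
  lines << '</urlset>'
  lines.join("\n") + "\n"
end

entries = [
  SitemapEntry.new('http://example.com/', Time.new(2007, 1, 6, 22, 43, 31, '-08:00'), 'daily', 1.0),
  SitemapEntry.new('http://example.com/search?q=a&b', nil, nil, 0.5)
]
puts build_sitemap(entries)
```

The %:z directive used for lastmod needs Ruby 1.9 or later; on older Rubies the offset has to be assembled by hand.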

Mephisto Sitemap Code


    module Mephisto
      class Routing
        def self.connect_with(map)
          # Allows access to the sitemap! Routes /sitemap to SitemapController.
          map.connect    'sitemap', :controller => 'sitemap'

          # Original code starts here.
          map.feed    'feed/*sections', :controller => 'feed', :action => 'feed'
          # ...
        end
      end
    end

Trixy W3C Format

The only thing I had problems with was the lastmod date format used by the sitemap, which is W3C Datetime. For example:

<lastmod>2007-01-06T22:43:31-08:00</lastmod>

Particularly problematic was the time zone information. For me that's Pacific, 8 hours behind UTC/GMT, represented by the "-08:00" in lastmod. None of the Ruby, Rails, or TZInfo methods I tried gave me this format in a single call. Here's what I tried:

  • article.updated_at.strftime('%Y-%m-%dT%H:%M:%S%Z') -- produced "2007-01-09T22:54:28Pacific Standard Time"
  • article.updated_at.xmlschema -- produced "2007-01-09T22:54:28Z"
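On newer Rubies (1.9 and later) there are one-call options that didn't exist at the time: strftime accepts %:z, which prints the offset with a colon, and Time#xmlschema prints "Z" only when the Time object is in UTC, while a Time carrying a non-UTC offset gets the numeric offset. "Z" is itself a valid W3C Datetime zone designator, so a UTC timestamp ending in Z is also legal in lastmod. A small sketch, assuming a hypothetical timestamp and a fixed -08:00 offset:

```ruby
require 'time' # needed for Time#xmlschema on Ruby versions before 3.4

# Hypothetical timestamp: 06:54:28 UTC on Jan 10, 2007 (22:54:28 Pacific on Jan 9).
updated_utc = Time.utc(2007, 1, 10, 6, 54, 28)

# A UTC Time serializes with the "Z" designator.
utc_string = updated_utc.xmlschema

# Same instant shifted to a fixed -08:00 offset, then formatted with that offset.
pacific = updated_utc.getlocal('-08:00')
offset_string = pacific.xmlschema
strftime_string = pacific.strftime('%Y-%m-%dT%H:%M:%S%:z')

puts utc_string      # 2007-01-10T06:54:28Z
puts offset_string   # 2007-01-09T22:54:28-08:00
puts strftime_string # 2007-01-09T22:54:28-08:00
```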

Here's what I ended up doing:

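    # TZInfo zone -> its current UTC offset in seconds -> a Rails TimeZone that can format it
    time_zone = TimeZone.new(@site.timezone.current_period.utc_offset)
    ...
    # Append the formatted offset (e.g. "-08:00") to the date and time digits
    xml.lastmod(article.updated_at.strftime("%Y-%m-%dT%H:%M:%S#{time_zone.formatted_offset}"))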
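One caveat with the approach above, as far as I can tell: current_period is the offset in effect when the sitemap is generated, not when the article was updated, and a TZInfo period's utc_offset is the base (standard-time) offset, while utc_total_offset adds any daylight saving adjustment. So an article updated in July (PDT, -07:00) could come out stamped -08:00. The wall-clock digits also need to be in the same zone as the offset appended to them. Below is a minimal sketch that keeps the two together, assuming you already have the right offset in seconds for each timestamp (with TZInfo, presumably period_for_utc(time).utc_total_offset); w3c_datetime is a hypothetical helper.

```ruby
# Hypothetical helper: render an instant as W3C Datetime in the zone given by
# offset_seconds (e.g. -28800 for PST, -25200 for PDT). Converting the Time
# first keeps the printed digits and the printed offset consistent.
def w3c_datetime(time, offset_seconds)
  time.getlocal(offset_seconds).strftime('%Y-%m-%dT%H:%M:%S%:z')
end

winter = Time.utc(2007, 1, 10, 6, 54, 28)
summer = Time.utc(2007, 7, 10, 5, 54, 28)

puts w3c_datetime(winter, -8 * 3600) # 2007-01-09T22:54:28-08:00
puts w3c_datetime(summer, -7 * 3600) # 2007-07-09T22:54:28-07:00
puts w3c_datetime(winter, 0)         # 2007-01-10T06:54:28+00:00
```

Passing the offset in explicitly keeps the helper independent of TZInfo and Rails; the caller decides which offset applies to each timestamp.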

Modify and improve it at will!