Wednesday, October 05, 2005

Google Sitemaps: do they work?

.. yep they do

Some time ago, in earlier posts (1, 2 & 3), I outlined the experiment I ran with Google Sitemaps on my site. Because Google's crawling process takes a while to finish, it has also taken a while before I could present my conclusions. So here they come.

What was the setup?

I created a sitemap of my website: an XML file that lists all the interlinked pages. I then added two pages to my website:

  1. a page that I added to my sitemap.xml
  2. another page that I did not add to my sitemap.xml, but that is linked to from the first page.
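
As a rough illustration of this setup, here is a minimal Python sketch that builds a sitemap containing only the first new page. The example.com URLs and the page names are made up for illustration, and the namespace is an assumption: it is the one used by the sitemaps.org protocol, which may differ from the schema version Google asked for at the time of this experiment.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the sitemaps.org protocol; the schema version
# used during the experiment may have been different.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) listing the given absolute URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical URLs: the first new page is listed, the second one is
# deliberately left out and is only reachable via a link on the first.
listed_pages = [
    "http://www.example.com/",
    "http://www.example.com/secret-page-1.html",
]
sitemap_xml = build_sitemap(listed_pages)
print(sitemap_xml)
```

The resulting string would be saved as sitemap.xml in the root of the site before submitting it.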

Both pages have legitimate content and link back to other pages of my website. Both were in fact meant to be part of the site; I simply left out the links from the other pages to these new ones. They are orphans.

The experiment was to see whether the Google crawler would visit these two pages: the first directly from the sitemap.xml, and the second by following the links on the first page.

So what happened?

After submitting the sitemap.xml I was filled with joy when I discovered that Google fetched the file within 24 hours. Moreover, it appears to come by twice a day, even without my resubmitting the file. That did not mean that the pages appeared in the Google search results immediately, but it gave me hope that they would after the next crawl.

So I kept checking my website logs to see whether either of the secret pages would pop up in my list of requested pages. Again and again.
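
This kind of log check can be automated. The sketch below is only an illustration and not the method I used: it assumes access logs in the Apache "combined" format, identifies the crawler by the string "Googlebot" in the user agent (which can be spoofed), and uses invented sample lines and page names.

```python
import re

# Assumption: Apache "combined" log format, where the request appears
# as "METHOD /path PROTOCOL" inside double quotes.
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) ')


def googlebot_hits(log_lines, watched_paths):
    """Count requests per watched path whose user agent mentions Googlebot."""
    hits = {path: 0 for path in watched_paths}
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        match = REQUEST_RE.search(line)
        if match and match.group(1) in hits:
            hits[match.group(1)] += 1
    return hits


# Invented sample lines, for illustration only.
sample_log = [
    '66.249.66.1 - - [05/Oct/2005:10:00:00 +0200] "GET /sitemap.xml HTTP/1.1" '
    '200 512 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '66.249.66.1 - - [05/Oct/2005:10:05:00 +0200] "GET /secret-page-1.html HTTP/1.1" '
    '200 2048 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '10.0.0.5 - - [05/Oct/2005:10:06:00 +0200] "GET /secret-page-2.html HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0"',
]

hits = googlebot_hits(sample_log, ["/secret-page-1.html", "/secret-page-2.html"])
print(hits)
```

With a real log, the sample list would be replaced by the lines of the access log file.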

Then, after some weeks, the first secret page was crawled. Bang! The sitemap.xml really proved its purpose. The file was indeed used by the Google crawler to find pages to crawl.

The second page did not show up. So apparently the links on the first page were not followed. Maybe they were queued for the next crawl. I don't know, and for now I do not really care so much. The first part of my experiment was successful and has convinced me that Google Sitemaps really add something to a website.

I have decided not to wait for that next crawl to see whether the other secret orphan gets crawled. Both pages contain valid content, so keeping them secret is depriving my site of valuable pages. So I have once again updated my site, and the secret pages are no longer secret.

Added value

A bonus of the Google Sitemaps system is that a webmaster can also view reports on the results of the last crawl. To enable this, you place an HTML file with a name provided by Google on your site, which verifies that you manage the site. Google then provides you with a list of failed pages: pages that no longer exist or that return another error.

Conclusion

By adding a Google Sitemaps file to your website (and keeping it up to date!) you can ensure that new pages are crawled at the next scheduled crawl.

I can recommend it to any site owner. It is worth the effort. For me it is a little bit of work because I still use static HTML pages on my site, but for a user with a modern CMS it can be set up automatically.