Tuesday, June 06, 2006

Google Sitemaps: overview of benefits

... looking back with pleasure

Sometime in September 2005 Google released Google Sitemaps, a product that helps website owners all over the world inform Google of the pages on their websites. As a website owner, designer and developer I immediately jumped on the bandwagon and tried it out. I set up an experiment to test its effectiveness and see whether it really did what I hoped it would do: make my new content appear in the Google index more quickly than before.

My experiment

In a couple of posts (1, 2 & 3) I described the experiment I ran with Google Sitemaps on my site.

What was the setup?

I created a sitemap of my website: an XML file listing all the interlinked pages. I then added two pages to my website:

  1. a page (A) that I added to my sitemap.xml but that was never linked from any of the other pages on my website
  2. another page (B) that I did not add to my sitemap.xml, but that was linked from page A.

Both pages had legitimate content and linked back to other pages of my website. In fact, both were meant to be part of my website; I had merely left out the links to them from the other pages. In essence they were orphans, although they could also be seen as landing pages.

The experiment was to see whether the Google crawler would visit these two pages: the first directly from the sitemap.xml, the second by following the links on the first page.
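To make the setup concrete, here is a minimal Python sketch that models discovery as a breadth-first crawl. The page paths, the link graph and the idea of numbered crawl "rounds" are assumptions for illustration only; how Google actually schedules crawls is not public. The sketch simply treats every URL in the sitemap as an extra starting point next to the home page, and shows that page A is only reachable through the sitemap while page B is reached one round later via A's links.

```python
from collections import deque


def crawl_rounds(links, seeds):
    """Breadth-first crawl: map each reachable page to the round it is first fetched in.

    Round 0 fetches the seeds; round n fetches pages linked from round n-1.
    """
    found = {page: 0 for page in seeds}
    frontier = deque(seeds)
    while frontier:
        page = frontier.popleft()
        for target in links.get(page, []):
            if target not in found:
                found[target] = found[page] + 1
                frontier.append(target)
    return found


# Hypothetical site: pages A and B are not linked from any regular page.
site_links = {
    "/index.html": ["/about.html", "/articles.html"],
    "/about.html": ["/index.html"],
    "/articles.html": ["/index.html", "/about.html"],
    "/page-a.html": ["/page-b.html", "/index.html"],
    "/page-b.html": ["/index.html"],
}
# The sitemap lists the interlinked pages plus page A, but not page B.
sitemap_urls = ["/index.html", "/about.html", "/articles.html", "/page-a.html"]

without_sitemap = crawl_rounds(site_links, ["/index.html"])
with_sitemap = crawl_rounds(site_links, sitemap_urls)

print("/page-a.html" in without_sitemap, "/page-b.html" in without_sitemap)  # False False
print(with_sitemap["/page-a.html"], with_sitemap["/page-b.html"])  # 0 1
```

Without the sitemap neither secret page is ever found; with it, A is fetched in the first round and B, queued from A's links, in the next.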

So what happened?

After submitting the sitemap.xml I was filled with joy to discover that Google fetched the file within 24 hours. Moreover, it appeared to come by twice a day. That did not mean the pages showed up in the Google search results immediately, but it still gave me hope that they would after the next crawl.


So I kept checking my website logs to see whether either of the secret pages (A or B) would pop up in my list of requested pages. And checked again, and checked again.
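Checking the logs by hand gets tedious. A minimal Python sketch of automating it, assuming the server writes the common Apache "combined" log format and that the secret pages live at the hypothetical paths `/page-a.html` and `/page-b.html`. It matches on the user-agent string only; since that string can be faked, a hit is a strong hint rather than proof that the real Googlebot came by.

```python
import re

# Apache "combined" log format (assumed): host ident user [time] "request" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical paths of the two secret pages.
SECRET_PAGES = {"/page-a.html", "/page-b.html"}


def googlebot_hits(lines, watched=SECRET_PAGES):
    """Yield (time, path, status) for requests to watched pages whose user agent mentions Googlebot."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path") in watched and "Googlebot" in match.group("agent"):
            yield match.group("time"), match.group("path"), int(match.group("status"))


# Two made-up log lines: one crawler visit to page A, one ordinary browser visit to page B.
sample = [
    '192.0.2.1 - - [02/Jun/2006:06:12:01 +0200] "GET /page-a.html HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [02/Jun/2006:07:00:00 +0200] "GET /page-b.html HTTP/1.1" 200 4096 "-" "Mozilla/5.0"',
]

for hit in googlebot_hits(sample):
    print(hit)  # ('02/Jun/2006:06:12:01 +0200', '/page-a.html', 200)
```

In practice you would pass an open log file instead of `sample`, since iterating over a file yields its lines.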

Now for the real purpose of Sitemaps: getting your pages crawled. It can take some time before a crawl happens. The prevailing belief is that a fresh crawl of the Web is done every couple of weeks or once a month.

So after uploading my sitemap I expected it could take up to a month before my hidden pages were discovered. But after only a couple of days the first secret page, A, was crawled. Bang! The sitemap.xml proved its purpose: the Google crawler really did use the file to find pages.

The second page did not show up immediately. Apparently the links on page A had not yet been followed; they were probably queued for a later crawl. Page B did eventually show up, so that part of the process worked as well.

The experiment was a success and convinced me that Google Sitemaps really add something to a website.

Added value

A bonus of the Google Sitemaps system is that a webmaster can also view reports on the latest crawl results. To get them, you place an HTML file with a name provided by Google on your site, which verifies that you manage the site. Google then gives you a list of failed pages: pages that no longer exist or that returned some other error.

These statistics have been extended over the last few months and now also show the most frequent search queries and the queries that generated the most clicks. There is more, such as a robots.txt checker and error reports. All very valuable for a website owner. Read more about these handy sitemap statistics.
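The robots.txt checker answers one question: may a given crawler fetch a given URL? A rough local equivalent is a few lines of Python with the standard library's `urllib.robotparser`. The robots.txt content and URLs below are made up, and Python's parser follows the original robots exclusion rules, so it will not reproduce every detail of how Googlebot interprets a file (for example wildcard patterns).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""


def check_urls(robots_txt, agent, urls):
    """Return {url: allowed} for the given user agent under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(agent, url) for url in urls}


results = check_urls(ROBOTS_TXT, "Googlebot", [
    "https://www.example.com/index.html",
    "https://www.example.com/drafts/new-page.html",
    "https://www.example.com/private/report.html",
])
for url, allowed in results.items():
    print(("allowed " if allowed else "blocked ") + url)
```

One detail worth knowing: a crawler obeys only the most specific group that names it. Googlebot therefore follows just its own `Disallow: /drafts/` here and may fetch `/private/`, while every other crawler falls back to the `*` group.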

Conclusion of the experiment

By adding a Google Sitemaps file to your website (and keeping it up to date!) you can help ensure that new pages are picked up at the next scheduled crawl. If you use a popular CMS there is probably already a plugin available to create and maintain a sitemap. I can recommend it to any site owner; it is worth the effort. For me it is a bit more work because I still use static HTML pages on my site.
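For a static HTML site, keeping the sitemap up to date means regenerating it whenever pages change. Below is a minimal Python sketch of such a generator. It assumes all pages live under one local directory whose layout maps directly onto the site's URLs, and it uses each file's modification time as `<lastmod>`. It writes the later sitemaps.org 0.9 schema rather than the Google-specific schema of 2005; the function name `build_sitemap` and the base URL are placeholders.

```python
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import quote
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(site_root, base_url):
    """Return sitemap XML (as UTF-8 bytes) listing every .html file under site_root."""
    ET.register_namespace("", SITEMAP_NS)  # emit the namespace as the default xmlns
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    root = Path(site_root)
    for path in sorted(root.rglob("*.html")):
        # Relative path with forward slashes, percent-encoded for use in a URL.
        rel = quote(path.relative_to(root).as_posix())
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = f"{base_url.rstrip('/')}/{rel}"
        # W3C date format (YYYY-MM-DD) is accepted for <lastmod>.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = modified.strftime("%Y-%m-%d")
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
```

Run it after each upload, for example `Path("sitemap.xml").write_bytes(build_sitemap("htdocs", "https://www.example.com/"))`, where `htdocs` stands for a hypothetical local copy of the site. Orphan pages like page A are included automatically, because the generator looks at files on disk rather than at links. `xml_declaration=True` needs Python 3.8 or later.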

More goodies

Some time after the experiment, more goodies were added. What extras are hidden in the Google Sitemaps console?

  • Crawl statistics: Pages successfully crawled, pages blocked by robots.txt, pages that generated HTTP errors or were unreachable.
  • The PageRank distribution within your site.
  • Various indexing stats (pages indexed, etc.)

If, for some completely obscure reason, you do not wish to create a Google Sitemap, rest assured: you do not need one to use this functionality. All you have to do is create an empty HTML file and upload it to your site to verify that you own the site. Then you can start reaping the benefits. For free.

Google Sitemaps is turning into an absolutely awesome troubleshooting tool for all webmasters.

Experimentation is fun, but real-life implementation is better

This interesting experiment shows that Google Sitemaps does what it is supposed to do: get pages noticed by the Google crawlers.

As mentioned, we launched our new website on Tuesday 30 May and implemented Google Sitemaps with it. I had registered the sitemap with Google a couple of days before the launch, so Google was already trying to download the not-yet-existing sitemap. On launch day, around 6 AM our time, Google downloaded the sitemap. That was the first possible moment, less than two hours after the site went live.

Success #1: new pages are already in the index

Two days later we already saw many pages in the Google index. When I searched Google for the first 100 pages changed in the last three months, I found that only 12 of them still pointed to our old site. My guess was that these remained only because we had some problems setting up the rewrite rules for the old ASP files. That was not entirely correct: a week later we still see old pages, but different ones.

The Google index now returns new pages only days after going live. Without the Google Sitemap this would have taken much, much longer.

Success #2: we have a check on what goes wrong

Thanks to the statistics we receive from Google we can now track the old, no longer existing pages that are still being crawled. We can see errors as they pop up and, if necessary, further tweak our web application or web server to handle them gracefully.
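The post does not describe how the old pages were handled beyond rewrite rules. One common approach is to answer requests for old URLs with a permanent redirect (301) where an equivalent new page exists and with 410 Gone where it was removed on purpose, so the errors in the crawl reports disappear over time. A minimal Python sketch of that decision follows; the URL mapping and the function name `resolve_old_url` are hypothetical, and in a real setup the same logic would usually live in web server rewrite rules or in the application's routing.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from old ASP pages to their new locations.
REDIRECTS = {
    "/products.asp": "/products/",
    "/contact.asp": "/about/contact/",
}
# Hypothetical old pages that were removed on purpose.
GONE = {"/oldpromo.asp"}


def resolve_old_url(url):
    """Return (status, location) for an old-site URL, or None if it is not an old ASP URL."""
    path = urlsplit(url).path.lower()  # compare case-insensitively; the query string is ignored
    if path in REDIRECTS:
        return 301, REDIRECTS[path]  # moved permanently: crawlers update their index
    if path in GONE:
        return 410, None  # gone: tells crawlers not to retry
    if path.endswith(".asp"):
        return 404, None  # unknown old page
    return None  # not an old URL; let the new site handle it


print(resolve_old_url("http://www.example.com/Products.asp?id=3"))  # (301, '/products/')
```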

Conclusion

Implementing Google Sitemaps has already given us huge advantages in the first week after going live. Because the sitemap was ready at go-live, Google was able to find our new pages and content right away.

I would strongly recommend implementing Google Sitemaps in any web project so that the pages appear in the index as quickly as possible. Although we had to implement the Google Sitemaps ourselves within our site framework, it was far from the biggest challenge we faced.

Many of the (open source) content management systems nowadays provide Google Sitemaps out of the box or through a plugin. So, there is hardly any reason not to implement them.

Update (4th September 2008)

In the meantime the Google Sitemaps protocol has been adopted by other search engines and is now also integrated into robots.txt, which can point crawlers to a site's sitemap. So it's now even more important to use the protocol.
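The robots.txt integration is the `Sitemap:` directive of the sitemaps.org protocol: a line in robots.txt giving the full URL of the sitemap, independent of any `User-agent` group, so that any crawler can discover it. A small sketch of reading it back with Python's standard library (`RobotFileParser.site_maps()` requires Python 3.8 or later); the domain is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that advertises the site's sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml
"""


def sitemaps_from_robots(robots_txt):
    """Return the sitemap URLs declared in a robots.txt body (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when no Sitemap line exists


print(sitemaps_from_robots(ROBOTS_TXT))  # ['https://www.example.com/sitemap.xml']
```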

Better still: It's now an open standard.
