While Google Analytics makes it dead simple to start recording visitor traffic to your web site, it’s common for Google to record data in fragmented, nonsensical ways:
- pages on your site that can be reached by multiple, slightly different urls will appear in your Analytics as separate pages
- traffic from different permutations of your domain name shows up as referrals to your site, rather than being understood to come from the same web property (think www.example.com vs example.com vs secure.example.com)
- depending on how quickly the elements of your pages load, Google’s analytics code may not be invoked in time to record a visit, creating a discontinuous record of browsing behavior on your site.
Fortunately there are some pretty simple steps we can take before handing our visitor information to Google that will sort all these things out. In part one of this series of posts, we’ll tackle pages with multiple addresses.
Canonical URLs
Blog platforms and content management systems are rife with ways to address the same content with more than one URL. Blogs may let you refer to a post with or without a date, by category, or just by its title. Content management systems often make it possible to refer to a page either by a thoughtfully worded URL slug or by a mysterious ID number. Depending on how your visitors found your site, they could land on any number of these addresses.
Not having a definitive url for a page or post not only fragments your analytics data, it can confuse search engines and hurt your SEO efforts by leading them to conclude you have duplicate content all over your site. The answer to both of these problems is called the canonical url. It is a way of embedding, within the page content itself, a definitive address for a page, no matter what the address bar of the browser says.
Thankfully, these same software platforms that fragment your pages’ identities often have an easy way built in, or a plugin-based solution, to create canonical urls for all your content. And if you aren’t running on a platform that does it for you, adding a canonical url is as easy as pasting the following snippet of html into the <head> section of your page:
<link rel="canonical" href="/your-definitive-page-title.html" />
Sending Google a Canonical URL
Once you’ve got a canonical link in the head of your HTML file, you can easily pull that url and feed it to Google, rather than whatever arbitrary url the visitor may have used to reach your page. This ensures that all the visits are recorded as occurring on the same page. There are two simple steps. First, above your Google tracking code, paste the following bit of JavaScript (hat tip to Eric Vold for the technique, which I’ve condensed here):
<script type="text/javascript">
// Return the href of the page's <link rel="canonical"> element,
// or an empty string if the page doesn't have one.
function canonical_url() {
  try {
    var links = document.getElementsByTagName('link');
    for (var i = 0; i < links.length; i++) {
      if (links[i].getAttribute('rel') == 'canonical') {
        return links[i].getAttribute('href');
      }
    }
  } catch (e) {}
  return '';
}
</script>
Then, you’ll need to modify your Google Analytics tracking code to call your canonical url lookup. There are two flavors of tracking code, which Google refers to as Traditional and Asynchronous. You’ll likely recognize the one you copied and pasted when you added Analytics to your site. In each case, you’re going to modify the “_trackPageview” line:
Traditional:
<script>
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script>
try {
  var pageTracker = _gat._getTracker("UA-xxxxxx-x");
  pageTracker._trackPageview( canonical_url() || window.location );
} catch(err) {}
</script>
Asynchronous:
<script>
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXX-X']);
_gaq.push(['_trackPageview', canonical_url() || window.location ]);
(function() {
  var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
  ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
  var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
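One caveat is worth a quick sketch. Google's ga.js documentation asks that the optional page argument to _trackPageview begin with a slash, while many platforms write the canonical link as an absolute URL. The helper below is not part of the original technique; the name canonical_path and its fallback argument are hypothetical, and it assumes the href is either absolute, protocol-relative, or already a path. It simply strips the scheme and host so that what reaches Google looks like a page path.

```javascript
// Hypothetical helper: reduce a canonical href to a path starting with "/".
// Assumption: href is absolute (http://host/path), protocol-relative
// (//host/path), or already a path. Anything else is treated as
// root-relative, which is a simplification.
function canonical_path(href, fallback) {
  if (!href) {
    return fallback;
  }
  // Drop "scheme://host" or "//host" if present.
  var path = String(href).replace(/^(?:[a-z][a-z0-9+.-]*:)?\/\/[^\/?#]*/i, '');
  // Drop any #fragment.
  path = path.replace(/#.*$/, '');
  // Make sure the result starts with a slash.
  if (path.charAt(0) !== '/') {
    path = '/' + path;
  }
  return path;
}

// Possible use in the asynchronous snippet (browser only):
// _gaq.push(['_trackPageview',
//   canonical_path(canonical_url(), window.location.pathname + window.location.search)]);
```

Passing window.location.pathname plus window.location.search as the fallback keeps pages without a canonical link reporting roughly what Analytics would have recorded on its own.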
Bonus Points: Now with jQuery
A great many sites online use jQuery as part of their scripting toolkit. If you’re comfortable with taking the time to load the basic jQuery library before calling your analytics tracking code (which can be especially speedy if you use Google’s hosted copy of the library), you can do away with the canonical_url() function we defined earlier, and just insert a bit of jQuery magic right into the tracking code, on the same line we modified earlier:
Traditional
pageTracker._trackPageview( $('link[rel=canonical]').attr('href') || window.location );
Asynchronous
_gaq.push(['_trackPageview', $('link[rel=canonical]').attr('href') || window.location ]);
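If jQuery isn't already on your page, browsers that support document.querySelector can do the same one-line lookup natively. This is a minimal sketch rather than part of the original technique; the function name canonical_href and its doc parameter are hypothetical, chosen so the lookup can be exercised outside a browser.

```javascript
// Hypothetical jQuery-free lookup. `doc` is the page's document object;
// passing it in (rather than using the global) keeps the function easy to test.
function canonical_href(doc) {
  var link = doc.querySelector('link[rel="canonical"]');
  // querySelector returns null when nothing matches.
  return link ? link.getAttribute('href') : '';
}

// Possible use in the asynchronous snippet (browser only):
// _gaq.push(['_trackPageview', canonical_href(document) || window.location ]);
```

Like canonical_url(), it returns an empty string when the page has no canonical link, so the same || fallback still applies.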
Wrapping Up
Armed with a canonical url on each page and the code above, you should be one step closer to a more cohesive view of your site’s traffic. In part two, I will cover our next obstacle: getting Google to treat your different subdomains as though they are all the same site, which will keep them from obscuring your referral traffic from actual third-party sites.

