Does Analytics Accuracy Matter?

In November 2006 we began work on a study of web analytics called the "2007 Web Analytics Shootout". This work was prompted by a blog post by Rand Fishkin titled "Free Linkbait Idea" (http://www.seomoz.org/blog/free-linkbait-idea-on-web-analytics).

In the study, we took 7 different analytics packages and ran them simultaneously across 4 different web sites, then analyzed the results in detail. What we were looking for was any sign that the packages were counting basic traffic in different ways.

We were shocked when we realized just how different they are. The difference was dramatic. This article summarizes our findings and what they mean for web site owners.

The Industry View

Many analytics industry experts will tell you that accuracy doesn't matter. For example, well-known industry figure Avinash Kaushik tells us to "just get over it" (http://www.kaushik.net/avinash/2006/06/data-quality-sucks-lets-just-get-over-it.html).

There is an element of truth to this point of view, but it ends up not being so simple. First and foremost, you need an appreciation of what the accuracy issues are, because understanding them teaches you a lot about how to get the most value out of your analytics tools.

In addition, if you don't have an appreciation of the accuracy issues, you may well use your tools in ways that simply don't help, or that even hurt, your business. That would waste your investment in analytics tools, and the time people spend looking at them, and that would be a shame.

What follows is a look at some of our data, and a summary of our key findings.

Summary of Data

As mentioned, we analyzed this data across 4 different sites. On one of the sites, we also set up 2 different scenarios. The sites that helped us with this study were:

• AdvancedMD (AMD) (http://www.advancedmd.com)
• City Town Info (CTI) (http://www.citytowninfo.com)
• Home Portfolio (HPort) (http://www.homeportfolio.com)
• Tool Parts Direct (TPD) (http://www.toolpartsdirect.com)

City Town Info is the site that we analyzed in 2 scenarios. These are referred to as "CTI" and "CTI2" in the chart below. We will talk more about the difference between the 2 CTI scenarios below.

Five of the vendors we tested actively participated in the study. This included their active support in setting up and configuring each of the web sites that ran their software. These were:

1. Clicktracks (http://www.clicktracks.com)
2. Google Analytics (http://www.google.com/analytics)
3. IndexTools (http://www.indextools.com)
4. Unica Affinium NetInsight (http://www.UnicaWebAnalytics.com)
5. Visual Sciences' HBX Analytics (http://www.visualsciences.com)

The other two packages, Omniture and WebTrends, did not actively participate, so we were able to compare their results against the other packages on only one site.

The following chart shows the total number of visitors reported by each analytics package. The labels for each site are shown at the bottom, and the relative volume of visitors is shown on the left:

[Chart: total visitors reported by each analytics package, per site]

This next chart shows the total number of page views reported by each analytics package during the same time period. Once again, the labels for each site are shown at the bottom, and the relative volume of page views is shown on the left:

[Chart: total page views reported by each analytics package, per site]

Key Findings

1. Notice on the visitor stats chart that for Tool Parts Direct (TPD), the highest reporting package (Google Analytics) reports 50% more traffic than the lowest reporting package (HBX Analytics). This is a huge difference! Even the most tightly packed set of results (on CTI2) shows a difference of 20% between the highest and lowest reporting packages.

2. In comparison, the page view data does not have quite as large a range of variance. The largest variance appears to have occurred on "CTI", with the highest reporting package reporting about 25% more page views than the lowest reporting package.

3. In general, many industry experts will tell you that the biggest source of error in analytics is implementation error. We heartily agree. However, during the course of the 2007 Web Analytics Shootout the analytics vendors themselves helped us with the implementation, so we believe that implementation errors were not a factor.

4. After some additional tests, we believe that one major factor impacting accuracy is the placement of the analytics JavaScript on the site. For example, if it is placed at the end of the page (e.g. just before the /BODY tag in the HTML) it will count fewer visitors than if it is placed at the top of the page (e.g. just after the BODY tag in the HTML). A markup sketch at the end of this point illustrates the two placements.

We tested this on City Town Info.  What we found is that placement at the top of the HTML resulted in a traffic count about 2% higher than when the JavaScript was placed at the bottom of the HTML.  On City Town Info the pages tested loaded in about 1.4 seconds, which is a pretty fast load time.

The reason for this is that when the analytics JavaScript is further down the page some users leave the page before the analytics JavaScript can execute. Those users will not be included in your visitor count (the traffic is essentially lost).

This error is even larger for sites that have longer page load times.  While we have not tested this as yet, we would predict an error of 5% or more in the traffic counting on pages that take 3 seconds or so to load.  This error will get worse as page load time increases, because it provides more time for a user to click on a link and move on to another page.

Of course, if they leave the entry page to go to another page on your site, you may still end up tracking the visitor somewhat, but the original referrer is gone, so if the visitor came from a search engine, the keyword data would be lost. This is not a good thing for PPC campaigns.
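To make the placement difference concrete, here is a simplified markup sketch. The tracker URL and function name are hypothetical, not those of any particular vendor, but most JavaScript tagging packages follow this same pattern (in practice you would use one placement or the other, not both):

  <html>
  <body>
    <!-- Placement 1: just after the BODY tag. The visitor is counted
         as soon as the page starts rendering, so quick abandoners are
         captured, but the tracker request can delay the content. -->
    <script type="text/javascript" src="http://tracker.example.com/tracker.js"></script>
    <script type="text/javascript">trackPageView();</script>

    <p>... page content ...</p>

    <!-- Placement 2: just before the /BODY tag. The content is not
         delayed, but visitors who leave before the page finishes
         loading are never counted at all. -->
    <script type="text/javascript" src="http://tracker.example.com/tracker.js"></script>
    <script type="text/javascript">trackPageView();</script>
  </body>
  </html>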

5. Another major factor is simply that the packages are counting different things. Even though they are all counting "visitors" in the chart above, that term means very different things to each of them. Analytics packages use a concept called "sessions" to count the visitors and unique visitors to your site.

Each package makes a wide range of design decisions that affect how it counts. The web delivers a lot of "murky data" to the analytics packages: for example, AOL users may have their IP address change during a visit to your site, or proxy servers may strip off all referrer information, making it hard for a package to decide how to count a particular visitor.

Another large variance between the packages is the design of the sessionization algorithm. The industry standard is to end a session after 30 minutes of inactivity by a user. What this means is that if you go to a site, visit a few pages, go to lunch, and then visit a few more pages, you are counted as two visits.

The reason for this is that analytics packages that use JavaScript tagging only know when you load a web page on the site.  They don’t know when you leave the site to go to another one by typing in a URL in the address bar of the browser.  They have no way to track that.  So the industry settled on 30 minutes as a standard.

However, not all packages use the standard. Clicktracks, for example, defaults to 15 minutes. In addition, any time Clicktracks sees a search engine referrer, it treats that page view as the start of a new visit. While this seems very sensible, not all analytics packages do it. A simplified sketch of this kind of sessionization logic appears at the end of this point.

None of these factors are actually errors. They are all sources of variance, where variance really refers to a different way of counting (as opposed to counting inaccurately).
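To illustrate, here is a simplified JavaScript sketch of the kind of sessionization logic involved. The timeout value and the search-referrer rule are the two behaviors described above; the function and field names are ours, and real packages are considerably more elaborate:

  // One "hit" is one tracked page load for a visitor: { time, referrer }.
  var SESSION_TIMEOUT_MS = 30 * 60 * 1000; // industry standard; Clicktracks defaults to 15 minutes

  function isSearchReferrer(referrer) {
    // Crude check; real packages maintain long lists of search engines.
    return /google\.|yahoo\.|msn\./.test(referrer || "");
  }

  function startsNewVisit(lastHit, hit) {
    if (!lastHit) return true;                                     // first hit ever
    if (hit.time - lastHit.time > SESSION_TIMEOUT_MS) return true; // inactivity timeout
    if (isSearchReferrer(hit.referrer)) return true;               // Clicktracks-style rule
    return false;
  }

  // Counting visits is then just counting session starts
  // (hits must be sorted by time for a single visitor).
  function countVisits(hits) {
    var visits = 0, last = null;
    for (var i = 0; i < hits.length; i++) {
      if (startsNewVisit(last, hits[i])) visits++;
      last = hits[i];
    }
    return visits;
  }

Change the timeout to 15 minutes, or drop the referrer rule, and the same stream of hits produces a different visit count, which is exactly the kind of variance the charts above show.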

6. Our data showed that page views tended to have a smaller level of variance. The main reason for this is that counting page views does not rely on sessionization. More precisely, all that is required to count a page view is that a page be loaded and that the analytics JavaScript runs.

The number of potentially different ways to count page views is much smaller than it is for counting visitors and unique visitors.
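A minimal sketch makes the contrast clear. The classic technique is an image beacon: one page load fires one request to the collection server, with no sessionization decisions involved (the beacon URL here is hypothetical):

  // Runs once per page load; one request = one page view.
  var beacon = new Image();
  beacon.src = "http://tracker.example.com/pageview.gif" +
               "?page=" + encodeURIComponent(location.pathname) +
               "&ref=" + encodeURIComponent(document.referrer);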

7. Based on our findings, it is basically a waste of time to compare traffic numbers between different analytics packages, except to get a sense of relative order of magnitude.

What do we do with these errors?

1. One of the most important things to learn from this is that you need to focus on the things that analytics packages are good at, and stay away from their weaknesses.  Absolute numbers mean little in the world of analytics.  What matters is the relative numbers that you can measure.

For example, if you measure an average of 10,000 visitors per day in July, and then measure 12,000 visitors per day in August, your real traffic has grown by about 20%. This is amazingly powerful information to have. There are many, many ways that this type of power can be used. Here are some examples:

• A/B and multivariate testing – This is very powerful in landing page optimization, where you compare the performance of different versions of your pages to see which offers the highest conversion (see the sketch after this list).
• Optimizing PPC Campaigns – Use analytics to find the poor performing keywords, and fix them or dump them, or use analytics to find the highest performing keywords, and figure out possible related keywords that may also do well.
• Optimizing Organic SEO Campaigns – Find out where your organic search traffic is coming from, and discover opportunities to improve your optimization to drive results even higher.
• Segmenting visitor traffic – See how different groups of visitors behave on your site.  Do past customers behave very differently than first time visitors?  How about visitors from the US versus visitors from Canada? Or, do PPC visitors behave differently than organic search visitors?  Each of these discoveries is an opportunity to dial up the performance of your site.
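As a sketch of why relative numbers hold up, consider the A/B comparison from the first bullet. If the package misses the same fraction of visitors on both pages, the missed traffic cancels out of the comparison (the counts below are made up for illustration):

  function conversionRate(conversions, visitors) {
    return conversions / visitors;
  }

  var pageA = conversionRate(120, 4000); // 3.0%
  var pageB = conversionRate(180, 4500); // 4.0%

  // Page B converts about 1.33x better than Page A. If the package
  // undercounted visitors on both pages by, say, 20%, both rates would
  // shift, but the 1.33x relationship, the actionable finding, would not.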

2. There are scenarios where you want to compare numbers between sites.  For example, you may be acquiring a third party web site, and you know that you can monetize your traffic at $0.25 per unique visitor (based on your analytics package).

If the web site you are acquiring has 10,000 unique visitors per day (based on their analytics package), you may quickly do the math and figure that you can count on $2,500 per day in revenue from the acquired web site.

Well, what happens if their analytics package counts 50% higher than yours? When you get the site and put your analytics package on it, and the count looks more like 6,500 unique visitors per day, you are not going to be a happy camper.
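The arithmetic is worth spelling out with the numbers from this scenario:

  var revenuePerUnique = 0.25;   // what your own package supports
  var reportedUniques = 10000;   // from their package
  var expected = reportedUniques * revenuePerUnique; // $2,500 per day

  // If their package counts 50% higher than yours:
  var actualUniques = reportedUniques / 1.5;         // ~6,667 per day
  var actual = actualUniques * revenuePerUnique;     // ~$1,667 per day

That is roughly a third less revenue than the acquisition model assumed.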

You can easily avoid this problem by making sure that your sites and the site you are considering acquiring are running the same analytics.  If need be, you can always install a free package, such as Google Analytics (http://www.google.com/analytics) on both sites during the due diligence period and get an apples to apples comparison.

3. For PPC campaigns you may want to consider placing the JavaScript higher on the page to reduce the counting error due to that factor.  Better still, if your page takes a long time to load, consider investing the time to trim down the pages and reduce this source of error.

The reason this is so important is that for users who come to your site and leave the initial page (either by leaving the site or going to another page on your site) before the analytics JavaScript executes, you will lose the referrer and keyword information. Having this type of information is a critical component of optimizing PPC campaigns.

Summary

The critical lesson is that the tools are not accurate, and as Avinash Kaushik says, get over it.  However, there are scenarios where the inaccuracies of the analytics tools can really hurt you.  If you are new to analytics, make learning about the nature of their errors part of your agenda.  Knowing what they are good at, and knowing what they are not good at, will save you a lot of heartache.

In addition, focus on their relative measurement capabilities, because they are worth their weight in gold.

In other words, if your analytics package tells you that Page A converts better than Page B, that's money in the bank.  If the software tells you that certain keywords offer the highest conversion rates, that is also money in the bank.  Or, if it says that European visitors buy more blue widgets than North American visitors – you got it – more money in the bank.

Web analytics, done right, is hard.  However, done right, it can provide an outstanding ROI on the time and money you put into it, and give you a major advantage over competitors who do it less well.

Eric Enge is well known in the SEO industry for a variety of reasons, including the content that he publishes on the Stone Temple blog (http://www.stonetemple.com/blog), the Stone Temple Article series (http://www.stonetemple.com/STC_Articles.shtml), and his columns on Search Engine Watch (http://www.searchenginewatch.com).
