Ever since I started using LISHost for various purposes–this blog throughout its history (except for a few months last year), Cites & Insights since mid-June 2006, my personal site since its inception–I’ve used Urchin to track site usage (unless Blake added Urchin more recently). Currently, my sites use Urchin 5. (Apparently, some LISHost sites on another server use Urchin 6, and none of this necessarily applies to them.)
I like Urchin. It defaults to a weekly view with a nice range of options, and you can expand it to a much broader timeline (although it runs into trouble if the timeline is too long or the logs to be analyzed too large: I’m not sure which). I’ve done reports on an entire year. For the reports I mostly care about–for C&I, file download figures (for PDF) and pageview figures (for HTML)–exporting reports works well. Robots (spiders) are split out into their own subsection. The numbers seem consistent–that is, there’s nothing in any of the numbers to suggest faulty logic, and at least some download/pageview numbers are consistent with what I’d expect from other sources.
Recently, I decided to try Google Analytics as an alternative (without disabling Urchin, to be sure). Urchin’s now owned by Google, and I believe Urchin 6 distinctly reflects that–and the ownership does mean that Urchin’s help mostly doesn’t work very well. Unlike Urchin 5, Google Analytics doesn’t analyze server logs: You have to put tracking code on every page you want it to track, and it relies on calls to Google’s own servers. I only wanted to try it for Walt at Random, and since every page uses the “footer” code, it was easy enough to put the GA code segment into that portion of the site’s HTML–just before the “</body>” tag, as suggested by GA. (This clearly wouldn’t work well for Cites & Insights, where the numbers I’m most interested in are PDF downloads.)
I wanted to try GA partly because that’s currently the tracking method for use of the new Drupal Library Leadership Network. (The old one used MediaWiki, which has strong usage-reporting built right into the system.)
The code went active on February 15, in the morning, and has now been active for a little more than a week.
And I don’t believe the results.
Some Quick Comparisons
Here’s what I find, comparing GA’s report covering February 15 through February 22 with Urchin’s for the same period–but noting that Urchin’s daily run was apparently yesterday morning, covering a small fraction of yesterday’s use and presumably making GA’s numbers higher by default:
- Sessions: GA reports 491 “visits.” Urchin reports 11,287 “sessions.” (No, there are no typos there: GA is reporting 4.3% of the number of sessions reported by Urchin–just over 1/25th.)
- Pageviews: GA reports 633 pageviews. Urchin reports 29,306. The difference here is even larger: GA is reporting 2.2% as many pageviews as Urchin.
- Visitors: GA reports 406 visitors (which means almost nobody came back–82.69% new visits). Urchin reports 2,005 IP addresses, which I take to be the same thing as visitors. A much smaller difference here, since Urchin seems to find people returning. Still, GA’s reporting only 20% as many distinct visitors as Urchin.
- Popular pages: GA says that only two current posts were visited 20 times or more–the “Social Networks/Social Media Snapshot” with 31 visits and “Open Access and Libraries: Be My Guest” with 29. (Things drop rapidly after that, with, for example, “Catching Up (sort of, a little bit)” getting 11 views.) By comparison, Urchin shows 206 pageviews for the Open Access post, 162 for Social Networks and 110 for “Catching Up”–and an LLN repost with 151 views in the middle.
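For anyone who wants to check the percentages above, here is a minimal Python sketch of the arithmetic. The counts are copied from the list; the dictionary and function names are just illustrative, and the last calculation assumes that GA's "new visits" percentage is simply visitors divided by visits, which I haven't confirmed but which does reproduce the 82.69% figure.

```python
# Counts copied from the comparison above (February 15 through February 22).
ga = {"sessions": 491, "pageviews": 633, "visitors": 406}
urchin = {"sessions": 11287, "pageviews": 29306, "visitors": 2005}


def ga_share(metric):
    """Return GA's count as a percentage of Urchin's count for one metric."""
    return 100.0 * ga[metric] / urchin[metric]


for metric in ga:
    print(f"{metric}: GA reports {ga_share(metric):.2f}% of Urchin's figure")

# Assumption: "% new visits" is visitors divided by visits (not confirmed by GA's docs here).
new_visit_share = 100.0 * ga["visitors"] / ga["sessions"]
print(f"new visits: {new_visit_share:.2f}%")
```

To two decimal places this gives 4.35% for sessions, 2.16% for pageviews and 20.25% for visitors, in line with the rounded figures quoted in the list.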
At Least One Of These Must Be Wrong
So which is it? Does this blog have a very small readership with very active commenting, which would have to be the case for the GA numbers to be right, or is GA massively undercounting for various reasons?
While it wouldn’t much bother me if the first were true, it does seem a little out of proportion to the 830+ Feedreader subscriptions for this blog as of today–and, frankly, to the number of downloads for the Open Access and Libraries PDF (28 during that same period).
For the blog, I really don’t care. I’ll probably remove the GA tracking code after a while, and I’ll certainly rely on Urchin for numbers. For Cites & Insights, where there’s a reason to care, I can’t really use GA in any case–I’m not going to add tracking code to all the HTML articles, so all I’d be tracking is visits to the site, not readership for the publication.
For Library Leadership Network…well, there I care.