Archive for the 'Technology and software' Category

And we should trust…: An update

Posted in Technology and software on November 19th, 2010

If you didn’t read the original post, you should–if nothing else, for context.

Here’s what’s happened since then:

  • The autorenew clearly didn’t take.
  • Today, down to her last two days, she went through the Renew process this time–and managed to take the right set of links, yielding a $40 renewal rather than a $70 renewal.
  • She clicked on the “Download” link…

What should have happened

Given that she has an up-to-date subscription, the link should have updated some settings in her McAfee Internet Security, maybe taking 30 seconds tops.

What did happen

First we got a sizable download.

Then that download uninstalled all existing McAfee software. Slowly.

Then it started a 122MB download. With nothing else on our DSL, that took about 30 minutes…

Followed by various nonsense, followed by a Restart request.

After restarting, it started installing (I may have the order wrong; let’s just say we’re at about the 1 hour 15 minute mark…) with, of course, Windows Security popping up a warning about security setting issues.

Eventually–I’d say after about 90 minutes–there was a McAfee shortcut, the McAfee blob back in the tray, and a screen telling us it was starting various services. Until it got to “starting anti-spam”–which would seem somewhat useless since she doesn’t use Outlook or any PC-based mail system (and Gmail has its own excellent spam filter).

And the little animated swirl kept spinning. And spinning. For 10 minutes or more, there was disk activity–for what would seem to be at most a 1-minute job. Then the disk activity stopped, but the little swirl kept spinning.

Exit capabilities: None. Response to a right-click on the toolbar icon: None. Response to any keys or mouse clicks: None. The computer was apparently hung.

I logged on to McAfee on my system, brought up chat, and got into another fruitless session, with the bot (or, I suppose, conceivably person) on the other end telling me to forward her email (on her frozen system) to verify that she’d renewed and apparently ignoring any input from me.

At this point, we were well over two hours into a renewal update. Two hours, to do what should have been a code change at most.

What she finally did

A cold reboot–that is, forcing the computer to turn itself off (holding down the power switch–nothing else had any effect, given McAfee’s marvelous ability to take over the entire computer), turning it back on, letting Windows finish its “abnormal shutdown” routine…

After opening the Windows Security Manager and letting it fix settings, she seems to be fine. McAfee now gives the right termination date (a year from now). She’s fully protected (maybe over-protected: It’s possible that both McAfee and Windows firewalls, and McAfee antivirus and Windows Defender, are operating, but she knows what to do if she gets apparent slowdowns).

And neither of us is, how you say, real happy with the competence shown in McAfee’s renewal operation, updating, or other indications of software excellence.

For me? I’ve turned off autorenew. Some time before my subscription expires, I’ll download Microsoft Security Essentials (and uninstall McAfee). If that turns out to be inadequate, I’ll buy something else…or, if I’m feeling masochistic, I can always add myself to her 3-user Internet Security subscription.

And we should trust our computer security to you?

Posted in Stuff, Technology and software on November 17th, 2010

I don’t know if this is a farce, a comedy, or a tragedy…


When my wife purchased her Toshiba notebook (three years ago), it came with McAfee Internet Security preloaded.

When I purchased my Gateway notebook (two years ago), it came with McAfee Total Security preloaded.

We both auto-renewed for a year (I think). McAfee was obtrusive at times–the update process is the only thing I know that seems able to use 100% of both cores in my Core 2 Duo, hanging the machine until it finishes–but had, for a while, top ratings. More recently? Not so much.


My wife’s one-user McAfee Internet Security license expires in a few days. She deliberately turned off autorenew. My three-user McAfee Total Security license expires in January. I had autorenew on.

But my wife’s doing volunteer work that requires her to visit sites that I might not choose to visit. She needs topnotch online security more than I do. So…

Well, I thought, there should be an easy way to add her to my Total Security license, so her software gets upgraded; I’ll pay the autorenew rate for both machines.

Not so easy, as it turns out. After struggling to make sense of McAfee’s online support, the only answer was for her to TOTALLY UNINSTALL her protection, leaving her computer wholly unprotected, then download Total Security after going to my McAfee page. Of course, if anything went wrong with the download, well, she’d be totally unprotected–the instructions required her to wholly remove the software before doing the new install. Provide a code so she could simply attach to my license? Nah, that would be too logical.

Well, OK. Thinking about it, and the likelihood that we’ll upgrade her notebook in the next year or so, maybe she should go ahead and renew her McAfee. I’d turn off my autorenewal and switch to Microsoft Security Essentials instead…and if that seemed inadequate, I’d definitely be able to buy a new copy of Norton, McAfee, AVG or something else for $40 or less (as opposed to the $80 McAfee wanted to autorenew my Total Security).

The chaos

My wife–who has two master’s degrees, who taught computer programming at one point, who is a first-rate analyst–followed McAfee’s instructions for renewal. And wound up with an about-to-expire existing subscription and a new one-year/three-user subscription, which she’d need to download. For $70.


So she went to technical support…an online chat, similar to the one I’d endured, but worse.

After wasting half an hour or so, she got the new subscription canceled and refunded (I’ll check the credit card account online to make sure that’s actually happened, and she does have a confirmation number).

She found a different “renewal” link on the account page. But, whoops, it seems to go to an order for a one-year subscription, not a renewal…although this time, it’s $40, not $70, and it’s a three-user subscription. Nahh…

Now, she’s turned autorenew back on. Will it actually autorenew, since she only has a few days? If not…well, if she does the renewal, it seems as though it requires her to download the product again. And avoiding all that hassle is the only reason she was willing to pay the higher price.

To sum up:

  • The link in McAfee’s email explicitly leads to the wrong place, adding a second subscription for the same software.
  • So far, we’ve been unable to find a route that actually allows you to do something that is, explicitly, continuing your subscription for another year…except by having a standing autorenew.
  • McAfee seems to want twice as much to renew a subscription as they do for a new one…maybe, or maybe not, depending on which set of links you follow.
  • Oh, did I mention that it seems to regard her fully valid Visa card as expired? It would take a new Mastercard number but not, apparently, a new Visa number.

The outcome

I don’t actually know yet. We’re hoping the autorenew takes. If it doesn’t, I’m not sure what to do. I know I can go buy an actual physical copy (CD and all) of Total Security for $40 if I do it by Saturday. I know she has a lot better things to do with her time.

And I know this: If McAfee has screwed up their renewal, pricing, link and other structures this badly, it leaves me in considerable doubt that their computer protection is as top-notch as they claim.

(I’ll add this: We used Norton for years, but at some point it became too intrusive. Norton never, never, ever had this kind of renewal incompetence associated with it.)


If someone from McAfee feels offended by this, there’s a simple solution: You need to provide us–my wife, who I can put you in contact with–with a straightforward working procedure by which her subscription continues to be valid for another year, without having to download the whole damn package once again. Seems like that should be simple. It’s called renewal: You may have heard of the concept. Or not.

A random post about random accumulation

Posted in Stuff, Technology and software on August 23rd, 2010

For some reason, I woke up in the middle of the night wondering about this:

  • How many CD players do you have in your house/do you own?
  • How many FM radios do you have in your house/do you own?

Those are four questions, not two. Let me add definitions:

  • CD player: Device capable of playing a CD-DA “Red Book” audio disc. (Thus includes PC CD drives, DVD drives, Blu-ray drives.)
  • FM radio: Device capable of receiving broadcast FM and making it audible in some form.

The second actually hit me first, because I was thinking “it’s odd that we don’t have a radio in our house”–then, when I did a quick mental inventory, came up with what I *think* is the answer(s): Five in the house, seven that we own.

Huh? Well, there’s a crank-powered emergency radio. That’s one. (That is: It has a hand crank for real emergencies, also a little LED flashlight. We don’t listen to cranks on it, unless you count the Tappet Brothers.) But there’s also a boombox in the garage. That’s two. (And it plays CDs as well.) Oh, but I also got a silly little radio as a premium with a magazine subscription–it’s tiny and tinny, but it works. That’s three.

Four and five? The 8GB Sansa Fuze that I use as an MP3 player these days has a great FM tuner–but then, so did the 2GB Sansa Express I used before it, clumsy though it was.

Six and seven, probably obvious (and also constitute CD players two and three): Car radios.

Only noteworthy because I think most folks would regard us as having very little in the way of consumer electronics. One TV (technically, zero TVs at the moment), no iAnythings, a little tiny stereo system…oops, wait:

Make that six and eight. The Denon stereo (with a malfunctioning CD door) also includes an FM tuner. I’d forgotten that, since we never used it. And that’s a fourth CD player, even if it’s barely functional.

This is surprisingly difficult. Now, what about CD players? I think I count eight and ten, of which five are DVD-capable. (TEN optical drives in this low-tech household? Good Gaia!)

Besides the four already mentioned, there are DVD burners in each of our budget notebook computers (#5 and #6, also DVD #1 and #2). I had a neat little $15 CD portable that I used before getting a Sansa (#7). Because we love the Denon’s sound and fixing the door would cost $200, we’re using a cheap Sony DVD player as a CD front-end (try finding a non-DVD CD player that has a track display and costs less than $1,000…), so that’s #8 (and DVD #3). Oh, and the freebie DVD player we got during a Safeway post-remodeling grand opening and have been using as our only DVD player for a couple of years (#9, and DVD #4). And the big luxury–the $129 Blu-ray player we just picked up to go with the TV that will shortly replace our 13-year-old TV (which has been Freecycled to another household, not junked). That’s #10 (and DVD #5).

That’s us–and this really is a low-tech household…no teenagers, no DVR, no second TV in the bedroom, third in the kitchen, fourth in the…whatever.

How about you? Can you even count the number of optical drives you own? The number of FM tuners? (And now Big Media thinks your cell phone should have a mandatory FM tuner, ‘cuz, you know, otherwise there’s no way for you to listen to the radio…)

No big moral here. Just an oddity: Things do accumulate. Remember when household lasers were rare and expensive devices? Maybe not; most readers may not be that old.

Enough procrastination. Back to the OA project.

CD lifespan: A clarification

Posted in Stuff, Technology and software on June 10th, 2010

In the Interesting & Peculiar Products section of the new Cites & Insights, discussing the prospects for 500GB optical discs, I question an assertion that the real-world lifespan of optical media is “well under ten years” and note that “I have 25-year-old CDs that work perfectly.”

A reader says that her eight-year-old CD-Rs are unreadable and questions what I’m saying…and says industry estimates are about ten years.

So here’s a clarification:

  • The paragraph I was questioning specifically said “mass-market physical medium”–by which I assumed pressed/pre-recorded media, not recordable media.
  • My context was 25-year-old audio CDs (pressed audio CDs)–and every one of the (prerecorded) CDs I purchased two decades ago still works perfectly.
  • While there are special archival optical media, I can’t speak to life estimates for recordable media–I do have (audio) CD-Rs that are still readable after eight years, but that’s anecdata.
  • I would also note that the paragraph I questioned said people wanting long-term archiving would stick with magtape. Permanence of magtape ain’t so hot either…

Meanwhile: I am not an expert on archival media (and other than ink or properly-fused toner on acid-free paper and *maybe* high-quality microfilm, I don’t know of any), and my casual comment should under no circumstances be assumed to be a guarantee that the DVD-R you burn today will be readable in 25 years.

Bandwidth of Large Airplanes, Take 2

Posted in Stuff, Technology and software on June 9th, 2010

Peter Murray has a post this morning that updates an old conversation he and I had, one in which Cliff Lynch also played an indirect part–all riffing off the old note:

When you think you have a really zippy network connection, someone will (should?) bring up an old internet adage which says “Never underestimate the bandwidth of a station wagon full of tapes.”

…which, more recently, has entailed versions such as “a truck full of CDs” or, what started this all, “a 747 full of Blu-ray Discs.”

Go read the post. I’ll wait.

In the spirit of scientific investigation (which you can translate as “Because I really should be doing the indexing for the new Cites & Insights, and indexing is really boring…”), I decided to check out a couple of things–e.g.,

  • Would 2TB internal hard disks provide even greater bandwidth?
  • Would cargo weight or bulk be the limiting factor?
  • Which provides greater bandwidth, a 747 full of double-density Blu-ray discs or a 747 full of 2TB internal hard disks–and what is that capacity (from New York to LA)?

I also changed one thing: Realistically, even double-DVD slimpacks aren’t the way you’d ship all this stuff. You’d use 100-disc spindles, which result in less packaging overhead.

Here’s what I found


Cargo capacity (cubic meters): 764
Cargo capacity (kg): 123,656
Volume of 100-BD spindle (cubic meters): 0.00347
Weight of 100-BD spindle (kg): 1.316
Max spindles (by volume): 220,173
Max spindles (by weight): 93,964
Data capacity at 40Tb/spindle (Tb): 3,758,541
Bandwidth JFK-LAX (Gb/s): 232,009
Volume of ten 2TB HDs (cubic meters): 0.00390
Weight of ten 2TB HDs (kg): 7.50
Max 10packs (by volume): 195,998
Max 10packs (by weight): 16,487
Data capacity at 160Tb/pack (Tb): 2,637,995
Bandwidth JFK-LAX (Gb/s): 162,839


I checked Boeing’s website for the maximum payload capacity of a Boeing 747 freighter (see Peter’s link, but go to other sublinks as needed). I did real-world measurements for the size and weight of a 100-disc spindle and used Western Digital’s own specs for their Caviar Black 2TB internal hard drive–and, to simplify calculations, I assumed “10packs” of the drives, wrapped 10 high in plastic wrap. (I assume plastic wrap throughout rather than boxes, again to simplify things.) The bandwidth calculations assume the 16,200-second flight time in Peter’s post.

To explicate what’s here:

  • A spindle of 100 Blu-ray discs (total data capacity 5TB or 40Tb) occupies 0.00347 cubic meters (basically, 7.5×5.5×5.5 inches or 177.8×139.7×139.7 millimeters) and weighs 1.316 kilograms (2.9lb.) You could fit 220,173 spindles (in other words, just over 22 million discs) in the 747 freighter–but the plane couldn’t take off. By weight, it could hold 93,964 spindles (just under 9.4 million discs)–so the actual data capacity would be 3,758,541 Terabits, for a bandwidth of 232,009 Gb/s–just a little higher than Peter’s numbers, because spindles add so much less bulk than individual packages.
  • A stack of 2TB hard drives 10 high (total data capacity 20TB or 160Tb) occupies 0.003898 cubic meters (261 millimeters high, 147 millimeters wide, 101.6 millimeters deep) and weighs 7.5 kilograms. That’s the killer: While you could fit almost 1.96 million drives into the plane, you could only take off with 164,870 drives (16,487 10packs)–so the actual data capacity would be 2,637,995 Terabits for a bandwidth of 162,839 Gb/s.
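In the spirit of checking my own arithmetic, here’s a minimal Python sketch of the calculation above. The constants are the figures already given (cargo limits, unit sizes, the 16,200-second flight time); small differences from the table’s numbers are rounding.

```python
# Sanity-check of the 747 "sneakernet" bandwidth figures.

CARGO_VOLUME_M3 = 764      # 747 freighter cargo volume, cubic meters
CARGO_WEIGHT_KG = 123_656  # 747 freighter maximum payload, kg
FLIGHT_SECONDS = 16_200    # JFK-LAX, per Peter's post

def sneakernet(unit_volume_m3, unit_weight_kg, unit_capacity_tb):
    """Units that fit (limited by volume or weight, whichever runs out
    first) and the resulting bandwidth in Gb/s."""
    by_volume = int(CARGO_VOLUME_M3 / unit_volume_m3)
    by_weight = int(CARGO_WEIGHT_KG / unit_weight_kg)
    units = min(by_volume, by_weight)
    gbps = units * unit_capacity_tb * 1000 / FLIGHT_SECONDS  # Tb -> Gb
    return units, gbps

# 100-disc Blu-ray spindle: 5TB = 40Tb. 10-pack of 2TB drives: 20TB = 160Tb.
bd_spindles, bd_gbps = sneakernet(0.00347, 1.316, 40)
hd_packs, hd_gbps = sneakernet(0.003898, 7.5, 160)
```

Both payloads hit the weight limit long before the volume limit, which is why the lighter Blu-ray spindles come out ahead.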

Both are, to be sure, three orders of magnitude greater than the fastest reported network transmission. I was a little surprised to find that Blu-ray discs offered more bandwidth than hard disks–until I did the math: a spindle of 100 Blu-ray discs with 5TB total capacity weighs less than two 2TB hard drives.

Another little table:

                            Blu-ray spindles   2TB hard drives
Capacity (per cubic meter)  1441TB             5131TB
Weight (per cubic meter)    379kg              1924kg

Significance and omitted elements

  • Significance: None…except that the proverbial station wagon full of tapes still has some, erm, legs.
  • Omitted elements: Many–some of them discussed in the original post and comments.

Now for that indexing…

Computer Basics for Librarians and Information Scientists

Posted in Books and publishing, Technology and software on May 11th, 2010

Catherine Pellegrino at Saint Mary’s College Library (in Notre Dame, Indiana) was weeding QA76 and weeded this book. She noted that on FriendFeed; I said “Might be interesting to read that book as early library automation history” and she sent it to me.

I finally got around to reading it. Well, reading part of it, skimming the rest. It’s from 1981. It’s by Howard Fosdick. It really doesn’t say much about library automation; it’s mostly a consideration of very basic aspects of computers–things that I really wouldn’t have thought most librarians needed to understand even in 1981. (Such as, for example, whether a language compiler is part of systems software and exactly how long it takes to read a record from a 1600bpi tape.)

And, after skimming it, I wondered: Was it really as primitive in 1981 as it seems, based on this book?

I was there

Not only was I involved in library automation in 1981, I’d already been involved in it for more than a decade. At that point, I’d been at RLG for two years; my possibly-flawed recollection is that by 1981 I’d just about finished (or fully finished) the design and programming of the product batch system supporting RLIN II, RLG’s full-fledged cataloging network system (based on SPIRES).

It strikes me that, by 1981, I didn’t really have to worry about whether or not I could use PL/I because it took a full 164K of RAM, where some less powerful languages only needed 120K. I know for sure I still spent a lot of time at that point optimizing program operation–but not, I think, at the levels suggested in this book.

OK, that’s probably not fair. RLG, and UC Berkeley before it, had much stronger computing environments than most libraries would have access to. Still…I developed the first working version of the Serials Key Word System in 1973, eight years before 1981, in PL/I (and wrote about it in my first published article, in the March 1976 Journal of Library Automation). And, you know, that Serials Key Word System used full MARC II as an input format.

Were computers still using core memory in 1981? I suppose it’s possible for mainframes; I’m certain the Datapoint multiterminal data entry system (based on a Z80 CPU with 128K RAM, developed in the mid-1970s; I wrote the time-sharing environment, but based on a highly sophisticated OS with direct database support built in) didn’t use core memory!

Not missing the good old days

Admittedly, I remember 1981 as being a little more advanced than this book seems to portray (although the author does view PL/I as the best language for library automation, which I’m pretty certain was true for the time). But that doesn’t mean I remember it with a lot of fondness.

Yes, it’s “wasteful” in some ways that today’s PCs spend 1GB+ of RAM just on the operating system–and probably most CPU cycles as well. But isn’t it wonderful that RAM and CPU power are both so cheap that we can afford to be “wasteful”? I’m guessing the 2-year-old, low-priced notebook I’m using to write this is sitting mostly idle (just opened Task Manager–yep, CPU usage is running 2% to 5% as I write this, occasionally spiking higher). And that’s fine with me. It means I can edit in high-res proportional type instead of 5×7-matrix fixed characters on an 80×25 green-on-black (or, if you’re lucky, amber-on-black) screen–and use about 1/3 the power for my whole two-screen system that the old CRT terminal used all by itself. All that wasted CPU power is saving me time: Whoopee.

That Intel Core 2 Duo CPU in my notebook is a little underpowered by 2010 standards–only two threads and a mere 1.66GHz. By 1981 standards? Were there any mainframes with that much computing power?

And, if you really want silly-season numbers, the 1981 book devotes an appendix to the IBM 3330 Reference Card. That’s a disk drive, hot stuff for its day. The 3336 Model II disk pack had a total capacity of 200 million characters (200 megabytes). I know the drive itself was huge; I don’t know how much a pack cost, but I’m guessing it wasn’t cheap.

I also remember much later, when RLG needed to add a terabyte of disk storage (probably in the late 1990s). That procurement process was a big and expensive deal–but who could imagine adding a terabyte of disk storage to a library automation facility in 1981?

Now? I could go pick up a 2TB disk drive for about $180 if I had use for one. It would fit neatly next to my notebook. (I could probably get it cheaper than that by mail order.) Two terabytes. That’s how many 3336 Model II disk packs? Ten thousand of them, by my calculations.
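For what it’s worth, that closing arithmetic checks out. A one-liner in Python, using the decimal (base-10) units the post uses throughout:

```python
# How many IBM 3336 Model II disk packs (200 million characters,
# i.e. 200MB each) does it take to equal one modern 2TB drive?
PACK_BYTES = 200 * 10**6
DRIVE_BYTES = 2 * 10**12
packs_per_drive = DRIVE_BYTES // PACK_BYTES  # 10,000
```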

Last words on the iPad (for now, at least)

Posted in Technology and software on April 6th, 2010

It’s out. I did my special issue on the pre-release hype before it came out–which was what I intended to do.

Post-release hype? Plenty of it, at almost deafening levels–possibly even worse than pre-release, which I frankly didn’t think was possible.

I’m not tagging post-release iPad-related articles (at least not if the iPad is the primary thrust). I don’t plan to–because I don’t plan to do a followup, at least not for quite a while.

Meantime, I do have a few reasonably safe predictions:

  • Most commentary–formal and informal–by people who actually buy iPads will be positive, at least for the first month. I’d guess 90% or more will be enthusiastic. (Most people who buy new things, particularly somewhat pricey new things, like the things they buy–even if they’re not from Apple. That’s only natural.)
  • Most people who offer mixed reviews, even if they’re primarily positive, will be called “Haters” in the comments on their posts or articles. (Here’s where the iPad is different than non-Apple products would be.) UPDATE: I’m turning out to be wrong on this, although it was pretty accurate pre-launch. That’s a good thing: You can be less than 100% pro-iPad without being a “Hater.” (Second update: Ah, but Nicholas Carr just used “Luddites” to refer to Cory Doctorow and anybody else raising qualms about the closed nature of the iPad. There are other words than “Hater.”)
  • The iPad will be hailed even more as “the X killer,” where X=any number of things, including desktops, notebooks, netbooks, ereaders, print publishing, creativity, openness, probably even iPod Touch and iPhones…
  • The iPad will kill none of these things. It doesn’t work that way.
  • Most early experiments in offering magazines on the iPad will fail dismally–for reasons not having much to do with the iPad itself. Sorry, but who in their right minds is really going to pay $4.99 an issue for Wired or Time on the iPad when they sell for, respectively, $12 or less per year and $20 or less per year for 12 or 52+ issues, respectively? (Yes, there will be some. No, there won’t be many.)

There’s some bizarre stuff going on–e.g., a pro-Apple analyst proclaiming that the iPad could be to tablet computing what the Mac is to personal computing in general, a fate I suspect Apple would just as soon avoid…and another one saying the iPad will be the death of Mac notebooks, another fate I suspect Apple would just as soon avoid.

Meantime, if you buy an iPad, enjoy (I’m sure you will). Just don’t get it very wet or drop it very often (having just watched the PC World stress test)–but, frankly, I don’t think that’s advice iPad owners really need to hear. “Oh, hey, here’s my shiny new $500 electronic device! I think I’ll rinse it off under running water and then drop it a few times.” Maybe not.

20 years: The “death of DVDs” in context

Posted in Movies and TV, Technology and software on April 5th, 2010

Just a quick note, for various deathwatch fans who are quick to proclaim The Death Of Whatever–in this case, DVDs, ’cause everything’s going to be streaming any day now…

As noted in this Bloomberg story, Reed Hastings, CEO of Netflix–who probably knows more about DVD and streaming long-form video consumption than anybody else, and who would really love to see Netflix become entirely a streaming-video operation (as people have noted, it’s not called Mailboxflix)–believes Netflix will be shipping DVDs to subscribers until 2030.

2030. That’s 20 years from now. At that point, DVDs will have been around for more than 30 years and dominant for at least a quarter-century (which has, with remarkable consistency, been the timespan for any dominant audio/video medium to remain dominant or at least very important).

Note that “DVD” includes Blu-ray and, sigh, 3D Blu-ray. Will physical media disappear at some point? Who knows? Will they disappear in the next year or two or five? Not likely.

A metrics update

Posted in Technology and software, Writing and blogging on February 26th, 2010

For those who care about the issue of Google Analytics metrics vs. Urchin (5) metrics–which is either “quite a few people” (if you believe Urchin) or “pretty much nobody” (if you believe Google Analytics), here’s an update:

  • It was pointed out to me that GA won’t track unless the user has both cookies and Javascript enabled. Nothing I can do about that.
  • Seth Finkelstein thought it might have to do with HTML errors, and noted that the W3C Validator found a bunch of those on the Walt at Random home page.

So I thought I’d see how tough it was to correct those errors–and whether it made a difference. (I also thought I’d see whether the errors were mine or were in the templates & addons I used.)

There were a bunch of errors, but that includes cascading errors (where one apparent error is really the result of another error–boy, do I remember those from programming, especially in PL/I!). It turns out that about 80% of the “errors” were mine, mostly because I’m used to HTML parsing being fairly forgiving–namely:

  • Using all-caps tags where HTML requires all-lower-case.
  • Using <br> as a standalone, rather than <br />–but that was both in my own code and in a portion of the template.

I managed to fix them all, although in one case that made the right sidebar a bit less attractive (Validator just wouldn’t accept one particular nested list). Took me 2, maybe 2.5 hours. Except for the added infelicity in the right margin, it made no difference to the average viewer, I believe, since the visible results were the same. But, presumably, it would make Google Analytics results a little more plausible. Maybe?

Depends on your definition of “a little.”

The changes have been in place since February 23. I’ve had a chance to look at two full days running on a clean, zero-errors home page vs. the same days on Urchin.

There may have been a little increase in pageviews and visits logged by Google Analytics–but not much of one. Here’s what I see for comparisons on the 22nd, 23rd and 24th:

  • Sessions: February 22: Google Analytics 58, Urchin 1,492.
    February 23: Google Analytics 79, Urchin 1,439.
    February 24: Google Analytics 81, Urchin 1,398.
  • Pageviews: February 22: Google Analytics 77, Urchin 4,455.
    February 23: Google Analytics 115, Urchin 3,213.
    February 24: Google Analytics 132, Urchin 3,093.

And, mysteriously, the second-highest post in a full page reports on Google Analytics is a post from the very first year of the blog (on mondegreens), with 34 views…where that post is not even in the top 50 on Urchin.


I do note that none of the GA reported pages is a /feed/index page, where quite a few of the higher ones in Urchin are (these presumably being RSS views of pages?). That could account for some of it–since the GA code is, as recommended, right before </body> in the page, it’s part of the footer, which doesn’t get fed to RSS. Since I regard readers-via-RSS as fully equivalent to readers-“in person,” I’m not thrilled about losing those counts.

But if I filter the Urchin pages report to eliminate everything with “feed” anywhere in it, that eliminates less than one-third of the views, still leaving them way more than 10x as high as GA shows.

I’m not sure what else might be going on. I flat-out don’t believe that 90% of Walt at Random viewers have either cookies or Javascript disabled. (But I could be wrong.)


For me, for now, for my own sites, the solution is simple: I’ll take the Google Analytics tracking code out of the template and rely on Urchin for my statistics, since it’s actually (presumably) looking at logs. The GA code is extra overhead for the internet; why waste it?

For my work? They’re looking into it. (There, I think the “plausible to reported” multiple is nowhere near as high…)

Google Analytics v. Urchin 5: A Metrics Quandary

Posted in Technology and software on February 23rd, 2010

Ever since I’ve used LISHost for various purposes–this blog throughout its history (except for a few months last year), Cites & Insights since mid-June 2006, my personal site since its inception–I’ve used Urchin to track site usage (unless Blake added Urchin more recently). Currently, my sites use Urchin 5. (Apparently, some LISHost sites on another server use Urchin 6, and none of this necessarily applies to them.)

I like Urchin. It defaults to a weekly view with a nice range of options, and you can expand it to a much broader timeline (although it runs into trouble if the timeline is too long or the logs to be analyzed too large: I’m not sure which). I’ve done reports on an entire year. For the reports I mostly care about–for C&I, file download figures (for PDF) and pageview figures (for HTML)–exporting reports works well. Robots (spiders) are separated out into a separate subsection. The numbers seem consistent–that is, there’s nothing in any of the numbers to suggest faulty logic, and at least some download/pageview numbers are consistent with what I’d expect from other sources.

Recently, I decided to try Google Analytics as an alternative (without disabling Urchin, to be sure). Urchin’s now owned by Google, and I believe Urchin 6 distinctly reflects that–and the ownership does mean that Urchin help is mostly not working very well. Unlike Urchin 5, Google Analytics doesn’t analyze server logs: You have to put tracking code on every page you want it to track, and it relies on calls to Google’s own servers. I only wanted to try it for Walt at Random, and since every page uses the “footer” code, it was easy enough to put the GA code segment into that portion of the site’s HTML–just before the “</body>” tag, as suggested by GA. (This clearly wouldn’t work well for Cites & Insights, where the numbers I’m most interested in are PDF downloads.)

I wanted to try GA partly because that’s currently the tracking method for use of the new Drupal Library Learning Network. (The old one used MediaWiki, which has strong usage-reporting built right into the system.)

The code went active on February 15, in the morning, and has now been active for a little more than a week.

And I don’t believe the results.

Some Quick Comparisons

Here’s what I find, comparing GA’s report covering February 15 through February 22 with Urchin’s for the same period–but noting that Urchin’s daily run was apparently yesterday morning, covering a small fraction of yesterday’s use and presumably making GA’s numbers higher by default:

  • Sessions: GA reports 491 “visits.” Urchin reports 11,287 “sessions.” (No, there are no typos there: GA is reporting 4.3% of the number of sessions reported by Urchin–just over 1/25th.)
  • Pageviews: GA reports 633 pageviews. Urchin reports 29,306. The difference here is even larger: GA is reporting 2.2% as many pageviews as Urchin.
  • Visitors: GA reports 406 visitors (which means almost nobody came back–82.69% new visits). Urchin reports 2,005 IP addresses, which I take to be the same thing as visitors. A much smaller difference here, since Urchin seems to find people returning. Still, GA’s reporting only 20% as many different IP addresses as Urchin.
  • Popular pages: GA says that only two current posts were visited 20 times or more–the “Social Networks/Social Media Snapshot” with 31 visits and “Open Access and Libraries: Be My Guest” with 29. (Things drop rapidly after that, with, for example, “Catching Up (sort of, a little bit)” getting 11 views.) By comparison, Urchin shows 206 pageviews for the Open Access post, 162 for Social Networks and 110 for “Catching Up”–and an LLN repost with 151 views in the middle.
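To put the gap in one place, here’s the arithmetic on the figures quoted above (GA as a percentage of Urchin’s count, per metric), sketched in Python:

```python
# GA-vs-Urchin gap for February 15-22, using the numbers from the post.
ga = {"sessions": 491, "pageviews": 633, "visitors": 406}
urchin = {"sessions": 11_287, "pageviews": 29_306, "visitors": 2_005}

# Percentage of Urchin's count that GA reports, per metric:
# sessions ~4.3%, pageviews ~2.2%, visitors ~20%
pct = {k: 100 * ga[k] / urchin[k] for k in ga}
```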

At Least One Of These Must Be Wrong

So which is it? Does this blog have a very small readership with very active commenting, which would have to be the case for the GA numbers to be right, or is GA massively undercounting for various reasons?

While it wouldn’t much bother me if the first was true, it does seem a little out of proportion to the 830+ feed reader subscriptions for this blog as of today–and, frankly, to the number of downloads for the Open Access and Libraries PDF (28 during that same period).

I’ve already been told (a) that Google Analytics won’t work if a user doesn’t have Javascript enabled or doesn’t allow cookies, (b) that GA is apparently intolerant of less-than-perfect HTML. It’s also quite possible that (c) I somehow mangled the code cut-and-paste–but in that case you’d expect no stats at all, or at least not the kind of stats I’m seeing. (161 pages visited during the 8 days–but visited very rarely.)

For the blog, I really don’t care. I’ll probably remove the GA tracking code after a while, and I’ll certainly rely on Urchin for numbers. For Cites & Insights, where there’s a reason to care, I can’t really use GA in any case–I’m not going to add tracking code to all the HTML articles, so all I’d be tracking is visits to the site, not readership for the publication.

For Library Leadership Network…well, there I care.
