Cites & Insights 8:7, July 2008, is now available.
The 26-page issue is PDF as always, but most essays can also be downloaded at the Cites & Insights home page or from the links below.
This issue includes:
- Bibs & Blather – Why there isn’t a nine-year cumulation of Top Tech Trends–and wondering about “free” and the worth of creative work.
- Trends & Quick Takes – four trends (including ongoing notes on high-def discs) and four quicker takes.
- Perspective: One, Two, Some, Many: Search Results & Meaning – And that’s the short title! Notes on the apparent meaninglessness of large open web search engine result counts.
- Interesting & Peculiar Products – fourteen products and eleven group reviews and editors’ choices.
- Retrospective: Pointing with Pride, Part 3 – third in the series of centenary celebrations.
- My Back Pages – Exclusive to the PDF version, as always: eight mini-rants.
Argh, Walt! Call yourself a proper troll? It’s “One, two, many, lots!” 😉
I rarely call myself either proper or a troll… and it’s been a while since my last time through LotR.
Walt, kudos for taking on search engine result innumeracy (I’ve given up on it), but my eyes glazed over trying to follow some of what you were saying, because it was unclear what sort of model of the process you were using.
I believe that Google retrieves around 1,000 results, and post-processes those items. After similarity and anti-spam filtering, you may see fewer. A lot fewer (in fact, I conjecture a spam-overload caused the “Google NACK” incident of some years ago where Google would return NO results for some searches – because they were all spam!). That’s why you end up at e.g. 604, instead of an exact 1,000.
The “about” number is a weird database-estimate that makes as much sense as a pinball score.
Seth,
Innumeracy is a bottomless well of source material…and the essay isn’t rigorous by any means. I’m acting as a naive searcher, without attempting to create a mental model other than an assumption that the search engine is acting in good faith. If, as you suggest, Google’s just bringing back a subset and post-processing it, then there’s literally no way to make meaningful comparisons–because you’re just getting back some sample of a result whose size you can’t rely on.
I’m prepared for someone to say “No, you don’t understand this at all,” and explain it–but if I don’t understand it, then it’s equally true that 99.9% of Google users have no idea what the numbers mean, since they typically go no further than the one, two, some level. In which case, I’d have another article…or three.
Chalk it up to my misspent youth: I was supposed to be a math major when I entered college, and wound up in speech/rhetoric instead.