Since I seem to be keeping a running log of progress on “Looking at Liblogs: The Great Middle,” I may as well note the next step.
I’ve completed the initial scan of candidates, after establishing a first-stage low and high cutoff for total Bloglines numbers based on the 240-odd candidates already on my Bloglines list (excluding “corporate,” “official,” non-library blogs). That first cut reduced the 240 to 200.
Here’s the rest:
- LISWiki Weblogs page, blogs that I hadn’t already looked at (Individual and non-English only): 112 below the low-sub. cut (and a bunch of zero-sub. Persian blogs, and some other zero-sub. blogs), 7 above the high-sub cut, 63 “dead” (no post since March 1) or with no feed, or not blogs; 149 added to the candidate pool.
- DMOZ/Open Directory, those not looked at in the first two steps: 7 below the low-sub. cut, 0 above the high-sub. cut, 23 dead/no feed/missing in action; 4 added to the pool. (I’d already considered most DMOZ blogs.)
- PubSub library list that hadn’t already been looked at, plus a handful of blogs whose creators sent me information about them: 18 below the low-sub cutoff, 0 above the high-sub., 8 dead/no feed; 15 added to the pool.
Note that the handful of blogs whose creators told me to leave them out–and it’s a small handful–were included in these steps to maintain some integrity; I’ll eliminate them later.
That left me with 368 candidates–way too many even for the expanded “look” I was planning. It also means that I checked out something like 650 liblogs in all, of which 554 are still active, aren’t official/corporate/large group, have an RSS feed, and have at least one subscription.
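For anyone who wants to check the arithmetic, here’s a quick Python sketch of the tally. The numbers come straight from the steps above; the variable names and the dictionary layout are only an illustration, not part of any actual spreadsheet. The “examined” figure is a floor, since the zero-subscription blogs mentioned in the LISWiki step aren’t itemized.

```python
# Counts copied from the steps above; labels and layout are illustrative.
starting_candidates = 240   # the "240-odd" blogs already on the Bloglines list
first_cut_pool = 200        # what remained after the first low/high cut

sources = {
    "LISWiki": {"below": 112, "above": 7, "dead": 63, "added": 149},
    "DMOZ":    {"below": 7,   "above": 0, "dead": 23, "added": 4},
    "PubSub":  {"below": 18,  "above": 0, "dead": 8,  "added": 15},
}

# Blogs added to the candidate pool across the three later steps.
added = sum(counts["added"] for counts in sources.values())

# Candidate pool after all four steps.
pool = first_cut_pool + added

# Rough count of every blog examined (zero-sub. blogs not itemized).
examined = starting_candidates + sum(
    sum(counts.values()) for counts in sources.values()
)

print(f"added={added}, pool={pool}, examined at least {examined}")
```

That gives 168 added and a pool of 368, with at least 646 blogs examined, which squares with “something like 650.”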
As should be obvious from the above, the high and low cuts weren’t symmetric, and that’s not surprising: I try to subscribe to interesting new blogs, but it’s natural that I’d have more very-popular blogs than light-sub. blogs.
After trying a few possibilities, and noting that I had to make cuts at numbers (how do you decide between two blogs that both have, say, 23 Bloglines subscriptions–without doing the kind of extended “reach” investigation I did last year and don’t want to repeat?), I wound up deciding on a “great middle” that’s skewed slightly towards more established blogs.
To wit, my new candidate pool is an arbitrary “half of the upper middle”: the 554 eligible blogs minus the top 90 and the bottom 184 (based purely on Bloglines subscriptions), which leaves 280. That pool will shrink slightly as I do more checking, and it may be cut more sharply if I just decide I can’t deal with 280 blogs (that would be a long story, but maybe that’s OK).
Interestingly (or not), that results in just over a 10:1 ratio in subscription counts between the top (196) and bottom (19).
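The cut itself is simple subtraction. A small sketch, assuming the 554 eligible blogs mentioned earlier are the base (that’s the reading that yields 280); the variable names are made up for illustration:

```python
# Figures from the post; names are illustrative only.
eligible = 554      # active, individual, has a feed, at least one subscription
top_cut = 90        # most-subscribed blogs removed
bottom_cut = 184    # least-subscribed blogs removed

middle = eligible - top_cut - bottom_cut
share = middle / eligible   # fraction of eligible blogs kept

top_subs, bottom_subs = 196, 19   # Bloglines subscriptions at the pool's edges
ratio = top_subs / bottom_subs

print(f"middle={middle}, share={share:.1%}, ratio={ratio:.1f}:1")
```

The pool works out to 280 blogs, a bit over half of the eligible set, with a top-to-bottom subscription ratio of roughly 10.3:1.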
So what happens now? As time permits, and in addition to other writing and a little of that summertime fun I was promising myself, I’ll do a small amount of “reach correlation” checking early on and a large amount of metrics work over the next month or so. (The metrics will mostly involve posts from March through May, so timing isn’t that important; I’ll try to do the reach correlation within the next week or so, to make it more-or-less comparable to the three-day Bloglines testing.)
Another reassurance: The look itself will not be hierarchical: It won’t be “Walt’s Middle 200” or whatever, and certainly not in reach order. I may not even include reach calculations in the supporting spreadsheet; unclear at this point. Whatever the final order, this is intended as a look at the “Great Middle” of librarian/library person weblogs in the first half of 2006, offering some interesting metrics and possibly pointing out a few that you’ve missed. A few from last year’s study will show up this year (but, I’m guessing, not many, since I cut more than 60 from the high end of the list).
Of course, I could give the whole thing up as too much effort, but…well, it could happen. Don’t bet on it.
Note that people still have a week or so to say they don’t want their blogs included (and since I’m always a little sloppy, the absence will be presumed to be my sloppiness if you’re in that great middle) and a few weeks to send May unique-IP count and average daily sessions. I think there may be some interesting correlations, which won’t be offered by individual blog.