Bloglines upheaval: What’s happening

I swapped out a “selective blogroll” quite a while ago, in favor of the “Blogs I read” link in the right-hand column. That link brings up the public portion of my Bloglines subscriptions, which is about 99% of my total Bloglines set.

If you happen upon that link over the next few weeks, the results may seem more bizarre than usual–and more variable than usual. I wouldn’t be surprised if the list swelled to 400 or 500 entries at some point.

No, I haven’t suddenly gone blog-crazy (or more so than usual). If anything, I have less time for blog-reading: As of yesterday, I’m back to full-time work from the 75% time imposed last fall.

What’s happening is the lengthy process of data gathering for “Looking at Liblogs,” this year’s version of “Investigating the Biblioblogosphere.”

Right now–starting Sunday and, I hope, ending today [we don’t do road trips on long weekends, and I was at work yesterday anyway] or tomorrow–I’m gathering candidates. This year’s version is going to be very different from last year’s (not hierarchical for one thing, and a few people have opted out, for another), and one major difference is that I’m looking at “the great middle,” excluding not only blogs with the fewest Bloglines subscribers but also those with the most Bloglines subscribers.

I’ve already made the first cut, based on checking total Bloglines subscribers for the 240 candidate blogs already in my Bloglines set, assuming that–at least at the high end–these are representative of the field as a whole. [“The field” is roughly as defined last year: Blogs by individual library people and small groups of library people, excluding “official” blogs from libraries, clearly sponsored blogs, and large group blogs.] The current version of Bloglines makes it much easier to estimate total subscriptions, as the subscription window shows counts for each feed Bloglines can identify. (I exclude comment feeds, and if there are more than half a dozen non-comment feeds, I may give up and just take the highest group.)

After determining the apparent subscription count for those 240 candidates (which may or may not have included some that have opted out; that’s irrelevant to this initial calculation), I looked at a first cut in two different ways: the top and bottom 10% in real terms, and the top and bottom 10% in normalized-subscription terms. (That is: For the second way, I did a quick pivot table on the Bloglines #, thus collapsing multiple cases of a single number.)

I took the outer limit in both cases–actual blogs for the lower limit, # of subscriptions for the upper limit.
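For anyone who wants to see that two-way cut spelled out, here is a rough sketch in Python. It is an illustration under assumptions, not the spreadsheet I actually used: the function names, the sample counts, and the choice to treat the cutoff values themselves as part of the middle are all made up for the example.

```python
def tail_cutoffs(values, percent=10):
    """Return (low, high): the values just inside the bottom and top `percent` tails."""
    ordered = sorted(values)
    k = len(ordered) * percent // 100  # number of entries in each tail
    return ordered[k], ordered[-k - 1]


def middle_range(counts, percent=10):
    """Combine the per-blog cut and the distinct-value cut, keeping the outer limits."""
    by_blog = tail_cutoffs(counts, percent)           # every blog counts once
    by_distinct = tail_cutoffs(set(counts), percent)  # each subscription number counts once
    low = min(by_blog[0], by_distinct[0])
    high = max(by_blog[1], by_distinct[1])
    return low, high


# Hypothetical subscription counts, not real data.
sample_counts = [1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 8, 10, 12, 15, 20, 30, 50, 100, 400]
low, high = middle_range(sample_counts)
print(f"Keep blogs with {low} to {high} subscribers")
```

With a long-tailed list like the sample, the per-blog cut tends to set the lower limit and the distinct-value cut tends to set the upper limit, which matches what happened with the real numbers.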

Now I’m doing the second pass, checking three different sources for blogs that I wasn’t already subscribed to, although I don’t anticipate picking up much past the first new source. The sources: The LISWiki Weblogs page; then the Open Directory Libraries page (if there are any new ones there); then the Pubsub libraries list (again, if there’s anything new left).

For any blog that’s had at least one post since February 2006, that meets my other criteria, and that has between 16 and 689 Bloglines subscriptions, I’m subscribing and jotting down the subscription total. Then, I’ll do a second cut, since the first cut will clearly leave more blogs than I can possibly deal with.
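Written as a rule, the screening test looks something like the sketch below. The names are hypothetical, “since February 2006” is assumed to mean on or after February 1, 2006, both subscription limits are assumed to be inclusive, and the “other criteria” are reduced to a single yes/no flag because they are judgment calls rather than anything a script can check.

```python
from datetime import date

# Limits from this first cut; they will change in the second cut.
MIN_SUBSCRIBERS = 16
MAX_SUBSCRIBERS = 689
ACTIVE_SINCE = date(2006, 2, 1)  # assumption: start of February 2006


def is_candidate(last_post, subscribers, meets_other_criteria=True):
    """Return True if a blog passes the first-cut screen."""
    is_active = last_post >= ACTIVE_SINCE
    in_great_middle = MIN_SUBSCRIBERS <= subscribers <= MAX_SUBSCRIBERS
    return is_active and in_great_middle and meets_other_criteria


# Hypothetical examples, not real blogs.
print(is_candidate(date(2006, 6, 30), 120))   # active, mid-range
print(is_candidate(date(2005, 11, 2), 120))   # no posts since February 2006
print(is_candidate(date(2006, 6, 30), 1200))  # too many subscribers
```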

So the link will yield an ever-growing list, which will include some blogs that aren’t candidates. Then, the list will shrink somewhat, until I start the second, much more extended portion of the data gathering (looking at other reach measures, then looking at metrics for the blog). I’ll delete blogs (or make them private) little by little during that process. Chances are, I’ll wind up with more subscriptions than I started out with.

Note that this year I’m including non-English blogs, at least initially. I may not be able to describe the blogs as well, but this year’s project may not include much descriptive material anyway.

One wholly unanswered question at this point: How I’ll arrange the blogs for the article itself. It won’t be by apparent reach. Alphabetical also favors certain bloggers (not me, to be sure!). Since the article won’t appear until mid-August or later, I can figure that out a whole lot later.

Meanwhile, happy 4th of July to all readers (except those for whom it’s already the 5th). It may be a holiday in the U.S., but it’s the 4th of July everywhere, right?

Oops: Two things I’d intended to mention:

  • Early and maybe unsurprising finding: If given the choice, Bloglines users–at least library types–tend to prefer Atom feeds to other RSS feeds.
  • Turns out I have a lot more subscribers here than I realized…336, where I was counting 137.

3 Responses to “Bloglines upheaval: What’s happening”

  1. Mark says:

    Hi Walt and a Happy 4th to you!

    I appreciate your efforts on this topic. I have no doubt that I’ll discover some wonderful new blogs again. But I have to wonder, how truly representative are Bloglines subscription numbers? Now, I fully understand that this is a serious effort to undertake and that this methodological constraint is a real world one. That is, it may be the best that can be currently undertaken based on the technologies that we have at the moment.

    I recently signed up for a FeedBurner account for my blog and found out that I had over a hundred subscribers vs. the 61 Bloglines shows me. It also turns out that these numbers are far more volatile than the ones in Bloglines. FeedBurner currently reports 94 where two days ago it was only 88.

    According to FeedBurner, Bloglines subscribers account for only 50-53% of my total subscribers. This has stayed at this point for a week now. The interesting thing is my blog is being read in over 20 different feed readers.

    I have no idea if the 50% Bloglines numbers hold for most other libloggers or not. It might be interesting if you can get a sample of that sort.

    I am not trying to discourage you or diminish your efforts; I truly do appreciate them! I guess I’m just wondering if you are aware of this and possibly other confounding variables, and what we can do to assist you in accounting for them?

  2. walt says:

    Hi Mark,

    In the past, my guesstimate (based on info I’d seen) was that Bloglines probably represented 25% to 40% of the subscribers for a given blog. So my guess was that I had around 400 to 500 readers-via-feed when (I thought) Bloglines showed around 125.

    If Bloglines now represents half of total feed subscribers for typical blogs, that’s a change–and in some ways I’d guess that web-based aggregators have grown at the expense of other aggregators, but that’s also only a guess.

    It’s only one number. Most other “reach” numbers that have been suggested are, I believe, far more volatile and much more inclined to “favor the favorites”–e.g., Technorati, Pubsub, etc.

    I’m using it by itself as a plausibly representative number: That is, chances are that a blog with, say, 10 Bloglines subscriptions, when compared in actual readership with a blog with 1,000 Bloglines subscriptions, is likely to have “a lot fewer” readers–maybe (probably) not exactly 1%, but probably somewhere between 0.25% and 10%. I just can’t imagine any reason why that would not be the case, why some blogs would have disproportionate numbers of non-Bloglines subscribers.

    This year’s “research” really won’t be focused on reach. I’m using Bloglines count to roughly identify “the great middle”–a starting pool of roughly 30-50% of the plausible candidates, eliminating the “top” and “bottom.” Within that pool, I certainly don’t expect to use Bloglines count as a directly meaningful figure. But I do claim that blogs with a very large number of Bloglines subscribers are likely to be the most widely read blogs, and that blogs with a very small number of Bloglines subscribers are likely to be the least widely read. Within the great middle, all bets are off. [Right now, the “great middle” is 16 to 689. Those numbers will change: I can’t possibly do full metrics on that many blogs, particularly now that I’m back full time at work.]

    [I will have different sorts of cross-checks. Enough people have responded to my request for May 2006 session-per-day and unique-IP counts to be able to do some correlations, or at least see if correlations make any sense.]

  3. Mark says:

    Sounds good Walt. I thought that was the approach you were taking, as any attempt to get ‘real’ numbers would be futile. Plus, I had no doubt that you understand this sort of analysis far better than I.

    I did go looking for those numbers but sadly I don’t think I have that much detail available to me.

    Now go enjoy your holiday! I’m about to head out to exercise a little liberty myself.