We grow too soon old…

…and too late smart.

The setup

Four times, I’ve done analyses of liblogs (blogs by library people, as opposed to library blogs)–twice within Cites & Insights, twice as books.

Still available, still great bargains: The Liblog Landscape 2007-2008 and But Still They Blog: The Liblog Landscape 2007-2009. Note that Lulu’s still offering free shipping for any order over $19.95, making these even better bargains.

If you’re wondering: The two C&I analyses were Investigating the Biblioblogosphere in September 2005 and Looking at Liblogs: The Great Middle in August 2006.

In each case, and particularly for the two books (which attempted to cover a very large portion of the English-language “liblog landscape”), one of the biggest time-sinks in the project was the process of finding new liblogs–ones I hadn’t already included in a previous study.

There are several sources for such blogs, and the sources tend to repeat one another (as you’d expect)–and once you’re dealing with more than a hundred blogs or so, there’s no way I could remember which blogs I’d already looked at. I used a variety of techniques to make the situation somewhat manageable–after all, we’re talking about several thousand listings in the primary sources–but it still took scores or hundreds of hours, particularly when I started looking at blogrolls.

Last year, I concluded that, if I ever did do another similar study, I’d probably give up on blogrolls altogether: Too much work for too few new discoveries.

The occasion

Based on sales to date of the two books, it would be insane to do another study.

On the other hand… well, there were still things I wanted to know about the progress of (English-language) liblogs.

So I decided to start another, somewhat different, project and carry it out if it seemed feasible and didn’t get in the way of more directly useful projects (such as a non-self-published book I hope to be doing later this summer and early fall).

The new project differs from the last two in two key respects:

  • If there’s a book, it’s going to be much shorter–and the obvious way to do that is by leaving out individual blog profiles. Clearly, at least 90% of blog owners aren’t going to pay for a book that includes a profile of their blog, and those profiles are a lot of work (and take up a lot of space).
  • The new project has two levels of inclusion, the first of which makes the project particularly interesting to me.

The two levels

  • The broad look: As comprehensive a survey as possible of English-language blogs by library people (excluding official blogs) that have any visibility at all on the web in mid-2010. I can’t claim it will be a comprehensive survey of liblogs, because (a) quite a few have disappeared entirely, (b) a few are password-protected and won’t be included, (c) there will certainly be dozens or hundreds of blogs that I won’t encounter. But it will be the broadest look I’ve taken–albeit with less information on each blog:
    Birthdate: When it began (year and month)
    Lifespan: How many months it operated (through May 2010)
    Currency: The most recent post (prior to June 1, 2010)
    Nationality: The country (when obvious)
    Program: The blogging software (when obvious)
    Frequency: The number of posts from March 1, 2010 through May 31, 2010.
  • The deep look: A deeper look at a large subset of those blogs, defined as:
    Blogs that have a Google PageRank of 4 or higher (fairly visible blogs)
    and at least two posts between March 1, 2010 and May 31, 2010 (active blogs).
    For those blogs, I’m also tracking the same metrics as in last year’s study (when available): Frequency, comments, and total post length–for March-May 2010 and, for blogs new to this year’s study, going back to March-May periods in 2009, 2008 and 2007.
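For anyone who wants to apply the same two-level definitions to their own list, here is a minimal Python sketch of the date-window and inclusion rules above. It assumes a hypothetical Blog record that holds a recorded PageRank and a list of post dates. Those field names, the sample blogs, and the use of the earliest known post as the start date are illustrative assumptions, not how the study's spreadsheet is actually organized.

```python
from dataclasses import dataclass
from datetime import date

# Study window for "Frequency" and for the deep-look activity test (inclusive).
WINDOW_START = date(2010, 3, 1)
WINDOW_END = date(2010, 5, 31)
# Broad-look rule: blogs must have started before this date.
CUTOFF = date(2010, 6, 1)


@dataclass
class Blog:
    # Hypothetical record; field names are illustrative, not from the study.
    name: str
    page_rank: int           # Google PageRank as recorded (0-10)
    post_dates: list[date]   # dates of all known posts


def frequency(blog: Blog) -> int:
    """Number of posts from March 1, 2010 through May 31, 2010."""
    return sum(WINDOW_START <= d <= WINDOW_END for d in blog.post_dates)


def in_broad_look(blog: Blog) -> bool:
    """Assumes the earliest known post marks the start; must precede June 1, 2010."""
    return bool(blog.post_dates) and min(blog.post_dates) < CUTOFF


def in_deep_look(blog: Blog) -> bool:
    """PageRank 4 or higher and at least two posts in the March-May 2010 window."""
    return in_broad_look(blog) and blog.page_rank >= 4 and frequency(blog) >= 2


# Hypothetical sample blogs, for illustration only.
blogs = [
    Blog("Example A", 5, [date(2007, 1, 3), date(2010, 3, 2), date(2010, 5, 30)]),
    Blog("Example B", 6, [date(2008, 6, 1), date(2010, 4, 15)]),
    Blog("Example C", 3, [date(2009, 2, 1), date(2010, 3, 10), date(2010, 4, 10)]),
]
print([b.name for b in blogs if in_deep_look(b)])  # ['Example A']
```

Example B misses the deep look on activity (one post in the window) and Example C on visibility (PageRank 3), which mirrors the two separate tests in the definition.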

The first part is more ambitious, in that I’m including–potentially–a lot more blogs.

The second part is less ambitious, both because I’m not doing blog profiles (a decision I could only change with up-front sponsorship–it’s a lot of work) and because I’m limiting that level of statistical analysis to blogs that are currently active.

(One difference: I’m not requiring that blogs have started before January 1, 2010. They must have started before June 1, 2010.)

To do this project, I once again need to dive into the directories, at least as a first cut, recognizing that I’d probably pick up some additional blogs from blogrolls…but only if I could take the time.

The breakthrough (the forehead-slap moment)

Last year, I used some teeny-tiny printouts to try to cut down the amount of extraneous checking, but it was still an enormous pain. This year, I was determined to avoid superfluous printouts, even if they only used a page or two of paper.

I had one small bright idea–at each stage (where I’ve finished a pass against a source of blogs), peel off copies of the blog names and excluded blog names to a separate spreadsheet, sort them, and use that spreadsheet in a narrow little column alongside the browser window when I’m looking at a new source. That worked nicely to add new blogs from my own Bloglines list–the process took half a day or less and yielded 43 new blogs (and nine new exclusions).

Well, so, I could do that with the other primary sources (LISWiki, the ODP list of librarian blogs, the LISZen source list, Meredith Farkas’ “Favorite Blogs” list and Davey P’s “HotStuff” list, the Salem Press list)–but that would still be an ordeal.

Or… I could cut-and-paste each of the directories, with HTML included, into a Word document; use global edits to normalize them, sort the blogs…and trim that document by comparing it to my existing list of already-included blogs. Then cut-and-paste the document back to a webpage to make it easy to check new candidates.
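The Word-based steps amount to a set difference: collect names from every directory, normalize them, drop the ones already on the list, sort what's left, and turn it back into a clickable page. As a rough sketch of the same idea in Python (not the Word procedure described here), the following uses the standard library's html.parser to pull link names out of pasted directory HTML. The function names and sample pages are hypothetical. Matching on normalized link text rather than on URL is also an assumption, chosen because it is closer to comparing names against a spreadsheet by eye.

```python
from html import escape
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from <a> tags in pasted directory HTML."""

    def __init__(self):
        super().__init__()
        self.links = []    # (href, text) pairs, in page order
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            if text:
                self.links.append((self._href, text))
            self._href = None


def normalize(name):
    """Collapse runs of whitespace and ignore case so near-duplicates match."""
    return " ".join(name.split()).casefold()


def new_candidates(directory_pages, known_names):
    """Return (name, href) pairs not already on the list, deduplicated and sorted."""
    known = {normalize(n) for n in known_names}
    found = {}
    for page in directory_pages:
        parser = LinkCollector()
        parser.feed(page)
        parser.close()
        for href, text in parser.links:
            key = normalize(text)
            if key not in known and key not in found:
                found[key] = (text, href)  # keep the first spelling seen
    return sorted(found.values(), key=lambda pair: normalize(pair[0]))


def to_html(candidates):
    """Render candidates as a simple clickable list for checking in a browser."""
    items = "\n".join(
        f'<li><a href="{escape(href)}">{escape(text)}</a></li>'
        for text, href in candidates
    )
    return f"<ul>\n{items}\n</ul>"


# Hypothetical directory snippets, for illustration only.
pages = [
    '<ul><li><a href="http://a.example/">Alpha Blog</a></li>'
    '<li><a href="http://b.example/">Beta  Blog</a></li></ul>',
    '<p><a href="http://b.example/">beta blog</a> '
    '<a href="http://c.example/">Gamma</a></p>',
]
candidates = new_candidates(pages, ["Alpha Blog"])
print(candidates)  # [('Beta Blog', 'http://b.example/'), ('Gamma', 'http://c.example/')]
print(to_html(candidates))
```

Keying on the URL instead would only need a change in the key, but it would treat a blog listed under an old and a new address as two candidates.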

Why didn’t I think of this last year or the year before? Maybe because I never thought of Word and HTML in the same space…maybe because I’m getting old.

The results (so far)

This morning–after the usual Friendfeed time and editing for another project–I did the cut-and-paste for these six sources (the Salem Press list required more work than the others, but still not much); within an hour, I had a sorted Word document with–gasp–2,911 candidates.

This afternoon, after lunch and some errands, I trimmed that sorted document by comparing it to the spreadsheet, including special passes for Idiot Sorting (I’m being lazy this time, so there are lots of blogs in the “A ” and “The ” areas, and some directories normalize those articles away). The process took about two hours, maybe less…and I now have a webpage (private) with 868 liblog candidates.
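The “A ” and “The ” problem comes down to the sort key. Here is a minimal sketch that assumes these two articles are the only ones to ignore. Before comparing, it strips a single leading “A ” or “The ” and ignores case, so “The Quiet Stacks” in one directory and “Quiet Stacks” in another sort next to each other and match. The helper names and sample blog names are hypothetical.

```python
import re

# Leading articles to ignore when comparing names; "A " and "The " are the two
# mentioned in the post. Add alternatives (e.g. "an") if other ones turn up.
LEADING_ARTICLE = re.compile(r"^(?:a|the)\s+", re.IGNORECASE)


def sort_key(name):
    """Case-insensitive key that ignores a single leading 'A ' or 'The '."""
    collapsed = " ".join(name.split())
    return LEADING_ARTICLE.sub("", collapsed).casefold()


def trim(candidates, known):
    """Drop candidates whose article-free key matches an already-known blog."""
    known_keys = {sort_key(k) for k in known}
    remaining = [c for c in candidates if sort_key(c) not in known_keys]
    return sorted(remaining, key=sort_key)


# Hypothetical names, for illustration only.
candidates = ["The Quiet Stacks", "A Reference Desk", "Athena Notes", "Catalog Musings"]
known = ["Quiet Stacks", "The Catalog Musings"]
print(trim(candidates, known))  # ['Athena Notes', 'A Reference Desk']
```

The regex requires whitespace after the article, so a name like “Athena Notes” is left alone. The trade-off is that two genuinely different blogs named “X” and “The X” would be treated as one, so matches still deserve a quick look.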

Which is still a lot of checking to do, but little enough to be feasible. How many of those 868 will I add to the 606 (not including “excluded blogs”) in the current list? I have no idea; I’d guess somewhere between 200 and 400, but I could be wrong.

If this process does turn out to go reasonably smoothly, I might–after taking an appropriate break and working on other stuff such as C&I–even change my mind about blogrolls. After all, they mostly use a consistent format, and I could cumulate a whole bunch of them in a Word/HTML document and… well, we shall see.

No promises

Am I certain there will be a 2010 survey? Not really. I’d say the odds are pretty good, but if paying gigs come up or there are other things that interfere, it could take a long time–and, frankly, I haven’t invested so many hours in it that I couldn’t just abandon it. (Although my track record for abandoning projects doesn’t suggest that this is highly probable.)

And for those of you who say “You idiot, you could have done this much more easily this time and the last two times by…” Well, you may be right. I certainly could have saved a lot of boring and annoying work in 2008 and 2009 if I’d thought of this. There may be an even better way, but this is a good start.

2 Responses to “We grow too soon old…”

  1. This sounds odd and might have been suggested before, but have you thought about charging for the individual statistical profiles? I’m enough of an egoist that I’d be willing to pay $5.00 (hopefully not an insulting sum) to have tables showing the analysis you’ve done for the Shifted Librarian and others, but I’m not all that interested in a whole book of liblog profiles. If I ever showed you my apt (which I’m not going to), you’d understand why these days my tangible book purchases are few and far between.

    If you are willing to sell on a single profile basis, drop me an e-mail.

  2. walt says:

    Hi Daniel,

    On the one hand, yes, I have thought about it. Not sure whether I could reasonably do it for $5, but I’ve thought about it. Still, the profile without the background of the book would be a little hard to make sense of–I suspect.

    On the other: The books *are* available as PDFs…

    I’ll continue to think about it.